We propose a class of very simple modifications of gradient descent and stochastic gradient descent, leveraging Laplacian smoothing. We show that, when applied to a large variety of machine learning problems, ranging from logistic regression to deep neural nets, the proposed surrogates can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definite matrix (which can be computed efficiently by FFT) with a low condition number coming from a one-dimensional discrete Laplacian or its high-order generalizations. Given any vector, e.g., a gradient vector, Laplacian smoothing preserves the mean, increases the smallest component, and decreases the largest component. Moreover, we show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev H_σ^p sense and reduce the optimality gap for convex optimization problems. The code is available at: https://github.com/BaoWangMath/LaplacianSmoothing-GradientDescent.
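The smoothing step described in the abstract has a compact realization: solving (I − σL)s = g, where L is the one-dimensional discrete Laplacian with periodic boundary, reduces to an elementwise division in Fourier space because I − σL is circulant. The sketch below is illustrative only, assuming a flattened gradient vector and a periodic stencil; the function name is ours and the snippet is not taken from the authors' released code.

```python
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    """Apply (I - sigma*L)^{-1} to a gradient vector via FFT,
    where L is the 1-D discrete Laplacian with periodic boundary.
    Illustrative sketch; not the authors' released implementation."""
    n = grad.shape[0]
    # First column of the circulant matrix I - sigma*L:
    # the diagonal is 1 + 2*sigma, the off-diagonals (with wraparound)
    # are -sigma.
    d = np.zeros(n)
    d[0] = 1.0 + 2.0 * sigma
    d[1] = -sigma
    d[-1] = -sigma
    # A circulant solve is an elementwise division in Fourier space.
    return np.real(np.fft.ifft(np.fft.fft(grad) / np.fft.fft(d)))
```

Note that the zero-frequency eigenvalue of I − σL is 1 + 2σ − σ − σ = 1, so the smoothed gradient has exactly the same mean as the input, consistent with the mean-preservation property stated above.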
Research in the Mathematical Sciences – Springer Journals
Published: Sep 1, 2022
Keywords: Laplacian smoothing; Machine learning; Optimization