Deep relaxation: partial differential equations for optimizing deep neural networks

Publisher
Springer Journals
Copyright
Copyright © 2018 by Springer Nature
Subject
Mathematics; Mathematics, general; Applications of Mathematics; Computational Mathematics and Numerical Analysis
eISSN
2197-9847
DOI
10.1007/s40687-018-0148-y

Abstract

Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton–Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
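To make the abstract's central idea concrete, the sketch below illustrates one outer step of an Entropy-SGD-style update: an inner Langevin loop estimates the mean of a local Gibbs measure around the current weights, and the outer step is gradient descent on the relaxed (local-entropy) loss. This is a minimal illustration written for this summary, not the authors' implementation; the function entropy_sgd_step, the toy quadratic loss, and all step sizes and loop lengths are illustrative choices, and the exact scaling convention for the scope parameter gamma varies between write-ups of the method.

import numpy as np

def entropy_sgd_step(x, grad_f, eta=0.1, gamma=0.03, inner_lr=0.01,
                     inner_steps=20, noise=1e-4, alpha=0.75, rng=None):
    # One outer step of an Entropy-SGD-style update (illustrative sketch).
    # The inner Langevin loop estimates mu, the mean of x' under the local
    # Gibbs measure proportional to exp(-f(x') - |x' - x|^2 / (2 * gamma));
    # the gradient of the relaxed (local-entropy) loss at x is then
    # approximately (x - mu) / gamma.
    rng = np.random.default_rng() if rng is None else rng
    x_prime = x.copy()
    mu = x.copy()
    for _ in range(inner_steps):
        g = grad_f(x_prime) + (x_prime - x) / gamma
        x_prime = (x_prime - inner_lr * g
                   + np.sqrt(inner_lr) * noise * rng.standard_normal(x.shape))
        mu = alpha * mu + (1.0 - alpha) * x_prime  # running average of x'
    return x - eta * (x - mu) / gamma              # outer gradient-descent step

# Toy usage on a quadratic loss f(x) = 0.5 * ||x||^2, so grad_f(x) = x.
x = np.array([2.0, -1.0])
for _ in range(200):
    x = entropy_sgd_step(x, grad_f=lambda z: z)

In a neural-network setting, grad_f would be a minibatch gradient of the training loss, so the inner loop plays the role of the stochastic gradient Langevin averaging that the paper connects, via the relaxed loss, to the viscous Hamilton–Jacobi PDE.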

Journal

Research in the Mathematical Sciences, Springer Journals

Published: Jun 28, 2018
