Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

How does momentum benefit deep neural networks architecture design? A few case studies

How does momentum benefit deep neural networks architecture design? A few case studies We present and review an algorithmic and theoretical framework for improving neural network architecture design via momentum. As case studies, we consider how momentum can improve the architecture design for recurrent neural networks (RNNs), neural ordinary differential equations (ODEs), and transformers. We show that integrating momentum into neural network architectures has several remarkable theoretical and empirical benefits, including (1) integrating momentum into RNNs and neural ODEs can overcome the vanishing gradient issues in training RNNs and neural ODEs, resulting in effective learning long-term dependencies; (2) momentum in neural ODEs can reduce the stiffness of the ODE dynamics, which significantly enhances the computational efficiency in training and testing; (3) momentum can improve the efficiency and accuracy of transformers. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Research in the Mathematical Sciences Springer Journals

How does momentum benefit deep neural networks architecture design? A few case studies

Loading next page...
 
/lp/springer-journals/how-does-momentum-benefit-deep-neural-networks-architecture-design-a-cjKJW1w0B8
Publisher
Springer Journals
Copyright
Copyright © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
eISSN
2197-9847
DOI
10.1007/s40687-022-00352-0
Publisher site
See Article on Publisher Site

Abstract

We present and review an algorithmic and theoretical framework for improving neural network architecture design via momentum. As case studies, we consider how momentum can improve the architecture design for recurrent neural networks (RNNs), neural ordinary differential equations (ODEs), and transformers. We show that integrating momentum into neural network architectures has several remarkable theoretical and empirical benefits, including (1) integrating momentum into RNNs and neural ODEs can overcome the vanishing gradient issues in training RNNs and neural ODEs, resulting in effective learning long-term dependencies; (2) momentum in neural ODEs can reduce the stiffness of the ODE dynamics, which significantly enhances the computational efficiency in training and testing; (3) momentum can improve the efficiency and accuracy of transformers.

Journal

Research in the Mathematical SciencesSpringer Journals

Published: Sep 1, 2022

References