Sketch-Based Empirical Natural Gradient Methods for Deep Learning



Publisher: Springer Journals
Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
ISSN: 0885-7474
eISSN: 1573-7691
DOI: 10.1007/s10915-022-01911-x

Abstract

In this paper, we develop an efficient sketch-based empirical natural gradient method (SENG) for large-scale deep learning problems. The empirical Fisher information matrix is usually low-rank since sampling is only practical on a small amount of data at each iteration. Although the corresponding natural gradient direction lies in a small subspace, the high dimensionality still makes both the computational cost and the memory requirement intractable. We design randomized techniques for different neural network structures to resolve these challenges. For layers with a reasonable dimension, sketching can be performed on a regularized least squares subproblem. Otherwise, since the gradient is a vectorization of the product of two matrices, we apply sketching to low-rank approximations of these matrices to compute the most expensive parts. A distributed version of SENG is also developed for extremely large-scale applications. Global convergence to stationary points is established under mild assumptions, and fast linear convergence is analyzed in the neural tangent kernel (NTK) regime. Extensive experiments on convolutional neural networks show the competitiveness of SENG compared with state-of-the-art methods. On ResNet50 with ImageNet-1k, SENG achieves 75.9% Top-1 testing accuracy within 41 epochs. Experiments on distributed large-batch training of ResNet50 with ImageNet-1k show reasonable scaling efficiency.
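
The abstract describes the core computation only at a high level: because the empirical Fisher built from a mini-batch is low-rank, the regularized natural-gradient system can be reduced to a small inner solve, and sketching compresses the large factors before that solve. The snippet below is a minimal, hedged illustration of this idea in NumPy; it is not the authors' SENG implementation. The Gaussian sketch, the Woodbury-style solve, and all names (sketched_natural_gradient_step, sketch_dim, lam) are illustrative assumptions, and only the small Gram matrix is sketched here for simplicity.

```python
# A minimal, illustrative sketch (not the paper's algorithm) of a sketched
# empirical natural-gradient step for one layer, assuming the empirical
# Fisher F = (1/m) * U @ U.T, where U is the d-by-m matrix of per-sample
# gradients (d = #parameters, m = batch size, m << d).
import numpy as np

def sketched_natural_gradient_step(U, g, lam, sketch_dim, rng=None):
    """Approximate (F + lam*I)^{-1} g with F = (1/m) U U^T.

    A Gaussian sketch S (sketch_dim x d) compresses U before the small
    m x m inner system is formed; sketch_dim and the Gaussian sketch are
    illustrative choices, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, m = U.shape
    S = rng.standard_normal((sketch_dim, d)) / np.sqrt(sketch_dim)  # sketch matrix
    SU = S @ U                                   # sketched per-sample gradients
    # Woodbury identity on the regularized low-rank system:
    # (lam*I + (1/m) U U^T)^{-1} g = (1/lam) * (g - U @ inv(lam*m*I + U^T U) @ U^T g),
    # with the m x m Gram matrix U^T U approximated by (S U)^T (S U).
    small = lam * m * np.eye(m) + SU.T @ SU      # only an m x m system is solved
    coeff = np.linalg.solve(small, U.T @ g)
    return (g - U @ coeff) / lam

# Toy usage: one layer with 10k parameters and a batch of 32 samples.
U = np.random.randn(10_000, 32)
g = U.mean(axis=1)                               # mini-batch gradient
direction = sketched_natural_gradient_step(U, g, lam=0.1, sketch_dim=256)
```

The point of the sketch is that the expensive objects scale with the small batch size m rather than the parameter dimension d: the only linear system solved is m-by-m, and the sketch further shrinks the cost of forming it.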

Journal

Journal of Scientific Computing (Springer Journals)

Published: Sep 1, 2022

Keywords: Deep learning; Natural gradient methods; Sketch-based methods; Convergence; 90C06; 90C26
