Recent works have demonstrated the promise of using resistive random access memory (ReRAM) to perform neural network computations in memory. In particular, ReRAM-based crossbar structures can perform matrix-vector multiplication directly in the analog domain, but the resolution of the ReRAM cells and of the digital-to-analog and analog-to-digital converters limits the precision of the inputs and weights that can be directly supported. Although convolutional neural networks (CNNs) can be trained with low-precision weights and activations, previous quantization approaches are either not amenable to ReRAM-based crossbar implementations or suffer poor accuracy when applied to deep CNNs on complex datasets. In this article, we propose a new CNN training and implementation approach that implements weights using a trained biased number representation, which can achieve near full-precision model accuracy with as little as 2-bit weights and 2-bit activations on the CIFAR datasets. The proposed approach is compatible with a ReRAM-based crossbar implementation. We also propose an activation-side coalescing technique that combines the steps of batch normalization, non-linear activation, and quantization into a single stage that simply performs a clipped-rounding operation. Experiments demonstrate that our approach outperforms previous low-precision number representations for the VGG-11, VGG-13, and VGG-19 models on both the CIFAR-10 and CIFAR-100 datasets.
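To make the activation-side coalescing concrete, here is a minimal NumPy sketch of the idea described in the abstract; the function name, the per-channel constants (a, b), and the example values are illustrative assumptions, not the authors' code. It folds batch normalization, the ReLU non-linearity, and uniform quantization into a single clipped-rounding step, assuming a positive batch-norm scale so the ReLU is absorbed by the lower clip bound.

```python
import numpy as np

def coalesced_activation(x, gamma, beta, mu, var, eps=1e-5, bits=2, delta=1.0):
    """Hypothetical sketch: fold BN + ReLU + uniform quantization into
    one clipped rounding, as the abstract's coalescing technique suggests.

    BN's affine transform and the quantizer step size `delta` are merged
    into combined constants (a, b), so inference reduces to
    clip(round(a * x + b), 0, 2**bits - 1).
    Assumes gamma > 0, so ReLU is subsumed by the lower clip bound.
    """
    sigma = np.sqrt(var + eps)
    a = gamma / (sigma * delta)               # combined scale
    b = (beta - gamma * mu / sigma) / delta   # combined offset
    qmax = 2 ** bits - 1
    # One clipped-rounding step yields the integer activation code
    # that would drive the next crossbar layer's DACs.
    return np.clip(np.round(a * x + b), 0, qmax).astype(np.int64)

# Example: 2-bit activations with per-tensor constants for simplicity.
x = np.array([-0.7, 0.1, 0.4, 2.5])
print(coalesced_activation(x, gamma=1.0, beta=0.0, mu=0.2, var=0.04, bits=2))
# -> [0 0 1 3]
```

Under these assumptions, only the folded constants (a, b) need to be stored per channel at inference time; the intermediate batch-normalized and ReLU outputs are never materialized.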
ACM Journal on Emerging Technologies in Computing Systems (JETC) – Association for Computing Machinery
Published: Mar 26, 2019
Keywords: Resistive Memory