D. Rumelhart, Geoffrey Hinton, Ronald Williams (1986)
Learning representations by back-propagating errors. Nature, 323
Qiang Yang, Yang Liu, Tianjian Chen, Yongxin Tong (2019)
Federated Machine Learning. ACM Transactions on Intelligent Systems and Technology (TIST), 10
Xi Chen, Xiaolin Hu, Hucheng Zhou, Ningyi Xu (2016)
FxpNet: Training deep convolutional neural network in fixed-point representation
Seungkyu Choi, Jaekang Shin, Yeongjae Choi, L. Kim (2019)
An Optimized Design Technique of Low-bit Neural Network Training for Personalization on IoT Devices. 2019 56th ACM/IEEE Design Automation Conference (DAC)
P. Rosenfeld, E. Cooper-Balis, B. Jacob (2011)
DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters, 10
M. Horowitz (2014)
1.1 Computing's energy problem (and what we can do about it). 2014 IEEE International Solid-State Circuits Conference (ISSCC)
Fengbin Tu, S. Yin, Ouyang Peng, Shibin Tang, Leibo Liu, Shaojun Wei (2017)
Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25
B. Fleischer, Sunil Shukla, M. Ziegler, J. Silberman, Jinwook Oh, V. Srinivasan, Jungwook Choi, S. Mueller, A. Agrawal, Tina Babinsky, N. Cao, Chia-Yu Chen, P. Chuang, T. Fox, G. Gristede, Michael Guillorn, Howard Haynie, M. Klaiber, Dongsoo Lee, S. Lo, G. Maier, M. Scheuermann, Swagath Venkataramani, Christos Vezyrtzis, Naigang Wang, F. Yee, Ching Zhou, P. Lu, B. Curran, Leland Chang, K. Gopalakrishnan (2018)
A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference. 2018 IEEE Symposium on VLSI Circuits
Yawen Wu, Zhepeng Wang, Yiyu Shi, J. Hu (2020)
Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39
Alon Amid, David Biancolin, Abraham Gonzalez, D. Grubb, S. Karandikar, Harrison Liew, Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, P. Rigge, Colin Schmidt, J. Wright, Jerry Zhao, Y. Shao, K. Asanović, B. Nikolić (2020)
Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs. IEEE Micro, 40
Chunyou Su, Sheng Zhou, Liang Feng, W. Zhang (2020)
Towards high performance low bitwidth training for deep neural networks. Journal of Semiconductors, 41
Sheng Li, Ke Chen, Jung Ahn, J. Brockman, N. Jouppi (2011)
CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Yu-hsin Chen, Tien-Ju Yang, J. Emer, V. Sze (2018)
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9
(2020)
Scala based HDL v1.4.0
Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang (2019)
E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
Yuan Cheng, Guangya Li, Ngai Wong, Hai-Bao Chen, Hao Yu (2020)
DEEPEYE: A Deeply Tensor-Compressed Neural Network for Video Comprehension on Terminal Devices. ACM Trans. Embed. Comput. Syst., 19
Donghyeon Han, Jinsu Lee, Jinmook Lee, H. Yoo (2019)
A Low-Power Deep Neural Network Online Learning Processor for Real-Time Object Tracking Application. IEEE Transactions on Circuits and Systems I: Regular Papers, 66
(2018)
ChiselTest, a test harness for Chisel-based RTL designs
C. Frenkel, M. Lefebvre, D. Bol (2021)
Learning Without Feedback: Fixed Random Learning Signals Allow for Feedforward Training of Deep Neural Networks. Frontiers in Neuroscience, 15
Zidong Du, Robert Fasthuber, Tianshi Chen, P. Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, O. Temam (2015)
ShiDianNao: Shifting vision processing closer to the sensor. 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)
Kaiming He, X. Zhang, Shaoqing Ren, Jian Sun (2015)
Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Sergey Ioffe, Christian Szegedy (2015)
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ArXiv, abs/1502.03167
Maolin Wang, Seyedramin Rasoulinezhad, P. Leong, Hayden So (2020)
NITI: Training Integer Neural Networks Using Integer-Only Arithmetic. IEEE Transactions on Parallel and Distributed Systems, 33
Cheng-Hsun Lu, Yi-Chung Wu, Chia-Hsiang Yang (2019)
A 2.25 TOPS/W Fully-Integrated Deep CNN Learning Processor with On-Chip Training. 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC)
Mingxing Tan, Quoc Le (2019)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ArXiv, abs/1905.11946
Zhouyuan Huo, Bin Gu, Heng Huang (2018)
Training Neural Networks Using Features Replay
Shun-Jie Li, Zhiyuan Yang, D. Reddy, Ankur Srivastava, B. Jacob (2020)
DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator. IEEE Computer Architecture Letters, 19
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, O. Temam (2014)
DaDianNao: A Machine-Learning Supercomputer. 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
Jinsu Lee, Juhyoung Lee, Donghyeon Han, Jinmook Lee, Gwangtae Park, H. Yoo (2019)
7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16. 2019 IEEE International Solid-State Circuits Conference (ISSCC)
Hasan Genç, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, P. Prakash, Jerry Zhao, D. Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, J. Wright, I. Stoica, Jonathan Ragan-Kelley, K. Asanović, B. Nikolić, Y. Shao (2019)
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration. 2021 58th ACM/IEEE Design Automation Conference (DAC)
Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, Yeongjae Choi, Hyeonuk Kim, L. Kim (2020)
An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices. IEEE Journal of Solid-State Circuits, 55
Jeongwoo Park, Sunwoo Lee, Dongsuk Jeon (2021)
A 40nm 4.81TFLOPS/W 8b Floating-Point Training Processor for Non-Sparse Neural Networks Using Shared Exponent Bias and 24-Way Fused Multiply-Add Tree. 2021 IEEE International Solid-State Circuits Conference (ISSCC), 64
Keith Bonawitz, Hubert Eichner, W. Grieskamp, Dzmitry Huba, A. Ingerman, Vladimir Ivanov, Chloé Kiddon, Jakub Konecný, S. Mazzocchi, H. McMahan, Timon Overveldt, David Petrou, Daniel Ramage, Jason Roselander (2019)
Towards Federated Learning at Scale: System Design. ArXiv, abs/1902.01046
K. Asanović, Rimas Avizienis, J. Bachrach, S. Beamer, David Biancolin, Christopher Celio, Henry Cook, D. Dabbelt, J. Hauser, Adam Izraelevitz, S. Karandikar, Benjamin Keller, Donggyu Kim, Jack Koenig, Yunsup Lee, Eric Love, Martin Maas, Albert Magyar, Howard Mao, Miquel Moretó, Albert Ou, D. Patterson, B. Richards, Colin Schmidt, Stephen Twigg, Huy Vo, Andrew Waterman (2016)
The Rocket Chip Generator
P. Baldi, Peter Sadowski, Zhiqin Lu (2016)
Learning in the machine: Random backpropagation and the deep learning channel. Artificial Intelligence, 260
Max Jaderberg, Wojciech Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, K. Kavukcuoglu (2016)
Decoupled Neural Interfaces using Synthetic Gradients
S. Yin, Ouyang Peng, Shibin Tang, Fengbin Tu, Xiudong Li, Shixuan Zheng, Tianyi Lu, Jiangyuan Gu, Leibo Liu, Shaojun Wei (2018)
A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications. IEEE Journal of Solid-State Circuits, 53
Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana, Sebastian Goldt (2020)
The dynamics of learning with feedback alignment. ArXiv, abs/2011.12428
Gunhee Lee, Hanmin Park, Namhyung Kim, Joonsang Yu, Sujeong Jo, Kiyoung Choi (2019)
Acceleration of DNN Backward Propagation by Selective Computation of Gradients. 2019 56th ACM/IEEE Design Automation Conference (DAC)
H. McMahan, Eider Moore, Daniel Ramage, S. Hampson, B. Arcas (2016)
Communication-Efficient Learning of Deep Networks from Decentralized Data
J. Bachrach, Huy Vo, B. Richards, Yunsup Lee, Andrew Waterman, Rimas Avizienis, J. Wawrzynek, K. Asanović (2012)
Chisel: Constructing hardware in a Scala embedded language. DAC Design Automation Conference 2012
K. Chandrasekar, C. Weis, B. Akesson, N. Wehn, K. Goossens (2013)
Towards variation-aware system-level power estimation of DRAMs: An empirical approach. 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)
Pengcheng Dai, Jianlei Yang, Xucheng Ye, Xingzhou Cheng, Junyu Luo, Linghao Song, Yiran Chen, Weisheng Zhao (2020)
SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training. 2020 57th ACM/IEEE Design Automation Conference (DAC)
Norman Jouppi, C. Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Taraneh Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Ho, Doug Hogberg, John Hu, R. Hundt, Dan Hurt, Julian Ibarz, A. Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, R. Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Yoon (2017)
In-datacenter performance analysis of a tensor processing unit. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
Kyeongryeol Bong, Sungpill Choi, Changhyeon Kim, Donghyeon Han, H. Yoo (2018)
A Low-Power Convolutional Neural Network Face Recognition Processor and a CIS Integrated With Always-on Face Detector. IEEE Journal of Solid-State Circuits, 53
Donghyeon Han, Jinsu Lee, H. Yoo (2021)
DF-LNPU: A Pipelined Direct Feedback Alignment-Based Deep Neural Network Learning Processor for Fast Online Learning. IEEE Journal of Solid-State Circuits, 56
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, O. Temam (2014)
DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems
T. Lillicrap, D. Cownden, D. Tweed, C. Akerman (2014)
Random feedback weights support learning in deep neural networks. ArXiv, abs/1411.0247
Ziyang Hong, C. Yue (2021)
Efficient Training Convolutional Neural Networks on Edge Devices with Gradient-pruned Sign-symmetric Feedback Alignment. ArXiv, abs/2103.02889
Jia Deng, Wei Dong, R. Socher, Li-Jia Li, K. Li, Li Fei-Fei (2009)
ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition
Navjot Kukreja, Alena Shilova, Olivier Beaumont, J. Hückelheim, N. Ferrier, P. Hovland, G. Gorman (2019)
Training on the Edge: The why and the how. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Arild Nøkland (2016)
Direct Feedback Alignment Provides Learning in Deep Neural Networks
C. Frenkel, J. Legat, D. Bol (2020)
A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based Retinas. 2020 IEEE International Symposium on Circuits and Systems (ISCAS)
D. Gajski, L. Ramachandran (2009)
An Introduction to High-Level Synthesis. IEEE Design & Test of Computers, 26
A. Samajdar, J. Joseph, Yuhao Zhu, P. Whatmough, Matthew Mattina, T. Krishna (2020)
A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim. 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Yann LeCun, L. Bottou, G. Orr, K. Müller (2012)
Efficient BackProp
A. Krizhevsky (2009)
Learning Multiple Layers of Features from Tiny Images
Z. Hong, Wenxiao Fang, Y. En, Chengyang Luo, Weiheng Shao, Lei Wang, Zhiyuan He, E. Shao (2019)
Electromagnetic Pattern Extraction and Grouping for Near-Field Scanning of Integrated Circuits by PCA and K-Means Approaches. IEEE Transactions on Electromagnetic Compatibility, 61
With the proliferation of mobile devices, distributed learning, which enables model training on decentralized data, has attracted great interest from researchers. However, the limited training capability of edge devices significantly constrains the energy efficiency of distributed learning in practice. This article describes Efficient-Grad, an algorithm-hardware co-design approach for training deep convolutional neural networks that improves both throughput and energy savings during model training, with negligible loss of validation accuracy. The key to Efficient-Grad is its exploitation of two observations. First, sparsity exists not only in activations and weights but also in gradients, along with an asymmetry residing in the gradients of conventional back-propagation (BP). Second, a dedicated hardware architecture for sparsity utilization and efficient data movement can be optimized to support the Efficient-Grad algorithm in a scalable manner. To the best of our knowledge, Efficient-Grad is the first approach that successfully adopts a feedback-alignment (FA)-based gradient optimization scheme for deep convolutional neural network training, which leads to its superiority in energy efficiency. We present case studies demonstrating that the Efficient-Grad design outperforms prior art by 3.72x in energy efficiency.
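To make the FA-based gradient scheme the abstract refers to concrete, the sketch below shows plain feedback alignment on a toy two-layer network: the backward pass routes the output error through a fixed random matrix instead of the transposed forward weights. This is a minimal NumPy illustration of the general FA idea, not the paper's Efficient-Grad implementation; the layer sizes, learning rate, and the `fa_step` helper are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
# Feedback alignment: the backward pass uses a fixed random matrix B
# in place of W2.T, so no transposed weight transport is needed.
B = rng.normal(0, 0.1, (n_hid, n_out))

def relu(x):
    return np.maximum(x, 0.0)

def fa_step(x, y, lr=0.01):
    """One FA training step: standard forward pass, then a backward
    pass that projects the output error through the fixed matrix B."""
    global W1, W2
    h = relu(W1 @ x)       # hidden activations
    y_hat = W2 @ h         # linear output
    e = y_hat - y          # output error (squared-error loss)
    dh = (B @ e) * (h > 0) # hidden delta via B, not W2.T
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
    return 0.5 * float(e @ e)

x = rng.normal(size=n_in)
y = rng.normal(size=n_out)
losses = [fa_step(x, y) for _ in range(200)]
```

Despite the feedback weights being random and fixed, the forward weights align with them over training, so the loss still decreases; this is the property that lets FA-style schemes avoid storing and streaming transposed weights in the backward pass.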
ACM Transactions on Embedded Computing Systems (TECS) – Association for Computing Machinery
Published: Feb 8, 2022
Keywords: Deep neural networks