Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Memory performance estimation of CUDA programs

Memory performance estimation of CUDA programs Memory Performance Estimation of CUDA Programs YOOSEONG KIM and AVIRAL SHRIVASTAVA, Arizona State University CUDA has successfully popularized GPU computing, and GPGPU applications are now used in various embedded systems. The CUDA programming model provides a simple interface to program on GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider numerous architectural details, and small changes in source code, especially on the memory access pattern, can affect performance significantly. This makes it very difficult to optimize CUDA programs. This article presents CuMAPz, which is a tool to analyze and compare the memory performance of CUDA programs. CuMAPz can help programmers explore different ways of using shared and global memories, and optimize their program for efficient memory behavior. CuMAPz models several memory-performance-related factors: data reuse, global memory access coalescing, global memory latency hiding, shared memory bank conflict, channel skew, and branch divergence. Experimental results show that CuMAPz can accurately estimate performance with correlation coefficient of 0.96. By using CuMAPz to explore the memory access design space, we could improve the performance of our benchmarks by 30% more than the previous approach [Hong and Kim 2010]. Categories and Subject Descriptors: B.8.2 [Performance and http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Embedded Computing Systems (TECS) Association for Computing Machinery

Loading next page...
 
/lp/association-for-computing-machinery/memory-performance-estimation-of-cuda-programs-rd7ar0y0t0

References

References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.

Publisher
Association for Computing Machinery
Copyright
Copyright © 2013 by ACM Inc.
ISSN
1539-9087
DOI
10.1145/2514641.2514648
Publisher site
See Article on Publisher Site

Abstract

Memory Performance Estimation of CUDA Programs YOOSEONG KIM and AVIRAL SHRIVASTAVA, Arizona State University CUDA has successfully popularized GPU computing, and GPGPU applications are now used in various embedded systems. The CUDA programming model provides a simple interface to program on GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider numerous architectural details, and small changes in source code, especially on the memory access pattern, can affect performance significantly. This makes it very difficult to optimize CUDA programs. This article presents CuMAPz, which is a tool to analyze and compare the memory performance of CUDA programs. CuMAPz can help programmers explore different ways of using shared and global memories, and optimize their program for efficient memory behavior. CuMAPz models several memory-performance-related factors: data reuse, global memory access coalescing, global memory latency hiding, shared memory bank conflict, channel skew, and branch divergence. Experimental results show that CuMAPz can accurately estimate performance with correlation coefficient of 0.96. By using CuMAPz to explore the memory access design space, we could improve the performance of our benchmarks by 30% more than the previous approach [Hong and Kim 2010]. Categories and Subject Descriptors: B.8.2 [Performance and

Journal

ACM Transactions on Embedded Computing Systems (TECS)Association for Computing Machinery

Published: Sep 1, 2013

There are no references for this article.