
Efficient compilation of CUDA kernels for high-performance computing on FPGAs

Alexandros Papakonstantinou, University of Illinois at Urbana-Champaign; Karthik Gururaj, University of California, Los Angeles; John A. Stratton and Deming Chen, University of Illinois at Urbana-Champaign; Jason Cong, University of California, Los Angeles; Wen-mei W. Hwu, University of Illinois at Urbana-Champaign



Publisher
Association for Computing Machinery
Copyright
Copyright © 2013 by ACM Inc.
ISSN
1539-9087
DOI
10.1145/2514641.2514652

Abstract

The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, in which processors with different compute characteristics can be combined to boost the performance per watt of different application kernels. GPUs, in particular, are increasingly used to accelerate the compute-intensive kernels of scientific, imaging, and simulation applications, and new programming models that facilitate parallel processing on heterogeneous systems containing GPUs are rapidly gaining adoption in the computing community. By leveraging these investments, the developers of other accelerators can significantly reduce programming effort by supporting accelerator programming models that are already popular. In this work, we adapt one such model, CUDA, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx), which …

Journal

ACM Transactions on Embedded Computing Systems (TECS), Association for Computing Machinery

Published: Sep 1, 2013
