Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs

Alexandros Papakonstantinou, University of Illinois at Urbana-Champaign
Karthik Gururaj, University of California, Los Angeles
John A. Stratton and Deming Chen, University of Illinois at Urbana-Champaign
Jason Cong, University of California, Los Angeles
Wen-mei W. Hwu, University of Illinois at Urbana-Champaign

The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs, in particular, are becoming very popular for speeding up compute-intensive kernels of scientific, imaging, and simulation applications. New programming models that facilitate parallel processing on heterogeneous systems containing GPUs are spreading rapidly in the computing community. By leveraging these investments, the developers of other accelerators have an opportunity to significantly reduce the programming effort by supporting those accelerator models already gaining popularity. In this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx) which
ACM Transactions on Embedded Computing Systems (TECS) – Association for Computing Machinery
Published: Sep 1, 2013