Reducing Power while Increasing Performance with SuperCISC
Multiprocessor Systems on Chips (MPSoCs) have become a popular architectural technique to increase performance. However, MPSoCs may lead to undesirable power consumption characteristics for computing systems that have strict power budgets, such as PDAs, mobile phones, and notebook computers. This paper presents the super-complex instruction-set computing (SuperCISC) Embedded Processor Architecture and, in particular, investigates the performance and power consumption of this device compared to traditional processor architecture-based execution. SuperCISC is a heterogeneous, multicore processor architecture designed to exceed the performance of traditional embedded processors while maintaining a reduced power budget compared to low-power embedded processors. At the heart of the SuperCISC processor is a multicore VLIW (Very Long Instruction Word) containing several homogeneous execution cores/functional units. In addition, complex and heterogeneous combinational hardware function cores are tightly integrated with the core VLIW engine, providing an opportunity for improved performance and reduced energy consumption. Our SuperCISC processor core has been synthesized for both a 90-nm Stratix II Field Programmable Gate Array (FPGA) and a 160-nm standard cell Application-Specific Integrated Circuit (ASIC) fabrication process from OKI, each operating at approximately 167 MHz for the VLIW core. We examine several reasons for speedup and power improvement through the SuperCISC architecture, including predicated control flow, cycle compression, and a reduction in arithmetic power consumption, which we call power compression. Finally, testing our SuperCISC processor with multimedia and signal-processing benchmarks, we show how the SuperCISC processor can provide performance improvements ranging from 7X to 160X with an average of 60X, while also providing orders-of-magnitude power improvements for the computational kernels.
The power improvements for our benchmark kernels range from just over 40X to over 400X, with an average savings exceeding 130X. By combining these power and performance improvements, our total energy improvements all exceed 1000X. Because these savings are limited to the computational kernels of the applications, which often consume approximately 90% of the execution time, the remaining 10% bounds the whole-application improvement at 10X; we expect our savings to approach this ideal.
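The arithmetic behind the 10X figure can be sketched in a few lines of Python. This is an illustrative calculation, not code from the paper: it assumes the standard Amdahl's law model (only the kernel fraction is improved), and the function names `whole_application_improvement` and `energy_improvement` are hypothetical. The second helper encodes the relation energy = power × time, which is why a power improvement and a speedup multiply.

```python
import math


def whole_application_improvement(kernel_fraction: float, kernel_gain: float) -> float:
    """Amdahl's law: overall improvement when only the kernel part improves.

    kernel_fraction: share of the original total spent in the kernels (0..1).
    kernel_gain: improvement factor achieved on the kernels alone (> 0).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_gain <= 0.0:
        raise ValueError("kernel_gain must be positive")
    # The untouched part keeps its cost; the kernel part shrinks by kernel_gain.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_gain)


def energy_improvement(speedup: float, power_improvement: float) -> float:
    """Energy = power * time, so the two improvement factors multiply."""
    return speedup * power_improvement


if __name__ == "__main__":
    fraction = 0.9  # kernels take roughly 90% of execution time (from the abstract)
    for gain in (7.0, 60.0, 160.0, math.inf):  # min, average, max speedup, ideal
        overall = whole_application_improvement(fraction, gain)
        print(f"kernel speedup {gain:>6}: whole application {overall:.2f}X")
    # Hypothetical kernel with a 10X speedup and a 100X power improvement.
    print(f"energy improvement: {energy_improvement(10.0, 100.0):.0f}X")
```

With a 90% kernel share, even the largest reported kernel speedup leaves the whole application below 10X, because the unaccelerated 10% dominates once the kernels become cheap. The 10X and 100X inputs to `energy_improvement` are made-up values chosen only to show the multiplication; they are not benchmark results.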
ACM Transactions on Embedded Computing Systems (TECS) – Association for Computing Machinery
Published: Aug 1, 2006