One of the biggest challenges in multicore platforms is shared cache management, especially for data-dominant applications. Two commonly used approaches for increasing shared cache utilization are cache partitioning and loop tiling. However, state-of-the-art compilers lack efficient cache partitioning and loop tiling methods for two reasons. First, cache partitioning and loop tiling are strongly coupled, so addressing them separately is not effective. Second, both must be tailored to the details of the target shared cache architecture and to the memory characteristics of the co-running workloads. To the best of our knowledge, this is the first methodology that provides (1) a theoretical foundation for these cache management mechanisms and (2) a unified framework that orchestrates the two mechanisms in tandem rather than separately. Our approach lowers the number of main memory accesses by an order of magnitude while keeping the number of arithmetic/addressing instructions at a minimal level. We motivate this work by showing that cache partitioning, loop tiling, data array layouts, shared cache architecture details (i.e., cache size and associativity), and the memory reuse patterns of the executing tasks must be addressed together as one problem when a (near-)optimal solution is required. To this end, we present a search space exploration analysis in which our proposal offers a vast reduction in the required search space.
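The coupling between tile size and cache partition size described in the abstract can be illustrated with a minimal sketch. It is not the authors' methodology: the function names (`matmul_tiled`, `max_tile_for_partition`) are hypothetical, and the capacity rule (three square tiles must fit in the partition) is a deliberately simplistic assumption that ignores associativity and data layout, both of which the abstract says must also be considered.

```python
import math


def matmul_naive(a, b, n):
    """Reference n x n matrix multiplication without tiling."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation, with all three loops blocked into tile x tile chunks.

    Each (ii, jj, kk) step touches one tile of a, b, and c, so the working
    set per step is about 3 * tile * tile elements.
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


def max_tile_for_partition(partition_bytes, elem_bytes=8, arrays=3):
    """Largest tile edge T with arrays * T * T * elem_bytes <= partition_bytes.

    Assumption (not from the source): capacity is the only constraint.
    """
    return max(math.isqrt(partition_bytes // (arrays * elem_bytes)), 1)


if __name__ == "__main__":
    n = 6
    a = [[i * n + j for j in range(n)] for i in range(n)]
    b = [[(i + 2 * j) % 7 for j in range(n)] for i in range(n)]
    tile = min(max_tile_for_partition(256), n)  # tiny partition for the demo
    print(tile, matmul_tiled(a, b, n, tile) == matmul_naive(a, b, n))
```

Under this toy model, shrinking the partition assigned to a task shrinks the largest tile that fits, which is one way to see why the two decisions cannot be made independently.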
ACM Transactions on Embedded Computing Systems (TECS) – Association for Computing Machinery
Published: May 22, 2018
Keywords: Cache partitioning