Branch Prediction-Directed Dynamic Instruction Cache Locking for Embedded Systems

KENI QIU, MENGYING ZHAO, and CHUN JASON XUE, City University of Hong Kong
ALEX ORAILOGLU, University of California, San Diego

Cache locking is a cache management technique to preclude the replacement of locked cache contents. Cache locking is often adopted to improve cache access predictability in Worst-Case Execution Time (WCET) analysis. Static cache locking methods have been proposed recently to improve Average-Case Execution Time (ACET) performance. This article presents an approach, Branch Prediction-directed Dynamic Cache Locking (BPDCL), to improve system performance through cache conflict miss reduction. In the proposed approach, the control flow graph of a program is first partitioned into disjoint execution regions, then memory blocks worth locking are determined by calculating the locking profit for each region. These two steps are conducted during compilation time. At runtime, directed by branch predictions, locking routines are prefetched into a small high-speed buffer. The predetermined cache locking contents are loaded and locked at specific execution points during program execution. Experimental results show that the proposed BPDCL method exhibits an average improvement of 25.9%, 13.8%, and 8.0% on cache miss rate reduction in comparison to cases with no cache locking, the
ACM Transactions on Embedded Computing Systems (TECS) – Association for Computing Machinery
Published: Oct 6, 2014
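
The abstract describes two compile-time steps: partitioning the control flow graph into execution regions, and choosing the memory blocks worth locking in each region by a locking profit. The following is a minimal sketch of the second step only, not the authors' algorithm. The abstract does not define locking profit, so the formula used here (profiled conflict misses times an assumed miss penalty, minus an assumed cost to load and lock a block) is an assumption, as are all names, constants, and the greedy selection rule.

```python
from dataclasses import dataclass

# Assumed cost model; the abstract does not specify these values.
MISS_PENALTY = 10  # hypothetical cycles lost per conflict miss
LOCK_COST = 30     # hypothetical cycles to load and lock one block


@dataclass(frozen=True)
class Block:
    name: str
    conflict_misses: int  # assumed to come from profiling the region


def locking_profit(block: Block) -> int:
    """Assumed profit: cycles saved by avoiding misses, minus locking overhead."""
    return block.conflict_misses * MISS_PENALTY - LOCK_COST


def select_locked_blocks(region_blocks, lockable_lines):
    """Greedily keep the most profitable blocks that fit in the lockable lines."""
    candidates = [b for b in region_blocks if locking_profit(b) > 0]
    candidates.sort(key=locking_profit, reverse=True)
    return [b.name for b in candidates[:lockable_lines]]


# Hypothetical regions, standing in for the result of the partitioning step.
regions = {
    "R0": [Block("b0", 12), Block("b1", 2), Block("b2", 7)],
    "R1": [Block("b3", 1), Block("b4", 40)],
}

# One locking plan per region, with two lockable cache lines assumed.
plan = {name: select_locked_blocks(blocks, 2) for name, blocks in regions.items()}
print(plan)
```

In this sketch, blocks whose assumed locking overhead exceeds the cycles they would save are never locked, so a region can end up using fewer lines than are available. The runtime part of the approach (prefetching locking routines under branch prediction) is not modeled here.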