On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

Publisher
Association for Computing Machinery
Copyright
Copyright © 2014 by ACM Inc.
ISSN
1539-9087
DOI
10.1145/2567931

Abstract

QIANG YANG, JIAN FU, RAPHAEL POSS, and CHRIS JESSHOPE, University of Amsterdam

When hardware cache coherence scales to many cores on a chip, oversaturated traffic in the shared memory system may offset the benefit of massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic and its adverse effects on system performance, based on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that, in the context of massive concurrency, introducing a write-merging buffer with a 0.46% area overhead to each core boosts applications with good locality and concurrency by 18.74% in performance on average. Other applications also benefit from this addition and even achieve a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiencies compared to other solutions.

Categories and Subject Descriptors: C.4.0 [Performance of Systems]: Design Studies

General Terms: Design, Experimentation, Performance
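To give a concrete feel for the mechanism the abstract refers to, below is a minimal, hypothetical C++ sketch of a per-core write-merging buffer: pending stores to the same cache line are coalesced locally and forwarded to the on-chip memory network as a single write-update message rather than one message per store. The class, function names, and parameters (a 64-byte line, an 8-entry capacity, the send_update hook) are illustrative assumptions and are not taken from the paper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

constexpr std::size_t kLineBytes = 64;   // assumed cache-line size (illustrative)

struct MergeEntry {
    std::uint8_t data[kLineBytes] = {};  // merged store data for one line
    std::uint64_t valid_mask = 0;        // one bit per byte that has been written
};

class WriteMergeBuffer {
public:
    explicit WriteMergeBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Record a store and coalesce it with any pending entry for the same line.
    // Assumes the store does not cross a cache-line boundary.
    void store(std::uint64_t addr, const std::uint8_t* src, std::size_t len) {
        std::uint64_t line = addr / kLineBytes;
        std::size_t off = addr % kLineBytes;
        if (entries_.count(line) == 0 && entries_.size() == capacity_) {
            flush_one();                             // evict one entry to make room
        }
        MergeEntry& e = entries_[line];
        std::memcpy(e.data + off, src, len);
        for (std::size_t i = 0; i < len; ++i) {
            e.valid_mask |= std::uint64_t{1} << (off + i);
        }
    }

    // Drain all pending entries, e.g. at a thread synchronization point.
    void flush_all() {
        while (!entries_.empty()) flush_one();
    }

private:
    void flush_one() {
        auto it = entries_.begin();
        send_update(it->first, it->second);          // one network message per line
        entries_.erase(it);
    }

    // Placeholder for the write-update coherence message onto the on-chip network.
    void send_update(std::uint64_t /*line*/, const MergeEntry& /*entry*/) {}

    std::size_t capacity_;
    std::unordered_map<std::uint64_t, MergeEntry> entries_;
};

int main() {
    WriteMergeBuffer wmb(8);                         // illustrative 8-entry buffer
    std::uint8_t value = 42;
    // Two stores to the same line are merged into one pending entry, so only
    // one update message is sent when the buffer is flushed.
    wmb.store(0x1000, &value, sizeof(value));
    wmb.store(0x1008, &value, sizeof(value));
    wmb.flush_all();
    return 0;
}
```

The point of such a structure is that, under a write-update protocol, every store would otherwise generate an update on the memory network; merging stores to the same line before they leave the core is one way to reduce that traffic, which is the kind of regulation the paper evaluates.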

Journal

ACM Transactions on Embedded Computing Systems (TECS), Association for Computing Machinery

Published: Mar 1, 2014
