Tools for Predicting the Reliability of Large-Scale Storage Systems

Robert J. Hall, AT&T Labs Research

Data-intensive applications require extreme scaling of their underlying storage systems. Such scaling, together with the fact that storage systems must be implemented in actual data centers, increases the risk of data loss from failures of underlying components. Accurate engineering requires quantitatively predicting reliability, but this remains challenging due to the need to account for extreme scale, redundancy scheme type and strength, distribution architecture, and component dependencies.

This article introduces CQSIM-R, a tool suite for predicting the reliability of large-scale storage system designs and deployments. CQSIM-R includes (a) direct calculations based on an only-drives-fail failure model and (b) an event-based simulator for detailed prediction that handles failures of and failure dependencies among arbitrary (drive or nondrive) components. These are based on a common combinatorial framework for modeling placement strategies.

The article demonstrates CQSIM-R using models of common storage systems, including replicated and erasure coded designs. New results, such as the poor reliability scaling of spread-placed systems and a quantification of the impact of data center distribution and rack-awareness on reliability, demonstrate the usefulness and generality of the tools. Analysis and empirical studies show the
ACM Transactions on Storage (TOS) – Association for Computing Machinery
Published: Aug 16, 2016