Protocol-Aware Recovery for Consensus-Based Distributed Storage

Publisher: Association for Computing Machinery
Copyright: © 2018 ACM
ISSN: 1553-3077
eISSN: 1553-3093
DOI: 10.1145/3241062

Abstract

We introduce protocol-aware recovery (Par), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of Par through the design and implementation of corruption-tolerant replication (Ctrl), a Par mechanism specific to replicated state machine (RSM) systems. We experimentally show that the Ctrl versions of two systems, LogCabin and ZooKeeper, safely recover from storage faults and provide high availability, whereas the unmodified versions can lose data or become unavailable. We also show that the Ctrl versions achieve this reliability with little performance overhead.

Journal

ACM Transactions on Storage (TOS), Association for Computing Machinery

Published: Oct 3, 2018

Keywords: Storage faults
