Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories

Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while storage servers with large core counts are widely available, single-core performance and memory bandwidth per core grow slower than the core count per die. Computational storage offers a promising solution to this problem by utilizing dedicated compute resources along the storage processing path. We present DeltaFS Indexed Massive Directories (IMDs), a new approach to computational storage. DeltaFS IMDs harvest available (i.e., not dedicated) compute, memory, and network resources on the compute nodes of an application to perform computation on data. We demonstrate the efficiency of DeltaFS IMDs by using them to dynamically reorganize the output of a real-world simulation application across 131,072 CPU cores. DeltaFS IMDs speed up reads by 1,740× while only slightly slowing down the writing of data during simulation I/O for in situ data processing. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Storage (TOS) Association for Computing Machinery

Loading next page...
 
/lp/association-for-computing-machinery/streaming-data-reorganization-at-scale-with-deltafs-indexed-massive-2VJ0GtSgbs

References (110)

Publisher
Association for Computing Machinery
Copyright
Copyright © 2020 ACM
ISSN
1553-3077
eISSN
1553-3093
DOI
10.1145/3415581
Publisher site
See Article on Publisher Site

Abstract

Complex storage stacks providing data compression, indexing, and analytics help leverage the massive amounts of data generated today to derive insights. It is challenging to perform this computation, however, while fully utilizing the underlying storage media. This is because, while storage servers with large core counts are widely available, single-core performance and memory bandwidth per core grow slower than the core count per die. Computational storage offers a promising solution to this problem by utilizing dedicated compute resources along the storage processing path. We present DeltaFS Indexed Massive Directories (IMDs), a new approach to computational storage. DeltaFS IMDs harvest available (i.e., not dedicated) compute, memory, and network resources on the compute nodes of an application to perform computation on data. We demonstrate the efficiency of DeltaFS IMDs by using them to dynamically reorganize the output of a real-world simulation application across 131,072 CPU cores. DeltaFS IMDs speed up reads by 1,740× while only slightly slowing down the writing of data during simulation I/O for in situ data processing.

Journal

ACM Transactions on Storage (TOS)Association for Computing Machinery

Published: Sep 24, 2020

Keywords: In situ processing

There are no references for this article.