Anil Shanbhag, Alekh Jindal, Yi Lu, S. Madden (2016)
Amoeba: A Shape changing Storage System for Big Data. Proc. VLDB Endow., 9
Bin He, Hui I. Hsiao, Ziyang Liu, Yu Huang, Yi Chen (2012)
Efficient iceberg query evaluation using compressed bitmap index. IEEE Trans. Knowl. Data Eng., 24
Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, H. Wei, W. Shih (2018)
UnistorFS. ACM Transactions on Storage (TOS), 14
J. Paredaens, D. Gucht (1988)
Possibilities and limitations of using flat operators in nested algebra expressions. Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Daniel Tahara, Thaddeus Diamond, D. Abadi (2014)
Sinew: a SQL system for multi-structured data. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Google (2017)
Protocol buffer. Retrieved June 13, 2019 from http://code.google.com/p/protobuf/.
Yuliang Sun, Yu Wang, Huazhong Yang (2018)
Bidirectional Database Storage and SQL Query Exploiting RRAM-Based Process-in-Memory Structure. ACM Transactions on Storage (TOS), 14
Medha Bhadkamkar, Fernando Farfán, Vagelis Hristidis, R. Rangaswami (2009)
Storing semi-structured data on disk drives. ACM Trans. Storage, 5
Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, Teck Hua Lee (2015)
Oracle database in-memory: A dual format in-memory database. Proceedings of the IEEE International Conference on Data Engineering
Apache (2017)
Apache Parquet. Retrieved June 13, 2019 from https://parquet.apache.org.
Anastassia Ailamaki, David J. Dewitt, Mark D. Hill (2002)
Data Page Layouts for Relational Databases on Deep Memory Hierarchies. Springer-Verlag New York
Sagar S. Mane, M. Emmanuel (2015)
Review and comparative study of bitmap indexing techniques. Data Mining Knowl. Eng., 7
J. Shute, Radek Vingralek, Bart Samwel, B. Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, J. Cieslewicz, Ian Rae, T. Stancescu, Himani Apte (2013)
F1: A Distributed SQL Database That Scales. Proc. VLDB Endow., 6
L. Soulier, L. Tamine (2017)
On the Collaboration Support in Information Retrieval. ACM Computing Surveys (CSUR), 50
K. Beyer, R. Ramakrishnan (1999)
Bottom-Up Computation of Sparse and Iceberg CUBEs
TPC (2017)
TPC-H benchmark. Retrieved June 13, 2019 from http://www.tpc.org/tpch.
Apache (2017)
Apache Tez. Retrieved June 13, 2019 from https://tez.apache.org.
Apache (2017)
Apache Hive. Retrieved June 13, 2019 from https://hive.apache.org.
NCBI (2018)
PubMed. Retrieved June 13, 2019 from http://www.ncbi.nlm.nih.gov.
Vinayak Borkar, Michael Carey, Raman Grover (2011)
Hyracks: A flexible and extensible foundation for data-intensive computing. Proceedings of the IEEE International Conference on Data Engineering
Z. Liu, B. Hammerschmidt, Douglas Mcmahon (2014)
JSON data management: supporting schema-less development in RDBMS. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Samy Chambi, D. Lemire, R. Godin, K. Boukhalfa, Charles Allen, Fangjin Yang (2016)
Optimizing Druid with Roaring bitmaps. Proceedings of the 20th International Database Engineering & Applications Symposium
A. Yoshitaka, T. Ichikawa (1999)
A Survey on Content-Based Retrieval for Multimedia Databases. IEEE Trans. Knowl. Data Eng., 11
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler (2010)
The Hadoop distributed file system. Proceedings of the IEEE Symposium on MASS Storage Systems and Technologies
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das (2012)
Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. Proceedings of the USENIX Symposium on Operating Systems Design and Implementation
A. Floratou, U. Minhas, Fatma Özcan (2014)
SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures. Proc. VLDB Endow., 7
Sergey Melnik, Andrey Gubarev, Jing Jing Long (2010)
Dremel: Interactive analysis of web-scale datasets. Commun. ACM, 3
S. Melnik, Andrey Gubarev, Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis (2010)
Dremel. Communications of the ACM, 54
François Bancilhon, Philippe Richard, Michel Scholl (1982)
On line processing of compacted relations. Proceedings of the 8th International Conference on Very Large Data Bases
Aubrey L. Tatarowicz, Carlo Curino, Evan P. C. Jones, Sam Madden (2012)
Lookup tables: Fine-grained partitioning for distributed databases. Proceedings of the IEEE International Conference on Data Engineering
Andrew Lamb, Matt Fuller, R. Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, Chuck Bear (2012)
The Vertica Analytic Database: C-Store 7 Years Later. ArXiv, abs/1208.4173
Apache (2018)
Apache Avro. Retrieved June 13, 2019 from https://avro.apache.org.
Yang Li (2018)
Cores. Retrieved June 13, 2019 from https://github.com/lwhay/cores.
Kurt Stockinger (2001)
Design and implementation of bitmap indices for scientific data. Proceedings of the International Database Engineering and Applications Symposium
C. Chasseur, Yinan Li, J. M. Patel (2013)
Enabling JSON document stores in relational systems (long version). Proceedings of the International Workshop on the Web and Databases
Sattam Alsubaiee, Alexander Behm, V. Borkar, Zachary Heilbron, Young-Seok Kim, M. Carey, Markus Dreseler, Chen Li (2014)
Storage Management in AsterixDB. Proc. VLDB Endow., 7
Sattam Alsubaiee, Yasser Altowim, Hotham Altwaijry, Alexander Behm, V. Borkar, Yingyi Bu, M. Carey, Inci Cetindil, Madhusudan Cheelangi, Khurram Faraaz, Eugenia Gabrielova, Raman Grover, Zachary Heilbron, Young-Seok Kim, Chen Li, Guangqiang Li, J. Ok, Nicola Onose, Pouria Pirzadeh, V. Tsotras, R. Vernica, Jian Wen, T. Westmann (2014)
AsterixDB: A Scalable, Open Source BDMS. ArXiv, abs/1407.0454
Mike Stonebraker, Daniel J. Abadi, Adam Batkin (2005)
C-store: A column-oriented DBMS. Proceedings of the International Conference on Very Large Data Bases
S. Wandelt, D. Deng, Stefan Gerdjikov, Shashwati Mishra, Petar Mitankin, Manish Patil, Enrico Siragusa, A. Tiskin, Wei Wang, Jiaying Wang, U. Leser (2014)
State-of-the-art in string similarity search and join. SIGMOD Rec., 43
Babak Behzad, Huong Vu, Thanh Luu, Joseph Huchette, S. Byna, R. Aydt, Q. Koziol, M. Snir (2013)
Taming parallel I/O complexity with auto-tuning. 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
H. Paul, H. Schek, M. Scholl, G. Weikum, U. Deppisch (1987)
Architecture and implementation of the Darmstadt database kernel system
Hang Liu, H. Howie Huang (2017)
Graphene: Fine-grained IO management for graph computing. Proceedings of the USENIX Conference on File and Storage Technologies
Marc H. Scholl, H.-Bernhard Paul, Hans-Jörg Schek (1987)
Supporting flat relations by a nested relational kernel. Proceedings of the International Conference on Very Large Data Bases
M. Egenhofer (1994)
Spatial SQL: A Query and Presentation Language. IEEE Trans. Knowl. Data Eng., 6
Apache (2018)
Apache AsterixDB. Retrieved June 13, 2019 from https://asterixdb.apache.org.
Martin Kaufmann (2013)
Storing and Processing Temporal Data in a Main Memory Column Store. Proc. VLDB Endow., 6
Eunji Lee, H. Bahn (2014)
Caching Strategies for High-Performance Storage Media. ACM Trans. Storage, 10
P. Boncz, Torsten Grust, M. Keulen, S. Manegold, J. Rittinger, J. Teubner (2006)
MonetDB/XQuery: a fast XQuery processor powered by a relational engine. Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Raúl Gracia-Tinedo, Josep Sampé (2017)
Crystal: Software-defined storage for multi-tenant object stores. Proceedings of the USENIX Conference on File and Storage Technologies
Yuan Yu, Michael Isard, Dennis Fetterly (2009)
DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. Proceedings of the USENIX Symposium on Operating Systems Design and Implementation
M. Roth, H. Korth, A. Silberschatz (1988)
Extended algebra and calculus for nested relational databases. ACM Trans. Database Syst., 13
Liwen Sun, S. Krishnan, Reynold Xin, M. Franklin (2014)
A Partitioning Framework for Aggressive Data Skipping. Proc. VLDB Endow., 7
Peng Lu, Sai Wu, Lidan Shou, Kian-Lee Tan (2013)
An efficient and compact indexing scheme for large-scale data store. Proceedings of the IEEE International Conference on Data Engineering
Brent Welch, Marc Unangst, Zainul Abbasi (2008)
Scalable performance of the Panasas parallel file system. Proceedings of the USENIX Conference on File and Storage Technologies
Chin-Hsien Wu, Kuo-Yi Huang (2015)
Data Sorting in Flash Memory. ACM Transactions on Storage (TOS), 11
F. Afrati, Daniel Delorey, Mosha Pasumansky, J. Ullman (2014)
Storing and Querying Tree-Structured Records in Dremel. Proc. VLDB Endow., 7
Yansong Zhang, Xuan Zhou, Ying Zhang (2016)
Virtual denormalization via array index reference for main memory OLAP. IEEE Trans. Knowl. Data Eng., 28
Jianfeng Jia, Chen Li, Michael J. Carey (2017)
Drum: A rhythmic approach to interactive analytics on large data. Proceedings of the IEEE International Conference on Big Data
(2011)
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Vijayshankar Raman, Gopi Attaluri, Ronald Barber, Naresh Chainani, David Kalmuk, Vincent KulandaiSamy, J. Leenstra, S. Lightstone, Shaorong Liu, G. Lohman, Timothy Malkemus, René Müller, Ippokratis Pandis, Berni Schiefer, David Sharpe, Richard Sidle, Adam Storm, Liping Zhang (2013)
DB2 with BLU Acceleration: So Much More than Just a Column Store. Proc. VLDB Endow., 6
Jeffrey Dean, Sanjay Ghemawat (2004)
MapReduce: Simplified data processing on large clusters. Proceedings of the USENIX Symposium on Operating Systems Design and Implementation
Pengfei Xuan, Walter B. Ligon, Pradip K. Srimani, Rong Ge, Feng Luo (2016)
Accelerating big data analytics on HPC clusters using two-level storage. Parallel Comput., 61
Apache (2017)
Apache Spark. Retrieved June 13, 2019 from https://spark.apache.org.
Zhiyi Wang, Shimin Chen (2017)
Exploiting Common Patterns for Tree-Structured Data. Proceedings of the 2017 ACM International Conference on Management of Data
Douglas W. Comer, Philip S. Yu (1987)
A vertical partitioning algorithm for relational databases. Proceedings of the IEEE International Conference on Data Engineering
Michael Rys, Gerhard Weikum (1994)
Heuristic optimization of speedup and benefit/cost for parallel database scans on shared-memory multiprocessors. Proceedings of the International Parallel Processing Symposium
The relatively high cost of record deserialization is increasingly becoming the bottleneck of column-based storage systems in tree-structured applications [58]. Because records are transformed in the storage layer, processing fields and rows that are irrelevant to a query can be very costly in nested schemas, wasting significant computational resources in large-scale analytical workloads. This leads to the question of how to reduce both the deserialization and IO costs of queries with highly selective filters following arbitrary paths in a nested schema. We present CORES (Column-Oriented Regeneration Embedding Scheme) to push highly selective filters down into column-based storage engines, where each filter consists of several filtering conditions on a field. By applying highly selective filters in the storage layer, we demonstrate that both the deserialization and IO costs can be significantly reduced. We show how to introduce fine-grained composition on filtering results. We generalize this technique with two pairwise operations, rollup and drilldown, such that a series of conjunctive filters can effectively deliver their payloads in a nested schema. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to build a column storage engine and how to drive a query efficiently based on a cost model. We apply this design to the nested relational model, especially when hierarchical entities are frequently required by ad hoc queries. The experiments, including a real workload and the modified TPC-H benchmark, demonstrate that CORES improves performance by 0.7×–26.9× compared to state-of-the-art platforms in scan-intensive workloads.
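The abstract does not define rollup and drilldown precisely, so the following is only a minimal sketch of one plausible reading: a filter on a nested (child) field yields a match bitmap that is rolled up to the parent level, and a filter on a parent field yields a bitmap that is drilled down to the child level, so that conjunctive filters can be combined before any record is deserialized. The column layout (a flat child column plus an `offsets` array), the function names, and the toy data are all assumptions for illustration and are not taken from the CORES implementation.

```python
from typing import Callable, List, Sequence


def scan_filter(column: Sequence, pred: Callable[[object], bool]) -> List[bool]:
    """Evaluate one predicate over one column and return a match bitmap."""
    return [pred(v) for v in column]


def rollup(child_bits: List[bool], offsets: List[int]) -> List[bool]:
    """Lift a child-level bitmap to the parent level.

    Parent i matches if any of its children, stored at positions
    offsets[i]..offsets[i+1]-1 of the child column, matches.
    """
    return [any(child_bits[offsets[i]:offsets[i + 1]])
            for i in range(len(offsets) - 1)]


def drilldown(parent_bits: List[bool], offsets: List[int]) -> List[bool]:
    """Push a parent-level bitmap down to the child level.

    Every child inherits the match bit of its parent.
    """
    child_bits: List[bool] = []
    for i, hit in enumerate(parent_bits):
        child_bits.extend([hit] * (offsets[i + 1] - offsets[i]))
    return child_bits


def conjoin(a: List[bool], b: List[bool]) -> List[bool]:
    """Bitwise AND of two bitmaps defined at the same nesting level."""
    return [x and y for x, y in zip(a, b)]


# Hypothetical nested data: three customers, each owning a run of orders.
nation = ["FR", "DE", "FR"]             # parent column
offsets = [0, 2, 3, 6]                  # customer i owns orders offsets[i]:offsets[i+1]
price = [50, 150, 300, 120, 80, 10]     # child column, stored flat

parent_hits = scan_filter(nation, lambda n: n == "FR")
child_hits = scan_filter(price, lambda p: p > 100)

# Orders over 100 that belong to French customers.
matching_orders = conjoin(drilldown(parent_hits, offsets), child_hits)

# French customers that have at least one order over 100.
matching_customers = conjoin(parent_hits, rollup(child_hits, offsets))

# Only values whose bit is set would need to be read and deserialized.
selected_prices = [p for p, hit in zip(price, matching_orders) if hit]
```

In this sketch the filters touch only the two columns they reference, and the combined bitmap decides which values are materialized afterwards, which is the general idea behind evaluating selective filters in the storage layer.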
ACM Transactions on Storage (TOS) – Association for Computing Machinery
Published: Jun 26, 2019
Keywords: Columnar storage