Big data multi-query optimisation with Apache Flink


Publisher
Inderscience Publishers
Copyright
Copyright © Inderscience Enterprises Ltd
ISSN
1476-1289
eISSN
1741-9212
DOI
10.1504/IJWET.2018.092401

Abstract

Big data analytic frameworks such as MapReduce, Spark and Flink have recently gained popularity for processing large data sets. Flink is an open-source, Apache-hosted big data analytic framework for processing batch and streaming data. For historical (batch) data processing, Flink's query optimiser builds on techniques used in parallel database systems. The optimiser translates queries into jobs, and these jobs are often submitted repeatedly with similar tasks, so exploiting the similarity between tasks can avoid redundant computation. This paper proposes Flink-MQO, a multi-query optimisation system built on top of the Flink software stack. It acts as an add-on to Apache Flink that optimises multiple queries through data sharing. Flink-MQO exploits data sharing opportunities among selection operators to eliminate redundant and duplicated in-network data movement across queries. Experimental results show that sharing selection operators across big data queries yields promising query execution times. Flink-MQO could therefore potentially be used in stream processing to improve the performance of real-time applications.
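The idea of sharing selection operators across queries can be illustrated with a small sketch. This is not the Flink-MQO implementation and uses no Flink API; it is a plain-Python illustration under the assumption that several queries filter the same input, and all names (`run_separately`, `run_shared`, the sample rows and predicates) are hypothetical. It compares reading the input once per query with reading it once for all queries.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def run_separately(
    rows: List[Row], queries: Dict[str, Predicate]
) -> Tuple[Dict[str, List[Row]], int]:
    """Baseline: every query scans the full input on its own."""
    reads = 0
    results: Dict[str, List[Row]] = {}
    for name, predicate in queries.items():
        matched = []
        for row in rows:
            reads += 1  # each query reads each row again
            if predicate(row):
                matched.append(row)
        results[name] = matched
    return results, reads


def run_shared(
    rows: List[Row], queries: Dict[str, Predicate]
) -> Tuple[Dict[str, List[Row]], int]:
    """Shared selection: one scan, each row tested against all predicates."""
    reads = 0
    results: Dict[str, List[Row]] = {name: [] for name in queries}
    for row in rows:
        reads += 1  # the row is read once for all queries
        for name, predicate in queries.items():
            if predicate(row):
                results[name].append(row)
    return results, reads


# Hypothetical sample data: ten rows and three selection queries.
rows = [{"id": i, "price": i * 10} for i in range(1, 11)]
queries: Dict[str, Predicate] = {
    "q1": lambda r: r["price"] > 50,
    "q2": lambda r: r["price"] > 70,
    "q3": lambda r: r["id"] % 2 == 0,
}

separate_results, separate_reads = run_separately(rows, queries)
shared_results, shared_reads = run_shared(rows, queries)
```

Both strategies return the same rows per query, but the shared version reads each input row once instead of once per query. In a distributed setting the corresponding saving would be in data scanned and moved between nodes, which is the redundancy the abstract describes.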

Journal

International Journal of Web Engineering and Technology, Inderscience Publishers

Published: Jan 1, 2018
