Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

REMIAN

REMIAN Missing value (MV) imputation is a critical preprocessing means for data mining. Nevertheless, existing MV imputation methods are mostly designed for batch processing, and thus are not applicable to streaming data, especially those with poor quality. In this article, we propose a framework, called Real-time and Error-tolerant Missing vAlue ImputatioN (REMAIN), to impute MVs in poor-quality streaming data. Instead of imputing MVs based on all the observed data, REMAIN first initializes the MV imputation model based on a-RANSAC which is capable of detecting and rejecting anomalies in an efficient manner, and then incrementally updates the model parameters upon the arrival of new data to support real-time MV imputation. As the correlations among attributes of the data may change over time in unforseenable ways, we devise a deterioration detection mechanism to capture the deterioration of the imputation model to further improve the imputation accuracy. Finally, we conduct an extensive evaluation on the proposed algorithms using real-world and synthetic datasets. Experimental results demonstrate that REMAIN achieves significantly higher imputation accuracy over existing solutions. Meanwhile, REMAIN improves up to one order of magnitude in time cost compared with existing approaches. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Knowledge Discovery from Data (TKDD) Association for Computing Machinery

Loading next page...
 
/lp/association-for-computing-machinery/remian-ypOdLkhrlH

References

References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.

Publisher
Association for Computing Machinery
Copyright
Copyright © 2020 ACM
ISSN
1556-4681
eISSN
1556-472X
DOI
10.1145/3412364
Publisher site
See Article on Publisher Site

Abstract

Missing value (MV) imputation is a critical preprocessing means for data mining. Nevertheless, existing MV imputation methods are mostly designed for batch processing, and thus are not applicable to streaming data, especially those with poor quality. In this article, we propose a framework, called Real-time and Error-tolerant Missing vAlue ImputatioN (REMAIN), to impute MVs in poor-quality streaming data. Instead of imputing MVs based on all the observed data, REMAIN first initializes the MV imputation model based on a-RANSAC which is capable of detecting and rejecting anomalies in an efficient manner, and then incrementally updates the model parameters upon the arrival of new data to support real-time MV imputation. As the correlations among attributes of the data may change over time in unforseenable ways, we devise a deterioration detection mechanism to capture the deterioration of the imputation model to further improve the imputation accuracy. Finally, we conduct an extensive evaluation on the proposed algorithms using real-world and synthetic datasets. Experimental results demonstrate that REMAIN achieves significantly higher imputation accuracy over existing solutions. Meanwhile, REMAIN improves up to one order of magnitude in time cost compared with existing approaches.

Journal

ACM Transactions on Knowledge Discovery from Data (TKDD)Association for Computing Machinery

Published: Sep 28, 2020

Keywords: Missing value

References