Class imbalance learning, which arises in datasets from many problem domains, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies (the minority class) against normal data (the majority class), are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is a key factor affecting their performance.
Design/methodology/approach
In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.
Findings
Experiments on 44 class imbalanced datasets, using three instance selection algorithms (IB3, DROP3 and the GA), the CART decision tree for missing value imputation and three one-class classifiers (OCSVM, IFOREST and LOF), show that if the instance selection algorithm is carefully chosen, performing this step can improve the quality of the training data, allowing one-class classifiers to outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain some missing values, combining missing value imputation and instance selection, regardless of which step is performed first, can maintain data quality similar to that of datasets without missing values.
Originality/value
The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has never been done before. Moreover, this study is the first attempt to consider the scenario in which missing values exist in the training set used to train one-class classifiers. In this case, performing missing value imputation and instance selection in different orders is compared.
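The first method described above (clean the majority class, then train a one-class model on what remains) can be illustrated with a minimal sketch. It is not the paper's implementation: a k-nearest-neighbour distance filter stands in for IB3, DROP3 and the GA, and a k-nearest-neighbour distance score stands in for OCSVM, IFOREST and LOF. All function names, the toy data, `k` and `keep_ratio` are assumptions made only for illustration.

```python
import math

def knn_score(point, reference, k):
    """Mean Euclidean distance from `point` to its k nearest points in `reference`."""
    dists = sorted(math.dist(point, r) for r in reference)
    return sum(dists[:k]) / k

def select_instances(normal, k=3, keep_ratio=0.9):
    """Stand-in instance selection (NOT IB3/DROP3/GA): keep the least isolated points."""
    scored = []
    for i, p in enumerate(normal):
        others = normal[:i] + normal[i + 1:]      # leave-one-out reference set
        scored.append((knn_score(p, others, k), p))
    scored.sort(key=lambda pair: pair[0])         # most typical points first
    n_keep = max(k + 1, int(len(normal) * keep_ratio))
    return [p for _, p in scored[:n_keep]]

def fit_threshold(train, k=3):
    """Decision boundary: the largest leave-one-out score among training points."""
    return max(
        knn_score(p, train[:i] + train[i + 1:], k) for i, p in enumerate(train)
    )

def predict_anomaly(point, train, threshold, k=3):
    """Stand-in one-class classifier (NOT OCSVM/IFOREST/LOF): flag distant points."""
    return knn_score(point, train, k) > threshold

# Hypothetical toy majority class: a 7x7 grid of normal points plus two noisy ones.
grid = [(x * 0.5, y * 0.5) for x in range(-3, 4) for y in range(-3, 4)]
noise = [(6.0, 6.0), (-7.0, 5.0)]
majority = grid + noise

# Baseline: train on the raw majority class, noise included.
thr_base = fit_threshold(majority)

# With instance selection: train on the cleaned majority class.
cleaned = select_instances(majority)
thr_clean = fit_threshold(cleaned)

test_anomaly = (5.0, 5.0)
test_normal = (0.25, 0.25)
print("baseline flags anomaly:", predict_anomaly(test_anomaly, majority, thr_base))
print("cleaned flags anomaly: ", predict_anomaly(test_anomaly, cleaned, thr_clean))
print("cleaned flags normal:  ", predict_anomaly(test_normal, cleaned, thr_clean))
```

In this toy setting the noisy points inflate the baseline decision boundary, so the filter changes which test points are flagged. The filter also discards a few legitimate boundary points, which is one reason the choice of instance selection algorithm matters.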
Data Technologies and Applications – Emerald Publishing
Published: Oct 11, 2021
Keywords: Data mining; One-class classifiers; Class imbalance; Machine learning; Instance selection; Missing value imputation