Correlation-based feature subset selection technique for web spam classification



Publisher
Inderscience Publishers
Copyright
Copyright © Inderscience Enterprises Ltd
ISSN
1476-1289
eISSN
1741-9212
DOI
10.1504/IJWET.2018.097562

Abstract

In recent years, many machine learning algorithms and web spam features have been developed to recognise spam. The performance of machine learning (ML) depends heavily on the features used: features that correlate well with the target class are easy for a model to learn from, whereas highly complex or redundant features can prevent effective learning. Feature selection is therefore one of the most important steps in most ML applications. In this paper, the correlation-based feature selection (CFS) technique, combined with best-first search, is used to select the most effective features. Two datasets (WebSpam-UK2006 and WebSpam-UK2007) and four classifiers (Naïve Bayes, J48, random forest and AdaBoost) are used in the experiments. The results show a significant improvement in AUC (area under the receiver operating characteristic curve) for Naïve Bayes and J48.
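CFS scores a candidate feature subset by a "merit" heuristic that rises with the average feature-class correlation and falls with the average feature-feature inter-correlation, which is what penalises redundant features. A minimal sketch in Python with NumPy is shown below; note this is an illustration under assumptions, not the paper's implementation: the greedy forward search is a simplification of the best-first search the paper uses, and the toy data stands in for the WebSpam corpora.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the mean
    |feature-class correlation| and r_ff the mean pairwise
    |feature-feature correlation| within the subset."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def select_features(X, y):
    """Greedy forward selection: repeatedly add the feature that most
    improves merit, stopping when no addition helps (a simplification
    of the best-first search used with CFS)."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        merit, f = max((cfs_merit(X, y, selected + [g]), g) for g in remaining)
        if merit <= best:
            break
        best = merit
        selected.append(f)
        remaining.remove(f)
    return selected, best

# Toy data: feature 0 predicts y, feature 1 is a redundant copy, feature 2 is noise.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
x0 = y + 0.1 * rng.normal(size=200)
X = np.column_stack([x0, x0 + 0.05 * rng.normal(size=200), rng.normal(size=200)])
selected, merit = select_features(X, y)
print(selected)  # one of the two correlated features; the redundant copy is skipped
```

The redundant copy is rejected because adding it barely raises the average feature-class correlation while sharply raising the feature-feature term in the denominator, so the subset merit does not improve.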

Journal

International Journal of Web Engineering and Technology
Inderscience Publishers

Published: Jan 1, 2018
