A Two-stage Text Feature Selection Algorithm for Improving Text Classification

Publisher: Association for Computing Machinery
Copyright: © 2021 Association for Computing Machinery
ISSN: 2375-4699
eISSN: 2375-4702
DOI: 10.1145/3425781

Abstract

As the number of digital text documents grows daily, text classification is becoming increasingly challenging. Each document contains a large number of words (features), which reduces the efficiency of a classification algorithm. This article presents an optimized feature selection algorithm that reduces the number of features in order to improve the accuracy of text classification. The proposed algorithm combines noun-based filtering with word ranking to enhance the performance of the text classification algorithm. Experiments on three benchmark datasets compare the proposed algorithm with Term Frequency-Inverse Document Frequency, Balanced Accuracy Measure, Gini Index, Information Gain, and Chi-Square. The results show that the proposed algorithm achieves the highest accuracy among the compared algorithms.
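
To make the two-stage idea concrete, here is a minimal sketch of noun-based filtering followed by word ranking. It is not the authors' method: the abstract does not say how nouns are detected or which ranking function is used, so the `NOUNS` lexicon, the function names, and the toy documents are all hypothetical, and Chi-Square (one of the baselines named above) is swapped in as the ranking score purely for illustration.

```python
# Hypothetical noun lexicon standing in for a real part-of-speech tagger.
NOUNS = {"match", "goal", "team", "market", "stock", "bank"}


def noun_filter(tokens, nouns=NOUNS):
    """Stage 1 (assumed): keep only tokens the lexicon marks as nouns."""
    return [t for t in tokens if t in nouns]


def chi_square(term, docs, labels, target):
    """Chi-square score of `term` for class `target` from a 2x2 table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == target:
            a += 1  # term present, document in target class
        elif present:
            b += 1  # term present, document in another class
        elif label == target:
            c += 1  # term absent, document in target class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Stage 2 (assumed): rank surviving nouns by best per-class score."""
    filtered = [set(noun_filter(doc)) for doc in docs]
    vocab = sorted(set().union(*filtered))
    classes = sorted(set(labels))
    scores = {
        t: max(chi_square(t, filtered, labels, c) for c in classes)
        for t in vocab
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus, invented for this example only.
docs = [
    "the team scored a late goal in the match".split(),
    "the match ended after the team lost".split(),
    "the bank said the stock market fell quickly".split(),
    "the stock rose as the market opened".split(),
]
labels = ["sport", "sport", "finance", "finance"]
selected = select_features(docs, labels, k=3)
```

In this sketch the first stage discards every non-noun before any scoring happens, so the ranking stage only has to score a much smaller vocabulary. A real pipeline would replace the lexicon lookup with a part-of-speech tagger.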

Journal

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Association for Computing Machinery

Published: May 6, 2021

Keywords: Feature selection
