Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

ClassiNet -- Predicting Missing Features for Short-Text Classification

ClassiNet -- Predicting Missing Features for Short-Text Classification Short and sparse texts such as tweets, search engine snippets, product reviews, and chat messages are abundant on the Web. Classifying such short-texts into a pre-defined set of categories is a common problem that arises in various contexts, such as sentiment classification, spam detection, and information recommendation. The fundamental problem in short-text classification is feature sparseness -- the lack of feature overlap between a trained model and a test instance to be classified. We propose ClassiNet -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex vi in the ClassiNet, where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge eij connecting a vertex vi to a vertex vj represents the conditional probability that given vi exists in an instance, vj also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x, we find similar features from ClassiNet that did not appear in x, and append those features in the representation of x. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Knowledge Discovery from Data (TKDD) Association for Computing Machinery

ClassiNet -- Predicting Missing Features for Short-Text Classification

Loading next page...
 
/lp/association-for-computing-machinery/classinet-predicting-missing-features-for-short-text-classification-o0yr6T8Hon

References (79)

Publisher
Association for Computing Machinery
Copyright
Copyright © 2018 ACM
ISSN
1556-4681
eISSN
1556-472X
DOI
10.1145/3201578
Publisher site
See Article on Publisher Site

Abstract

Short and sparse texts such as tweets, search engine snippets, product reviews, and chat messages are abundant on the Web. Classifying such short-texts into a pre-defined set of categories is a common problem that arises in various contexts, such as sentiment classification, spam detection, and information recommendation. The fundamental problem in short-text classification is feature sparseness -- the lack of feature overlap between a trained model and a test instance to be classified. We propose ClassiNet -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex vi in the ClassiNet, where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge eij connecting a vertex vi to a vertex vj represents the conditional probability that given vi exists in an instance, vj also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x, we find similar features from ClassiNet that did not appear in x, and append those features in the representation of x. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.

Journal

ACM Transactions on Knowledge Discovery from Data (TKDD)Association for Computing Machinery

Published: Jun 27, 2018

Keywords: Classifier networks

There are no references for this article.