Automatic text classification method based on Zipf’s law

Publisher
Springer Journals
Copyright
Copyright © 2015 by Allerton Press, Inc.
Subject
Computer Science; Information Storage and Retrieval
ISSN
0005-1055
eISSN
1934-8371
DOI
10.3103/S0005105515030048

Abstract

This paper describes a method for automatic text classification based on analysing the deviation of the word distribution from Zipf’s law, combined with the zonal data-processing approach. Deviation is understood as the difference between the actual numerical score of a word and its score according to Zipf’s law. The proposed method involves the division of input and reference texts into J0, J1, and J2 zones, and the creation of a numerical series using the words that are contained in the J0 zone. The constructed numerical series shows the difference between the real scores of words and the scores calculated according to Zipf’s law. The proposed method can significantly reduce text dimensionality and thus improve the running speed of automatic text classification.
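
The deviation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "score" of a word is its raw frequency count, that the Zipf prediction for rank r is the top word's count divided by r, and that words are lowercase alphabetic tokens. The function name zipf_deviation_series is hypothetical. The abstract does not say how the J0, J1, and J2 zones are delimited, so the sketch computes the series over all words rather than over the J0 zone only.

```python
import re
from collections import Counter


def zipf_deviation_series(text):
    """Return (word, rank, observed, expected, deviation) tuples.

    Assumptions (not taken from the paper): the score is the raw count,
    and the Zipf-predicted count at rank r is top_count / r.
    """
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)

    # Rank by descending count; break ties alphabetically so the
    # ordering is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return []

    top_count = ranked[0][1]
    series = []
    for rank, (word, observed) in enumerate(ranked, start=1):
        expected = top_count / rank  # idealized Zipf prediction
        series.append((word, rank, observed, expected, observed - expected))
    return series
```

Under these assumptions, the fifth element of each tuple is the deviation for that word. A zone-based variant would keep only the entries whose words fall in the J0 zone once the zone boundaries are known.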

Journal

Automatic Documentation and Mathematical Linguistics, Springer Journals

Published: Aug 1, 2015
