This paper describes the procedure and specific features of a new method of automatic text classification based on calculating how far the distribution of stop words deviates from the Zipfian score. To neutralize differences in text length, the author describes and applies a text undersampling methodology. The concept of an iterative threshold level is introduced to reduce text dimensionality to several dozen units. To evaluate the method’s efficiency, the author has developed indicators of discriminative and similarative power that underlie a generalized efficiency score. Fourteen tests, including a comparison with the cosine similarity measure, showed the proposed method to be highly efficient for authorship attribution of fiction texts and clustering of political texts.
Automatic Documentation and Mathematical Linguistics – Springer Journals
Published: May 1, 2021
Keywords: automatic text classification; methods and algorithms; Zipf distribution; reduction of text dimensionality; threshold levels; efficiency indices
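The abstract does not state the formulas used, so the following Python sketch only illustrates the general idea and is not the author's implementation. It assumes the simplest form of Zipf's law (the expected frequency of the word at rank r is the top frequency divided by r), takes "deviation" to mean observed minus expected frequency, and uses plain truncation as a stand-in for undersampling. The stop list and all function names are hypothetical. A cosine similarity over stop-word counts is included as the kind of baseline the abstract mentions.

```python
import math
import re
from collections import Counter

# Hypothetical ten-word stop list, for illustration only (not the paper's list).
STOP_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "was"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def undersample(tokens, length):
    """Assumed stand-in for undersampling: keep only the first `length` tokens."""
    return tokens[:length]


def zipf_deviations(tokens, stop_words=STOP_WORDS):
    """Map each observed stop word to (observed count - Zipf-expected count).

    Assumption: the expected count at rank r is top_count / r.
    """
    allowed = set(stop_words)
    ranked = Counter(t for t in tokens if t in allowed).most_common()
    if not ranked:
        return {}
    top_count = ranked[0][1]
    return {
        word: count - top_count / rank
        for rank, (word, count) in enumerate(ranked, start=1)
    }


def cosine_similarity(tokens_a, tokens_b, stop_words=STOP_WORDS):
    """Baseline: cosine of the two stop-word count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vec_a = [counts_a[w] for w in stop_words]
    vec_b = [counts_b[w] for w in stop_words]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

To compare two texts under these assumptions, one would tokenize both, truncate them to the length of the shorter one with `undersample`, and then compare their `zipf_deviations` profiles alongside the `cosine_similarity` baseline.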