Abstract
Background: Text classification is an important task in information retrieval. Its objective is to assign new text documents to a set of predefined classes using supervised algorithms.
Objectives: We focus on text classification of Albanian news articles using two approaches.
Methods/Approach: In the first approach, the words in a collection are treated as independent components, each assigned a corresponding vector in the vector space. Here we used nine classifiers from the scikit-learn package, training them on part of the news articles (80%) and testing accuracy on the remaining articles. In the second approach, words are treated according to their semantic and syntactic similarities, assuming that a word is formed by character n-grams. For this approach we used fastText, a hierarchical classifier that considers local word order as well as sub-word information. We measured the accuracy of each classifier separately and also analyzed training and testing time.
Results: Our results show that the bag-of-words model outperforms fastText when the classification is tested on a text dataset that is not large. FastText performs better when classifying multi-label text.
Conclusions: News articles can serve as a benchmark for testing classification algorithms on Albanian texts. The best results are achieved with a bag-of-words model, with an accuracy of 94%.
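The two text representations described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names, the whitespace tokenizer, the toy Albanian-like documents, and the n-gram range are all assumptions made for illustration. The study itself used scikit-learn classifiers and fastText, neither of which is reproduced here.

```python
from collections import Counter


def bag_of_words(docs):
    """Map each document to a count vector over a shared, sorted vocabulary.

    Words are treated as independent components; word order is discarded.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of a word wrapped in '<' and '>' markers.

    This mirrors the sub-word idea (a word is formed by character n-grams);
    the 3..6 range is an assumed default, not a value taken from the abstract.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def split_train_test(items, train_fraction=0.8):
    """Split items into a training part and a testing part (80/20 by default).

    No shuffling is done here; a real experiment would normally shuffle first.
    """
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


if __name__ == "__main__":
    # Made-up toy documents, for illustration only.
    docs = ["lajme sport lajme", "sport ekonomi"]
    vocab, vectors = bag_of_words(docs)
    print(vocab)
    print(vectors)
    print(char_ngrams("lajm", 3, 3))
    print(split_train_test(list(range(10))))
```

In the first representation each document becomes a vector of word counts that a supervised classifier can consume; in the second, a word is decomposed into overlapping character n-grams, so that morphologically related words share features.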
Business Systems Research Journal – de Gruyter
Published: Apr 1, 2019