Short-text feature construction and selection in social media data: a survey

Antonela Tommasel; Daniela Godoy

doi:10.1007/s10462-016-9528-0

Publisher: Springer Journals
Copyright: Copyright © 2016 by Springer Science+Business Media Dordrecht
Subject: Computer Science; Artificial Intelligence (incl. Robotics); Computer Science, general
ISSN: 0269-2821
eISSN: 1573-7462
DOI: 10.1007/s10462-016-9528-0

Abstract

Social networking sites such as Facebook or Twitter attract millions of users, who every day post an enormous amount of content in the form of tweets, comments and posts. Because social network texts are usually short, learning tasks have to deal with a very high-dimensional and sparse feature space in which most features have low frequencies. As a result, extracting useful knowledge from such noisy data is challenging, which makes large-scale short-text learning in social environments one of the most relevant problems in machine learning and data mining. Feature selection is one of the best-known and most commonly used techniques for reducing the impact of a high-dimensional feature space in text learning. A wide variety of feature selection techniques applied to traditional long texts and document collections can be found in the literature. However, short texts from the social Web pose new challenges to this well-studied problem: their shortness offers limited context from which to extract statistical evidence about word relations (e.g. correlation), and instances usually arrive in continuous streams (e.g. the Twitter timeline), so the number of features and instances is unknown in advance, among other problems. This paper surveys feature selection techniques for dealing with short texts in both offline and online settings. It then discusses open issues and research opportunities for performing online feature selection over social media data.
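
To make the sparsity problem concrete, the following minimal sketch builds a bag-of-words view of a few short posts, measures how sparse the resulting document-term matrix is, and applies a simple document-frequency filter. The toy posts, the function names and the `min_df` threshold are all illustrative assumptions, not taken from the survey, and document-frequency thresholding stands in here for the broader family of filter-based selection techniques.

```python
from collections import Counter

# Hypothetical toy posts, invented for illustration only.
posts = [
    "great game tonight",
    "traffic is terrible tonight",
    "new phone great camera",
    "game delayed by rain",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def sparsity(docs, vocabulary):
    """Fraction of zero cells in the binary document-term matrix."""
    nonzero = sum(len(set(tokenize(doc)) & vocabulary) for doc in docs)
    return 1.0 - nonzero / (len(docs) * len(vocabulary))


def select_by_df(df, min_df):
    """Keep only terms that occur in at least min_df documents."""
    return {term for term, count in df.items() if count >= min_df}


df = document_frequency(posts)
vocab = set(df)
selected = select_by_df(df, min_df=2)

print(f"vocabulary size: {len(vocab)}")
print(f"sparsity before selection: {sparsity(posts, vocab):.4f}")
print(f"selected terms: {sorted(selected)}")
```

Even with four posts, most terms occur in a single document, so most cells of the matrix are zero. A frequency-based filter shrinks the vocabulary, but with so little evidence per term it can also discard informative words, which is one reason short texts make feature selection harder than long documents do.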

Journal

Artificial Intelligence Review – Springer Journals

Published: Nov 15, 2016
