The Long Short-Term Memory (LSTM) network, a popular deep-learning model, is particularly useful for data with temporal correlation, such as texts, sequences, or time series data, thanks to its sought-after recurrent network structure, which is designed to capture temporal correlation. In this article, we propose to generalize LSTM to generic machine-learning tasks where the training data have no explicit temporal or sequential correlation. Our approach is to explore feature correlation in the original data and convert each instance into a synthetic sentence format by using a two-gram probabilistic language model. More specifically, for each instance represented in the original feature space, our conversion first horizontally aligns the original features into a sequentially correlated feature vector, resembling the letter coherence within a word. In addition, a vertical alignment is carried out to create multiple time points and simulate the sequential order of words in a sentence (i.e., word correlation). The two-dimensional horizontal-and-vertical alignments not only ensure that feature correlations are maximally utilized, but also preserve the original feature values in the new representation. As a result, an LSTM model can achieve good classification accuracy even if the underlying data have no temporal or sequential dependency. Experiments on 20 generic datasets show that applying LSTM to generic data can improve classification accuracy compared to conventional machine-learning methods. This research opens a new opportunity for LSTM deep learning to be broadly applied to generic machine-learning tasks.
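The conversion described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the details of the two-gram model or of the vertical alignment, so the sketch assumes that absolute Pearson correlation stands in for the two-gram score, that a greedy chain produces the horizontal ordering, and that cutting the reordered vector into fixed-width windows produces the time points. The function names `horizontal_order` and `to_sequences` and the parameter `step_width` are hypothetical.

```python
import numpy as np


def horizontal_order(X):
    """Greedy feature chain (assumed stand-in for the two-gram model).

    Starts from a feature in the most correlated pair, then repeatedly
    appends the unused feature most correlated with the last one chosen.
    """
    corr = np.abs(np.nan_to_num(np.corrcoef(X, rowvar=False)))
    np.fill_diagonal(corr, -1.0)  # never pair a feature with itself
    n = corr.shape[0]
    first = int(np.unravel_index(np.argmax(corr), corr.shape)[0])
    order = [first]
    used = np.zeros(n, dtype=bool)
    used[first] = True
    while len(order) < n:
        # Mask out features already placed in the chain.
        scores = np.where(used, -np.inf, corr[order[-1]])
        nxt = int(np.argmax(scores))
        order.append(nxt)
        used[nxt] = True
    return order


def to_sequences(X, order, step_width):
    """Reorder columns, then cut each instance into time steps.

    Each time step holds `step_width` adjacent features; the last step
    is zero-padded. Original feature values are kept unchanged.
    """
    X = X[:, order]
    n_samples, n_features = X.shape
    n_steps = -(-n_features // step_width)  # ceiling division
    pad = n_steps * step_width - n_features
    X = np.pad(X, ((0, 0), (0, pad)))
    return X.reshape(n_samples, n_steps, step_width)


# Hypothetical usage on random data: 50 instances, 7 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 7))
order_demo = horizontal_order(X_demo)
seqs_demo = to_sequences(X_demo, order_demo, step_width=2)
```

The resulting array has shape (samples, time steps, features per step), which is the layout recurrent layers in common deep-learning libraries typically expect as input.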
ACM Transactions on Knowledge Discovery from Data (TKDD) – Association for Computing Machinery
Published: Feb 10, 2020
Keywords: Deep learning