Document classification using deep neural network with different word embedding techniques

Publisher: Inderscience Publishers
Copyright: © Inderscience Enterprises Ltd
ISSN: 1476-1289
eISSN: 1741-9212
DOI: 10.1504/ijwet.2022.125654

Abstract

Document classification plays a major role in fields such as information retrieval and data mining, where machine learning and deep learning models can be applied. Before any model can be applied for classification, however, textual data must be converted into a numerical representation, which is where word embedding helps. The choice of an appropriate word embedding technique plays a vital role in classification. We therefore analysed the classification performance of two widely used deep learning models, long short-term memory (LSTM) and convolutional neural network (CNN), with various word embedding techniques on five benchmark datasets. The pre-processed dataset is converted into a vector representation using the word embedding techniques TF-IDF, Word2Vec, and Doc2Vec. The resulting vectors are passed to the LSTM and CNN classifiers, and documents are classified according to their context. The CNN classifier with the Doc2Vec word embedding technique achieves almost 12% higher accuracy than the other word embedding techniques on all the datasets.
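To make the text-to-vector step concrete, here is a minimal sketch of TF-IDF weighting using only the Python standard library. It is not the authors' implementation: the function name `tfidf_vectors`, the toy corpus, the whitespace tokenisation, and the specific weighting variant (term frequency times log(N / document frequency)) are all assumptions chosen for illustration. The abstract does not say which variant or library the paper used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of raw text documents.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    # Naive pre-processing (an assumption): lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(tokenised)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))

    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        total = len(toks)
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


# Hypothetical toy corpus, not one of the paper's benchmark datasets.
docs = [
    "stock markets fell sharply",
    "the team won the match",
    "markets rallied after the match",
]
vocab, vectors = tfidf_vectors(docs)
```

Each document becomes one fixed-length vector with one entry per vocabulary term, which is the kind of numerical input a downstream classifier can consume. Word2Vec and Doc2Vec instead learn dense vectors from context and would normally come from a dedicated library.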

Journal

International Journal of Web Engineering and Technology, Inderscience Publishers

Published: Jan 1, 2022
