Combining co-clustering with noise detection for theme-based summarization

XIAOYAN CAI, Northwest Agricultural and Forestry University
WENJIE LI, The Hong Kong Polytechnic University
RENXIAN ZHANG, The Hong Kong Polytechnic University and Samsung Electronics Research Center

Publisher: Association for Computing Machinery
Copyright: © 2013 by ACM Inc.
ISSN: 1550-4875
DOI: 10.1145/2513563

Abstract

Because sentences are short and their content is limited, we regard words as independent text objects rather than as features of sentences in sentence clustering, and we develop two co-clustering frameworks, namely integrated clustering and interactive clustering, to cluster sentences and words simultaneously. Since real-world datasets always contain noise, we incorporate noise detection and removal to enhance the clustering of sentences and words. Meanwhile, a semisupervised approach is explored to incorporate the query information (and the sentence information in early document sets) in theme-based summarization. Thorough experimental studies are conducted. When evaluated on the DUC 2005-2007 and TAC 2008-2009 datasets, the performance of the two noise-detecting co-clustering approaches is comparable with that of the top three systems. The results also demonstrate that the interactive algorithm with noise detection is more effective than the integrated algorithm with noise detection.

Categories and Subject Descriptors: I.2.7 [Natural Language Processing]: Text analysis
General Terms: Design, Algorithms, Performance
Additional Key Words and Phrases: Document analysis, theme-based summarization,

Journal

ACM Transactions on Speech and Language Processing (TSLP), Association for Computing Machinery

Published: Dec 1, 2013
