Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Web resources for language modeling in conversational speech recognition

Web resources for language modeling in conversational speech recognition This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Speech and Language Processing (TSLP) Association for Computing Machinery

Loading next page...
 
/lp/association-for-computing-machinery/web-resources-for-language-modeling-in-conversational-speech-vl5Wf3yaZt

References (61)

Publisher
Association for Computing Machinery
Copyright
Copyright © 2007 by ACM Inc.
ISSN
1550-4875
DOI
10.1145/1322391.1322392
Publisher site
See Article on Publisher Site

Abstract

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.

Journal

ACM Transactions on Speech and Language Processing (TSLP)Association for Computing Machinery

Published: Dec 1, 2007

There are no references for this article.