This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
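The abstract says the Web collections and other sources are combined in mixture models. One common way to set the weights of a linear (interpolated) mixture of n-gram models is expectation-maximization (EM) on held-out in-domain text. The sketch below shows that generic technique under stated assumptions. It is not presented as the article's exact formulation. The function name `estimate_mixture_weights` and its input layout are hypothetical. The sketch also assumes each component model's probability for every held-out token has already been computed and is nonzero, as it would be for smoothed models.

```python
import math
from typing import List, Sequence


def estimate_mixture_weights(
    probs: Sequence[Sequence[float]],
    iterations: int = 50,
    tol: float = 1e-6,
) -> List[float]:
    """EM estimate of linear-interpolation weights (illustrative sketch).

    probs[i][k] is the (hypothetical, precomputed) probability that component
    model k assigns to the i-th held-out token given its history. All values
    are assumed to be > 0, e.g. from smoothed n-gram models.
    """
    num_tokens = len(probs)
    num_models = len(probs[0])
    weights = [1.0 / num_models] * num_models  # start from uniform weights
    prev_ll = -math.inf
    for _ in range(iterations):
        counts = [0.0] * num_models
        log_likelihood = 0.0
        for token_probs in probs:
            # Mixture probability of this token under the current weights.
            mixed = sum(w * p for w, p in zip(weights, token_probs))
            log_likelihood += math.log(mixed)
            # E-step: posterior responsibility of each component for the token.
            for k in range(num_models):
                counts[k] += weights[k] * token_probs[k] / mixed
        # M-step: each new weight is the average responsibility over tokens.
        weights = [c / num_tokens for c in counts]
        # Stop once the held-out log-likelihood no longer improves noticeably.
        if log_likelihood - prev_ll < tol:
            break
        prev_ll = log_likelihood
    return weights


if __name__ == "__main__":
    # Toy held-out data: column 0 might be an in-domain model, column 1 a
    # Web-text model (both hypothetical); rows are held-out tokens.
    held_out = [[0.20, 0.05], [0.01, 0.08], [0.10, 0.10], [0.30, 0.02]]
    print(estimate_mixture_weights(held_out))
```

The weights are fit on held-out in-domain data rather than on the training text of any component. Fitting them on a component's own training text would favor whichever model had already seen that text. Each EM iteration does not decrease the held-out likelihood, so the loop can stop when the improvement falls below a small tolerance.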
ACM Transactions on Speech and Language Processing (TSLP) – Association for Computing Machinery
Published: Dec 1, 2007