Article 3, Publication date: December 2007
Received October 2006; revised March 2007; accepted June 2007
ACM Transactions on Speech and Language Processing
Morph-Based Speech Recognition and Modeling of OOV's Across Languages
We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. The morph models are shown to perform fairly well on OOVs without compromising the recognition accuracy on in-vocabulary words. The Arabic experiment is the only exception: there the standard word model outperforms the morph model. Differences in the datasets and in the amount of data are discussed as a plausible explanation.
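The coverage argument in the abstract can be illustrated with a small sketch. It is a toy example only: the segmentations are hand-written assumptions rather than Morfessor output, the names `TRAIN_SEGMENTATIONS`, `oov_rate`, and `can_compose` are hypothetical, and no language model or recognizer is involved. It merely compares how many test words fall outside a word vocabulary with how many cannot be built by concatenating known morphs.

```python
# Toy illustration of vocabulary coverage with words vs. morphs.
# ASSUMPTION: these segmentations are hand-written for the example;
# they are not produced by Morfessor.
TRAIN_SEGMENTATIONS = {
    "talo": ["talo"],
    "talossa": ["talo", "ssa"],
    "auto": ["auto"],
    "autolla": ["auto", "lla"],
}


def oov_rate(test_words, vocabulary):
    """Fraction of test words that are not in the vocabulary."""
    unseen = [w for w in test_words if w not in vocabulary]
    return len(unseen) / len(test_words)


def can_compose(word, morphs):
    """Return True if `word` is a concatenation of one or more known morphs."""
    # reachable[i] is True when word[:i] can be split into known morphs.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            reachable[start] and word[start:end] in morphs
            for start in range(end)
        )
    return reachable[len(word)]


word_vocab = set(TRAIN_SEGMENTATIONS)
morph_vocab = {m for seg in TRAIN_SEGMENTATIONS.values() for m in seg}

# Two of these word forms are unseen but built from seen morphs;
# "vene" shares no morph with the training words.
test_words = ["talo", "talolla", "autossa", "vene"]

word_oov = oov_rate(test_words, word_vocab)
morph_oov = sum(not can_compose(w, morph_vocab) for w in test_words) / len(test_words)

print(f"word-level OOV rate:  {word_oov:.2f}")
print(f"morph-level OOV rate: {morph_oov:.2f}")
```

In this toy setting, unseen inflected forms such as "talolla" are covered by the morph vocabulary even though they are missing from the word vocabulary, which is the effect the abstract attributes to morph models. Real systems additionally need n-gram probabilities over the morph sequences and a way to mark word boundaries, neither of which is sketched here.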
ACM Transactions on Speech and Language Processing (TSLP) – Association for Computing Machinery
Published: Dec 1, 2007