Persian sentences to phoneme sequences conversion based on recurrent neural networks

Publisher: de Gruyter
Copyright: © 2016 Yasser Mohseni Behbahani et al.
eISSN: 2299-1093
DOI: 10.1515/comp-2016-0019

Abstract

Grapheme-to-phoneme conversion is one of the main subsystems of Text-to-Speech (TTS) systems. Converting a sequence of written words to their corresponding phoneme sequences is more challenging for Persian than for other languages, because the standard orthography of this language omits short vowels and the pronunciation of words depends on their positions in a sentence. Common approaches used in commercial Persian TTS systems rely on several modules and complicated models for natural language processing and homograph disambiguation, which make implementation harder and reduce the overall precision of the system. In this paper we define grapheme-to-phoneme conversion as a sequential labeling problem and use modified Recurrent Neural Networks (RNNs) to create a smart, integrated model for this purpose. The recurrent networks are modified to be bidirectional and equipped with Long Short-Term Memory (LSTM) blocks so that they acquire most of the past and future contextual information for decision making. The experiments conducted in this paper show that, in addition to having a unified structure, the bidirectional RNN-LSTM performs well in recognizing the pronunciation of Persian sentences, with a precision of more than 98 percent.
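The sequential labeling framing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `decode_labels`, the Latin transliteration of the phonemes, and the choice to attach an unwritten short vowel to the preceding consonant's label are all assumptions made for illustration, since the abstract does not specify the label inventory. The labels here are written by hand; in the paper they would be predicted by the bidirectional RNN-LSTM from the surrounding context.

```python
from typing import List


def decode_labels(graphemes: List[str], labels: List[str]) -> str:
    """Join one label per grapheme into a phoneme string.

    Assumption (not from the paper): each label is a short string holding
    the consonant or long vowel for that grapheme, plus any short vowel
    that the orthography leaves unwritten.
    """
    if len(graphemes) != len(labels):
        raise ValueError("sequence labeling needs exactly one label per grapheme")
    return "".join(labels)


# The word for "book" is written with four letters; the short vowel "e"
# after the first consonant does not appear in the spelling.
book = ["ک", "ت", "ا", "ب"]
book_labels = ["ke", "t", "â", "b"]
print(decode_labels(book, book_labels))

# A homograph: the same three letters can be read in several ways, so the
# correct label sequence depends on the sentence around the word.
krm = ["ک", "ر", "م"]
reading_worm = ["ke", "r", "m"]         # "kerm"
reading_generosity = ["ka", "ra", "m"]  # "karam"
print(decode_labels(krm, reading_worm))
print(decode_labels(krm, reading_generosity))
```

Because the input and output sequences have the same length under this framing, a model only has to choose one label per position, which is what lets a single network replace separate vowel-restoration and homograph-disambiguation modules.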

Journal

Open Computer Science, de Gruyter

Published: Jan 1, 2016
