Improving bilingual word embeddings mapping with monolingual context information



Publisher: Springer Journals
Copyright: © The Author(s), under exclusive licence to Springer Nature B.V. 2021
ISSN: 0922-6567
eISSN: 1573-0573
DOI: 10.1007/s10590-021-09274-0

Abstract

Bilingual word embeddings (BWEs) play an important role in many natural language processing (NLP) tasks, especially cross-lingual tasks such as machine translation (MT) and cross-language information retrieval. Most existing methods for training BWEs rely on bilingual supervision, but bilingual resources are unavailable for many low-resource language pairs. Although some studies have addressed this issue with unsupervised methods, they do not exploit monolingual context data to improve the performance of low-resource BWEs. To address these issues, we propose an unsupervised method that improves BWEs using optimized monolingual context information, without any parallel corpora. Specifically, we first build a bilingual word-embedding mapping model between two languages by aligning their monolingual word embedding spaces through unsupervised adversarial training. We then use monolingual context information to optimize these mappings during training. Experimental results show that our method significantly outperforms baseline systems, including on four low-resource language pairs.
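The first step described in the abstract, aligning two monolingual embedding spaces with unsupervised adversarial training, follows a well-known recipe (in the style of MUSE, Conneau et al. 2018). The sketch below illustrates that step only; it is a minimal, assumption-laden illustration rather than the authors' implementation, and the dimensions, hyperparameters, and function names are placeholders.

```python
# Minimal sketch of adversarial alignment of two monolingual embedding
# spaces (MUSE-style). Not the paper's code; all settings are assumptions.
import torch
import torch.nn as nn

dim = 300  # embedding dimensionality (assumed, e.g. fastText vectors)

# Linear map W from the source space into the target space.
mapping = nn.Linear(dim, dim, bias=False)

# Discriminator: guesses whether a vector is a mapped source embedding (0)
# or a genuine target embedding (1).
discriminator = nn.Sequential(
    nn.Linear(dim, 2048), nn.LeakyReLU(0.2),
    nn.Linear(2048, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.1)
opt_w = torch.optim.SGD(mapping.parameters(), lr=0.1)

def train_step(src_batch, tgt_batch):
    """One adversarial update; src_batch and tgt_batch are (batch, dim)
    tensors of monolingual word embeddings."""
    # Discriminator step: label mapped-source vectors 0, real targets 1.
    with torch.no_grad():
        mapped = mapping(src_batch)
    preds = discriminator(torch.cat([mapped, tgt_batch]))
    labels = torch.cat([torch.zeros(len(src_batch), 1),
                        torch.ones(len(tgt_batch), 1)])
    loss_d = bce(preds, labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Mapping step: train W so mapped vectors fool the discriminator.
    preds = discriminator(mapping(src_batch))
    loss_w = bce(preds, torch.ones(len(src_batch), 1))
    opt_w.zero_grad(); loss_w.backward(); opt_w.step()

    # Keep W approximately orthogonal, a standard trick in this setting:
    # W <- (1 + beta) * W - beta * (W W^T) W, with small beta.
    with torch.no_grad():
        W = mapping.weight
        W.copy_(1.01 * W - 0.01 * W @ W.t() @ W)
```

The paper's second step, refining the learned mapping with monolingual context information, is specific to this work and is not reproduced in the sketch above.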

Journal: Machine Translation (Springer Journals)

Published: Dec 1, 2021

Keywords: Bilingual word embeddings; Low-resource; Unsupervised method
