Bilingual word embeddings (BWEs) play an important role in many natural language processing (NLP) tasks, especially cross-lingual tasks such as machine translation (MT) and cross-language information retrieval. Most existing methods for training BWEs rely on bilingual supervision, but bilingual resources are unavailable for many low-resource language pairs. Although some studies have addressed this issue with unsupervised methods, they do not exploit monolingual contextual data to improve the performance of low-resource BWEs. We therefore propose an unsupervised method that improves BWEs using optimized monolingual context information, without any parallel corpora. In particular, we first build a bilingual word embedding mapping model between two languages by aligning their monolingual word embedding spaces through unsupervised adversarial training. To further improve these mappings, we use monolingual context information to optimize them during training. Experimental results, including results for four low-resource language pairs, show that our method significantly outperforms the baseline systems.
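The mapping-based setup the abstract describes can be illustrated with a toy sketch: once a linear transform aligns the source embedding space with the target space, word translation reduces to nearest-neighbor retrieval by cosine similarity. The snippet below is a minimal numpy illustration, not the paper's implementation; the toy embeddings `X` and `Y`, the orthogonal transform `Q`, and the helper `translate` are all invented for the example, and the unsupervised adversarial training that would actually learn the mapping is omitted (here the true rotation is simply reused).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monolingual embeddings: 5 "words" per language, dimension 4.
# In practice these would come from word2vec/fastText trained on
# separate monolingual corpora.
X = rng.normal(size=(5, 4))                    # source-language embeddings
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # a "true" orthogonal transform
Y = X @ Q                                      # target embeddings = rotated source (toy setup)

def translate(W, X, Y):
    """Map source embeddings with W, then retrieve the nearest
    target word by cosine similarity in the shared space."""
    mapped = X @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return (mapped @ Yn.T).argmax(axis=1)      # best target index per source word

# With the correct mapping, source word i retrieves target word i.
print(translate(Q, X, Y))
```

In the unsupervised setting, `Q` is unknown and is instead estimated adversarially: a generator proposes the mapping while a discriminator tries to tell mapped source embeddings apart from real target embeddings.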
Machine Translation – Springer Journals
Published: Dec 1, 2021
Keywords: Bilingual word embeddings; Low-resource; Unsupervised method