Current research in text mining favors the quantity of texts over their representativeness. For bilingual terminology mining, however, large comparable corpora are not available for many language pairs. More importantly, because terms are defined with respect to a specific domain with a restricted register, the representativeness of the corpus can be expected to matter more than its size. Our hypothesis is therefore that corpus representativeness, rather than quantity, ensures the quality of the acquired terminological resources. This article tests this hypothesis on a French-Japanese bilingual term extraction task. To show how important discourse type is as a characteristic of comparable corpora, we used a state-of-the-art multilingual terminology mining chain composed of two term extraction programs, one for each language, and an alignment program. We evaluated the candidate translations against a reference list and found that taking discourse type into account yielded better-quality candidate translations, even when the corpus size was reduced by half.
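The abstract does not describe how the alignment program works or which score is computed against the reference list. A common approach for comparable corpora, assumed here purely for illustration, is context-vector comparison: each term is represented by the words it co-occurs with, the source vector is translated word by word through a seed bilingual dictionary, and target-language terms are ranked by cosine similarity to it. Candidates can then be scored by the share of reference pairs whose gold translation appears among the top N candidates. Everything in this sketch is hypothetical: the function names, the toy French and Japanese sentences (assumed to be already tokenized; Japanese would first need morphological segmentation), the seed dictionary, the window size and the top-N metric. It also uses raw co-occurrence counts and single-word terms, whereas a real chain would likely use an association measure and handle multi-word terms.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Raw co-occurrence counts of each token with neighbours within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def project(vector, seed_dict):
    """Translate a source context vector via the seed dictionary; unknown words are dropped."""
    projected = Counter()
    for word, count in vector.items():
        for translation in seed_dict.get(word, ()):
            projected[translation] += count
    return projected


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(term, src_vecs, tgt_vecs, target_terms, seed_dict, top_n=10):
    """Return up to `top_n` target terms ordered by similarity to the projected source context."""
    projected = project(src_vecs.get(term, Counter()), seed_dict)
    scored = [(cosine(projected, tgt_vecs.get(c, Counter())), c) for c in target_terms]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [candidate for _, candidate in scored[:top_n]]


def top_n_accuracy(candidates, reference, n):
    """Share of reference source terms whose gold translation is among their first n candidates."""
    if not reference:
        return 0.0
    hits = sum(1 for term, gold in reference.items() if gold in candidates.get(term, [])[:n])
    return hits / len(reference)


# Toy comparable corpora (hypothetical data, for illustration only).
fr = [["tumeur", "maligne", "traitement", "chimiothérapie"],
      ["tumeur", "traitement", "patient"],
      ["virus", "infection", "vaccin"]]
ja = [["腫瘍", "悪性", "治療", "化学療法"],
      ["腫瘍", "治療", "患者"],
      ["ウイルス", "感染", "ワクチン"]]
seed = {"maligne": ["悪性"], "traitement": ["治療"], "chimiothérapie": ["化学療法"],
        "patient": ["患者"], "infection": ["感染"], "vaccin": ["ワクチン"]}

src, tgt = context_vectors(fr), context_vectors(ja)
cands = {t: rank_candidates(t, src, tgt, ["腫瘍", "ウイルス"], seed) for t in ["tumeur", "virus"]}
reference = {"tumeur": "腫瘍", "virus": "ウイルス"}
print(top_n_accuracy(cands, reference, 1))  # 1.0
```

Reporting accuracy at several values of N is one way to compare corpora built with and without regard to discourse type, since it shows whether the correct translation rises toward the top of the candidate list.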
ACM Transactions on Speech and Language Processing (TSLP) – Association for Computing Machinery
Published: Aug 1, 2010