Evaluation of five single-word term recognition methods on a legal English corpus

Corpora, Volume 9 (1): 83 – May 1, 2014

Publisher: Edinburgh University Press
Copyright: © Edinburgh University Press
Subject: Linguistics
ISSN: 1749-5032
eISSN: 1755-1676
DOI: 10.3366/cor.2014.0052

Abstract

Specialised texts are characterised by, amongst other features, the presence of terminology which conveys domain-specific concepts that are essential for the specialist who is interested in analysing such texts. Automatic Term Recognition (ATR) methods are employed to identify those terms automatically, which is especially helpful in view of the large size of corpora nowadays. However, they tend to concentrate on the identification of Multi-Word Terms (MWTs), neglecting Single-Word Terms (SWTs) to a certain extent. This might be related to the greater number of the former found in fields such as biomedicine. However, so far as legal English is concerned, testing has shown that SWTs represent 65.22 percent of the items in the specialised glossary employed for the evaluation of the ATR methods examined herein. This paper presents the evaluation of five SWT recognition methods, namely, those of Chung (2003), Drouin (2003), Kit and Liu (2008), Keywords (2008), and TF-IDF (term frequency-inverse document frequency). These were tested on the United Kingdom Supreme Court Corpus (UKSCC), a legal corpus of 2.6 million words which was compiled for this purpose. The results indicate that Drouin's TermoStat software is the best performing method, achieving 73.45 percent precision on the top 2,000 candidate terms.
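As a rough illustration of the evaluation set-up the abstract describes, the sketch below ranks single-word candidates by TF-IDF and then measures precision on the top N candidates against a glossary. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names, the toy documents, and the toy glossary are all hypothetical, and the TF-IDF variant shown (relative term frequency times the natural log of inverse document frequency, summed over documents) is only one common formulation.

```python
import math
from collections import Counter


def tf_idf_scores(documents):
    """Sum TF-IDF over all documents for each word type.

    Assumption: tf is the relative frequency of the word in a document and
    idf is log(N / document frequency). Other variants exist.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document

    scores = Counter()
    for doc in documents:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            scores[word] += tf * idf
    return scores


def precision_at_n(ranked_candidates, glossary, n):
    """Share of the top-n candidates that appear in the reference glossary."""
    top = ranked_candidates[:n]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in glossary)
    return hits / len(top)


# Hypothetical toy data: tokenised documents and a reference glossary.
documents = [
    ["the", "appellant", "filed", "the", "appeal"],
    ["the", "tribunal", "dismissed", "the", "appeal"],
    ["the", "weather", "was", "fine"],
]
glossary = {"appellant", "appeal", "tribunal"}

scores = tf_idf_scores(documents)
# Rank by descending score; break ties alphabetically so the order is stable.
ranked = sorted(scores, key=lambda w: (-scores[w], w))
print(ranked)
print(precision_at_n(ranked, glossary, 5))
```

Under this definition of precision, a figure of 73.45 percent on the top 2,000 candidates would correspond to 1,469 candidates matching the glossary. A word that occurs in every document, such as "the" in the toy data, receives an inverse document frequency of zero and so falls to the bottom of the ranking.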

Journal: Corpora, Edinburgh University Press

Published: May 1, 2014
