An information-theoretic measure to evaluate parsing difficulty across treebanks

Publisher: Association for Computing Machinery
Copyright: © 2013 by ACM Inc.
ISSN: 1550-4875
DOI: 10.1145/2407736.2407737

Abstract

ANNA CORAZZA, Università di Napoli "Federico II"
ALBERTO LAVELLI, FBK-irst
GIORGIO SATTA, Università di Padova

With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks to assess which languages or domains are more difficult to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the standard labeled precision and recall measures. As an alternative, in this article we propose an information-theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage with respect to standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German, and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always accompanied by a degradation in parsing accuracy.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing
General Terms: Experimentation, Performance
Additional Key Words and Phrases: Natural language parsing, probabilistic context-free grammars
ACM Reference Format: Corazza, A., Lavelli, A., and Satta, G. 2013. An information-theoretic measure to evaluate parsing difficulty across treebanks. ACM Transactions on Speech and Language Processing.
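The abstract states that ECC can be expressed directly as a function of the model parameters, but the definition itself is not reproduced on this page. As a minimal sketch only, assuming a PCFG G with rule probabilities p(A → α) and sentence–tree pairs (w, t) drawn from a reference distribution such as a treebank (the exact formulation is given in the full article), an expected conditional cross-entropy can be written as:

% Sketch under the assumptions stated above; not necessarily the article's exact definition.
\[
\mathrm{ECC}(G) \;=\; -\,\mathbb{E}_{(w,t)}\bigl[\log p_G(t \mid w)\bigr]
\;=\; -\,\mathbb{E}_{(w,t)}\bigl[\log p_G(t) - \log p_G(w)\bigr],
\]
\[
\log p_G(t) \;=\; \sum_{A \to \alpha} f(A \to \alpha;\, t)\,\log p(A \to \alpha),
\qquad
p_G(w) \;=\; \sum_{t'\,:\;\mathrm{yield}(t') = w} p_G(t'),
\]

where f(A → α; t) counts the occurrences of rule A → α in tree t and p_G(w) is the sentence (inside) probability. Under this reading, every term is a function of the rule probabilities alone, which is the property the abstract highlights in contrast to labeled precision and recall.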

Journal

ACM Transactions on Speech and Language Processing (TSLP), Association for Computing Machinery

Published: Jan 1, 2013
