An information-theoretic measure to evaluate parsing difficulty across treebanks

Publisher: Association for Computing Machinery
Copyright: © 2013 by ACM Inc.
ISSN: 1550-4875
DOI: 10.1145/2407736.2407737

Abstract

ANNA CORAZZA, Università di Napoli "Federico II"
ALBERTO LAVELLI, FBK-irst
GIORGIO SATTA, Università di Padova

With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks to assess which languages or domains are more difficult to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the standard labeled precision and recall measures. As an alternative, in this article we propose an information-theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage with respect to standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German, and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always accompanied by a degradation in parsing accuracy.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing
General Terms: Experimentation, Performance
Additional Key Words and Phrases: Natural language parsing, probabilistic context-free grammars
ACM Reference Format: Corazza, A., Lavelli, A., and Satta, G. 2013. An information-theoretic measure to evaluate parsing difficulty across treebanks. ACM Transactions on Speech and Language Processing.
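The abstract states that ECC can be expressed directly as a function of the model parameters, but the definition itself is not reproduced on this page. As a minimal sketch only, assuming a PCFG G with rule probabilities p(A → α) and sentence–tree pairs (w, t) drawn from a reference distribution such as a treebank (the exact formulation is given in the full article), an expected conditional cross-entropy can be written as:

% Sketch under the assumptions stated above; not necessarily the article's exact definition.
\[
\mathrm{ECC}(G) \;=\; -\,\mathbb{E}_{(w,t)}\bigl[\log p_G(t \mid w)\bigr]
\;=\; -\,\mathbb{E}_{(w,t)}\bigl[\log p_G(t) - \log p_G(w)\bigr],
\]
\[
\log p_G(t) \;=\; \sum_{A \to \alpha} f(A \to \alpha;\, t)\,\log p(A \to \alpha),
\qquad
p_G(w) \;=\; \sum_{t'\,:\;\mathrm{yield}(t') = w} p_G(t'),
\]

where f(A → α; t) counts the occurrences of rule A → α in tree t and p_G(w) is the sentence (inside) probability. Under this reading, every term is a function of the rule probabilities alone, which is the property the abstract highlights in contrast to labeled precision and recall.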

Journal

ACM Transactions on Speech and Language Processing (TSLP), Association for Computing Machinery

Published: Jan 1, 2013
