Representing Tone in Levenshtein Distance

Publisher: Edinburgh University Press
Copyright: © Edinburgh University Press and the Association of History and Computing 2009
Subject: Historical Studies
ISSN: 1753-8548
eISSN: 1755-1706
DOI: 10.3366/E1753854809000391

Abstract

Levenshtein distance, also known as string edit distance, has been shown to correlate strongly with both perceived distance and intelligibility in various Indo-European languages (Gooskens and Heeringa, 2004; Gooskens, 2006). We apply Levenshtein distance to dialect data from Bai (Allen, 2004), a Sino-Tibetan language, and Hongshuihe (HSH) Zhuang (Castro and Hansen, accepted), a Tai language. In applying Levenshtein distance to languages with contour tone systems, we ask the following questions: 1) How much variation in intelligibility can tone alone explain? and 2) Which representation of tone results in the Levenshtein distance that shows the strongest correlation with intelligibility test results? This research evaluates six representations of tone: onset, contour and offset; onset and contour only; contour and offset only; target approximation (Xu & Wang, 2001); autosegments of H and L; and Chao's (1930) pitch numbers. For both languages, the more fully explicit onset-contour-offset and onset-contour representations showed significantly stronger inverse correlations with intelligibility. This suggests that, for cross-dialectal listeners, the optimal representation of tone in Levenshtein distance should be at a phonetically explicit level and include information on both onset and contour.
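
As a rough illustration of how the choice of tone representation can change an edit distance, the sketch below computes a plain unit-cost Levenshtein distance over token sequences. The abstract does not specify how the paper encodes each representation, so the helper functions `onset_contour_offset` and `onset_contour`, the token names, and the reduction of a contour to "rise", "fall" or "level" are all assumptions made for this example, not the authors' method.

```python
from typing import Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a to reach an empty b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def onset_contour_offset(chao: str) -> List[str]:
    """Hypothetical encoding of a Chao pitch string such as "35".

    Assumption: the contour is reduced to the direction from the first
    to the last pitch, so complex tones such as "214" are simplified.
    """
    onset, offset = int(chao[0]), int(chao[-1])
    if offset > onset:
        contour = "rise"
    elif offset < onset:
        contour = "fall"
    else:
        contour = "level"
    return [f"on{onset}", contour, f"off{offset}"]


def onset_contour(chao: str) -> List[str]:
    """Hypothetical onset-and-contour-only encoding (offset dropped)."""
    return onset_contour_offset(chao)[:2]


# Compare a high level tone (55) with a high rising tone (35).
d_chao = levenshtein(list("55"), list("35"))
d_oco = levenshtein(onset_contour_offset("55"), onset_contour_offset("35"))
d_oc = levenshtein(onset_contour("35"), onset_contour("31"))
```

Under these assumed encodings, the same pair of tones differs by one edit as raw Chao digits but by two edits in the onset-contour-offset form, because the contour token changes along with the onset. In a full comparison the tone tokens would be appended to each word's segmental transcription before the distance is computed; how the paper combines and normalizes them is not stated in the abstract.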

Journal

International Journal of Humanities and Arts Computing, Edinburgh University Press

Published: Oct 1, 2008
