The Wenzhou Spoken Corpus

Corpora, Volume 2 (1): 97 – May 1, 2007

Publisher: Edinburgh University Press
Copyright: © Edinburgh University Press
ISSN: 1749-5032
eISSN: 1755-1676
DOI: 10.3366/cor.2007.2.1.97

Abstract

The creation of the Wenzhou Spoken Corpus, an online searchable corpus of a modern Chinese dialect, presents a number of challenges that are of interest to the corpus linguistic community. We review issues involved in the collection of spoken data, its transcription and markup, and the functionality of the search tools. The transcription uses Chinese characters, together with IPA symbols for Wenzhou colloquial forms that are not conventionally represented by characters. XML was adopted as the basic file format, with file searches expressed as XPath queries. The search tools provide the usual options for restricting searches by age, gender, etc., and yield concordances and tables of collocates. Although the collection of data for the corpus was ‘opportunistic’ in some ways, and so not ideally balanced or representative, it is nevertheless proving to be a valuable tool for corpus-based research on Wenzhou.
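
The kind of search the abstract describes (an XPath query restricted by speaker metadata, returning concordance lines and collocate counts) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the element names `corpus`, `u` and `w`, the attributes `speaker`, `age` and `gender`, the placeholder tokens, and the helper functions `concordance` and `collocates` are hypothetical and are not taken from the corpus itself. The sketch uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: element names, attributes and tokens are placeholders,
# not the actual schema or data of the Wenzhou Spoken Corpus.
SAMPLE = """
<corpus>
  <u speaker="A" age="35" gender="f"><w>a</w><w>b</w><w>c</w></u>
  <u speaker="B" age="62" gender="m"><w>b</w><w>a</w><w>d</w></u>
  <u speaker="C" age="41" gender="f"><w>d</w><w>a</w><w>b</w></u>
</corpus>
"""


def concordance(root, keyword, gender=None, window=1):
    """Return (left context, keyword, right context) for every hit."""
    # Restrict the utterances with an XPath attribute predicate when asked to.
    path = ".//u" if gender is None else f".//u[@gender='{gender}']"
    hits = []
    for utterance in root.findall(path):
        tokens = [w.text for w in utterance.findall("w")]
        for i, token in enumerate(tokens):
            if token == keyword:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                hits.append((left, token, right))
    return hits


def collocates(hits):
    """Count the tokens that occur inside the context windows of the hits."""
    counts = Counter()
    for left, _, right in hits:
        counts.update(left)
        counts.update(right)
    return counts


root = ET.fromstring(SAMPLE)
all_hits = concordance(root, "a")
female_hits = concordance(root, "a", gender="f")
collocate_table = collocates(all_hits)
```

In this sketch the metadata restriction is pushed into the XPath expression itself, so the same `concordance` function serves both restricted and unrestricted searches; a fuller XPath engine would allow numeric predicates such as age ranges as well.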

Journal

Corpora, Edinburgh University Press

Published: May 1, 2007
