Encoding document information in a corpus of student writing: the British Academic Written English corpus

Corpora, Volume 2 (2): 241 – Nov 1, 2007

Publisher
Edinburgh University Press
Copyright
© Edinburgh University Press
ISSN
1749-5032
eISSN
1755-1676
DOI
10.3366/cor.2007.2.2.241

Abstract

The information contained in a document is only partly represented by the wording of the text; in addition, features of formatting and layout can be combined to lend specific functionality to chunks of text (e.g., section headings, highlighting, enumeration through list formatting, etc.). Such functional features, although based on the ‘objective’ typographical surface of the document, are often inconsistently realised and encoded only implicitly, i.e., they depend on deciphering by a competent reader. They are characteristic of documents produced with standard text-processing tools. We discuss the representation of such information with reference to the British Academic Written English (BAWE) corpus of student writing, currently under construction at the universities of Warwick, Reading and Oxford Brookes. Assignments are usually submitted to the corpus as Microsoft Word documents and make heavy use of surface-based functional features. As the documents are to be transformed into XML-encoded corpus files, this information can only be preserved through explicit annotation, based on interpretation. We present a discussion of the choices made in the BAWE corpus and the practical requirements for a tagging interface.
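
As an illustration only, the idea of turning implicit formatting into explicit annotation might be sketched as follows. This is not the BAWE encoding scheme or its tagging interface: the function labels ("heading", "paragraph", "list_item"), the element names (`body`, `head`, `p`, `list`, `item`) and the helper `encode_document` are all assumptions made for this example. The sketch assumes an annotator has already interpreted each chunk of text and assigned it a function.

```python
import xml.etree.ElementTree as ET


def encode_document(chunks):
    """Build an XML tree from (function, text) pairs assigned by an annotator.

    Hypothetical helper: labels and element names are illustrative only.
    """
    body = ET.Element("body")
    current_list = None  # open <list> element, if consecutive items are being grouped
    for function, text in chunks:
        if function == "list_item":
            # Consecutive list items share one explicit <list> wrapper.
            if current_list is None:
                current_list = ET.SubElement(body, "list")
            ET.SubElement(current_list, "item").text = text
            continue
        current_list = None  # any other chunk closes the running list
        tag = "head" if function == "heading" else "p"
        ET.SubElement(body, tag).text = text
    return body


# Interpreted chunks, e.g. as an annotator might label them in a tagging interface.
chunks = [
    ("heading", "Introduction"),
    ("paragraph", "This essay has two aims."),
    ("list_item", "Describe the data"),
    ("list_item", "Evaluate the method"),
    ("paragraph", "Both aims are addressed below."),
]

xml_text = ET.tostring(encode_document(chunks), encoding="unicode")
print(xml_text)
```

The point of the sketch is that the XML output carries the heading and list structure as explicit elements, so it no longer depends on a reader deciphering typography such as bold type or indentation.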

Journal

Corpora, Edinburgh University Press

Published: Nov 1, 2007
