Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences


Publisher: Oxford University Press
Copyright: Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions
ISSN: 1067-5027
eISSN: 1527-974X
DOI: 10.1136/amiajnl-2013-001810
PMID: 23907286

Abstract

Objective: To develop, evaluate, and share (1) syntactic parsing guidelines for clinical text, with a new approach to handling ill-formed sentences, and (2) a clinical Treebank annotated according to the guidelines; and to document the process and findings for readers with similar interests.

Methods: Using random samples from a shared natural language processing challenge dataset, we developed a handbook of domain-customized syntactic parsing guidelines through iterative annotation and adjudication between two institutions. Special considerations were incorporated into the guidelines for handling ill-formed sentences, which are common in clinical text. Intra- and inter-annotator agreement rates were used to evaluate consistency in following the guidelines. We report quantitative and qualitative properties of the annotated Treebank, as well as its use to retrain a statistical parser.

Results: A supplement to the Penn Treebank II guidelines was developed for annotating clinical sentences. After three iterations of annotation and adjudication on 450 sentences, the annotators reached an inter-annotator F-measure agreement rate of 0.930 (intra-annotator rate: 0.948) on a final independent set. A total of 1100 sentences from progress notes, exhibiting domain-specific linguistic features, were annotated. A statistical parser retrained on combined general-English (mainly news text) annotations and our annotations achieved an accuracy of 0.811, higher than models trained on either general or clinical sentences alone. Both the guidelines and the syntactic annotations are available at https://sourceforge.net/projects/medicaltreebank.

Conclusions: We developed guidelines for parsing clinical text and annotated a corpus accordingly. The high intra- and inter-annotator agreement rates show good consistency in following the guidelines. The corpus proved useful for retraining a statistical parser, which achieved moderate accuracy.
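
The agreement figures above are bracketing F-measures in the PARSEVAL style: each annotator's parse is reduced to its set of labeled constituent spans, and precision and recall are computed between the two span sets. Below is a minimal sketch of that computation in Python with NLTK trees; note that standard EVALB scoring additionally ignores part-of-speech preterminals and punctuation, which this sketch does not, and the FRAG bracketing shown is a generic Penn Treebank II illustration of a verbless clinical fragment, not the paper's own convention.

    from nltk import Tree

    def labeled_spans(tree):
        """Collect (label, start, end) for every constituent in a parse."""
        spans = set()
        def walk(node, start):
            if isinstance(node, str):   # leaf token
                return start + 1
            end = start
            for child in node:
                end = walk(child, end)
            spans.add((node.label(), start, end))
            return end
        walk(tree, 0)
        return spans

    def bracketing_f1(tree_a, tree_b):
        """Labeled bracketing F-measure between two parses of one sentence."""
        a, b = labeled_spans(tree_a), labeled_spans(tree_b)
        matched = len(a & b)
        if matched == 0:
            return 0.0
        precision, recall = matched / len(a), matched / len(b)
        return 2 * precision * recall / (precision + recall)

    # Two annotators bracket the same verbless note fragment differently.
    t1 = Tree.fromstring("(FRAG (NP (NN Abdomen)) (ADJP (JJ soft)))")
    t2 = Tree.fromstring("(FRAG (NP (NN Abdomen) (JJ soft)))")
    print(bracketing_f1(t1, t2))   # 3 of 5 vs 4 spans match: F1 = 0.667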
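
The retraining result reflects a standard recipe: pool the general-English and clinical treebank annotations into one training set so the parser learns from both registers. The paper retrained a full statistical parser; the following is only a toy sketch of the pooling idea using NLTK's PCFG machinery, with hypothetical file names and formats.

    from nltk import Nonterminal, Tree, ViterbiParser, induce_pcfg

    def read_trees(path):
        """Read bracketed parses, one tree per line (hypothetical file layout)."""
        with open(path) as f:
            return [Tree.fromstring(line) for line in f if line.strip()]

    general = read_trees("wsj_trees.txt")        # general English (news) parses
    clinical = read_trees("clinical_trees.txt")  # clinical Treebank parses

    # Pool productions from both domains; induce_pcfg estimates rule
    # probabilities from their relative frequencies in the pooled set.
    productions = []
    for tree in general + clinical:
        productions.extend(tree.productions())
    grammar = induce_pcfg(Nonterminal("S"), productions)

    # Parse a held-out clinical sentence with the combined-domain grammar
    # (assumes every token was seen in training; real parsers smooth unknowns).
    parser = ViterbiParser(grammar)
    for parse in parser.parse("Patient denies pain".split()):
        print(parse)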

Journal

Journal of the American Medical Informatics Association (Oxford University Press)

Published: Nov 1, 2013
