Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma

Joeky T. Senders; Logan D. Cho; Paola Calvachi; John J. McNulty; Joanna L. Ashby; Isabelle S. Schulte; Ahmad Kareem Almekkawi; Alireza Mehrtash; William B. Gormley; Timothy R. Smith; Marike L.D. Broekman; Omar Arnaout

doi:10.1200/CCI.19.00060

Publisher: Wolters Kluwer Health
Copyright: (C) 2020 American Society of Clinical Oncology
ISSN: 2473-4276
DOI: 10.1200/CCI.19.00060

Abstract

PURPOSE: The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others.

MATERIALS AND METHODS: Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports.

RESULTS: In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with a near perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents.

CONCLUSION: This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
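
The two quantities the abstract leans on, inter-rater agreement and the area under the curve, can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline. The abstract does not say which agreement statistic was used, so Cohen's kappa for two raters is an assumption here, and the function names `cohens_kappa` and `roc_auc` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: the abstract does not name its agreement statistic;
    Cohen's kappa is used here purely for illustration.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive report
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` gives 0.75, because three of the four positive-negative pairs are ranked correctly. Computing kappa per variable before training would be one way to follow the abstract's suggestion of screening variables by inter-rater agreement.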

Journal

JCO Clinical Cancer Informatics – Wolters Kluwer Health

Published: Jan 24, 2020
