PURPOSE: The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others.
MATERIALS AND METHODS: Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports.
RESULTS: In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with a near perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents.
CONCLUSION: This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
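The abstract relates two quantities per variable: the inter-rater agreement of the manual labels and the model's discrimination, compared with a rank correlation (ρ). The sketch below illustrates how such a comparison could be computed. It is not the authors' pipeline: it assumes Cohen's kappa as the agreement measure and Spearman's rank correlation for ρ, neither of which the abstract names explicitly, and all function names and numbers are hypothetical illustrations rather than study data.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (assumed agreement measure)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical labels from two raters for one binary radiologic characteristic.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(rater_1, rater_2)

# Hypothetical per-variable agreement and model AUC values (not study results).
kappas = [0.55, 0.70, 0.82, 0.95]
aucs = [0.84, 0.83, 0.93, 0.97]
rho = spearman_rho(kappas, aucs)
print(f"kappa = {kappa:.2f}, rho = {rho:.2f}")
```

A rank correlation is used in this sketch because it only assumes a monotonic relationship between agreement and performance, which suits a small set of variables with bounded scores.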
JCO Clinical Cancer Informatics – Wolters Kluwer Health
Published: Jan 24, 2020