Using Implicit Information to Identify Smoking Status in Smoke-blind Medical Discharge Summaries

Publisher: Oxford University Press
Copyright: American Medical Informatics Association
ISSN: 1067-5027
eISSN: 1527-974X
DOI: 10.1197/jamia.M2440
PMID: 17947620
Abstract

As part of the 2006 i2b2 NLP Shared Task, we explored two methods for determining the smoking status of patients from their hospital discharge summaries when explicit smoking terms were present and when those same terms were removed. We developed a simple keyword-based classifier to determine smoking status from de-identified hospital discharge summaries. We then developed a Naïve Bayes classifier to determine smoking status from the same records after all smoking-related words had been manually removed (the smoke-blind dataset). The performance of the Naïve Bayes classifier was compared with the performance of three human annotators on a subset of the same training dataset (n = 54) and against the evaluation dataset (n = 104 records). The rule-based classifier was able to accurately extract smoking status from hospital discharge summaries when they contained explicit smoking words. On the smoke-blind dataset, where explicit smoking cues are not available, two Naïve Bayes systems performed less well than the rule-based classifier, but similarly to three expert human annotators.
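The two approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword list, the two labels ("smoker", "non-smoker"), the helper names, and the four toy records are all invented for illustration, and the paper removed smoking-related words manually rather than with a word list. The sketch shows a keyword check for explicit smoking terms, a filter that produces a "smoke-blind" version of a record, and a small multinomial Naïve Bayes classifier with add-one smoothing trained on the filtered text.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword list; the abstract does not give the terms the authors used.
SMOKING_TERMS = {"smoke", "smoker", "smokes", "smoking",
                 "tobacco", "cigarette", "cigarettes"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_smoking(text):
    """Keyword check: True if any explicit smoking term occurs (ignores negation)."""
    return any(tok in SMOKING_TERMS for tok in tokenize(text))


def smoke_blind(text):
    """Drop explicit smoking terms, approximating a smoke-blind record."""
    return " ".join(t for t in tokenize(text) if t not in SMOKING_TERMS)


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy records, not taken from the i2b2 data.
train = [
    ("patient smokes one pack daily history of copd and emphysema", "smoker"),
    ("heavy tobacco use chronic bronchitis copd exacerbation", "smoker"),
    ("no tobacco use admitted for ankle fracture after fall", "non-smoker"),
    ("denies smoking elective knee replacement uneventful recovery", "non-smoker"),
]
blind_docs = [smoke_blind(text) for text, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(blind_docs, labels)

print(model.predict(smoke_blind("copd exacerbation with emphysema")))
```

With the smoking terms stripped, the classifier can rely only on co-occurring words such as diagnoses, which is the kind of implicit information the title refers to. On toy data this small the prediction says nothing about real-world accuracy.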

Journal: Journal of the American Medical Informatics Association, Oxford University Press

Published: Jan 1, 2008