Crowdsourcing Ground Truth for Medical Relation Extraction



Publisher
Association for Computing Machinery
Copyright
Copyright © 2018 Owner/Author
ISSN
2160-6455
eISSN
2160-6463
DOI
10.1145/3152889

Abstract

Cognitive computing systems require human-labeled data for evaluation and often for training. The standard practice in gathering this data minimizes disagreement between annotators, and we have found that this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction, covering the cause and treat relations, and on how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the quality level of domain experts for this task while reducing cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure that account for ambiguity in both human and machine performance on this task.
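
For intuition, the following is a minimal sketch of what ambiguity-weighted precision, recall, and F-measure could look like, assuming each example carries a crowd score in [0, 1] (a soft label reflecting annotator agreement) instead of a hard 0/1 label. The function name and scoring scheme here are illustrative assumptions, not the paper's exact definitions, which are given in the article itself.

    # Hypothetical sketch: ambiguity-weighted precision/recall/F-measure.
    # Each example has a crowd score in [0, 1]; a predicted positive on an
    # ambiguous example (score 0.5) counts half as a true positive and half
    # as a false positive, instead of being forced to one side.
    def weighted_prf(crowd_scores, predictions):
        """crowd_scores: floats in [0, 1]; predictions: 0/1 labels."""
        tp = sum(s for s, p in zip(crowd_scores, predictions) if p == 1)
        fp = sum(1 - s for s, p in zip(crowd_scores, predictions) if p == 1)
        fn = sum(s for s, p in zip(crowd_scores, predictions) if p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Example: a clear positive (0.9), an ambiguous case (0.5), and a clear
    # negative (0.1); the ambiguous case only partially penalizes precision.
    p, r, f = weighted_prf([0.9, 0.5, 0.1], [1, 1, 0])

Under this weighting, the classifier is rewarded and penalized in proportion to how confidently the crowd labeled each example, which is the sense in which ambiguity enters both human and machine performance measures.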

Journal

ACM Transactions on Interactive Intelligent Systems (TiiS), Association for Computing Machinery

Published: Jul 6, 2018

Keywords: Ground truth
