Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Defining a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts

Defining a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts PurposeNatural language processing techniques are essential for unlocking patients’ data from electronic health records. An important NLP task is the ability to recognize morphosyntactic information from the texts, a process called part-of-speech (POS) tagging. Currently, neural network architectures are the state-of-the-art method, although there is a lack of studies exploiting this approach within Brazilian Portuguese clinical texts. The objective of this study is to define a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts.MethodsWe reviewed multiple neural network-based POS-tagging algorithms, and the Flair tool was selected due to its exceptional performance in the journalistic domain, as there is any specific algorithm to Portuguese clinical texts. We executed a normalization process on available corpora from multiple domains (two journalistic, one biomedical, one clinical, and a new corpus composed of all three of these). The Flair algorithm was trained with all corpora, generating five models, which were evaluated with all domains.ResultsThe clinical model achieved 92.39% accuracy (previous POS-tagging clinical work reached 91.5%); the biomedical model achieved 97.9% accuracy. All the models were assessed on their own test set.ConclusionWe developed a new state-of-the-art modeling environment for POS tagging of Brazilian Portuguese clinical texts and achieved comparable results to other state-of-the-art studies in journalistic contexts. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Research on Biomedical Engineering Springer Journals

Loading next page...
 
/lp/springer-journals/defining-a-state-of-the-art-pos-tagging-environment-for-brazilian-mDMTxchIYH
Publisher
Springer Journals
Copyright
Copyright © Sociedade Brasileira de Engenharia Biomedica 2020
ISSN
2446-4732
eISSN
2446-4740
DOI
10.1007/s42600-020-00067-7
Publisher site
See Article on Publisher Site

Abstract

PurposeNatural language processing techniques are essential for unlocking patients’ data from electronic health records. An important NLP task is the ability to recognize morphosyntactic information from the texts, a process called part-of-speech (POS) tagging. Currently, neural network architectures are the state-of-the-art method, although there is a lack of studies exploiting this approach within Brazilian Portuguese clinical texts. The objective of this study is to define a state-of-the-art POS-tagging environment for Brazilian Portuguese clinical texts.MethodsWe reviewed multiple neural network-based POS-tagging algorithms, and the Flair tool was selected due to its exceptional performance in the journalistic domain, as there is any specific algorithm to Portuguese clinical texts. We executed a normalization process on available corpora from multiple domains (two journalistic, one biomedical, one clinical, and a new corpus composed of all three of these). The Flair algorithm was trained with all corpora, generating five models, which were evaluated with all domains.ResultsThe clinical model achieved 92.39% accuracy (previous POS-tagging clinical work reached 91.5%); the biomedical model achieved 97.9% accuracy. All the models were assessed on their own test set.ConclusionWe developed a new state-of-the-art modeling environment for POS tagging of Brazilian Portuguese clinical texts and achieved comparable results to other state-of-the-art studies in journalistic contexts.

Journal

Research on Biomedical EngineeringSpringer Journals

Published: Sep 19, 2020

There are no references for this article.