Exploiting and assessing multi-source data for supervised biomedical named entity recognition.

Journal: Bioinformatics (Oxford, England)

Published Date: Jul 15, 2018

Abstract

MOTIVATION: Recognition of biomedical entities from scientific text is a critical component of natural language processing and automated information extraction platforms. Modern named entity recognition approaches rely heavily on supervised machine learning techniques, which depend critically on annotated training corpora. These approaches have been shown to perform well when trained and tested on the same source. In such a scenario, however, the reported performance may be optimistic, because the models may not generalize to independent corpora. This can lead to suboptimal entity recognition in large-scale tagging of widely diverse articles in databases such as PubMed.

Authors

Dieter Galea

Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK.

Ivan Laponogov

Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK.

Kirill Veselkov

Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK.

Keywords

Data Mining; Databases, Factual; Natural Language Processing; PubMed; Supervised Machine Learning

External Resources

PubMed ID (PMID): 29538614
