Semi-Automated Data Curation from Biomedical Literature.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Published Date: Apr 29, 2023

Abstract

Data curation is a bottleneck for many informatics pipelines. One example is aggregating data from preclinical studies to identify novel genetic pathways for atherosclerosis in humans. This requires extracting data from published mouse studies, such as the perturbed gene and its impact on lesion size and plaque inflammation, which is non-trivial. Curation efforts are resource-heavy, with curators manually extracting data from hundreds of publications. In this work, we describe the development of a semi-automated curation tool to accelerate data extraction. We use natural language processing (NLP) methods to auto-populate a web-based form, which is then reviewed by a curator. We conducted a controlled user study to evaluate the curation tool. Our NLP model achieves 70% accuracy on categorical fields, and our curation tool accelerates task completion by 49% compared to manual curation.

Authors

Protiva Rahman

Vanderbilt University Medical Center, Nashville, TN.

Daniel Fabbri

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America.

Keywords

Animals; Data Curation; Humans; Mice; Natural Language Processing; Publications

External Resources

PubMed (PMID 37128469)
