An Exploratory Study on Pseudo-Data Generation in Prescription and Adverse Drug Reaction Extraction.

Journal: Studies in health technology and informatics

Published Date: Aug 21, 2019

Abstract

Prescription information and adverse drug reactions (ADRs) are two components of detailed medication instructions that can benefit many aspects of clinical research. Automatic extraction of this information from free-text narratives via Information Extraction (IE) can open it up to downstream uses. IE is commonly tackled by supervised Natural Language Processing (NLP) systems, which rely on annotated training data. However, training data generation is manual, time-consuming, and labor-intensive. It is desirable to develop automatic methods for augmenting manually labeled data. We propose pseudo-data generation as one such automatic method. Pseudo-data are synthetic data generated by combining elements of existing labeled data. We propose and evaluate two sets of pseudo-data generation methods: knowledge-driven methods based on gazetteers and data-driven methods based on deep learning. We use the resulting pseudo-data to improve medication and ADR extraction. Data-driven pseudo-data are suitable for concept categories with high semantic regularities and short textual spans. Knowledge-driven pseudo-data are effective for concept categories with longer textual spans, assuming the knowledge base offers good coverage of these concepts. Combining knowledge- and data-driven pseudo-data yields a significant performance improvement on medication names and ADRs over baselines limited to the available labeled data.
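To make the knowledge-driven idea concrete, the following is a minimal sketch of gazetteer-based substitution: each labeled span of a chosen category in an existing annotated sentence is swapped for a gazetteer entry, and the labels are carried over. It is an illustration only, not the authors' implementation. The BIO label scheme, the `DRUG` category name, the example sentence, the gazetteer entries, and the function names `find_spans` and `make_pseudo_sentence` are all assumptions introduced here.

```python
import random
from typing import List, Tuple

# A labeled token: (surface form, BIO label). The BIO scheme is an assumption.
Token = Tuple[str, str]


def find_spans(sentence: List[Token], category: str) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for spans of one category."""
    spans = []
    start = None
    for i, (_, label) in enumerate(sentence):
        if label == f"B-{category}":
            if start is not None:  # a new span begins right after another
                spans.append((start, i))
            start = i
        elif label != f"I-{category}" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:  # span runs to the end of the sentence
        spans.append((start, len(sentence)))
    return spans


def make_pseudo_sentence(
    sentence: List[Token],
    category: str,
    gazetteer: List[str],
    rng: random.Random,
) -> List[Token]:
    """Replace every span of `category` with a randomly chosen gazetteer entry."""
    result: List[Token] = []
    cursor = 0
    for start, end in find_spans(sentence, category):
        result.extend(sentence[cursor:start])  # copy the untouched context
        new_tokens = rng.choice(gazetteer).split()
        result.append((new_tokens[0], f"B-{category}"))
        result.extend((tok, f"I-{category}") for tok in new_tokens[1:])
        cursor = end
    result.extend(sentence[cursor:])
    return result


# Hypothetical labeled sentence and gazetteer, for illustration only.
original = [
    ("Patient", "O"), ("started", "O"), ("on", "O"),
    ("warfarin", "B-DRUG"), ("daily", "O"),
]
drug_gazetteer = ["metoprolol tartrate", "lisinopril", "insulin glargine"]

pseudo = make_pseudo_sentence(original, "DRUG", drug_gazetteer, random.Random(0))
print(pseudo)
```

The surrounding context and its labels are left untouched, so each pseudo-sentence stays a valid training example as long as the gazetteer entry fits the category. A data-driven variant would draw the replacement from a learned model instead of a fixed list; that part is not sketched here.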

Authors

Carson Tao
Department of Information Science, State University of New York at Albany, NY, USA. Electronic address: mtao@albany.edu.

Kahyun Lee
George Mason University, Fairfax, VA, USA.

Michele Filannino
Department of Computer Science, State University of New York at Albany, NY, USA.

Ozlem Uzuner
Department of Information Studies, University at Albany, SUNY. Albany, NY.

Keywords

Drug Prescriptions; Drug-Related Side Effects and Adverse Reactions; Information Storage and Retrieval; Knowledge Bases; Natural Language Processing; Semantics

External Resources

PubMed ID (PMID): 31437951
