Toward a complete dataset of drug-drug interaction information from publicly available sources.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged 14 different sources: 5 clinically oriented information sources, 4 natural language processing (NLP) corpora, and 5 bioinformatics/pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and by specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol.
Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data.
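
The abstract reports overlap percentages between sources but does not state how overlap was computed. The following is a minimal sketch of one plausible approach, assuming each source has already been mapped to a common drug identifier scheme and that an interaction is an unordered pair of drugs. The function names, the source names, and the toy drug pairs are hypothetical and are not taken from the study.

```python
from itertools import combinations


def normalize(pairs):
    """Represent each interaction as an unordered pair of drug identifiers.

    Assumption: identifiers are already mapped to a shared vocabulary.
    Self-pairs such as (x, x) are dropped.
    """
    return {frozenset(p) for p in pairs if len(set(p)) == 2}


def pairwise_overlap(sources):
    """Return {(a, b): fraction of a's pairs that also appear in b}.

    The measure is directional, so (a, b) and (b, a) can differ
    when the two sources have different sizes.
    """
    norm = {name: normalize(pairs) for name, pairs in sources.items()}
    result = {}
    for a, b in combinations(norm, 2):
        shared = norm[a] & norm[b]
        result[(a, b)] = len(shared) / len(norm[a]) if norm[a] else 0.0
        result[(b, a)] = len(shared) / len(norm[b]) if norm[b] else 0.0
    return result


# Hypothetical toy data, for illustration only.
sources = {
    "source_A": [
        ("warfarin", "aspirin"),
        ("simvastatin", "clarithromycin"),
        ("digoxin", "amiodarone"),
        ("lithium", "ibuprofen"),
    ],
    "source_B": [
        ("aspirin", "warfarin"),  # same interaction, reversed order
        ("fluoxetine", "tramadol"),
    ],
}

overlap = pairwise_overlap(sources)
for (a, b), fraction in overlap.items():
    print(f"{a} covered by {b}: {fraction:.0%}")
```

Using `frozenset` makes the comparison insensitive to the order in which a source lists the two drugs. A directional fraction is used here because it shows how much of one list another list covers; a symmetric measure such as the Jaccard index would be an equally reasonable choice.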

Authors

  • Serkan Ayvaz
    Department of Computer Science, Kent State University, 241 Math and Computer Science Building, Kent, OH 44242, USA. Electronic address: sayvaz1@kent.edu.
  • John Horn
    Department of Pharmacy, School of Pharmacy and University of Washington Medicine, Pharmacy Services, University of Washington, H375V Health Sciences Bldg, Box 357630, Seattle, WA 98195, USA. Electronic address: jrhorn@uw.edu.
  • Oktie Hassanzadeh
    IBM T.J. Watson Research Center, 1101 Kitchawan Rd Route 134, P.O. Box 218, Yorktown Heights, NY 10598, USA. Electronic address: hassanzadeh@us.ibm.com.
  • Qian Zhu
    Institute for Prevention and Control of AIDS and STD, Henan Center for Disease Control and Prevention, Zhengzhou 450016, Henan, China.
  • Johann Stan
    Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA. Electronic address: johann.stan.phd@gmail.com.
  • Nicholas P Tatonetti
    Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, 622 West 168th St VC5, New York, NY 10032, USA. Electronic address: nick.tatonetti@columbia.edu.
  • Santiago Vilar
    Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, 622 West 168th St VC5, New York, NY 10032, USA. Electronic address: sav7003@dbmi.columbia.edu.
  • Mathias Brochhausen
    Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
  • Matthias Samwald
    Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria. Electronic address: matthias.samwald@meduniwien.ac.at.
  • Majid Rastegar-Mojarad
    Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
  • Michel Dumontier
    Stanford University, Stanford, CA, USA.
  • Richard D Boyce
    Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.