PISTON: Predicting drug indications and side effects using topic modeling and natural language processing.

Journal: Journal of Biomedical Informatics

Published Date: Sep 27, 2018

Abstract

Discovering novel drugs to treat diseases takes a long time and costs a great deal. It is important to understand the side effects of drugs as well as their therapeutic effects, because unexpected actions of candidate drugs can seriously harm patients. To overcome these limitations, computational methods for predicting therapeutic effects and side effects have been proposed. In particular, text mining is widely used in systems biology because it can discover hidden relationships among drugs, genes, and diseases in a large body of literature. Compared with in vivo/in vitro experiments, text mining derives meaningful results in less time and at lower cost. In this study, we propose an algorithm for predicting novel drug-phenotype associations and drug-side effect associations using topic modeling and natural language processing (NLP). We extract sentences in which drugs and genes co-occur from literature abstracts and use NLP to identify the words that describe the relationship between them. Based on the characteristics of the identified words, we determine whether the drug up-regulates or down-regulates the gene. Using the genes each drug affects and their regulatory relationships, we group frequently occurring genes and regulatory relationships into topics, and use topic modeling to build a drug-topic probability matrix that scores how likely each drug is to be associated with each topic. Using this matrix, we construct a classifier that predicts novel indications and side effects of drugs from the characteristics of known drug-phenotype or drug-side effect associations. Because the proposed method predicts both indications and side effects with a single algorithm, it can exclude, from the candidate drugs proposed for treating a phenotype, those with serious side effects or with side effects that patients do not want to experience.
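The sentence-extraction and regulation-direction steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PISTON's implementation. The `UP_WORDS`/`DOWN_WORDS` lexicons, the regex sentence splitter, exact single-token matching of drug and gene names, and the helper names are all hypothetical simplifications, because the abstract does not specify how relation words are identified. Each drug ends up with a bag of `GENE_up` / `GENE_down` tokens, which is the kind of per-drug "document" a topic model takes as input.

```python
import re
from collections import Counter, defaultdict

# Hypothetical relation-word lexicons; the abstract does not give PISTON's actual word lists.
UP_WORDS = {"increase", "increases", "increased", "induce", "induces", "induced",
            "upregulate", "upregulates", "upregulated", "activate", "activates",
            "activated", "enhance", "enhances", "enhanced"}
DOWN_WORDS = {"decrease", "decreases", "decreased", "inhibit", "inhibits", "inhibited",
              "downregulate", "downregulates", "downregulated", "suppress", "suppresses",
              "suppressed", "reduce", "reduces", "reduced", "block", "blocks", "blocked"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Keep letters, digits and hyphens so symbols such as IL-6 stay whole."""
    return re.findall(r"[A-Za-z0-9-]+", sentence)


def regulation_direction(tokens):
    """Return 'up', 'down', or None (no relation word, or a tie) by counting lexicon hits."""
    lowered = [t.lower() for t in tokens]
    up = sum(t in UP_WORDS for t in lowered)
    down = sum(t in DOWN_WORDS for t in lowered)
    if up > down:
        return "up"
    if down > up:
        return "down"
    return None


def drug_gene_tokens(abstracts, drugs, genes):
    """Build a Counter of 'GENE_up' / 'GENE_down' tokens per drug from co-occurrence sentences."""
    bags = defaultdict(Counter)
    drug_lookup = {d.lower(): d for d in drugs}   # drugs matched case-insensitively
    gene_lookup = {g.upper(): g for g in genes}   # gene symbols matched in upper case
    for text in abstracts:
        for sentence in split_sentences(text):
            tokens = tokenize(sentence)
            found_drugs = {drug_lookup[t.lower()] for t in tokens if t.lower() in drug_lookup}
            found_genes = {gene_lookup[t.upper()] for t in tokens if t.upper() in gene_lookup}
            if not found_drugs or not found_genes:
                continue  # keep only sentences where a drug and a gene co-occur
            direction = regulation_direction(tokens)
            if direction is None:
                continue  # skip sentences with no (or conflicting) relation words
            for drug in found_drugs:
                for gene in found_genes:
                    bags[drug][f"{gene}_{direction}"] += 1
    return bags


# Toy, made-up input purely for illustration.
abstracts = [
    "Metformin inhibits MTOR signaling in tumor cells. Aspirin was well tolerated.",
    "Aspirin reduced PTGS2 expression. Metformin increased PRKAA1 activity in liver.",
]
bags = drug_gene_tokens(abstracts, drugs=["Metformin", "Aspirin"],
                        genes=["MTOR", "PTGS2", "PRKAA1"])
for drug, bag in bags.items():
    print(drug, dict(bag))
```

Treating each drug's token bag as a document is what would let a standard topic model (for example LDA; the abstract says only "topic modeling") group frequently co-occurring gene-regulation tokens into topics and produce the drug-topic probability matrix that the classifier consumes. A real pipeline would replace the keyword lexicons with proper NLP parsing and named-entity recognition.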
Furthermore, the lists of novel candidate drugs for phenotypes and side effects can be continuously updated with our algorithm each time a document is added. More than a thousand documents are published per day, and because our algorithm costs less than existing drug repositioning methods, it can derive candidate drugs from them efficiently. The PISTON resource is available at databio.gachon.ac.kr/tools/PISTON.

Authors

Giup Jang

Department of IT Convergence Engineering, Gachon University, Seongnam, Republic of Korea.
Taekeon Lee

Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea.
Soyoun Hwang

Department of IT Convergence Engineering, Gachon University, Seongnam, Republic of Korea.
Chihyun Park

Department of Computer Science, Yonsei University, Seodaemun-gu, Seoul, Republic of Korea.
Jaegyoon Ahn

Department of Integrative Biology and Physiology, University of California, Los Angeles, USA. Electronic address: jgahn@ucla.edu.
Sukyung Seo

Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea.
Youhyeon Hwang

Department of Computer Science, University of Southern California, Los Angeles, USA.
Youngmi Yoon

Department of Computer Engineering, Gachon University, Republic of Korea. Electronic address: ymyoon@gachon.ac.kr.

Keywords

Algorithms; Area Under Curve; Data Mining; Drug Repositioning; Drug-Related Side Effects and Adverse Reactions; Electronic Health Records; Humans; Medical Informatics; Natural Language Processing; Phenotype; Probability; Systems Biology

External Resources

PubMed ID: 30268842
