Development of benchmark datasets for text mining and sentiment analysis to accelerate regulatory literature review.

Journal: Regulatory toxicology and pharmacology : RTP
Published Date:

Abstract

In regulatory science, literature review is an essential step that is usually conducted by manually reading hundreds of articles. Although this process is highly time-consuming and labor-intensive, most of its output is never transformed into a machine-readable format. This limited availability of data has largely constrained the development of artificial intelligence (AI) systems that could facilitate literature review in the regulatory process. In the past decade, AI has revolutionized text mining, as many deep learning approaches have been developed to search, annotate, and classify relevant documents. With these advances, the lack of high-quality data, rather than the algorithms themselves, has become the bottleneck of AI system development. Herein, we constructed two large benchmark datasets, the Chlorine Efficacy dataset (CHE) and the Chlorine Safety dataset (CHS), under a regulatory scenario that sought to assess the antiseptic efficacy and toxicity of chlorine. For each dataset, approximately 10,000 scientific articles were collected and manually reviewed, and each article was labeled for its relevance to the review task. To ensure high data quality, each paper was labeled by consensus among multiple experienced reviewers. The overall relevance rate was 27.21% (2,663 of 9,788) for CHE and 7.50% (761 of 10,153) for CHS. Furthermore, the relevant articles were categorized into five subgroups based on the focus of their content. Next, we developed an attention-based classification language model using these two datasets. The proposed classification model achieved an Area Under the Curve (AUC) of 0.857 on the CHE dataset and 0.908 on the CHS dataset. This performance was significantly better than that obtained in a permutation test (p < 10E-9), demonstrating that the labeling processes were valid.
To conclude, our datasets can be used as benchmarks for developing AI systems, which can further facilitate the literature review process in regulatory science.
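
As a rough illustration of the quantities reported in the abstract, the sketch below recomputes the two relevance rates from the stated counts and shows one common way to compare a classifier's AUC against label permutations. The abstract does not describe the authors' actual procedure, so everything here is an assumption: the function names (relevance_rate, auc, permutation_p_value), the rank-based AUC, the number of permutations, and the toy labels and scores are all hypothetical and are not drawn from CHE or CHS.

```python
import random


def relevance_rate(relevant, total):
    """Percentage of reviewed articles labeled as relevant."""
    return 100.0 * relevant / total


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_perm=1000, seed=0):
    """Empirical p-value: how often shuffled labels reach the observed AUC.
    The smallest value this can return is 1 / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Counts taken from the abstract.
print(f"CHE relevance rate: {relevance_rate(2663, 9788):.2f}%")   # 27.21%
print(f"CHS relevance rate: {relevance_rate(761, 10153):.2f}%")   # 7.50%

# Hypothetical toy data: relevant items tend to receive higher scores.
toy_rng = random.Random(42)
toy_labels = [1] * 30 + [0] * 70
toy_scores = [toy_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in toy_labels]
observed_auc, p_value = permutation_p_value(toy_labels, toy_scores)
print(f"toy AUC = {observed_auc:.3f}, empirical p = {p_value:.4f}")
```

With 1,000 permutations the empirical p-value cannot fall below roughly 0.001, so a value as small as the one quoted in the abstract would need far more permutations or a parametric approximation of the null distribution; which of these the authors used is not stated here.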

Authors

  • Leihong Wu
    Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA. Leihong.wu@fda.hhs.gov.
  • Si Chen
    Department of Pharmacy, The First Affiliated Hospital, Fujian Medical University, Fuzhou, China.
  • Lei Guo
    Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Svitlana Shpyleva
    Division of Biochemical Toxicology, National Center for Toxicological Research, U.S. FDA, Jefferson, AR, 72079, USA.
  • Kelly Harris
    Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, U.S. FDA, Jefferson, AR, 72079, USA.
  • Tariq Fahmi
    Office of Scientific Coordination, National Center for Toxicological Research, U.S. FDA, Jefferson, AR, 72079, USA.
  • Timothy Flanigan
    Division of Neurotoxicology, National Center for Toxicological Research, U.S. FDA, Jefferson, AR, 72079, USA.
  • Weida Tong
    National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR, United States.
  • Joshua Xu
    Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA.
  • Zhen Ren
    Division of Biochemical Toxicology, National Center for Toxicological Research, U.S. FDA, Jefferson, AR, 72079, USA. Electronic address: zhen.ren@fda.hhs.gov.