GraphATC: advancing multilevel and multi-label anatomical therapeutic chemical classification via atom-level graph learning.

Journal: Briefings in bioinformatics
PMID:

Abstract

Accurate categorization of compounds within the Anatomical Therapeutic Chemical (ATC) system is essential for drug development and basic research. Although the problem has been studied for over a decade, most prior work has considered only the Level 1 labels defined by the World Health Organization (WHO) and has ignored the labels of the remaining four levels. This narrow focus fails to treat the task as what it is: a multilevel, multi-label classification problem. Moreover, existing benchmarks such as Chen-2012 and ATC-SMILES are outdated: they lack the new drugs, and the updated properties of existing drugs, that have been incorporated into the WHO ATC system in recent years. To address these shortcomings, we make two contributions. First, we systematically clean and extend the drug dataset to cover all five levels through rigorous cross-resource validation against KEGG, PubChem, ChEMBL, ChemSpider, and ChemicalBook, resulting in a new benchmark, ATC-GRAPH. Second, we extend the classification task to Level 2 and introduce graph-based learning to represent drug molecular structures more accurately. This approach models polymers, macromolecules, and multi-component drugs more precisely and improves the overall fidelity of the classification. Extensive experiments validate the proposed framework and establish a new state of the art. To facilitate replication, we have made the benchmark dataset, source code, and web server openly accessible.
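
To make the "multilevel, multi-label" framing concrete, the sketch below splits full 7-character ATC codes into their five hierarchical prefixes and collects the label set of one drug at each level. It is an illustration only, not the authors' pipeline: the function names `atc_levels` and `labels_per_level` are hypothetical, and the input codes are illustrative examples of a single drug that carries several ATC codes.

```python
from typing import Dict, List, Set

# Prefix lengths of the five ATC levels within a full 7-character code,
# e.g. A10BA02 -> A, A10, A10B, A10BA, A10BA02.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> List[str]:
    """Return the Level 1..5 prefixes of a full ATC code (hypothetical helper)."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return [code[:n] for n in ATC_PREFIX_LENGTHS]


def labels_per_level(codes: List[str]) -> Dict[int, Set[str]]:
    """Collect the label set at each level for one drug with several ATC codes."""
    labels: Dict[int, Set[str]] = {level: set() for level in range(1, 6)}
    for code in codes:
        for level, prefix in enumerate(atc_levels(code), start=1):
            labels[level].add(prefix)
    return labels


if __name__ == "__main__":
    # Illustrative input: one drug listed under three ATC codes.
    drug_codes = ["A01AD05", "B01AC06", "N02BA01"]
    for level, label_set in labels_per_level(drug_codes).items():
        print(f"Level {level}: {sorted(label_set)}")
```

A drug with several codes therefore has several labels at every level, which is why a classifier restricted to Level 1 captures only part of the task.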

Authors

  • Wengyu Zhang
    Department of Computer Science, Sichuan University, Chengdu 610065, China.
  • Qi Tian
    College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, 310027 Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, China. Electronic address: Tianq@zju.edu.cn.
  • Yi Cao
    Department of Dermatology, First Clinical Medical College of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China.
  • Wenqi Fan
    Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong.
  • Dongmei Jiang
  • Yaowei Wang
    PengCheng Laboratory, China. Electronic address: wangyw@pcl.ac.cn.
  • Qing Li
    Department of Internal Medicine, University of Michigan Ann Arbor, MI 48109, USA.
  • Xiao-Yong Wei