Heavy chain sequence-based classifier for the specificity of human antibodies.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Antibodies bind specifically to antigens and are an essential part of the immune system, which makes them powerful tools in research and diagnostics. High-throughput sequencing technologies have enabled comprehensive profiling of the immune repertoire, producing large numbers of antibody sequences that remain to be analyzed. In this study, antibody sequences were downloaded from the IMGT/LIGM-DB and Sequence Read Archive databases. Contributing features from antibody heavy chains were formulated as numerical inputs and fed into an ensemble machine learning classifier to classify the antigen specificity of six classes of antibodies, namely anti-HIV-1, anti-influenza virus, anti-pneumococcal polysaccharide, anti-citrullinated protein, anti-tetanus toxoid and anti-hepatitis B virus. The classifier was validated using cross-validation and a testing dataset. The ensemble classifier achieved a macro-average area under the receiver operating characteristic curve (AUC) of 0.9246 in 10-fold cross-validation and 0.9264 on the testing dataset. Among the contributing features, the complementarity-determining regions accounted for 53.1% of the contribution and the framework regions for 46.9%, and amino acid mutation rates ranked first and second among the top five contributing features. The classifier and insights provided in this study could promote the mechanistic study, isolation and utilization of potential therapeutic antibodies.
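
The macro-average AUC reported above can be illustrated with a minimal sketch. It assumes the common one-vs-rest definition, in which an AUC is computed for each antigen class against all others and the per-class values are averaged without weighting. The abstract does not state how the authors computed the metric, so the function names (binary_auc, macro_average_auc) and the toy labels and scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def binary_auc(labels, scores):
    """AUC for one class via the Mann-Whitney statistic.

    Counts the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_average_auc(y_true, y_score, classes):
    """Unweighted mean of the one-vs-rest AUCs over all classes.

    y_true  : list of class labels, one per antibody
    y_score : list of per-class score rows, columns ordered as `classes`
    """
    per_class = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in y_score]
        per_class.append(binary_auc(labels, scores))
    return mean(per_class)


# Toy data (hypothetical): three of the six classes, five antibodies.
classes = ["anti-HIV-1", "anti-influenza", "anti-tetanus"]
y_true = ["anti-HIV-1", "anti-HIV-1", "anti-influenza",
          "anti-tetanus", "anti-tetanus"]
y_score = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
]
print(macro_average_auc(y_true, y_score, classes))
```

Because the macro average weights every class equally, a rare specificity class influences the score as much as a common one, which matters when class sizes are unbalanced.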

Authors

  • Yaqi Wang
    Key Laboratory of RF Circuits and Systems, Ministry of Education, Hangzhou Dianzi University, Hangzhou 310018, China.
  • Guoqin Mai
  • Min Zou
    School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, P.R. China.
  • Haoyu Long
    School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, P.R. China.
  • Yao-Qing Chen
    School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, P.R. China.
  • Litao Sun
    SEU-FEI Nano-Pico Center, Key Laboratory of MEMS of Ministry of Education, Southeast University, Nanjing, 210096, China.
  • Dechao Tian
    School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, P.R. China.
  • Yang Zhao
    The George Institute for Global Health, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia.
  • Guozhi Jiang
    Department of Medicine and Therapeutics, and Hong Kong Institute of Diabetes and Obesity, and Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong, Special Administrative Region, China.
  • Zicheng Cao
  • Xiangjun Du
    School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, P.R. China.