AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding.

Journal: Genome biology
PMID:

Abstract

Protein function annotation has been one of the longstanding issues in the biological sciences, and various computational methods have been developed to address it. However, existing methods suffer from a serious long-tail problem: a large number of Gene Ontology (GO) families contain few annotated proteins. Herein, an innovative strategy named AnnoPRO was constructed, enabling sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory (LSTM)-based decoding. A variety of case studies based on different benchmarks were conducted, and they confirmed the superior performance of AnnoPRO among available methods. Source code and models have been made freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272.
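To make the long-tail problem concrete, the sketch below counts how many distinct proteins are annotated to each GO term and reports which terms fall under a cutoff. It is an illustration only, not part of AnnoPRO. The input format of (protein, term) pairs, the `long_tail_summary` function, the `threshold` cutoff, and the toy data with placeholder term labels are all hypothetical.

```python
from collections import Counter


def long_tail_summary(annotations, threshold=10):
    """Count annotated proteins per GO term and report the 'tail' terms.

    annotations: iterable of (protein_id, go_term) pairs (hypothetical format).
    threshold: hypothetical cutoff; terms with fewer proteins than this
    are treated as belonging to the long tail.
    """
    # Deduplicate pairs so each protein is counted once per GO term.
    per_term = Counter(term for _, term in set(annotations))
    # Terms annotated to fewer proteins than the cutoff, in sorted order.
    tail = sorted(t for t, n in per_term.items() if n < threshold)
    # Fraction of all observed terms that fall in the tail.
    share = len(tail) / len(per_term) if per_term else 0.0
    return per_term, tail, share


# Toy annotations for illustration only; not drawn from any real GO release.
toy = [
    ("P1", "term_A"), ("P2", "term_A"), ("P3", "term_A"),
    ("P4", "term_A"), ("P1", "term_B"), ("P5", "term_C"),
]
counts, tail_terms, tail_share = long_tail_summary(toy, threshold=2)
print(counts["term_A"], tail_terms, round(tail_share, 2))
# prints: 4 ['term_B', 'term_C'] 0.67
```

Even in this tiny example, two of the three terms have a single annotated protein. A classifier trained on such data sees very few positive examples for most labels, and that imbalance is what the abstract refers to as the long-tail problem.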

Authors

  • Lingyan Zheng
    Department of Oral Surgery, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Shuiyang Shi
    College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China.
  • Mingkun Lu
    College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China.
  • Pan Fang
    Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
  • Ziqi Pan
    College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
  • Hongning Zhang
    College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
  • Zhimeng Zhou
    College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
  • Hanyu Zhang
    Department of Environmental Science and Engineering, Beijing Technology and Business University, Beijing 100048, China.
  • Minjie Mou
    College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
  • Shijie Huang
    College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
  • Lin Tao
    Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China.
  • Weiqi Xia
    College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
  • Honglin Li
    Innovation Center for AI and Drug Discovery, East China Normal University, China.
  • Zhenyu Zeng
    Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
  • Shun Zhang
    Department of Radiology, Weill Cornell Medical College, New York, New York.
  • Yuzong Chen
    Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore.
  • Zhaorong Li
    Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
  • Feng Zhu
    Department of Critical Care Medicine, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, People's Republic of China.