Optimizing lipocalin sequence classification with ensemble deep learning models.

Journal: PloS one
PMID:

Abstract

Deep learning (DL) has become a powerful tool for recognizing and classifying biological sequences. However, conventional single-architecture models often deliver suboptimal predictive performance at high computational cost. To address these challenges, we present EnsembleDL-Lipo, an ensemble deep learning framework that combines Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) to improve the identification of lipocalin sequences. Lipocalins are multifunctional extracellular proteins involved in various diseases and stress responses. Their low sequence similarity, which places them in the 'twilight zone' of sequence alignment, makes accurate classification difficult, so efficient computational methods are needed to complement traditional, labor-intensive experimental approaches. EnsembleDL-Lipo addresses this by training a large ensemble of deep learning models on multiple feature representations derived from position-specific scoring matrices (PSSMs), optimizing classification performance across diverse sequence patterns. On the training dataset, the model achieved an accuracy (ACC) of 97.65%, recall of 97.10%, Matthews correlation coefficient (MCC) of 0.95, and area under the curve (AUC) of 0.99. Validation on an independent test set further confirmed the robustness of the model, yielding an ACC of 95.79%, recall of 90.48%, MCC of 0.92, and AUC of 0.97. These results demonstrate that EnsembleDL-Lipo is a highly effective and computationally efficient tool for lipocalin sequence identification, significantly outperforming existing methods and offering strong potential for applications in biomarker discovery.
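To make the ensemble idea and the reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the ensemble combines its members by soft voting (averaging each model's predicted probability); the abstract does not state the combination rule. The function names, the 0.5 threshold, and the toy probabilities are all hypothetical. ACC, recall, and MCC are computed from the confusion matrix using their standard definitions.

```python
import math


def soft_vote(prob_lists):
    """Average per-model positive-class probabilities (soft voting).

    Assumption: the abstract does not say how member models are combined;
    soft voting is used here purely for illustration.
    """
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Return accuracy, recall, and MCC for binary labels (1 = lipocalin)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    acc = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "recall": recall, "mcc": mcc}


# Toy data only: made-up probabilities from two hypothetical member models.
y_true = [1, 1, 1, 0, 0, 0]
cnn_probs = [0.9, 0.6, 0.3, 0.2, 0.4, 0.1]
dnn_probs = [0.8, 0.7, 0.4, 0.1, 0.7, 0.3]

ensemble_probs = soft_vote([cnn_probs, dnn_probs])
metrics = binary_metrics(y_true, ensemble_probs)
print(metrics)
```

MCC is reported alongside accuracy because it uses all four confusion-matrix cells and so stays informative when positive and negative classes are imbalanced.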

Authors

  • Yonglin Zhang
    State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China.
  • Lezheng Yu
    School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China.
  • Li Xue
    HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China.
  • Fengjuan Liu
    School of Geography and Resources, Guizhou Education University, Guiyang 550018, China.
  • Runyu Jing
    College of Cybersecurity, Sichuan University, Chengdu 610065, China.
  • Jiesi Luo
    College of Chemistry, Sichuan University, Chengdu 610064, China.