Optimizing lipocalin sequence classification with ensemble deep learning models.

Journal: PloS one

PMID: 40238838

Abstract

Deep learning (DL) has become a powerful tool for recognizing and classifying biological sequences. However, conventional single-architecture models often deliver suboptimal predictive performance at high computational cost. To address these challenges, we present EnsembleDL-Lipo, an ensemble deep learning framework that combines Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) to improve the identification of lipocalin sequences. Lipocalins are multifunctional extracellular proteins involved in various diseases and stress responses. Their low sequence similarity places them in the 'twilight zone' of sequence alignment, which makes accurate classification difficult and calls for efficient computational methods to complement traditional, labor-intensive experimental approaches. EnsembleDL-Lipo addresses this by training a large ensemble of deep learning models on multiple feature representations derived from position-specific scoring matrices (PSSMs), optimizing classification performance across diverse sequence patterns. On the training dataset, the model achieved an accuracy (ACC) of 97.65%, recall of 97.10%, Matthews correlation coefficient (MCC) of 0.95, and area under the curve (AUC) of 0.99. Validation on an independent test set confirmed the robustness of the model, yielding an ACC of 95.79%, recall of 90.48%, MCC of 0.92, and AUC of 0.97. These results indicate that EnsembleDL-Lipo is an effective and computationally efficient tool for lipocalin sequence identification that significantly outperforms existing methods and offers strong potential for applications in biomarker discovery.
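
The abstract does not say how the ensemble members are combined, so the following is only a minimal sketch, assuming simple soft voting (averaging each model's predicted lipocalin probability) followed by the ACC, recall, and MCC calculations the abstract reports. The function names, the two toy probability lists standing in for a CNN and a DNN, and the 0.5 decision threshold are all hypothetical illustrations and are not taken from EnsembleDL-Lipo. AUC is omitted for brevity.

```python
from math import sqrt
from typing import Dict, List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sequence positive-class probabilities across models.

    Assumption: soft voting is used here for illustration only; the
    abstract does not specify the combination rule.
    """
    n_models = len(prob_sets)
    return [sum(column) / n_models for column in zip(*prob_sets)]


def binary_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, recall, and MCC from thresholded probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    acc = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "recall": recall, "mcc": mcc}


# Toy labels: 1 = lipocalin, 0 = non-lipocalin (made-up data).
y_true = [1, 1, 1, 0, 0, 0]
# Hypothetical outputs of two ensemble members for the same six sequences.
cnn_probs = [0.9, 0.6, 0.3, 0.2, 0.4, 0.1]
dnn_probs = [0.8, 0.7, 0.4, 0.1, 0.7, 0.3]

ensemble_probs = soft_vote([cnn_probs, dnn_probs])
metrics = binary_metrics(y_true, ensemble_probs)
print(metrics)
```

MCC is reported alongside accuracy because it uses all four confusion-matrix cells and so stays informative when the positive and negative classes are imbalanced.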

Authors

Yonglin Zhang
State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China.

Lezheng Yu
School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China.

Li Xue
HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China.

Fengjuan Liu
School of Geography and Resources, Guizhou Education University, Guiyang 550018, China.

Runyu Jing
College of Cybersecurity, Sichuan University, Chengdu 610065, China.

Jiesi Luo
College of Chemistry, Sichuan University, Chengdu 610064, PR China.

Keywords

Amino Acid Sequence; Computational Biology; Deep Learning; Humans; Lipocalins; Neural Networks, Computer
