Prediction of pathogenic mutations in human transmembrane proteins and their associated diseases via utilizing pre-trained Bio-LLMs.
Journal:
Communications Biology
Published Date:
Jul 15, 2025
Abstract
Missense mutations can disrupt the structure and function of membrane proteins, potentially impairing key biological processes and leading to various human diseases. However, existing computational methods primarily focus on binary pathogenicity classification for general proteins. Few approaches are designed specifically for membrane proteins, and even fewer are capable of fine-grained, multi-label classification into specific disease categories. To address this gap, we propose MutDPAL, a deep learning method designed to identify pathogenic mutations in membrane proteins and to further classify such mutations into potential disease categories. MutDPAL utilizes two pre-trained biological large language models (Bio-LLMs), one for encoding raw sequence features and the other for encoding transmembrane environment features. By employing a cross-attention-based disease-protein association learning approach in the context of membrane proteins, MutDPAL captures the intricate relationships between mutations and diseases, enabling accurate pathogenicity prediction and classification into 15 distinct disease categories. Experimental results demonstrate that MutDPAL outperforms existing methods in predicting the pathogenicity of membrane protein mutations and excels in multi-label disease classification, achieving high predictive accuracy across all 15 disease categories. MutDPAL is the first method to combine transmembrane environment features with disease encoding features for fine-grained disease classification, offering valuable insights into the pathogenicity of missense mutations in membrane proteins.
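To make the cross-attention idea in the abstract concrete, the following is a minimal, illustrative sketch and not the MutDPAL implementation. It assumes that per-residue features from the two encoders are fused by simple concatenation, that each of the 15 disease categories has its own embedding acting as an attention query, and that a per-disease linear layer with a sigmoid yields independent multi-label probabilities. The abstract states none of these details; all function names, dimensions, and the random toy inputs are hypothetical.

```python
import math
import random

NUM_DISEASES = 15  # number of disease categories stated in the abstract


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def predict_disease_probs(seq_feats, env_feats, disease_embs, out_weights):
    """Return one probability per disease category (multi-label output).

    Assumption: the two per-residue feature sets are fused by concatenation.
    """
    residues = [s + e for s, e in zip(seq_feats, env_feats)]
    # Disease embeddings are the queries; residue features are keys and values.
    attended = cross_attention(disease_embs, residues, residues)
    # Hypothetical per-disease linear readout followed by a sigmoid.
    return [sigmoid(dot(a, w)) for a, w in zip(attended, out_weights)]


# Toy usage with random stand-ins for the encoder outputs.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


seq_feats = rand_matrix(6, 4)              # 6 residues, 4-dim sequence features
env_feats = rand_matrix(6, 4)              # 6 residues, 4-dim environment features
disease_embs = rand_matrix(NUM_DISEASES, 8)
out_weights = rand_matrix(NUM_DISEASES, 8)

probs = predict_disease_probs(seq_feats, env_feats, disease_embs, out_weights)
```

Using a sigmoid per category, rather than a softmax across categories, lets a single mutation be assigned to several disease categories at once, which is what a multi-label setting requires.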