LengthLogD: A Length-Stratified Ensemble Framework for Enhanced Peptide Lipophilicity Prediction via Multi-Scale Feature Integration
Journal: arXiv
Published Date: May 22, 2025
Abstract
Peptide compounds demonstrate considerable potential as therapeutic agents
due to their high target affinity and low toxicity, yet their drug development
is constrained by their low membrane permeability. Molecular weight and peptide
length have significant effects on the logD of peptides, which in turn
influences their ability to cross biological membranes. However, accurate
prediction of peptide logD remains challenging due to the complex interplay
between sequence, structure, and ionization states. This study introduces
LengthLogD, a predictive framework that establishes specialized models through
molecular length stratification while integrating multi-scale
molecular representations. We constructed feature spaces across three
hierarchical levels: atomic (10 molecular descriptors), structural (1024-bit
Morgan fingerprints), and topological (3 graph-based features including Wiener
index), optimized through stratified ensemble learning. An adaptive weight
allocation mechanism specifically developed for long peptides significantly
enhances model generalizability. Experimental results demonstrate superior
performance across all categories: short peptides (R^2=0.855), medium peptides
(R^2=0.816), and long peptides (R^2=0.882), with a 34.7% reduction in
prediction error for long peptides compared to conventional single-model
approaches. Ablation studies indicate that (1) the length-stratified strategy
contributes 41.2% of the performance improvement and (2) topological features
account for 28.5% of predictive importance. Compared to state-of-the-art models, our
method maintains short peptide prediction accuracy while achieving a 25.7%
increase in the coefficient of determination (R^2) for long peptides. This
research provides a precise logD prediction tool for peptide drug development
and is particularly useful for optimizing long-peptide lead
compounds.
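
To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. It computes the Wiener index (one of the topological features named in the abstract) as the sum of shortest-path distances over all atom pairs of a molecular graph, and routes a peptide to a length-specific model. The abstract does not state the length cutoffs, so the thresholds `short_max` and `medium_max`, the function names, and the `models` mapping are all hypothetical placeholders.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered node pairs.

    `adjacency` maps each atom index to a list of bonded atom indices
    (heavy-atom graph, unweighted bonds, assumed connected).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


def length_stratum(n_residues, short_max=5, medium_max=15):
    """Assign a peptide to a length stratum.

    ASSUMPTION: the cutoffs are illustrative only; the abstract does not
    report the boundaries used by LengthLogD.
    """
    if n_residues <= short_max:
        return "short"
    if n_residues <= medium_max:
        return "medium"
    return "long"


def predict_logd(n_residues, features, models):
    """Dispatch to the model trained for this peptide's length stratum.

    `models` is a hypothetical dict mapping stratum name to a callable
    that takes a feature vector and returns a logD estimate.
    """
    return models[length_stratum(n_residues)](features)


# Heavy-atom graphs of n-butane (a chain) and isobutane (a star).
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

For the two example graphs, `wiener_index(butane)` is 10 and `wiener_index(isobutane)` is 9, which shows how the index separates a linear from a branched skeleton with the same atom count. In a full pipeline each entry of `models` would be a regressor fitted only on peptides from its own stratum.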