VEFill: accurate and generalizable deep mutational scanning score imputation across protein domains.
Journal: Molecular Systems Biology
Published Date: Mar 20, 2026
Abstract
Deep mutational scanning (DMS) assays can systematically assess the effects of amino acid substitutions on protein function, but many datasets have incomplete variant coverage due to technical constraints. We developed VEFill (Variant Effect Fill), a gradient boosting model for imputing missing DMS scores across protein domains. Trained on the Human Domainome 1 dataset, VEFill integrates ESM-1v sequence embeddings, evolutionary conservation (EVE scores), amino acid substitution matrices, and physicochemical descriptors. The model achieved robust predictive performance (Pearson r = 0.80) and generalized reliably to unseen proteins in stability-based datasets, but performed less well on activity-based assays. Per-protein models confirmed VEFill's effectiveness under limited-data conditions, and a reduced two-feature version performed comparably to the full model, suggesting an efficient alternative. Across multiple benchmarking settings, VEFill consistently outperformed baselines once ≥20% of experimental measurements were available. However, true zero-shot prediction without positional context remains challenging, particularly for functionally complex proteins. Overall, VEFill offers an interpretable, scalable framework for DMS score imputation and enables systematic mutation prioritization, including the design of sparse experimental libraries for variant effect studies.
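To make the benchmarking setting concrete, the sketch below shows how an imputation method can be scored when only a fraction of a DMS map has been measured. It is not VEFill: it uses a simple per-position mean as a stand-in baseline, and the toy data generator, function names, and noise parameters are all hypothetical choices made for illustration. The only elements taken from the abstract are the idea of hiding part of the measurements and reporting Pearson r on the hidden variants.

```python
import math
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def make_toy_dms(n_positions=60, seed=0):
    """Hypothetical toy DMS map: a per-site effect plus per-variant noise."""
    rng = random.Random(seed)
    scores = {}
    for pos in range(n_positions):
        site_effect = rng.uniform(-2.0, 0.0)  # assumed site-level sensitivity
        for aa in AMINO_ACIDS:
            scores[(pos, aa)] = site_effect + rng.gauss(0.0, 0.3)
    return scores

def position_mean_imputation(scores, observed_fraction, seed=1):
    """Hide part of the map, impute hidden variants with the mean of the
    observed scores at the same position, and return Pearson r on the
    hidden variants. Positions with no observed variant fall back to the
    global mean of the observed scores."""
    rng = random.Random(seed)
    keys = sorted(scores)
    rng.shuffle(keys)
    n_obs = int(len(keys) * observed_fraction)
    observed, hidden = keys[:n_obs], keys[n_obs:]

    by_pos = {}
    for pos, aa in observed:
        by_pos.setdefault(pos, []).append(scores[(pos, aa)])
    global_mean = statistics.fmean(scores[k] for k in observed)

    preds = [
        statistics.fmean(by_pos[pos]) if pos in by_pos else global_mean
        for pos, _ in hidden
    ]
    truth = [scores[k] for k in hidden]
    return pearson_r(truth, preds)

if __name__ == "__main__":
    dms = make_toy_dms()
    for fraction in (0.05, 0.2, 0.5):
        r = position_mean_imputation(dms, fraction)
        print(f"observed fraction {fraction:.2f}: Pearson r = {r:.2f}")
```

A learned model would replace the per-position mean with predictions from sequence, conservation, and physicochemical features; the masking and scoring steps would stay the same. A baseline like this depends entirely on positional context, which is why it degrades as the observed fraction shrinks.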