Determinants of protein corona adsorption and abundance revealed by interpretable machine learning across nanoparticle systems.

Journal: Scientific reports

Abstract

Nanoparticles (NPs) hold significant potential in biotechnology, including molecular sensing, controlled release systems, and therapeutic applications. However, their behavior in biological environments remains difficult to predict because proteins rapidly adsorb onto NP surfaces, forming a protein corona (PC) that reshapes their surface properties and determines their biological identity, transport, and cellular interactions. In this study, we developed large-scale deep neural network (DNN) models to predict both protein adsorption (binary classification) and relative protein abundance (regression) on NP surfaces. We utilized a well-curated and comprehensive PC dataset comprising data from 83 peer-reviewed studies, 817 NP-PC samples, and 2,497 proteins, substantially expanding the scale and diversity compared with prior studies. We then employed a prevalence-based filtering strategy to mitigate sparsity and batch noise and trained over 200 machine learning models across proteins. The adsorption classification models achieved high discriminative performance (AUC = 0.96), while the abundance models achieved a pooled R² of 0.67 and an average per-protein R² of 0.40 on the test set. SHapley Additive exPlanations (SHAP) revealed that adsorption was predominantly governed by NP material class and surface chemistry, whereas abundance was more strongly influenced by experimental handling and kinetic parameters, particularly isolation and incubation time. Incorporation of applicability domain (AD) analysis enabled identification of reliable prediction regions, with in-AD predictions demonstrating higher confidence and reduced error for both tasks. Together, these results demonstrate that our DNN models can identify predictive drivers of PC composition and reveal feature associations consistent with patterns reported in prior mechanistic literature, offering a data-driven reference to inform nanomaterial design for biomedical and environmental applications.
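
The abstract reports two different R² values for the abundance models, one pooled and one averaged per protein. The sketch below illustrates why these can differ. It is not the authors' evaluation code: the abstract does not say how pooling was done, so the sketch assumes that "pooled" means concatenating the test predictions of all proteins before computing R². The toy values and the names `toy`, `protein_A`, and `protein_B` are hypothetical.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: protein -> (observed, predicted) relative abundance.
# These numbers are invented for illustration and are not from the study.
toy = {
    "protein_A": ([0.10, 0.12, 0.11, 0.13], [0.11, 0.12, 0.12, 0.12]),
    "protein_B": ([0.50, 0.55, 0.52, 0.58], [0.52, 0.54, 0.54, 0.56]),
}

# Per-protein R²: each protein is scored against its own mean abundance.
per_protein = {name: r_squared(y, p) for name, (y, p) in toy.items()}
mean_per_protein_r2 = mean(per_protein.values())

# Pooled R² (assumed definition): concatenate all proteins, then score once.
pooled_true = [v for y, _ in toy.values() for v in y]
pooled_pred = [v for _, p in toy.values() for v in p]
pooled_r2 = r_squared(pooled_true, pooled_pred)

print(f"mean per-protein R2: {mean_per_protein_r2:.3f}")
print(f"pooled R2:           {pooled_r2:.3f}")
```

In this toy case the pooled value is higher than the per-protein mean, because the pooled total variance also includes differences between proteins, which the predictions capture easily. Under the stated assumption, the same effect would be one plausible reading of the gap between the pooled and per-protein figures in the abstract.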
