Fingerprint-Based Explainable Machine Learning for Predicting Blood–Brain Barrier Permeability

Journal: bioRxiv
Published Date:

Abstract

Predicting blood–brain barrier (BBB) permeability is essential for early central nervous system (CNS) drug discovery, yet reliable computational screening remains challenging. This study presents a gradient-boosted ensemble framework trained on precomputed molecular fingerprints to classify compounds as BBB-permeable (BBB+) or non-permeable (BBB−). The 2048-bit fingerprints encode substructural information relevant to passive diffusion without requiring explicit physicochemical descriptors. The model was trained on experimentally annotated BBB datasets using Extreme Gradient Boosting (XGBoost), with the Synthetic Minority Oversampling Technique (SMOTE) applied to address class imbalance, and achieved strong predictive performance (cross-validated ROC–AUC = 0.897 ± 0.019; validation ROC–AUC = 0.932). External testing on four literature-reported CNS-active compounds (caffeine, diazepam, dopamine, and levodopa) gave biologically consistent results: highly lipophilic drugs were predicted as BBB+, while polar molecules dependent on carrier-mediated transport were predicted as BBB−. The fingerprint-based model thus captures underlying permeability mechanisms through data-driven substructure learning. The approach eliminates the need for handcrafted descriptors while preserving interpretability through feature-importance analysis, and it establishes a reproducible, efficient, and explainable baseline for virtual BBB permeability screening in CNS drug development.
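
The class-balancing step mentioned above can be illustrated with a minimal, from-scratch sketch of the SMOTE idea in plain Python. This is not the authors' implementation, which is not shown in the abstract and most likely relies on an existing library. The function name `smote`, its parameters, and the 8-bit toy fingerprints are hypothetical and stand in for the 2048-bit vectors. Each synthetic sample is a random interpolation between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority samples.

    Illustrative sketch only: the name, signature, and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up 8-bit "fingerprints" for the minority class (not data from the study).
minority_fps = [
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
]
new_rows = smote(minority_fps, n_new=4, k=2)
```

Bits shared by both parent samples are preserved exactly, while bits on which the parents differ take fractional values between 0 and 1, so the synthetic rows are no longer strictly binary. Tree-based learners such as gradient-boosted ensembles split on thresholds and can still consume such features. In a cross-validated setting, oversampling is normally applied to the training folds only, so that synthetic points do not leak into the held-out data.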

Authors

  • Nilanjan Panda; Sutirtha Panda