Prediction of Fraction Unbound in Human Plasma for Per- and Polyfluoroalkyl Substances: Evaluating Transfer Learning as an Algorithmic Solution to the Problem of Sparse Data.
Journal: Journal of Chemical Information and Modeling
Published Date: Jul 22, 2025
Abstract
Fraction unbound in plasma is a crucial parameter in physiologically based toxicokinetic (PBTK) models. It represents the fraction of a chemical compound in the bloodstream that is not bound by plasma proteins. This quantity is often used as a proxy for the amount of the compound available for metabolism or for exerting physiological effects; on the other hand, a low fraction unbound is also a predictor of bioaccumulative potential. In this work, we propose and investigate a new machine learning methodology to improve quantitative structure-activity relationship (QSAR) modeling of fraction unbound for specific chemical classes, including per- and polyfluoroalkyl substances (PFAS). We evaluate a novel transfer learning strategy across chemical space, in which a deep learning model trained on a broad chemical library is fine-tuned on a small data set of PFAS, and we assess its added value relative to a global random forest model presented in a prior publication. Our results show improved statistical performance after fine-tuning when the approach is applied to other similarly small chemical families; however, owing to the sparsity and imbalance of the data, the prior global model remains the most competitive for PFAS. We conclude with an investigation of the PFAS structural space in relation to the activity of interest and formulate recommendations for future experimental characterization to expand the knowledge space available for modeling. Measuring these data will inform our PFAS models and may ultimately yield enough data to improve the viability of local and transfer learning approaches for this class of chemicals.
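The abstract describes the transfer learning strategy only at a high level. As a hedged illustration of the general pretrain-then-fine-tune idea (not the authors' architecture, descriptors, or training protocol, none of which are given here), the minimal sketch below uses NumPy and entirely synthetic data. A small neural network is pretrained on a large, broad "library", and a copy is then fine-tuned on a few compounds from a shifted "chemical family". The descriptor count, network size, learning rates, the family offset, and the choice to retrain only the output layer during fine-tuning are all assumptions made for illustration; `init_mlp`, `predict`, `mse`, `train`, and `target` are hypothetical helpers.

```python
import copy

import numpy as np

# Hypothetical helpers and synthetic data; this is NOT the published model.


def init_mlp(n_in, n_hidden, rng):
    """One-hidden-layer MLP (tanh) with small random weights."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), n_hidden),
        "b2": 0.0,
    }


def predict(params, X):
    """Return hidden activations and scalar predictions."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H, H @ params["w2"] + params["b2"]


def mse(params, X, y):
    return float(np.mean((predict(params, X)[1] - y) ** 2))


def train(params, X, y, lr, epochs, freeze_hidden=False):
    """Full-batch gradient descent on 0.5 * mean squared error (updates params in place)."""
    n = len(y)
    for _ in range(epochs):
        H, pred = predict(params, X)
        err = pred - y
        g_w2 = H.T @ err / n
        g_b2 = err.mean()
        if not freeze_hidden:
            # Backpropagate through tanh using the current (not yet updated) output weights.
            dH = np.outer(err, params["w2"]) * (1.0 - H ** 2)
            params["W1"] -= lr * (X.T @ dH / n)
            params["b1"] -= lr * dH.mean(axis=0)
        params["w2"] -= lr * g_w2
        params["b2"] -= lr * g_b2
    return params


rng = np.random.default_rng(0)
n_desc = 8  # assumed number of molecular descriptors
true_w = rng.normal(size=n_desc)


def target(X, offset=0.0):
    # Synthetic structure-activity relationship shared across chemical space,
    # plus an assumed family-specific offset for the small chemical class.
    return np.tanh(X @ true_w / np.sqrt(n_desc)) + offset


# Broad "library": many compounds spread across descriptor space.
X_broad = rng.normal(size=(2000, n_desc))
y_broad = target(X_broad)

# Small "family": few compounds in a shifted region of descriptor space.
X_family = rng.normal(loc=1.0, size=(30, n_desc))
y_family = target(X_family, offset=0.3)

# 1) Pretrain on the broad library.
base = train(init_mlp(n_desc, 16, rng), X_broad, y_broad, lr=0.1, epochs=500)

# 2) Fine-tune a copy on the small family, retraining only the output head.
tuned = train(copy.deepcopy(base), X_family, y_family,
              lr=0.05, epochs=200, freeze_hidden=True)

print("pretrained MSE on family:", mse(base, X_family, y_family))
print("fine-tuned MSE on family:", mse(tuned, X_family, y_family))
```

Both printed errors are computed on the fine-tuning compounds themselves, so they say nothing about generalization; a fair comparison with a global model such as the random forest mentioned above would require held-out compounds from the small family. Setting `freeze_hidden=False` gives full fine-tuning, which is more flexible but typically easier to overfit when only a few dozen samples are available.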