Machine learning-based q-RASAR predictions of the bioconcentration factor of organic molecules estimated following the Organisation for Economic Co-operation and Development guideline 305.
Journal:
Journal of Hazardous Materials
PMID:
39243539
Abstract
In this study, we used an innovative quantitative read-across (RA) structure-activity relationship (q-RASAR) approach to predict the bioconcentration factor (BCF) values of a diverse range of organic compounds, based on a dataset of 575 compounds tested according to Organisation for Economic Co-operation and Development Test Guideline 305 for bioaccumulation in fish. Initially, we constructed the q-RASAR model using the partial least squares regression method, yielding promising statistical results for the training set (R² = 0.71, cross-validated Q² = 0.68, mean absolute error [MAE] = 0.54). The model was further validated using the test set (Q²F1 = 0.77, Q²F2 = 0.75, MAE = 0.51). Subsequently, we explored the q-RASAR method using other regression-based supervised machine-learning algorithms, which also gave favourable results for the training and test sets. All models exhibited R² and external Q² values exceeding 0.7, cross-validated Q² values greater than 0.6, and low MAE values, indicating high model quality and predictive capability for new, previously untested chemical substances. These findings highlight the value of the RASAR method in enhancing predictivity for new chemicals, a benefit that stems from the similarity functions incorporated in the RASAR descriptors and is independent of any specific algorithm.
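The abstract does not define its validation statistics, so the following is only a minimal sketch of how test-set metrics of this kind are conventionally computed in QSAR-type work. It assumes the two test-set Q values are the external metrics usually written Q²F1 (referenced to the training-set mean) and Q²F2 (referenced to the test-set mean). The function names and the log BCF values are hypothetical and are not taken from the study.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def _press(y_true, y_pred):
    """Sum of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def q2_f1(y_train, y_test, y_pred):
    """External Q2 referenced to the mean of the TRAINING responses."""
    ref = mean(y_train)
    return 1 - _press(y_test, y_pred) / sum((t - ref) ** 2 for t in y_test)


def q2_f2(y_test, y_pred):
    """External Q2 referenced to the mean of the TEST responses."""
    ref = mean(y_test)
    return 1 - _press(y_test, y_pred) / sum((t - ref) ** 2 for t in y_test)


# Hypothetical log BCF values, for illustration only.
y_train = [1.0, 2.0, 3.0]
y_test = [1.5, 2.5, 3.5]
y_pred = [1.0, 3.0, 3.0]  # model predictions for the test compounds

print("MAE  :", mae(y_test, y_pred))
print("Q2F1 :", q2_f1(y_train, y_test, y_pred))
print("Q2F2 :", q2_f2(y_test, y_pred))
```

The two external metrics differ only in the reference mean used in the denominator, which is why they can take different values for the same set of predictions.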