A machine learning q-RASPR approach for efficient predictions of the specific surface area of perovskites.

Journal: Molecular Informatics
Published Date:

Abstract

In this study, the specific surface area of various perovskites was modeled using a novel quantitative read-across structure-property relationship (q-RASPR) approach, which combines Read-Across (RA) and quantitative structure-property relationship (QSPR) modeling. After optimization of the hyperparameters, certain similarity-based error measures were obtained for each query compound. By combining some of these error-based measures with the previously selected features and the Read-Across prediction function, a number of machine learning models were developed using Partial Least Squares (PLS), Ridge Regression (RR), Linear Support Vector Regression (LSVR), Random Forest (RF) regression, Gradient Boosting (GBoost), Adaptive Boosting (AdaBoost), Multilayer Perceptron (MLP) regression and k-Nearest Neighbor (kNN) regression. Based on repeated cross-validation, external prediction quality and interpretability, the PLS model (training set n = 38, test set n = 12, reported validation statistic = 0.737) was selected as the best predictor, which underscored the previously reported results. The finally selected model should efficiently predict the specific surface areas of other perovskites for their use in photocatalysis. The new q-RASPR method also appears promising for the prediction of several other property endpoints of interest in materials science.
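The central idea of the abstract, adding similarity-based read-across measures to the original QSPR features before fitting a regression model, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the Gaussian-kernel similarity, the parameters `sigma` and `k`, the three measures computed (similarity-weighted prediction, weighted standard deviation of the neighbours' values, mean similarity), the function names, and the toy numbers.

```python
import math

def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian-kernel similarity between two descriptor vectors (assumed choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def read_across_measures(query, train_X, train_y, k=3, sigma=1.0):
    """Compute illustrative read-across measures from the k most similar
    training compounds: (weighted prediction, weighted SD, mean similarity)."""
    scored = [(gaussian_similarity(query, x, sigma), y)
              for x, y in zip(train_X, train_y)]
    neighbours = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0.0:
        raise ValueError("all similarities are zero; increase sigma")
    # Similarity-weighted average of the neighbours' property values.
    pred = sum(s * y for s, y in neighbours) / total
    # Weighted spread of the neighbours' values around that prediction.
    var = sum(s * (y - pred) ** 2 for s, y in neighbours) / total
    mean_sim = total / len(neighbours)
    return pred, math.sqrt(var), mean_sim

def augment_features(query, train_X, train_y, k=3, sigma=1.0):
    """Append the read-across measures to the original feature vector."""
    return list(query) + list(read_across_measures(query, train_X, train_y, k, sigma))

# Hypothetical toy data: two descriptors per compound, one property value each.
toy_X = [[0.0, 1.0], [1.0, 1.0], [5.0, 4.0], [0.5, 0.5]]
toy_y = [12.0, 15.0, 40.0, 13.0]
row = augment_features([0.4, 0.8], toy_X, toy_y, k=3, sigma=1.0)
print(row)
```

The augmented rows would then be passed to any of the regressors listed in the abstract (for example, a PLS model). To keep the evaluation honest, neighbours for test compounds should be drawn only from the training set.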

Authors

  • Arkaprava Banerjee
    Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
  • Agnieszka Gajewicz-Skretna
    Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland.
  • Kunal Roy
    Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.