Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

To construct highly predictive regression models in chemoinformatics and chemometrics, a new ensemble learning method is developed in which applicability domains (ADs) are introduced into the prediction step. When the value of an objective variable is estimated using regression submodels, only those submodels whose ADs cover the query sample, i.e., for which the sample lies inside the submodel's AD, are used. Because the submodels are constructed with different sets of selected explanatory variables, the union of their ADs, which defines the overall AD, becomes large, and prediction performance improves for diverse compounds. Analyses of a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set confirm that the ADs can be enlarged and that the estimation performance of the regression models is improved compared with traditional methods.
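The prediction scheme described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not say how each AD is defined, how explanatory variables are selected, or how the submodel outputs are combined. The sketch assumes random variable subsets, ordinary least squares as the linear submodel, an AD based on the mean distance to the k nearest training samples, and a plain average of the covering submodels. The names `fit_submodels` and `predict_with_ad` are hypothetical.

```python
import numpy as np


def fit_submodels(X, y, n_submodels=20, n_features=2, k=3, coverage=0.95, seed=0):
    """Fit linear submodels on random variable subsets, each with its own AD."""
    rng = np.random.default_rng(seed)
    submodels = []
    for _ in range(n_submodels):
        # Assumption: variables are chosen at random for each submodel.
        cols = rng.choice(X.shape[1], size=n_features, replace=False)
        Xs = X[:, cols]
        mean, std = Xs.mean(axis=0), Xs.std(axis=0) + 1e-12
        Z = (Xs - mean) / std
        # Ordinary least squares with an intercept column.
        A = np.column_stack([np.ones(len(Z)), Z])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Assumption: the AD is the region where the mean distance to the
        # k nearest training samples stays below a training-set quantile.
        d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
        knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
        threshold = np.quantile(knn_mean, coverage)
        submodels.append(
            {"cols": cols, "mean": mean, "std": std, "coef": coef,
             "Z": Z, "k": k, "threshold": threshold}
        )
    return submodels


def predict_with_ad(submodels, x):
    """Average the submodels whose AD covers x.

    Returns (prediction, number of submodels used); the prediction is NaN
    when x lies outside every submodel's AD, i.e. outside the overall AD.
    """
    preds = []
    for m in submodels:
        z = (x[m["cols"]] - m["mean"]) / m["std"]
        dist = np.sort(np.linalg.norm(m["Z"] - z, axis=1))[:m["k"]].mean()
        if dist <= m["threshold"]:  # query is inside this submodel's AD
            preds.append(m["coef"][0] + z @ m["coef"][1:])
    value = float(np.mean(preds)) if preds else float("nan")
    return value, len(preds)


if __name__ == "__main__":
    # Synthetic data for illustration only; not from the paper.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    models = fit_submodels(X, y)
    print(predict_with_ad(models, X[0]))
    print(predict_with_ad(models, np.full(5, 1e3)))
```

In this sketch the overall AD is the union of the submodel ADs: a query is rejected only when no submodel covers it, which is why adding submodels built on different variable sets can only enlarge the covered region.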

Authors

  • Hiromasa Kaneko
    Department of Applied Chemistry, School of Science and Technology, Meiji University, Kawasaki, Japan.