Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis.

Journal: Journal of Chemical Information and Modeling

Published Date: Jan 10, 2020

Abstract

Lipophilicity, as evaluated by the n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicology (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure-property relationship (QSPR) models to predict log D7.4 based on a large and structurally diverse data set. Eight popular machine learning algorithms were employed to build the prediction models with 43 molecular descriptors selected by a wrapper feature selection method. The results demonstrated that XGBoost yielded better prediction performance than any other single model (R² = 0.906 and RMSE = 0.395). Moreover, the consensus model built from the top three models further improved the prediction performance (R² = 0.922 and RMSE = 0.359). The robustness, reliability, and generalization ability of the models were rigorously evaluated by the Y-randomization test and applicability domain analysis. In addition, a group contribution model based on 110 atom types and local models for different ionization states were established and compared to the global models. The results demonstrated that the descriptor-based consensus model is superior to the group contribution method, and that the local models have no advantage over the global models. Finally, matched molecular pair (MMP) analysis and descriptor importance analysis were performed to extract transformation rules and to help explain the structural factors that influence log D7.4. In conclusion, we believe that the consensus model developed in this study can be used as a reliable and promising tool to evaluate log D7.4 in drug discovery.
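The abstract does not say how the top three models were combined into the consensus model. As a minimal sketch, assuming the consensus is a plain unweighted average of the three models with the lowest validation RMSE, the idea could look like the following. The function names, model names, and all numbers are hypothetical toy values for illustration, not data or results from the study.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus_of_top_k(y_valid, valid_preds, test_preds, k=3):
    """Rank models by validation RMSE and average the test predictions
    of the k best ones (assumption: simple unweighted mean)."""
    ranked = sorted(valid_preds, key=lambda name: rmse(y_valid, valid_preds[name]))
    selected = ranked[:k]
    n_test = len(test_preds[selected[0]])
    averaged = [mean(test_preds[name][i] for name in selected) for i in range(n_test)]
    return selected, averaged


# Hypothetical toy values (not from the paper).
y_valid = [1.0, 2.0, 3.0]
valid_preds = {
    "model_a": [1.1, 2.1, 3.1],
    "model_b": [0.8, 1.8, 2.8],
    "model_c": [1.3, 2.3, 3.3],
    "model_d": [2.0, 3.0, 4.0],
}
test_preds = {
    "model_a": [0.5, 1.5],
    "model_b": [0.7, 1.7],
    "model_c": [0.3, 1.3],
    "model_d": [5.0, 5.0],
}

selected, consensus = consensus_of_top_k(y_valid, valid_preds, test_preds, k=3)
print(selected, consensus)
```

Selecting the models on a validation split rather than on the test set keeps the reported test metrics from being biased by the selection step. A weighted average or a stacked meta-model would be alternative ways to build a consensus under the same interface.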

Authors

Li Fu
Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.

Lu Liu
College of Pharmacy, Harbin Medical University, Harbin, China.

Zhi-Jiang Yang
Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.

Pan Li
Department of Infections, Beijing Hospital of Traditional Chinese Medicine, Affiliated to the Capital Medical University, No. 23, Back Road of the Art Gallery, Dongcheng District, Beijing 100010, China.

Jun-Jie Ding
Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China.

Yong-Huan Yun
College of Food Science and Engineering, Hainan University, Haikou 570228, P. R. China.

Ai-Ping Lu
Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China.

Ting-Jun Hou
Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.

Dong-Sheng Cao
Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.

Keywords

Algorithms; Drug Discovery; Lipids; Machine Learning; Models, Molecular; Quantitative Structure-Activity Relationship

External Resources

PubMed ID: 31869226
