From NMR to AI: Do We Need ¹H NMR Experimental Spectra to Obtain High-Quality logD Prediction Models?

Journal: Journal of chemical information and modeling

PMID: 40044424

Abstract

This study presents a novel approach to ¹H NMR-based machine learning (ML) models for predicting logD using computer-generated ¹H NMR spectra. Building on our previous work, which integrated experimental ¹H NMR data, this study addresses key limitations associated with experimental measurements, such as sample stability, solvent variability, and extensive processing, by replacing them with fully computational workflows. Benchmarking across various density functional theory (DFT) functionals and basis sets highlighted their limitations: DFT-based models showed relatively high RMSE values for CHI logD (average 1.12, lowest 0.96) and extensive computational demands, limiting their usefulness for large-scale predictions. In contrast, models trained on ¹H NMR spectra predicted by NMRshiftDB2 and JEOL JASON achieved RMSE values as low as 0.76, compared to 0.88 for experimental spectra. Further analysis revealed that mixing experimental and predicted spectra did not enhance accuracy, underscoring the advantage of homogeneous datasets. Validation with external datasets confirmed the robustness of our models, which performed comparably to commercial software such as Instant JChem, supporting the reliability of the proposed computational workflow. Additionally, using normalized RMSE (NRMSE) proved essential for consistent model evaluation across datasets with varying data scales. By eliminating the need for experimental input, this workflow offers a widely accessible, computationally efficient pipeline, setting a new standard for ML-driven chemical property predictions without experimental data constraints.
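
The abstract does not state how NRMSE was normalized. As a minimal sketch, assuming normalization by the range of the observed values (one common convention, not necessarily the one used in the paper), the two metrics could be computed as follows. The function names `rmse` and `nrmse` and the sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values.

    Assumption: range normalization. Other conventions divide by the
    mean or the standard deviation of the observed values instead.
    """
    span = max(y_true) - min(y_true)
    if span == 0:
        raise ValueError("observed values have zero range")
    return rmse(y_true, y_pred) / span


# Illustrative values only; these are not data from the study.
observed = [0.0, 1.0, 2.0, 4.0]
predicted = [0.0, 1.0, 2.0, 2.0]

# The same data expressed on a scale ten times larger.
observed_x10 = [10 * v for v in observed]
predicted_x10 = [10 * v for v in predicted]

print(rmse(observed, predicted), rmse(observed_x10, predicted_x10))
print(nrmse(observed, predicted), nrmse(observed_x10, predicted_x10))
```

Under this assumption, RMSE grows with the scale of the data while NRMSE stays the same, which is why a normalized metric is easier to compare across datasets whose value ranges differ.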

Authors

Arkadiusz Leniak
Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland.

Wojciech Pietruś
Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland.

Aleksandra Swiderska
Department of Psychology, University of Warsaw.

Rafał Kurczab
Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland.

Keywords

Density Functional Theory; Machine Learning; Proton Magnetic Resonance Spectroscopy
