From NMR to AI: Do We Need ¹H NMR Experimental Spectra to Obtain High-Quality logD Prediction Models?

Journal: Journal of chemical information and modeling
PMID:

Abstract

This study presents a novel approach to ¹H NMR-based machine learning (ML) models for predicting logD using computer-generated ¹H NMR spectra. Building on our previous work, which integrated experimental ¹H NMR data, this study addresses key limitations of experimental measurements, such as sample stability, solvent variability, and extensive processing, by replacing them with fully computational workflows. Benchmarking across various density functional theory (DFT) functionals and basis sets highlighted their limitations: DFT-based models showed relatively high RMSE values (average RMSE of 1.12 for CHI logD, lowest 0.96) and heavy computational demands, limiting their usefulness for large-scale predictions. In contrast, models trained on ¹H NMR spectra predicted by NMRshiftDB2 and JEOL JASON achieved RMSE values as low as 0.76, compared to 0.88 for experimental spectra. Further analysis revealed that mixing experimental and predicted spectra did not enhance accuracy, underscoring the advantage of homogeneous datasets. Validation with external datasets confirmed the robustness of our models, which performed comparably to commercial software such as Instant JChem, supporting the reliability of the proposed computational workflow. Additionally, using normalized RMSE (NRMSE) proved essential for consistent model evaluation across datasets with varying data scales. By eliminating the need for experimental input, this workflow offers a widely accessible, computationally efficient pipeline, setting a new standard for ML-driven chemical property predictions without experimental data constraints.
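
To illustrate why NRMSE helps when comparing datasets with different data scales, the following is a minimal sketch, not the authors' implementation. The abstract does not state which normalization was used; this example assumes division by the range of the observed values (max minus min), which is one common convention. The function names and the numeric values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values.

    Assumption: range normalization. Mean or standard-deviation
    normalization are equally plausible conventions.
    """
    span = max(y_true) - min(y_true)
    if span == 0:
        raise ValueError("observed values have zero range")
    return rmse(y_true, y_pred) / span


# Hypothetical data: the same absolute error on a narrow and a wide scale.
narrow_true = [1.0, 2.0, 3.0]
narrow_pred = [1.5, 2.5, 3.5]
wide_true = [0.0, 4.0, 8.0]
wide_pred = [0.5, 4.5, 8.5]

print(rmse(narrow_true, narrow_pred), nrmse(narrow_true, narrow_pred))
print(rmse(wide_true, wide_pred), nrmse(wide_true, wide_pred))
```

Both hypothetical datasets have the same RMSE, but the NRMSE is smaller for the dataset spanning the wider range, which is the kind of scale effect a normalized metric makes visible.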

Authors

  • Arkadiusz Leniak
    Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland.
  • Wojciech Pietruś
    Department of Medicinal Chemistry, Celon Pharma S.A., ul. Marymoncka 15, 05-152 Kazuń Nowy, Poland.
  • Aleksandra Swiderska
    Department of Psychology, University of Warsaw.
  • Rafał Kurczab
    Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Kraków, Poland.