Transfer-Learning Deep Raman Models Using Semiempirical Quantum Chemistry.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Biophotonic technologies such as Raman spectroscopy are powerful tools for obtaining highly specific molecular information. Because it requires minimal sample preparation, Raman spectroscopy is widely used across diverse scientific disciplines, often in combination with chemometrics, machine learning (ML), and deep learning (DL). However, large databases of independent Raman spectra for model training are lacking, which leads to overfitting, overestimation of model performance, and limited generalizability. We address this problem by generating simulated vibrational spectra with semiempirical quantum chemistry methods, which enables efficient pretraining of DL models on large synthetic data sets. The pretrained models are then fine-tuned on a smaller experimental Raman data set of bacterial spectra. In this real biophotonic application, transfer learning significantly reduces the computational cost while maintaining performance comparable to that of models trained from scratch. The results validate the utility of synthetic data for pretraining deep Raman models and offer a scalable framework for spectral analysis in resource-limited settings.
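The abstract does not say how the computed vibrational modes are turned into spectra that a DL model can consume. One common approach, assumed here and not confirmed by the source, is to place each normal mode as a "stick" at its wavenumber and broaden it with a Lorentzian line shape on a fixed wavenumber grid, then normalize. The sketch below shows only that step; the semiempirical calculation itself is not shown. The function names (`lorentzian`, `broaden_stick_spectrum`, `max_normalize`), the 10 cm^-1 line width, and the example mode positions and intensities are all hypothetical illustrations.

```python
import math
from typing import List, Sequence


def lorentzian(x: float, center: float, fwhm: float) -> float:
    """Area-normalized Lorentzian line shape evaluated at x."""
    gamma = fwhm / 2.0  # half width at half maximum
    return (gamma / math.pi) / ((x - center) ** 2 + gamma ** 2)


def broaden_stick_spectrum(
    frequencies: Sequence[float],
    intensities: Sequence[float],
    grid: Sequence[float],
    fwhm: float = 10.0,
) -> List[float]:
    """Sum one Lorentzian per vibrational mode onto a wavenumber grid.

    frequencies: mode wavenumbers in cm^-1 (hypothetical input, e.g. from a
        semiempirical frequency calculation)
    intensities: relative Raman intensity per mode (assumed precomputed)
    grid: wavenumber axis (cm^-1) on which the spectrum is evaluated
    fwhm: assumed line width in cm^-1 (not taken from the source)
    """
    if len(frequencies) != len(intensities):
        raise ValueError("frequencies and intensities must have equal length")
    return [
        sum(i * lorentzian(x, f, fwhm) for f, i in zip(frequencies, intensities))
        for x in grid
    ]


def max_normalize(spectrum: Sequence[float]) -> List[float]:
    """Scale a spectrum so its highest point equals 1."""
    peak = max(spectrum)
    if peak <= 0:
        raise ValueError("spectrum must contain a positive value")
    return [v / peak for v in spectrum]


if __name__ == "__main__":
    # Hypothetical 400-1800 cm^-1 axis with 1 cm^-1 spacing.
    grid = [400.0 + k for k in range(1401)]
    # Illustrative mode positions and intensities, not data from the paper.
    freqs = [1001.0, 1450.0, 1660.0]
    ints = [1.0, 0.5, 0.8]
    spec = max_normalize(broaden_stick_spectrum(freqs, ints, grid, fwhm=10.0))
    print(grid[spec.index(max(spec))])  # prints 1001.0
```

Generating many such spectra on one shared grid gives fixed-length input vectors, which is what allows a network pretrained on synthetic data to be fine-tuned later on experimental spectra resampled to the same axis; whether the authors used this grid, line shape, or normalization is not stated in the abstract.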

Authors

  • Jawad Kamran
    Institute of Physical Chemistry, Friedrich Schiller University Jena, Helmholtzweg 4, 07743 Jena, Germany.
  • Julian Hniopek
    Institute of Physical Chemistry (IPC) and Abbe School of Photonics (ASP), Friedrich-Schiller-Universität Jena, Helmholtzweg 4, 07743 Jena, Germany.
  • Thomas Bocklitz
    Institute of Physical Chemistry and Abbe Center of Photonics, Friedrich Schiller University, Jena, Germany.