FT-NIR and HPLC combined with machine learning for origin traceability and prediction of characteristic component content in Gastrodia elata.
Journal:
Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
Published Date:
Apr 2, 2026
Abstract
As a dual-purpose plant with both medicinal and culinary applications, Gastrodia elata (GE) benefits from origin traceability and content prediction to guide quality assessment of its products. This study used HPLC to quantify the primary characteristic components in GE samples from various origins and to establish fingerprint chromatograms. Traditional machine learning methods (PCA, t-SNE, PLS-DA, PLSR, SVM, RF, and GBMs) and a deep learning model (ResNet) were developed for origin traceability and content prediction using FT-NIR spectral data. The characteristic components of GE varied significantly across origins, with HPLC fingerprint similarities ranging from 0.726 to 0.979. Neither PCA nor t-SNE effectively separated the geographic origins of GE. However, the traditional machine learning classifiers showed markedly improved accuracy after second-derivative (2nd) preprocessing: the SVM model achieved the highest prediction performance (test set accuracy: 95.42%), followed by the PLS-DA model (test set accuracy: 93.75%), whereas the RF and GBM models did not reach 90% accuracy. The ResNet model built from synchronous 2DCOS images achieved a classification accuracy of 100% for origin traceability, surpassing the model built from synchronous 3DCOS images and making it the optimal model in this study. Additionally, the PLSR model applied to 2nd preprocessed FT-NIR spectra performed well in predicting the content of GE's major active components, with linear regression R² > 0.8 and RPD > 2. This study offers a novel reference for the quality traceability of GE products on the market.
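The abstract reports PLSR quality as R² > 0.8 and RPD > 2. As a minimal sketch of how these two figures are commonly computed from reference values (e.g. HPLC contents) and model predictions, the following assumes the usual definitions: R² as one minus the ratio of residual to total sum of squares, and RPD as the standard deviation of the reference values divided by the root mean square error of prediction. The function name `r2_and_rpd` and the sample numbers are hypothetical and are not taken from the study, which may define these metrics slightly differently.

```python
import math
from statistics import mean, stdev


def r2_and_rpd(y_true, y_pred):
    """Return (R^2, RPD) for reference values and predictions.

    Assumed definitions (not confirmed by the source):
      R^2 = 1 - SS_res / SS_tot
      RPD = SD(y_true) / RMSEP, with SD using the n - 1 denominator
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with >= 2 values")

    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("reference values are constant; R^2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / len(y_true))
    # A perfect prediction has zero error, so RPD is unbounded.
    rpd = math.inf if rmsep == 0 else stdev(y_true) / rmsep
    return r2, rpd


# Hypothetical reference contents and predictions, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rpd = r2_and_rpd(reference, predicted)
print(f"R2 = {r2:.3f}, RPD = {rpd:.2f}")
```

Under these definitions, a model meeting the thresholds quoted in the abstract would return `r2 > 0.8` and `rpd > 2` on its test set.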