Machine Learning-Based Spectral Analyses for Cultivar Identification.
Journal: Molecules (Basel, Switzerland)
PMID: 39942650
Abstract
The plant species studied here has high cultural and biological relevance. Besides being used as an ornamental plant, it has relevant biological properties. Due to hybridization, thousands of cultivars are known, and their accurate identification is essential. Infrared spectroscopy is currently recognized as an accurate and rapid technique for species and/or subspecies identification, including in plants. However, the choice of analysis tools (spectral pre-processing, feature selection, and chemometric models) strongly affects the accuracy of such identifications. This study tests the impact of two distinct machine learning-based approaches for discriminating cultivars using near-infrared (NIR) and Fourier transform infrared (FTIR) spectroscopy. Leaf infrared spectra (NIR, obtained in a previous study; FTIR, obtained herein) of 15 cultivars (38 plants) were modeled and analyzed via two machine learning-based approaches (Approach 1 and Approach 2), each combining a feature selection method with a classifier. With Approach 1, NIR spectroscopy was the more effective technique for predicting cultivars, achieving 81.3% correct cultivar assignments. Approach 2, however, obtained its best results with FTIR data, achieving 100.0% accuracy in cultivar assignments. Approach 2 also improved the results for NIR data, increasing the correct cultivar predictions by nearly 13%. These results highlight the importance of chemometric tools in analyzing infrared data: the choice of data analysis approach significantly affects the accuracy of the technique, and the same approach can have different effects on different techniques. It is therefore not feasible to establish a universal data analysis approach, even for very similar datasets from comparable analytical techniques.
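As an illustration of what "a feature selection method plus a classifier" can look like in practice, the following is a minimal sketch, not the study's method. The abstract does not name the algorithms used in Approach 1 or Approach 2, so everything here is an assumption: the spectra are synthetic, Fisher-score ranking stands in for the feature selection step, a nearest-centroid rule stands in for the classifier, and all function names (`make_spectra`, `fisher_scores`, `select_top_k`, `nearest_centroid_predict`, `loo_accuracy`) are hypothetical. The fraction of correct assignments is estimated with leave-one-out cross-validation.

```python
import random
from statistics import mean


def make_spectra(n_classes=3, per_class=6, n_points=40, seed=0):
    """Synthetic stand-in for leaf spectra (NOT the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(per_class):
            spectrum = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
            # Assumption: only three bands carry cultivar-specific signal.
            for band in (5, 15, 25):
                spectrum[band] += 4.0 * c
            X.append(spectrum)
            y.append(c)
    return X, y


def fisher_scores(X, y):
    """Between-class over within-class scatter for each spectral point."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        grand = mean(row[j] for row in X)
        between = within = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            m = mean(vals)
            between += len(vals) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring spectral points, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def nearest_centroid_predict(X_train, y_train, x, features):
    """Assign x to the class whose mean spectrum is closest on the selected features."""
    centroids = {}
    for c in set(y_train):
        rows = [r for r, label in zip(X_train, y_train) if label == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]

    def squared_distance(c):
        return sum((x[j] - m) ** 2 for j, m in zip(features, centroids[c]))

    return min(centroids, key=squared_distance)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; features are re-selected inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = select_top_k(fisher_scores(X_tr, y_tr), k)
        correct += nearest_centroid_predict(X_tr, y_tr, X[i], features) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_spectra()
    print("selected bands:", select_top_k(fisher_scores(X, y), 3))
    print("leave-one-out accuracy:", loo_accuracy(X, y, 3))
```

Feature selection is repeated inside each cross-validation fold so that the held-out spectrum never influences which bands are kept; selecting features on the full dataset first would tend to inflate the reported accuracy. Swapping either the selector or the classifier changes the result, which is the kind of sensitivity the abstract describes.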