Regularized regression in ultra-small chemometric datasets: A methodological case study using FTIR spectra of Schiff bases.

Journal: PLOS ONE
Published Date:

Abstract

This study is not intended to establish a predictive framework for reaction yield. Instead, it is a methodological investigation of the statistical behavior and instability of regularized regression techniques applied to ultra-small, high-dimensional chemometric datasets. The analysis is based on a curated dataset of Schiff base compounds (n = 21) for which post-synthesis Fourier transform infrared (FTIR) spectra and experimentally reported reaction yields are available. Structural information for all compounds is fully disclosed to ensure chemical transparency. Descriptive physicochemical properties, including molecular weight, physical appearance, retention factor (Rf), melting point, and reaction yield, are summarized to characterize the dataset; however, only yield (%) is used as the response variable in the subsequent statistical analyses. Baseline-corrected and normalized FTIR spectra were transformed into a high-dimensional explanatory matrix and analyzed using regularized regression approaches designed for high collinearity and $p \gg n$ settings (far more predictors than samples), specifically sparse Partial Least Squares (sPLS) and Elastic Net regression. Model behavior was examined using leave-one-out cross-validation (LOOCV), which is better suited to extremely small datasets where conventional train-test splitting is unreliable. Given the severe sample-size limitation, the analysis is interpreted as a methodological illustration, and model outputs are discussed primarily in terms of coefficient sparsity, variability, and stability under regularization rather than predictive accuracy. Overall, the study demonstrates the practical challenges and statistical instability that arise when regression-based machine learning techniques are applied to ultra-small spectral datasets.
The results highlight the importance of cautious interpretation and methodological transparency when chemometric models are developed under severe sample-size constraints.
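
The workflow described in the abstract (Elastic Net fitted inside a leave-one-out loop, with attention to coefficient sparsity and stability rather than accuracy) can be illustrated with a minimal sketch. This is not the authors' code or data. It assumes a synthetic, highly collinear matrix with n = 21 rows and a hypothetical p = 100 "wavenumber" columns, an arbitrary penalty setting (alpha = 0.5, l1_ratio = 0.5), and a small from-scratch coordinate-descent solver written with NumPy only. The function names soft_threshold, elastic_net_cd, and loocv_elastic_net are illustrative, and sPLS is not covered.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, alpha=0.5, l1_ratio=0.5, n_iter=50):
    """Cyclic coordinate descent for the Elastic Net objective
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1
                       + 0.5*alpha*(1 - l1_ratio)*||b||_2^2.
    Assumes the columns of X and the vector y are already centered."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual for beta = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]    # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            resid -= X[:, j] * beta[j]    # add the updated contribution back
    return beta


def loocv_elastic_net(X, y, **kwargs):
    """Leave-one-out loop: refit on n-1 samples, predict the held-out one,
    and keep every fold's coefficient vector for stability inspection."""
    n, p = X.shape
    preds = np.empty(n)
    coefs = np.empty((n, p))
    for i in range(n):
        train = np.arange(n) != i
        X_tr, y_tr = X[train], y[train]
        # Scaling statistics come from the training fold only (no leakage).
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        sd[sd == 0.0] = 1.0
        y_mu = y_tr.mean()
        beta = elastic_net_cd((X_tr - mu) / sd, y_tr - y_mu, **kwargs)
        coefs[i] = beta
        preds[i] = y_mu + ((X[i] - mu) / sd) @ beta
    return preds, coefs


# Synthetic stand-in for a spectral matrix: 3 latent factors drive all columns,
# which makes the predictors strongly collinear (assumed, not real FTIR data).
rng = np.random.default_rng(0)
n, p = 21, 100
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = 70.0 + 5.0 * latent[:, 0] + rng.normal(scale=3.0, size=n)  # "yield (%)"

preds, coefs = loocv_elastic_net(X, y, alpha=0.5, l1_ratio=0.5)
rmse = float(np.sqrt(np.mean((y - preds) ** 2)))
selection_freq = (coefs != 0).mean(axis=0)   # share of folds selecting each feature
n_nonzero = (coefs != 0).sum(axis=1)         # model size per fold

print("LOOCV RMSE:", round(rmse, 2))
print("nonzero coefficients per fold (min, max):", n_nonzero.min(), n_nonzero.max())
print("features selected in every fold:", int((selection_freq == 1.0).sum()))
```

In a sketch like this, the per-fold model size and the selection frequencies are the quantities of interest: a feature that enters the model in only some folds, or a model size that swings when a single sample is removed, is the kind of instability the abstract refers to. With n = 21, the RMSE value itself should be read with the same caution.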

Authors
