A framework for developing machine learning-based chemical fingerprinting models using large gas chromatograph-mass spectrometer datasets: Application to oil spill residues classification.
Journal:
The Science of the total environment
Published Date:
Jan 23, 2026
Abstract
Chemical fingerprinting is a key environmental forensics technique used in oil spill investigations to identify the source and type of oil in spill residues. Conventional approaches rely on detecting individual petroleum biomarkers in gas chromatograph-mass spectrometer (GC-MS) chromatograms, examining their distributions, and calculating diagnostic ratios from them. Together, the biomarker distribution and diagnostic ratios form a unique chemical fingerprint for each oil. Because a source oil and its spill residues share similar fingerprints, the spill's origin can be identified by matching them. Variations in biomarker distribution also cause oils to exhibit distinct chromatographic and mass spectral patterns. This study presents a machine learning (ML) framework that performs chemical fingerprinting directly on GC-MS data, using pattern recognition to eliminate the need for individual biomarker identification or diagnostic ratio calculations. The framework introduces several methodological innovations to address analytical limitations. It trains ML classifiers exclusively on fresh, unweathered oil data to overcome the scarcity of weathered samples, incorporates synthetic data generation to mitigate small-dataset issues, and employs aggregate ensemble models that combine multiple classifiers to improve robustness across sample types. Standardized preprocessing procedures handle data format conversion, ensuring compatibility across GC-MS instruments. The framework was evaluated using GC-MS datasets of fresh, unweathered crude oils and spill residues; aggregate models trained solely on fresh, unweathered crude oils achieved the highest accuracy in classifying residues and identifying their source oils. Overall, this study presents a novel ML-based approach to oil spill chemical fingerprinting, offering a practical pathway to advance environmental forensics investigations.
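The abstract does not name the classifiers, the synthetic data method, or the rule used to combine models, so the following is only a minimal sketch of the general idea under stated assumptions: synthetic training samples are made by adding multiplicative Gaussian noise to fresh-oil intensity vectors, three nearest-centroid classifiers with different distance measures stand in for the base models, and a majority vote stands in for the aggregate model. All names (`augment`, `NearestCentroid`, `aggregate_predict`) and the toy intensity values are hypothetical and are not taken from the study.

```python
import math
import random
from collections import Counter


def normalize(signal):
    """Scale an intensity vector so its values sum to 1."""
    total = sum(signal)
    return [x / total for x in signal]


def augment(signal, n_copies, noise=0.05, rng=None):
    """Assumed augmentation: multiplicative Gaussian noise per intensity."""
    rng = rng or random.Random(0)
    return [[max(x * (1.0 + rng.gauss(0.0, noise)), 0.0) for x in signal]
            for _ in range(n_copies)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class NearestCentroid:
    """Assigns a sample to the class whose mean vector is closest."""

    def __init__(self, distance):
        self.distance = distance
        self.centroids = {}

    def fit(self, X, y):
        by_label = {}
        for vec, label in zip(X, y):
            by_label.setdefault(label, []).append(vec)
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()
        }
        return self

    def predict(self, vec):
        return min(self.centroids,
                   key=lambda label: self.distance(vec, self.centroids[label]))


def aggregate_predict(models, vec):
    """Majority vote over the base classifiers (assumed combination rule)."""
    votes = Counter(model.predict(vec) for model in models)
    return votes.most_common(1)[0][0]


# Toy "fresh oil" intensity vectors (illustrative values, not real GC-MS data).
fresh_oils = {
    "oil_A": [10.0, 40.0, 30.0, 15.0, 5.0],
    "oil_B": [5.0, 10.0, 25.0, 40.0, 20.0],
}

# Build the training set from synthetic copies of the fresh oils only.
rng = random.Random(42)
X_train, y_train = [], []
for label, signal in fresh_oils.items():
    for synthetic in augment(signal, n_copies=50, rng=rng):
        X_train.append(normalize(synthetic))
        y_train.append(label)

models = [NearestCentroid(d).fit(X_train, y_train)
          for d in (euclidean, manhattan, cosine_distance)]

# Hypothetical residue: oil_A with its first (lightest) component depleted.
residue = normalize([2.0, 36.0, 30.0, 15.0, 5.0])
predicted = aggregate_predict(models, residue)
print(predicted)
```

In this toy setup the models never see a weathered sample during training, which mirrors the abstract's description of training solely on fresh oils and then classifying residues. A real pipeline would replace the toy vectors with preprocessed GC-MS data and the base models with whatever classifiers the study actually used.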