Classification of DNA secondary structures by combining multiple spectral techniques with machine learning.

Journal: Analytica chimica acta

Abstract

BACKGROUND: The identification of structural features is an essential prerequisite for determining DNA secondary structures and investigating the mechanisms of their formation. Circular dichroism (CD) spectroscopy, fluorescence (FL) spectroscopy, and thermal difference spectra (TDS) have been used to monitor DNA secondary structures because they are operationally simple, fast, and inexpensive. However, each spectroscopic method on its own provides only limited structural information. We therefore proposed that integrating the three techniques could improve the classification accuracy of DNA secondary structures, although no such study has been reported to date.

RESULTS: In this study, a method combining CD, FL, and TDS with machine learning (ML) was proposed. Principal component analysis (PCA) was first used to reduce dimensionality and facilitate data analysis. Three ML methods, namely linear discriminant analysis (LDA), K-nearest neighbor (KNN), and support vector machine (SVM), were then employed to extract more structure-related information from the CD, FL, and TDS spectra. Using a two-step ML strategy, 79 of 85 DNA sequences belonging to the G-quadruplex (G4), i-motif (iM), and double-stranded (DS) categories were correctly classified (classification accuracy of 0.95). We thus achieved the goal of predicting unknown DNA secondary structures by combining CD, FL, and TDS spectra, and demonstrated the advantage of combining the three spectra for DNA structure identification.

SIGNIFICANCE: The combined method is significantly superior to any single spectroscopic technique. A simple, fast, and cost-efficient spectroscopic platform for the direct and comprehensive identification of DNA secondary structures has thus been established. By building a multispectral database and applying ML methods, accurate and comprehensive identification of unknown DNA secondary structures will ultimately be realized.
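The abstract does not describe the implementation, so the following is only a minimal sketch, in Python with NumPy, of one plausible way to combine three spectra and then apply PCA followed by one of the named classifiers (KNN). Everything specific here is an assumption for illustration, not the authors' procedure: the low-level fusion by concatenation, the per-sample max-abs scaling, the number of principal components, k, leave-one-out validation, the hypothetical helper names (`fuse_spectra`, `fit_pca`, `knn_predict`, `loo_accuracy`, `synthetic_spectra`), and the made-up Gaussian-peak "spectra". The two-step ML strategy mentioned in the abstract is not reproduced.

```python
import numpy as np


def fuse_spectra(cd, fl, tds):
    """Assumed low-level fusion: scale each sample's spectrum to max |value| = 1,
    then concatenate CD, FL and TDS into one feature vector per sample."""
    blocks = []
    for block in (cd, fl, tds):
        scale = np.abs(block).max(axis=1, keepdims=True)
        scale[scale == 0] = 1.0  # guard against an all-zero spectrum
        blocks.append(block / scale)
    return np.hstack(blocks)


def fit_pca(X, n_components):
    """Return the training mean and the top principal axes (one per row)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes ordered by decreasing singular value
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    preds = []
    for x in test_X:
        dist = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(dist)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def loo_accuracy(X, y, n_components=3, k=3):
    """Leave-one-out accuracy; PCA is refit without the held-out sample."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mean, axes = fit_pca(X[train], n_components)
        train_scores = (X[train] - mean) @ axes.T
        test_scores = (X[i:i + 1] - mean) @ axes.T
        pred = knn_predict(train_scores, y[train], test_scores, k)[0]
        correct += int(pred == y[i])
    return correct / len(y)


def synthetic_spectra(rng, n_per_class=10, n_points=40):
    """Hypothetical Gaussian-peak spectra; NOT real CD/FL/TDS band positions."""
    grid = np.linspace(0.0, 1.0, n_points)
    centres = {"G4": (0.2, 0.5, 0.8), "iM": (0.5, 0.8, 0.2), "DS": (0.8, 0.2, 0.5)}
    cd, fl, tds, labels = [], [], [], []
    for label, peaks in centres.items():
        for _ in range(n_per_class):
            spectra = [np.exp(-((grid - c) ** 2) / 0.01)
                       + 0.05 * rng.standard_normal(n_points) for c in peaks]
            for store, spectrum in zip((cd, fl, tds), spectra):
                store.append(spectrum)
            labels.append(label)
    return np.array(cd), np.array(fl), np.array(tds), np.array(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cd, fl, tds, y = synthetic_spectra(rng)
    X = fuse_spectra(cd, fl, tds)
    print("LOO accuracy:", loo_accuracy(X, y))
```

Refitting PCA inside each leave-one-out fold keeps the held-out spectrum from influencing the principal axes; swapping `knn_predict` for an LDA or SVM classifier would follow the same pattern. The accuracy printed on this synthetic data says nothing about the study's reported results.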
