AI-driven feature selection and epigenetic pattern analysis: A screening strategy of CpGs validated by pyrosequencing for body fluid identification.

Journal: Forensic science international
PMID:

Abstract

Identifying body fluid stains at crime scenes is an important task in forensic evidence analysis. Body fluid-specific CpGs detected by DNA methylation microarray screening have been widely studied for forensic body fluid identification. However, some CpGs have limited ability to distinguish certain body fluid types, so there is an ongoing need to discover novel methylation markers and fully validate them to enhance their evidentiary strength in complex forensic scenarios. This research gathered forensic-related DNA methylation microarray data from the Gene Expression Omnibus (GEO) database. A novel screening strategy for marker selection was developed, combining feature selection algorithms (elastic net, information gain ratio, feature importance based on Random Forest, and mutual information coefficient) with epigenetic pattern analysis, to identify CpG markers for body fluid identification. The selected CpGs were validated through pyrosequencing on peripheral blood, saliva, semen, vaginal secretion, and menstrual blood samples, and machine learning classification models were constructed based on the sequencing results. Pyrosequencing revealed 14 CpGs with high specificity across the five types of body fluid samples. A machine learning classification model built on the pyrosequencing results effectively distinguished the five types of body fluid samples, achieving 100% accuracy on the test set. Using six CpG markers, it was also feasible to attain ideal efficacy in identifying body fluid stains. Our research proposes a systematic and scientific strategy for screening body fluid-specific CpGs, contributing new insights and methods to forensic body fluid identification.
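
As a rough illustration of one of the feature selection criteria named above, the following minimal sketch ranks CpG sites by information gain ratio. It is not the authors' implementation: the CpG names, the beta values, the three-level discretization thresholds, and the helper functions are all hypothetical, and the sketch covers only this one criterion rather than the full combined strategy.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_bins, labels):
    """Information gain of a discretized feature divided by its split information."""
    total = len(labels)
    groups = {}
    for bin_label, fluid in zip(feature_bins, labels):
        groups.setdefault(bin_label, []).append(fluid)
    # Expected entropy of the class labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_bins)
    # A feature with a single bin carries no information; score it as zero.
    return info_gain / split_info if split_info > 0 else 0.0


def discretize(beta, low=0.3, high=0.7):
    """Map a methylation beta value to a coarse level (thresholds are assumptions)."""
    if beta < low:
        return "low"
    if beta > high:
        return "high"
    return "mid"


# Toy data, invented for illustration only (not taken from the study).
labels = ["blood", "blood", "saliva", "saliva", "semen", "semen"]
betas = {
    "cpg_A": [0.90, 0.85, 0.10, 0.15, 0.12, 0.08],  # methylated only in blood
    "cpg_B": [0.50, 0.10, 0.90, 0.50, 0.10, 0.90],  # varies, but not by fluid
    "cpg_C": [0.50, 0.50, 0.50, 0.50, 0.50, 0.50],  # constant across samples
}

scores = {
    cpg: gain_ratio([discretize(b) for b in values], labels)
    for cpg, values in betas.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

for cpg in ranking:
    print(f"{cpg}: gain ratio = {scores[cpg]:.3f}")
```

In a combined strategy of the kind the abstract describes, a ranking like this would presumably be compared with the rankings from the other criteria before candidate CpGs move on to laboratory validation; how the study actually merges them is not stated here.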

Authors

  • Ming Zhao
    School of Computer Science and Engineering, Central South University, Changsha, 410000, China.
  • Meiming Cai
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China.
  • Fanzhang Lei
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China.
  • Xi Yuan
    Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Bio-Sensing and Chemometrics, College of Biology, Aptamer Engineering Center of Hunan Province, Hunan University, Changsha, Hunan, 410082, China.
  • QingLin Liu
    Department of Interventional Neuroradiology, Beijing Neurosurgical Institute and Beijing Tiantan Hospital of Capital Medical University, China.
  • Yating Fang
    Department of Industrial and Systems Engineering, Rutgers University, Piscataway, New Jersey 08854, United States.
  • Bofeng Zhu
    Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, 1838 Guangzhou Avenue North, Guangzhou, Guangdong, PR China; Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, 99 Yanxiang Road, Xi'an, Shaanxi, PR China. Electronic address: zhubofeng7372@126.com.