A machine learning approach to infer DNase1L3 activity from plasma cell-free DNA fragmentomics

Journal: bioRxiv

Abstract

DNase1L3 is an endonuclease that fragments DNA during apoptosis and digests DNA from microparticles in plasma. It thereby shapes key features of cell-free DNA (cfDNA) and maintains extracellular DNA homeostasis, a process implicated in autoimmunity. The common missense variant p.Arg206Cys (R206C) affects cfDNA through a non-linear allele dosage effect, with limited effects in heterozygotes and strong effects in homozygotes. As a result, commonly used models trained on fragmentomics from individuals with normal DNase1L3 activity perform poorly in R206C homozygotes. To address this, we analyzed cfDNA sequencing data from 129,676 Non-Invasive Prenatal Tests and validated R206C genotypes in 169 matched plasma samples. We used supervised and unsupervised learning to infer DNase1L3 activity from cfDNA fragmentation properties. Our models accurately identify R206C homozygotes from as few as 10,000 cfDNA fragments, outperforming genotype imputation. However, unsupervised analysis reveals a few samples that cluster with homozygotes but lack the corresponding genotype, suggesting that our method also detects other or downstream effects of DNase1L3 impairment. Conversely, some R206C homozygotes initially lacked the aberrant fragmentome, but longitudinal follow-up across subsequent pregnancies shows that they develop aberrant fragmentomes over time. These findings enable the identification of samples with impaired DNase1L3 activity directly from sequencing data, providing a practical approach to improve the interpretation and robustness of cfDNA-based diagnostics across clinical applications.
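The abstract does not state which fragmentation features or models were used. As a minimal sketch only, assuming 5' end-motif frequencies as the fragmentomic feature and a nearest-centroid classifier as a stand-in for a supervised model (both hypothetical choices, not the authors' method), the idea of labeling a sample from a subsample of 10,000 fragments can be illustrated as below. The helper `toy_sample` and its motif bias are invented purely to make two toy groups separable; they are not data or findings from this study.

```python
# Hypothetical sketch: the features, classifier and toy data are assumptions,
# not the pipeline described in the abstract.
import random
from collections import Counter
from typing import Dict, List, Sequence

Profile = Dict[str, float]


def end_motif_profile(fragments: Sequence[str], k: int = 4) -> Profile:
    """Relative frequency of each 5' end k-mer across one sample's fragments."""
    counts = Counter(f[:k].upper() for f in fragments if len(f) >= k)
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


def subsample(fragments: Sequence[str], n: int, rng: random.Random) -> List[str]:
    """Draw at most n fragments without replacement (to mimic low-depth input)."""
    return rng.sample(list(fragments), min(n, len(fragments)))


def centroid(profiles: Sequence[Profile]) -> Profile:
    """Mean profile of a labeled group; motifs missing from a profile count as 0."""
    motifs = set().union(*profiles)
    return {m: sum(p.get(m, 0.0) for p in profiles) / len(profiles) for m in motifs}


def distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two sparse motif profiles."""
    motifs = set(a) | set(b)
    return sum((a.get(m, 0.0) - b.get(m, 0.0)) ** 2 for m in motifs) ** 0.5


def classify(profile: Profile, centroids: Dict[str, Profile]) -> str:
    """Return the label of the nearest group centroid."""
    return min(centroids, key=lambda label: distance(profile, centroids[label]))


def toy_sample(rng: random.Random, n: int, p_motif: float,
               motif: str = "CCCA", length: int = 8) -> List[str]:
    """Synthetic fragments: a fraction p_motif start with `motif`, the rest are
    random. The bias is arbitrary and exists only to separate the toy groups."""
    frags = []
    for _ in range(n):
        prefix = motif if rng.random() < p_motif else ""
        tail = "".join(rng.choice("ACGT") for _ in range(length - len(prefix)))
        frags.append(prefix + tail)
    return frags


if __name__ == "__main__":
    rng = random.Random(0)
    groups = {
        "typical": [end_motif_profile(toy_sample(rng, 2000, 0.30)) for _ in range(3)],
        "impaired": [end_motif_profile(toy_sample(rng, 2000, 0.05)) for _ in range(3)],
    }
    centroids = {label: centroid(ps) for label, ps in groups.items()}
    query = subsample(toy_sample(rng, 20000, 0.05), 10_000, rng)
    print(classify(end_motif_profile(query), centroids))  # prints: impaired
```

A nearest-centroid rule is used here only because it is transparent; real fragmentomic features would be derived from aligned reads, and the study's actual features and models may differ.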

Authors

  • Linthorst, J.
  • Sistermans, E. A.
