A standardized framework for robust fragmentomic feature extraction from cell-free DNA sequencing data.

Journal: Genome biology
Published Date:

Abstract

Fragmentomic features of cell-free DNA (cfDNA) are promising non-invasive biomarkers for cancer diagnosis. However, the lack of a systematic evaluation of biases in feature quantification hinders the adoption of such applications. We compare features derived from whole-genome sequencing of ten healthy donors using nine library kits and ten data-processing routes, and validate our findings in 1182 plasma samples from published studies. Our results clarify the variation introduced by library preparation and by feature quantification methods. We design the Trim Align Pipeline and the cfDNAPro R package as unified interfaces for data pre-processing, feature extraction, and visualization, to standardize multi-modal feature engineering and integration for machine learning.
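To make "feature extraction" concrete, the sketch below summarizes a list of cfDNA fragment lengths into a few simple size-based features. It is a minimal illustration under stated assumptions, not the cfDNAPro API or the Trim Align Pipeline. The function name `fragment_length_features` and the short (100-150 bp) and long (151-220 bp) windows are hypothetical choices made for this example. In practice, the lengths would come from the insert sizes of properly paired, aligned reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


# Hypothetical helper for illustration only; not part of cfDNAPro or the
# Trim Align Pipeline. The size windows are assumed, adjustable defaults.
def fragment_length_features(
    lengths: Iterable[int],
    short_range: Tuple[int, int] = (100, 150),
    long_range: Tuple[int, int] = (151, 220),
) -> Dict[str, float]:
    """Summarize fragment lengths (bp) into simple size-based features."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments supplied")

    def n_in(lo: int, hi: int) -> int:
        # Number of fragments whose length falls in the inclusive window [lo, hi].
        return sum(c for length, c in counts.items() if lo <= length <= hi)

    n_short = n_in(*short_range)
    n_long = n_in(*long_range)
    # Most frequent length; ties are broken in favor of the shorter length.
    modal_length = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))[0]
    return {
        "n_fragments": total,
        "short_fraction": n_short / total,
        "short_long_ratio": n_short / n_long if n_long else float("inf"),
        "modal_length": modal_length,
    }


if __name__ == "__main__":
    print(fragment_length_features([120, 130, 167, 167, 180, 300]))
```

Features such as these depend directly on upstream choices like read trimming, alignment, and filtering. That is the kind of processing-route variation the abstract describes, and why a shared pre-processing interface matters before features are compared across studies.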

Authors

  • Haichao Wang
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Paulius D Mennea
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Yu Kiu Elkie Chan
    LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
  • Zhao Cheng
    Health Geography and Policy Group, ETH Zürich, Zürich, Switzerland.
  • Maria C Neofytou
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Arif Anwer Surani
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Aadhitthya Vijayaraghavan
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Emma-Jane Ditter
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Richard Bowers
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Matthew D Eldridge
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Dmitry S Shcherbo
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Christopher G Smith
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Florian Markowetz
    Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge CB2 0RE, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK. Electronic address: florian.markowetz@cruk.cam.ac.uk.
  • Wendy N Cooper
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  • Tommy Kaplan
    School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel.
  • Nitzan Rosenfeld
    Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK.
  • Hui Zhao
    School of Mathematics and Computer Science, Shaanxi University of Technology, Hanzhong, 723000, Shaanxi, China.