SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

SUMMARY: The rapid development of sequence-based machine learning models has increased the demand for tools that manipulate sequences for use with these models. However, existing approaches for editing and evaluating genome sequences with models have limitations, such as incompatibility with structural variants, difficulty identifying the responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences.
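
The idea of scoring a reference/perturbed sequence pair can be illustrated with a minimal sketch. This is not SuPreMo's API: the function names `apply_variant`, `gc_fraction`, and `score_perturbation` are hypothetical, and the GC-fraction "model" is only a toy stand-in for a trained sequence-based predictor.

```python
from typing import Callable, Tuple


def apply_variant(reference: str, pos: int, ref: str, alt: str) -> str:
    """Replace the `ref` allele at 0-based `pos` with `alt`.

    Covers substitutions, insertions, and deletions written in a
    REF/ALT style. Note: VCF positions are 1-based; this sketch
    assumes the caller has already converted to 0-based.
    """
    if reference[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    return reference[:pos] + alt + reference[pos + len(ref):]


def gc_fraction(sequence: str) -> float:
    """Toy stand-in for a predictive model: fraction of G/C bases."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def score_perturbation(
    reference: str,
    pos: int,
    ref: str,
    alt: str,
    model: Callable[[str], float] = gc_fraction,
) -> Tuple[str, str, float]:
    """Return (reference, perturbed, score difference) for one variant."""
    perturbed = apply_variant(reference, pos, ref, alt)
    return reference, perturbed, model(perturbed) - model(reference)


if __name__ == "__main__":
    ref_seq, alt_seq, delta = score_perturbation("ACGTACGT", 2, "G", "T")
    print(ref_seq, alt_seq, delta)
```

Variants could then be ranked by the magnitude of the score difference. The sketch does not handle the length changes that insertions and deletions introduce, which matters for models that expect fixed-length input.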

Authors

  • Ketrin Gjoni
    Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States.
  • Katherine S Pollard
    Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA. katherine.pollard@gladstone.ucsf.edu.