RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci.

Journal: Genome Biology
PMID:

Abstract

Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
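To make the reported metrics concrete, the following is a minimal sketch of how precision and recall are computed for an ensemble's calls. It is not RExPRT's implementation: the abstract does not say how the ensemble combines its models, so the strict majority vote used here, the function names `ensemble_vote` and `precision_recall`, and the toy labels are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def ensemble_vote(model_preds: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model binary calls (1 = pathogenic, 0 = benign).

    Assumption: a locus is called pathogenic only when a strict majority
    of models agree. The abstract does not specify the combination rule.
    """
    n_models = len(model_preds)
    return [1 if sum(votes) * 2 > n_models else 0 for votes in zip(*model_preds)]


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive (pathogenic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, purely hypothetical: six loci and three models.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [1, 1, 0, 0, 1, 0]
model_b = [1, 1, 1, 0, 0, 0]
model_c = [1, 0, 0, 0, 1, 0]

y_pred = ensemble_vote([model_a, model_b, model_c])
precision, recall = precision_recall(y_true, y_pred)
```

High precision means that few of the loci called pathogenic are false positives, which is the property that matters when a discovery study can only follow up a short list of candidates.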

Authors

  • Sarah Fazal
    Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
  • Matt C Danzi
    Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
  • Isaac Xu
    Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
  • Shilpa Nadimpalli Kobren
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02155, USA.
  • Shamil Sunyaev
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02155, USA.
  • Chloe Reuter
    Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA.
  • Shruti Marwaha
    Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA.
  • Matthew Wheeler
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA.
  • Egor Dolzhenko
    Illumina Inc., San Diego, CA, 92112, USA.
  • Francesca Lucas
    Department of Computer Science, Delft University of Technology, Delft, The Netherlands.
  • Stefan Wuchty
    Department of Computer Science, University of Miami, Miami, FL, 33146, USA.
  • Mustafa Tekin
    Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
  • Stephan Züchner
    JD McDonald Department of Human Genetics and Hussman Institute for Human Genomics, University of Miami, Miami, FL, USA.
  • Vanessa Aguiar-Pulido
    Department of Computer Science, University of Miami, Miami, FL, USA.