A rapid, accurate approach to inferring pedigrees in endogamous populations.

Journal: Genetics
Published Date:

Abstract

Accurate reconstruction of pedigrees from genetic data remains a challenging problem. Many relationship categories (e.g. half-sibships versus avuncular) can be difficult to distinguish without external information. Pedigree inference algorithms are often trained on European-descent families in urban locations. Thus, existing methods tend to perform poorly in endogamous populations, in which there may be reticulations within the pedigrees and elevated haplotype sharing. We present a simple, rapid algorithm that initially uses only high-confidence first-degree relationships to seed a machine learning step based on summary statistics of identity-by-descent (IBD) sharing. One of these statistics, our "haplotype score", is novel and can be used to: (1) distinguish half-sibling pairs from avuncular or grandparent-grandchild pairs; and (2) assign individuals to the ancestor versus descendant generation. We test our approach in a sample of ∼700 individuals from northern Namibia, sampled from an endogamous population called the Himba. Due to a culture of concurrent relationships in the Himba, there is a high proportion of half-sibships. We accurately identify first- through fourth-degree relationships and distinguish between various second-degree relationships: half-sibships, avuncular pairs, and grandparent-grandchild pairs. We further validate our approach in a second African-descent dataset, the Barbados Asthma Genetics Study (BAGS), and in a European-descent founder population from Quebec. Accurate reconstruction of relatives facilitates estimation of allele frequencies, tracing of allele trajectories, improved phasing, heritability estimation, and other population genomic analyses.
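
To illustrate why second-degree categories are hard to separate from genome-wide IBD proportions alone, the sketch below converts IBD1 and IBD2 proportions into a kinship coefficient and bins it by degree. It is not the algorithm described in this abstract: the function names are hypothetical, the cutoffs are the powers-of-two kinship boundaries commonly used for degree binning, and the input values are textbook expectations for outbred pedigrees rather than data from any of the cohorts named above.

```python
# Illustrative sketch only; NOT the authors' method or haplotype score.
# Function names are hypothetical and the inputs are theoretical expectations.

def kinship_from_ibd(ibd1: float, ibd2: float) -> float:
    """Kinship coefficient from the genome-wide proportions shared IBD1 and IBD2."""
    return ibd1 / 4 + ibd2 / 2


def degree_from_kinship(phi: float) -> str:
    """Bin a kinship coefficient using powers-of-two boundaries (an assumption)."""
    if phi > 2 ** -1.5:   # about 0.354
        return "duplicate/MZ twin"
    if phi > 2 ** -2.5:   # about 0.177
        return "first"
    if phi > 2 ** -3.5:   # about 0.088
        return "second"
    if phi > 2 ** -4.5:   # about 0.044
        return "third"
    return "more distant or unrelated"


# Expected (IBD1, IBD2) proportions in an outbred pedigree.
EXPECTED = {
    "parent-child": (1.0, 0.0),
    "full siblings": (0.5, 0.25),
    "half siblings": (0.5, 0.0),
    "avuncular": (0.5, 0.0),
    "grandparent-grandchild": (0.5, 0.0),
    "first cousins": (0.25, 0.0),
}

for pair, (ibd1, ibd2) in EXPECTED.items():
    phi = kinship_from_ibd(ibd1, ibd2)
    print(f"{pair}: kinship={phi:.4f}, degree={degree_from_kinship(phi)}")
```

Half-sibling, avuncular, and grandparent-grandchild pairs have identical expected proportions here, so a classifier built only on these totals cannot tell them apart; that is the gap an additional statistic has to fill. In an endogamous population, background sharing would also push observed proportions above these outbred expectations, so fixed cutoffs of this kind should be treated as a baseline only.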

Authors

  • Cole M Williams
    Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA.
  • Brooke A Scelza
    Department of Anthropology, University of California, Los Angeles, Los Angeles, CA 90095, USA.
  • Sarah D Slack
    Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO 80045, USA.
  • Neus Font-Porterias
    Department of Anthropology and the UCD Genome Center, University of California, Davis, Davis, CA 95616, USA.
  • Dana R Al-Hindi
    Department of Anthropology and the UCD Genome Center, University of California, Davis, Davis, CA 95616, USA.
  • Rasika A Mathias
    Genomics and Precision Health Section, Laboratory of Allergic Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA.
  • Harold Watson
    Faculty of Medical Sciences, The University of the West Indies, Queen Elizabeth Hospital, Bridgetown, St. Michael, Barbados.
  • Kathleen C Barnes
    Colorado Center for Personalized Medicine, University of Colorado-Anschutz, Aurora, CO 80045, USA.
  • Ethan Lange
    Colorado Center for Personalized Medicine, University of Colorado-Anschutz, Aurora, CO 80045, USA.
  • Randi K Johnson
    Colorado Center for Personalized Medicine, University of Colorado-Anschutz, Aurora, CO 80045, USA.
  • Christopher R Gignoux
    Colorado Center for Personalized Medicine, University of Colorado-Anschutz, Aurora, CO 80045, USA.
  • Sohini Ramachandran
    Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America.
  • Brenna M Henn
    Department of Anthropology and the UCD Genome Center, University of California, Davis, Davis, CA 95616, USA.
