Generating realistic artificial human genomes using adversarial autoencoders.

Journal: NAR Genomics and Bioinformatics

Published Date: Jul 24, 2025

Abstract

A publicly available human genome is both valuable to researchers and a risk for its donor. Many actors could exploit it to extract information about the health of the donor or of the donor's relatives. Recent efforts have used artificial intelligence models to simulate genomic data, aiming to create synthetic datasets with scientific merit while preserving patient anonymity. Challenges arise from the vast amount of data that constitutes a complete human genome and from the computational resources required to process it. We present a dimension-reduction method that combines artificial intelligence with our knowledge of mutation association mechanisms. This approach makes it possible to process large amounts of data without significant computational resources. Our genome segmentation follows chromosomal recombination hotspots, closely resembling the mechanisms by which mutations are transmitted. Data from the 1000 Genomes Project are used to train variational autoencoders with a Wasserstein GAN to generate novel data in a two-step process. After optimization, our pipeline can generate a simulated population that meets several essential criteria. The generated genomes are diverse yet realistic: the newly generated combinations of mutations follow the linkage disequilibrium found in humans. Our pipeline does not reveal the genetic identity of any individual donor, because the synthesized genomes differ from the reference samples.
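The abstract names three ideas that are easy to make concrete: splitting variants into segments bounded by recombination hotspots, measuring linkage disequilibrium (LD) between variants, and checking that a synthetic genome is not a copy of any reference sample. The sketch below illustrates each in plain Python. It is not the authors' pipeline; it assumes haplotypes encoded as 0/1 allele vectors, a sorted list of hotspot positions supplied by the user, LD measured as the standard squared correlation r², and a Hamming-distance nearest-neighbour check. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def split_at_hotspots(positions: Sequence[int], hotspots: Sequence[int]) -> List[List[int]]:
    """Group variant column indices into segments bounded by hotspot positions.

    positions: sorted base-pair positions of variants on one chromosome.
    hotspots: sorted base-pair positions of recombination hotspots (assumed input).
    Each returned segment could then be encoded by its own small model.
    """
    segments: List[List[int]] = [[] for _ in range(len(hotspots) + 1)]
    for idx, pos in enumerate(positions):
        # bisect_right counts the hotspots at or before pos,
        # which is the index of the segment that contains pos.
        segments[bisect_right(hotspots, pos)].append(idx)
    # Drop segments with no variants between two consecutive hotspots.
    return [seg for seg in segments if seg]


def ld_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two 0/1 allele columns across haplotypes."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic site: r^2 is undefined, reported as 0 here
    return cov * cov / (var_a * var_b)


def min_hamming(synthetic: Sequence[int], references: Sequence[Sequence[int]]) -> int:
    """Fewest differing sites between one synthetic haplotype and any reference haplotype."""
    return min(sum(s != r for s, r in zip(synthetic, ref)) for ref in references)


if __name__ == "__main__":
    print(split_at_hotspots([100, 250, 900, 1500, 2200], [500, 2000]))  # [[0, 1], [2, 3], [4]]
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
    print(min_hamming([0, 1, 1], [[0, 1, 0], [1, 0, 0]]))  # 1
```

In a comparison of this kind, one would compute r² for variant pairs within each segment in both the real and the synthetic populations and compare the two LD patterns, while a minimum Hamming distance of zero would flag a synthetic haplotype that reproduces a donor exactly.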

Authors

Callum Burnard

Institut de Génétique Humaine, 34094 Montpellier, France.

Alban Mancheron

Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, 34095 Montpellier, France.

William Ritchie

Institut de Génétique Humaine (IGH-UMR9002), Centre National de la Recherche Scientifique (CNRS), University of Montpellier, Montpellier, France. william.ritchie@igh.cnrs.fr.

Keywords

Algorithms; Artificial Intelligence; Autoencoder; Genome, Human; Genomics; Humans; Linkage Disequilibrium; Mutation

External Resources

PubMed ID: 40708851
