G-quadruplexes (GQs) are non-canonical DNA structures encoded by G-flipons with potential roles in gene regulation and chromatin structure. Here, we explore the role of G-flipons in tissue specification. We present a deep learning-based framework for...
Using millions of methylation segments, we developed DiffuCpG, a generative artificial intelligence (AI) diffusion model designed to solve the critical challenge of missing data in high-throughput methylation technologies. DiffuCpG goes beyond conven...
Structural variations (SVs) are diverse forms of genetic alteration that drive a wide range of human diseases. Accurately genotyping SVs, particularly those occurring in repetitive genomic regions, from short-read sequencing data remains challenging. Here,...
A comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. The Gene Ontology Consortium has been working towards this g...
Sequence-based deep learning models have emerged as powerful tools for deciphering the cis-regulatory grammar of the human genome but cannot generalize to unobserved cellular contexts. Here, we present EpiBERT, a multi-modal transformer that learns g...
The Homo sapiens Chromosomal Location Ontology (HSCLO) is designed to facilitate the integration of human genomic features from releases GRCh37 and GRCh38 into biomedical knowledge graphs at multiple resolutions. HSCLO comprises two distinct versions...
Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic r...
Annual Review of Genomics and Human Genetics
Aug 6, 2024
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the develo...
Genomic sequences are traditionally represented as strings of characters: A (adenine), C (cytosine), G (guanine), and T (thymine). However, an alternative approach involves depicting sequence-related information through image representations, such as...
Small insertions and deletions (indels) are critical yet challenging genetic variations with significant clinical implications. However, distinguishing pathogenic indels from neutral variants in clinical contexts remains an understudied proble...