Proceedings of the National Academy of Sciences of the United States of America
Sep 25, 2020
Although we know many sequence-specific transcription factors (TFs), how the DNA sequence of cis-regulatory elements is decoded and orchestrated on the genome scale to determine immune cell differentiation is beyond our grasp. Leveraging a granular a...
BACKGROUND: Colon cancer is one of the leading causes of cancer deaths in the USA and around the world. Molecular-level characteristics, such as gene expression levels and mutations, may provide profound information for precision treatment apart from path...
Transcription factors (TFs) regulate the gene expression of their target genes by binding to the regulatory sequences of target genes (e.g., promoters and enhancers). To fully understand gene regulatory mechanisms, it is crucial to decipher the relat...
Conversion between cell types, e.g., by induced expression of master transcription factors, holds great promise for cellular therapy. Our ability to manipulate cell identity is constrained by incomplete information on cell identity genes (CIGs) and t...
Biochimica et biophysica acta. Molecular basis of disease
Apr 28, 2020
Lung cancer is one of the most common cancer types worldwide and causes more than one million deaths annually. Lung adenocarcinoma (AC) and lung squamous cell cancer (SCC) are two major lung cancer subtypes and have different characteristics in sever...
BACKGROUND: Alzheimer's disease (AD) is a neurodegenerative disorder characterized by cognitive impairment. It is essential to identify potential gene biomarkers for AD pathology.
Unsupervised machine learning that can discover novel knowledge from big sequence data without prior knowledge or particular models is highly desirable for current genome study. We previously established a batch-learning self-organizing map (BLSOM) f...
BACKGROUND: Interactions between protein and nucleic acid molecules are essential to a variety of cellular processes. A large amount of interaction data generated by high-throughput technologies has triggered the development of several computational...
Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systema...
The decoding of transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF labels into the same space. By training on binding data f...