MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.

Journal: Bioinformatics (Oxford, England)

Published Date: Sep 15, 2017

Abstract

MOTIVATION: The NCBI's Sequence Read Archive (SRA) promises great biological insight if its data can be analyzed in aggregate; however, the data remain largely underutilized, in part because the metadata associated with each sample are poorly structured. The rules governing submissions to the SRA do not dictate a standardized set of terms for describing the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable because of the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across the diverse diseases, tissues and cell types present in the SRA.

Authors

Matthew N Bernstein
Department of Computer Sciences.

AnHai Doan
Department of Computer Sciences.

Colin N Dewey
Department of Computer Sciences.

Keywords

Biological Ontologies; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Metadata; Sequence Analysis, DNA; Sequence Analysis, RNA; Software; Vocabulary, Controlled

External Resources

PubMed ID: 28535296
