ALE: automated label extraction from GEO metadata.

Journal: BMC bioinformatics

Published Date: Dec 28, 2017

Abstract

BACKGROUND: NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, the information describing each experiment (its metadata) is an open-ended, non-standardized textual description provided by the depositor. Consequently, experiments cannot be classified for meta-analysis by factors such as gender, age of the sample donor, or tissue of origin without first assigning labels to them. Automated approaches are preferable for this task, primarily because of the volume of data to be processed, but also because automation ensures standardization and consistency. While some of these labels can be extracted directly from the textual metadata, much of the available data contains no explicit text stating the age and gender of the study subjects. To bridge this gap, machine-learning methods can be trained on the gene expression patterns associated with the text-derived labels to refine label-prediction confidence.
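The text-derived labelling step described above can be illustrated with a minimal sketch. It assumes that sample characteristics arrive as "key: value" strings, as they often do in GEO sample records; the regular expressions and the function names `extract_gender`, `extract_age`, and `extract_labels` are hypothetical and are not ALE's actual extraction rules, which this abstract does not describe. The sketch covers only the text step. Samples for which it returns `None` are the cases the abstract proposes to handle with expression-based machine-learning models.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns (not ALE's rules): depositors phrase these fields
# inconsistently, so real extraction needs many more variants.
_GENDER_RE = re.compile(r"\b(?:gender|sex)\s*[:=]\s*(male|female|m|f)\b", re.IGNORECASE)
# Matches "age: 45", "Age = 45.5", "age (years): 62"; the leading \b
# prevents false hits inside words such as "stage" or "passage".
_AGE_RE = re.compile(
    r"\bage\s*(?:\(\s*(?:years?|yrs?)\s*\))?\s*[:=]\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_gender(text: str) -> Optional[str]:
    """Return 'male' or 'female' if stated explicitly, else None."""
    match = _GENDER_RE.search(text)
    if match is None:
        return None
    return "male" if match.group(1).lower() in ("male", "m") else "female"


def extract_age(text: str) -> Optional[float]:
    """Return the first explicitly stated numeric age, else None."""
    match = _AGE_RE.search(text)
    return float(match.group(1)) if match else None


def extract_labels(characteristics: List[str]) -> Dict[str, object]:
    """Join one sample's characteristic strings and pull out both labels."""
    text = "; ".join(characteristics)
    return {"gender": extract_gender(text), "age": extract_age(text)}


if __name__ == "__main__":
    print(extract_labels(["tissue: liver", "Sex: F", "age (years): 62"]))
    print(extract_labels(["cell line: HeLa"]))  # no explicit labels
```

For the first sample this returns `{'gender': 'female', 'age': 62.0}`. For the second it returns `{'gender': None, 'age': None}`, which illustrates the gap that motivates training classifiers on gene expression patterns.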

Authors

Cory B Giles

Arthritis & Clinical Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104, USA.
Chase A Brown

Arthritis & Clinical Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104, USA.
Michael Ripperger

Vanderbilt University Medical Center, Nashville, TN.
Zane Dennis

Department of Computer Science, Baylor University, Hankamer Academic Building, 105 Baylor Ave, Waco, TX, 76706, USA.
Xiavan Roopnarinesingh

Arthritis & Clinical Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104, USA.
Hunter Porter

Arthritis & Clinical Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104, USA.
Aleksandra Perz

Arthritis & Clinical Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104, USA.
Jonathan D Wren

Arthritis & Clinical Immunology Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, 73104, USA. jonathan-wren@omrf.org.

Keywords

Age Factors; Algorithms; Animals; Automation; Databases, Genetic; Female; Gene Expression; Gene Ontology; Humans; Machine Learning; Male; Metadata; Middle Aged; Molecular Sequence Annotation; Rats; Reference Standards

External Resources

PubMed ID: 29297276