Quality assurance of the gene ontology using abstraction networks.

Journal: Journal of bioinformatics and computational biology
Published Date:

Abstract

The gene ontology (GO) is used extensively in the field of genomics. As with other large and complex ontologies, quality assurance (QA) of GO's content can be laborious and time-consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics are introduced, based on the identification of groups of anomalous terms that exhibit certain taxonomy-defined characteristics. Such groups are expected to have higher error rates than other terms. Thus, by focusing QA efforts on anomalous terms, one would expect to find relatively more erroneous content. By automatically identifying these potential problem areas within an ontology, time and effort can be saved during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. The results of this QA effort show that the proposed heuristics can expose different kinds of inconsistencies in the modeling of GO. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.

Authors

  • Christopher Ochs
    New Jersey Institute of Technology, Newark, NJ.
  • Yehoshua Perl
    Dept of Computer Science, NJIT, Newark, NJ, USA.
  • Michael Halper
    New Jersey Institute of Technology, Newark, NJ.
  • James Geller
    Dept of Computer Science, NJIT, Newark, NJ, USA.
  • Jane Lomax
    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.