ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

The association of organisms with their environments is a key issue in exploring biodiversity patterns. This knowledge has traditionally been scattered, but textual descriptions of taxa and their habitats are now being consolidated in centralized resources. However, structured annotations are needed to facilitate large-scale analyses. Therefore, we developed ENVIRONMENTS, a fast dictionary-based tagger capable of identifying Environment Ontology (ENVO) terms in text. We evaluate the accuracy of the tagger on a new manually curated corpus of 600 Encyclopedia of Life (EOL) species pages. We use the tagger to associate taxa with environments by tagging EOL text content monthly, and integrate the results into the EOL to disseminate them to a broad audience of users.
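To make the idea of dictionary-based tagging concrete, the following is a minimal sketch and not the ENVIRONMENTS implementation. It assumes a toy dictionary that maps lower-cased names and synonyms to term identifiers, and it prefers the longest match at each position. The identifiers, the `DICTIONARY` contents, and the `tag` function are hypothetical placeholders introduced only for illustration.

```python
import re

# Toy dictionary: lower-cased name or synonym -> term identifier.
# ASSUMPTION: these identifiers are placeholders, not real ENVO accessions.
DICTIONARY = {
    "coral reef": "ENVO:PLACEHOLDER_1",
    "reef": "ENVO:PLACEHOLDER_2",
    "rain forest": "ENVO:PLACEHOLDER_3",
    "rainforest": "ENVO:PLACEHOLDER_3",
    "lake": "ENVO:PLACEHOLDER_4",
}

# Longest dictionary entry, measured in words, bounds the lookup window.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)
TOKEN = re.compile(r"[A-Za-z]+")


def tag(text):
    """Return (start, end, matched_text, term_id) tuples in text order.

    At each token the longest dictionary entry is tried first, so
    "coral reef" wins over the shorter entry "reef".
    """
    tokens = [(m.start(), m.end()) for m in TOKEN.finditer(text)]
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if candidate in DICTIONARY:
                matches.append((start, end, text[start:end], DICTIONARY[candidate]))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no entry starts at this token
    return matches


if __name__ == "__main__":
    for hit in tag("Found on coral reef slopes and in a Lake."):
        print(hit)
```

This sketch matches exact word forms only, so a plural such as "lakes" is not found; handling of inflection, ambiguous names, and stop lists is left out here.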

Authors

  • Evangelos Pafilis
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Sune P Frankild
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Julia Schnetzer
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Lucia Fanini
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Sarah Faulwetter
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Christina Pavloudi
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Katerina Vasileiadou
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Patrick Leary
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Jennifer Hammock
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Katja Schulz
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Cynthia Sims Parr
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Christos Arvanitidis
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.
  • Lars Juhl Jensen
    Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece, Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark, Max Planck Institute for Marine Microbiology, Bremen, Germany, Jacobs University gGmbH, School of Engineering and Sciences, Bremen, Germany, Marine Biological Laboratory, Woods Hole, MA 02543, USA and National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA.