Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait.

Journal: BMC Plant Biology

Abstract

BACKGROUND: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is available as structured information in databases, and only that fraction can be analyzed using traditional data analysis tools. Natural language processing (NLP) is frequently and successfully used to support researchers by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analysis. For this pilot study, we developed a pipeline that applies NLP to biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes.

Authors

  • Gurnoor Singh
    Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Evangelia A Papoutsoglou
    Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Frederique Keijts-Lalleman
    IBM Netherlands, Amsterdam, The Netherlands.
  • Bilyana Vencheva
    IBM Netherlands, Amsterdam, The Netherlands.
  • Mark Rice
    IBM Netherlands, Amsterdam, The Netherlands.
  • Richard G F Visser
    Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Christian W B Bachem
    Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands.
  • Richard Finkers
    Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, The Netherlands. richard.finkers@wur.nl.