Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development.

Journal: Drug Discovery Today
Published Date:

Abstract

External content sources such as MEDLINE®, National Institutes of Health (NIH) grants and conference websites provide access to the latest biomedical information, which can inform pipeline decisions at pharmaceutical and biotechnology companies. The value of these sites for industry, however, is limited by their reliance on the public internet, their limited synonym support, the rarity of batch-search capability and their disconnected nature. Fortunately, many sites now offer their content for download, and we have developed an automated internal workflow that uses text mining and tailored ontologies for programmatic search and knowledge extraction. We believe this efficient and secure approach gives companies that need the latest information a competitive advantage across a range of use cases, and that it complements manually curated commercial sources.
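The abstract does not describe how the tailored ontologies drive the search. Purely as an illustration, the following is a minimal Python sketch, assuming a toy in-memory ontology that maps a preferred concept name to its synonyms and a handful of records already downloaded from sources such as MEDLINE, NIH grants and conference sites. `ONTOLOGY`, `Record`, `compile_patterns` and `batch_search` are hypothetical names, the record IDs and texts are invented placeholder strings (not data from the paper), and plain regular-expression matching stands in for the full NLP text mining the workflow actually uses. It shows only the two capabilities the public sites lack: synonym expansion and batch searching across disconnected sources.

```python
import re
from dataclasses import dataclass

# Hypothetical toy ontology: preferred concept name -> known synonyms.
ONTOLOGY = {
    "BACE1": ["beta-secretase 1", "BACE-1", "memapsin-2"],
    "PCSK9": ["proprotein convertase subtilisin/kexin type 9", "NARC-1"],
}


@dataclass
class Record:
    """One locally downloaded document (hypothetical structure)."""
    source: str
    doc_id: str
    text: str


def compile_patterns(ontology):
    """Build one case-insensitive regex per concept covering all of its names."""
    patterns = {}
    for concept, synonyms in ontology.items():
        # Longest names first so the alternation prefers the fuller match.
        names = sorted([concept, *synonyms], key=len, reverse=True)
        alternation = "|".join(re.escape(name) for name in names)
        # Lookarounds stop "BACE1" from matching inside a longer token.
        patterns[concept] = re.compile(
            rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE
        )
    return patterns


def batch_search(records, ontology):
    """Search every concept against every record; return {concept: [(source, doc_id)]}."""
    patterns = compile_patterns(ontology)
    hits = {concept: [] for concept in ontology}
    for rec in records:
        for concept, pattern in patterns.items():
            if pattern.search(rec.text):
                hits[concept].append((rec.source, rec.doc_id))
    return hits


# Invented placeholder records from three disconnected sources.
records = [
    Record("MEDLINE", "pmid:1", "Inhibition of beta-secretase 1 reduced amyloid load."),
    Record("NIH grants", "grant:A", "This project characterizes NARC-1 variants."),
    Record("Conference", "abs:7", "A novel BACE-1 inhibitor entered the clinic."),
]
hits = batch_search(records, ONTOLOGY)
print(hits)
```

Because every concept is checked against every downloaded record in a single pass, adding a target or synonym to the ontology extends the batch search without rewriting individual queries; a real pipeline would swap the regex step for linguistically aware matching.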

Authors

  • Robin McEntire
    Knowledge Discovery/Knowledge Management, Merck & Co., USA. Electronic address: robin_mcentire@merck.com.
  • Debbie Szalkowski
    Knowledge Discovery/Knowledge Management, Merck & Co., USA.
  • James Butler
    Linguamatics Solutions, USA.
  • Michelle S Kuo
    Knowledge Discovery/Knowledge Management, Merck & Co., USA.
  • Meiping Chang
    Knowledge Discovery/Knowledge Management, Merck & Co., USA.
  • Man Chang
    Linguamatics Solutions, USA.
  • Darren Freeman
    Informatics IT, Merck & Co., USA.
  • Sarah McQuay
    Linguamatics Limited, UK.
  • Jagruti Patel
    Knowledge Discovery/Knowledge Management, Merck & Co., USA.
  • Michael McGlashen
    Knowledge Discovery/Knowledge Management, Merck & Co., USA.
  • Wendy D Cornell
    IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA.
  • Jinghai James Xu
    Knowledge Discovery/Knowledge Management, Merck & Co., USA.