DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records.

Journal: Cancer research
Published Date:

Abstract

Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Cancer phenotypes are currently extracted and represented mostly by hand, which makes it difficult to correlate phenotypic data with genomic data. Genomic data are also being produced at an ever faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from the electronic medical records of cancer patients. The system implements advanced Natural Language Processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of breast cancer patients from the University of Pittsburgh Medical Center. The resulting platform provides critical, previously missing methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment.

Authors

  • Guergana K Savova
    Department of Pediatrics, Children's Hospital of Boston, Boston.
  • Eugene Tseytlin
    Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, USA.
  • Sean Finan
    Boston Children's Hospital Informatics Program, Boston, MA.
  • Melissa Castine
    Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States.
  • Timothy Miller
    School of Computing and Information Systems, University of Melbourne, Victoria 3010, Australia.
  • Olga Medvedeva
    Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania.
  • David Harris
    Boston Children's Hospital, Boston, Massachusetts.
  • Harry Hochheiser
    University of Pittsburgh, Pittsburgh, PA, USA.
  • Chen Lin
    Faculty of Business and Economics, University of Hong Kong, Hong Kong SAR 999077, China.
  • Girish Chavan
    Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA. chavang@upmc.edu.
  • Rebecca S Jacobson
    Department of Biomedical Informatics, University of Pittsburgh School of Medicine, The Offices at Baum, 5607 Baum Boulevard, BAUM 423, Rm 523, Pittsburgh, PA, 15206-3701, USA. rebeccaj@pitt.edu.