Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance.

Journal: JCO Clinical Cancer Informatics

Abstract

PURPOSE: Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier that labels abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure.

Authors

  • Zhengyi Deng
    Massachusetts General Hospital, Boston, MA.
  • Kanhua Yin
    Massachusetts General Hospital, Boston, MA.
  • Yujia Bao
    The Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Victor Diego Armengol
    Massachusetts General Hospital, Boston, MA.
  • Cathy Wang
    Harvard TH Chan School of Public Health, Boston, MA.
  • Ankur Tiwari
    Massachusetts General Hospital, Boston, MA.
  • Regina Barzilay
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. Email: regina@csail.mit.edu.
  • Giovanni Parmigiani
    Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215; gp@jimmy.harvard.edu.
  • Danielle Braun
    Harvard TH Chan School of Public Health, Boston, MA.
  • Kevin S Hughes
    Division of Surgical Oncology, MGH, Boston, USA.