Cross-registry neural domain adaptation to extract mutational test results from pathology reports.

Journal: Journal of biomedical informatics

Abstract

OBJECTIVE: We study the performance of machine learning (ML) methods, including neural networks (NNs), in extracting mutational test results from pathology reports collected by cancer registries. Given the lack of hand-labeled datasets for mutational test result extraction, we focus on the particular use case of extracting Epidermal Growth Factor Receptor (EGFR) mutation results in non-small cell lung cancers. We explore how NNs generalize across registries, with two goals: (1) to assess how well models trained on one registry's data port to test data from a different registry, and (2) to assess whether, and to what extent, such models can be improved using state-of-the-art neural domain adaptation techniques under different assumptions about what data (labeled vs. unlabeled) is available at the target registry site.

Authors

  • Anthony Rios
    Department of Computer Science, University of Kentucky, 329 Rose Street, Lexington, KY 40506, USA. Electronic address: anthony.rios1@uky.edu.
  • Eric B Durbin
    University of Kentucky, Lexington, KY.
  • Isaac Hands
    University of Kentucky, Lexington, KY.
  • Susanne M Arnold
    University of Kentucky, Lexington, KY.
  • Darshil Shah
    University of Kentucky, Lexington, KY.
  • Stephen M Schwartz
    Fred Hutchinson Cancer Research Center, Seattle, WA.
  • Bernardo H L Goulart
    Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  • Ramakanth Kavuluru
    Div. of Biomedical Informatics, Dept. of Internal Medicine, Dept. of Computer Science, University of Kentucky, Lexington, KY.