Indirect reference interval estimation using a convolutional neural network with application to cancer antigen 125.

Journal: Scientific reports
PMID:

Abstract

Indirect methods for reference interval (RI) estimation, which use data acquired from routine pathology testing, have the potential to accelerate the establishment of RIs that account for variables such as gender and age, thereby improving clinical assessments. However, they require more sophisticated methods of analysis because raw clinical datasets may include results from pathological patients. In this paper, we develop a novel convolutional neural network (CNN) model trained on synthetic data to identify underlying healthy distributions within pathological admixtures. We present both the methodology to generate synthetic data and the CNN model. We evaluate the CNN using two synthetic test datasets, including samples from a proposed benchmark for indirect methods (RIBench), and show significant improvements compared to refineR, the reported state-of-the-art method on that benchmark. We also demonstrate a real-world application of the model, estimating age-specific RIs for cancer antigen 125 (CA-125), a crucial biomarker for ovarian cancer diagnostics. Our results show that CA-125 RIs are strongly age-dependent, which could have important diagnostic consequences.
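
The problem the abstract describes, recovering a healthy distribution from a pathological admixture, can be illustrated with a minimal sketch. This is not the authors' data generator or model: the log-normal shapes, all parameter values, and the names `reference_interval` and `make_admixture` are assumptions chosen only for illustration. The sketch builds one synthetic sample with a known healthy RI (the central 95% interval) and shows how a naive percentile estimate on the mixed data is distorted by the pathological component.

```python
import random
import statistics


def reference_interval(values):
    """Central 95% interval: the 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points, 2.5% apart
    return cuts[0], cuts[-1]


def make_admixture(n=5000, pathological_fraction=0.2, seed=0):
    """Hypothetical generator: healthy and pathological log-normal components.

    Returns the shuffled mixed sample and the RI of the healthy part alone,
    which serves as the ground truth an indirect method should recover.
    """
    rng = random.Random(seed)
    n_path = round(n * pathological_fraction)
    # Assumed parameters; the pathological component is shifted upward.
    healthy = [rng.lognormvariate(2.5, 0.4) for _ in range(n - n_path)]
    pathological = [rng.lognormvariate(4.0, 0.6) for _ in range(n_path)]
    mixed = healthy + pathological
    rng.shuffle(mixed)
    return mixed, reference_interval(healthy)


mixed, true_ri = make_admixture()
naive_ri = reference_interval(mixed)  # ignores the pathological admixture
print("healthy RI:", true_ri)
print("naive RI on mixed data:", naive_ri)
```

Under these assumed parameters, the naive upper limit lies well above the healthy upper limit, which is the bias that indirect methods aim to remove.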

Authors

  • Jack LeBien
    Rainforest Connection, Science Department, 440 Cobia Drive, Suite 1902, Katy, TX, 77494, USA.
  • Julian Velev
    Department of Physics, University of Puerto Rico, San Juan, PR, 00925-2537, USA. julian.velev@upr.edu.
  • Abiel Roche-Lima
    Center for Collaborative Research in Health Disparities (CCRHD), University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico.