Unsupervised machine learning method for indirect estimation of reference intervals for chronic kidney disease in the Puerto Rican population.

Journal: Scientific Reports
PMID:

Abstract

Reference intervals (RIs) for clinical laboratory values are essential for the diagnosis and treatment of patients. However, determining these ranges is costly and time-consuming. As a result, different unverified RIs are often used in practice for the same analyte, and the same range is applied to all patients despite evidence that the values depend on gender, age, and ethnicity. Moreover, abnormal flags are rudimentary, merely indicating whether a value is within the RI. At the same time, clinical lab data generated in everyday medical practice contain a wealth of information that, given the correct methodology, can help determine the RIs for each specific segment of the population, including populations that suffer from health disparities. In this work, we develop unsupervised machine learning methods, based on Gaussian mixtures, to determine RIs of analytes related to chronic kidney disease, using millions of routine lab results for the Puerto Rican population. We show that the measures are both gender and age dependent, and we find evidence for normal age-related organ function deterioration and failure. We also show that the joint distribution of measures improves the diagnostic value of the lab results.
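
The general idea of indirect RI estimation with a Gaussian mixture can be illustrated with a minimal sketch. The abstract does not give the authors' actual procedure, so everything below is an assumption made for illustration: a one-dimensional, two-component mixture fitted by expectation-maximization, synthetic data standing in for routine lab results, the larger component treated as the "healthy" subpopulation, and the RI taken as its central 95% range (mean ± 1.96 standard deviations). The function names and the numbers used to generate the data are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture with plain EM.

    Returns (weights, means, std_devs), each a list of length 2.
    """
    xs = sorted(values)
    n = len(xs)
    # Initialization (an illustrative choice): means at the 5th and 95th
    # percentiles, a shared overall standard deviation, equal weights.
    mu = [xs[n // 20], xs[(19 * n) // 20]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [sd_all, sd_all]
    weight = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and std dev of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
            weight[k] = nk / n
    return weight, mu, sigma


def reference_interval(weight, mu, sigma, z=1.96):
    """Central ~95% range of the dominant (assumed healthy) component."""
    k = 0 if weight[0] >= weight[1] else 1
    return mu[k] - z * sigma[k], mu[k] + z * sigma[k]


def synthetic_lab_values(n=2000, seed=0):
    """Hypothetical analyte values: 90% 'healthy', 10% 'pathological'."""
    rng = random.Random(seed)
    return [
        rng.gauss(0.9, 0.15) if rng.random() < 0.9 else rng.gauss(2.5, 0.6)
        for _ in range(n)
    ]


weight, mu, sigma = fit_two_gaussians(synthetic_lab_values())
low, high = reference_interval(weight, mu, sigma)
print(f"estimated reference interval: {low:.2f} to {high:.2f}")
```

In this toy setting the pathological values are absorbed by the second component, so they do not widen the interval estimated for the dominant one. Stratifying by gender and age, or modelling several analytes jointly as the abstract describes, would require fitting separate or multivariate mixtures, which this sketch does not attempt.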

Authors

  • Julian Velev
    Department of Physics, University of Puerto Rico, San Juan, PR, 00925-2537, USA. julian.velev@upr.edu.
  • Jack LeBien
    Rainforest Connection, Science Department, 440 Cobia Drive, Suite 1902, Katy, TX, 77494, USA.
  • Abiel Roche-Lima
    Center for Collaborative Research in Health Disparities (CCRHD), University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico.