Enhancing Healthcare Data Integration: A Machine Learning Approach to Harmonizing Laboratory Labels.

Journal: AMIA Joint Summits on Translational Science Proceedings
Published Date:

Abstract

Variations in laboratory test names across healthcare systems, stemming from inconsistent terminologies, abbreviations, misspellings, and assay vendors, pose significant challenges to the integration and analysis of clinical data. These discrepancies hinder interoperability and complicate efforts to extract meaningful insights for both clinical research and patient care. In this study, we propose a machine learning-driven solution, enhanced by natural language processing techniques, to standardize lab test names. By employing feature extraction methods that analyze both string similarity and the distributional properties of test results, we improve the harmonization of test names, resulting in a more robust dataset. Our model achieves 99% accuracy in matching lab names, showcasing the potential of AI-driven approaches in resolving long-standing standardization challenges. Importantly, this method enhances the reliability and consistency of clinical data, which is crucial for ensuring accurate results in large-scale clinical studies and improving the overall efficiency of informatics-based research and diagnostics.
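
As an illustration of the two feature families the abstract names (string similarity between test names and distributional properties of test results), the following is a minimal sketch, not the authors' implementation. The abstract does not say which measures were used, so the choices here are assumptions: `difflib.SequenceMatcher` stands in for the string-similarity measure, a two-sample Kolmogorov-Smirnov statistic and a mean gap stand in for the distributional comparison, and the function names `string_similarity`, `ks_statistic`, and `pair_features` are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def string_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; case and surrounding whitespace are ignored."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def ks_statistic(x: list[float], y: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every value <= v, then compare the CDFs at v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def pair_features(
    name_a: str, values_a: list[float], name_b: str, values_b: list[float]
) -> dict[str, float]:
    """Features for one candidate pair of lab labels (hypothetical feature set)."""
    return {
        "name_similarity": string_similarity(name_a, name_b),
        "ks_distance": ks_statistic(values_a, values_b),
        "mean_gap": abs(mean(values_a) - mean(values_b)),
    }


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    features = pair_features(
        "Hemoglobin A1c", [5.4, 6.1, 7.2, 5.9],
        "HGB A1C", [5.6, 6.0, 7.0, 6.3],
    )
    print(features)
```

In a setup like this, each candidate pair of lab labels would yield one feature vector, and a classifier trained on labeled matches would decide whether the two labels refer to the same test. The classifier itself is not shown because the abstract does not specify it.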

Authors

  • Mehmet F Bagci
    University of California San Diego, ECE Dept., La Jolla, CA 92093.
  • Samantha R Spierling
    Dept. of Research Development, Scripps Health, CA.
  • Anna L Ritko
    Dept. of Knowledge Management, Scripps Health, CA.
  • Truong Nguyen
    Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.
  • Brian D Modena
    University of California San Diego, ECE Dept., La Jolla, CA 92093.
  • Yusuf Ozturk
    San Diego State University, ECE Dept., San Diego, CA 92182.
