An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C).

Journal: Journal of the American Medical Informatics Association : JAMIA
Published Date:

Abstract

Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These factors also dramatically increase the difficulty of developing NLP models in multi-site settings, a prerequisite for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.

Authors

  • Sijia Liu
    These authors contributed equally to this study and Dr. Li is now working at IBM; Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.
  • Andrew Wen
    Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA.
  • Liwei Wang
    Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
  • Huan He
    Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States.
  • Sunyang Fu
    Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, USA.
  • Robert Miller
    Department of Hand and Plastic Surgery, Chelsea and Westminster Hospital, London, UK.
  • Andrew Williams
    Tufts Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA.
  • Daniel Harris
    Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA.
  • Ramakanth Kavuluru
    Div. of Biomedical Informatics, Dept. of Internal Medicine, Dept. of Computer Science, University of Kentucky, Lexington, KY.
  • Mei Liu
    Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA.
  • Noor Abu-El-Rub
    Division of Medical Informatics, University of Kansas Medical Center, Kansas City, Kansas, USA.
  • Dalton Schutte
    Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA.
  • Rui Zhang
    Department of Cardiology, Zhongda Hospital, Medical School of Southeast University, Nanjing, China.
  • Masoud Rouhizadeh
    Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
  • John D Osborne
    Center for Clinical and Translational Science, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294 ozborn@uab.edu.
  • Yongqun He
    University of Michigan Medical School, Ann Arbor, MI 48109 USA; Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1301 MSRB III, 1150 W. Medical Dr., Ann Arbor, MI 48109 USA.
  • Umit Topaloglu
    Clinical Translational Research Informatics Branch, National Cancer Institute, Bethesda, US.
  • Stephanie S Hong
    Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
  • Joel H Saltz
  • Thomas Schaffter
    Computational Oncology, Sage Bionetworks, Seattle, Washington.
  • Emily Pfaff
    University of North Carolina, Chapel Hill, NC, USA.
  • Christopher G Chute
  • Tim Duong
    Department of Radiology, Albert Einstein College of Medicine, Bronx, New York, USA.
  • Melissa A Haendel
    Library, Oregon Health & Science University, Portland, OR 97239, USA.
  • Rafael Fuentes
    Alex Informatics, North Bethesda, Maryland, USA.
  • Peter Szolovits
    Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Hua Xu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Hongfang Liu
    Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States.