Enhancing and Disaggregating Native Hawaiian and Pacific Islander (NHPI) Data Using Natural Language Processing and an Expanded Race/Ethnicity Lexicon.
Journal:
Studies in health technology and informatics
Published Date:
Aug 7, 2025
Abstract
Native Hawaiian and Pacific Islander (NHPI) populations are often aggregated into broad racial categories, obscuring potential disparities. This study uses an expanded race/ethnicity lexicon and natural language processing (NLP) to identify documentation of NHPI subgroups, addressing gaps in the race recorded in electronic health records (EHRs). Results demonstrate the potential of NLP to classify NHPI documentation, disaggregate legacy categories, and improve health equity by incorporating more detailed subgroup data into standardized healthcare data sets.
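
The abstract does not describe the pipeline in detail, so the following is only a minimal sketch of how lexicon-based matching of this kind might look, not the authors' method. The lexicon entries, the function names (`compile_lexicon`, `find_subgroups`, `disaggregate`), and the fallback rule are all assumptions made for illustration. A real system would also need to handle negation, family-history mentions, and ambiguous terms.

```python
import re

# Hypothetical mini-lexicon (illustrative only, not the study's lexicon):
# subgroup label -> surface forms that might appear in free text.
LEXICON = {
    "Native Hawaiian": ["native hawaiian", "kanaka maoli"],
    "Samoan": ["samoan"],
    "Chamorro": ["chamorro", "chamoru"],
    "Tongan": ["tongan"],
    "Marshallese": ["marshallese"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, word-bounded regex per subgroup label."""
    patterns = {}
    for label, terms in lexicon.items():
        # Longest terms first so multi-word forms are preferred in the alternation.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[label] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_lexicon(LEXICON)


def find_subgroups(note_text):
    """Return the sorted subgroup labels that have at least one match in the text."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(note_text))


def disaggregate(recorded_race, note_text):
    """Assumed rule: prefer subgroups found in the text, else keep the recorded category."""
    found = find_subgroups(note_text)
    return found if found else [recorded_race]
```

For example, under these assumptions `disaggregate("Native Hawaiian or Other Pacific Islander", "Patient identifies as Samoan and Tongan.")` yields the two subgroup labels, while a note with no lexicon match leaves the aggregated category unchanged.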