A Scoping Review of Algorithmic Equity, Data Diversity, and Inclusive Design in the Transformer Era of Clinical NLP
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
The rapid digitization of healthcare has positioned transformer-based natural language processing (NLP) models as powerful tools for managing clinical text. Yet their integration into practice raises unresolved questions of equity and inclusivity. This scoping review synthesizes 56 studies published between 2017 and 2024 to evaluate how equity is addressed across three dimensions: algorithmic equity, data diversity and representativeness, and participatory design. Guided by intersectionality and the Digital Health Equity framework, our analysis shows that most equity audits are post hoc and fragmented, with limited impact on model development. Persistent underrepresentation of linguistic, demographic, and clinical subgroups creates what we define as Data Diversity Debt, a structural liability that compounds over time. Participatory design was observed in only 11% of studies, revealing a critical gap in the inclusion of stakeholders other than clinicians. Fairness metrics were inconsistently defined, impeding comparability and accountability. To convert descriptive audits into structural repair, we translate these findings into an equity-by-design roadmap that embeds fairness, inclusivity, and accountability across the full lifecycle of healthcare NLP systems. We conclude that equity work must shift from reactive auditing to equity-by-design, integrating participatory governance, fairness-aware training objectives, and continuous monitoring to retire Data Diversity Debt and ensure that clinical NLP systems reduce rather than reproduce health disparities.