Patient2Sentence: Semantic Compression of Clinical Trial Eligibility Using Large Language Models

Journal: medRxiv

Abstract

Clinical decision-making generates large volumes of unstructured data that remain underused for clinical trial recruitment. We present Patient2Sentence (P2S), a framework that transforms electronic health records into compact natural-language representations to enable automated eligibility screening for oncology trials. We generated 25 synthetic patient records derived from each of three completed breast cancer trials (KATHERINE, MONARCH, and OLYMPIA; 75 patients in total) and compared eligibility classifications made from the full records with those made from their condensed “patient sentences.” Sentence-level and full-record decisions showed a mean concordance of 93.2% (95% CI 89.8–96.6%; Cohen’s κ = 0.91), while the patient sentences required approximately 67% fewer tokens. The compression therefore preserved semantic fidelity for eligibility decisions while reducing computational cost roughly threefold. By encoding heterogeneous clinical data into compact natural-language form, P2S offers a reproducible and efficient approach to patient–trial matching, with potential applications across diverse clinical decision-support systems.
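The agreement and compression figures above can be illustrated with a short sketch. It assumes each patient receives a binary eligible/ineligible label from both the full record and the patient sentence, and it uses the standard two-rater Cohen's kappa formula; how the authors pooled results across trials is not described here. The label lists and token counts are hypothetical illustrations, not study data, and the cost comparison assumes cost scales linearly with token count.

```python
from typing import Sequence


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Fraction of patients for whom both classifications agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Two-rater Cohen's kappa for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    # Marginal probability that each source labels a patient "eligible".
    pa = sum(a) / n
    pb = sum(b) / n
    # Agreement expected by chance: both eligible or both ineligible.
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1:
        raise ValueError("kappa is undefined when both sources give one constant label")
    return (p_o - p_e) / (1 - p_e)


def token_reduction(full_tokens: int, sentence_tokens: int) -> tuple[float, float]:
    """Return (fractional token reduction, full-to-sentence token ratio)."""
    reduction = 1 - sentence_tokens / full_tokens
    ratio = full_tokens / sentence_tokens
    return reduction, ratio


# Hypothetical labels for 10 patients (True = eligible); they differ at index 5 only.
full_record_labels = [True, True, False, True, False, False, True, True, False, True]
sentence_labels = [True, True, False, True, False, True, True, True, False, True]

if __name__ == "__main__":
    print(f"agreement = {percent_agreement(full_record_labels, sentence_labels):.2f}")
    print(f"kappa     = {cohens_kappa(full_record_labels, sentence_labels):.3f}")
    # Hypothetical token counts chosen to give a 67% reduction.
    reduction, ratio = token_reduction(3000, 990)
    print(f"reduction = {reduction:.0%}, ratio = {ratio:.2f}x")
```

In this toy example the sources agree on 9 of 10 patients (90%), but kappa is lower (about 0.78) because it discounts the agreement expected by chance from each source's eligibility rate. The token arithmetic also shows why a 67% reduction corresponds to roughly a threefold saving: the sentences keep about one third of the tokens, and 1 / 0.33 ≈ 3.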

Authors

  • Gerson Hiroshi Yoshinari Júnior; William Caetano Silva Goulart; Ana Beatriz Oliveira Urbano; Maressa Mouty Rabello; Sanderson Oliveira Macedo

Categories