Patient2Sentence: Semantic Compression of Clinical Trial Eligibility Using Large Language Models

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Clinical decision-making generates vast unstructured data that remain underexploited for trial recruitment. We present Patient2Sentence (P2S), a framework that transforms electronic health records into language-based representations to enable automated eligibility screening for oncology trials. Using synthetic patient records derived from three completed breast cancer studies (KATHERINE, MONARCH, and OLYMPIA), we created 25 virtual patients per trial and compared eligibility classification between full records and their condensed “patient sentences.” P2S achieved a mean concordance of 93.2% (95% CI 89.8–96.6%; Cohen’s κ = 0.91) between sentence-level and full-record decisions while reducing token usage by ~67%. This compression preserved semantic fidelity and reduced computational cost approximately threefold. By encoding heterogeneous clinical data into compact natural-language form, P2S provides a reproducible and efficient approach to patient-trial matching, with potential applications across diverse clinical decision-support systems.
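The agreement statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes each patient receives one categorical decision from the full record and one from the patient sentence. The function names and the eight toy labels are hypothetical and are not the study's data or code; only the ~67% token reduction comes from the abstract.

```python
from collections import Counter

def concordance(a, b):
    """Fraction of patients on which the two decision lists agree."""
    if len(a) != len(b) or not a:
        raise ValueError("decision lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = concordance(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the marginal label frequencies of each rater.
    p_e = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

def cost_ratio(token_reduction):
    """Cost factor implied by a fractional token reduction, assuming cost scales with tokens."""
    return 1.0 / (1.0 - token_reduction)

# Hypothetical toy decisions (not from the paper): one disagreement in eight.
full_record = ["eligible", "eligible", "ineligible", "ineligible",
               "eligible", "ineligible", "eligible", "ineligible"]
sentence = ["eligible", "eligible", "ineligible", "ineligible",
            "ineligible", "ineligible", "eligible", "ineligible"]

print(concordance(full_record, sentence))   # 0.875
print(cohens_kappa(full_record, sentence))  # 0.75
print(round(cost_ratio(0.67), 2))           # 3.03
```

Under the assumption that cost is proportional to token count, a 67% reduction leaves about one third of the tokens, which is consistent with the roughly threefold cost reduction stated in the abstract.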

Authors

Gerson Hiroshi Yoshinari Júnior; William Caetano Silva Goulart; Ana Beatriz Oliveira Urbano; Maressa Mouty Rabello; Sanderson Oliveira Macedo

