Automatic text classification of prostate cancer malignancy scores in radiology reports using NLP models.

Journal: Medical & Biological Engineering & Computing
Published Date:

Abstract

This paper presents two automated text classification systems that classify prostate cancer findings according to the PI-RADS (Prostate Imaging Reporting and Data System) criteria. One is a traditional machine learning model built on XGBoost; the other is a language-model approach based on RoBERTa. The study focuses on Spanish-language prostate MRI radiology reports, a setting that had not been explored before. The results show that the RoBERTa model outperforms the XGBoost model, although both achieve promising results. The best-performing system was also integrated into the radiology company's information systems as an API, where it operates in a real-world environment.

Authors

  • Jaime Collado-Montañez
    Department of Computer Science, University of Jaén, Campus Las Lagunillas, s/n, 23071, Jaén, Spain. Electronic address: jcollado@ujaen.es.
  • Pilar López-Úbeda
    Universidad de Jaén, Jaén, Andalucía, Spain.
  • Mariia Chizhikova
    Department of Computer Science, University of Jaén, Campus Las Lagunillas, s/n, 23071, Jaén, Spain. Electronic address: mchizhik@ujaen.es.
  • M Carlos Díaz-Galiano
    Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Campus Las Lagunillas, Jaén, 23071, Spain.
  • L Alfonso Ureña-López
    SINAI Group - CEATIC - Universidad de Jaén, Campus Las Lagunillas s/n, Jaén E-23071, Spain. Electronic address: laurena@ujaen.es.
  • Teodoro Martín-Noguerol
    MRI Unit, Radiology Department, HT médica Carmelo Torres 2, Jaén 23007, Spain. Electronic address: t.martin.f@htime.org.
  • Antonio Luna
    MRI Unit, Radiology Department, Health Time, Jaén, Spain. Electronic address: aluna70@htime.org.
  • M Teresa Martín-Valdivia
    SINAI Group - CEATIC - Universidad de Jaén, Campus Las Lagunillas s/n, Jaén E-23071, Spain. Electronic address: maite@ujaen.es.