Deep learning for occupation recognition and knowledge discovery in rheumatology clinical notes.

Journal: Scientific reports
Published Date:

Abstract

Occupational data is a crucial social determinant of health, influencing diagnostic accuracy, treatment strategies, and policy-making in healthcare. However, its inclusion in electronic health records (EHR) is often relegated to unstructured fields. This study aims to assess the collection and use of occupation-related data in rheumatology clinical narratives, describe factors influencing its collection, and analyze its association with patient diagnoses. We employed a pre-trained Spanish language model fine-tuned with biomedical texts to identify occupation mentions in the EHR of 35,586 rheumatic patients. The model's performance was evaluated using a gold-standard dataset with precision, recall, and F1-score metrics. Occupation mentions were normalized using the European Skills, Competences, Qualifications, and Occupations (ESCO) classification. Logistic regression analyses identified sociodemographic and clinical predictors of occupation collection and examined associations between occupations and diagnoses. The model achieved an F1-score of 0.73, identifying valid occupation mentions in 3527 patients (10%). Normalization yielded 402 ESCO codes. Mechanical pathologies such as back pain and muscle disorders were associated with a higher probability of occupation collection, while professions like cleaners and helpers were linked to these conditions. Customer service clerks and hairdressers were associated with autoimmune diseases. This study demonstrates the feasibility of automated occupation recognition in EHRs, highlighting the relevance of occupational data as a social determinant of health in rheumatology. Integrating such data could inform targeted prevention and treatment strategies for rheumatic diseases.
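The abstract reports precision, recall, and F1-score against a gold-standard dataset but does not say how predicted mentions were matched to gold mentions. The following is a minimal sketch of those three metrics, assuming exact-match scoring at the mention level. The `Span` representation, the `span_prf` helper, and the toy spans are hypothetical illustrations and are not taken from the study.

```python
from typing import Set, Tuple

# Hypothetical mention representation: (note id, start offset, end offset).
Span = Tuple[int, int, int]


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching (an assumption)."""
    # A prediction counts as correct only if the identical span is in the gold set.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data for illustration only; these are not the study's annotations.
gold = {(1, 10, 18), (1, 40, 52), (2, 5, 14), (3, 0, 9)}
predicted = {(1, 10, 18), (2, 5, 14), (3, 2, 9), (4, 7, 15)}

precision, recall, f1 = span_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the strictest common choice. A relaxed scheme that credits overlapping spans would give higher scores on the same predictions, so a reported F1-score is only comparable across studies that use the same matching rule.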

Authors

  • Alfredo Madrid-García
    Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos, Prof. Martin Lagos s/n, Madrid 28040, Spain.
  • Inés Pérez-Sancristobal
    Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos (IdISSC), Prof. Martin Lagos s/n, 28040, Madrid, Spain.
  • Leticia León
    Rheumatology Department, Hospital Clínico San Carlos, and IdISSC, Madrid, Spain.
  • Lydia Abasolo
    Rheumatology Department, Hospital Clínico San Carlos, and IdISSC, Madrid, Spain.
  • Benjamín Fernandez-Gutierrez
    Rheumatology Department, Hospital Clínico San Carlos, and IdISSC, Madrid, Spain.
  • Luis Rodríguez-Rodríguez
    Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria San Carlos, Madrid, Spain.