SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks.

Journal: Journal of Biomedical Semantics
Published Date:

Abstract

BACKGROUND: The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field.

Authors

  • Lucas Emanuel Silva e Oliveira
    Health Technology Program, Pontifical Catholic University of Paraná, Rua Imaculada Conceição, 1155 - Curitiba, Paraná, 80215-901, Brazil. kunkaweb@gmail.com.
  • Ana Carolina Peters
    Health Technology Program, Pontifical Catholic University of Paraná, Rua Imaculada Conceição, 1155 - Curitiba, Paraná, 80215-901, Brazil.
  • Adalniza Moura Pucca da Silva
    Health Technology Program, Pontifical Catholic University of Paraná, Rua Imaculada Conceição, 1155 - Curitiba, Paraná, 80215-901, Brazil.
  • Caroline Pilatti Gebeluca
    Health Technology Program, Pontifical Catholic University of Paraná, Rua Imaculada Conceição, 1155 - Curitiba, Paraná, 80215-901, Brazil.
  • Yohan Bonescki Gumiel
    Health Technology Program, Pontifical Catholic University of Paraná, Curitiba, PR, Brazil.
  • Lilian Mie Mukai Cintho
    Health Technology Program, Pontifical Catholic University of Paraná, Curitiba, PR, Brazil.
  • Deborah Ribeiro Carvalho
    Health Technology Program, Pontifical Catholic University of Paraná, Curitiba, PR, Brazil.
  • Sadid Al Hasan
    AI Lab, Philips Research North America, Cambridge, MA, USA.
  • Claudia Maria Cabral Moro
    Health Technology Program, Pontifical Catholic University of Paraná, Curitiba, PR, Brazil.