MediAlbertina: An European Portuguese medical language model.

Journal: Computers in biology and medicine

Abstract

BACKGROUND: Patient medical information often exists as unstructured text containing abbreviations and acronyms, which save time and space but pose challenges for automated interpretation. Given the effectiveness of Transformers in natural language processing, our objective was to continue the pre-training of an existing language model, building on the knowledge it had already acquired, to develop a European Portuguese (PT-PT) healthcare-domain language model.

Authors

  • Miguel Nunes
    ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026, Lisbon, Portugal.
  • João Boné
    Select Data, Anaheim, CA, 92807, USA.
  • João C Ferreira
    Department of Logistics, Molde University College, Molde, 6410, Norway; ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026, Lisbon, Portugal; Inov Inesc Inovação - Instituto de Novas Tecnologias, 1000-029, Lisbon, Portugal.
  • Pedro Chaves
    Select Data, Anaheim, CA, 92807, USA.
  • Luis B Elvas
    Department of Logistics, Molde University College, Molde, 6410, Norway; ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026, Lisbon, Portugal; Inov Inesc Inovação - Instituto de Novas Tecnologias, 1000-029, Lisbon, Portugal. Electronic address: luis.m.elvas@himolde.no.