Information extraction from medical case reports using OpenAI InstructGPT.

Journal: Computer methods and programs in biomedicine
Published Date:

Abstract

BACKGROUND AND OBJECTIVE: Researchers commonly use automated solutions such as Natural Language Processing (NLP) systems to extract clinical information from large volumes of unstructured data. However, the poor semantic structure and domain-specific vocabulary of clinical text can make it challenging to develop a one-size-fits-all solution. Large Language Models (LLMs), such as OpenAI's Generative Pre-Trained Transformer 3 (GPT-3), offer a promising approach to capturing and standardizing unstructured clinical information. This study evaluated the performance of InstructGPT, a family of models derived from the GPT-3 LLM, in extracting relevant patient information from medical case reports and discussed the advantages and disadvantages of LLMs versus dedicated NLP methods.

Authors

  • Veronica Sciannameo
    University of Padova, Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, Italy.
  • Daniele Jahier Pagliari
    Department of Control and Computer Engineering, Politecnico di Torino, Turin 10129, Italy.
  • Sara Urru
    Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy.
  • Piercesare Grimaldi
    Fuster Laboratory of Cognitive Neuroscience, Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, California, United States of America.
  • Honoria Ocagli
    Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy.
  • Sara Ahsani-Nasab
    Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy.
  • Rosanna Irene Comoretto
    Department of Public Health and Pediatrics, University of Torino, Via Santena 5 bis, Torino 10126, Italy.
  • Dario Gregori
    Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy.
  • Paola Berchialla
    Medical Statistics Unit, Department of Clinical and Biological Sciences, University of Torino, Italy.