From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation.

Journal: Medical image analysis
Published Date:

Abstract

Natural Image Captioning (NIC) is an interdisciplinary research area at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from early template-based approaches to more recent deep learning-based methods. This paper surveys the area of NIC, focusing on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a broad overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications for real clinical practice. Finally, potential future research lines are outlined on the basis of the identified limitations.

Authors

  • Gabriel Reale-Nosei
    ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain. Electronic address: gabriel.reale.nosei@alumnos.upm.es.
  • Elvira Amador-Domínguez
    Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain. Electronic address: elvira.amador@upm.es.
  • Emilio Serrano
    ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain. Electronic address: emilio.serrano@upm.es.