Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings.

Journal: BMC medical informatics and decision making
Published Date:

Abstract

BACKGROUND: Despite the vagueness and uncertainty intrinsic to any medical act, interpretation, and decision (including acts of data reporting and the representation of relevant medical conditions), little research has focused on how to take this uncertainty into account explicitly. In this paper, we focus on the representation of a general and widespread medical terminology, grounded in a traditional and well-established convention, that is used to represent the severity of health conditions (for instance, pain or visible signs) on a scale ranging from Absent to Extreme. Specifically, we study how both potential patients and doctors perceive the different levels of the terminology in quantitative and qualitative terms, and whether the embedded user knowledge could improve the representation of ordinal values in the construction of machine learning models.

Authors

  • Andrea Seveso
    Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126, Italy.
  • Andrea Campagner
    IRCCS Istituto Ortopedico Galeazzi, Via Riccardo Galeazzi, 4, 20161, Milano, Italy. Electronic address: a.campagner@campus.unimib.it.
  • Davide Ciucci
    Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126, Italy.
  • Federico Cabitza
    Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy.