Is a score enough? Pitfalls and solutions for AI severity scores.

Journal: European Radiology Experimental
Published Date:

Abstract

Severity scores, which often refer to the likelihood or probability of a pathology, are commonly provided by artificial intelligence (AI) tools in radiology. However, little attention has been given to the use of these AI scores, and there is a lack of transparency into how they are generated. In this comment, we draw on key principles from psychological science and statistics to elucidate six human factors limitations of AI scores that undermine their utility: (1) variability across AI systems; (2) variability within AI systems; (3) variability between radiologists; (4) variability within radiologists; (5) unknown distribution of AI scores; and (6) perceptual challenges. We hypothesize that these limitations can be mitigated by providing the false discovery rate and false omission rate for each score as a threshold. We discuss how this hypothesis could be empirically tested.

Key Points

  • The radiologist-AI interaction has not been given sufficient attention.
  • The utility of AI scores is limited by six key human factors limitations.
  • We propose a hypothesis for how to mitigate these limitations by using false discovery rate and false omission rate.
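
The proposal to report the false discovery rate (FDR) and false omission rate (FOR) for each score treated as a threshold can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the convention that a score at or above the threshold counts as AI-positive, and the toy scores and labels are all assumptions made for illustration. Here FDR = FP / (TP + FP) is the share of AI-positive cases without the pathology, and FOR = FN / (TN + FN) is the share of AI-negative cases with the pathology.

```python
from typing import List, Optional, Tuple


def fdr_for_at_threshold(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[Optional[float], Optional[float]]:
    """Return (FDR, FOR) when scores >= threshold are called AI-positive.

    labels are ground truth: 1 = pathology present, 0 = absent.
    A rate is None when its denominator is zero (no cases on that side).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= threshold:      # AI-positive (assumed convention)
            if label == 1:
                tp += 1
            else:
                fp += 1
        else:                       # AI-negative
            if label == 1:
                fn += 1
            else:
                tn += 1
    fdr = fp / (tp + fp) if (tp + fp) > 0 else None
    false_omission = fn / (tn + fn) if (tn + fn) > 0 else None
    return fdr, false_omission


def fdr_for_table(
    scores: List[float], labels: List[int]
) -> List[Tuple[float, Optional[float], Optional[float]]]:
    """Compute (threshold, FDR, FOR) using every distinct score as a threshold."""
    return [
        (t, *fdr_for_at_threshold(scores, labels, t))
        for t in sorted(set(scores))
    ]


# Hypothetical toy data: ten cases with AI scores and ground-truth labels.
toy_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
toy_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

for threshold, fdr, false_omission in fdr_for_table(toy_scores, toy_labels):
    print(threshold, fdr, false_omission)
```

With the toy data, a threshold of 50 labels six cases AI-positive, of which two are false positives (FDR = 2/6), and four cases AI-negative, of which one is a false negative (FOR = 1/4). In practice both rates depend on the prevalence of the pathology in the population used to compute them, so a table like this would need to come from data representative of the setting where the score is used.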

Authors

  • Michael H Bernstein
    School of Public Health, Brown University, Providence, RI, United States.
  • Marly van Assen
    Division of Cardiovascular Imaging, Department of Radiology and Radiological Science, Medical University of South Carolina, Ashley River Tower, 25 Courtenay Dr, Charleston, SC 29425-2260.
  • Michael A Bruno
    Departments of Radiology and Medicine, Penn State Milton S. Hershey Medical Center, Hershey, PA.
  • Elizabeth A Krupinski
    Department of Radiology and Imaging Sciences, Emory School of Medicine, Atlanta, GA, USA. Electronic address: elizabeth.anne.krupinski@emory.edu.
  • Carlo De Cecco
    Department of Radiology and Imaging Sciences, Emory School of Medicine, Emory University, Atlanta, USA.
  • Grayson L Baird
    Department of Diagnostic Imaging, Warren Alpert School of Medicine at Brown University, Rhode Island Hospital, 593 Eddy St, APC 701, Providence, RI 02903.