Automatic Labeling of Special Diagnostic Mammography Views from Images and DICOM Headers.

Journal: Journal of digital imaging
Published Date:

Abstract

Applying state-of-the-art machine learning techniques to medical images requires thorough selection and normalization of input data. One such step in digital mammography screening for breast cancer is the labeling and removal of special diagnostic views, in which diagnostic tools or magnification are applied to assist in the assessment of suspicious initial findings. Because a common task in medical informatics is the prediction of disease and its stage, these special diagnostic views, which are enriched only among the cohort of diseased cases, will bias machine learning disease predictions. To automate this process, we develop a machine learning pipeline that uses both DICOM headers and images to predict such views, allowing for their removal and the generation of unbiased datasets. We achieve an AUC of 99.72% in predicting special mammogram views when combining both types of models. Finally, we apply these models to clean up a dataset of about 772,000 images with an expected sensitivity of 99.0%. The pipeline presented in this paper can be applied to other datasets to obtain high-quality image sets suitable for training disease detection algorithms.
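
The idea of combining a header-based model with an image-based model and then choosing an operating point for a target sensitivity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state how the two models are combined, so simple averaging of the two probabilities stands in for that step, and the function names and toy scores are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence


def combine_scores(header_scores: Sequence[float],
                   image_scores: Sequence[float]) -> List[float]:
    """Average two models' probabilities per image (assumed combination rule)."""
    if len(header_scores) != len(image_scores):
        raise ValueError("score lists must have the same length")
    return [(h + i) / 2.0 for h, i in zip(header_scores, image_scores)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_for_sensitivity(labels: Sequence[int],
                              scores: Sequence[float],
                              target: float) -> float:
    """Largest threshold t such that flagging score >= t reaches the target sensitivity."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive example")
    # Number of positives that must be flagged to reach the target.
    k = max(1, math.ceil(target * len(pos)))
    return pos[k - 1]


if __name__ == "__main__":
    # Toy data: 1 = special diagnostic view, 0 = standard view (hypothetical).
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    header = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.5, 0.3]
    image = [0.7, 0.9, 0.6, 0.2, 0.1, 0.3, 0.2, 0.4]

    combined = combine_scores(header, image)
    print("header AUC:  ", roc_auc(labels, header))
    print("image AUC:   ", roc_auc(labels, image))
    print("combined AUC:", roc_auc(labels, combined))

    t = threshold_for_sensitivity(labels, combined, target=0.75)
    flagged = [s >= t for s in combined]
    print("threshold:", t, "flagged for removal:", sum(flagged))
```

In practice the threshold would be chosen on a held-out labeled set, and images scoring at or above it would be excluded from the training data.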

Authors

  • Dmytro S Lituiev
    Institute for Computational Health Sciences, University of California, San Francisco, 550 16th Street, San Francisco, CA, USA.
  • Hari Trivedi
    Department of Radiology, Medical College of Georgia at Augusta University, 1120 15th St, Augusta, GA 30912 (Y.T.); and Department of Radiology, Emory University, Atlanta, Ga (B.V., E.K., A.P., J.G., N.S., H.T.).
  • Maryam Panahiazar
    Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA.
  • Beau Norgeot
    Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA.
  • Youngho Seo
    Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, California.
  • Benjamin Franc
    Department of Radiology and Biomedical Imaging, UCSF, San Francisco, California.
  • Roy Harnish
    From the Department of Radiology and Biomedical Imaging (Y.D., J.H.S., H.T., R.H., N.W.J., T.P.C., M.S.A., C.M.A., S.C.B., R.R.F., S.Y.H., Y.S., R.A.H., M.H.P., B.L.F.) and Institute for Computational Health Sciences (J.H.S., M.G.K., H.T., D.L., K.A.Z., D.H.), University of California, San Francisco, 550 Parnassus Ave, San Francisco, CA 94143; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, Calif (Y.D.); and Department of Radiology, University of California, Davis, Sacramento, Calif (L.N.).
  • Michael Kawczynski
    Institute for Computational Health Sciences, University of California, San Francisco, 550 16th Street, San Francisco, CA, USA.
  • Dexter Hadley
    Institute for Computational Health Sciences, University of California, San Francisco.