Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning.

Journal: Journal of Digital Imaging

Published Date: Feb 1, 2019

Abstract

Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality but has a high rate of unnecessary recalls and biopsies. Deep learning can be applied to mammography, but it requires large-scale labeled datasets, which are difficult to obtain. We aim to remove many barriers to dataset development by automatically harvesting data from existing clinical records using a hybrid framework that combines traditional natural language processing (NLP) and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, or negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier (NLC). Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters, with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters, with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
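
The evaluation described in the abstract (precision, recall, and a weighted average F-measure across the four outcome classes, reported separately for reports under and over 1024 characters) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the row layout of `(text, true_label, predicted_label)`, and the treatment of a report of exactly 1024 characters as "long" are assumptions made for illustration only.

```python
from collections import Counter

# The four outcome classes named in the abstract.
LABELS = ("left positive", "right positive", "bilateral positive", "negative")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f_measure, support)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        support = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure, support)
    return scores


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Average the per-class F-measures, weighting each class by its support."""
    scores = per_class_scores(y_true, y_pred, labels)
    total = sum(support for *_, support in scores.values())
    if total == 0:
        return 0.0
    return sum(f * support for _, _, f, support in scores.values()) / total


def split_by_length(rows, limit=1024):
    """Split (text, true_label, predicted_label) rows at the character limit.

    Assumption: a report of exactly `limit` characters counts as long.
    """
    short = [row for row in rows if len(row[0]) < limit]
    long_ = [row for row in rows if len(row[0]) >= limit]
    return short, long_
```

Under these assumptions, one would call `split_by_length` on the annotated reports and then pass each group's true and predicted labels to `weighted_f_measure` to obtain one score per length group.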

Authors

Hari M Trivedi

Division of Emergency Radiology, Emory University School of Medicine, Atlanta, Georgia.

Maryam Panahiazar

Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA.

April Liang

University of California School of Medicine, San Francisco, CA, USA.

Dmytro Lituiev

From the Department of Radiology and Biomedical Imaging (Y.D., J.H.S., H.T., R.H., N.W.J., T.P.C., M.S.A., C.M.A., S.C.B., R.R.F., S.Y.H., Y.S., R.A.H., M.H.P., B.L.F.) and Institute for Computational Health Sciences (J.H.S., M.G.K., H.T., D.L., K.A.Z., D.H.), University of California, San Francisco, 550 Parnassus Ave, San Francisco, CA 94143; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, Calif (Y.D.); and Department of Radiology, University of California, Davis, Sacramento, Calif (L.N.).

Peter Chang

Department of Urology, Beth Israel Deaconess Medical Center, Boston, MA, USA.

Jae Ho Sohn

Radiology & Biomedical Imaging, UCSF Medical Center, 505 Parnassus Ave, San Francisco, CA, 94158, USA. sohn87@gmail.com.

Yunn-Yi Chen

Department of Pathology, University of California, San Francisco, CA, USA.

Benjamin L Franc

From the Department of Radiology and Biomedical Imaging (Y.D., J.H.S., H.T., R.H., N.W.J., T.P.C., M.S.A., C.M.A., S.C.B., R.R.F., S.Y.H., Y.S., R.A.H., M.H.P., B.L.F.) and Institute for Computational Health Sciences (J.H.S., M.G.K., H.T., D.L., K.A.Z., D.H.), University of California, San Francisco, 550 Parnassus Ave, San Francisco, CA 94143; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, Calif (Y.D.); and Department of Radiology, University of California, Davis, Sacramento, Calif (L.N.).

Bonnie Joe

Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.

Dexter Hadley

Institute for Computational Health Sciences, University of California, San Francisco.

Keywords

Breast; Breast Neoplasms; Databases, Factual; Deep Learning; Electronic Health Records; Female; Humans; Image Interpretation, Computer-Assisted; Mammography; Middle Aged

External Resources

PubMed ID: 30128778
