Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods.

Journal: Artificial intelligence in medicine

Published Date: Apr 27, 2017

Abstract

BACKGROUND AND OBJECTIVES: Labeling instances by domain experts for classification is often time-consuming and expensive. To reduce such labeling efforts, we previously proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and showed that it significantly reduces labeling efforts. The use of any of three AL methods (one well known [SVM-Margin], and two that we introduced [Exploitation and Combination_XA]) significantly reduced (by 48% to 64%) condition labeling efforts, compared to standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue associated with learning methods, and AL methods in particular, is how best to use the labeling provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) has an effect on the intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers.
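The two generic ideas named in the abstract, majority consensus among labelers and margin-based instance selection, can be illustrated with a minimal sketch. This is not the CAESAR-ALE implementation: the function names, the "severe"/"mild" labels, and the tie-breaking rule are assumptions made for illustration, and the Exploitation and Combination_XA methods are not shown because the abstract does not define them. The margin rule follows the common description of SVM-Margin, which queries the unlabeled instances closest to the separating hyperplane.

```python
from collections import Counter


def majority_consensus(labels_per_labeler):
    """Return one consensus label per instance by majority vote.

    labels_per_labeler: list of equal-length label lists, one per labeler.
    Ties go to the label encountered first (an arbitrary, assumed rule).
    """
    consensus = []
    for votes in zip(*labels_per_labeler):
        # most_common(1) yields the most frequent label for this instance;
        # equal counts keep first-encountered order (Python 3.7+).
        consensus.append(Counter(votes).most_common(1)[0][0])
    return consensus


def select_by_margin(decision_scores, batch_size):
    """Return indices of the instances closest to the decision boundary.

    decision_scores: signed distances from a trained SVM's hyperplane,
    one per unlabeled instance (hypothetical input; computing them is
    outside this sketch).
    """
    ranked = sorted(range(len(decision_scores)),
                    key=lambda i: abs(decision_scores[i]))
    return ranked[:batch_size]


# Hypothetical labels from three labelers for three conditions.
labels = [
    ["severe", "mild", "mild"],
    ["severe", "severe", "mild"],
    ["mild", "severe", "mild"],
]
consensus_labels = majority_consensus(labels)

# Hypothetical signed distances for five unlabeled conditions.
query_indices = select_by_margin([1.5, -0.2, 0.05, -2.0, 0.4], batch_size=2)
```

In a passive (random instance-selection) baseline, the call to `select_by_margin` would be replaced by a random draw of `batch_size` indices from the unlabeled pool.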

Authors

Nir Nissim

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel. Electronic address: nirni@post.bgu.ac.il.

Yuval Shahar

Medical Informatics Research Center, Department of Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva, Israel.

Yuval Elovici

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

George Hripcsak

Department of Biomedical Informatics, Columbia University, 622 W 168th Street, PH20, New York, NY 10032, USA; Medical Informatics Services, NewYork-Presbyterian Hospital, 622 W 168th Street, PH20, New York, NY 10032, USA. Electronic address: hripcsak@columbia.edu.

Robert Moskovitch

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Department of Biomedical Informatics, Columbia University, New York, NY, USA.

Keywords

Area Under Curve; Data Mining; Electronic Health Records; Humans; Learning Curve; Observer Variation; Phenotype; Reproducibility of Results; Severity of Illness Index; Supervised Machine Learning; Time Factors

External Resources

PubMed ID: 28456512
