Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting.

Journal: Journal of Speech, Language, and Hearing Research: JSLHR
PMID:

Abstract

PURPOSE: Many studies using machine learning (ML) in speech, language, and hearing sciences rely upon cross-validations with single data splitting. This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust data splitting method of nested k-fold cross-validation. The second purpose is to present methods and MATLAB code to perform power analysis for ML-based analysis during the design of a study.
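
As an illustration of the data splitting scheme named above, the following is a minimal Python sketch of how nested k-fold cross-validation partitions sample indices. It is not the authors' MATLAB code; the function names (`k_fold_indices`, `nested_k_fold`) and the default fold counts are assumptions chosen for the example. The point it shows is that each outer test fold is held out entirely, and the inner train/validation splits used for model selection are drawn only from the remaining samples.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Striding by k gives fold sizes that differ by at most one.
    return [shuffled[i::k] for i in range(k)]


def nested_k_fold(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (train, validation) index pairs built
    only from the outer training portion, so the outer test fold is
    never seen during hyperparameter tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer test fold.
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, validation in enumerate(inner_folds):
            train = [idx for n, fold in enumerate(inner_folds)
                     if n != m for idx in fold]
            inner_splits.append((train, validation))
        yield outer_test, inner_splits


if __name__ == "__main__":
    for outer_test, inner_splits in nested_k_fold(20, outer_k=5, inner_k=4):
        sizes = [(len(tr), len(va)) for tr, va in inner_splits]
        print(f"outer test size={len(outer_test)}, inner (train, val) sizes={sizes}")
```

In a typical use of this scheme, the inner splits would select hyperparameters, the chosen model would be refit on the full outer training portion, and performance would be reported only on the outer test fold.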

Authors

  • Hamzeh Ghasemzadeh
    Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston.
  • Robert E Hillman
    Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston.
  • Daryush D Mehta
    Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston.