A nested cross validation approach to machine learning model performance evaluation on a small dataset for Creutzfeldt-Jakob disease diagnosis.
Journal:
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PMID:
40039298
Abstract
The use of machine learning (ML) to diagnose neurological diseases has become increasingly popular. However, for some rare neurodegenerative diseases such as Creutzfeldt-Jakob disease (CJD), the traditional diagnosis relies on abnormal protease-resistant prion protein accumulation in post-mortem brain tissue as the definitive confirmation of the disease, so the possibility of early detection and intervention is very limited. One recent ML study has shown promise in aiding the identification of CJD with greater accuracy and efficiency using protein levels in cerebrospinal fluid, which can be obtained in vivo. However, rare disease diagnosis has "small data problems", which pose significant challenges to accurately diagnosing such diseases in real life. In this paper, taking the base performance of that model as a reference, we investigated how different nested cross validation (nCV) methods could be used to improve the performance of the model. We showed that the loop structure inherent to the nCV method relates to model performance in an intricate way, but that higher predictive power can be obtained without an overfitting issue. The systematic optimization of the loop structure for the nCV approach remains a topic of future research, but we showed a case where a small data problem, such as CJD diagnosis, can be addressed with respect to model performance improvement by employing nCV.
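The loop structure the abstract refers to can be illustrated with a minimal sketch of nested cross validation. This is not the authors' implementation: the 1-D synthetic data, the k-nearest-neighbour classifier, the fold counts, and all names (`kfold`, `knn_predict`, `nested_cv`, `k_grid`) are assumptions chosen only to show the idea. The inner loop selects a hyperparameter using the outer-training data alone, and the outer loop scores that choice on samples the selection step never saw.

```python
import random


def kfold(indices, n_folds, rng):
    """Shuffle indices and return a list of (train, test) index lists."""
    idx = list(indices)
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    return [([j for f in folds if f is not fold for j in f], fold)
            for fold in folds]


def knn_predict(X, y, train, query, k):
    """Majority vote among the k nearest training samples (1-D feature)."""
    nearest = sorted(train, key=lambda i: abs(X[i] - X[query]))[:k]
    votes = sum(y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(X, y, train, test, k):
    """Fraction of test samples classified correctly."""
    hits = sum(knn_predict(X, y, train, q, k) == y[q] for q in test)
    return hits / len(test)


def nested_cv(X, y, outer_folds=5, inner_folds=3, k_grid=(1, 3, 5), seed=0):
    """Return (mean outer accuracy, per-fold accuracies, chosen k per fold)."""
    rng = random.Random(seed)
    scores, chosen = [], []
    for outer_train, outer_test in kfold(range(len(X)), outer_folds, rng):
        # Inner loop: pick k using only the outer-training samples.
        inner_splits = kfold(outer_train, inner_folds, rng)

        def inner_score(k):
            return sum(accuracy(X, y, tr, va, k)
                       for tr, va in inner_splits) / len(inner_splits)

        best_k = max(k_grid, key=inner_score)
        chosen.append(best_k)
        # Outer loop: score the selected k on the held-out outer fold.
        scores.append(accuracy(X, y, outer_train, outer_test, best_k))
    return sum(scores) / len(scores), scores, chosen


# Hypothetical small dataset: 40 samples, two classes, one noisy feature.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [data_rng.gauss(2.0 * label, 1.0) for label in y]

mean_acc, fold_scores, chosen_k = nested_cv(X, y)
print("mean outer accuracy:", round(mean_acc, 3))
print("k selected in each outer fold:", chosen_k)
```

Changing `outer_folds` and `inner_folds` varies the loop structure. On a small dataset each choice trades off how much data is left for training against how noisy each held-out estimate is.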