WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper

Journal: arXiv

Published Date: May 25, 2025

Abstract

Whisper fails to correctly transcribe dementia speech because persons with dementia (PwDs) often exhibit irregular speech patterns and disfluencies such as pauses, repetitions, and fragmented sentences. Whisper was trained on standard speech and may have had little or no exposure to dementia-affected speech. Yet accurate transcription of dementia speech is vital for cost-effective diagnosis and for the development of assistive technology. In this work, we fine-tune Whisper on the open-source DementiaBank dementia speech dataset and our in-house dataset to improve its word error rate (WER). The fine-tuning also covers filler words, which we evaluate with the filler inclusion rate (FIR) and F1 score. The fine-tuned models significantly outperformed the off-the-shelf models: the medium-sized model achieved a WER of 0.24, outperforming previous work. The fine-tuned models also generalised notably well to unseen data and speech patterns.
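The abstract reports results as WER and FIR without defining them. The sketch below shows one way to compute them and is not the authors' evaluation code. WER is the standard word-level edit distance, (substitutions + deletions + insertions) divided by the number of reference words. The FIR shown here is an assumption: it is taken to be the share of filler tokens in the reference that also appear in the hypothesis, matched as a multiset. The filler list FILLERS, the lowercase-and-split normalisation, and the function names are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical filler inventory; the paper's actual filler list is not stated here.
FILLERS = frozenset({"um", "uh", "er", "erm", "hmm"})


def _tokens(text: str) -> list[str]:
    # Minimal normalisation (assumption): lowercase and split on whitespace.
    # Punctuation is not stripped.
    return text.lower().split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = _tokens(reference), _tokens(hypothesis)
    if not ref:
        raise ValueError("WER is undefined for an empty reference")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # curr[0]: delete all i reference words
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or free match if equal
            )
        prev = curr
    return prev[-1] / len(ref)


def filler_inclusion_rate(reference: str, hypothesis: str, fillers=FILLERS) -> float:
    """Assumed FIR: share of reference filler tokens that also occur in the hypothesis.

    Fillers are matched as a multiset and word order is ignored; this is an
    illustrative assumption, not the paper's stated definition.
    """
    ref_fillers = Counter(w for w in _tokens(reference) if w in fillers)
    hyp_fillers = Counter(w for w in _tokens(hypothesis) if w in fillers)
    total = sum(ref_fillers.values())
    if total == 0:
        raise ValueError("FIR is undefined when the reference contains no fillers")
    matched = sum((ref_fillers & hyp_fillers).values())  # Counter & keeps min counts
    return matched / total


if __name__ == "__main__":
    ref = "the cat um sat on the mat"
    hyp = "the cat sat on a mat"  # filler dropped, "the" -> "a"
    print(round(word_error_rate(ref, hyp), 3))  # 0.286 (2 edits / 7 words)
    print(filler_inclusion_rate(ref, hyp))      # 0.0
```

In this example an off-the-shelf model that silently drops "um" is penalised twice: once in WER (a deletion) and fully in FIR. This is why a transcript can score reasonably on WER while missing the disfluencies that matter for dementia speech.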

Authors

Emmanuel Akinrintoyo
Nadine Abdelhalim
Nicole Salomons

External Resources

View on arXiv (http://arxiv.org/abs/2505.21551v1)
