DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease
Journal: arXiv
Published Date: Feb 6, 2025
Abstract
Alzheimer's Disease (AD) is an irreversible neurodegenerative disease
affecting 50 million people worldwide. Low-cost, accurate identification of key
markers of AD is crucial for timely diagnosis and intervention. Language
impairment is one of the earliest signs of cognitive decline and can be used
to distinguish AD patients from normal control individuals.
Patient-interviewer dialogues may be used to detect such impairments, but they
are often mixed with ambiguous, noisy, and irrelevant information, making the
AD detection task difficult. Moreover, the limited availability of AD speech
samples and variability in their speech styles pose significant challenges in
developing robust speech-based AD detection models. To address these
challenges, we propose DECT, a novel speech-based domain-specific approach
leveraging large language models (LLMs) for fine-grained linguistic analysis
and label-switched and label-preserved data generation. Our study makes four
contributions: (1) We harness the summarization capabilities of LLMs to
identify and distill key cognitive-linguistic information from noisy speech
transcripts, effectively filtering out irrelevant information. (2) We leverage
the inherent linguistic knowledge of LLMs to extract linguistic markers from
unstructured and heterogeneous audio transcripts. (3) We exploit the
compositional ability of LLMs to generate AD speech transcripts with diverse
linguistic patterns, overcoming the scarcity of speech data and improving the
robustness of AD detection models. (4) We fine-tune the AD detection model on
the augmented AD speech transcript dataset together with a more fine-grained
representation of the transcript data. Results show that DECT outperforms the
baselines, with an 11% improvement in AD detection accuracy on the datasets
from DementiaBank.
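
The abstract does not spell out how label-switched and label-preserved generation works, so the following is only a minimal sketch under stated assumptions. It assumes that "label-preserved" means rewriting a transcript while keeping its class, and that "label-switched" means rewriting it toward the opposite class and flipping the label. The names `Sample`, `build_prompt`, `augment`, and `placeholder_llm`, the prompt wording, and the example utterance are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical class names; the abstract only contrasts AD patients with normal controls.
AD = "AD"
CONTROL = "control"


@dataclass(frozen=True)
class Sample:
    transcript: str
    label: str
    origin: str = "original"  # "original", "label_preserved", or "label_switched"


def build_prompt(transcript: str, target_label: str) -> str:
    """Build an illustrative rewriting prompt (wording is assumed, not from the paper)."""
    if target_label == AD:
        style = "the language impairments associated with Alzheimer's disease"
    else:
        style = "the fluent language of a cognitively healthy speaker"
    return (
        f"Rewrite the transcript below so that it reflects {style}, "
        f"keeping the described content the same.\n\nTranscript: {transcript}"
    )


def augment(samples: List[Sample], llm: Callable[[str], str]) -> List[Sample]:
    """Return the originals plus one label-preserved and one label-switched variant each."""
    augmented = list(samples)
    for sample in samples:
        opposite = CONTROL if sample.label == AD else AD
        # Label-preserved: new wording, same class.
        preserved_text = llm(build_prompt(sample.transcript, sample.label))
        augmented.append(Sample(preserved_text, sample.label, "label_preserved"))
        # Label-switched: rewritten toward the other class, so the label flips.
        switched_text = llm(build_prompt(sample.transcript, opposite))
        augmented.append(Sample(switched_text, opposite, "label_switched"))
    return augmented


def placeholder_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline; it echoes the transcript."""
    return prompt.split("Transcript: ", 1)[1]


if __name__ == "__main__":
    # Made-up example utterance, not drawn from DementiaBank.
    data = [Sample("the boy is um taking the the cookie", AD)]
    for item in augment(data, placeholder_llm):
        print(item.origin, item.label, item.transcript)
```

In this sketch the LLM is passed in as a plain callable, so the placeholder can be swapped for a real model client without changing `augment`. Each original transcript yields two synthetic ones, which triples the dataset size. A real pipeline would likely also need a quality check on the generated text before fine-tuning.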