Acoustic Analysis of Primary Care Patient-provider Conversations to Screen for Cognitive Impairment
Journal:
medRxiv
Published Date:
Jan 1, 2025
Abstract
Cognitive impairment (CI) is often underdetected in primary care because of time and resource constraints. Passive analysis of clinical dialogue may offer an accessible approach to screening.
The objective was to assess whether audio recordings of patient–physician dialogue during routine primary care visits can be used to identify CI using acoustic speech features and machine learning (ML).
In this observational study of older primary care patients, visits were audio recorded using a microphone and a portable device. An external validation cohort was recruited in a separate city to assess the reproducibility of findings. The study was conducted in primary care practices in New York City, with additional participants recruited from primary care practices in Chicago, Illinois, for validation.
The study included 787 English-speaking patients aged 55 years and older with no documented history of dementia or mild CI. Eligible patients were recruited from primary care practices during routine visits. For validation, 179 patients meeting the same eligibility criteria were recruited from primary care practices in Chicago.
Multiple thirty-second speech segments were extracted from the recordings. Acoustic features were derived using foundation models (Whisper, HuBERT, Wav2Vec 2.0) and expert-defined methods (eGeMAPS, prosody). CI was defined as a Montreal Cognitive Assessment score ≥1.0 standard deviations below age- and education-adjusted norms. ML classifiers were trained to predict CI status from the audio recordings. We calculated the area under the receiver operating characteristic curve (AUC-ROC) and the maximum F1 score (Fmax) for identifying participants with CI.
The mean age was 66.8 years and 21% had CI. Models using Whisper-derived acoustic features performed best (AUC-ROC=0.733, 95% confidence interval [95%CI]=0.714-0.752; Fmax(CI)=0.504, 95%CI=0.474-0.534).
Results generalized to the external site with similar performance (AUC-ROC=0.727, 95%CI=0.714-0.740; Fmax(CI)=0.459, 95%CI=0.442-0.476). Model interpretation identified pitch, timing, and variability features as key predictors. When used for screening, the algorithm achieved a positive predictive value of 30.4% (95%CI=28.7%-32.1%), a sensitivity of 68.2% (95%CI=61.8%-74.6%), and a specificity of 63.6% (95%CI=59.8%-67.4%) on the holdout cohort.
ML models trained on acoustic features from brief clinical conversations identified CI with moderate discrimination (AUC-ROC of approximately 0.73). These findings support the feasibility of passive, speech-based screening during routine primary care.
Can acoustic features extracted from audio recordings of patient–physician conversations during routine primary care visits be used to screen for cognitive impairment?
In this study of 787 older adults without a diagnosis of cognitive problems, machine learning models trained on acoustic features from audio segments of primary care visit recordings achieved an area under the receiver operating characteristic curve of 0.72 for predicting cognitive impairment. The algorithm achieved a sensitivity of 83%, a specificity of 44%, and a positive predictive value of 28%, identifying a subset of primary care patients at higher risk for cognitive impairment. Models performed similarly on an external validation dataset of 179 participants. Interpretability analyses highlighted patient pause duration and energy-related features as salient indicators of cognitive status.
These findings suggest that short segments of naturalistic clinical dialogue may contain useful acoustic signals for passively screening patients for cognitive impairment.
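
The screening metrics reported above (sensitivity, specificity, positive predictive value, and Fmax) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy labels and scores are hypothetical, and Fmax is taken here to mean the largest F1 score over all candidate decision thresholds. The last function applies Bayes' rule to show how positive predictive value depends on prevalence. Plugging in the reported 83% sensitivity and 44% specificity, and assuming the 21% CI prevalence applies to the cohort in which they were measured, gives a value close to the reported 28%.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting CI for scores >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def screening_metrics(y_true, scores, threshold):
    """Sensitivity, specificity, and PPV at a single decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
    }


def f_max(y_true, scores):
    """Largest F1 score obtained over every distinct score used as a threshold."""
    best = 0.0
    for threshold in sorted(set(scores)):
        tp, fp, _, fn = confusion_counts(y_true, scores, threshold)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        best = max(best, f1)
    return best


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: share of positive screens that are true positives."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical toy data: 1 = CI, 0 = no CI; scores are model outputs in [0, 1].
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.5, 0.35]

toy_metrics = screening_metrics(toy_labels, toy_scores, threshold=0.5)
toy_fmax = f_max(toy_labels, toy_scores)

# Reported sensitivity and specificity, with the assumed 21% prevalence.
implied_ppv = ppv_from_prevalence(0.83, 0.44, 0.21)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why the same model can be reported at more than one operating point. At a prevalence near 21%, a specificity below about 65% keeps the positive predictive value low even when sensitivity is high.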