RBA-FE: A Robust Brain-Inspired Audio Feature Extractor for Depression Diagnosis
Journal: arXiv
Published Date: Jun 8, 2025
Abstract
This article proposes a robust brain-inspired audio feature extractor
(RBA-FE) model for depression diagnosis, using an improved hierarchical network
architecture. Most deep learning models achieve state-of-the-art performance
on image-based diagnostic tasks while overlooking the complementary audio
features. To address the noise challenge, RBA-FE leverages six acoustic features
extracted from the raw audio, capturing both spatial characteristics and
temporal dependencies. This hybrid representation helps alleviate the limited
precision of audio feature extraction in other learning models, such as deep
residual shrinkage networks. To further handle noise, our model
incorporates an improved spiking neuron model, called adaptive rate smooth
leaky integrate-and-fire (ARSLIF). The ARSLIF model emulates the mechanism of
"retuning of cellular signal selectivity" in the brain's attention systems,
which enhances the model's robustness against environmental noise in audio data.
Experimental results demonstrate that RBA-FE achieves state-of-the-art performance
on the MODMA dataset, with a precision of 0.8750, accuracy of 0.8974, recall of
0.8750, and F1 score of 0.8750. Extensive experiments on the AVEC2014 and
DAIC-WOZ datasets also show improved noise robustness. Comparative analysis
further indicates that the ARSLIF neuron model suggests abnormal firing
patterns in feature extraction on depressive audio data, offering
brain-inspired interpretability.
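For readers unfamiliar with spiking neurons, the sketch below simulates a standard leaky integrate-and-fire (LIF) neuron, the baseline that ARSLIF builds on. It is not the ARSLIF model: the abstract does not specify the adaptive-rate or smoothing mechanisms, so they are deliberately omitted. The function name `lif_simulate` and its parameters (`tau`, `v_threshold`, `v_reset`, `resistance`) are hypothetical illustrations, integrated with a simple forward-Euler step of tau * dv/dt = -(v - v_rest) + R * I(t).

```python
from typing import List, Tuple


def lif_simulate(
    current: List[float],
    tau: float = 10.0,
    dt: float = 1.0,
    v_rest: float = 0.0,
    v_threshold: float = 1.0,
    v_reset: float = 0.0,
    resistance: float = 1.0,
) -> Tuple[List[float], List[int]]:
    """Simulate a plain (non-adaptive) leaky integrate-and-fire neuron.

    Hypothetical illustration of the LIF baseline only; it does NOT
    implement ARSLIF's adaptive rate or smoothing, which are not specified here.
    Returns the membrane potential after each step (post-reset) and a
    binary spike train of the same length as `current`.
    """
    v = v_rest
    trace: List[float] = []
    spikes: List[int] = []
    for i_t in current:
        # Forward-Euler step: leak toward v_rest while integrating the input.
        v = v + (dt / tau) * (-(v - v_rest) + resistance * i_t)
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after emitting a spike
        else:
            spikes.append(0)
        trace.append(v)
    return trace, spikes


if __name__ == "__main__":
    # A constant suprathreshold input produces regular firing.
    trace, spikes = lif_simulate([2.0] * 20)
    print(spikes)
```

With a constant input of 2.0 and the default parameters, the potential crosses the threshold on the seventh step and the neuron then fires periodically. An input of 0.5 settles at a steady-state potential of 0.5 and never fires. The adaptive and smoothing variants described in the abstract would modify this threshold-and-reset behavior, which is where the reported noise robustness is claimed to come from.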