ReverBERT: A State Space Model for Efficient Text-Driven Speech Style Transfer

Journal: arXiv

Published Date: Mar 26, 2025

Abstract

Text-driven speech style transfer aims to mold the intonation, pace, and timbre of a spoken utterance to match stylistic cues from text descriptions. While existing methods leverage large-scale neural architectures or pre-trained language models, their computational costs often remain high. In this paper, we present ReverBERT, an efficient framework for text-driven speech style transfer that draws inspiration from the state space model (SSM) paradigm and is loosely motivated by the image-based method of Wang and Liu [wang2024stylemamba]. Unlike image-domain techniques, our method operates in the speech domain and integrates a discrete Fourier transform of latent speech features to enable smooth, continuous style modulation. We also propose a novel Transformer-based SSM layer that bridges textual style descriptors and acoustic attributes, dramatically reducing inference time while preserving high-quality speech characteristics. Extensive experiments on benchmark speech corpora demonstrate that ReverBERT significantly outperforms baselines in naturalness, expressiveness, and computational efficiency. We release our model and code publicly to foster further research in text-driven speech style transfer.
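The abstract names two ingredients without giving their equations: a state space sequence layer and a discrete Fourier transform (DFT) of latent speech features used for continuous style modulation. The Python sketch below illustrates generic versions of both, under loudly stated assumptions. The SSM is a plain discrete linear recurrence with a diagonal state matrix, h_t = a * h_{t-1} + b * u_t and y_t = c . h_t + d * u_t. The style modulation swaps in amplitude/phase recombination, a common Fourier-domain style trick: the content track keeps its phase while its amplitude spectrum is interpolated toward the style track by a continuous weight alpha. All function names, the scalar feature tracks, and the blending rule are hypothetical; the abstract does not say that ReverBERT works this way.

```python
import cmath
from typing import List, Sequence

# Hypothetical sketch: the function names, the diagonal-state SSM, and the
# amplitude/phase blending rule are illustrative assumptions, not ReverBERT's code.


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform of a real-valued track."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; keeps the real part, since the spectra built here are
    conjugate-symmetric (they derive from real-valued tracks)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def fourier_style_blend(content: Sequence[float], style: Sequence[float],
                        alpha: float) -> List[float]:
    """Move the amplitude spectrum of `content` toward that of `style` while
    keeping the content phase. alpha=0 reproduces `content`; alpha=1 gives the
    style amplitudes combined with the content phase."""
    if len(content) != len(style):
        raise ValueError("content and style tracks must have equal length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mixed = []
    for zc, zs in zip(dft(content), dft(style)):
        amplitude = (1.0 - alpha) * abs(zc) + alpha * abs(zs)
        mixed.append(cmath.rect(amplitude, cmath.phase(zc)))
    return idft(mixed)


def ssm_scan(u: Sequence[float], a: Sequence[float], b: Sequence[float],
             c: Sequence[float], d: float = 0.0) -> List[float]:
    """Discrete linear state space recurrence with a diagonal state matrix:
    h_t = a * h_{t-1} + b * u_t (elementwise), y_t = c . h_t + d * u_t."""
    h = [0.0] * len(a)
    outputs = []
    for u_t in u:
        h = [a_i * h_i + b_i * u_t for a_i, h_i, b_i in zip(a, h, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, h)) + d * u_t)
    return outputs


if __name__ == "__main__":
    content = [1.0, 3.0, 2.0, 5.0]  # toy latent feature track (made-up values)
    style = [0.5, -1.0, 4.0, 2.5]   # toy style track (made-up values)
    for alpha in (0.0, 0.5, 1.0):
        blended = fourier_style_blend(content, style, alpha)
        print(alpha, [round(v, 3) for v in ssm_scan(blended, a=[0.9], b=[0.1], c=[1.0])])
```

Blending amplitudes rather than raw complex coefficients is a deliberate choice here: because the DFT is linear, interpolating the complex spectra directly would be identical to interpolating the tracks in the time domain, so the Fourier step would add nothing. The naive O(N^2) DFT keeps the sketch dependency-free; a practical version would use an FFT such as `numpy.fft.rfft` and `numpy.fft.irfft`.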

Authors

Michael Brown
Sofia Martinez
Priya Singh

External Resources

View on arXiv (http://arxiv.org/abs/2503.20992v1)
