Toward Non-Invasive Voice Restoration: A Deep Learning Approach Using Real-Time MRI

Journal: medRxiv
Published Date:

Abstract

Despite recent advances in brain–computer interfaces (BCIs) for speech restoration, existing systems remain invasive, costly, and inaccessible to individuals with congenital mutism or neurodegenerative disease. We present a proof-of-concept pipeline that synthesizes personalized speech directly from real-time magnetic resonance imaging (rtMRI) of the vocal tract, without requiring acoustic input. A Pix2Pix conditional generative adversarial network (GAN) maps segmented rtMRI frames to articulatory class representations. A convolutional neural network (CNN) that models the articulatory-to-acoustic relationship then transforms these representations into synthetic audio waveforms. The synthesized waveforms are rendered as audio and evaluated with speaker-similarity metrics derived from Resemblyzer embeddings. Although preliminary, our results suggest that even silent articulatory motion encodes sufficient information to approximate a speaker’s vocal characteristics. This offers a non-invasive direction for future speech restoration in individuals who have lost their voice or never developed one.
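The abstract does not state how Resemblyzer embeddings are turned into a speaker-similarity score. A common choice, assumed here rather than taken from the paper, is the cosine similarity between the embedding of a synthesized utterance and a reference embedding of the target speaker. The minimal sketch below uses only the Python standard library. `cosine_similarity` and `mean_speaker_similarity` are hypothetical helper names, and embeddings are treated as plain sequences of floats.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length.

    Assumption: the speaker-similarity metric is cosine similarity; the
    abstract does not specify the exact metric used.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    # Normalizing explicitly keeps the score valid even for unnormalized vectors.
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(
    reference: Sequence[float], synthesized: Sequence[Sequence[float]]
) -> float:
    """Average similarity of each synthesized utterance to a reference voice.

    Hypothetical summary statistic: averaging is one simple aggregation,
    not necessarily the evaluation protocol used in the paper.
    """
    if not synthesized:
        raise ValueError("need at least one synthesized embedding")
    scores = [cosine_similarity(reference, emb) for emb in synthesized]
    return sum(scores) / len(scores)
```

In practice each embedding would presumably come from Resemblyzer, for example `VoiceEncoder().embed_utterance(preprocess_wav(path))`, which returns a fixed-length NumPy vector that these functions accept directly. Scores closer to 1 indicate closer agreement with the reference voice.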

Authors

  • Mohamad Saleh