Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions

Journal: arXiv

Published Date: Feb 1, 2025

Abstract

Visual speech recognition remains an open research problem: without the auditory signal, a system must cope with visual ambiguities, inter-speaker variability, and the complex modeling of silence. Nonetheless, remarkable results have recently been achieved in the field thanks to the availability of large-scale databases and the use of powerful attention mechanisms. In addition, languages other than English are now a focus of interest. This paper presents advances in automatic continuous lipreading for Spanish. First, an end-to-end system based on the hybrid CTC/Attention architecture is presented. Experiments are conducted on two corpora of disparate nature, reaching state-of-the-art results that significantly improve on the best performance obtained to date for both databases. Next, a thorough ablation study examines how the different components of the architecture influence the quality of speech recognition. Then, a rigorous error analysis investigates the factors that could affect the learning of the automatic system. Finally, a new Spanish lipreading benchmark is consolidated. Code and trained models are available at https://github.com/david-gimeno/evaluating-end2end-spanish-lipreading.

Authors

David Gimeno-Gómez
Carlos-D. Martínez-Hinarejos

External Resources

View on arXiv (http://arxiv.org/abs/2502.00464v2)
