A Dataset of Real and Synthetic Speech in Ukrainian.
Journal:
Scientific Data
PMID:
40328782
Abstract
This work analyzes and evaluates DRSSU, a Dataset of Real and Synthetic Speech in Ukrainian, created to support research in natural language processing and speech recognition. The dataset is a collection of audio recordings containing both real and synthesized Ukrainian speech, and it offers new opportunities for improving machine learning algorithms for speech recognition and analysis. The research focuses on identifying statistically significant differences between generated and real speech, which matters for the further development of automatic speech recognition systems. The analysis demonstrates potential applications of the dataset in areas ranging from combating misinformation to supporting linguistic diversity and cultural heritage. The work emphasizes the importance of innovation in NLP and speech processing, with a particular focus on technologies adapted to the Ukrainian language.
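The abstract does not say which statistical tests were used to compare generated and real speech. As a rough illustration only, the sketch below shows one common way such a comparison could be set up: a two-sided permutation test on the difference of means of a single per-recording acoustic feature. The function name `permutation_test`, the feature, and the numbers are all hypothetical placeholders and are not taken from DRSSU.

```python
import random
from statistics import mean


def permutation_test(real, synthetic, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value for the null hypothesis that both
    groups come from the same distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(real) - mean(synthetic))
    pooled = list(real) + list(synthetic)
    n_real = len(real)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_real]) - mean(pooled[n_real:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Placeholder feature values (hypothetical, NOT measurements from DRSSU).
real_feature = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12]
synthetic_feature = [0.30, 0.31, 0.29, 0.33, 0.32, 0.30]

p_value = permutation_test(real_feature, synthetic_feature)
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is used here because it makes no assumption about the feature's distribution; a small p-value would suggest that the feature differs between the two groups in this toy setting.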