A Dataset of Real and Synthetic Speech in Ukrainian.

Journal: Scientific Data
PMID:

Abstract

This work analyzes and evaluates DRSSU (A Dataset of Real and Synthetic Speech in Ukrainian), a dataset created to support research in natural language processing and speech recognition. The dataset is a collection of audio recordings containing both real and synthesized Ukrainian speech, and it offers new opportunities for improving machine learning algorithms for speech recognition and analysis. The research focuses on identifying statistically significant differences between generated and real speech, which is important for the further development of automatic speech recognition systems. The analysis demonstrates potential applications of the dataset in areas ranging from combating misinformation to supporting linguistic diversity and cultural heritage. The work emphasizes the importance of innovation in NLP and speech processing, with particular attention to technologies adapted to the Ukrainian language.

Authors

  • Khrystyna Lipianina-Honcharenko
    West Ukrainian National University, Ternopil, Ukraine. kh.lipianina@wunu.edu.ua.
  • Hennadii Bohuta
    West Ukrainian National University, Ternopil, Ukraine.
  • Adam Ivaniush
    West Ukrainian National University, Ternopil, Ukraine.
  • Mariana Soia
    West Ukrainian National University, Ternopil, Ukraine.