A Dataset of Real and Synthetic Speech in Ukrainian.

Journal: Scientific Data

PMID: 40328782

Abstract

This work analyzes and evaluates DRSSU (A Dataset of Real and Synthetic Speech in Ukrainian), a dataset created to support research in natural language processing and speech recognition. The dataset is a collection of audio recordings containing both real and synthesized Ukrainian speech, intended to help improve machine learning algorithms for speech recognition and analysis. The research focuses on identifying statistically significant differences between generated and real speech, which matters for the further development of automatic speech recognition systems. The analysis demonstrates potential applications of the dataset in areas ranging from combating misinformation to supporting linguistic diversity and cultural heritage. The work emphasizes the importance of innovation in NLP and speech processing, with particular attention to technologies adapted to the Ukrainian language.
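The abstract does not say which statistical procedure was used to compare generated and real speech. As a rough illustration only, the sketch below runs a two-sided permutation test on the difference of group means for a single per-recording feature. The function name `permutation_test`, the feature values, and the choice of test are all assumptions made for this example; none of them are taken from the paper or from DRSSU.

```python
import random
from statistics import mean


def permutation_test(real, synthetic, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Hypothetical helper for illustration; not the authors' method.
    """
    rng = random.Random(seed)
    observed = abs(mean(real) - mean(synthetic))
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign recordings to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_real]) - mean(pooled[n_real:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Made-up feature values for illustration only (NOT measurements from DRSSU).
real_feature = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4, 5.2]
synthetic_feature = [6.0, 6.3, 5.9, 6.1, 6.4, 5.8, 6.2, 6.0]

p_value = permutation_test(real_feature, synthetic_feature)
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is used here because it makes no distributional assumption about the feature; a real analysis would need to pick features and tests suited to the recordings themselves.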

Authors

Khrystyna Lipianina-Honcharenko
West Ukrainian National University, Ternopil, Ukraine. kh.lipianina@wunu.edu.ua

Hennadii Bohuta
West Ukrainian National University, Ternopil, Ukraine.

Adam Ivaniush
West Ukrainian National University, Ternopil, Ukraine.

Mariana Soia
West Ukrainian National University, Ternopil, Ukraine.

Keywords

Humans; Language; Machine Learning; Natural Language Processing; Speech; Speech Recognition Software; Ukraine

