A dataset for classifying phrases and sentences into statements, questions, or exclamations based on sound pitch.
Journal: Data in Brief
Published Date: Jun 24, 2025
Abstract
Speech is the most fundamental and sophisticated channel of human communication, and advances in Natural Language Processing (NLP) have substantially improved the quality of human-computer interaction. In particular, a new wave of deep learning methods has significantly advanced speech recognition by capturing fine-grained acoustic cues, including pitch, an acoustic feature that can be critical to understanding communicative intent. Pitch variation is particularly important for prosodic classification tasks (i.e., distinguishing statements, questions, and exclamations), which are crucial in tonal and low-resource languages such as Kurdish, where intonation carries significant semantic information. This paper presents the Statements, Questions, or Exclamations Based on Sound Pitch (SQEBSP) dataset, which contains 12,660 professionally recorded speech audio clips from 431 native Kurdish speakers residing in the Kurdistan Region of Iraq. Each speaker articulated 10 new phrases for each of the three prosodic categories: statements, questions, and exclamations. All utterances were digitized at 16 kHz and then manually checked for correct pitch-based classification. The dataset represents the three classes equally, with about 4,200 samples per class, and includes metadata such as speaker gender, age group, and sentence identifiers. The original audio files, along with resources such as Mel-Frequency Cepstral Coefficients (MFCCs) and waveform visualizations, are available on Mendeley Data. The dataset supports the development and testing of pitch-based speech classification algorithms and furthers work on pronunciation modelling for under-resourced languages. It also aids in developing speech technologies that are sensitive to dialects.
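To make the idea of a pitch-based cue concrete, the following is a minimal sketch of how a pitch contour and its overall slope might be computed from a 16 kHz clip. It is not the authors' pipeline: the function names (`estimate_f0`, `pitch_contour`, `contour_slope`, `synth_glide`), the 75 to 400 Hz search range, and the 40 ms frame length are all assumptions chosen for illustration, and the input here is a synthetic glide rather than a file from the dataset.

```python
import math

SAMPLE_RATE = 16_000  # Hz, the sampling rate stated for the dataset


def estimate_f0(frame, sample_rate=SAMPLE_RATE, fmin=75.0, fmax=400.0):
    """Estimate one frame's fundamental frequency (Hz) by autocorrelation.

    Returns None for a silent frame. fmin and fmax are assumed search
    bounds, not values taken from the paper.
    """
    mean = sum(frame) / len(frame)
    centered = [x - mean for x in frame]
    if sum(x * x for x in centered) == 0.0:
        return None
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(centered) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(centered[i] * centered[i + lag]
                    for i in range(len(centered) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def pitch_contour(samples, frame_len=640, hop=320):
    """Return F0 estimates for overlapping frames, skipping silent ones."""
    contour = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        f0 = estimate_f0(samples[start:start + frame_len])
        if f0 is not None:
            contour.append(f0)
    return contour


def contour_slope(contour):
    """Least-squares slope of the contour, in Hz per frame."""
    n = len(contour)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(contour) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(contour))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def synth_glide(f_start, f_end, duration=0.4, sample_rate=SAMPLE_RATE):
    """Synthetic tone whose frequency moves linearly from f_start to f_end."""
    total = int(duration * sample_rate)
    out = []
    for n in range(total):
        t = n / sample_rate
        phase = 2 * math.pi * (f_start * t
                               + (f_end - f_start) * t * t / (2 * duration))
        out.append(math.sin(phase))
    return out


if __name__ == "__main__":
    for label, clip in [("rising", synth_glide(150.0, 250.0)),
                        ("falling", synth_glide(250.0, 150.0))]:
        slope = contour_slope(pitch_contour(clip))
        print(f"{label}: slope = {slope:.2f} Hz/frame")
```

The sign and size of the slope are one simple pitch-derived feature. Whether such a feature separates statements, questions, and exclamations in SQEBSP is not something this sketch establishes; a practical system would more likely use a robust pitch tracker on the real recordings together with the MFCC resources mentioned above.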