Spectro-temporal acoustical markers differentiate speech from song across cultures.

Journal: Nature Communications
PMID:

Abstract

Humans produce two forms of cognitively complex vocalizations: speech and song. It is debated whether these differ primarily because of culturally specific, learned features, or whether acoustical features can reliably distinguish them. We study the spectro-temporal modulation patterns of vocalizations produced by 369 people living in 21 urban, rural, and small-scale societies across six continents. Specific ranges of spectral and temporal modulations, overlapping within categories and across societies, significantly differentiate speech from song. Machine-learning classification shows that this effect is cross-culturally robust, with vocalizations reliably classified solely from their spectro-temporal features across all 21 societies. Listeners unfamiliar with the cultures classify these vocalizations using spectro-temporal cues similar to those used by the machine-learning algorithm. Finally, spectro-temporal features discriminate song from speech better than a broad range of other acoustical variables, suggesting that spectro-temporal modulation, a key feature of auditory neuronal tuning, accounts for a fundamental difference between these categories.
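
To make the general approach concrete, the sketch below shows one common way to extract spectro-temporal modulation features (the 2D Fourier transform of a log-spectrogram, i.e., a modulation power spectrum) and feed them to a generic classifier. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the file names, pooling grid, and choice of classifier are hypothetical, and the exact feature computation in the paper may differ.

```python
# Minimal sketch (not the authors' exact pipeline): estimate spectro-temporal
# modulations of a vocal recording via the 2D Fourier transform of its
# log-spectrogram, then feed the resulting modulation power spectrum to a
# generic classifier. File names and labels below are hypothetical.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def modulation_power_spectrum(path, sr=16000, n_fft=512, hop=128):
    """Return a coarse (spectral modulation x temporal modulation) power matrix."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    # Time-frequency representation (log-amplitude spectrogram).
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    log_spec = np.log1p(spec)
    # 2D FFT of the spectrogram: axes become spectral and temporal
    # modulation frequencies; keep the power (squared magnitude).
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_spec))) ** 2

    # Block-average to a fixed-size grid so clips of different lengths
    # yield comparable feature vectors (a simple, illustrative choice).
    def pool(mat, rows=16, cols=16):
        row_blocks = np.array_split(mat, rows, axis=0)
        return np.array([[blk.mean() for blk in np.array_split(rb, cols, axis=1)]
                         for rb in row_blocks])

    return pool(mps)


# Hypothetical corpus: audio files labeled as speech (0) or song (1).
speech_files, song_files = ["speech_01.wav"], ["song_01.wav"]
X = np.array([modulation_power_spectrum(f).ravel()
              for f in speech_files + song_files])
y = np.array([0] * len(speech_files) + [1] * len(song_files))

# Any off-the-shelf classifier could be swapped in; a linear SVM is used
# here purely for illustration.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X, y)
```

In this sketch, slow temporal modulations paired with fine spectral modulations would tend to characterize song-like input, while faster temporal modulations are more typical of speech; the classifier simply learns whatever separation exists in the pooled modulation features.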

Authors

  • Philippe Albouy
    CERVO Brain Research Centre, School of Psychology, Laval University, Québec City, QC, Canada. philippe.albouy@psy.ulaval.ca.
  • Samuel A Mehr
    International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada.
  • Roxane S Hoyer
    CERVO Brain Research Centre, School of Psychology, Laval University, Québec City, QC, Canada.
  • Jérémie Ginzburg
    CERVO Brain Research Centre, School of Psychology, Laval University, Québec City, QC, Canada.
  • Yi Du
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, China.
  • Robert J Zatorre
    International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada. robert.zatorre@mcgill.ca.