Polyphonic pitch tracking with deep layered learning.

Journal: The Journal of the Acoustical Society of America
Abstract

This article presents a polyphonic pitch tracking system that extracts both framewise and note-based estimates from audio. The system uses several artificial neural networks trained individually in a deep layered learning setup. First, cascading networks are applied to a spectrogram for framewise fundamental frequency (f0) estimation. A sparse receptive field is learned by the first network and then used as a filter kernel for parameter sharing throughout the system. The f0 activations are connected across time to extract pitch contours. These contours define a framework within which subsequent networks perform onset and offset detection, operating jointly across time and smaller pitch fluctuations. As input, the networks use, among other features, variations of latent representations from the f0 estimation network. Finally, erroneous tentative notes are removed one by one in an iterative procedure that allows a network to classify notes within a correct context. The system was evaluated on four public test sets (MAPS, Bach10, TRIOS, and the MIREX Woodwind quintet) and achieved state-of-the-art results on all four. It performs well across all subtasks: f0, pitched onset, and pitched offset tracking.
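The step of connecting f0 activations across time into pitch contours can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the activation threshold, and the one-bin linking tolerance (`max_jump`) are all assumptions chosen for the example, which simply links per-frame activation peaks to the nearest open contour in the previous frame.

```python
# Illustrative sketch (not the published system): greedy linking of
# framewise f0 activation peaks into pitch contours.
# `threshold` and `max_jump` are assumed parameters for this example.

def extract_contours(activations, threshold=0.5, max_jump=1):
    """activations: list of frames, each frame a list of per-bin scores.
    Returns contours as lists of (frame_index, bin_index) pairs."""
    contours = []  # every contour ever started, in creation order
    active = []    # contours that were extended in the previous frame
    for t, frame in enumerate(activations):
        peaks = [b for b, v in enumerate(frame) if v >= threshold]
        next_active = []
        for b in peaks:
            # Find the closest open contour within max_jump bins.
            best = None
            for c in active:
                last_bin = c[-1][1]
                if abs(last_bin - b) <= max_jump and (
                        best is None or abs(best[-1][1] - b) > abs(last_bin - b)):
                    best = c
            if best is not None:
                best.append((t, b))      # extend the matched contour
                active.remove(best)      # each contour takes one peak per frame
                next_active.append(best)
            else:
                c = [(t, b)]             # no match: start a new contour
                contours.append(c)
                next_active.append(c)
        active = next_active
    return contours
```

For example, a three-frame activation map with two concurrent pitches yields two contours, each following its peak through small bin-to-bin fluctuations.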

Authors

  • Anders Elowsson
    KTH Royal Institute of Technology, School of Computer Science and Communication, Speech, Music and Hearing, Stockholm, Sweden.