The Journal of the Acoustical Society of America
Feb 1, 2021
Speech plays an important role in human-computer emotional interaction. FaceNet, used in face recognition, has achieved great success due to its excellent feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition...
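As a rough illustration of the FaceNet idea transferred to speech (this is not the authors' model; the input size, layers, and embedding dimension are assumptions), log-mel spectrogram patches could be mapped to unit-norm embeddings trained with a triplet loss, the objective FaceNet popularized:

```python
# Hypothetical sketch: a FaceNet-style embedding network applied to log-mel
# spectrograms of speech, trained with a triplet loss. Shapes and layers are
# assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class SpeechEmbeddingNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):                              # x: (batch, 1, n_mels, n_frames)
        z = self.fc(self.features(x).flatten(1))
        return nn.functional.normalize(z, dim=1)       # unit-norm embeddings, as in FaceNet

net = SpeechEmbeddingNet()
triplet = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.randn(8, 1, 64, 200) for _ in range(3))
loss = triplet(net(anchor), net(positive), net(negative))
```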
The Journal of the Acoustical Society of America
Feb 1, 2021
Broadband spectrograms from surface ships are employed in convolutional neural networks (CNNs) to predict the seabed type, ship speed, and closest point of approach (CPA) range. Three CNN architectures of differing size and depth are trained on diffe...
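A minimal sketch of the general setup described here, assuming a single shared convolutional trunk with one classification head and two regression heads; this is not one of the paper's three architectures, and all layer sizes are invented:

```python
# Generic multi-task CNN over a broadband spectrogram: seabed type is a
# classification output, ship speed and CPA range are regression outputs.
import torch
import torch.nn as nn

class ShipSpectrogramCNN(nn.Module):
    def __init__(self, n_seabeds=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.seabed_head = nn.Linear(32, n_seabeds)    # seabed-type logits
        self.speed_head = nn.Linear(32, 1)             # ship speed (regression)
        self.cpa_head = nn.Linear(32, 1)               # CPA range (regression)

    def forward(self, spectrogram):                    # (batch, 1, freq, time)
        h = self.trunk(spectrogram)
        return self.seabed_head(h), self.speed_head(h), self.cpa_head(h)

model = ShipSpectrogramCNN()
logits, speed, cpa = model(torch.randn(2, 1, 128, 512))
```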
The Journal of the Acoustical Society of America
Jan 1, 2021
This paper is part of a special issue on machine learning in acoustics. A model-based convolutional neural network (CNN) approach is presented to test the viability of this method as an alternative to conventional matched-field processing (MFP) for u...
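For reference, the conventional Bartlett matched-field processor that such a CNN would stand in for can be written compactly; the replica vectors and data cross-spectral density matrix below are assumed inputs, and this is a sketch of the baseline technique rather than the paper's CNN:

```python
# Bartlett matched-field processor: scan candidate source positions and score
# each replica vector against the measured cross-spectral density matrix.
import numpy as np

def bartlett_ambiguity(replicas, csdm):
    """Normalized Bartlett power for each candidate source position.

    replicas : complex array, shape (n_candidates, n_sensors), modeled fields
    csdm     : complex array, shape (n_sensors, n_sensors), data CSDM
    """
    power = np.empty(len(replicas))
    for i, w in enumerate(replicas):
        w = w / np.linalg.norm(w)                    # unit-norm replica vector
        power[i] = np.real(np.conj(w) @ csdm @ w)    # w^H K w
    return power / np.trace(csdm).real               # normalize by total power

# Usage: the index of the maximum gives the estimated source position.
# est = np.argmax(bartlett_ambiguity(replicas, csdm))
```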
The Journal of the Acoustical Society of America
Jan 1, 2021
The goal of this research is to find a way of highlighting the acoustic differences between consonant phonemes of the Polish and Lithuanian languages. For this purpose, similarity matrices are employed based on speech acoustic parameters combined with...
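One simple, hypothetical way to form such a similarity matrix is cosine similarity between per-phoneme feature vectors, e.g. averaged MFCCs; the paper's actual acoustic parameters and similarity measure are not reproduced here:

```python
# Cosine-similarity matrix over per-consonant acoustic feature vectors.
import numpy as np

def similarity_matrix(phoneme_features):
    """phoneme_features: dict mapping phoneme label -> 1-D feature vector
    (e.g., averaged MFCCs). Returns labels and a cosine-similarity matrix."""
    labels = sorted(phoneme_features)
    X = np.stack([phoneme_features[p] for p in labels])
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length rows
    return labels, X @ X.T                              # cosine similarities

labels, S = similarity_matrix({
    "p": np.random.randn(13), "b": np.random.randn(13), "t": np.random.randn(13),
})
```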
The Journal of the Acoustical Society of America
Dec 1, 2020
Multiple approaches for depth estimation in deep-ocean environments are discussed. First, a multispectral transformation for depth estimation (MSTDE) method based on the low-spatial-frequency interference in a constant sound speed is derived to estimate...
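Purely as a generic illustration, and not the MSTDE transform itself, the low-spatial-frequency part of a range-frequency intensity pattern can be isolated with a spatial low-pass filter along the range axis; the `intensity` array below is an assumed input:

```python
# Keep only the lowest spatial frequencies of an intensity pattern sampled on a
# (range, frequency) grid. Illustrative pre-processing only.
import numpy as np

def low_spatial_frequency(intensity, keep_fraction=0.05):
    """Retain the lowest `keep_fraction` of spatial frequencies along range."""
    spec = np.fft.rfft(intensity, axis=0)            # FFT along the range axis
    cutoff = max(1, int(keep_fraction * spec.shape[0]))
    spec[cutoff:] = 0                                # zero high spatial frequencies
    return np.fft.irfft(spec, n=intensity.shape[0], axis=0)

smooth = low_spatial_frequency(np.random.rand(400, 256))
```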
The Journal of the Acoustical Society of America
Dec 1, 2020
In this paper, an audio-driven, multimodal approach for speaker diarization in multimedia content is introduced and evaluated. The proposed algorithm is based on semi-supervised clustering of audio-visual embeddings, generated using deep learning tec...
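A simplified sketch of the clustering step alone, assuming precomputed audio and visual embeddings per speech segment; the paper's semi-supervised constraints and embedding networks are omitted:

```python
# Group segments into speakers by agglomerative clustering of concatenated
# audio-visual embeddings.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def diarize(audio_emb, visual_emb, distance_threshold=0.5):
    """audio_emb, visual_emb: arrays of shape (n_segments, dim_a) and
    (n_segments, dim_v). Returns an integer speaker label per segment."""
    X = np.hstack([audio_emb, visual_emb])               # joint audio-visual embedding
    Z = linkage(X, method="average", metric="cosine")    # hierarchical clustering
    return fcluster(Z, t=distance_threshold, criterion="distance")

labels = diarize(np.random.rand(20, 128), np.random.rand(20, 64))
```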
The Journal of the Acoustical Society of America
Nov 1, 2020
Normalizing intrinsic variabilities (e.g., variability in speech production brought on by aging, physical or cognitive task stress, Lombard effect, etc.) in speech and speaker recognition models is essential for system robustness. This study focuses ...
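As a baseline example of feature-level normalization, and not the study's method for handling intrinsic variability, cepstral mean and variance normalization (CMVN) per utterance looks like this:

```python
# CMVN: standardize each cepstral coefficient over the frames of one utterance,
# a classic way to reduce nuisance variability before recognition.
import numpy as np

def cmvn(features, eps=1e-8):
    """features: (n_frames, n_coeffs) cepstral features for one utterance."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

normalized = cmvn(np.random.randn(300, 13))
```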
The Journal of the Acoustical Society of America
Sep 1, 2020
The performances of deep convolutional neural network (DCNN) modeling and transfer learning (TF) for thyroid tumor grading using ultrasound imaging were evaluated. This retrospective study included input patient data (ultrasound B-mode image sets) as...
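A hedged sketch of generic transfer learning for image grading, assuming an ImageNet-pretrained backbone with a replaced classifier head; this is not the paper's specific DCNN or data pipeline, and the number of grades is invented:

```python
# Reuse a pretrained CNN, freeze its features, and train only a new head that
# outputs tumor grades (requires torchvision >= 0.13 for the weights argument).
import torch.nn as nn
from torchvision import models

n_grades = 3                                       # assumed number of tumor grades
net = models.resnet18(weights="IMAGENET1K_V1")     # pretrained backbone
for p in net.parameters():
    p.requires_grad = False                        # freeze pretrained features
net.fc = nn.Linear(net.fc.in_features, n_grades)   # new trainable classifier head
```

In practice the grayscale B-mode images would typically be replicated to three channels and resized to the backbone's expected input before fine-tuning.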
The Journal of the Acoustical Society of America
Sep 1, 2020
Speaker separation is a special case of speech separation, in which the mixture signal comprises two or more speakers. Many talker-independent speaker separation methods have been introduced in recent years to address this problem in anechoic conditions...
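The permutation ambiguity that makes talker-independent separation hard is commonly handled with permutation-invariant training (PIT); the minimal utterance-level PIT loss below illustrates the general technique only, not the paper's specific model:

```python
# Utterance-level PIT: compute the loss under every speaker ordering and keep
# the best one for each example in the batch.
import itertools
import torch

def pit_mse_loss(estimates, targets):
    """estimates, targets: tensors of shape (batch, n_speakers, n_samples)."""
    n_spk = estimates.shape[1]
    losses = []
    for perm in itertools.permutations(range(n_spk)):
        permuted = estimates[:, list(perm)]                      # reorder speakers
        losses.append(((permuted - targets) ** 2).mean(dim=(1, 2)))
    return torch.stack(losses, dim=0).min(dim=0).values.mean()   # best permutation per example

loss = pit_mse_loss(torch.randn(4, 2, 16000), torch.randn(4, 2, 16000))
```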
The Journal of the Acoustical Society of America
Jul 1, 2020
This article presents a polyphonic pitch tracking system that is able to extract both framewise and note-based estimates from audio. The system uses several artificial neural networks trained individually in a deep layered learning setup. First, casc...
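A minimal sketch of the framewise stage only, assuming 88 pitch classes and a fixed number of spectral bins per frame; the cascaded, note-based stages of the described system are omitted:

```python
# Map each spectrogram frame to per-pitch activations and threshold them into
# framewise multi-pitch estimates.
import torch
import torch.nn as nn

frame_model = nn.Sequential(
    nn.Linear(252, 256), nn.ReLU(),        # 252 assumed spectral bins per frame
    nn.Linear(256, 88), nn.Sigmoid(),      # activation per pitch (A0 to C8)
)

frames = torch.rand(100, 252)              # 100 spectrogram frames
activations = frame_model(frames)          # (100, 88) pitch probabilities
framewise_pitches = activations > 0.5      # boolean multi-pitch estimate per frame
```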