Convolutional Neural Network-Gated Recurrent Unit Neural Network with Feature Fusion for Environmental Sound Classification


References (41)

Publisher
Springer Journals
Copyright
Copyright © Allerton Press, Inc. 2021. ISSN 0146-4116, Automatic Control and Computer Sciences, 2021, Vol. 55, No. 4, pp. 311–318. © Allerton Press, Inc., 2021.
ISSN
0146-4116
eISSN
1558-108X
DOI
10.3103/s0146411621040106

Abstract

Deep learning models are now widely applied to classification problems, and in recent years many researchers have brought them to environmental sound classification (ESC). However, the performance of existing models that train deep neural networks on acoustic features such as the log-scaled mel spectrogram (Log mel), mel-frequency cepstral coefficients, or the raw waveform remains unsatisfactory. This paper first proposes a fusion of multiple features: the Log mel, the log-scaled cochleagram, and the log-scaled constant-Q transform are combined into a feature set called LMCC. A network called CNN-GRUNN, which places a convolutional neural network and a gated recurrent unit neural network in parallel, is then presented to improve ESC performance with the aggregated features. Experiments were conducted on the ESC-10, ESC-50, and UrbanSound8K datasets. The results indicate that feeding LMCC to CNN-GRUNN is well suited to ESC problems; the model achieves good classification accuracy on all three datasets: ESC-10 (92.30%), ESC-50 (87.43%), and UrbanSound8K (96.10%).
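The fusion step described in the abstract amounts to computing several time-frequency representations of the same clip and stacking them channel-wise, like color channels of an image, before feeding them to the network. The sketch below illustrates only that stacking idea with plain NumPy; the paper does not publish its extraction parameters, so the frame length, hop size, and the `log_spectrogram` helper are illustrative assumptions. Here a single log-magnitude STFT stands in for all three channels, whereas a real pipeline would compute an actual Log mel, log cochleagram (gammatone filterbank), and log constant-Q transform (e.g., via `librosa` and a gammatone package).

```python
import numpy as np

def log_spectrogram(wave, n_fft=512, hop=256):
    """Log-magnitude STFT: a simple stand-in for one time-frequency channel."""
    frames = [wave[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(wave) - n_fft, hop)]
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return np.log(mag + 1e-8).T          # shape: (freq_bins, time_frames)

def fuse_lmcc(wave):
    """Stack three representations along a leading channel axis.

    In the paper's scheme the channels would be Log mel, log cochleagram,
    and log constant-Q transform; here the same spectrogram is reused three
    times purely to show the tensor layout the CNN-GRUNN would consume."""
    f = log_spectrogram(wave)
    return np.stack([f, f, f], axis=0)   # shape: (3, freq_bins, time_frames)

sr = 22050
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of a 440 Hz tone
lmcc = fuse_lmcc(wave)
print(lmcc.shape)  # (3, 257, 85)
```

With this layout, each fused example is an image-like 3-channel tensor, so a standard 2-D convolutional front end can be applied directly, while the time axis remains available for the recurrent (GRU) branch.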

Journal

Automatic Control and Computer Sciences (Springer Journals)

Published: Jul 1, 2021

Keywords: environmental sound classification; feature fusion; convolutional neural network-gated recurrent unit neural network
