2024 : 12 : 4
Iman Esmaili Paeen Afrakoti

Academic rank: Associate Professor
ORCID:
Education: PhD.
ScopusId:
HIndex:
Faculty: Faculty of Technology and Engineering
Address: Engineering & Technology Department, University of Mazandaran, Pasdaran Street, Babolsar, Iran
Phone: 01135305134

Research

Title
Speech Emotion Recognition Using Scalogram Based Deep Structure
Type
JournalPaper
Keywords
Continuous Wavelet Transform, Emotion Recognition, Convolutional Neural Network, Recurrent Network, Long Short-Term Memory
Year
2020
Journal
International Journal of Engineering Transactions A: Basics
DOI
Researchers
Khadijeh Aghajani, Iman Esmaili Paeen Afrakoti

Abstract

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and then training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style, and environment. This paper proposes an SER method based on a Convolutional Neural Network (CNN) concatenated with a Recurrent Neural Network (RNN). CNNs can learn local salient features from speech signals, images, and videos, while RNNs have been used in many sequential data processing tasks to learn long-term dependencies between local features. Combining the two exploits the strengths of both networks. In the proposed method, the CNN is applied directly to a scalogram of the speech signal, and an attention-based RNN then learns the long-term temporal relationships among the learned features. Experiments on several datasets, including RAVDESS, SAVEE, and Emo-DB, demonstrate the effectiveness of the proposed SER method.
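
The scalogram that serves as the CNN input is the magnitude of a Continuous Wavelet Transform (CWT) over time and scale. The following is a minimal sketch of that step only, not the authors' implementation. The abstract does not name the mother wavelet, so the complex Morlet wavelet, the parameter w0 = 6, the truncation at four envelope widths, and the function names `morlet`, `scale_for_frequency`, and `scalogram` are all assumptions made for illustration.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (assumed choice; small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scale_for_frequency(freq_hz, w0=6.0):
    """Scale (in seconds) whose Morlet wavelet is centred near freq_hz."""
    return w0 / (2.0 * math.pi * freq_hz)


def scalogram(signal, scales, fs, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample.

    Direct (slow) evaluation of
        W(s, b) = (1 / sqrt(s)) * sum_t x(t) * conj(psi((t - b) / s)) * dt
    which is fine for short signals; an FFT-based version would be used in practice.
    """
    n = len(signal)
    dt = 1.0 / fs
    rows = []
    for s in scales:
        # Truncate the wavelet where its Gaussian envelope is negligible.
        half = int(math.ceil(4.0 * s / dt))
        kernel = [
            morlet(k * dt / s, w0).conjugate() / math.sqrt(s)
            for k in range(-half, half + 1)
        ]
        row = []
        for b in range(n):
            acc = 0j
            for k in range(-half, half + 1):
                i = b + k
                if 0 <= i < n:  # samples outside the signal are treated as zero
                    acc += signal[i] * kernel[k + half]
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


# Usage: a 50 Hz tone analysed at three scales (25, 50, and 100 Hz).
fs = 400.0
tone = [math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(200)]
scales = [scale_for_frequency(f) for f in (25.0, 50.0, 100.0)]
image = scalogram(tone, scales, fs)  # 3 x 200 array of magnitudes
```

Each row of `image` tracks the energy near one frequency over time, so the result can be treated as a two-dimensional image. For the test tone, the row for the 50 Hz scale carries the largest magnitude away from the signal edges.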