2024 : 4 : 29
Samira Mavaddati

Academic rank: Assistant Professor
ORCID:
Education: PhD.
ScopusId:
Faculty: Faculty of Technology and Engineering
Address: University of Mazandaran
Phone: 011-35305126

Research

Title
A Voice Activity Detection Algorithm Using Sparse Non-negative Matrix Factorization-based Model Learning in Spectro-Temporal Domain
Type
JournalPaper
Keywords
Voice Activity Detector, Spectro-temporal Domain, Sparse Structured Principal Component Analysis, Sparse Non-negative Matrix Factorization
Year
2023
Journal
International Journal of Engineering
DOI
Researchers
Samira Mavaddati

Abstract

Voice activity detectors separate the speech and silence segments of a speech signal so that different background noise signals can be eliminated. This paper proposes a novel voice activity detector that uses spectro-temporal features extracted from the auditory model of the speech signal. After the scale, rate, and frequency features are extracted from this feature space, a sparse structured principal component analysis algorithm is applied to retain the basic components of these features and reduce the dimension of the training data. The resulting feature vectors are then used to learn the models with the sparse non-negative matrix factorization algorithm. The model learning procedure represents each feature vector with a suitable sparsity rate based on the selected atoms. Voice activity in each input frame is detected by computing the energy of the frame's sparse representation over the composite model. If the calculated energy exceeds a specified threshold, the input frame has a structure similar to the atoms of the learned models, and the frame is judged to contain voice content. The results of the proposed detector were compared with baseline methods and classifiers in this field. The results, examined in the presence of stationary, non-stationary, and periodic noise, show that the proposed method based on model learning with spectro-temporal features can correctly detect silence/speech activity.
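
The decision rule described in the abstract (sparse-code a frame over learned atoms, then threshold the energy of the code) can be illustrated with a minimal sketch. It is not the paper's implementation: the dictionary `W`, the function names, the sparsity weight, the iteration count, and the threshold are all hypothetical, the feature extraction and sparse structured PCA stages are omitted, and a standard multiplicative update for non-negative sparse coding stands in for whatever solver the paper uses.

```python
import numpy as np


def sparse_nonneg_code(W, v, sparsity=0.1, n_iter=200, eps=1e-12):
    """Approximately minimise 0.5*||v - W h||^2 + sparsity*sum(h) subject to h >= 0.

    W : (n_features, n_atoms) non-negative dictionary (assumed already learned).
    v : (n_features,) non-negative feature vector of one frame.
    Uses a multiplicative update, which keeps h non-negative at every step.
    """
    h = np.full(W.shape[1], 1.0 / W.shape[1])  # strictly positive start
    WtV = W.T @ v
    WtW = W.T @ W
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + sparsity + eps)
    return h


def is_speech(W, v, threshold, **kwargs):
    """Flag a frame as speech when the energy of its sparse code exceeds threshold."""
    h = sparse_nonneg_code(W, v, **kwargs)
    energy = float(h @ h)  # squared L2 norm of the sparse representation
    return energy > threshold, energy


# Toy example: two hypothetical atoms in a 4-dimensional feature space.
W = np.eye(4)[:, :2]
speech_like = np.array([2.0, 0.0, 0.0, 0.0])    # aligned with the first atom
silence_like = np.array([0.0, 0.0, 0.01, 0.01])  # orthogonal to both atoms

print(is_speech(W, speech_like, threshold=1.0))
print(is_speech(W, silence_like, threshold=1.0))
```

In this toy setting the frame aligned with an atom yields a high-energy code and is flagged as speech, while the frame that no atom can explain yields a near-zero code and is flagged as silence. In practice the threshold would have to be tuned on data.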