Voice Activity Detection using Clustering-based Method in SpectroTemporal Features Space

Research

Title	Voice Activity Detection using Clustering-based Method in SpectroTemporal Features Space
Type	Journal Paper
Keywords	Spectro-temporal Features, Auditory Model, Gaussian Mixture Model, WK-means clustering, Voice Activity Detection.
Year	2022
Journal	Journal of Artificial Intelligence and Data Mining
DOI
Researchers	Nafiseh Esfandian, Fatemeh Jahani Bahnamiri, Samira Mavaddati

Abstract

This paper proposes a novel method for voice activity detection (VAD) based on clustering in the spectro-temporal domain. An auditory model is used to extract the spectro-temporal features, and Gaussian mixture model (GMM) and WK-means clustering are used to reduce the dimensionality of the spectro-temporal space. The energy and positions of the resulting clusters then serve as the detection features: each frame is classified as silence or speech from the cluster attributes and a threshold value that is updated in each frame. Because it has higher energy than the other clusters, the first cluster is treated as the main speech component in the computation. The method is evaluated for silence/speech discrimination under different noise conditions, with the displacement of the clusters in the spectro-temporal domain used as the criterion for the robustness of the features. According to the reported results, the proposed method improves the speech/non-speech segmentation rate compared with temporal and spectral features at low signal-to-noise ratios (SNRs).
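
The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea (cluster each frame's spectro-temporal energy, take the highest-energy cluster, compare it with a threshold that is updated per frame). Everything specific here is an assumption for illustration: the plain energy-weighted k-means stands in for the paper's WK-means and GMM steps, and the function names, the `margin` and `alpha` parameters, and the rule that the first `n_init` frames are noise-only are hypothetical, not taken from the paper.

```python
import numpy as np

def weighted_kmeans(points, weights, k, n_iter=20, seed=0):
    """Energy-weighted k-means (a stand-in, not the paper's WK-means)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:  # leave empty clusters where they are
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Final assignment against the final centers.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)

def main_cluster(points, weights, k=3):
    """Return (energy, position) of the highest-energy cluster in one frame."""
    centers, labels = weighted_kmeans(points, weights, k)
    energies = np.bincount(labels, weights=weights, minlength=k)
    main = int(energies.argmax())
    return float(energies[main]), centers[main]

def detect_speech(frames, k=3, n_init=3, margin=3.0, alpha=0.9):
    """Per-frame speech/silence decisions with an adaptive threshold.

    frames: iterable of (points, weights); points has shape (n_bins, 2) and
    holds spectro-temporal coordinates, weights holds the energy of each bin.
    Assumption: the first n_init frames contain noise only.
    """
    decisions, noise_level = [], None
    for i, (points, weights) in enumerate(frames):
        energy, _ = main_cluster(points, weights, k)
        if noise_level is None:
            noise_level = energy
        is_speech = i >= n_init and energy > margin * noise_level
        if not is_speech:
            # Update the noise estimate (and so the threshold) on silence frames.
            noise_level = alpha * noise_level + (1 - alpha) * energy
        decisions.append(bool(is_speech))
    return decisions
```

The cluster position returned by `main_cluster` is not used in the decision here; in a fuller version it could be tracked across noise levels to measure the cluster displacement that the abstract uses as a robustness criterion.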

Samira Mavaddati
