Mazandaran University | Voice Activity Detection using Clustering-based Method in SpectroTemporal Features Space

Title	Voice Activity Detection using Clustering-based Method in SpectroTemporal Features Space
Type	JournalPaper
Keywords	Spectro-temporal Features, Auditory Model, Gaussian Mixture Model, WK-means clustering, Voice Activity Detection.
Abstract	This paper proposes a novel method for voice activity detection based on clustering in the spectro-temporal domain. In the proposed algorithms, an auditory model is used to extract the spectro-temporal features. Gaussian mixture model and WK-means clustering are used to reduce the dimensionality of the spectro-temporal space, and the energy and positions of the resulting clusters are used for voice activity detection. Each frame is classified as silence or speech using the cluster attributes and a threshold value that is updated in every frame. Because it has higher energy, the first cluster is treated as the main speech component in the computation. The efficiency of the proposed method is evaluated for silence/speech discrimination under different noise conditions. The displacement of the clusters in the spectro-temporal domain is used as the criterion for the robustness of the features. According to the results, the proposed method improves the speech/non-speech segmentation rate compared with temporal and spectral features at low signal-to-noise ratios (SNRs).
Researchers	Nafiseh Esfandian (First Researcher), Fatemeh Jahani Bahnamiri (Second Researcher), Samira Mavaddati (Third Researcher)
