Project: Individual Voice Activity Detection

Individual voice activity detection (IVAD) detects the speech regions of a particular person of interest in an audio recording. I built an IVAD system by combining a VAD method named PARADE [1] with a GMM-based speaker identification method [2].

The report is available at filereport.pdf

First Edition: May 2008. Last Modified: May 2008
The GMM based Speaker Identification

I have some doubts about PARADE [1]:

  1. Equation (2) in [3] was not accurate. This led to terrible results.
  2. Human voice formants are not clearly multiples of the fundamental frequency F0. In practice, they might fall at values like 100, 220, 310, and 390 rather than at exact multiples.
  3. Picking points at exact multiples of F0 is not reliable. In practice, the frequency spectrum of a human voice signal is not a clean, smooth curve but a jagged one. The values at 2*F0 and 2*F0 - 1 can differ greatly, and that large difference considerably affects the final result. We should smooth the signal beforehand, or estimate the envelope of the signal (for example, with the methods discussed in Project: Source-Filter Separation) and use the envelope instead.
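
The third point can be illustrated with a small sketch. It is not PARADE itself, and it does not use the envelope estimation from the Source-Filter Separation project; a plain moving average stands in for any smoothing step. The toy spectrum, the function names smooth_spectrum and mean_neighbor_gap, and the F0 position of 20 bins are all assumptions made for illustration. The sketch compares how much the value at k*F0 differs from the value one bin below it, before and after smoothing.

```python
import numpy as np


def smooth_spectrum(mag, width=9):
    """Moving-average smoothing of a magnitude spectrum; width must be odd."""
    kernel = np.ones(width) / width
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(mag, width // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def mean_neighbor_gap(mag, f0_bin, n_harmonics):
    """Mean |mag[k*f0_bin] - mag[k*f0_bin - 1]| over the first n_harmonics."""
    bins = f0_bin * np.arange(1, n_harmonics + 1)
    return float(np.mean(np.abs(mag[bins] - mag[bins - 1])))


# Toy spectrum (an assumption, not real speech): a smooth decaying envelope
# times an alternating 1.5 / 0.5 factor that mimics bin-to-bin jaggedness.
n_bins = 512
bins = np.arange(n_bins)
envelope = np.exp(-bins / 200.0)
jagged = envelope * np.where(bins % 2 == 0, 1.5, 0.5)

f0_bin = 20  # assumed F0 position, expressed in spectrum bins
smoothed = smooth_spectrum(jagged, width=9)

gap_raw = mean_neighbor_gap(jagged, f0_bin, 10)
gap_smooth = mean_neighbor_gap(smoothed, f0_bin, 10)
print(f"raw gap: {gap_raw:.4f}, smoothed gap: {gap_smooth:.4f}")
```

On this toy input the gap between neighboring bins shrinks after smoothing, so a one-bin error in the F0 estimate changes the sampled harmonic values much less. A real system would need to choose the smoothing width against the spacing of the harmonics, since too wide a window blurs the harmonic peaks themselves.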


