Project: Individual Voice Activity Detection

Individual voice activity detection (IVAD) detects the speech regions of a person of interest in an audio recording. I use the VAD method PARADE [1] and a GMM-based speaker identification method [2] to construct an IVAD system.
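To make the combination concrete, the following is a minimal sketch of the decision stage only, assuming frame-level scores are already available from the two components. The function name ivad_mask, the score layout, and the threshold values are illustrative assumptions, not details from the report.

    import numpy as np

    def ivad_mask(vad_scores, speaker_scores, vad_thresh=1.0, spk_thresh=0.0):
        """Combine frame-level VAD and speaker-identification decisions.

        vad_scores:     (N,) periodic-to-aperiodic ratios, one per frame
        speaker_scores: (N,) log-likelihood ratios of target vs. background GMM
        The threshold values are placeholders, not tuned values.
        """
        is_speech = np.asarray(vad_scores) >= vad_thresh
        is_target = np.asarray(speaker_scores) >= spk_thresh
        return is_speech & is_target   # True where the target person is speaking

Frames where the mask is True form the IVAD output; framing, F0 estimation, and feature extraction are omitted here.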

The report is available at report.pdf

First Edition: May 2008. Last Modified: May 2008
Tag: Scientific SoundProcessing VAD SpeakerIdentification GMM


PARADE

Refer to [1].
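As a rough illustration of the idea only, the sketch below computes a simplified periodic-to-aperiodic power ratio for a single frame: power found at assumed exact harmonics of F0 counts as periodic, and the remainder as aperiodic. The actual estimator in [1] is more elaborate, and the function name, FFT size, and windowing here are my assumptions.

    import numpy as np

    def periodic_aperiodic_ratio(frame, fs, f0, n_fft=1024):
        """Very rough sketch: power at assumed exact harmonics of F0 is
        counted as the periodic component, the remainder as aperiodic."""
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) ** 2
        hz_per_bin = fs / n_fft
        harmonic_bins = np.round(np.arange(f0, fs / 2, f0) / hz_per_bin).astype(int)
        harmonic_bins = harmonic_bins[harmonic_bins < len(spec)]
        periodic = spec[harmonic_bins].sum()
        aperiodic = spec.sum() - periodic
        return periodic / (aperiodic + 1e-12)   # guard against division by zero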

GMM-Based Speaker Identification

Refer to [2].
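The sketch below shows one common way to realise GMM-based speaker identification in the spirit of [2]: train one GMM per speaker on feature frames (e.g. MFCCs) and pick the speaker whose model yields the highest average log-likelihood on the test frames. The use of scikit-learn, 32 diagonal-covariance components, and the function names are my assumptions rather than details taken from [2] or the report.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_speaker_gmms(train_features, n_components=32):
        """train_features: dict speaker_name -> (N_i, D) array of feature
        frames (e.g. MFCCs). Trains one diagonal-covariance GMM per speaker."""
        gmms = {}
        for name, feats in train_features.items():
            gmms[name] = GaussianMixture(n_components=n_components,
                                         covariance_type="diag").fit(feats)
        return gmms

    def identify_speaker(gmms, test_features):
        """Pick the speaker whose GMM gives the highest average
        log-likelihood over the test frames."""
        return max(gmms, key=lambda name: gmms[name].score(test_features))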


I have some doubts about PARADE [1]:

  1. Equation (2) in [3] is not accurate, and this led to a terrible result.
  2. The harmonic peaks of a human voice do not lie exactly at integer multiples of the fundamental frequency F0; in practice they may appear at, say, 100, 220, 310, and 390 Hz rather than at exact multiples.
  3. Is picking spectral values exactly at multiples of F0 really feasible? In practice the spectrum of a human voice signal is not a clean, smooth curve but a jagged one. The values at 2*F0 and 2*F0 - 1 can differ greatly, and that difference affects the final result considerably. We should either smooth the spectrum beforehand or estimate the spectral envelope (using methods such as those discussed in Project: Source-Filter Separation) and use the envelope instead; see the sketch after this list.
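The sketch below illustrates the concern in point 3 on a synthetic voiced frame: raw periodogram values around 2*F0 fluctuate from bin to bin, whereas a crudely smoothed spectrum is far less sensitive to exactly which frequency is picked. The signal parameters, noise level, and moving-average smoother are arbitrary choices for illustration; a proper envelope estimate would be preferable.

    import numpy as np

    fs, n, f0 = 16000, 2048, 100.0
    t = np.arange(n) / fs
    rng = np.random.default_rng(0)

    # Synthetic "voiced" frame: a few partials of F0 plus additive noise.
    frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 9))
    frame = frame + 0.3 * rng.standard_normal(n)

    spec = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    hz_per_bin = fs / n                      # 7.8125 Hz per bin here

    # Raw periodogram values fluctuate strongly from bin to bin, so reading
    # the spectrum at one exact frequency (e.g. 2*F0) is fragile.
    b = int(round(2 * f0 / hz_per_bin))
    print("raw bins around 2*F0:     ", spec[b - 2:b + 3])

    # A crude moving-average smoother (an estimated spectral envelope would
    # be better) makes neighbouring values much less sensitive to the exact
    # bin that is picked.
    smooth = np.convolve(spec, np.ones(7) / 7, mode="same")
    print("smoothed bins around 2*F0:", smooth[b - 2:b + 3])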


References

  • [1] M. Fujimoto, K. Ishizuka, T. Nakatani, and N. Miyazaki, Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio, Proc. Interspeech 2007, pp. 230–233.
  • [2] D. A. Reynolds, Speaker identification and verification using Gaussian mixture speaker models, Speech Communication, vol. 17, pp. 91–108, 1995.
  • [3] A paper similar to [1] is available online at the SAPA2006 website.