In this thesis, we propose a few practical statistical voice activity detectors (VADs) which combine the voice activity information in the short-term and long-term statistics of the speech signal. Unlike most VADs, which assume that the cues to activity lie within the frame alone, the proposed VAD schemes seek information for activity in the current as well as the neighboring frames. Particularly, we develop primary and contextual detectors to process the short-term and long-term information, respectively. We use the perceptual Ephraim-Malah (PEM) model to develop three primary detectors based on the Bayesian, Neyman-Pearson (NP) and competitive NP (CNP) approaches. Moreover, upon viewing voice activity detection as a composite hypothesis where the prior signal-to-noise ratio (SNR) forms the free parameter, we reveal that a correlation between the prior SNR and the hypothesis exists, i.e., a high prior SNR is more likely to be associated with 'speech hypothesis' than the 'pause hypothesis' and vice-versa, and unlike the Bayesian and NP approaches, the CNP approach alone exploits this correlation