Technical Report

A Neurally Motivated Technique for Voicing Detection and F0 Estimation for Speech


Citation

Smith L (1996) A Neurally Motivated Technique for Voicing Detection and F0 Estimation for Speech. CCCN Technical Report, 22. Centre for Cognitive and Computational Neuroscience (CCCN).

Abstract
Speech consists of alternating voiced and unvoiced sections. Voiced speech consists of multiple harmonics of a fundamental frequency ($F_{0}$); unvoiced speech consists of silence or filtered noise. In the technique described here, speech is first passed through a bank of many wideband bandpass filters (modelling the cochlea). Each filter output is rectified (modelling the action of the hair cells of the organ of Corti), then bandpass filtered by convolution with the difference between two Gaussian averaging functions. This step detects and emphasises the amplitude modulation that results from unresolved harmonics. It also models the combined effect of the auditory nerve and certain cochlear nucleus cell types. The result is compressed, summed across the bands, and used to locate glottal pulses. The presence of glottal pulses signals voicing, and the interval between successive glottal pulses is used to estimate $F_{0}$. Results show good performance, particularly for male speakers, and the system is reasonably resistant to background noise.
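The processing chain summarised in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the report's implementation. A second-order bandpass filter (Audio EQ Cookbook form) stands in for the cochlear filterbank. Several other choices are also hypothetical and were picked only for illustration:

- the band centres and Q;
- the Gaussian widths (0.5 ms and 2 ms);
- square-root compression;
- the peak-picking threshold and minimum pulse spacing;
- the median-based voicing test.

The function names `bandpass`, `dog_kernel`, `convolve_same`, `summed_modulation`, `find_pulses` and `voicing_and_f0` are likewise hypothetical. $F_{0}$ is estimated as the sample rate divided by the median interval between detected pulses.

```python
import math
import statistics

# Hypothetical parameters: the report abstract does not specify any of these values.
FS = 8000                                 # sample rate (Hz)
CENTRES = [1000, 1500, 2000, 2500, 3000]  # band centre frequencies (Hz)
Q = 2.0                                   # low Q -> wide bands, so harmonics stay unresolved


def bandpass(x, fs, fc, q):
    """Second-order bandpass (Audio EQ Cookbook form), a crude stand-in for one cochlear filter."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0      # b1 is zero for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out


def dog_kernel(fs, sigma1, sigma2):
    """Difference of two unit-area Gaussians (sigma1 < sigma2, in seconds): a bandpass kernel."""
    half = int(3 * sigma2 * fs)
    kernel = []
    for n in range(-half, half + 1):
        t = n / fs
        g1 = math.exp(-0.5 * (t / sigma1) ** 2) / (sigma1 * math.sqrt(2 * math.pi))
        g2 = math.exp(-0.5 * (t / sigma2) ** 2) / (sigma2 * math.sqrt(2 * math.pi))
        kernel.append((g1 - g2) / fs)     # scaled so each Gaussian sums to about 1
    return kernel


def convolve_same(x, kernel):
    """Direct convolution, output aligned with the input (zero padding beyond the ends)."""
    half = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < n:
                acc += k * x[idx]
        out.append(acc)
    return out


def summed_modulation(x, fs=FS, centres=CENTRES, q=Q, sigma1=0.0005, sigma2=0.002):
    """Filter, rectify, DoG-filter and compress each band, then sum across bands."""
    kernel = dog_kernel(fs, sigma1, sigma2)
    total = [0.0] * len(x)
    for fc in centres:
        band = bandpass(x, fs, fc, q)
        rectified = [max(v, 0.0) for v in band]        # half-wave rectification
        modulation = convolve_same(rectified, kernel)  # emphasise amplitude modulation
        for i, v in enumerate(modulation):
            # Sign-preserving square-root compression (an assumed compression law).
            total[i] += math.copysign(math.sqrt(abs(v)), v)
    return total


def find_pulses(s, fs=FS, rel_threshold=0.5, min_gap=0.0025):
    """Candidate glottal pulses: local maxima above rel_threshold * max, at least min_gap apart."""
    top = max(s, default=0.0)
    if top <= 0.0:
        return []
    cands = [i for i in range(1, len(s) - 1)
             if s[i] > rel_threshold * top and s[i] >= s[i - 1] and s[i] > s[i + 1]]
    gap = int(min_gap * fs)
    chosen = []
    for i in sorted(cands, key=lambda i: s[i], reverse=True):  # keep the tallest peak first
        if all(abs(i - c) >= gap for c in chosen):
            chosen.append(i)
    return sorted(chosen)


def voicing_and_f0(x, fs=FS, max_spread=0.1):
    """Return (voiced, F0 in Hz or None) from the regularity of inter-pulse intervals."""
    pulses = find_pulses(summed_modulation(x, fs), fs)
    intervals = [b - a for a, b in zip(pulses, pulses[1:])]
    if len(intervals) < 2:
        return False, None
    period = statistics.median(intervals)
    spread = statistics.median(abs(d - period) for d in intervals) / period
    if spread > max_spread:                 # irregular pulses: treat as unvoiced
        return False, None
    return True, fs / period
```

For a synthetic in-phase complex of 30 harmonics of 100 Hz, a crude stand-in for voiced speech, the sketch should report voicing with an estimate near 100 Hz. An all-zero input yields `(False, None)`. The median interval is used so that one or two spurious pulses near the signal edges do not disturb the estimate.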

Status	Published
Title of series	CCCN Technical Report
Number in series	22
Publication date	31/07/1996
Publisher	Centre for Cognitive and Computational Neuroscience (CCCN)
ISSN of series	0968-0640

Author

Professor Leslie Smith

Emeritus Professor, Computing Science