Technical Report

A Neurally Motivated Technique for Voicing Detection and F0 Estimation for Speech

Details

Citation

Smith L (1996) A Neurally Motivated Technique for Voicing Detection and F0 Estimation for Speech. CCCN Technical Report, 22. Centre for Cognitive and Computational Neuroscience (CCCN).

Abstract
Speech consists of alternating voiced and unvoiced sections. Voiced speech consists of multiple harmonics of some fundamental ($F_{0}$); unvoiced speech consists of silence, or filtered noise. Here, speech is wideband bandpass filtered into many bands (modelling the cochlea). Each filter output is rectified (modelling the organ of Corti hair cell action), and bandpass filtered by convolution with the difference between two Gaussian averaging functions. This detects and emphasises the amplitude modulation resulting from unresolved harmonics (and models the combined effect of the auditory nerve and certain cochlear nucleus cell types). This output is compressed, summed across the bands, then used to discover glottal pulses. The presence of glottal pulses signals voicing, and the time between glottal pulses is used to find $F_{0}$. Results show good performance, particularly on male speakers. The system is reasonably resistant to background noise.
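
As a rough illustration of the processing chain the abstract describes, the following Python sketch handles a single band: half-wave rectification, convolution with a difference of two Gaussian averaging functions, then pulse picking and conversion of the inter-pulse interval to $F_{0}$. It is a minimal sketch under stated assumptions, not the report's implementation. The cochlear filterbank, the compression stage and the summation across bands are omitted, and the function names, Gaussian widths and the 0.5 peak threshold are hypothetical choices made only for this example.

```python
import math
import statistics


def gaussian(sigma, half_width):
    """Unit-area Gaussian averaging kernel sampled at integer offsets."""
    w = [math.exp(-0.5 * (k / sigma) ** 2)
         for k in range(-half_width, half_width + 1)]
    total = sum(w)
    return [v / total for v in w]


def dog_kernel(sigma_narrow, sigma_wide):
    """Difference of two Gaussians: a zero-sum band-pass kernel."""
    half = int(3 * sigma_wide)
    narrow = gaussian(sigma_narrow, half)
    wide = gaussian(sigma_wide, half)
    return [n - w for n, w in zip(narrow, wide)]


def filter_interior(x, kernel):
    """Apply a symmetric kernel; samples too close to the edges are left at 0."""
    half = len(kernel) // 2
    out = [0.0] * len(x)
    for i in range(half, len(x) - half):
        out[i] = sum(k * x[i + j - half] for j, k in enumerate(kernel))
    return out


def estimate_f0(signal, fs, sigma_narrow=4.0, sigma_wide=12.0):
    """Return an F0 estimate in Hz, or None if no pulse train is found."""
    # Half-wave rectification (stand-in for hair cell action).
    rectified = [max(s, 0.0) for s in signal]
    # Band-pass the rectified signal to emphasise amplitude modulation.
    env = filter_interior(rectified, dog_kernel(sigma_narrow, sigma_wide))
    peak_level = max(env, default=0.0)
    if peak_level <= 0.0:
        return None  # treated as unvoiced
    # Assumed rule: a pulse is a local maximum above half the largest value.
    threshold = 0.5 * peak_level
    pulses = [i for i in range(1, len(env) - 1)
              if env[i] > threshold
              and env[i] >= env[i - 1] and env[i] > env[i + 1]]
    if len(pulses) < 2:
        return None
    intervals = [b - a for a, b in zip(pulses, pulses[1:])]
    return fs / statistics.median(intervals)


def harmonic_complex(f0, fs, duration, harmonics):
    """Sum of equal-amplitude cosine harmonics of f0 (a synthetic test tone)."""
    n = int(fs * duration)
    return [sum(math.cos(2 * math.pi * h * f0 * t / fs) for h in harmonics)
            for t in range(n)]


# Harmonics 10 to 14 of 100 Hz, as would fall unresolved into one wide band.
tone = harmonic_complex(100.0, 8000, 0.1, range(10, 15))
f0_estimate = estimate_f0(tone, 8000)
```

On this synthetic tone the estimate is expected to land near the 100 Hz fundamental, while an all-zero input yields `None`, which the sketch treats as unvoiced. Real speech would need the multi-band front end and compression that the abstract describes.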

Status: Published
Title of series: CCCN Technical Report
Number in series: 22
Publication date: 31/07/1996
Publisher: Centre for Cognitive and Computational Neuroscience (CCCN)
ISSN of series: 0968-0640

People (1)

Professor Leslie Smith

Emeritus Professor, Computing Science