Technical Report
Details
Citation
Smith L (1996) A Neurally Motivated Technique for Voicing Detection and F0 Estimation for Speech. CCCN Technical Report, 22. Centre for Cognitive and Computational Neuroscience (CCCN).
Abstract
Speech consists of alternating voiced and unvoiced sections. Voiced speech consists of multiple harmonics of some fundamental ($F_{0}$); unvoiced speech consists of silence, or filtered noise. Here, speech is wideband bandpass filtered into many bands (modelling the cochlea). Each filter output is rectified (modelling the organ of Corti hair cell action), and bandpass filtered by convolution with the difference between two Gaussian averaging functions. This detects and emphasises the amplitude modulation resulting from unresolved harmonics (and models the combined effect of the auditory nerve and certain cochlear nucleus cell types). This output is compressed, summed across the bands, then used to discover glottal pulses. The presence of glottal pulses signals voicing, and the time between glottal pulses is used to find $F_{0}$. Results show good performance, particularly on male speakers. The system is reasonably resistant to background noise.
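The processing chain described in the abstract can be illustrated with a short NumPy sketch. This is not the report's implementation. The band edges, Gaussian widths, log compression, peak threshold and interval-regularity test are all assumptions chosen for illustration, and the function names (`estimate_f0`, `dog_kernel`, `fft_bandpass`) are hypothetical. A simple FFT mask stands in for the cochlear filterbank.

```python
import numpy as np

def gaussian_kernel(sigma_samples):
    """Unit-area Gaussian averaging function, truncated at 4 sigma."""
    half = int(np.ceil(4 * sigma_samples))
    t = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (t / sigma_samples) ** 2)
    return k / k.sum()

def dog_kernel(sigma_narrow, sigma_wide):
    """Difference between a narrow and a wide Gaussian (a bandpass kernel)."""
    wide = gaussian_kernel(sigma_wide)
    narrow = gaussian_kernel(sigma_narrow)
    pad = (len(wide) - len(narrow)) // 2  # both lengths are odd
    return np.pad(narrow, pad) - wide

def fft_bandpass(x, fs, lo, hi):
    """Crude bandpass: zero every FFT bin outside [lo, hi). Assumed stand-in
    for a cochlear filter, not the filter used in the report."""
    spectrum = np.fft.rfft(x)
    freqs = np.arange(len(spectrum)) * fs / len(x)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def estimate_f0(x, fs,
                band_edges=(1000, 1600, 2200, 2800, 3400, 4000),  # assumed
                sigma_narrow_s=0.0005, sigma_wide_s=0.002,        # assumed
                threshold_ratio=0.5, max_jitter=0.2):             # assumed
    """Return (voiced, f0_hz); f0_hz is None when no regular pulses are found.
    The signal is assumed to be longer than the wide Gaussian kernel."""
    kernel = dog_kernel(sigma_narrow_s * fs, sigma_wide_s * fs)
    summed = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = fft_bandpass(x, fs, lo, hi)        # one wide band
        rectified = np.maximum(band, 0.0)         # half-wave rectification
        modulation = np.convolve(rectified, kernel, mode="same")
        # Log compression is an assumption; the abstract only says "compressed".
        summed += np.log1p(np.maximum(modulation, 0.0))

    threshold = threshold_ratio * summed.max()
    if threshold <= 0.0:
        return False, None                        # silence: no pulses
    mid = summed[1:-1]
    # Local maxima above the threshold are treated as candidate glottal pulses.
    peaks = np.flatnonzero(
        (mid > summed[:-2]) & (mid >= summed[2:]) & (mid > threshold)) + 1
    if len(peaks) < 2:
        return False, None
    intervals = np.diff(peaks)
    median = np.median(intervals)
    # Assumed voicing test: pulse intervals must be roughly regular.
    jitter = np.median(np.abs(intervals - median)) / median
    if jitter > max_jitter:
        return False, None
    return True, float(fs / median)
```

As a usage example, a synthetic signal built from equal-amplitude cosine harmonics of 100 Hz, sampled at 16 kHz, can be passed as `estimate_f0(signal, 16000)`. The voicing decision here is deliberately crude and would need tuning on real speech, particularly for noisy unvoiced segments.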
Field | Value |
---|---|
Status | Published |
Title of series | CCCN Technical Report |
Number in series | 22 |
Publication date | 31/07/1996 |
Publisher | Centre for Cognitive and Computational Neuroscience (CCCN) |
ISSN of series | 0968-0640 |