Book Chapter

Maximising audiovisual correlation with automatic lip tracking and vowel based segmentation

Details

Citation

Abel A, Hussain A, Nguyen Q, Ringeval F, Chetouani M & Milgram M (2009) Maximising audiovisual correlation with automatic lip tracking and vowel based segmentation. In: Fierrez J, Ortega-Garcia J, Esposito A, Drygajlo A & Faundez-Zanuy M (eds.) Biometric ID Management and Multimodal Communication: Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid, Spain: September 2009, Proceedings. Lecture Notes in Computer Science, 5707. Berlin, Germany: Springer-Verlag, pp. 65-72. http://www.springer.com/computer/image+processing/book/978-3-642-04390-1; https://doi.org/10.1007/978-3-642-04391-8_9

Abstract
In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state-of-the-art Semi Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel-based speech segmentation system is employed to automatically segment speech. Canonical Correlation Analysis (CCA) on segmented and non-segmented data in a range of noisy speech environments finds that segmented speech has a significantly better audiovisual correlation, demonstrating the feasibility of our techniques for further development as part of a proposed audiovisual speech enhancement system.
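
Illustrative note: the abstract does not describe how the CCA step was implemented. The following is only a minimal sketch of how canonical correlations between an audio feature matrix and a visual (lip) feature matrix could be computed, assuming one row per frame and full-column-rank features. The function name `canonical_correlations` and the arguments `audio_feats` and `visual_feats` are hypothetical and do not come from the chapter. The QR-plus-SVD formulation used here is one standard way to compute CCA, not necessarily the authors' method.

```python
import numpy as np


def canonical_correlations(audio_feats, visual_feats):
    """Return the canonical correlations between two feature sets.

    Hypothetical helper for illustration only. Both inputs are 2-D arrays
    with one row per frame (same number of rows) and are assumed to have
    full column rank after mean removal.
    """
    X = np.asarray(audio_feats, dtype=float)
    Y = np.asarray(visual_feats, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("both feature sets need the same number of frames")

    # Remove the per-feature mean so that inner products act as covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Orthonormal bases for the column spaces of the centred features.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)

    # The singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)

    # Clip tiny numerical overshoot so the values stay within [0, 1].
    return np.clip(s, 0.0, 1.0)
```

Under these assumptions, calling the function once on features from segmented speech and once on features from non-segmented speech would give two sets of correlations to compare, which is the kind of comparison the abstract reports.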

Status	Published
Title of series	Lecture Notes in Computer Science
Number in series	5707
Publication date	31/12/2009
URL	http://hdl.handle.net/1893/10872
Publisher	Springer-Verlag
Publisher URL	http://www.springer.com/…78-3-642-04390-1
Place of publication	Berlin, Germany
ISSN of series	0302-9743
ISBN	978-3642043901