Authors: Fabian Brugger (LTCI), Leila Zouari (LTCI), Hervé Bredin (LTCI), Asma Amehraye (STICC), Gérard Chollet (LTCI), Dominique Pastor (STICC), Yang Ni (SAMOVAR)

Conference: Paper with proceedings at an international conference - 12 June 2006 - Journées d'Etude sur la Parole

This article presents a new electronic-retina-based smart microphone (VMike) and investigates the use of its novel parameters, lip profiles, in audiovisual speech recognition. To evaluate this parameterization, both an audio-only and a video-only speech recognition system are developed and tested. Two main fusion techniques are then employed to test the usability of lip profiles in audiovisual systems: feature fusion and decision fusion. The results are compared with the performance of recognizers based on a state-of-the-art parameterization, and with results obtained by applying perceptual filtering to the speech signal prior to recognition. With feature fusion under noisy conditions, recognition using lip profiles improved by up to 13 percent with respect to audio-only recognition.
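The abstract does not describe how the two fusion techniques are implemented, so the following Python sketch only illustrates the general distinction between them. It assumes frame-synchronous audio and visual feature vectors and per-word log-likelihood scores from two separate recognizers. The function names, the `audio_weight` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
def feature_fusion(audio_frames, visual_frames):
    """Early fusion: concatenate audio and visual features frame by frame.

    A single recognizer would then be trained on the joint vectors.
    Assumes both streams have already been aligned to the same frame rate.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


def decision_fusion(audio_scores, visual_scores, audio_weight=0.7):
    """Late fusion: combine the scores of two independent recognizers.

    Each argument maps a candidate word to a log-likelihood. The weight
    (hypothetical default 0.7) sets how much the audio stream is trusted;
    it would typically be lowered as acoustic noise increases.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = {
        word: audio_weight * audio_scores[word]
        + (1.0 - audio_weight) * visual_scores[word]
        for word in audio_scores
    }
    best_word = max(fused, key=fused.get)
    return best_word, fused


# Toy example: two frames, 2 audio coefficients and 1 lip-profile value each.
joint = feature_fusion([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]])

# Toy example: the audio recognizer prefers "yes", the visual one prefers "no".
audio = {"yes": -10.0, "no": -12.0}
visual = {"yes": -20.0, "no": -5.0}
word_equal, _ = decision_fusion(audio, visual, audio_weight=0.5)
word_audio, _ = decision_fusion(audio, visual, audio_weight=0.9)
```

In this toy case the fused decision follows the visual recognizer when both streams are weighted equally and follows the audio recognizer when the audio weight is raised to 0.9, which is the kind of trade-off a decision-fusion system tunes against the noise level.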