PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION

Luca Cappelletta, Naomi Harte

2012

Abstract

Phonemes are the standard modelling unit in HMM-based continuous speech recognition systems. Visemes are the equivalent unit in the visual domain, but there is less agreement on precisely what visemes are, or on how many to model on the visual side in audio-visual speech recognition systems. This paper compares the use of five viseme maps in a continuous speech recognition task. The study focuses on visual-only recognition in order to examine the choice of viseme map. All the maps are based on the phoneme-to-viseme approach and were created using either a linguistic method or a data-driven method. DCT, PCA and optical flow are used to derive the visual features. The best visual-only recognition on the VidTIMIT database is achieved using a linguistically motivated viseme set. These initial experiments demonstrate that the choice of visual unit requires more careful attention in audio-visual speech recognition system development.
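
To make the phoneme-to-viseme approach concrete, the following is a minimal sketch of a many-to-one map in Python. The viseme class names and phoneme groupings here are hypothetical and chosen only for illustration; they are not any of the maps compared in the paper. The helper names `build_phoneme_to_viseme` and `to_visemes` are likewise assumptions.

```python
from typing import Dict, List

# Hypothetical viseme classes, for illustration only.
# These groupings are NOT taken from the paper's viseme maps.
VISEME_CLASSES: Dict[str, List[str]] = {
    "V_bilabial": ["p", "b", "m"],
    "V_labiodental": ["f", "v"],
    "V_alveolar": ["t", "d", "s", "z", "n"],
    "V_rounded": ["uw", "ow", "w"],
    "V_silence": ["sil"],
}


def build_phoneme_to_viseme(classes: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert a viseme -> phonemes table into a phoneme -> viseme lookup."""
    mapping: Dict[str, str] = {}
    for viseme, phonemes in classes.items():
        for phoneme in phonemes:
            if phoneme in mapping:
                # A many-to-one map must assign each phoneme to exactly one viseme.
                raise ValueError(f"phoneme {phoneme!r} assigned to two visemes")
            mapping[phoneme] = viseme
    return mapping


def to_visemes(phonemes: List[str], mapping: Dict[str, str]) -> List[str]:
    """Relabel a phoneme transcription with viseme labels."""
    return [mapping[p] for p in phonemes]


P2V = build_phoneme_to_viseme(VISEME_CLASSES)
```

In this sketch, `to_visemes(["p"], P2V)` and `to_visemes(["m"], P2V)` return the same label, which illustrates why a viseme set is smaller than the phoneme set it is derived from: phonemes that look alike on the lips collapse into a single visual unit.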


Paper Citation


in Harvard Style

Cappelletta L. and Harte N. (2012). PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 322-329. DOI: 10.5220/0003731903220329

in Bibtex Style

@conference{icpram12,
author={Luca Cappelletta and Naomi Harte},
title={PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2012},
pages={322-329},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003731903220329},
isbn={978-989-8425-99-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - PHONEME-TO-VISEME MAPPING FOR VISUAL SPEECH RECOGNITION
SN - 978-989-8425-99-7
AU - Cappelletta L.
AU - Harte N.
PY - 2012
SP - 322
EP - 329
DO - 10.5220/0003731903220329