VISUAL SPEECH SYNTHESIS FROM 3D VIDEO

J. D. Edge, A. Hilton

2007

Abstract

Data-driven approaches to 2D facial animation from video have achieved highly realistic results. In this paper we introduce a process for visual speech synthesis from 3D video capture to reproduce the dynamics of 3D face shape and appearance. Animation from real speech is performed by path optimisation over a graph representation of phonetically segmented captured 3D video. A novel similarity metric using a hierarchical wavelet decomposition is presented to identify transitions between 3D video frames without visual artifacts in facial shape, appearance or dynamics. Face synthesis is performed by playing back segments of the captured 3D video to accurately reproduce facial dynamics. The framework allows visual speech synthesis from captured 3D video with minimal user intervention. Results are presented for synthesis from a database of 12 minutes (18,000 frames) of 3D video, which demonstrate highly realistic facial animation.
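
The path optimisation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each captured segment carries a phoneme label and a feature vector for its first and last frame, and it substitutes a plain Euclidean distance for the paper's wavelet-based similarity metric. The names `Segment`, `transition_cost` and `best_path` are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Segment:
    phoneme: str   # phonetic label of the captured segment
    first: tuple   # assumed feature vector of the segment's first frame
    last: tuple    # assumed feature vector of the segment's last frame


def transition_cost(a, b):
    # Placeholder metric (an assumption): Euclidean distance between the
    # last frame of `a` and the first frame of `b`. The paper describes a
    # hierarchical wavelet-based similarity metric instead.
    return dist(a.last, b.first)


def best_path(targets, database):
    """Pick one segment per target phoneme, minimising total transition cost."""
    # One layer of candidate segments per target phoneme.
    layers = [[s for s in database if s.phoneme == p] for p in targets]
    if not layers or any(not layer for layer in layers):
        raise ValueError("no candidate segment for some target phoneme")

    # cost[i] is the cheapest cost of any path ending at candidate i
    # of the layer processed so far.
    cost = [0.0] * len(layers[0])
    back = []  # back[k][i]: best predecessor index for candidate i of layer k+1
    for prev, cur in zip(layers, layers[1:]):
        new_cost, pointers = [], []
        for seg in cur:
            options = [cost[i] + transition_cost(p, seg)
                       for i, p in enumerate(prev)]
            j = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[j])
            pointers.append(j)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the final layer.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [layers[k][i] for k, i in enumerate(path)], total
```

Calling `best_path(["h", "e", "l"], database)` returns the chosen segments in playback order together with their summed transition cost. The dynamic programme keeps only the cheapest way to reach each candidate, so the work grows with the number of candidate pairs between adjacent phonemes rather than with the number of possible paths.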


Paper Citation


in Harvard Style

Edge, J. D. and Hilton, A. (2007). VISUAL SPEECH SYNTHESIS FROM 3D VIDEO. In Proceedings of the Second International Conference on Computer Graphics Theory and Applications - Volume 2: GRAPP, ISBN 978-972-8865-72-6, pages 57-62. DOI: 10.5220/0002080400570062

in Bibtex Style

@conference{grapp07,
author={J. D. Edge and A. Hilton},
title={VISUAL SPEECH SYNTHESIS FROM 3D VIDEO},
booktitle={Proceedings of the Second International Conference on Computer Graphics Theory and Applications - Volume 2: GRAPP},
year={2007},
pages={57--62},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002080400570062},
isbn={978-972-8865-72-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Computer Graphics Theory and Applications - Volume 2: GRAPP
TI - VISUAL SPEECH SYNTHESIS FROM 3D VIDEO
SN - 978-972-8865-72-6
AU - Edge, J. D.
AU - Hilton, A.
PY - 2007
SP - 57
EP - 62
DO - 10.5220/0002080400570062