Audiovisual Data Fusion for Successive Speakers Tracking
Quentin Labourey, Olivier Aycard, Denis Pellerin, Michele Rombaut
2014
Abstract
In this paper, a method for tracking human speakers from audio and video data is presented. It is applied to conversation tracking with a robot. Audiovisual data fusion is performed in a two-step process. Detection is first performed independently on each modality: face detection based on skin color on video data, and sound source localization based on the time difference of arrival (TDOA) on audio data. The results of these detection processes are then fused using an adapted Bayesian filter to detect the speaker. The robot is able to detect the face of the talking person and to detect a new speaker in a conversation.
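As context for the audio modality described in the abstract, TDOA-based localization with a two-microphone pair is commonly sketched as follows: the delay between the two channels is estimated from the cross-correlation peak, and a far-field bearing angle is derived from it. This is a minimal illustrative sketch, not the paper's implementation; the helper names (`estimate_tdoa`, `doa_angle`), the microphone spacing, and the synthetic pulse signal are all assumptions.

```python
import numpy as np

def estimate_tdoa(sig_left, sig_right, fs):
    """Estimate the time difference of arrival (seconds) between two
    channels from the peak of their cross-correlation."""
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = np.argmax(corr) - (len(sig_right) - 1)  # lag in samples
    return lag / fs

def doa_angle(tau, mic_distance, c=343.0):
    """Far-field bearing angle (radians) from a TDOA estimate, given the
    microphone spacing (m) and the speed of sound c (m/s)."""
    arg = np.clip(c * tau / mic_distance, -1.0, 1.0)  # guard arcsin domain
    return np.arcsin(arg)

# Synthetic example: a pulse that reaches the right microphone
# 5 samples after the left one, so the left channel leads.
fs = 16000
sig_left = np.zeros(256)
sig_left[100] = 1.0
sig_right = np.roll(sig_left, 5)  # delayed copy

tau = estimate_tdoa(sig_left, sig_right, fs)   # negative: left leads
theta = doa_angle(tau, mic_distance=0.2)       # bearing toward the left
```

With this sign convention, a negative delay (left channel leading) yields a negative bearing, i.e. a source on the left side of the microphone pair.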
Paper Citation
in Harvard Style
Labourey Q., Aycard O., Pellerin D. and Rombaut M. (2014). Audiovisual Data Fusion for Successive Speakers Tracking. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014), pages 696-701. ISBN 978-989-758-003-1. DOI: 10.5220/0004852506960701
in Bibtex Style
@conference{visapp14,
author={Quentin Labourey and Olivier Aycard and Denis Pellerin and Michele Rombaut},
title={Audiovisual Data Fusion for Successive Speakers Tracking},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={696-701},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004852506960701},
isbn={978-989-758-003-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)
TI - Audiovisual Data Fusion for Successive Speakers Tracking
SN - 978-989-758-003-1
AU - Labourey Q.
AU - Aycard O.
AU - Pellerin D.
AU - Rombaut M.
PY - 2014
SP - 696
EP - 701
DO - 10.5220/0004852506960701