LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION

Matthias Wimmer; Björn Schuller; Dejan Arsic; Gerhard Rigoll; Bernd Radig

doi:10.5220/0001082801450151

2008

Abstract

Bimodal emotion recognition through audiovisual feature fusion has been shown to outperform either modality on its own. Synchronizing the two streams remains a challenge, however, because many vision approaches operate on a frame basis, whereas audio is typically analyzed on a turn or chunk basis. Late fusion schemes such as simple logic or voting strategies are therefore commonly used to estimate the underlying affect. Yet early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video Low-Level Descriptors for subsequent static SVM classification. This strategy also allows for a combined feature-space optimization, which is discussed herein. The high effectiveness of this approach is shown on a database of 11.5 h containing six emotional situations in an airplane scenario.
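
As a rough illustration of the early-fusion idea described in the abstract, the sketch below maps each stream's frame-level descriptors to a fixed-length vector of descriptive statistics over one turn and concatenates the two. Because the statistics are computed per stream, the audio and video frame rates do not have to match. The function names, the toy data, and the choice of four statistics (mean, standard deviation, minimum, maximum) are assumptions made for this example; the abstract does not specify the descriptor or statistics set used in the paper.

```python
from statistics import fmean, pstdev


def functionals(lld_frames):
    """Map a variable-length list of frame-level descriptor vectors to a
    fixed-length list of descriptive statistics (hypothetical helper)."""
    if not lld_frames:
        raise ValueError("need at least one frame")
    stats = []
    # zip(*frames) yields one contour (time series) per descriptor.
    for contour in zip(*lld_frames):
        stats.extend([fmean(contour), pstdev(contour), min(contour), max(contour)])
    return stats


def fuse_turn(audio_llds, video_llds):
    """Early fusion: concatenate per-stream statistics into one vector."""
    return functionals(audio_llds) + functionals(video_llds)


# Toy turn (made-up values): 5 audio frames with 2 descriptors each,
# 3 video frames with 1 descriptor each.
audio = [[0.1, 200.0], [0.2, 210.0], [0.3, 220.0], [0.2, 210.0], [0.2, 210.0]]
video = [[1.0], [3.0], [2.0]]

fused = fuse_turn(audio, video)
print(len(fused))  # 4 statistics for each of the 3 descriptors
```

Each turn then yields one vector of constant length, which is the kind of input a static classifier such as an SVM expects, and feature selection can be run over the joint audio and video space rather than per modality.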

Paper Citation

in Harvard Style

Wimmer M., Schuller B., Arsic D., Rigoll G. and Radig B. (2008). LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION. In Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008) ISBN 978-989-8111-21-0, pages 145-151. DOI: 10.5220/0001082801450151

in Bibtex Style

@conference{visapp08,
author={Matthias Wimmer and Björn Schuller and Dejan Arsic and Gerhard Rigoll and Bernd Radig},
title={LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION},
booktitle={Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008)},
year={2008},
pages={145-151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001082801450151},
isbn={978-989-8111-21-0},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008)
TI - LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION
SN - 978-989-8111-21-0
AU - Wimmer M.
AU - Schuller B.
AU - Arsic D.
AU - Rigoll G.
AU - Radig B.
PY - 2008
SP - 145
EP - 151
DO - 10.5220/0001082801450151