Combining Contextual and Modal Action Information into a Weighted Multikernel SVM for Human Action Recognition

Jordi Bautista-Ballester, Jaume Vergés-Llahí, Domenec Puig

2016

Abstract

Understanding human activities is one of the most challenging current problems in robotics. Whether for imitation or anticipation, robots operating in human environments must recognize which action a human is performing. Action classification using a Bag of Words (BoW) representation has shown computational simplicity and good performance. However, the growing number of categories, including actions that are easily confused, and the addition of significant contextual and multimodal information, especially in human-robot interaction, have led most authors to focus their efforts on combining image descriptors. In this field, we propose the Contextual and Modal MultiKernel Learning Support Vector Machine (CMMKL-SVM). We introduce contextual information (objects directly related to the performed action, introduced by calculating the codebook from a set of points belonging to objects) and multimodal information (features from depth and 3D images, which give two extra modalities in addition to RGB images). We encode the action videos using a BoW representation with both contextual and modal information and feed them to the SVM through an optimal kernel formed as a linear combination of single kernels whose weights are learned. Experiments were carried out on two action databases, CAD-120 and HMDB. Our approach matches the results of similar state-of-the-art approaches on highly constrained databases and improves on them more as the database becomes more realistic, reaching a performance improvement of 14.27 % for HMDB.
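
To make the kernel combination concrete, the following is a minimal sketch of combining per-channel kernels into a single weighted kernel. It is an illustration under stated assumptions, not the authors' implementation: the channel names (`rgb`, `depth`), the toy histograms, the choice of a histogram intersection kernel for each channel, and the fixed weights are all hypothetical. In the method described above the weights are learned, whereas here they are simply given.

```python
from typing import Dict, List, Sequence

Histogram = Sequence[float]
Gram = List[List[float]]


def intersection_kernel(a: Histogram, b: Histogram) -> float:
    """Histogram intersection between two BoW histograms (assumed kernel choice)."""
    return sum(min(x, y) for x, y in zip(a, b))


def gram_matrix(hists: Sequence[Histogram]) -> Gram:
    """Pairwise kernel values for all videos in one channel."""
    return [[intersection_kernel(a, b) for b in hists] for a in hists]


def combine_kernels(grams: Dict[str, Gram], weights: Dict[str, float]) -> Gram:
    """Linear combination of single-channel Gram matrices.

    Weights must be non-negative; they are rescaled to sum to 1.
    """
    if set(grams) != set(weights):
        raise ValueError("every channel needs exactly one weight")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    names = sorted(grams)
    n = len(grams[names[0]])
    combined = [[0.0] * n for _ in range(n)]
    for name in names:
        w = weights[name] / total
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * grams[name][i][j]
    return combined


# Toy L1-normalized BoW histograms for two videos in two hypothetical channels.
channels = {
    "rgb": [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]],
    "depth": [[1.0, 0.0], [0.5, 0.5]],
}
grams = {name: gram_matrix(h) for name, h in channels.items()}

# Hypothetical fixed weights; a multikernel learner would estimate these.
K = combine_kernels(grams, {"rgb": 3.0, "depth": 1.0})
```

The resulting matrix `K` could then be passed to any SVM solver that accepts a precomputed kernel. Because each weight is non-negative and each single kernel is positive semidefinite, the combination is itself a valid kernel.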

Paper Citation


in Harvard Style

Bautista-Ballester J., Vergés-Llahí J. and Puig D. (2016). Combining Contextual and Modal Action Information into a Weighted Multikernel SVM for Human Action Recognition. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 299-307. DOI: 10.5220/0005669002990307

in Bibtex Style

@conference{visapp16,
author={Jordi Bautista-Ballester and Jaume Vergés-Llahí and Domenec Puig},
title={Combining Contextual and Modal Action Information into a Weighted Multikernel SVM for Human Action Recognition},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={299-307},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005669002990307},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Combining Contextual and Modal Action Information into a Weighted Multikernel SVM for Human Action Recognition
SN - 978-989-758-175-5
AU - Bautista-Ballester J.
AU - Vergés-Llahí J.
AU - Puig D.
PY - 2016
SP - 299
EP - 307
DO - 10.5220/0005669002990307