SEGMENTED–MEMORY RECURRENT NEURAL NETWORKS VERSUS HIDDEN MARKOV MODELS IN EMOTION RECOGNITION FROM SPEECH

Stefan Glüge, Ronald Böck, Andreas Wendemuth

2011

Abstract

Emotion recognition from speech means determining the emotional state of a speaker from his or her voice. Today’s most widely used classifiers in this field are Hidden Markov Models (HMMs) and Support Vector Machines. Neither architecture is designed to consider the full dynamic character of speech. HMMs can capture the temporal characteristics of speech at the phoneme, word, or utterance level, but they fail to learn the dynamics of the input signal on short time scales (e.g., at the frame rate). The use of dynamical features (first and second derivatives of speech features) mitigates this problem. We propose the use of Segmented-Memory Recurrent Neural Networks to learn the full spectrum of speech dynamics, so that the dynamical features can be removed from the input data. The resulting neural network classifier is compared to HMMs that use the reduced feature set as well as to HMMs that work with the full set of features. The networks perform comparably to HMMs while using significantly fewer features.
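
For readers unfamiliar with the dynamical features mentioned in the abstract, the following is a minimal sketch of how first and second derivatives are commonly appended to frame-level speech features. It is not taken from the paper: the regression formula, the window size of 2, the edge handling, and the names `delta` and `add_dynamic_features` are all assumptions made for illustration.

```python
from typing import List

Frame = List[float]  # one feature vector per speech frame


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Estimate the time derivative of each feature with a regression
    over +/- `window` neighbouring frames (assumed formula, see lead-in).

    Frames beyond the sequence boundaries are replaced by the first or
    last frame, which is an assumption about edge handling.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(n_frames):
        row: Frame = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(frames: List[Frame]) -> List[Frame]:
    """Concatenate static features with first and second derivatives."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]


# Hypothetical input: ten frames with a single feature that rises linearly.
ramp = [[float(t)] for t in range(10)]
full = add_dynamic_features(ramp)
```

Under these assumptions the full feature set is three times as wide as the static one, which illustrates the size of the reduction when the dynamical features are removed from the input.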

Paper Citation


in Harvard Style

Glüge S., Böck R. and Wendemuth A. (2011). SEGMENTED–MEMORY RECURRENT NEURAL NETWORKS VERSUS HIDDEN MARKOV MODELS IN EMOTION RECOGNITION FROM SPEECH. In Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2011) ISBN 978-989-8425-84-3, pages 308-315. DOI: 10.5220/0003644003080315

in Bibtex Style

@conference{ncta11,
author={Stefan Glüge and Ronald Böck and Andreas Wendemuth},
title={SEGMENTED–MEMORY RECURRENT NEURAL NETWORKS VERSUS HIDDEN MARKOV MODELS IN EMOTION RECOGNITION FROM SPEECH},
booktitle={Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2011)},
year={2011},
pages={308-315},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003644003080315},
isbn={978-989-8425-84-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2011)
TI - SEGMENTED–MEMORY RECURRENT NEURAL NETWORKS VERSUS HIDDEN MARKOV MODELS IN EMOTION RECOGNITION FROM SPEECH
SN - 978-989-8425-84-3
AU - Glüge S.
AU - Böck R.
AU - Wendemuth A.
PY - 2011
SP - 308
EP - 315
DO - 10.5220/0003644003080315