Unconstrained Speech Segmentation using Deep Neural Networks

Van Zyl van Vuuren, Louis ten Bosch, Thomas Niesler

2015

Abstract

We propose a method for improving the unconstrained segmentation of speech into phoneme-like units using deep neural networks. The proposed approach does not depend on acoustic models or forced alignment, but operates directly on the acoustic features. Previous solutions of this type were plagued by a tendency to hypothesise additional incorrect phoneme boundaries near the phoneme transitions. We show that applying deep neural networks reduces this over-segmentation substantially and improves segmentation accuracy. Furthermore, we find that generative pre-training offers an additional benefit.
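To make the notion of over-segmentation concrete, the following is a minimal sketch of how hypothesised boundaries are commonly scored against reference boundaries. It is not taken from the paper: the function names, the 20 ms tolerance, and the greedy one-to-one matching are assumptions chosen for illustration, and the paper's own evaluation protocol may differ.

```python
def count_hits(ref, hyp, tol=20):
    """Count reference boundaries matched by a hypothesis within +/- tol.

    ref, hyp: boundary times in integer milliseconds.
    Each hypothesised boundary may match at most one reference boundary.
    """
    ref = sorted(ref)
    hyp = sorted(hyp)
    hits = 0
    j = 0
    for r in ref:
        # Skip hypotheses that lie too early to match this reference boundary.
        while j < len(hyp) and hyp[j] < r - tol:
            j += 1
        if j < len(hyp) and abs(hyp[j] - r) <= tol:
            hits += 1
            j += 1  # consume the matched hypothesis
    return hits


def segmentation_scores(ref, hyp, tol=20):
    """Return recall, precision and over-segmentation for one utterance."""
    hits = count_hits(ref, hyp, tol)
    return {
        "hits": hits,
        "recall": hits / len(ref),
        "precision": hits / len(hyp),
        # Positive values mean more boundaries were hypothesised than exist.
        "over_segmentation": len(hyp) / len(ref) - 1.0,
    }


# Hypothetical example: extra boundaries are hypothesised near two transitions.
reference = [100, 250, 400]
hypothesis = [95, 110, 255, 390, 410]
scores = segmentation_scores(reference, hypothesis)
print(scores)
```

In this invented example every reference boundary is found, yet two spurious boundaries near the true transitions lower the precision and give a positive over-segmentation value, which is the failure mode the abstract describes.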

Paper Citation


in Harvard Style

Zyl van Vuuren V., ten Bosch L. and Niesler T. (2015). Unconstrained Speech Segmentation using Deep Neural Networks. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 248-254. DOI: 10.5220/0005201802480254

in Bibtex Style

@conference{icpram15,
author={Van Zyl van Vuuren and Louis ten Bosch and Thomas Niesler},
title={Unconstrained Speech Segmentation using Deep Neural Networks},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={248-254},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005201802480254},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Unconstrained Speech Segmentation using Deep Neural Networks
SN - 978-989-758-076-5
AU - Zyl van Vuuren V.
AU - ten Bosch L.
AU - Niesler T.
PY - 2015
SP - 248
EP - 254
DO - 10.5220/0005201802480254