Unconstrained Speech Segmentation using Deep Neural Networks
Van Zyl van Vuuren, Louis ten Bosch, Thomas Niesler
2015
Abstract
We propose a method for improving the unconstrained segmentation of speech into phoneme-like units using deep neural networks. The proposed approach is not dependent on acoustic models or forced alignment, but operates using the acoustic features directly. Previous solutions of this type were plagued by the tendency to hypothesise additional incorrect phoneme boundaries near the phoneme transitions. We show that the application of deep neural networks is able to reduce this over-segmentation substantially, and achieve improved segmentation accuracies. Furthermore, we find that generative pre-training offers an additional benefit.
Paper Citation
in Harvard Style
Zyl van Vuuren V., ten Bosch L. and Niesler T. (2015). Unconstrained Speech Segmentation using Deep Neural Networks. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 248-254. DOI: 10.5220/0005201802480254
in Bibtex Style
@conference{icpram15,
author={Van Zyl van Vuuren and Louis ten Bosch and Thomas Niesler},
title={Unconstrained Speech Segmentation using Deep Neural Networks},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={248-254},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005201802480254},
isbn={978-989-758-076-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Unconstrained Speech Segmentation using Deep Neural Networks
SN - 978-989-758-076-5
AU - Zyl van Vuuren V.
AU - ten Bosch L.
AU - Niesler T.
PY - 2015
SP - 248
EP - 254
DO - 10.5220/0005201802480254