An Experimentation Line for Underlying Graphemic Properties - Acquiring Knowledge from Text Data with Self Organizing Maps

Gilles Bernard, Nourredine Aliane, Otman Manad

2015

Abstract

We present an experimentation line that encompasses various stages of research on grapheme distribution and unsupervised classification. We aim to help close the gap between recent research results showing that unsupervised learning and clustering algorithms can detect underlying properties of phonemes and the present possibilities of Unicode textual representation. Our procedures need to ensure repeatability and guarantee that no information is implicitly present in the preprocessing of data. Our approach is able to categorize potential graphemes correctly, thus showing not only that phonemic properties are indeed present in textual data, but also that they can be automatically retrieved from raw Unicode text data and translated into phonemic representations. Incidentally, we observe that the SOM algorithm copes well with very sparse vectors.
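The abstract does not say how characters are turned into vectors or how the map is configured, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes each character of a raw Unicode string is described by the relative frequencies of its right-hand neighbours (a sparse vector), and that a small one-dimensional self-organizing map groups those vectors. The function names, the toy text, and every parameter value are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def context_vectors(text):
    """Map each character to a sparse vector of right-neighbour frequencies.
    Assumption: this representation is illustrative, not taken from the paper."""
    symbols = sorted(set(text))
    index = {ch: i for i, ch in enumerate(symbols)}
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    vectors = {}
    for ch in symbols:
        total = sum(counts[ch].values()) or 1  # avoid division by zero
        vec = [0.0] * len(symbols)
        for nxt, n in counts[ch].items():
            vec[index[nxt]] = n / total
        vectors[ch] = vec
    return vectors

def best_unit(weights, vec):
    """Index of the map unit closest to vec (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda u: sum((w - x) ** 2 for w, x in zip(weights[u], vec)),
    )

def train_som(vectors, n_units=4, epochs=50, seed=0):
    """Train a one-dimensional SOM (a chain of units) on the given vectors.
    Assumption: linear decay of learning rate and neighbourhood width."""
    rng = random.Random(seed)
    data = list(vectors.values())
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        sigma = max(0.5, (n_units / 2) * frac)
        for vec in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_unit(weights, vec)
            for u in range(n_units):
                # Gaussian neighbourhood around the best-matching unit
                pull = rate * math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + pull * (x - w) for w, x in zip(weights[u], vec)]
    return weights

text = "banana bandana cabana"  # toy input, not data from the paper
vectors = context_vectors(text)
weights = train_som(vectors)
clusters = {ch: best_unit(weights, vec) for ch, vec in vectors.items()}
print(clusters)
```

Characters assigned to the same or neighbouring units have similar neighbour distributions in the toy text. On real corpora the vectors become much larger and sparser, which is the regime the abstract says the SOM handles well.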


Paper Citation


in Harvard Style

Bernard G., Aliane N. and Manad O. (2015). An Experimentation Line for Underlying Graphemic Properties - Acquiring Knowledge from Text Data with Self Organizing Maps. In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ANNIIP, (ICINCO 2015) ISBN 978-989-758-122-9, pages 659-666. DOI: 10.5220/0005577706590666

in Bibtex Style

@conference{anniip15,
author={Gilles Bernard and Nourredine Aliane and Otman Manad},
title={An Experimentation Line for Underlying Graphemic Properties - Acquiring Knowledge from Text Data with Self Organizing Maps},
booktitle={Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ANNIIP, (ICINCO 2015)},
year={2015},
pages={659-666},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005577706590666},
isbn={978-989-758-122-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ANNIIP, (ICINCO 2015)
TI - An Experimentation Line for Underlying Graphemic Properties - Acquiring Knowledge from Text Data with Self Organizing Maps
SN - 978-989-758-122-9
AU - Bernard G.
AU - Aliane N.
AU - Manad O.
PY - 2015
SP - 659
EP - 666
DO - 10.5220/0005577706590666