STATISTICAL LANGUAGE IDENTIFICATION OF SHORT TEXTS

Fela Winkelmolen, Viviana Mascardi

2011

Abstract

Although correctly identifying the language of short texts should prove useful in a large number of applications, few satisfactory attempts are reported in the literature. In this paper we describe a Naive Bayes classifier that performs well on very short texts, as well as the corpus that we created from movie subtitles to train it. Both the corpus and the algorithm are available under the GNU Lesser General Public License.
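For readers unfamiliar with the approach, the following is a minimal sketch of Naive Bayes language identification. The abstract does not specify the features or smoothing used in the paper, so everything here is an assumption made for illustration: character trigrams as features, add-alpha smoothing, and the hypothetical names `trigrams` and `NaiveBayesLanguageId`. The toy training sentences stand in for a real corpus and are not taken from the authors' subtitle data.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Return overlapping character trigrams of a lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class NaiveBayesLanguageId:
    """Illustrative multinomial Naive Bayes over character trigrams (assumed design)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # add-alpha smoothing constant
        self.counts = defaultdict(Counter)  # language -> trigram counts
        self.totals = Counter()             # language -> total trigrams seen
        self.docs = Counter()               # language -> number of training texts
        self.vocab = set()                  # all trigrams seen in training

    def train(self, language, text):
        grams = trigrams(text)
        self.counts[language].update(grams)
        self.totals[language] += len(grams)
        self.docs[language] += 1
        self.vocab.update(grams)

    def classify(self, text):
        grams = trigrams(text)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab) + 1    # one extra slot for unseen trigrams
        best, best_score = None, -math.inf
        for lang in self.docs:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.docs[lang] / n_docs)
            denom = self.totals[lang] + self.alpha * vocab_size
            for gram in grams:
                score += math.log((self.counts[lang][gram] + self.alpha) / denom)
            if score > best_score:
                best, best_score = lang, score
        return best


# Toy training data, invented for this example only.
clf = NaiveBayesLanguageId()
clf.train("en", "the quick brown fox jumps over the lazy dog")
clf.train("en", "where are you going tonight")
clf.train("it", "la volpe veloce salta sopra il cane pigro")
clf.train("it", "dove stai andando stasera")

print(clf.classify("where is the dog"))
print(clf.classify("dove il cane"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing keeps a single unseen trigram from zeroing out a language, which matters most when the input is only a few words long.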

Paper Citation


in Harvard Style

Winkelmolen F. and Mascardi V. (2011). STATISTICAL LANGUAGE IDENTIFICATION OF SHORT TEXTS. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 498-503. DOI: 10.5220/0003294404980503

in Bibtex Style

@conference{icaart11,
author={Fela Winkelmolen and Viviana Mascardi},
title={STATISTICAL LANGUAGE IDENTIFICATION OF SHORT TEXTS},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={498-503},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003294404980503},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - STATISTICAL LANGUAGE IDENTIFICATION OF SHORT TEXTS
SN - 978-989-8425-40-9
AU - Winkelmolen F.
AU - Mascardi V.
PY - 2011
SP - 498
EP - 503
DO - 10.5220/0003294404980503