The GENIE Project - A Semantic Pipeline for Automatic Document Categorisation

Angel L. Garrido, Maria G. Buey, Sandra Escudero, Alvaro Peiro, Sergio Ilarri, Eduardo Mena

2014

Abstract

Automatic text categorisation systems are a type of software attracting growing interest, owing not only to their use in documentation environments but also to their potential for properly tagging documents on the Web. Many approaches have been proposed to address this problem, using statistical methods, natural language processing tools, ontologies and lexical databases. Nevertheless, there have been few empirical evaluations comparing the influence of the different tools used to solve these problems, particularly in a multilingual environment. In this paper we propose a multi-language rule-based pipeline system for automatic document categorisation and, using several corpora of documents, we empirically compare the results of applying techniques that rely on statistics and supervised learning with the results of applying the same techniques supported by smarter tools based on language semantics and ontologies. GENIE is being applied in real environments, which shows the potential of the proposal.

Paper Citation


in Harvard Style

Garrido A., Buey M., Escudero S., Peiro A., Ilarri S. and Mena E. (2014). The GENIE Project - A Semantic Pipeline for Automatic Document Categorisation. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-024-6, pages 161-171. DOI: 10.5220/0004750601610171

in Bibtex Style

@conference{webist14,
author={Angel L. Garrido and Maria G. Buey and Sandra Escudero and Alvaro Peiro and Sergio Ilarri and Eduardo Mena},
title={The GENIE Project - A Semantic Pipeline for Automatic Document Categorisation},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2014},
pages={161-171},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004750601610171},
isbn={978-989-758-024-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - The GENIE Project - A Semantic Pipeline for Automatic Document Categorisation
SN - 978-989-758-024-6
AU - Garrido A.
AU - Buey M.
AU - Escudero S.
AU - Peiro A.
AU - Ilarri S.
AU - Mena E.
PY - 2014
SP - 161
EP - 171
DO - 10.5220/0004750601610171