EXPLORING BASQUE DOCUMENT CATEGORIZATION FOR EDUCATIONAL PURPOSES USING LSI

A. Zelaia, I. Alegria, O. Arregi, A. Arruarte, A. Díaz de Ilarraza, J. A. Elorriaga, B. Sierra

2009

Abstract

In the process of preparing learning material for Computer Supported Learning Systems (CSLSs), one of the first steps is finding documents relevant to the topics and to the students. This requires documents to be categorized according to some criteria. In this paper we analyze the behaviour of classification techniques such as Naïve Bayes, Winnow, SVMs and k-NN, together with lemmatization and noun selection, in the categorization of documents written in Basque. In a second experiment, we study the effect of applying the Singular Value Decomposition (SVD) dimensionality reduction technique, the basis of Latent Semantic Indexing (LSI), before using the mentioned classification techniques. The results show that combining SVD and k-NN on a lemmatized corpus yields the best categorization of all the approaches, by a remarkable margin. The final aim of this project is to facilitate the semi-automatic construction of the domain module of a CSLS.
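The best-performing combination reported above, SVD (as in LSI) followed by k-NN, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses raw lemma counts with no term weighting, an arbitrary number of latent dimensions `k`, cosine similarity with a simple majority vote, and a tiny invented English vocabulary standing in for a lemmatized Basque corpus. The term-document matrix A is truncated to A ≈ U_k S_k V_k^T, training documents are the rows of V_k, and a new document vector q is folded in as S_k^{-1} U_k^T q. The helper names `lsi_fit`, `lsi_fold_in`, and `knn_predict` are hypothetical.

```python
from collections import Counter

import numpy as np


def lsi_fit(term_doc: np.ndarray, k: int):
    """Truncated SVD of a terms x documents matrix: A ~ U_k S_k V_k^T."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k, s_k = u[:, :k], s[:k]
    # Training documents in the k-dimensional latent space are the rows of V_k.
    docs_k = vt[:k, :].T
    return u_k, s_k, docs_k


def lsi_fold_in(u_k: np.ndarray, s_k: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Project a new term-count vector q into the latent space: S_k^{-1} U_k^T q."""
    return (u_k.T @ query) / s_k


def knn_predict(docs_k: np.ndarray, labels: list, query_k: np.ndarray, n_neighbors: int = 3):
    """Majority vote among the n_neighbors most cosine-similar training documents."""
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    # Guard against division by zero for empty (all-zero) vectors.
    sims = (docs_k @ query_k) / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(-sims)[:n_neighbors]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: rows are lemma counts, columns are documents.
vocab = ["cell", "gene", "protein", "atom", "energy", "force"]
A = np.array(
    [
        [2, 1, 1, 0, 0, 0],  # cell
        [1, 2, 0, 0, 0, 0],  # gene
        [0, 1, 2, 0, 0, 0],  # protein
        [0, 0, 0, 2, 1, 0],  # atom
        [0, 0, 0, 1, 2, 1],  # energy
        [0, 0, 0, 0, 1, 2],  # force
    ],
    dtype=float,
)
labels = ["biology"] * 3 + ["physics"] * 3

u_k, s_k, docs_k = lsi_fit(A, k=2)
query = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # a new document mentioning "cell" and "gene"
print(knn_predict(docs_k, labels, lsi_fold_in(u_k, s_k, query)))  # prints: biology
```

Folding new documents in with the same U_k and S_k keeps them directly comparable with the training rows of V_k. A realistic setup would likely add term weighting (for example tf-idf) and tune `k` and `n_neighbors` on held-out data; those choices are not specified here.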



Paper Citation


in Harvard Style

Zelaia A., Alegria I., Arregi O., Arruarte A., Díaz de Ilarraza A., Elorriaga J. and Sierra B. (2009). EXPLORING BASQUE DOCUMENT CATEGORIZATION FOR EDUCATIONAL PURPOSES USING LSI. In Proceedings of the First International Conference on Computer Supported Education - Volume 1: CSEDU, ISBN 978-989-8111-82-1, pages 5-9. DOI: 10.5220/0001834300050009

in BibTeX Style

@conference{csedu09,
author={A. Zelaia and I. Alegria and O. Arregi and A. Arruarte and A. {Díaz de Ilarraza} and J. A. Elorriaga and B. Sierra},
title={EXPLORING BASQUE DOCUMENT CATEGORIZATION FOR EDUCATIONAL PURPOSES USING LSI},
booktitle={Proceedings of the First International Conference on Computer Supported Education - Volume 1: CSEDU},
year={2009},
pages={5-9},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001834300050009},
isbn={978-989-8111-82-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Computer Supported Education - Volume 1: CSEDU
TI - EXPLORING BASQUE DOCUMENT CATEGORIZATION FOR EDUCATIONAL PURPOSES USING LSI
SN - 978-989-8111-82-1
AU - Zelaia A.
AU - Alegria I.
AU - Arregi O.
AU - Arruarte A.
AU - Díaz de Ilarraza A.
AU - Elorriaga J.
AU - Sierra B.
PY - 2009
SP - 5
EP - 9
DO - 10.5220/0001834300050009