ONTOLOGY-DRIVEN CONCEPTUAL DOCUMENT CLASSIFICATION

Gordana Pavlović-Lažetić, Jelena Graovac

2010

Abstract

Document classification based on a wordnet, a lexical-semantic network, is presented. Two types of document classification in Serbian were explored: classification based on chosen concepts from the Serbian WordNet (SWN), and proper-name-based classification. Conceptual classification criteria are constructed from hierarchies rooted in a set of chosen concepts (first case) or in some of the proper names' hypernyms (second case). A classifier of the first type is trained and then tested on the indexed and already classified Ebart corpus of Serbian newspapers (476,917 articles). Precision, recall and F-measure show that this type of classification is promising, although incomplete, mainly because SWN itself is incomplete. For proper-name-based classification, the paper presents a proper names ontology based on the SWN and defines a distance-based similarity measure built on Euclidean and Manhattan distances. Classification of a subset of the Contemporary Serbian Language Corpus is presented.
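
The first, concept-based classification can be pictured roughly as follows. This is a minimal sketch, not the authors' implementation: the hypernym links, the Serbian lemmas, the two category roots, and the decision rule (pick the hierarchy that covers the most document lemmas) are all hypothetical stand-ins for the SWN hierarchies and the trained classifier. It also shows how precision, recall and F-measure are computed per category, and how a document whose lemmas fall outside every hierarchy stays unclassified, which is one way an incomplete wordnet can lower recall.

```python
from collections import Counter

# Hypothetical toy hypernym links (hyponym -> hypernym); the real SWN is far larger.
HYPERNYM = {
    "fudbal": "sport",
    "košarka": "sport",
    "gol": "fudbal",
    "izbori": "politika",
    "parlament": "politika",
    "ministar": "politika",
}


def hierarchy(root, hypernym):
    """Return root plus every concept whose hypernym chain reaches root."""
    members = {root}
    for concept in hypernym:
        node = concept
        while node in hypernym:
            node = hypernym[node]
            if node == root:
                members.add(concept)
                break
    return members


def classify(lemmas, roots, hypernym):
    """Hypothetical decision rule: pick the root whose hierarchy covers the most
    lemmas; return None if no lemma is covered by any hierarchy."""
    trees = {root: hierarchy(root, hypernym) for root in roots}
    hits = Counter({root: sum(lemma in tree for lemma in lemmas) for root, tree in trees.items()})
    best, count = hits.most_common(1)[0]
    return best if count > 0 else None


def prf(predicted, gold, category):
    """Precision, recall and F-measure (F1) for a single category."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == category and g == category for p, g in pairs)
    fp = sum(p == category and g != category for p, g in pairs)
    fn = sum(p != category and g == category for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical lemmatized documents with gold labels.
docs = [["gol", "fudbal"], ["izbori", "ministar"], ["košarka", "parlament", "ministar"], ["vreme"]]
gold = ["sport", "politika", "sport", "sport"]
pred = [classify(doc, ["sport", "politika"], HYPERNYM) for doc in docs]
print(pred)  # ['sport', 'politika', 'politika', None]
print(prf(pred, gold, "sport"))  # precision 1.0, recall 1/3, F-measure 0.5
```

Ties are broken in favor of the root listed first, because `Counter.most_common` keeps insertion order among equal counts; the classifier described in the paper may use a different rule.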
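
For the proper-name-based classification, the abstract states only that the similarity measure is distance-based and built on Euclidean and Manhattan distances. The sketch below assumes, purely for illustration, that each document and each category is a vector of relative frequencies of proper names grouped by hypothetical hypernym categories (person, city, country, organization), that a distance d is turned into a similarity as 1 / (1 + d), and that a document is assigned to the most similar category profile. None of these choices is taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Manhattan (L1) distance between equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity(u, v, distance=euclidean):
    """Assumed transform: distance 0 gives similarity 1; larger distances approach 0."""
    return 1.0 / (1.0 + distance(u, v))


def normalize(counts):
    """Turn raw counts into relative frequencies so document length does not dominate."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def nearest_category(doc, profiles, distance=euclidean):
    """Return the category whose profile is most similar to the document vector."""
    return max(profiles, key=lambda name: similarity(doc, profiles[name], distance))


# Hypothetical counts of proper names per hypernym: [person, city, country, organization].
profiles = {
    "domestic politics": normalize([6, 1, 1, 2]),
    "foreign affairs": normalize([2, 2, 5, 1]),
}
doc = normalize([1, 0, 3, 1])
print(nearest_category(doc, profiles, euclidean))  # foreign affairs
print(nearest_category(doc, profiles, manhattan))  # foreign affairs
```

Swapping the `distance` argument is enough to compare the two metrics: Manhattan distance weighs every coordinate difference linearly, while Euclidean distance penalizes a single large difference more heavily.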

Paper Citation


in Harvard Style

Pavlović-Lažetić G. and Graovac J. (2010). ONTOLOGY-DRIVEN CONCEPTUAL DOCUMENT CLASSIFICATION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 383-386. DOI: 10.5220/0003063903830386

in Bibtex Style

@conference{kdir10,
author={Gordana Pavlović-Lažetić and Jelena Graovac},
title={ONTOLOGY-DRIVEN CONCEPTUAL DOCUMENT CLASSIFICATION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={383-386},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003063903830386},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - ONTOLOGY-DRIVEN CONCEPTUAL DOCUMENT CLASSIFICATION
SN - 978-989-8425-28-7
AU - Pavlović-Lažetić G.
AU - Graovac J.
PY - 2010
SP - 383
EP - 386
DO - 10.5220/0003063903830386
ER -