ENTROPY ON ONTOLOGY AND INDEXING IN INFORMATION RETRIEVAL

Yevgeniy Guseynov

2011

Abstract

In this paper, we present a formalization of an Index Assignment process that was applied to documents stored in a text database. The process uses key phrases or terms from a hierarchical thesaurus or ontology. It is based on a new notion of entropy on ontology for terms and their weights, which extends both the Shannon concept of entropy in Information Theory and the Resnik semantic similarity measure for terms on an ontology. The introduced notion provides a measure of closeness, or semantic similarity, for a set of terms in an ontology and their weights, and it allows the creation of a clustering algorithm that constructively resolves the index assignment task. The algorithm was tested on 30,000 documents randomly extracted from the MEDLINE biomedical database, which are manually indexed by professional indexers. The main result of the experiments is that, after all 30,000 documents were processed, the presented algorithm and the human indexers have the same understanding of the documents in seven topics out of ten.
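For orientation, the two standard ingredients named in the abstract can be sketched as follows. This is only an illustration of the textbook definitions of Shannon entropy and Resnik similarity, not the paper's entropy on ontology, whose definition is not given on this page. The toy ontology `PARENT`, the probabilities `PROB`, and all function names are hypothetical.

```python
import math

def shannon_entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical toy ontology, stored as child -> parent (the root has parent None).
PARENT = {
    "disease": None,
    "infection": "disease",
    "cancer": "disease",
    "viral infection": "infection",
    "bacterial infection": "infection",
}

# Hypothetical probability of encountering a term or any of its descendants.
PROB = {
    "disease": 1.0,
    "infection": 0.5,
    "cancer": 0.5,
    "viral infection": 0.25,
    "bacterial infection": 0.25,
}

def ancestors(term):
    """Return the term followed by all of its ancestors up to the root."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain

def resnik_similarity(a, b):
    """Resnik similarity: information content of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(-math.log2(PROB[t]) for t in common)
```

In this toy setting, two sibling terms under "infection" share a more informative ancestor than a pair whose only common ancestor is the root, so the first pair scores higher.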


Paper Citation


in Harvard Style

Guseynov Y. (2011). ENTROPY ON ONTOLOGY AND INDEXING IN INFORMATION RETRIEVAL. In Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8425-51-5, pages 555-567. DOI: 10.5220/0003298205550567

in Bibtex Style

@conference{webist11,
author={Yevgeniy Guseynov},
title={ENTROPY ON ONTOLOGY AND INDEXING IN INFORMATION RETRIEVAL},
booktitle={Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2011},
pages={555-567},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003298205550567},
isbn={978-989-8425-51-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - ENTROPY ON ONTOLOGY AND INDEXING IN INFORMATION RETRIEVAL
SN - 978-989-8425-51-5
AU - Guseynov Y.
PY - 2011
SP - 555
EP - 567
DO - 10.5220/0003298205550567