Applying Information-theoretic and Edit Distance Approaches to Flexibly Measure Lexical Similarity

Thi Thuy Anh Nguyen, Stefan Conrad

2014

Abstract

Similarity measurement plays an important role in data mining and information retrieval. Several techniques for calculating the similarity between objects have been proposed, for example lexical-based, structure-based and instance-based measures. Existing lexical similarity measures are usually based on either n-grams or Dice’s approach to obtain correspondences between strings. Although these measures are efficient, they are inadequate when strings are quite similar, or when the strings contain the same set of characters in different positions. In this paper, a lexical similarity approach that combines an information-theoretic model with edit distance is developed to determine correspondences among concept labels. Precision, Recall and F-measure, together with a subset of the OAEI benchmark 2008 tests, are used to evaluate the proposed method. The results show that our approach is flexible and has some prominent features compared to other lexical-based methods.
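
As a rough illustration of the baseline measures the abstract refers to, the sketch below computes a Dice coefficient over character bigrams and a normalized Levenshtein (edit distance) similarity. This is not the combined measure proposed in the paper, which the abstract does not specify. The function names, the lowercasing step and the normalization by the longer string's length are assumptions made for this example only.

```python
def bigrams(s):
    """Return the list of character bigrams of s (lowercased; an assumption)."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_bigram(a, b):
    """Dice coefficient over the sets of character bigrams of a and b."""
    x, y = set(bigrams(a)), set(bigrams(b))
    if not x and not y:
        # Both strings are shorter than two characters: fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance rescaled to [0, 1] by the longer string's length (an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Same character set, different positions: the bigram sets do not overlap.
    print(dice_bigram("abc", "cba"), edit_similarity("abc", "cba"))
```

For the pair "abc" and "cba", the bigram-based Dice score is zero even though both strings use the same characters, while the edit-distance similarity is still non-zero. This is the kind of case the abstract describes as problematic for purely n-gram-based measures.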


Paper Citation


in Harvard Style

Nguyen T. and Conrad S. (2014). Applying Information-theoretic and Edit Distance Approaches to Flexibly Measure Lexical Similarity. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014) ISBN 978-989-758-048-2, pages 505-511. DOI: 10.5220/0005170005050511

in BibTeX Style

@conference{sstm14,
author={Thi Thuy Anh Nguyen and Stefan Conrad},
title={Applying Information-theoretic and Edit Distance Approaches to Flexibly Measure Lexical Similarity},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)},
year={2014},
pages={505-511},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005170005050511},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)
TI - Applying Information-theoretic and Edit Distance Approaches to Flexibly Measure Lexical Similarity
SN - 978-989-758-048-2
AU - Nguyen T.
AU - Conrad S.
PY - 2014
SP - 505
EP - 511
DO - 10.5220/0005170005050511