ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS

Alexandra Cernian, Liliana Dobrica, Dorin Carstoiu, Valentin Sgarciu

2010

Abstract

Current Web search engines return long ranked lists of documents that users are forced to sift through to find the relevant ones. This paper introduces a new approach for clustering Web search results, based on the notion of clustering by compression. Compression algorithms make it possible to define a similarity measure based on the amount of information two pieces of data have in common. Classification methods allow similar data to be clustered without any prior knowledge about it. The clustering by compression procedure relies on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files. Our goal is to apply the clustering by compression algorithm to the documents returned by a Web search engine in response to a user query.
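To illustrate how a distance can be computed purely from compressed lengths, here is a minimal sketch based on the standard definition from the clustering-by-compression literature, NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}. In this formula, C(·) is the compressed length and xy is the concatenation of x and y. Values near 0 indicate near-identical inputs, and values near 1 indicate inputs that share little information. Several choices here are assumptions made only for this example: zlib as the compressor, the hypothetical helper names `compressed_size`, `ncd` and `cluster`, and the simple single-linkage grouping with a hypothetical `threshold` parameter. They are not necessarily the compressor or clustering procedure used in the paper. The NCD itself is parameter-free; the threshold belongs only to this illustrative grouping step.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compressed length of the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def cluster(docs: list[bytes], threshold: float) -> list[list[int]]:
    """Hypothetical single-linkage grouping (not the paper's method):
    documents i and j share a cluster if they are linked by a chain of
    pairs whose NCD is <= threshold. Implemented with union-find."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of documents once and merge close pairs.
    for i, j in combinations(range(len(docs)), 2):
        if ncd(docs[i], docs[j]) <= threshold:
            parent[find(i)] = find(j)

    # Collect document indices by their root; each group is in ascending order.
    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Search results usually arrive as text, so each document (for example, a title plus snippet or the full page) would be encoded with `.encode("utf-8")` before it is passed to `cluster`. For very short snippets, the fixed overhead of a general-purpose compressor can dominate C(·) and make the distances noisy. This is one reason to compress longer document representations where possible.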

Paper Citation


in Harvard Style

Cernian A., Dobrica L., Carstoiu D. and Sgarciu V. (2010). ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8425-22-5, pages 293-298. DOI: 10.5220/0002926102930298

in Bibtex Style

@conference{icsoft10,
author={Alexandra Cernian and Liliana Dobrica and Dorin Carstoiu and Valentin Sgarciu},
title={ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2010},
pages={293-298},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002926102930298},
isbn={978-989-8425-22-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS
SN - 978-989-8425-22-5
AU - Cernian A.
AU - Dobrica L.
AU - Carstoiu D.
AU - Sgarciu V.
PY - 2010
SP - 293
EP - 298
DO - 10.5220/0002926102930298