ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS
Alexandra Cernian, Liliana Dobrica, Dorin Carstoiu, Valentin Sgarciu
2010
Abstract
Current Web search engines return long lists of ranked documents that users must sift through to find the relevant ones. This paper introduces a new approach for clustering Web search results, based on the notion of clustering by compression. Compression algorithms make it possible to define a similarity measure based on the amount of information that two items share. Classification methods allow similar data to be clustered without any prior knowledge. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files. Our goal is to apply the clustering by compression algorithm to cluster the documents returned by a Web search engine in response to a user query.
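As an illustration of how a distance can be computed from compressed lengths alone, the following minimal sketch uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The choice of zlib as the compressor, the helper names (`compressed_len`, `ncd`, `pairwise_ncd`) and the sample snippets are assumptions made for this example; the abstract does not state which compressor or clustering procedure the authors used.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def pairwise_ncd(docs: list[bytes]) -> list[list[float]]:
    """Distance matrix over a list of documents, e.g. search result texts."""
    return [[ncd(a, b) for b in docs] for a in docs]


if __name__ == "__main__":
    # Hypothetical search result snippets, for illustration only.
    results = [
        b"Jaguar is a large cat species native to the Americas.",
        b"The jaguar is a big cat that lives in the Americas.",
        b"Jaguar Cars is a British manufacturer of luxury vehicles.",
    ]
    for row in pairwise_ncd(results):
        print(" ".join(f"{d:.3f}" for d in row))
```

The resulting matrix could then be passed to any distance-based clustering method. Note that on very short inputs the fixed overhead of the compressor dominates the compressed lengths, so the distances become less discriminative than on full documents.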
Paper Citation
in Harvard Style
Cernian A., Dobrica L., Carstoiu D. and Sgarciu V. (2010). ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8425-22-5, pages 293-298. DOI: 10.5220/0002926102930298
in Bibtex Style
@conference{icsoft10,
author={Alexandra Cernian and Liliana Dobrica and Dorin Carstoiu and Valentin Sgarciu},
title={ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2010},
pages={293-298},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002926102930298},
isbn={978-989-8425-22-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - ON USING THE NORMALIZED COMPRESSION DISTANCE TO CLUSTER WEB SEARCH RESULTS
SN - 978-989-8425-22-5
AU - Cernian A.
AU - Dobrica L.
AU - Carstoiu D.
AU - Sgarciu V.
PY - 2010
SP - 293
EP - 298
DO - 10.5220/0002926102930298