A COMPARATIVE EVALUATION OF PROXIMITY MEASURES FOR SPECTRAL CLUSTERING

Nadia Farhanaz Azam, Herna L. Viktor

2011

Abstract

A cluster analysis algorithm is considered successful when it partitions the data into meaningful groups, such that objects in the same group are similar to one another and objects in different groups are dissimilar. One such algorithm, spectral clustering, has been deployed across numerous domains, ranging from image processing to the clustering of protein sequences, and on a wide range of data types. Its input is a similarity matrix constructed from the pairwise similarities between the data objects, which are calculated by employing a proximity (similarity, dissimilarity or distance) measure. The success of a spectral clustering algorithm therefore depends heavily on the selection of the proximity measure. While the majority of prior research on spectral clustering emphasizes algorithm-specific issues, little research has evaluated the performance of the proximity measures themselves. To this end, we perform a comparative and exploratory analysis of several existing proximity measures to evaluate their suitability for spectral clustering. Our results indicate that the commonly used Euclidean distance measure may not always be a good choice, especially in domains where the data is highly imbalanced and the correct clustering of boundary objects is crucial. Furthermore, for numeric data, measures based on relative distances often yield better results than measures based on absolute distances, specifically when aiming to cluster boundary objects. When considering mixed data, the measure for numeric data has the highest impact on the final outcome and, again, the use of the Euclidean measure may be inappropriate.
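To make concrete where the proximity measure enters the pipeline, here is a minimal sketch, assuming a Gaussian (RBF) kernel to turn distances into similarities and the normalized spectral clustering variant of Ng, Jordan and Weiss. The function names (`similarity_matrix`, `spectral_clustering`), the kernel width `sigma`, the Manhattan alternative, and the farthest-point k-means initialization are illustrative assumptions, not the authors' experimental setup. The point is only that the `distance` argument is pluggable, and swapping it changes the similarity matrix the algorithm sees.

```python
import numpy as np

# Two interchangeable proximity (distance) measures; illustrative choices only.
def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))

def similarity_matrix(X, distance, sigma=1.0):
    """Build W[i, j] = exp(-d(x_i, x_j)^2 / (2 sigma^2)) with a zero diagonal.
    The Gaussian kernel is an assumption; other similarity transforms exist."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(X[i], X[j])
            W[i, j] = W[j, i] = np.exp(-d ** 2 / (2 * sigma ** 2))
    return W

def _kmeans(U, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        # Squared distance from each row to its nearest existing center.
        dists = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(W, k):
    """Normalized spectral clustering on a precomputed similarity matrix W."""
    d = W.sum(axis=1)                         # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    U = vecs[:, -k:]                          # eigenvectors of the k largest
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    return _kmeans(U, k)

if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    for measure in (euclidean, manhattan):
        print(measure.__name__, spectral_clustering(similarity_matrix(X, measure), k=2))
```

On well-separated toy data both measures agree; the differences the paper reports concern harder cases such as imbalanced data and boundary objects, which this sketch does not attempt to reproduce.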



Paper Citation


in Harvard Style

Farhanaz Azam N. and Viktor H. (2011). A COMPARATIVE EVALUATION OF PROXIMITY MEASURES FOR SPECTRAL CLUSTERING. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 30-41. DOI: 10.5220/0003649000300041

in Bibtex Style

@conference{kdir11,
author={Nadia Farhanaz Azam and Herna L. Viktor},
title={A COMPARATIVE EVALUATION OF PROXIMITY MEASURES FOR SPECTRAL CLUSTERING},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={30-41},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003649000300041},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - A COMPARATIVE EVALUATION OF PROXIMITY MEASURES FOR SPECTRAL CLUSTERING
SN - 978-989-8425-79-9
AU - Farhanaz Azam N.
AU - Viktor H.
PY - 2011
SP - 30
EP - 41
DO - 10.5220/0003649000300041