Data Clustering Validation using Constraints

João M. N. Duarte, Ana L. N. Fred, F. Jorge F. Duarte

2013

Abstract

Much attention is being given to the incorporation of constraints into data clustering, mainly expressed in the form of must-link and cannot-link constraints between pairs of domain objects. However, their inclusion in the important clustering validation process has so far been disregarded. In this work, we integrate the use of constraints into clustering validation. We propose three approaches to accomplish this: producing a weighted validity score that considers both a traditional validity index and the constraint satisfaction ratio; learning a new distance function or feature space representation that better suits the constraints, and using it with a validation index; and a combination of the two. Experimental results on 14 synthetic and real data sets show that including the information provided by the constraints improves the performance of the clustering validation process in selecting the best number of clusters.
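The first approach in the abstract, a weighted validity score, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the convex combination below, the weight `alpha`, and the function names are assumptions made for illustration only; the validity index value is assumed to be already scaled to [0, 1], with higher meaning better.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two objects in the data set


def constraint_satisfaction_ratio(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> float:
    """Fraction of pairwise constraints that a partition satisfies."""
    must_link = list(must_link)
    cannot_link = list(cannot_link)
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # assumption: no constraints means nothing is violated
    # A must-link pair is satisfied when both objects share a cluster.
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    # A cannot-link pair is satisfied when the objects are in different clusters.
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def weighted_validity_score(
    validity: float, satisfaction: float, alpha: float = 0.5
) -> float:
    """Hypothetical convex combination of a validity index and the ratio above."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * validity + (1.0 - alpha) * satisfaction


# Toy partition of four objects into two clusters.
labels = [0, 0, 1, 1]
ratio = constraint_satisfaction_ratio(
    labels, must_link=[(0, 1)], cannot_link=[(1, 2), (2, 3)]
)
score = weighted_validity_score(validity=0.6, satisfaction=ratio, alpha=0.5)
```

In this toy case the must-link pair (0, 1) and the cannot-link pair (1, 2) are satisfied while the cannot-link pair (2, 3) is violated. To select the number of clusters, one would compute such a score for each candidate partition and keep the highest-scoring one.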

Paper Citation


in Harvard Style

Duarte, J. M. N., Fred, A. L. N. and Duarte, F. J. F. (2013). Data Clustering Validation using Constraints. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 17-27. DOI: 10.5220/0004543800170027

in Bibtex Style

@conference{kdir13,
author={João M. N. Duarte and Ana L. N. Fred and F. Jorge F. Duarte},
title={Data Clustering Validation using Constraints},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={17-27},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004543800170027},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Data Clustering Validation using Constraints
SN - 978-989-8565-75-4
AU - Duarte, João M. N.
AU - Fred, Ana L. N.
AU - Duarte, F. Jorge F.
PY - 2013
SP - 17
EP - 27
DO - 10.5220/0004543800170027