Data Clustering Validation using Constraints
João M. N. Duarte, Ana L. N. Fred, F. Jorge F. Duarte
2013
Abstract
Much attention has been given to incorporating constraints into data clustering, mainly expressed as must-link and cannot-link constraints between pairs of domain objects. However, their use in the important clustering validation process has so far been disregarded. In this work, we integrate constraints into clustering validation. We propose three approaches: producing a weighted validity score that combines a traditional validity index with the constraint satisfaction ratio; learning a new distance function or feature space representation that better suits the constraints and using it with a validation index; and combining the two previous approaches. Experimental results on 14 synthetic and real data sets show that including the information provided by the constraints improves the performance of the clustering validation process in selecting the best number of clusters.
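The first approach in the abstract, a weighted validity score, can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the convex combination below, the weight `alpha`, and the function names are assumptions made for illustration only; the validity index value is assumed to be already scaled to [0, 1].

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two objects in the data set


def constraint_satisfaction_ratio(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> float:
    """Fraction of pairwise constraints that a partition satisfies."""
    must_link = list(must_link)
    cannot_link = list(cannot_link)
    total = len(must_link) + len(cannot_link)
    if total == 0:
        # Assumption: with no constraints, treat the partition as fully consistent.
        return 1.0
    # A must-link pair is satisfied when both objects share a cluster.
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    # A cannot-link pair is satisfied when the objects are in different clusters.
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def weighted_validity(index_value: float, satisfaction: float, alpha: float = 0.5) -> float:
    """Hypothetical convex combination of a validity index and the satisfaction ratio."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * index_value + (1.0 - alpha) * satisfaction


# Toy partition of five objects into three clusters.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]    # (2, 4) is violated: clusters 1 and 2
cannot_link = [(0, 2), (1, 3)]  # both satisfied

ratio = constraint_satisfaction_ratio(labels, must_link, cannot_link)
score = weighted_validity(index_value=0.5, satisfaction=ratio, alpha=0.5)
```

To select the number of clusters under this sketch, one would compute `score` for each candidate partition and keep the partition with the highest value.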
Paper Citation
in Harvard Style
Duarte, J. M. N., Fred, A. L. N. and Duarte, F. J. F. (2013). Data Clustering Validation using Constraints. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013), ISBN 978-989-8565-75-4, pages 17-27. DOI: 10.5220/0004543800170027
in Bibtex Style
@conference{kdir13,
author={João M. N. Duarte and Ana L. N. Fred and F. Jorge F. Duarte},
title={Data Clustering Validation using Constraints},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={17-27},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004543800170027},
isbn={978-989-8565-75-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Data Clustering Validation using Constraints
SN - 978-989-8565-75-4
AU - Duarte, João M. N.
AU - Fred, Ana L. N.
AU - Duarte, F. Jorge F.
PY - 2013
SP - 17
EP - 27
DO - 10.5220/0004543800170027