Arbitrary Shape Cluster Summarization with Gaussian Mixture Model

Elnaz Bigdeli, Mahdi Mohammadi, Bijan Raahemi, Stan Matwin

2014

Abstract

One of the main concerns in arbitrary shape clustering is how to summarize clusters. The most accurate way to represent a cluster with an arbitrary shape is to characterize it by all of its members. However, this approach is neither practical nor efficient. In many applications, such as stream data mining, preserving all samples for a long period of time in the presence of thousands of incoming samples is not practical. Moreover, in the absence of labelled data, clusters serve as representatives of each class, and for arbitrary shape clusters, finding the closest cluster to a new incoming sample using all objects of the clusters is neither accurate nor efficient. In this paper, we present a new algorithm to summarize arbitrary shape clusters. Our proposed method, called SGMM, summarizes a cluster using a set of objects as core objects, then represents each cluster with a corresponding Gaussian Mixture Model (GMM). Using the GMM, the closest cluster to a new test sample is identified at low computational cost. We compared the proposed method with ABACUS, a well-known algorithm, in terms of time, space and accuracy for both categorization and summarization purposes. The experimental results confirm that the proposed method outperforms ABACUS on various datasets, including synthetic and real datasets.
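
The idea of summarizing a cluster by core objects and then scoring new samples against a mixture can be illustrated with a minimal Python sketch. It is not the SGMM algorithm itself, whose details the abstract does not give. The greedy core selection, the `radius` parameter, the isotropic components, the count-based weights, and all function names are assumptions made only for illustration.

```python
import math


def select_core_objects(points, radius):
    """Assumed greedy cover: keep a point as a core object
    if no previously kept core lies within `radius` of it."""
    cores = []
    for p in points:
        if all(math.dist(p, c) > radius for c in cores):
            cores.append(p)
    return cores


def summarize_cluster(points, radius):
    """Summarize a cluster as a list of (weight, mean, sigma) components.

    Each core object becomes the mean of one isotropic Gaussian.
    Its weight is the fraction of cluster members nearest to it,
    and sigma is set to `radius` (an illustrative choice)."""
    cores = select_core_objects(points, radius)
    counts = [0] * len(cores)
    for p in points:
        nearest = min(range(len(cores)), key=lambda i: math.dist(p, cores[i]))
        counts[nearest] += 1
    total = len(points)
    return [(count / total, core, radius) for count, core in zip(counts, cores)]


def mixture_density(x, components):
    """Density of sample x under an isotropic Gaussian mixture."""
    dim = len(x)
    density = 0.0
    for weight, mean, sigma in components:
        norm = (2.0 * math.pi * sigma ** 2) ** (-dim / 2.0)
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        density += weight * norm * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return density


def closest_cluster(x, summaries):
    """Return the label of the cluster whose mixture gives x the highest density."""
    return max(summaries, key=lambda label: mixture_density(x, summaries[label]))


# Two elongated toy clusters: horizontal strips at y = 0 and y = 3.
cluster_a = [(i * 0.5, 0.0) for i in range(9)]
cluster_b = [(i * 0.5, 3.0) for i in range(9)]
summaries = {
    "a": summarize_cluster(cluster_a, radius=1.0),
    "b": summarize_cluster(cluster_b, radius=1.0),
}
print(closest_cluster((2.0, 0.2), summaries))
```

In this sketch a new sample is compared only against the few components that summarize each cluster, not against every stored member, which is the source of the space and time savings the abstract refers to.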

Paper Citation


in Harvard Style

Bigdeli E., Mohammadi M., Raahemi B. and Matwin S. (2014). Arbitrary Shape Cluster Summarization with Gaussian Mixture Model. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 43-52. DOI: 10.5220/0005071500430052

in Bibtex Style

@conference{kdir14,
author={Elnaz Bigdeli and Mahdi Mohammadi and Bijan Raahemi and Stan Matwin},
title={Arbitrary Shape Cluster Summarization with Gaussian Mixture Model},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={43-52},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005071500430052},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Arbitrary Shape Cluster Summarization with Gaussian Mixture Model
SN - 978-989-758-048-2
AU - Bigdeli E.
AU - Mohammadi M.
AU - Raahemi B.
AU - Matwin S.
PY - 2014
SP - 43
EP - 52
DO - 10.5220/0005071500430052