GREEDY APPROACH FOR DOCUMENT CLUSTERING

Lim Choen Choi, Soon Cheol Park

2012

Abstract

This paper proposes a greedy algorithm for document clustering (Greedy Clustering). Several cluster validity indices (DB, CH, SD, AS) are evaluated to find the most appropriate optimization function for Greedy Clustering. The clustering algorithms are tested and compared on the Reuters-21578 collection. Across the experiments, the AS index gives the best clustering performance and the fastest running time among the validity indices. Greedy Clustering with the AS index also performs 15 to 20% better than the traditional clustering algorithms K-means and Group Average.
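The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of greedily optimizing a cluster validity index, not the authors' method. It assumes that CH denotes the Calinski-Harabasz index, uses plain numeric vectors in place of document vectors, and moves one point at a time to whichever cluster raises the index. The names ch_index and greedy_clustering, the round-robin starting assignment, and the max_passes parameter are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ch_index(points: List[Vector], labels: List[int], k: int) -> float:
    """Calinski-Harabasz index (assumed meaning of 'CH'); higher is better."""
    n = len(points)
    overall = _centroid(points)
    between = 0.0  # between-cluster dispersion
    within = 0.0   # within-cluster dispersion
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return float("-inf")  # reject assignments with an empty cluster
        center = _centroid(members)
        between += len(members) * _sq_dist(center, overall)
        within += sum(_sq_dist(p, center) for p in members)
    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))


def greedy_clustering(
    points: List[Vector], k: int, max_passes: int = 100
) -> Tuple[List[int], float]:
    """Greedy hill climbing: move single points while the index improves."""
    labels = [i % k for i in range(len(points))]  # arbitrary round-robin start
    best = ch_index(points, labels, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(points)):
            original = labels[i]
            best_label = original
            for c in range(k):
                if c == original:
                    continue
                labels[i] = c  # tentatively move point i to cluster c
                score = ch_index(points, labels, k)
                if score > best:
                    best, best_label = score, c
            labels[i] = best_label  # keep the best move, or restore
            if best_label != original:
                improved = True
        if not improved:
            break  # local optimum: no single move raises the index
    return labels, best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(greedy_clustering(data, k=2))
```

A greedy search of this kind stops at a local optimum, so the result depends on the starting assignment. Swapping in a different validity index only requires replacing ch_index with another scoring function of the same signature, with the comparison flipped for indices where lower is better.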

Paper Citation


in Harvard Style

Choi L. and Park S. (2012). GREEDY APPROACH FOR DOCUMENT CLUSTERING. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 597-600. DOI: 10.5220/0003836605970600

in Bibtex Style

@conference{icpram12,
author={Lim Choen Choi and Soon Cheol Park},
title={GREEDY APPROACH FOR DOCUMENT CLUSTERING},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2012},
pages={597-600},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003836605970600},
isbn={978-989-8425-99-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - GREEDY APPROACH FOR DOCUMENT CLUSTERING
SN - 978-989-8425-99-7
AU - Choi L.
AU - Park S.
PY - 2012
SP - 597
EP - 600
DO - 10.5220/0003836605970600