CLUSTERING DOCUMENTS WITH LARGE OVERLAP OF TERMS INTO DIFFERENT CLUSTERS BASED ON SIMILARITY ROUGH SET MODEL

Nguyen Chi Thanh, Koichi Yamada, Muneyuki Unehara

2010

Abstract

The similarity rough set model for document clustering (SRSM) uses a generalized rough set model based on a similarity relation and term co-occurrence to group the documents of a collection into clusters. The model extends the tolerance rough set model (TRSM) (Ho and Funakoshi, 1997). SRSM methods have been evaluated, and the results showed that they perform better than TRSM. However, in document collections where many words are shared across different document classes, SRSM has rather little effect. In this paper we propose a method to improve the performance of SRSM on such document collections.
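
As background for readers unfamiliar with the rough set approach, the following is a minimal sketch of the co-occurrence idea behind TRSM, the model that SRSM extends. It is not the authors' SRSM and not their code: the function names `tolerance_classes` and `upper_approximation`, the threshold name `theta`, and the toy documents are all assumptions made for illustration. In this sketch, two terms are treated as related when they co-occur in at least `theta` documents, and a document's upper approximation is the set of terms whose class shares at least one term with the document. SRSM replaces the symmetric tolerance relation with a similarity relation; the abstract does not give its definition, so it is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with in >= theta documents.

    `docs` is a list of token lists; `theta` is an assumed co-occurrence threshold.
    """
    cooc = defaultdict(int)
    for doc in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1

    vocab = set().union(*map(set, docs))
    classes = {t: {t} for t in vocab}  # the relation is reflexive
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)  # the relation is symmetric in this TRSM-style sketch
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose class intersects the document's term set."""
    doc_terms = set(doc)
    return {t for t, cls in classes.items() if cls & doc_terms}


# Hypothetical toy collection, for illustration only.
docs = [
    ["rough", "set", "cluster"],
    ["rough", "set", "model"],
    ["cluster", "document"],
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation(["rough", "document"], classes)
```

With these toy documents, only "rough" and "set" co-occur twice, so a document containing "rough" is enriched with "set". When many terms are shared across classes, such enrichment can pull documents from different classes toward each other, which is the situation the abstract describes.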

Paper Citation


in Harvard Style

Chi Thanh N., Yamada K. and Unehara M. (2010). CLUSTERING DOCUMENTS WITH LARGE OVERLAP OF TERMS INTO DIFFERENT CLUSTERS BASED ON SIMILARITY ROUGH SET MODEL. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 396-399. DOI: 10.5220/0003068803960399

in Bibtex Style

@conference{kdir10,
author={Nguyen Chi Thanh and Koichi Yamada and Muneyuki Unehara},
title={CLUSTERING DOCUMENTS WITH LARGE OVERLAP OF TERMS INTO DIFFERENT CLUSTERS BASED ON SIMILARITY ROUGH SET MODEL},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={396-399},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003068803960399},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - CLUSTERING DOCUMENTS WITH LARGE OVERLAP OF TERMS INTO DIFFERENT CLUSTERS BASED ON SIMILARITY ROUGH SET MODEL
SN - 978-989-8425-28-7
AU - Chi Thanh N.
AU - Yamada K.
AU - Unehara M.
PY - 2010
SP - 396
EP - 399
DO - 10.5220/0003068803960399