Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods

Elisabete Cunha, Álvaro Figueira, Óscar Mealha

2013

Abstract

In this paper we analyze and discuss two methods that are based on the traditional k-means algorithm for document clustering and that integrate social tags into the process. The first method integrates tags directly into a Vector Space Model, and the second uses tags to select the initial seeds. We created a predictive model for the impact of tag integration in both methods, and compared the two methods using the traditional k-means++ and the novel k-C algorithm. To compare the results, we propose a new internal measure that allows the computation of cluster compactness. The experimental results indicate that the careful selection of seeds in the k-C algorithm yields better results than those obtained with k-means++, both with and without the integration of tags.
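
To make the first method more concrete, the following is a minimal sketch of folding tags into a Vector Space Model and clustering the result with k-means using k-means++ seeding. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `tag_weight` parameter, the raw term-frequency weighting, and the toy documents are all hypothetical. The abstract does not specify the k-C algorithm or the proposed compactness measure, so neither is sketched here.

```python
import math
import random
from collections import Counter


def build_vectors(docs, tags, tag_weight=2.0):
    """Build unit-length term-frequency vectors.

    Assumption: each tag is treated as an extra term that counts
    `tag_weight` times (a hypothetical weighting scheme).
    """
    vocab = sorted({w for d in docs for w in d.split()}
                   | {t for ts in tags for t in ts})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for text, doc_tags in zip(docs, tags):
        counts = Counter(text.split())
        for t in doc_tags:
            counts[t] += tag_weight
        vec = [0.0] * len(vocab)
        for w, c in counts.items():
            vec[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(vectors, k, rng):
    """k-means++ seeding: draw each new seed with probability
    proportional to its squared distance to the nearest chosen seed."""
    seeds = [rng.choice(vectors)]
    while len(seeds) < k:
        d2 = [min(sq_dist(v, s) for s in seeds) for v in vectors]
        if sum(d2) == 0:
            # All points coincide with a seed; fall back to a uniform draw.
            seeds.append(rng.choice(vectors))
        else:
            seeds.append(rng.choices(vectors, weights=d2, k=1)[0])
    return seeds


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd iterations started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(vectors, k, rng)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center for every document.
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    docs = ["python code loops", "python functions code",
            "soccer match goal", "soccer league goal"]
    tags = [["programming"], ["programming"], ["sports"], ["sports"]]
    vectors, _ = build_vectors(docs, tags)
    print(kmeans(vectors, k=2))
```

In this sketch, raising `tag_weight` pulls documents that share a tag closer together in the vector space, which is one simple way to let tags influence the clustering; the paper's actual weighting may differ.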

Paper Citation


in Harvard Style

Cunha E., Figueira Á. and Mealha Ó. (2013). Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013), ISBN 978-989-8565-75-4, pages 160-168. DOI: 10.5220/0004545201600168

in Bibtex Style

@conference{kdir13,
author={Elisabete Cunha and Álvaro Figueira and Óscar Mealha},
title={Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={160-168},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004545201600168},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods
SN - 978-989-8565-75-4
AU - Cunha E.
AU - Figueira Á.
AU - Mealha Ó.
PY - 2013
SP - 160
EP - 168
DO - 10.5220/0004545201600168