Combining Clustering and Classification Approaches for Reducing the Effort of Automatic Tweets Classification

Elias de Oliveira, Henrique Gomes Basoni, Marcos Rodrigues Saúde, Patrick Marques Ciarelli

2014

Abstract

The classification problem has gained new importance with the growing value attributed to social media such as Twitter. The huge number of short documents to be organized into subjects challenges the resources and techniques that have been used so far. Furthermore, today more than ever, personalization is the most important feature that a system needs to exhibit. The goal of many online systems, which are available in many areas, is to address the needs or desires of each individual user. To achieve this goal, these systems need to be more flexible and faster in adapting to the user's needs. In this work, we explore a variety of techniques with the aim of better classifying a large Twitter data set according to a user's goal. We propose a methodology in which we cascade an unsupervised technique followed by a supervised one. For the unsupervised technique we use standard clustering algorithms, and for the supervised technique we propose the use of a kNN algorithm and a Centroid Based Classifier to perform the experiments. The results are promising because we reduced the amount of work to be done by the specialists and, in addition, the approach reproduced the human assessment decisions with an F1-measure of 0.7907.
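
As a rough illustration of the cascade described in the abstract, the sketch below first clusters a handful of short texts and then classifies new texts against the cluster centroids. It is only a minimal reading of the idea, not the authors' implementation: the toy tweets, the bag-of-words representation, the cosine similarity, the hand-picked seeds, the label names and every function name are assumptions made for this example, and the kNN variant mentioned in the abstract is not shown.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one short document (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average term weights over a group of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(vectors, seeds, iterations=10):
    """Unsupervised step: k-means-style clustering using cosine similarity.

    `seeds` are indices of the documents used as initial centroids
    (chosen by hand here to keep the example deterministic).
    """
    centroids = [vectors[i] for i in seeds]
    assignment = []
    for _ in range(iterations):
        assignment = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return assignment, centroids


def classify(text, centroids, cluster_labels):
    """Supervised step: centroid-based classification of a new document."""
    v = vectorize(text)
    best = max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
    return cluster_labels[best]


# Hypothetical toy data standing in for a tweet collection.
tweets = [
    "traffic jam on the bridge",
    "heavy traffic downtown today",
    "bridge closed traffic slow",
    "great goal in the match",
    "the match ended with a late goal",
    "team wins match",
]
vectors = [vectorize(t) for t in tweets]
assignment, centroids = kmeans(vectors, seeds=[0, 3])

# A specialist labels each cluster once instead of labelling every tweet.
cluster_labels = {0: "traffic", 1: "sports"}

prediction_a = classify("late goal wins the match", centroids, cluster_labels)
prediction_b = classify("traffic on the bridge again", centroids, cluster_labels)
```

In this sketch the labelling effort is two cluster-level decisions rather than six tweet-level ones, which is the kind of saving the abstract refers to; how the paper actually assigns labels to clusters and measures the saving is not stated on this page.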


Paper Citation


in Harvard Style

de Oliveira E., Gomes Basoni H., Rodrigues Saúde M. and Marques Ciarelli P. (2014). Combining Clustering and Classification Approaches for Reducing the Effort of Automatic Tweets Classification. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 465-472. DOI: 10.5220/0005159304650472

in Bibtex Style

@conference{kdir14,
author={Elias de Oliveira and Henrique Gomes Basoni and Marcos Rodrigues Saúde and Patrick Marques Ciarelli},
title={Combining Clustering and Classification Approaches for Reducing the Effort of Automatic Tweets Classification},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={465-472},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005159304650472},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Combining Clustering and Classification Approaches for Reducing the Effort of Automatic Tweets Classification
SN - 978-989-758-048-2
AU - de Oliveira E.
AU - Gomes Basoni H.
AU - Rodrigues Saúde M.
AU - Marques Ciarelli P.
PY - 2014
SP - 465
EP - 472
DO - 10.5220/0005159304650472