Using the Cluster-based Tree Structure of k-Nearest Neighbor to Reduce the Effort Required to Classify Unlabeled Large Datasets

Elias Oliveira, Howard Roatti, Matheus de Araujo Nogueira, Henrique Gomes Basoni, Patrick Marques Ciarelli

2015

Abstract

The usual practice in classification problems is to create a set of labeled data for training and then use it to tune a classifier that predicts the classes of the remaining items in the dataset. However, producing labeled data demands great human effort, and classification by specialists is normally expensive and time-consuming. In this paper, we discuss how a cluster-based tree kNN structure can be used to quickly build a training dataset from scratch. We evaluated the proposed method on several classification datasets, and the results are promising: the labeling work required from the specialists was reduced to 4% of the number of documents in the evaluated datasets. Furthermore, we achieved an average accuracy of 72.19% on the tested datasets, versus 77.12% when using 90% of the dataset for training.
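The idea of labeling only a small fraction of the items and letting nearest neighbors carry those labels to the rest can be illustrated with a minimal sketch. This is not the paper's cluster-based tree algorithm, which the abstract does not detail. It substitutes a plain greedy farthest-point selection of representatives followed by 1-nearest-neighbor label propagation, and it runs on synthetic 2-D data. All function names, the synthetic blobs, and the use of ground-truth labels as a stand-in for the specialist are assumptions made for illustration; only the 4% labeling budget comes from the abstract.

```python
import math
import random


def farthest_point_representatives(points, budget):
    """Pick `budget` indices that spread over the data (greedy farthest-point selection)."""
    chosen = [0]  # start from the first point; an arbitrary choice for this sketch
    # distance from every point to its closest chosen representative so far
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < budget:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def propagate_labels(points, labeled):
    """Give every point the label of its nearest labeled representative (1-NN)."""
    return [
        labeled[min(labeled, key=lambda j: math.dist(p, points[j]))]
        for p in points
    ]


def make_blobs(rng, centers, per_center, spread=0.5):
    """Synthetic 2-D data: Gaussian blobs, one class per center (hypothetical data)."""
    points, truth = [], []
    for cls, (cx, cy) in enumerate(centers):
        for _ in range(per_center):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            truth.append(cls)
    return points, truth


rng = random.Random(0)
points, truth = make_blobs(rng, centers=[(0, 0), (10, 0), (0, 10)], per_center=100)

budget = max(1, round(0.04 * len(points)))  # 4% of 300 items -> 12 labels requested
reps = farthest_point_representatives(points, budget)
labeled = {i: truth[i] for i in reps}  # stands in for the specialist's answers
predicted = propagate_labels(points, labeled)
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(points)
print(f"labels requested: {budget} of {len(points)}, accuracy: {accuracy:.2%}")
```

On well-separated synthetic blobs like these, a handful of labels is enough to cover every class. Real document collections are much harder, which is consistent with the accuracy gap the abstract reports between the 4% setting and training on 90% of the data.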


Paper Citation


in Harvard Style

Oliveira E., Roatti H., Nogueira M., Basoni H. and Ciarelli P. (2015). Using the Cluster-based Tree Structure of k-Nearest Neighbor to Reduce the Effort Required to Classify Unlabeled Large Datasets. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015) ISBN 978-989-758-158-8, pages 567-576. DOI: 10.5220/0005615305670576

in Bibtex Style

@conference{sstm15,
author={Elias Oliveira and Howard Roatti and Matheus de Araujo Nogueira and Henrique Gomes Basoni and Patrick Marques Ciarelli},
title={Using the Cluster-based Tree Structure of k-Nearest Neighbor to Reduce the Effort Required to Classify Unlabeled Large Datasets},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)},
year={2015},
pages={567-576},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005615305670576},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)
TI - Using the Cluster-based Tree Structure of k-Nearest Neighbor to Reduce the Effort Required to Classify Unlabeled Large Datasets
SN - 978-989-758-158-8
AU - Oliveira E.
AU - Roatti H.
AU - Nogueira M.
AU - Basoni H.
AU - Ciarelli P.
PY - 2015
SP - 567
EP - 576
DO - 10.5220/0005615305670576