Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets

Philipp Baumann, Dorit S. Hochbaum, Quico Spaen

2016

Abstract

Machine learning techniques that rely on pairwise similarities have proven to be leading algorithms for classification. Despite their good and robust performance, similarity-based techniques are rarely chosen for large-scale data mining because the time required to compute all pairwise similarities grows quadratically with the size of the data set. To address this scalability issue, we introduced a method called sparse computation, which efficiently generates a sparse similarity matrix that contains only significant similarities. Sparse computation achieves significant reductions in running time with minimal and often no loss in accuracy. However, for massively large data sets, even such a sparse similarity matrix may lead to considerable running times. In this paper, we propose an extension of sparse computation called sparse-reduced computation that not only avoids computing very low similarities but also avoids computing similarities between highly similar or identical objects by compressing them to a single object. Our computational results show that sparse-reduced computation allows highly accurate classification of data sets with millions of objects in seconds.
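
The two ideas in the abstract, skipping very low similarities and compressing near-identical objects into one, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the objects have already been projected to a low-dimensional space and scaled to [0, 1], that "highly similar" means "falls in the same grid block", that each block is represented by its centroid, and that similarity is Gaussian; none of these choices is stated in the abstract. The function name `sparse_reduced_pairs` and its parameters `resolution` and `epsilon` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def sparse_reduced_pairs(points, resolution, epsilon=1.0):
    """Illustrative sketch only (assumed grid-based scheme, not the paper's code).

    points: list of equal-length tuples with coordinates in [0, 1].
    resolution: number of grid intervals per dimension (assumed parameter).
    epsilon: scaling of the assumed Gaussian similarity.
    """
    # Step 1: assign every object to a grid block.
    blocks = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(min(int(x * resolution), resolution - 1) for x in p)
        blocks[key].append(idx)

    # Step 2 ("reduced"): compress each block to a single representative,
    # here the centroid, and remember how many objects it stands for.
    reps = {}
    for key, members in blocks.items():
        centroid = tuple(
            sum(points[i][d] for i in members) / len(members)
            for d in range(len(key))
        )
        reps[key] = (centroid, len(members))

    # Step 3 ("sparse"): compute similarities only between representatives
    # of adjacent blocks; all other pairs are treated as negligible.
    sims = {}
    for key in reps:
        for offset in product((-1, 0, 1), repeat=len(key)):
            other = tuple(k + o for k, o in zip(key, offset))
            if other in reps and key < other:  # each unordered pair once
                dist = math.dist(reps[key][0], reps[other][0])
                sims[(key, other)] = math.exp(-epsilon * dist ** 2)
    return reps, sims


points = [(0.05, 0.05), (0.06, 0.04), (0.05, 0.05), (0.15, 0.05), (0.95, 0.95)]
reps, sims = sparse_reduced_pairs(points, resolution=10)
print(len(reps), "representatives,", len(sims), "similarities computed")
```

In this toy input, five objects would require ten pairwise similarities in a full similarity matrix. Under the assumptions above, the three objects sharing a block collapse into one representative, and only the pair of representatives in adjacent blocks is evaluated.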


Paper Citation


in Harvard Style

Baumann P., Hochbaum D. and Spaen Q. (2016). Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 224-231. DOI: 10.5220/0005690402240231

in Bibtex Style

@conference{icpram16,
author={Philipp Baumann and Dorit S. Hochbaum and Quico Spaen},
title={Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={224-231},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005690402240231},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets
SN - 978-989-758-173-1
AU - Baumann P.
AU - Hochbaum D.
AU - Spaen Q.
PY - 2016
SP - 224
EP - 231
DO - 10.5220/0005690402240231