EINCKM: An Enhanced Prototype-based Method for Clustering Evolving Data Streams in Big Data

Ammar Al Abd Alazeez, Sabah Jassim, Hongbo Du

2017

Abstract

Data stream clustering is becoming an active research area in big data. It refers to grouping constantly arriving new data records in large chunks to enable dynamic analysis and updating of the information patterns conveyed by the existing clusters, the outliers, and the newly arriving data chunk. Prototype-based algorithms for solving the problem are promising for their simplicity and efficiency. However, existing implementations are limited in the quality of their clusters and in their ability to discover outliers, and they give little consideration to possible new patterns in different chunks. In this paper, a new incremental algorithm called Enhanced Incremental K-Means (EINCKM) is developed. The algorithm is designed to detect new clusters in an incoming data chunk, merge new clusters and existing outliers into the currently existing clusters, and generate modified clusters and outliers ready for the next round. The algorithm applies a heuristic-based method to estimate the number of clusters (K), a radius-based technique to determine and merge overlapped clusters, and a variance-based mechanism to discover the outliers. The algorithm was evaluated on synthetic and real-life datasets. The experimental results indicate improved clustering correctness with a time complexity comparable to that of existing methods dealing with the same kind of problems.
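The abstract does not give the exact rules behind the radius-based merge or the variance-based outlier test, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a cluster's radius is the largest centroid-to-member distance, that two clusters overlap when their centroid distance does not exceed the sum of their radii, and that a point is an outlier when its distance to the centroid exceeds the mean distance plus `threshold` standard deviations. The function names and the `threshold` parameter are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def radius(points, center):
    """Assumed definition: largest distance from the centroid to a member."""
    return max(math.dist(p, center) for p in points)


def merge_overlapping(clusters):
    """Merge clusters pairwise until no two clusters overlap.

    Assumed overlap rule: centroid distance <= sum of the two radii.
    """
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = centroid(clusters[i]), centroid(clusters[j])
                limit = radius(clusters[i], ci) + radius(clusters[j], cj)
                if math.dist(ci, cj) <= limit:
                    # Absorb cluster j into cluster i, then rescan from scratch
                    # because the merged centroid and radius have changed.
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


def split_outliers(points, threshold=2.0):
    """Separate members from outliers using the spread of distances.

    Assumed rule: outlier if distance > mean + threshold * std of distances.
    """
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    cutoff = mean + threshold * std
    members = [p for p, d in zip(points, dists) if d <= cutoff]
    outliers = [p for p, d in zip(points, dists) if d > cutoff]
    return members, outliers


existing = [[(0, 0), (1, 0), (0, 1), (1, 1)]]
incoming = [[(0.8, 1.2), (1.2, 0.8)], [(10, 10), (11, 11)]]
result = merge_overlapping(existing + incoming)

members, outliers = split_outliers(
    [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.4, 0.6), (0.6, 0.4), (20, 20)]
)
```

In this toy run the first incoming cluster overlaps the existing one and is absorbed into it, while the distant cluster stays separate; the point `(20, 20)` is flagged as an outlier. In an incremental setting, the returned clusters and outliers would be carried forward and compared against the next chunk.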


Paper Citation


in Harvard Style

Al Abd Alazeez A., Jassim S. and Du H. (2017). EINCKM: An Enhanced Prototype-based Method for Clustering Evolving Data Streams in Big Data. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 173-183. DOI: 10.5220/0006196901730183

in Bibtex Style

@conference{icpram17,
author={Ammar Al Abd Alazeez and Sabah Jassim and Hongbo Du},
title={EINCKM: An Enhanced Prototype-based Method for Clustering Evolving Data Streams in Big Data},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={173-183},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006196901730183},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - EINCKM: An Enhanced Prototype-based Method for Clustering Evolving Data Streams in Big Data
SN - 978-989-758-222-6
AU - Al Abd Alazeez A.
AU - Jassim S.
AU - Du H.
PY - 2017
SP - 173
EP - 183
DO - 10.5220/0006196901730183