POP: A Parallel Optimized Preparation of Data for Data Mining

Christian Ernst, Youssef Hmamouche, Alain Casali

2015

Abstract

Because data preparation has a substantial impact on data mining results, we provide an original framework for automatically preparing the data of any given database. For each attribute of the database, our research focuses on two points: (i) specifying an optimized outlier detection method, and (ii) identifying the most appropriate discretization method. Concerning the former, we show that the detection of an outlier depends on whether the data distribution is normal or not. Concerning the latter, what matters is the shape of the density function of the attribute's distribution law. For this reason, we propose an automatic choice of the optimized discretization method based on a multi-criteria (Entropy, Variance, Stability) evaluation. Processing is performed in parallel using multicore capabilities. The experiments conducted validate our approach and show that the best discretization method is not always the same one.
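The idea that outlier detection should depend on whether an attribute is normally distributed can be illustrated with a minimal sketch. The abstract does not say which normality test or which detection rules the paper uses, so everything below is an assumption made for illustration: a Jarque-Bera statistic as the normality check, a z-score rule for attributes that look normal, and Tukey's interquartile-range fences otherwise. The function names and thresholds are hypothetical and are not taken from the paper.

```python
import statistics

# 5% critical value of the chi-square distribution with 2 degrees of freedom,
# used here as the cut-off for the Jarque-Bera statistic (an assumed choice).
CHI2_2DF_95 = 5.991


def jarque_bera(values):
    """Return the Jarque-Bera statistic of a sample (0.0 if it is constant)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        return 0.0
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


def detect_outliers(values, z_threshold=3.0, iqr_factor=1.5):
    """Pick an outlier rule per attribute and return (rule_name, outliers).

    Assumption: z-score rule when normality is not rejected,
    Tukey's IQR fences otherwise.
    """
    if jarque_bera(values) < CHI2_2DF_95:
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            return "z-score", []
        return "z-score", [x for x in values if abs(x - mean) > z_threshold * sd]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    return "iqr", [x for x in values if x < low or x > high]


if __name__ == "__main__":
    # A strongly skewed attribute: normality is rejected, so the IQR rule applies.
    print(detect_outliers([1, 2, 2, 3, 3, 3, 4, 4, 5, 100]))
```

Since each attribute is handled independently, a function of this kind could be mapped over the attributes of a table by a pool of workers, which is one plausible reading of the parallel processing mentioned in the abstract.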

Paper Citation


in Harvard Style

Ernst C., Hmamouche Y. and Casali A. (2015). POP: A Parallel Optimized Preparation of Data for Data Mining. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015), ISBN 978-989-758-158-8, pages 36-45. DOI: 10.5220/0005594700360045

in Bibtex Style

@conference{kdir15,
author={Christian Ernst and Youssef Hmamouche and Alain Casali},
title={POP: A Parallel Optimized Preparation of Data for Data Mining},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={36-45},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005594700360045},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - POP: A Parallel Optimized Preparation of Data for Data Mining
SN - 978-989-758-158-8
AU - Ernst C.
AU - Hmamouche Y.
AU - Casali A.
PY - 2015
SP - 36
EP - 45
DO - 10.5220/0005594700360045