RESAMPLING BASED ON STATISTICAL PROPERTIES OF DATA SETS

Julia Bondarenko

2009

Abstract

In imbalanced data sets, the classes, divided into a majority (negative) class and a minority (positive) class, are not approximately equally represented, which impedes accurate classification. Well-balanced data sets assume a uniform class distribution. The approach presented in this paper balances non-uniform data sets by directed oversampling of minority class objects with simultaneous undersampling of majority class objects, and relies on certain statistical criteria. The resampling procedure is carried out on daily traffic injury data sets. The results show improved identification of rare cases (positive class objects) according to several performance measures.
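
The general idea of combining oversampling and undersampling can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not state the statistical criteria that direct the resampling, so the sketch assumes plain random selection instead. The function name `rebalance`, the parameter `target_per_class`, and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def rebalance(samples, labels, target_per_class, seed=0):
    """Randomly resample every class to exactly target_per_class items.

    Assumption: selection is uniform at random. The paper instead directs
    the resampling by statistical criteria not given in the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, items in by_class.items():
        if len(items) >= target_per_class:
            # Undersample: draw a subset without replacement.
            chosen = rng.sample(items, target_per_class)
        else:
            # Oversample: keep all originals and add random duplicates.
            missing = target_per_class - len(items)
            chosen = items + [rng.choice(items) for _ in range(missing)]
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


# Hypothetical toy data: 0 = majority (negative), 1 = minority (positive).
labels = [0] * 90 + [1] * 10
samples = list(range(100))

new_x, new_y = rebalance(samples, labels, target_per_class=50)
counts = Counter(new_y)
print(counts[0], counts[1])
```

With this toy input, the 90 majority objects are reduced to 50 and the 10 minority objects are duplicated up to 50, so both classes end up equally represented. Random duplication adds no new information about the minority class, which is one motivation for directed schemes such as the one the paper describes.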

Paper Citation


in Harvard Style

Bondarenko J. (2009). RESAMPLING BASED ON STATISTICAL PROPERTIES OF DATA SETS. In Proceedings of the 6th International Conference on Informatics in Control, Automation and Robotics - Volume 3: ICINCO, ISBN 978-989-8111-99-9, pages 143-148. DOI: 10.5220/0002171701430148

in Bibtex Style

@conference{icinco09,
author={Julia Bondarenko},
title={RESAMPLING BASED ON STATISTICAL PROPERTIES OF DATA SETS},
booktitle={Proceedings of the 6th International Conference on Informatics in Control, Automation and Robotics - Volume 3: ICINCO},
year={2009},
pages={143-148},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002171701430148},
isbn={978-989-8111-99-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Informatics in Control, Automation and Robotics - Volume 3: ICINCO
TI - RESAMPLING BASED ON STATISTICAL PROPERTIES OF DATA SETS
SN - 978-989-8111-99-9
AU - Bondarenko J.
PY - 2009
SP - 143
EP - 148
DO - 10.5220/0002171701430148