DATAZAPPER: GENERATING INCOMPLETE DATASETS

Yingying Wen, Kevin B. Korb, Ann E. Nicholson

2009

Abstract

Evaluating the relative performance of machine learners on incomplete data is important because real data are often incomplete: some values are simply not present. DataZapper is a tool for uncreating data: given a dataset containing joint samples over variables, DataZapper makes a specified percentage of observed values disappear, replacing each with an indication that the measurement failed. Since the causal mechanisms that result in failed measurements may depend in arbitrary ways upon the system under study, it is important to be able to produce incomplete datasets that allow for such arbitrary dependencies. DataZapper is the only tool that allows any kind of dependence, and any degree of dependence, in its generation of missing data. We illustrate its use in a machine learning experiment and offer it to the data mining and machine learning communities.
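The idea of removing a specified percentage of values, either independently of the data or in a way that depends on another variable, can be illustrated with a minimal Python sketch. This is not DataZapper's implementation or interface: the function names `zap_mcar` and `zap_dependent`, the `"?"` marker, the `weights` parameter, and the toy `smoker`/`cancer` variables are all assumptions made for illustration.

```python
import random

MISSING = "?"  # assumed missing-value marker, not DataZapper's actual format


def zap_mcar(rows, column, fraction, rng):
    """Blank out exactly round(fraction * n) values of `column`,
    chosen uniformly at random (missingness independent of the data)."""
    n_zap = round(fraction * len(rows))
    targets = set(rng.sample(range(len(rows)), n_zap))
    return [
        {**row, column: MISSING} if i in targets else dict(row)
        for i, row in enumerate(rows)
    ]


def zap_dependent(rows, column, parent, weights, fraction, rng):
    """Blank out roughly `fraction` of `column`, where each row's chance of
    being blanked is proportional to weights[row[parent]]."""
    raw = [weights[row[parent]] for row in rows]
    total = sum(raw)
    if total == 0:
        return [dict(row) for row in rows]  # nothing can be removed
    # Scale so the expected number of blanked values is fraction * n
    # (exact unless a probability has to be clipped at 1.0).
    scale = fraction * len(rows) / total
    out = []
    for row, w in zip(rows, raw):
        p = min(1.0, w * scale)
        out.append({**row, column: MISSING} if rng.random() < p else dict(row))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    data = [
        {"smoker": rng.choice(["yes", "no"]), "cancer": rng.choice(["yes", "no"])}
        for _ in range(1000)
    ]
    independent = zap_mcar(data, "cancer", 0.2, rng)
    # Hypothetical dependence: smokers are three times as likely to have
    # their "cancer" measurement fail.
    dependent = zap_dependent(
        data, "cancer", "smoker", {"yes": 3.0, "no": 1.0}, 0.2, rng
    )
    print(sum(r["cancer"] == MISSING for r in independent))
    print(sum(r["cancer"] == MISSING for r in dependent))
```

In this sketch the degree of dependence is controlled by the ratio of the weights: equal weights reduce `zap_dependent` to independent removal, while a larger ratio concentrates the missing values in one group. Both functions return new rows and leave the input dataset unchanged.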

Paper Citation


in Harvard Style

Wen Y., Korb K. B. and Nicholson A. E. (2009). DATAZAPPER: GENERATING INCOMPLETE DATASETS. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 69-76. DOI: 10.5220/0001660700690076

in Bibtex Style

@conference{icaart09,
author={Yingying Wen and Kevin B. Korb and Ann E. Nicholson},
title={DATAZAPPER: GENERATING INCOMPLETE DATASETS},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={69-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001660700690076},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - DATAZAPPER: GENERATING INCOMPLETE DATASETS
SN - 978-989-8111-66-1
AU - Wen Y.
AU - Korb K. B.
AU - Nicholson A. E.
PY - 2009
SP - 69
EP - 76
DO - 10.5220/0001660700690076