An Ontology-based Methodology for Reusing Data Cleaning Knowledge

Ricardo Almeida, Paulo Maio, Paulo Oliveira, João Barroso

2015

Abstract

Organizations' growing need to integrate heterogeneous data sources, together with an ever-increasing volume of data, is exposing quality problems in data. Currently, most data cleaning approaches (for detecting and correcting data quality problems) are tailored to data sources that have the same schema and share the same data model (e.g., the relational model). In addition, these approaches depend heavily on a domain expert to specify the data cleaning operations. This paper extends a previously proposed data cleaning methodology that reuses cleaning knowledge specified for other data sources. The methodology is refined by specifying the requirements that a data cleaning operations vocabulary must satisfy. Ontologies in RDF/OWL are proposed as the data model for an abstract representation of data schemas, regardless of the underlying data model (e.g., relational or graph). Existing approaches, methods and techniques that support the implementation of the proposed methodology in general, and of the data cleaning operations vocabulary in particular, are also presented and discussed.
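The idea of reusing a cleaning operation across sources with different data models can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract term `AGE`, the `DetectionOperation` class, the two `lift_*` mapping functions and the sample data are all hypothetical, and plain Python tuples stand in for RDF/OWL terms. The sketch only shows one detection rule, stated once against an abstract term, being applied to a relational source and a graph source.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical abstract term (concept, property) standing in for an ontology
# property. The name is illustrative and not taken from the paper.
AGE = ("Person", "age")


@dataclass(frozen=True)
class DetectionOperation:
    """A detection rule stated against an abstract term, not a concrete schema."""
    term: Tuple[str, str]
    is_valid: Callable[[object], bool]
    description: str


def lift_relational(rows, column_to_term, key):
    """Map relational rows (dicts) to (record_id, term, value) facts."""
    facts = []
    for row in rows:
        for column, term in column_to_term.items():
            if column in row:
                facts.append((str(row[key]), term, row[column]))
    return facts


def lift_graph(triples, predicate_to_term):
    """Map (subject, predicate, object) triples to the same fact shape."""
    return [(s, predicate_to_term[p], o)
            for s, p, o in triples if p in predicate_to_term]


def detect(facts, operation):
    """Return the ids of records whose value violates the operation."""
    return [rid for rid, term, value in facts
            if term == operation.term and not operation.is_valid(value)]


# One rule, specified once (assumed example rule).
age_in_range = DetectionOperation(
    term=AGE,
    is_valid=lambda v: isinstance(v, int) and 0 <= v <= 130,
    description="age must be an integer between 0 and 130",
)

# Two sources with different schemas and data models (made-up sample data).
relational_rows = [{"id": 1, "idade": 34}, {"id": 2, "idade": -5}]
graph_triples = [("ex:ann", "ex:hasAge", 27), ("ex:bob", "ex:hasAge", 412)]

relational_facts = lift_relational(relational_rows, {"idade": AGE}, key="id")
graph_facts = lift_graph(graph_triples, {"ex:hasAge": AGE})

print(detect(relational_facts, age_in_range))  # ['2']
print(detect(graph_facts, age_in_range))       # ['ex:bob']
```

In this sketch only the per-source mappings differ; the rule itself is untouched, which is the kind of reuse the abstract describes.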


Paper Citation


in Harvard Style

Almeida R., Maio P., Oliveira P. and Barroso J. (2015). An Ontology-based Methodology for Reusing Data Cleaning Knowledge. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, (IC3K 2015) ISBN 978-989-758-158-8, pages 202-211. DOI: 10.5220/0005596402020211

in Bibtex Style

@conference{keod15,
author={Ricardo Almeida and Paulo Maio and Paulo Oliveira and João Barroso},
title={An Ontology-based Methodology for Reusing Data Cleaning Knowledge},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, (IC3K 2015)},
year={2015},
pages={202-211},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005596402020211},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, (IC3K 2015)
TI - An Ontology-based Methodology for Reusing Data Cleaning Knowledge
SN - 978-989-758-158-8
AU - Almeida R.
AU - Maio P.
AU - Oliveira P.
AU - Barroso J.
PY - 2015
SP - 202
EP - 211
DO - 10.5220/0005596402020211