KVFS: An HDFS Library over NoSQL Databases

Emmanouil Pavlidakis, Stelios Mavridis, Giorgos Saloustros, Angelos Bilas

2016

Abstract

Recently, NoSQL stores, such as HBase, have gained acceptance and popularity due to their ability to scale out and perform queries over large amounts of data. NoSQL stores typically arrange data in tables of (key, value) pairs and support a few simple operations: get, insert, delete, and scan. Despite its simplicity, this API has proven to be extremely powerful. Nowadays, most data analytics frameworks use distributed file systems (DFS) for storing and accessing data, and HDFS has emerged as the most popular choice due to its scalability. In this paper we explore how popular NoSQL stores, such as HBase, can provide an HDFS scale-out file system abstraction. We show how to design an HDFS-compliant file system on top of a key-value store. We implement our design as a user-space library (KVFS) that provides an HDFS file system over an HBase key-value store. KVFS is designed to run Hadoop-style analytics such as MapReduce, Hive, Pig, and Mahout over NoSQL stores without the use of HDFS. We perform a preliminary evaluation of KVFS against a native HDFS setup using DFSIO with a varying number of threads. Our results show that providing a file system API over a key-value store is a promising direction: the read and write throughput of KVFS and HDFS is identical for both big and small datasets. For both HDFS and KVFS, throughput is limited by the network for small datasets and by device I/O for bigger datasets.
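
To make the idea of a file system over get, insert, delete, and scan concrete, the following is a minimal sketch only. It is not the KVFS design and does not use the HBase or HDFS APIs: the in-memory `KVStore` class, the `block_key` layout (path, NUL separator, zero-padded block index), and the tiny `BLOCK_SIZE` are all assumptions made for illustration.

```python
import bisect


class KVStore:
    """In-memory stand-in for a sorted key-value table (hypothetical, not HBase)."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order so range scans are cheap
        self._data = {}

    def insert(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.remove(key)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


BLOCK_SIZE = 4  # assumed, deliberately tiny so the example produces several blocks


def block_key(path, index):
    # Assumed key layout: all blocks of one file sort together, in block order.
    return f"{path}\x00{index:010d}"


def delete_file(store, path):
    # One scan finds every block of the file; each block is then deleted.
    for key, _ in list(store.scan(path + "\x00", path + "\x01")):
        store.delete(key)


def write_file(store, path, data):
    # Overwrite semantics: drop old blocks, then insert one pair per block.
    delete_file(store, path)
    for offset in range(0, len(data), BLOCK_SIZE):
        store.insert(block_key(path, offset // BLOCK_SIZE),
                     data[offset:offset + BLOCK_SIZE])


def read_file(store, path):
    # A range scan over the file's key prefix returns its blocks in order.
    return b"".join(value for _, value in
                    store.scan(path + "\x00", path + "\x01"))


store = KVStore()
write_file(store, "/logs/a", b"hello world")
print(read_file(store, "/logs/a"))
```

In this sketch a whole-file read is a single range scan and a write is a sequence of inserts, which is one way the four-operation API can carry file semantics. A real implementation would also need metadata, directories, and concurrency control, none of which are shown here.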


Paper Citation


in Harvard Style

Pavlidakis E., Mavridis S., Saloustros G. and Bilas A. (2016). KVFS: An HDFS Library over NoSQL Databases. In Proceedings of the 6th International Conference on Cloud Computing and Services Science - Volume 1: DataDiversityConvergence, (CLOSER 2016) ISBN 978-989-758-182-3, pages 360-367. DOI: 10.5220/0005924003600367

in Bibtex Style

@conference{datadiversityconvergence16,
author={Emmanouil Pavlidakis and Stelios Mavridis and Giorgos Saloustros and Angelos Bilas},
title={KVFS: An HDFS Library over NoSQL Databases},
booktitle={Proceedings of the 6th International Conference on Cloud Computing and Services Science - Volume 1: DataDiversityConvergence, (CLOSER 2016)},
year={2016},
pages={360-367},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005924003600367},
isbn={978-989-758-182-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Cloud Computing and Services Science - Volume 1: DataDiversityConvergence, (CLOSER 2016)
TI - KVFS: An HDFS Library over NoSQL Databases
SN - 978-989-758-182-3
AU - Pavlidakis E.
AU - Mavridis S.
AU - Saloustros G.
AU - Bilas A.
PY - 2016
SP - 360
EP - 367
DO - 10.5220/0005924003600367