PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches

Dominique Thiébaut, Yang Li, Diana Jaunzeikare, Alexandra Cheng, Ellysha Raelen Recto, Gillian Riggs, Xia Ting Zhao, Tonje Stolpestad, Cam Le T. Nguyen

2011

Abstract

We present a simple comparison of the performance of three cluster platforms: Apple’s XGrid, a local Hadoop cluster of Linux workstations, and an Elastic MapReduce cluster rented from Amazon. Hadoop is the open-source implementation of Google’s MapReduce. Performance is measured as the total execution time each platform takes to parse a 27-GByte XML dump of the English Wikipedia. We show that for this specific workload, XGrid yields the fastest execution time, with the local Hadoop cluster a close second. The overhead of fetching data from Amazon’s Simple Storage Service (S3), along with the inability to skip the reduce, sort, and merge phases on Amazon, penalizes this platform, which is targeted at much larger data sets.


Paper Citation


in Harvard Style

Thiébaut D., Li Y., Jaunzeikare D., Cheng A., Raelen Recto E., Riggs G., Ting Zhao X., Stolpestad T. and Le T. Nguyen C. (2011). PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches. In Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8425-52-2, pages 391-396. DOI: 10.5220/0003385603910396

in Bibtex Style

@conference{closer11,
author={Dominique Thiébaut and Yang Li and Diana Jaunzeikare and Alexandra Cheng and Ellysha Raelen Recto and Gillian Riggs and Xia Ting Zhao and Tonje Stolpestad and Cam Le T. Nguyen},
title={PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches},
booktitle={Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2011},
pages={391-396},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003385603910396},
isbn={978-989-8425-52-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - PROCESSING WIKIPEDIA DUMPS - A Case-study Comparing the XGrid and MapReduce Approaches
SN - 978-989-8425-52-2
AU - Thiébaut D.
AU - Li Y.
AU - Jaunzeikare D.
AU - Cheng A.
AU - Raelen Recto E.
AU - Riggs G.
AU - Ting Zhao X.
AU - Stolpestad T.
AU - Le T. Nguyen C.
PY - 2011
SP - 391
EP - 396
DO - 10.5220/0003385603910396