Bringing Search Engines to the Cloud using Open Source Components

Khaled Nagi

2015

Abstract

Search engines are now used for intelligent analytics over petabytes of data. With Lucene at the heart of the vast majority of information retrieval systems, several attempts have been made to bring it to the cloud in order to scale to big data. These efforts include implementing scalable distribution of the search indices over the file system, storing them in NoSQL databases, and porting them to inherently distributed ecosystems such as Hadoop. We evaluate the existing efforts in terms of distribution, high availability, fault tolerance, manageability, and high performance. We believe that search indexing capabilities for big data can only be supported through common open-source technology deployed on standard cloud platforms such as Amazon EC2 and Microsoft Azure. For each approach, we build a benchmarking system by indexing the whole Wikipedia content and submitting hundreds of simultaneous search requests. We measure the performance of both indexing and searching operations. We simulate node failures and monitor the recoverability of the system. We show that a system built on top of Solr and Hadoop has the best stability and manageability, while systems based on NoSQL databases present an attractive alternative in terms of performance.

Paper Citation


in Harvard Style

Nagi K. (2015). Bringing Search Engines to the Cloud using Open Source Components. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 116-126. DOI: 10.5220/0005632701160126

in Bibtex Style

@conference{kdir15,
author={Khaled Nagi},
title={Bringing Search Engines to the Cloud using Open Source Components},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={116-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005632701160126},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Bringing Search Engines to the Cloud using Open Source Components
SN - 978-989-758-158-8
AU - Nagi K.
PY - 2015
SP - 116
EP - 126
DO - 10.5220/0005632701160126