SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER

Joo Yong Lee, Sang Ho Lee, Yanggon Kim

2007

Abstract

As the Web grows, it becomes increasingly important to parallelize the crawling process so that pages can be downloaded in a reasonable amount of time. This paper presents the design and implementation of an effective parallel web crawler. We first present various design choices and strategies for a parallel web crawler, and then describe our crawler’s architecture and implementation techniques. In particular, we investigate the URL distributor, which balances URLs across crawling processes, and the scalability of our crawler.

Paper Citation


in Harvard Style

Lee J. Y., Lee S. H. and Kim Y. (2007). SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER. In Proceedings of the Second International Conference on e-Business - Volume 1: ICE-B, (ICETE 2007) ISBN 978-989-8111-11-1, pages 151-156. DOI: 10.5220/0002108701510156

in Bibtex Style

@conference{ice-b07,
author={Joo Yong Lee and Sang Ho Lee and Yanggon Kim},
title={SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER},
booktitle={Proceedings of the Second International Conference on e-Business - Volume 1: ICE-B, (ICETE 2007)},
year={2007},
pages={151-156},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002108701510156},
isbn={978-989-8111-11-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on e-Business - Volume 1: ICE-B, (ICETE 2007)
TI - SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER
SN - 978-989-8111-11-1
AU - Lee, Joo Yong
AU - Lee, Sang Ho
AU - Kim, Yanggon
PY - 2007
SP - 151
EP - 156
DO - 10.5220/0002108701510156