SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER
Joo Yong Lee, Sang Ho Lee, Yanggon Kim
2007
Abstract
As the Web grows, it becomes increasingly important to parallelize the crawling process so that pages can be downloaded in a reasonable amount of time. This paper presents the design and implementation of an effective parallel web crawler. We first present various design choices and strategies for a parallel web crawler, and then describe our crawler's architecture and implementation techniques. In particular, we examine the URL distributor used for URL balancing and evaluate the scalability of our crawler.
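To make the idea of URL balancing concrete, here is a minimal sketch of a URL distributor for a parallel crawler. It is not the paper's seed-by-seed scheme, whose details the abstract does not give. It assumes a generic host-based policy: every URL from the same host goes to the same crawling process, and a host seen for the first time is assigned to the process with the fewest queued URLs. The class `URLDistributor` and its methods `assign` and `loads` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit


class URLDistributor:
    """Hypothetical URL distributor (illustrative sketch, not the paper's design).

    Assigns each URL to one of `num_processes` crawling processes.
    URLs sharing a host stay on one process, so that process alone
    handles politeness for the host; new hosts go to the least-loaded process.
    """

    def __init__(self, num_processes: int) -> None:
        if num_processes < 1:
            raise ValueError("num_processes must be at least 1")
        # One URL queue per crawling process.
        self.queues: list[list[str]] = [[] for _ in range(num_processes)]
        # Sticky mapping from host name to the owning process index.
        self.host_owner: dict[str, int] = {}

    def assign(self, url: str) -> int:
        """Queue `url` on a process and return that process's index."""
        host = urlsplit(url).hostname or ""  # hostname is lowercased
        owner = self.host_owner.get(host)
        if owner is None:
            # New host: choose the process with the shortest queue
            # (min() returns the lowest index on ties).
            owner = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
            self.host_owner[host] = owner
        self.queues[owner].append(url)
        return owner

    def loads(self) -> list[int]:
        """Number of URLs queued on each process."""
        return [len(q) for q in self.queues]


if __name__ == "__main__":
    d = URLDistributor(2)
    for u in ["http://a.example/1", "http://b.example/1",
              "http://a.example/2", "http://c.example/1"]:
        d.assign(u)
    print(d.loads())  # → [2, 2]
```

A common stateless alternative is to compute `hash(host) % num_processes`, which needs no shared mapping but cannot react to uneven queue lengths. The stateful version above trades a small lookup table for explicit balancing.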
Citation
in Harvard Style
Lee J. Y., Lee S. H. and Kim Y. (2007). SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER. In Proceedings of the Second International Conference on e-Business - Volume 1: ICE-B, (ICETE 2007) ISBN 978-989-8111-11-1, pages 151-156. DOI: 10.5220/0002108701510156
in Bibtex Style
@conference{ice-b07,
author={Joo Yong Lee and Sang Ho Lee and Yanggon Kim},
title={SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER},
booktitle={Proceedings of the Second International Conference on e-Business - Volume 1: ICE-B, (ICETE 2007)},
year={2007},
pages={151-156},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002108701510156},
isbn={978-989-8111-11-1},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the Second International Conference on e-Business - Volume 1: ICE-B, (ICETE 2007)
TI  - SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER
SN  - 978-989-8111-11-1
AU  - Lee J. Y.
AU  - Lee S. H.
AU  - Kim Y.
PY  - 2007
SP  - 151
EP  - 156
DO  - 10.5220/0002108701510156
ER  - 