ARABELLA - A Directed Web Crawler

Pedro Lopes, Davide Pinto, David Campos, José Luís Oliveira

2009

Abstract

The Internet is becoming the primary source of knowledge. However, its disorganized evolution brought about an exponential increase in the amount of distributed, heterogeneous information. Web crawling engines were the first answer to ease the task of finding the desired information. Nevertheless, when one is searching for quality information related to a certain scientific domain, typical search engines like Google are not enough. This is the problem that directed crawlers try to solve. Arabella is a directed web crawler that navigates through a predefined set of domains searching for specific information. It includes text-processing capabilities that increase the system’s flexibility and the number of documents that can be crawled: any structured document or REST web service can be processed. These complex processes do not harm overall system performance due to the multithreaded engine that was implemented, resulting in an efficient and scalable web crawler.

Paper Citation


in Harvard Style

Lopes P., Pinto D., Campos D. and Luís Oliveira J. (2009). ARABELLA - A Directed Web Crawler. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 270-273. DOI: 10.5220/0002291602700273

in Bibtex Style

@conference{kdir09,
author={Pedro Lopes and Davide Pinto and David Campos and José Luís Oliveira},
title={ARABELLA - A Directed Web Crawler},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={270--273},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002291602700273},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - ARABELLA - A Directed Web Crawler
SN - 978-989-674-011-5
AU - Lopes P.
AU - Pinto D.
AU - Campos D.
AU - Luís Oliveira J.
PY - 2009
SP - 270
EP - 273
DO - 10.5220/0002291602700273