FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT

Lefteris Kozanidis, Sofia Stamou, George Spiros

2009

Abstract

Retrieving relevant data for location-sensitive keyword queries is a challenging task that has so far been addressed as a problem of automatically determining the geographical orientation of web searches. Unfortunately, identifying localizable queries is not sufficient per se for performing successful location-sensitive searches, unless there exists a geo-referenced index of data sources against which localizable queries are searched. In this paper, we propose a novel approach towards the automatic construction of a geo-referenced search engine index. Our approach relies on a geo-focused crawler that incorporates a structural parser and uses GeoWordNet as a knowledge base in order to automatically deduce the geo-spatial information that is latent in the pages' contents. Based on location-descriptive elements in the page URLs and anchor text, the crawler directs the pages to a location-sensitive downloader. This downloading module resolves the geographical references of the URL location elements and organizes them into indexable hierarchical structures. The location-aware URL hierarchies are linked to their respective pages, resulting in a geo-referenced index against which location-sensitive queries can be answered.
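To make the pipeline in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written gazetteer (the hypothetical GAZETTEER dictionary) standing in for GeoWordNet, uses simple word tokenization of the URL and anchor text in place of the paper's structural parser, and keeps only the most specific place name found, which is an illustrative heuristic of our own. The names location_tokens, resolve and GeoIndex are hypothetical. Each page with a resolvable location is indexed under every prefix of its place hierarchy (country, region, city), so a query for a broader area also reaches pages about places inside it.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical toy gazetteer standing in for GeoWordNet: each place name maps
# to its chain of ancestors, most general first. Illustrative data only.
GAZETTEER = {
    "greece": ("greece",),
    "achaea": ("greece", "achaea"),
    "patras": ("greece", "achaea", "patras"),
    "attica": ("greece", "attica"),
    "athens": ("greece", "attica", "athens"),
}


def location_tokens(url, anchor_text=""):
    """Split the URL host, URL path and anchor text into lowercase word tokens."""
    parts = urlparse(url)
    text = " ".join([parts.netloc, parts.path, anchor_text])
    return re.findall(r"[a-z]+", text.lower())


def resolve(tokens):
    """Return the most specific place hierarchy matched by the tokens, or None."""
    matches = [GAZETTEER[t] for t in tokens if t in GAZETTEER]
    if not matches:
        return None
    return max(matches, key=len)  # deepest hierarchy = most specific place


class GeoIndex:
    """Maps every prefix of a place hierarchy to the URLs located under it."""

    def __init__(self):
        self.pages = defaultdict(set)

    def add(self, url, anchor_text=""):
        hierarchy = resolve(location_tokens(url, anchor_text))
        if hierarchy is None:
            return False  # no location element found: not geo-referenced
        for depth in range(1, len(hierarchy) + 1):
            self.pages[hierarchy[:depth]].add(url)
        return True

    def lookup(self, *path):
        """Return the sorted URLs indexed under the given place path."""
        return sorted(self.pages.get(tuple(path), set()))


if __name__ == "__main__":
    index = GeoIndex()
    index.add("http://example.com/greece/patras/hotels", "Hotels in Patras")
    index.add("http://example.org/athens-museums")
    index.add("http://example.net/recipes")  # no place name, so not indexed
    print(index.lookup("greece"))
    print(index.lookup("greece", "attica"))
```

Running the script prints both located URLs for "greece" and only the Athens page for ("greece", "attica"). Indexing each page under all hierarchy prefixes trades some storage for constant-time lookups at any level of geographic granularity; a real system would instead resolve ambiguous names against a full knowledge base.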

Paper Citation


in Harvard Style

Kozanidis L., Stamou S. and Spiros G. (2009). FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 244-249. DOI: 10.5220/0001823002440249

in BibTeX Style

@conference{webist09,
author={Lefteris Kozanidis and Sofia Stamou and George Spiros},
title={FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={244-249},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001823002440249},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - FOCUSING WEB CRAWLS ON LOCATION-SPECIFIC CONTENT
SN - 978-989-8111-81-4
AU - Kozanidis L.
AU - Stamou S.
AU - Spiros G.
PY - 2009
SP - 244
EP - 249
DO - 10.5220/0001823002440249