A NEW APPROACH TOWARDS VERTICAL SEARCH ENGINES - Intelligent Focused Crawling and Multilingual Semantic Techniques

Sybille Peters, Claus-Peter Rückemann, Wolfgang Sander-Beuermann

2010

Abstract

Search engines typically consist of a crawler which traverses the web retrieving documents and a search frontend which provides the user interface to the acquired information. Focused crawlers refine the crawler by intelligently directing it to predefined topic areas. The evolution of search engines today is expedited by supplying more search capabilities such as a search for metadata as well as search within the content text. Semantic web standards have supplied methods for augmenting webpages with metadata. Machine learning techniques are used where necessary to gather more metadata from unstructured webpages. This paper analyzes the effectiveness of techniques for vertical search engines with respect to focused crawling and metadata integration exemplarily in the field of “educational research”. A search engine for these purposes implemented within the EERQI project is described and tested. The enhancement of focused crawling with the use of link analysis and anchor text classification is implemented and verified. A new heuristic score calculation formula has been developed for focusing the crawler. Full-texts and metadata from various multilingual sources are collected and combined into a common format.

Download


Paper Citation


in Harvard Style

Peters S., Rückemann C. and Sander-Beuermann W. (2010). A NEW APPROACH TOWARDS VERTICAL SEARCH ENGINES - Intelligent Focused Crawling and Multilingual Semantic Techniques . In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST, ISBN 978-989-674-025-2, pages 181-186. DOI: 10.5220/0002777901810186

in Bibtex Style

@conference{webist10,
author={Sybille Peters and Claus-Peter Rückemann and Wolfgang Sander-Beuermann},
title={A NEW APPROACH TOWARDS VERTICAL SEARCH ENGINES - Intelligent Focused Crawling and Multilingual Semantic Techniques},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST,},
year={2010},
pages={181-186},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002777901810186},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST,
TI - A NEW APPROACH TOWARDS VERTICAL SEARCH ENGINES - Intelligent Focused Crawling and Multilingual Semantic Techniques
SN - 978-989-674-025-2
AU - Peters S.
AU - Rückemann C.
AU - Sander-Beuermann W.
PY - 2010
SP - 181
EP - 186
DO - 10.5220/0002777901810186