CLASSIFYING WEB PAGES WITH VISUAL FEATURES

Viktor de Boer, Maarten van Someren, Tiberiu Lupascu

2010

Abstract

To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that generic visual features can be used to classify web pages for several different types of tasks. The features used in this paper are simple color and edge histograms, together with Gabor and texture features. These were extracted using an off-the-shelf visual feature extraction method. In three experiments, we classify web pages by their aesthetic value, their recency and the type of website. Results show that these simple, global visual features already produce good classification results. We also introduce an online tool that uses the trained classifiers to assess new web pages.
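As an illustration of what "simple, global" features such as color and edge histograms might look like, the following is a minimal sketch in plain Python. It is not the off-the-shelf extraction method used in the paper; the function names, bin counts, and gradient threshold are all hypothetical choices made for this example, and the Gabor and texture features are omitted. It assumes a page screenshot is already available as a rectangular grid of (r, g, b) tuples with values in 0..255.

```python
import math

def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram; returns 3 * bins values (R, then G, then B)."""
    hist = [0.0] * (3 * bins)
    count = 0
    for row in pixels:
        for rgb in row:
            for channel, value in enumerate(rgb):
                index = min(value * bins // 256, bins - 1)
                hist[channel * bins + index] += 1
            count += 1
    return [h / count for h in hist] if count else hist

def edge_histogram(pixels, bins=4, threshold=16.0):
    """Normalized histogram of gradient orientations over interior pixels
    whose gradient magnitude is at least `threshold` (an assumed cutoff)."""
    gray = [[sum(rgb) / 3.0 for rgb in row] for row in pixels]
    hist = [0.0] * bins
    total = 0
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[y]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal central difference
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical central difference
            if math.hypot(gx, gy) < threshold:
                continue
            angle = math.atan2(gy, gx) % math.pi  # gradient orientation in [0, pi)
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += 1
            total += 1
    return [h / total for h in hist] if total else hist

def global_features(pixels):
    """Concatenate both histograms into one feature vector for a classifier."""
    return color_histogram(pixels) + edge_histogram(pixels)

# Toy 4x4 "screenshot": left half black, right half white.
image = [[(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)] for _ in range(4)]
features = global_features(image)
```

A vector like `features` could then be passed to any standard classifier; which classifier the authors used is not stated in this abstract.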


Paper Citation


in Harvard Style

de Boer V., van Someren M. and Lupascu T. (2010). CLASSIFYING WEB PAGES WITH VISUAL FEATURES. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST, ISBN 978-989-674-025-2, pages 245-252. DOI: 10.5220/0002804102450252

in Bibtex Style

@conference{webist10,
author={Viktor de Boer and Maarten van Someren and Tiberiu Lupascu},
title={CLASSIFYING WEB PAGES WITH VISUAL FEATURES},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST},
year={2010},
pages={245-252},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002804102450252},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST
TI - CLASSIFYING WEB PAGES WITH VISUAL FEATURES
SN - 978-989-674-025-2
AU - de Boer V.
AU - van Someren M.
AU - Lupascu T.
PY - 2010
SP - 245
EP - 252
DO - 10.5220/0002804102450252