Large Scale Web-Content Classification

Luca Deri, Maurizio Martinelli, Daniele Sartiano, Loredana Sideri

2015

Abstract

Web classification is used in many security devices to prevent users from accessing web sites that the current security policy does not allow, as well as to improve web search and to implement contextual advertising. Many commercial web classification services are available on the market, along with a few publicly available web directory services. Unfortunately, they mostly focus on English-language web sites, which limits their classification reliability and coverage for other languages. This paper covers the design and implementation of a web-based classification tool for TLDs (Top Level Domains). Each domain is classified by analysing its main web site and assigning it to categories according to its content. The tool has been successfully validated by classifying all the registered .it Internet domains; the results are presented in this paper.
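
As a rough illustration of classifying a domain from the content of its main web site, the sketch below extracts the visible text of a home page and scores it against per-category keyword sets. It is a toy under stated assumptions, not the method used in the paper: the abstract does not describe the classifier, and the category names, the keyword lists, and the function `classify_home_page` are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical category vocabularies (assumption: the abstract does not
# list the categories or features used by the actual tool).
CATEGORY_KEYWORDS = {
    "travel": {"hotel", "viaggio", "vacanze", "booking"},
    "food": {"ristorante", "pizza", "menu", "cucina"},
    "sport": {"calcio", "squadra", "partita", "campionato"},
}


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def classify_home_page(html):
    """Return the category whose keywords occur most often, or "unknown"."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Runs of Unicode letters, so accented Italian words stay intact.
    words = Counter(re.findall(r"[^\W\d_]+", text))
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Calling `classify_home_page` on the HTML of a domain's home page returns one label per domain. A classifier at the scale of a whole TLD would presumably rely on trained models and language-specific features rather than fixed keyword lists.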

Paper Citation


in Harvard Style

Deri L., Martinelli M., Sartiano D. and Sideri L. (2015). Large Scale Web-Content Classification. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015) ISBN 978-989-758-158-8, pages 545-554. DOI: 10.5220/0005635605450554

in Bibtex Style

@conference{sstm15,
author={Luca Deri and Maurizio Martinelli and Daniele Sartiano and Loredana Sideri},
title={Large Scale Web-Content Classification},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)},
year={2015},
pages={545-554},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005635605450554},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)
TI - Large Scale Web-Content Classification
SN - 978-989-758-158-8
AU - Deri L.
AU - Martinelli M.
AU - Sartiano D.
AU - Sideri L.
PY - 2015
SP - 545
EP - 554
DO - 10.5220/0005635605450554