CLASSIFYING WEB PAGES BY GENRE - Dealing with Unbalanced Distributions, Multiple Labels and Noise

Jane E. Mason, Michael Shepherd, Jack Duffy, Vlado Kešelj

2011

Abstract

Web page genre classification is a potentially powerful tool for filtering the results of online searches. The goal of this research is to develop an approach to the problem of Web page genre classification that is effective not only on balanced, single-label corpora, but also on unbalanced and multi-label corpora, and in the presence of noise, in order to better represent a real-world environment. The approach is based on n-gram representations of the Web pages and centroid representations of the genre classes. Experimental results compare very favorably with those of other researchers.
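As a rough illustration of what n-gram page representations and centroid class representations might look like in practice, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the n-gram type, the profile weighting, the similarity measure, or how multiple labels and noise are handled, so the character trigrams, normalized frequencies, cosine similarity, and the optional `threshold` parameter below are all assumptions, and every function name is hypothetical.

```python
from collections import Counter, defaultdict
import math


def ngram_profile(text, n=3):
    """Normalized frequency profile of character n-grams (assumed representation)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(profiles):
    """Average several page profiles into a single genre centroid."""
    summed = defaultdict(float)
    for profile in profiles:
        for gram, weight in profile.items():
            summed[gram] += weight
    return {g: w / len(profiles) for g, w in summed.items()}


def cosine(a, b):
    """Cosine similarity between two sparse profiles (assumed similarity measure)."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, centroids, threshold=None):
    """Return the genre labels assigned to a page.

    With threshold=None the single most similar genre is returned.
    With a threshold (hypothetical mechanism), every genre scoring at or
    above it is returned, so a page may get several labels or none at all;
    an empty list can be read as "noise".
    """
    profile = ngram_profile(text)
    scores = {genre: cosine(profile, c) for genre, c in centroids.items()}
    if threshold is None:
        return [max(scores, key=scores.get)]
    return [genre for genre, score in scores.items() if score >= threshold]


# Toy training pages, invented purely for illustration.
training = {
    "faq": ["how do i reset my password", "frequently asked questions how do i"],
    "shop": ["add to cart buy now checkout", "buy now price discount add to cart"],
}
centroids = {
    genre: centroid([ngram_profile(page) for page in pages])
    for genre, pages in training.items()
}
labels = classify("buy now add to cart", centroids)
```

One reason a design like this suits unbalanced corpora is that each genre is summarized by a single averaged profile, so a class with few training pages still contributes one centroid of the same form as a large class.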

Paper Citation


in Harvard Style

Mason J. E., Shepherd M., Duffy J. and Kešelj V. (2011). CLASSIFYING WEB PAGES BY GENRE - Dealing with Unbalanced Distributions, Multiple Labels and Noise. In Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8425-51-5, pages 589-594. DOI: 10.5220/0003343505890594

in Bibtex Style

@conference{webist11,
author={Jane E. Mason and Michael Shepherd and Jack Duffy and Vlado Kešelj},
title={CLASSIFYING WEB PAGES BY GENRE - Dealing with Unbalanced Distributions, Multiple Labels and Noise},
booktitle={Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2011},
pages={589-594},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003343505890594},
isbn={978-989-8425-51-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CLASSIFYING WEB PAGES BY GENRE - Dealing with Unbalanced Distributions, Multiple Labels and Noise
SN - 978-989-8425-51-5
AU - Mason J. E.
AU - Shepherd M.
AU - Duffy J.
AU - Kešelj V.
PY - 2011
SP - 589
EP - 594
DO - 10.5220/0003343505890594