CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION

Hieu Quang Le, Stefan Conrad

2009

Abstract

This paper studies the problem of classifying structured data sources on the Web. While prior work uses every feature extracted from search interfaces, we further refine the feature set. We treat each search interface simply as a bag of words. Our feature selection methods, which use new metrics and a novel, simple ranking scheme, choose a subset of words suited to classifying web sources. Using an aggressive feature selection approach together with a Gaussian process classifier, we obtained high classification performance in an evaluation over real web data.
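The pipeline outlined in the abstract (bag-of-words representation, then aggressive pruning of the vocabulary) can be illustrated with a minimal sketch. The abstract does not specify the paper's new metrics or its ranking scheme, so the sketch substitutes the standard chi-square statistic and a plain top-k cut as stand-ins; the function names and the toy interface texts are hypothetical, and the Gaussian process classification step is omitted.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text of a search interface and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def chi_square(word, docs, labels, target):
    """Chi-square score of `word` for class `target` (stand-in metric,
    not one of the paper's own metrics)."""
    a = b = c = d = 0
    for bag, label in zip(docs, labels):
        present = word in bag
        if present and label == target:
            a += 1  # word present, document in target class
        elif present:
            b += 1  # word present, document in another class
        elif label == target:
            c += 1  # word absent, document in target class
        else:
            d += 1  # word absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k words with the highest per-class chi-square score."""
    vocab = set().union(*docs)
    classes = set(labels)
    scores = {
        w: max(chi_square(w, docs, labels, cls) for cls in classes)
        for w in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy data: text extracted from four search interfaces.
texts = [
    "search books by title author isbn",
    "find books author publisher title",
    "search flights departure arrival date",
    "search flights date passengers departure",
]
labels = ["books", "books", "flights", "flights"]

docs = [bag_of_words(t) for t in texts]
selected = select_features(docs, labels, k=4)
print(selected)
```

In this toy data the word "search" appears in interfaces of both domains, so it scores low and is pruned, while words confined to one domain survive. The reduced vocabulary would then be used to build the feature vectors passed to a classifier.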


Paper Citation


in Harvard Style

Quang Le H. and Conrad S. (2009). CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 613-620. DOI: 10.5220/0001824706130620

in Bibtex Style

@conference{webist09,
author={Hieu Quang Le and Stefan Conrad},
title={CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={613-620},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001824706130620},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION
SN - 978-989-8111-81-4
AU - Quang Le H.
AU - Conrad S.
PY - 2009
SP - 613
EP - 620
DO - 10.5220/0001824706130620