CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION
Hieu Quang Le, Stefan Conrad
2009
Abstract
This paper studies the problem of classifying structured data sources on the Web. Whereas prior work uses all features extracted from search interfaces, we further refine the feature set. In our approach, each search interface is treated simply as a bag of words. We choose a subset of words suited to classifying web sources using feature selection methods with new metrics and a novel, simple ranking scheme. Using this aggressive feature selection approach together with a Gaussian process classifier, we obtained high classification performance in an evaluation over real web data.
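As a rough illustration of the general idea, the sketch below treats each search interface as a bag of words and keeps only a small, highly ranked subset of them. The abstract does not specify the paper's new metrics or its ranking scheme, so the standard chi-square statistic is used here purely as a stand-in; the function names, the toy interfaces, and the class labels are all hypothetical, and the Gaussian process classification step is omitted.

```python
def tokenize(text):
    """Lowercase and split a search interface's text into words."""
    return [w for w in text.lower().split() if w.isalpha()]

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 word/class contingency table.

    n11: docs of the class containing the word
    n10: docs of other classes containing the word
    n01: docs of the class without the word
    n00: docs of other classes without the word
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # word (or class) covers all or none of the docs
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_features(docs, labels, k):
    """Return the k words with the highest max-over-classes chi-square score.

    Stand-in metric only: the paper's own metrics and ranking scheme
    are not described in the abstract.
    """
    doc_sets = [set(tokenize(d)) for d in docs]
    vocab = set().union(*doc_sets)
    classes = set(labels)
    scores = {}
    for word in vocab:
        best = 0.0
        for c in classes:
            pairs = list(zip(doc_sets, labels))
            n11 = sum(1 for s, y in pairs if word in s and y == c)
            n10 = sum(1 for s, y in pairs if word in s and y != c)
            n01 = sum(1 for s, y in pairs if word not in s and y == c)
            n00 = len(docs) - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[word] = best
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:k]

# Hypothetical toy data: words found on four search interfaces.
docs = [
    "title author isbn publisher search",
    "author title keyword search",
    "make model price year search",
    "make model mileage search",
]
labels = ["books", "books", "autos", "autos"]

selected = select_features(docs, labels, k=4)
print(selected)
```

In this toy setting, a word such as "search" that appears on every interface scores zero and is discarded, while words that occur in only one domain rank highest. The selected words would then serve as the input features for a classifier.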
Paper Citation
in Harvard Style
Quang Le H. and Conrad S. (2009). CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 613-620. DOI: 10.5220/0001824706130620
in Bibtex Style
@conference{webist09,
author={Hieu Quang Le and Stefan Conrad},
title={CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={613-620},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001824706130620},
isbn={978-989-8111-81-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION
SN - 978-989-8111-81-4
AU - Quang Le H.
AU - Conrad S.
PY - 2009
SP - 613
EP - 620
DO - 10.5220/0001824706130620