MACHINE LEARNING AND LINK ANALYSIS FOR WEB CONTENT MINING

Moreno Carullo, Elisabetta Binaghi

2010

Abstract

In this work we define a hybrid Web Content Mining strategy aimed at recognizing, within Web pages, the main entity, understood as the short text that refers directly to the main topic of a given page. The salient aspect of the strategy is a novel supervised Machine Learning model that represents, in a unified framework, the integrated use of visual page layout features, textual features and hyperlink descriptions. The proposed approach has been evaluated with promising results.

Paper Citation


in Harvard Style

Carullo M. and Binaghi E. (2010). MACHINE LEARNING AND LINK ANALYSIS FOR WEB CONTENT MINING. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 156-161. DOI: 10.5220/0003065401560161

in Bibtex Style

@conference{kdir10,
author={Moreno Carullo and Elisabetta Binaghi},
title={MACHINE LEARNING AND LINK ANALYSIS FOR WEB CONTENT MINING},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={156-161},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003065401560161},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - MACHINE LEARNING AND LINK ANALYSIS FOR WEB CONTENT MINING
SN - 978-989-8425-28-7
AU - Carullo M.
AU - Binaghi E.
PY - 2010
SP - 156
EP - 161
DO - 10.5220/0003065401560161