CONCEPTS EXTRACTION BASED ON HTML DOCUMENTS STRUCTURE

Rim Zarrad, Narjes Doggaz, Ezzeddine Zagrouba

2012

Abstract

Traditional methods for automatically acquiring ontology concepts from a textual corpus, whether statistical or linguistic, usually focus on analyzing the text itself. In this paper, we extend these methods by also considering the document structure, which provides useful information about the meaning carried by the text. Our approach exploits the structure of HTML documents to extract the most relevant concepts of a given domain. Candidate terms are extracted and filtered by analyzing their occurrences in the titles and links of the documents and by taking into account the styles applied to them.
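
To make the idea in the abstract concrete, the following is a minimal sketch of structure-aware term scoring. It is not the authors' algorithm: the tag weights, the names `TAG_WEIGHTS`, `StructureScorer` and `score_terms`, and the choice to score single lowercase words instead of multi-word candidate terms are all assumptions made for illustration. The abstract gives no numeric weights and does not describe the filtering step in detail.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: text inside titles, headings, links and
# emphasis styles counts more than plain body text.
TAG_WEIGHTS = {
    "title": 3.0, "h1": 3.0, "h2": 2.5, "h3": 2.0,
    "a": 2.0, "b": 1.5, "strong": 1.5, "em": 1.5, "i": 1.5,
}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}  # content that is not document text


class StructureScorer(HTMLParser):
    """Accumulates a weighted occurrence score for each word."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching tag, which also
        # discards unclosed tags such as <br> left on the stack.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # A word takes the highest weight among its enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += weight


def score_terms(html):
    """Return a Counter mapping each word to its structure-weighted score."""
    scorer = StructureScorer()
    scorer.feed(html)
    scorer.close()
    return scorer.scores


if __name__ == "__main__":
    page = (
        "<html><head><title>Ontology learning</title></head>"
        "<body><p>An ontology has concepts. "
        "See <a href='#'>concepts</a>.</p></body></html>"
    )
    for word, score in score_terms(page).most_common(3):
        print(word, score)
```

In a fuller pipeline, the scores would be computed over a whole corpus, and a stop-word list and a threshold (both left out here) would filter the candidates before they are proposed as concepts.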


Paper Citation


in Harvard Style

Zarrad R., Doggaz N. and Zagrouba E. (2012). CONCEPTS EXTRACTION BASED ON HTML DOCUMENTS STRUCTURE. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-95-9, pages 503-506. DOI: 10.5220/0003748305030506

in Bibtex Style

@conference{icaart12,
author={Rim Zarrad and Narjes Doggaz and Ezzeddine Zagrouba},
title={CONCEPTS EXTRACTION BASED ON HTML DOCUMENTS STRUCTURE},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2012},
pages={503-506},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003748305030506},
isbn={978-989-8425-95-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - CONCEPTS EXTRACTION BASED ON HTML DOCUMENTS STRUCTURE
SN - 978-989-8425-95-9
AU - Zarrad R.
AU - Doggaz N.
AU - Zagrouba E.
PY - 2012
SP - 503
EP - 506
DO - 10.5220/0003748305030506