WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES

Ben Choi, Xiaomei Huang

2009

Abstract

To address the problem of information overload and to make effective use of information on the Web, we created a summarization system that abstracts key concepts and extracts key sentences to summarize text documents, including Web pages. Our proposed system is the first summarization system that uses a knowledge base to generate new abstract concepts to summarize documents. To generate abstract concepts, our system first maps the words in a document to concepts in the ResearchCyc knowledge base, which organizes concepts into hierarchies that form an ontology in the domain of human consensus reality. It then increases the weights of the mapped concepts to reflect their importance and propagates the weights upward in the concept hierarchies, which provides a method for generalization. To extract key sentences, our system weights each sentence in the document based on the weights of the concepts associated with that sentence, and extracts sentences with the highest weights to summarize the document. Moreover, we created a word sense disambiguation method based on the concept hierarchies to select the most appropriate concept for each word. Test results show that our approach is viable and applicable to knowledge discovery and the Semantic Web.
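The pipeline described in the abstract (map words to concepts, weight them, propagate weights upward, then score sentences) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy hierarchy `PARENTS`, the lookup table `WORD_TO_CONCEPT`, the `decay` factor applied at each upward step, and the sum-of-weights sentence score are hypothetical and are not taken from the paper or from ResearchCyc. Word sense disambiguation is omitted.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (concept -> parent concepts); not ResearchCyc data.
PARENTS = {
    "Dog": ["Mammal"],
    "Cat": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": [],
    "Car": ["Vehicle"],
    "Vehicle": [],
}

# Hypothetical word -> concept lookup; a real system would query the knowledge base.
WORD_TO_CONCEPT = {
    "dog": "Dog", "dogs": "Dog",
    "cat": "Cat", "cats": "Cat",
    "car": "Car",
}


def tokenize(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    return [w.strip(".,;:!?").lower() for w in sentence.split()]


def concept_weights(sentences, decay=0.5):
    """Weight mapped concepts, then propagate a decayed share to all ancestors.

    The decay factor is an assumption; the abstract does not state how
    weights are propagated.
    """
    weights = defaultdict(float)
    for sentence in sentences:
        for word in tokenize(sentence):
            concept = WORD_TO_CONCEPT.get(word)
            if concept is None:
                continue  # word has no mapped concept
            weights[concept] += 1.0
            # Walk upward through the hierarchy, shrinking the share each level.
            frontier = [(parent, decay) for parent in PARENTS[concept]]
            while frontier:
                node, share = frontier.pop()
                weights[node] += share
                frontier.extend((p, share * decay) for p in PARENTS[node])
    return dict(weights)


def sentence_score(sentence, weights):
    """Sum the weights of the concepts directly mapped from the sentence's words."""
    return sum(
        weights.get(WORD_TO_CONCEPT.get(word, ""), 0.0)
        for word in tokenize(sentence)
    )


def summarize(sentences, k=1, decay=0.5):
    """Return the k highest-scoring sentences in their original order."""
    weights = concept_weights(sentences, decay)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i], weights),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = ["Dogs chase cats.", "The car is red.", "A dog barks."]
    print(concept_weights(doc))
    print(summarize(doc, k=2))
```

In this toy run, no sentence mentions "mammal", yet the `Mammal` concept accumulates weight from both `Dog` and `Cat`. That is the kind of generalization the upward propagation is meant to provide, although the weighting scheme shown is only one plausible choice.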

Paper Citation


in Harvard Style

Choi B. and Huang X. (2009). WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 281-286. DOI: 10.5220/0001664102810286

in Bibtex Style

@conference{icaart09,
author={Ben Choi and Xiaomei Huang},
title={WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={281-286},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001664102810286},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES
SN - 978-989-8111-66-1
AU - Choi B.
AU - Huang X.
PY - 2009
SP - 281
EP - 286
DO - 10.5220/0001664102810286