DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR

Nuo Zhang, Daisuke Matsuzaki, Toshinori Watanabe, Hisashi Koga

2009

Abstract

A great many electronic documents are now easily accessible, so a method that can automatically evaluate documents and extract their significant content would be valuable. Similarity analysis and topic extraction are widely used techniques for document relation analysis. Most existing methods rely on dictionary-based morphological analysis. Such methods fall short as the Internet grows and new terms appear faster than dictionaries can be updated. In this study, we propose a novel document relation analysis (topic extraction) method based on a compressibility vector. The proposed method does not require morphological analysis and can evaluate input documents automatically. We examine the proposal on model documents and the Reuters-21578 dataset, for both relation analysis and topic extraction. Simulation results show the effectiveness of the proposed method.
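
The abstract does not say how a compressibility vector is computed, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' procedure. It assumes that each component of the vector is the compression ratio of a document when the compressor is primed with one of several reference texts, and it substitutes Python's standard zlib (DEFLATE with a preset dictionary) for whatever compressor the paper uses. The names compressed_size, compressibility_vector and bases are hypothetical.

```python
import zlib


def compressed_size(text: str, dictionary: str = "") -> int:
    """Size in bytes of `text` after DEFLATE, optionally primed with `dictionary`."""
    data = text.encode("utf-8")
    if dictionary:
        # zdict is zlib's preset dictionary: substrings that also occur in it
        # can be encoded as short back-references instead of literals.
        comp = zlib.compressobj(level=9, zdict=dictionary.encode("utf-8"))
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def compressibility_vector(document: str, bases: list[str]) -> list[float]:
    """Assumed definition: one compression ratio per reference text in `bases`."""
    raw = len(document.encode("utf-8"))
    if raw == 0:
        raise ValueError("document must not be empty")
    # A lower ratio means the document shares more substrings with that base.
    return [compressed_size(document, base) / raw for base in bases]


if __name__ == "__main__":
    # Toy reference texts; a real experiment would use much longer ones.
    bases = [
        "the team won the match after scoring two goals in the final minutes of the game",
        "the central bank raised interest rates to curb inflation and stabilize the currency",
    ]
    doc = "the team scored two goals in the final minutes and won the match"
    print(compressibility_vector(doc, bases))
```

Because the ratios come from shared byte sequences rather than from a word dictionary, no morphological analysis is involved, which is the property the abstract emphasizes. Documents could then be compared by any distance between their vectors; that choice is likewise left open here.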


Paper Citation


in Harvard Style

Zhang N., Matsuzaki D., Watanabe T. and Koga H. (2009). DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 255-260. DOI: 10.5220/0001660202550260

in Bibtex Style

@conference{icaart09,
author={Nuo Zhang and Daisuke Matsuzaki and Toshinori Watanabe and Hisashi Koga},
title={DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={255-260},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001660202550260},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR
SN - 978-989-8111-66-1
AU - Zhang N.
AU - Matsuzaki D.
AU - Watanabe T.
AU - Koga H.
PY - 2009
SP - 255
EP - 260
DO - 10.5220/0001660202550260