DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR
Nuo Zhang, Daisuke Matsuzaki, Toshinori Watanabe, Hisashi Koga
2009
Abstract
Nowadays, a great many e-documents can be easily accessed. A method that can evaluate documents and extract their significant content would therefore be beneficial. Similarity analysis and topic extraction are widely used document relation analysis techniques. Most existing methods rely on dictionary-based morphological analysis. They fall short as the Internet grows quickly and new terms appear faster than dictionaries can be updated automatically. In this study, we propose a novel document relation analysis (topic extraction) method based on a compressibility vector. Our proposal does not require morphological analysis, and it can evaluate input documents automatically. We examine the proposal using a model document and the Reuters-21578 dataset, for both relation analysis and topic extraction. The effectiveness of the proposed method is shown in simulations.
Paper Citation
in Harvard Style
Zhang N., Matsuzaki D., Watanabe T. and Koga H. (2009). DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 255-260. DOI: 10.5220/0001660202550260
in Bibtex Style
@conference{icaart09,
author={Nuo Zhang and Daisuke Matsuzaki and Toshinori Watanabe and Hisashi Koga},
title={DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={255-260},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001660202550260},
isbn={978-989-8111-66-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - DOCUMENT RELATION ANALYSIS BASED ON COMPRESSIBILITY VECTOR
SN - 978-989-8111-66-1
AU - Zhang N.
AU - Matsuzaki D.
AU - Watanabe T.
AU - Koga H.
PY - 2009
SP - 255
EP - 260
DO - 10.5220/0001660202550260