DOCUMENTS REPRESENTATION BASED ON INDEPENDENT COMPRESSIBILITY FEATURE SPACE

Nuo Zhang, Toshinori Watanabe

2010

Abstract

Two well-known feature representation methods, the bag-of-words and N-gram models, have been widely used in natural language processing, text mining, and web document analysis. A novel scheme, Pattern Representation using Data Compression (PRDC), has been proposed for data representation. PRDC can effectively process not only linguistic text but also other multimedia data. Although PRDC performs better than the traditional methods in some situations, it still suffers from problems in dictionary selection and in the construction of the feature space. In this study, we propose a method that lets PRDC construct an independent compressibility space, and we compare the proposed method with PRDC and with the two other representation methods. Performance is compared in terms of clustering ability. The experimental results show that the proposed method performs better than PRDC and the other two methods.
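As a rough illustration of what a compressibility feature space can look like, the sketch below represents a document by how well it compresses when the compressor is primed with each of several dictionaries. It is only a minimal sketch under stated assumptions: it uses zlib preset dictionaries from the Python standard library as a stand-in for the PRDC compressor, the names `compression_ratio` and `compressibility_vector` are hypothetical, and it does not implement the independent-space construction proposed in the paper.

```python
import zlib


def compression_ratio(text, dictionary):
    """Return compressed size / original size for `text`.

    Assumption: a zlib preset dictionary (`zdict`) stands in for a PRDC
    dictionary; this is not the compressor used by the authors.
    """
    if not text:
        raise ValueError("text must be non-empty")
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    compressed = compressor.compress(text) + compressor.flush()
    return len(compressed) / len(text)


def compressibility_vector(text, dictionaries):
    """Map one document to a vector with one compression ratio per dictionary."""
    return [compression_ratio(text, d) for d in dictionaries]


# Two toy dictionaries (hypothetical base texts) and one document.
sports_dict = b"the team won the match after scoring two goals in the second half " * 5
finance_dict = b"the bank raised interest rates as inflation and bond yields climbed " * 5
doc = b"the team scored two goals and won the match in the second half"

vector = compressibility_vector(doc, [sports_dict, finance_dict])
```

A lower ratio along one axis means the document shares more substrings with that dictionary, so documents on similar topics tend to land near each other in this space and can then be passed to a clustering algorithm.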

Paper Citation


in Harvard Style

Zhang N. and Watanabe T. (2010). DOCUMENTS REPRESENTATION BASED ON INDEPENDENT COMPRESSIBILITY FEATURE SPACE. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-674-021-4, pages 217-222. DOI: 10.5220/0002704402170222

in Bibtex Style

@conference{icaart10,
author={Nuo Zhang and Toshinori Watanabe},
title={DOCUMENTS REPRESENTATION BASED ON INDEPENDENT COMPRESSIBILITY FEATURE SPACE},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2010},
pages={217-222},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002704402170222},
isbn={978-989-674-021-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - DOCUMENTS REPRESENTATION BASED ON INDEPENDENT COMPRESSIBILITY FEATURE SPACE
SN - 978-989-674-021-4
AU - Zhang N.
AU - Watanabe T.
PY - 2010
SP - 217
EP - 222
DO - 10.5220/0002704402170222