A Tensor-based Clustering Approach for Multiple Document Classifications

Salvatore Romeo, Andrea Tagarelli, Francesco Gullo, Sergio Greco

2013

Abstract

We propose a novel approach to document clustering when multiple organizations (classifications) of the input documents are available. In addition to the text-based content of the documents, our approach exploits frequent associations among documents within the groups of the existing classifications, in order to capture how documents tend to be grouped together across different views. A third-order tensor for the document collection is built over both the space of terms and the space of the discovered frequent document associations, and is then decomposed to establish a single encompassing clustering of the documents. Preliminary experiments on a document clustering benchmark show the potential of the approach to capture the multi-view structure of existing organizations for a given collection of documents.
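
The tensor construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how frequent document associations are mined or how tensor entries are weighted, so everything here is an assumption made for illustration: associations are taken to be document pairs that share a group in at least `min_support` classifications, and an entry is the document's term weight when the document belongs to the association and zero otherwise. The names `frequent_pairs`, `build_tensor`, and `min_support` are hypothetical, and the decomposition step is not shown.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(classifications, min_support):
    """Return document pairs that share a group in at least min_support classifications.

    Assumption: pairs stand in for the paper's frequent document associations.
    """
    counts = Counter()
    for labels in classifications:
        groups = {}
        for doc, label in enumerate(labels):
            groups.setdefault(label, []).append(doc)
        for members in groups.values():
            counts.update(combinations(members, 2))
    return sorted(pair for pair, c in counts.items() if c >= min_support)


def build_tensor(term_weights, associations):
    """Build a documents x terms x associations tensor as nested lists.

    Assumption: tensor[d][t][a] is the weight of term t in document d
    when d belongs to association a, and 0.0 otherwise.
    """
    return [
        [[w if d in assoc else 0.0 for assoc in associations] for w in row]
        for d, row in enumerate(term_weights)
    ]


# Toy data: 4 documents, 3 terms, 3 existing classifications
# (each classification assigns one group label per document).
term_weights = [
    [1.0, 0.0, 2.0],
    [0.5, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [0.0, 1.0, 1.0],
]
classifications = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

associations = frequent_pairs(classifications, min_support=2)
tensor = build_tensor(term_weights, associations)
```

In this toy setting, a tensor decomposition would then be applied to `tensor` to derive a single clustering of the documents; the choice of decomposition is left open here because the abstract does not name one.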



Paper Citation


in Harvard Style

Romeo S., Tagarelli A., Gullo F. and Greco S. (2013). A Tensor-based Clustering Approach for Multiple Document Classifications. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 200-205. DOI: 10.5220/0004269102000205

in Bibtex Style

@conference{icpram13,
author={Salvatore Romeo and Andrea Tagarelli and Francesco Gullo and Sergio Greco},
title={A Tensor-based Clustering Approach for Multiple Document Classifications},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={200-205},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004269102000205},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A Tensor-based Clustering Approach for Multiple Document Classifications
SN - 978-989-8565-41-9
AU - Romeo S.
AU - Tagarelli A.
AU - Gullo F.
AU - Greco S.
PY - 2013
SP - 200
EP - 205
DO - 10.5220/0004269102000205