Study of the Parallel Techniques for Dimensionality Reduction and Its Impact on Performance of the Text Processing Algorithms

Marcin Pietron, Maciej Wielgosz, Pawel Russek, Kazimierz Wiatr

2016

Abstract

The presented algorithms employ the Vector Space Model (VSM) and its enhancements such as TF-IDF (Term Frequency-Inverse Document Frequency). The vector space model suffers from the curse of dimensionality, so various dimensionality reduction algorithms are used. This paper deals with two of the most common ones, Latent Semantic Indexing (LSI) and Random Projection (RP). The size of a document corpus has a substantial impact on processing time, so the authors introduce GPU-based acceleration of these techniques. A dedicated test setup was created and a series of experiments was conducted, which revealed important properties of the algorithms and their accuracy. The experiments show that random projection outperforms LSI in computing speed at the expense of result quality.
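
As a rough illustration of the pipeline the abstract describes, the following is a minimal CPU-only sketch of TF-IDF weighting followed by a Gaussian random projection. It is not the authors' GPU implementation: the function names `tfidf` and `random_projection`, the whitespace tokenizer, the idf formula log(N/df), and the 1/sqrt(k) scaling are all assumptions chosen for illustration. LSI is omitted because it requires a truncated SVD, which is the more expensive step the abstract contrasts with RP.

```python
import math
import random


def tfidf(docs):
    """Build a dense TF-IDF matrix (illustrative; assumes non-empty documents).

    Returns (vocab, matrix) where matrix[i][j] is the weight of vocab[j] in docs[i].
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = [0] * len(vocab)
    for doc in tokenized:
        for t in set(doc):
            df[index[t]] += 1

    matrix = []
    for doc in tokenized:
        counts = [0] * len(vocab)
        for t in doc:
            counts[index[t]] += 1
        # Weight = relative term frequency * log(N / df); assumed variant of TF-IDF.
        row = [
            (c / len(doc)) * math.log(n_docs / df[j]) if c else 0.0
            for j, c in enumerate(counts)
        ]
        matrix.append(row)
    return vocab, matrix


def random_projection(matrix, k, seed=0):
    """Project each row from d dimensions down to k with a random Gaussian matrix."""
    rng = random.Random(seed)
    d = len(matrix[0])
    # Entries drawn from N(0, 1) and scaled by 1/sqrt(k) (a common convention).
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [
        [sum(row[j] * proj[j][c] for j in range(d)) for c in range(k)]
        for row in matrix
    ]


docs = ["gpu speeds up projection", "lsi uses svd", "random projection is fast"]
vocab, weights = tfidf(docs)
reduced = random_projection(weights, k=2)
```

The projection is a single matrix product with no decomposition step, which is consistent with the abstract's observation that RP is faster than LSI; the sketch says nothing about the accuracy trade-off the paper measures.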


Paper Citation


in Harvard Style

Pietron M., Wielgosz M., Russek P. and Wiatr K. (2016). Study of the Parallel Techniques for Dimensionality Reduction and Its Impact on Performance of the Text Processing Algorithms. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1: PUaNLP, (ORG/PUANLP 2016) ISBN 978-989-758-172-4, pages 315-322. DOI: 10.5220/0005756903150322

in Bibtex Style

@conference{puanlp16,
author={Marcin Pietron and Maciej Wielgosz and Pawel Russek and Kazimierz Wiatr},
title={Study of the Parallel Techniques for Dimensionality Reduction and Its Impact on Performance of the Text Processing Algorithms},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1: PUaNLP, (ORG/PUANLP 2016)},
year={2016},
pages={315-322},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005756903150322},
isbn={978-989-758-172-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1: PUaNLP, (ORG/PUANLP 2016)
TI - Study of the Parallel Techniques for Dimensionality Reduction and Its Impact on Performance of the Text Processing Algorithms
SN - 978-989-758-172-4
AU - Pietron M.
AU - Wielgosz M.
AU - Russek P.
AU - Wiatr K.
PY - 2016
SP - 315
EP - 322
DO - 10.5220/0005756903150322