Using Word Sense as a Latent Variable in LDA Can Improve Topic Modeling

Yunqing Xia, Guoyu Tang, Huan Zhao, Erik Cambria, Thomas Fang Zheng

2014

Abstract

Since it was proposed, LDA has been successfully used to model text documents. So far, words have been the common features used to induce latent topics, which are later used in document representation. Observation of documents indicates that polysemous words can make the latent topics less discriminative, resulting in less accurate document representation. We thus argue that semantically deterministic word senses can improve the quality of the latent topics. In this work, we propose a series of word-sense-aware LDA models that use word sense as an extra latent variable in topic induction. Preliminary document clustering experiments on benchmark datasets show that word sense can indeed improve topic modeling.

Paper Citation


in Harvard Style

Xia Y., Tang G., Zhao H., Cambria E. and Fang Zheng T. (2014). Using Word Sense as a Latent Variable in LDA Can Improve Topic Modeling. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 532-537. DOI: 10.5220/0004889705320537

in Bibtex Style

@conference{icaart14,
author={Yunqing Xia and Guoyu Tang and Huan Zhao and Erik Cambria and Thomas Fang Zheng},
title={Using Word Sense as a Latent Variable in LDA Can Improve Topic Modeling},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={532-537},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004889705320537},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Using Word Sense as a Latent Variable in LDA Can Improve Topic Modeling
SN - 978-989-758-015-4
AU - Xia Y.
AU - Tang G.
AU - Zhao H.
AU - Cambria E.
AU - Fang Zheng T.
PY - 2014
SP - 532
EP - 537
DO - 10.5220/0004889705320537