Enhanced Information Access to Social Streams Through Word Clouds with Entity Grouping

Martin Leginus, Leon Derczynski, Peter Dolog

2015

Abstract

Intuitive and effective access to large volumes of information is increasingly important. As social media grows as a useful source of information, so does the need for methods to access these large volumes of user-generated content. Word clouds are an effective information access tool. However, those generated over social media data often depict redundant and mis-ranked entries, which limits users’ ability to browse and explore datasets. This paper proposes a method for improving word cloud generation over social streams. Named entity expressions in tweets are detected, disambiguated and aggregated into entity clusters. A word cloud is then generated from terms that represent the most relevant entity clusters. We find that word clouds with grouped named entities attain significantly broader coverage and significantly less content duplication. Further, access to relevant entries in the collection is improved. An extrinsic crowdsourced user evaluation of the generated word clouds was performed. Word clouds with grouped named entities are rated as significantly more relevant and more diverse than the baseline. In addition, we found that word clouds with higher Mean Average Precision (MAP) are more likely to be rated by users as relevant to the concepts reflected. Critically, this supports MAP as a tool for predicting word cloud quality without requiring a human in the loop.
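
The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the alias table `ALIASES`, the functions `group_entities` and `word_cloud_terms`, and the use of raw mention counts as cluster relevance are all assumptions made for illustration. A real system would replace the alias lookup with proper named entity recognition and disambiguation.

```python
from collections import Counter, defaultdict

# Hypothetical alias table standing in for a real entity
# disambiguation step (surface form -> canonical entity).
ALIASES = {
    "obama": "Barack_Obama",
    "barack obama": "Barack_Obama",
    "president obama": "Barack_Obama",
    "nyc": "New_York_City",
    "new york": "New_York_City",
}


def group_entities(mentions, aliases=ALIASES):
    """Aggregate surface forms into clusters keyed by canonical entity."""
    clusters = defaultdict(Counter)
    for mention in mentions:
        key = mention.lower().strip()
        # Mentions missing from the table form their own one-term cluster.
        entity = aliases.get(key, key)
        clusters[entity][key] += 1
    return clusters


def word_cloud_terms(clusters, k=10):
    """Return (representative term, weight) for the k heaviest clusters.

    Weight is the total mention count of the cluster (an assumed
    relevance measure); the representative term is the cluster's
    most frequent surface form.
    """
    ranked = sorted(
        clusters.items(),
        key=lambda item: (-sum(item[1].values()), item[0]),
    )
    return [
        (forms.most_common(1)[0][0], sum(forms.values()))
        for _, forms in ranked[:k]
    ]


mentions = ["Obama", "Barack Obama", "obama", "obama",
            "NYC", "NYC", "New York", "storm"]
terms = word_cloud_terms(group_entities(mentions), k=3)
print(terms)
```

Without grouping, "obama" and "barack obama" would occupy separate slots in the cloud. With grouping they share one entry, which leaves room for other concepts and reduces duplication.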

Paper Citation


in Harvard Style

Leginus M., Derczynski L. and Dolog P. (2015). Enhanced Information Access to Social Streams Through Word Clouds with Entity Grouping. In Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-106-9, pages 183-193. DOI: 10.5220/0005403101830193

in Bibtex Style

@conference{webist15,
author={Martin Leginus and Leon Derczynski and Peter Dolog},
title={Enhanced Information Access to Social Streams Through Word Clouds with Entity Grouping},
booktitle={Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2015},
pages={183-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005403101830193},
isbn={978-989-758-106-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Enhanced Information Access to Social Streams Through Word Clouds with Entity Grouping
SN - 978-989-758-106-9
AU - Leginus M.
AU - Derczynski L.
AU - Dolog P.
PY - 2015
SP - 183
EP - 193
DO - 10.5220/0005403101830193