From Text Vocabularies to Visual Vocabularies - What Basis?

Jean Martinet

doi:10.5220/0004749606680675

2014

Abstract

The popular "bag-of-visual-words" approach for representing and searching visual documents describes images (or video keyframes) using a set of descriptors that correspond to quantized low-level features. Most existing approaches to visual words are inspired by work in text indexing, based on the implicit assumption that visual words can be handled the same way as text words. More specifically, these techniques implicitly rely on the same postulate as text information retrieval: that the word distribution of a natural language globally follows Zipf's law, that is, words from a natural language appear in a corpus with a frequency inversely proportional to their rank. However, our study shows that the visual word distribution depends on the choice of low-level features, and especially on the choice of clustering method. We also show that when the visual word distribution is close to that of text words, the results of an image retrieval system improve. To the best of our knowledge, no prior study has compared the distributions of text words and visual words with the objective of establishing the theoretical foundations of visual vocabularies.
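
To make the Zipf postulate in the abstract concrete, the following is a minimal sketch of one common way to check whether a vocabulary is roughly Zipfian: rank the words by frequency and fit a line to log(frequency) against log(rank). A slope near -1 corresponds to frequency inversely proportional to rank. This is an illustration only, not the procedure used in the paper; the function name `zipf_slope` and the toy corpus are assumptions made for the example, and the tokens could equally be visual word identifiers produced by a clustering step.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Hypothetical helper for illustration; a slope close to -1 suggests
    a Zipf-like distribution, a slope close to 0 a flat one.
    """
    # Frequencies sorted from most to least common; index + 1 is the rank.
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy corpus (assumed data): the word of rank r occurs 60 / r times,
# so frequency is exactly inversely proportional to rank.
toy = [f"w{r}" for r in range(1, 7) for _ in range(60 // r)]
slope = zipf_slope(toy)
```

Comparing the slope obtained for a text vocabulary with the slope obtained for a visual vocabulary gives a rough, single-number view of how similar the two distributions are; a fuller comparison would also look at how well the line fits.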

Paper Citation

in Harvard Style

Martinet J. (2014). From Text Vocabularies to Visual Vocabularies - What Basis? In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014), ISBN 978-989-758-004-8, pages 668-675. DOI: 10.5220/0004749606680675

in Bibtex Style

@conference{visapp14,
author={Jean Martinet},
title={From Text Vocabularies to Visual Vocabularies - What Basis?},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={668-675},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004749606680675},
isbn={978-989-758-004-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - From Text Vocabularies to Visual Vocabularies - What Basis?
SN - 978-989-758-004-8
AU - Martinet J.
PY - 2014
SP - 668
EP - 675
DO - 10.5220/0004749606680675