INITIAL EXPERIMENTS WITH EXTRACTION OF STOPWORDS IN HEBREW

Yaakov HaCohen-Kerner, Shmuel Yishai Blitz

2010

Abstract

Stopwords are regarded as meaningless in terms of information retrieval. Various stopword lists have been constructed for English and a few other languages. However, to the best of our knowledge, no stopword list has been constructed for Hebrew. In this ongoing work, we present an implementation of three baseline methods that attempt to extract stopwords from a data set of Israeli daily news. Two of the methods are state-of-the-art methods previously applied to other languages, and the third method is proposed by the authors. A comparison of the behavior of these three methods with Zipf's law shows that Zipf's law successfully describes the distribution of the top-occurring words according to these methods.
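The abstract does not specify the three extraction methods. As an illustration only, the sketch below assumes the simplest frequency-based baseline (rank words by raw corpus count) and compares the top ranks with the classic Zipf prediction f(r) ≈ f(1) / r^s, taking s = 1. The function names `top_frequent_words`, `zipf_expected` and `compare_with_zipf` are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter


def top_frequent_words(tokens, k):
    """Assumed baseline: rank words by raw corpus frequency, keep the top k."""
    return Counter(tokens).most_common(k)


def zipf_expected(top_freq, rank, s=1.0):
    """Zipf's law: the word at rank r occurs roughly f(1) / r**s times."""
    return top_freq / rank ** s


def compare_with_zipf(ranked, s=1.0):
    """Return (word, observed, expected) triples for a ranked (word, count) list."""
    if not ranked:
        return []
    top_freq = ranked[0][1]  # frequency of the rank-1 word anchors the curve
    return [
        (word, count, zipf_expected(top_freq, rank, s))
        for rank, (word, count) in enumerate(ranked, start=1)
    ]
```

For a toy token list such as `"של של של של את את על".split()` (three common Hebrew function words), `top_frequent_words(tokens, 3)` yields `[("של", 4), ("את", 2), ("על", 1)]`, and `compare_with_zipf` pairs those counts with the Zipf expectations 4.0, 2.0 and about 1.33. Anchoring the curve on the observed top frequency keeps the comparison free of any fitted constant; a real evaluation would more likely fit s on a log-log scale.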



Paper Citation


in Harvard Style

HaCohen-Kerner Y. and Yishai Blitz S. (2010). INITIAL EXPERIMENTS WITH EXTRACTION OF STOPWORDS IN HEBREW. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 449-453. DOI: 10.5220/0003093104490453

in Bibtex Style

@conference{kdir10,
author={Yaakov HaCohen-Kerner and Shmuel Yishai Blitz},
title={INITIAL EXPERIMENTS WITH EXTRACTION OF STOPWORDS IN HEBREW},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={449-453},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003093104490453},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - INITIAL EXPERIMENTS WITH EXTRACTION OF STOPWORDS IN HEBREW
SN - 978-989-8425-28-7
AU - HaCohen-Kerner Y.
AU - Yishai Blitz S.
PY - 2010
SP - 449
EP - 453
DO - 10.5220/0003093104490453