A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language
Dante Degl'Innocenti, Dario De Nart, Carlo Tasso
2014
Abstract
Associating meaningful keyphrases with text documents and Web pages can significantly increase the accuracy of Information Retrieval, Personalization and Recommender systems, but the growing amount of available text data is too large for extensive manual annotation. Automatic keyphrase generation can significantly support this activity. Several systems proposed in the literature already perform this task with satisfactory results; however, most of them focus solely on the English language, which accounts for little more than 50% of Web content. Only a few other languages have been investigated, and Italian, despite being the ninth most used language on the Web, is not among them. To address this gap, we propose a novel multi-language, unsupervised, knowledge-based approach to keyphrase generation. To support our claims, we developed DIKpE-G, a prototype system that integrates several kinds of knowledge for selecting and evaluating meaningful keyphrases: linguistic, statistical, meta/structural, social, and ontological. DIKpE-G performs well on both English and Italian texts.
Paper Citation
in Harvard Style
Degl'Innocenti D., De Nart D. and Tasso C. (2014). A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 78-85. DOI: 10.5220/0005077100780085
in Bibtex Style
@conference{kdir14,
author={Dante Degl'Innocenti and Dario De Nart and Carlo Tasso},
title={A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={78-85},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005077100780085},
isbn={978-989-758-048-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language
SN - 978-989-758-048-2
AU - Degl'Innocenti D.
AU - De Nart D.
AU - Tasso C.
PY - 2014
SP - 78
EP - 85
DO - 10.5220/0005077100780085