A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language

Dante Degl'Innocenti, Dario De Nart, Carlo Tasso

2014

Abstract

Associating meaningful keyphrases with text documents and Web pages can significantly increase the accuracy of Information Retrieval, Personalization, and Recommender systems, but the growing amount of available text data is too large for extensive manual annotation. Automatic keyphrase generation can significantly support this activity. Several systems proposed in the literature already perform this task with satisfactory results; however, most of them focus solely on the English language, which represents more than 50% of Web content. Only a few other languages have been investigated, and Italian, despite being the ninth most used language on the Web, is not among them. To overcome this shortage, we propose a novel multi-language, unsupervised, knowledge-based approach to keyphrase generation. To support our claims, we developed DIKpE-G, a prototype system that integrates several kinds of knowledge for selecting and evaluating meaningful keyphrases, ranging from linguistic to statistical, meta/structural, social, and ontological knowledge. DIKpE-G performs well on both English and Italian texts.

Paper Citation


in Harvard Style

Degl'Innocenti D., De Nart D. and Tasso C. (2014). A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 78-85. DOI: 10.5220/0005077100780085

in Bibtex Style

@conference{kdir14,
author={Dante Degl'Innocenti and Dario De Nart and Carlo Tasso},
title={A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={78-85},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005077100780085},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language
SN - 978-989-758-048-2
AU - Degl'Innocenti D.
AU - De Nart D.
AU - Tasso C.
PY - 2014
SP - 78
EP - 85
DO - 10.5220/0005077100780085