AN EXTENSIVE COMPARISON OF METRICS FOR AUTOMATIC EXTRACTION OF KEY TERMS
Luis F. S. Teixeira, Gabriel P. Lopes, Rita A. Ribeiro
2012
Abstract
In this paper we compare twenty language-independent, statistically based metrics for extracting key terms from any document collection. Some of these metrics are widely used; others were created recently. Two document representations are considered in our experiments: one based on words and multi-words, and one based on fixed-length word prefixes (5 characters in our experiments), intended to handle morphologically rich languages, namely Portuguese and Czech. English is also tested, as a language that is not morphologically rich. Results are evaluated manually, and agreement between evaluators is assessed using kappa (κ) statistics. The metrics based on Tf-Idf and Phi-square achieved higher precision and recall than the other metrics. The prefix-based document representation yielded a significant improvement for documents written in Portuguese.
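The abstract names Tf-Idf among the best-performing metrics, contrasts a word-based representation with a 5-character prefix representation, and measures evaluator agreement with kappa statistics. The following is a minimal Python sketch of those three ideas under stated assumptions, not the authors' implementation. The helper names (`prefix_tokens`, `word_tokens`, `tf_idf_scores`, `cohen_kappa`) are hypothetical. Tokenisation by whitespace with edge-punctuation stripping is assumed. The Tf-Idf shown is a textbook variant (relative term frequency times log(N/df)) over single words only, so multi-words, Phi-square and the other metrics are not covered. Cohen's kappa for two evaluators is used as one common instance, since the abstract does not say which kappa variant was applied.

```python
import math
from collections import Counter

# Punctuation stripped from token edges (an assumption; the paper's tokeniser is not described here).
_PUNCT = ".,;:!?\"'()[]"


def prefix_tokens(text, length=5):
    """Lowercase and whitespace-split `text`, strip edge punctuation, and
    truncate each word to its first `length` characters (length=None keeps whole words)."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(_PUNCT)
        if word:
            tokens.append(word[:length])
    return tokens


def word_tokens(text):
    """Word-based representation: the same tokenisation without truncation."""
    return prefix_tokens(text, length=None)


def tf_idf_scores(docs, tokenize):
    """Score every term of every document with a textbook Tf-Idf:
    relative term frequency in the document times log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency counts each term once per document
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append({t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return scores


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two evaluators who labelled the same candidate terms."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators used one identical label for every item
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    docs = ["Extração de termos", "extrair termos", "avaliação manual"]
    # Word representation: "extração" and "extrair" are distinct terms, each with df = 1.
    print(tf_idf_scores(docs, word_tokens)[0])
    # Prefix representation: both collapse to "extra", which now has df = 2.
    print(tf_idf_scores(docs, prefix_tokens)[0])
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Under the prefix representation, inflected Portuguese forms such as "extração" and "extrair" are counted as a single term, which changes both their frequency and their document frequency. Whether this helps depends on the language: the abstract reports a significant gain only for Portuguese.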
Paper Citation
in Harvard Style
Teixeira, L. F. S., Lopes, G. P. and Ribeiro, R. A. (2012). AN EXTENSIVE COMPARISON OF METRICS FOR AUTOMATIC EXTRACTION OF KEY TERMS. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-95-9, pages 55-63. DOI: 10.5220/0003720400550063
in Bibtex Style
@conference{icaart12,
author={Luis F. S. Teixeira and Gabriel P. Lopes and Rita A. Ribeiro},
title={AN EXTENSIVE COMPARISON OF METRICS FOR AUTOMATIC EXTRACTION OF KEY TERMS},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2012},
pages={55-63},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003720400550063},
isbn={978-989-8425-95-9},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI  - AN EXTENSIVE COMPARISON OF METRICS FOR AUTOMATIC EXTRACTION OF KEY TERMS
SN  - 978-989-8425-95-9
AU  - Teixeira, Luis F. S.
AU  - Lopes, Gabriel P.
AU  - Ribeiro, Rita A.
PY  - 2012
SP  - 55
EP  - 63
DO  - 10.5220/0003720400550063
ER  - 