On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution?

Vojtěch Kovář, Miloš Jakubíček, Aleš Horák

2016

Abstract

The paper discusses problems in state-of-the-art evaluation methods used in natural language processing (NLP). Usually, some form of gold standard data is used to evaluate NLP tasks, ranging from morphological annotation to semantic analysis. We discuss the problems and validity of this type of evaluation for various tasks and illustrate the problems with examples. We then propose using application-driven evaluation wherever possible. Although it is more expensive, more complicated, and less precise, it is the only way to find out whether a particular tool is useful at all.


Paper Citation


in Harvard Style

Kovář V., Jakubíček M. and Horák A. (2016). On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution? In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-172-4, pages 540-545. DOI: 10.5220/0005824805400545

in Bibtex Style

@conference{icaart16,
author={Vojtěch Kovář and Miloš Jakubíček and Aleš Horák},
title={On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution?},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2016},
pages={540-545},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005824805400545},
isbn={978-989-758-172-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution?
SN - 978-989-758-172-4
AU - Kovář V.
AU - Jakubíček M.
AU - Horák A.
PY - 2016
SP - 540
EP - 545
DO - 10.5220/0005824805400545