FEATURES FOR NAMED ENTITY RECOGNITION IN CZECH LANGUAGE

Pavel Král

2011

Abstract

This paper deals with Named Entity Recognition (NER). Our work focuses on an application for the Czech News Agency (ČTK). We propose and implement a Czech NER system that facilitates searching the ČTK text news databases. The choice of the feature set is crucial for the NER task. The main contribution of this work is thus to propose and evaluate several features for named entity recognition and to create an “optimal” set of features. We use Conditional Random Fields (CRFs) as the classifier. Our system is tested on a Czech NER corpus with nine main named entity classes. With the best feature set we reached an F-measure of 58%, which is sufficient for our target application.
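To make the role of the feature set concrete, the following is a minimal sketch of how per-token features might be assembled before being passed to a CRF toolkit. The specific features (capitalization, affixes, neighbouring words), the function names `token_features` and `sentence_features`, and the example sentence are illustrative assumptions; the abstract does not list the features actually evaluated in the paper.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    All features here are hypothetical examples of common NER cues,
    not the feature set evaluated in the paper.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),                 # normalized word form
        "is_capitalized": word[:1].isupper(),  # e.g. names, sentence starts
        "is_all_caps": word.isupper(),         # e.g. acronyms such as "ČTK"
        "has_digit": any(ch.isdigit() for ch in word),
        "prefix2": word[:2],                   # first two characters
        "suffix3": word[-3:],                  # last three characters
    }
    # Context features: the neighbouring words, or sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical example sentence ("Pavel Král works for ČTK.")
sentence = ["Pavel", "Král", "pracuje", "pro", "ČTK", "."]
features = sentence_features(sentence)
```

A list of such dictionaries, one per token, is the input shape accepted by several CRF libraries; comparing feature sets then amounts to adding or removing keys and re-measuring the F-measure.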


Paper Citation


in Harvard Style

Král P. (2011). FEATURES FOR NAMED ENTITY RECOGNITION IN CZECH LANGUAGE. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011), ISBN 978-989-8425-80-5, pages 437-441. DOI: 10.5220/0003660104370441

in Bibtex Style

@conference{keod11,
author={Pavel Král},
title={FEATURES FOR NAMED ENTITY RECOGNITION IN CZECH LANGUAGE},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011)},
year={2011},
pages={437-441},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003660104370441},
isbn={978-989-8425-80-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011)
TI - FEATURES FOR NAMED ENTITY RECOGNITION IN CZECH LANGUAGE
SN - 978-989-8425-80-5
AU - Král P.
PY - 2011
SP - 437
EP - 441
DO - 10.5220/0003660104370441