HOW STATISTICAL INFORMATION FROM THE WEB CAN HELP IDENTIFY NAMED ENTITIES

Mathieu Roche

2011

Abstract

This paper presents a Natural Language Processing (NLP) approach to filter Named Entities (NEs) from a list of collocation candidates. NEs are defined as the names of 'People', 'Places', 'Organizations', 'Software', 'Illnesses', and so forth. The proposed method identifies NEs using statistical measures computed from Web resources. It has three stages: (1) building artificial prepositional collocations from Noun-Noun candidates; (2) measuring the "relevance" of the resulting prepositional collocations using statistical methods (Web Mining); (3) selecting prepositional collocations. An evaluation on Noun-Noun collocations from French and English corpora confirmed the relevance of our system.
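The three stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the preposition list, the "N2 prep N1" paraphrase pattern, the ratio-based relevance score, the threshold, and the stubbed hit counts are all hypothetical, since the abstract does not specify them. The sketch also assumes that a candidate whose prepositional variants are rare on the Web is more likely to be an NE.

```python
from typing import Callable, List, Tuple

# Hypothetical preposition list; the abstract does not say which are used.
PREPOSITIONS = ["of", "for", "in"]


def build_prepositional_variants(noun1: str, noun2: str) -> List[str]:
    """Stage 1: build artificial prepositional collocations from a Noun-Noun pair.

    Assumed English paraphrase pattern: "N1 N2" -> "N2 <prep> N1".
    """
    return [f"{noun2} {prep} {noun1}" for prep in PREPOSITIONS]


def relevance(noun1: str, noun2: str, hit_count: Callable[[str], int]) -> float:
    """Stage 2: score the variants with Web statistics.

    Assumed measure: hits of the most frequent variant divided by the
    hits of the original Noun-Noun pair.
    """
    original_hits = hit_count(f"{noun1} {noun2}")
    if original_hits == 0:
        return 0.0
    best = max(hit_count(v) for v in build_prepositional_variants(noun1, noun2))
    return best / original_hits


def split_candidates(
    candidates: List[Tuple[str, str]],
    hit_count: Callable[[str], int],
    threshold: float = 0.05,  # hypothetical cut-off
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Stage 3: select candidates by relevance of their prepositional variants.

    Returns (likely_named_entities, likely_regular_collocations).
    """
    likely_ne, likely_collocation = [], []
    for noun1, noun2 in candidates:
        if relevance(noun1, noun2, hit_count) < threshold:
            likely_ne.append((noun1, noun2))
        else:
            likely_collocation.append((noun1, noun2))
    return likely_ne, likely_collocation


# Made-up page counts standing in for a real Web search engine.
FAKE_HITS = {
    "customer file": 1000,
    "file of customer": 120,
    "file for customer": 300,
    "Microsoft Word": 50000,
    "Word of Microsoft": 40,
}


def fake_hit_count(query: str) -> int:
    """Return the stubbed page count for a query (0 if unknown)."""
    return FAKE_HITS.get(query, 0)


if __name__ == "__main__":
    pairs = [("customer", "file"), ("Microsoft", "Word")]
    named_entities, collocations = split_candidates(pairs, fake_hit_count)
    print("Likely NE:", named_entities)
    print("Likely collocations:", collocations)
```

In practice `fake_hit_count` would be replaced by a function that queries a search engine for page counts; that part is left out here because the abstract does not name the Web resource used.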


Paper Citation


in Harvard Style

Roche M. (2011). HOW STATISTICAL INFORMATION FROM THE WEB CAN HELP IDENTIFY NAMED ENTITIES. In Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WTM, (WEBIST 2011) ISBN 978-989-8425-51-5, pages 685-689. DOI: 10.5220/0003473906850689

in Bibtex Style

@conference{wtm11,
author={Mathieu Roche},
title={HOW STATISTICAL INFORMATION FROM THE WEB CAN HELP IDENTIFY NAMED ENTITIES},
booktitle={Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WTM, (WEBIST 2011)},
year={2011},
pages={685-689},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003473906850689},
isbn={978-989-8425-51-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1: WTM, (WEBIST 2011)
TI - HOW STATISTICAL INFORMATION FROM THE WEB CAN HELP IDENTIFY NAMED ENTITIES
SN - 978-989-8425-51-5
AU - Roche M.
PY - 2011
SP - 685
EP - 689
DO - 10.5220/0003473906850689