Using Conditional Random Fields with Constraints to Train Support Vector Machines - Locating and Parsing Bibliographic References

Sebastian Lindner

2013

Abstract

This paper shows how bibliographic references can be located in HTML and then separated into fields. First, it is demonstrated how Conditional Random Fields (CRFs) with constraints and prior knowledge about the bibliographic domain can be used to split bibliographic references into fields, e.g. authors and title, when only a few labeled training instances are available. For this purpose, an algorithm for automatic keyword extraction and a unique set of features and constraints are introduced. Features and the output of this CRF for tagging bibliographic references, together with Part Of Speech (POS) analysis and Named Entity Recognition (NER), are then used to find the bibliographic reference section in an article. To this end, the HTML document is first separated into blocks of consecutive inline elements. We then compare two machine learning approaches for the reference locating process, one using a Support Vector Machine (SVM) and one using a CRF. In contrast to other reference locating approaches, our method can even cope with single reference entries in a document or with multiple reference sections. We show that our reference location process achieves very good results, while the reference tagging approach is able to compete with other state-of-the-art approaches and sometimes even outperforms them.
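The block-separation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the set of block-level tags, the class name `InlineBlockSplitter` and the function `split_into_blocks` are assumptions made for illustration, and the sketch only uses Python's standard `html.parser` module. Text from consecutive inline elements (such as `<b>` or `<i>`) is merged into one block, and any block-level tag starts a new one.

```python
from html.parser import HTMLParser

# Assumed set of block-level tags; the paper does not list the tags it uses.
BLOCK_TAGS = {
    "html", "body", "div", "p", "br", "ul", "ol", "li", "table", "tr", "td",
    "blockquote", "h1", "h2", "h3", "h4", "h5", "h6",
}


class InlineBlockSplitter(HTMLParser):
    """Collects the text of consecutive inline elements into blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished text blocks
        self._current = []  # text pieces of the block being built

    def _flush(self):
        # Join the collected pieces and normalize whitespace.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        # A block-level tag ends the current run of inline content.
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags are ignored, so their text joins the current block.
        self._current.append(data)


def split_into_blocks(html):
    """Return the list of inline-text blocks found in an HTML string."""
    parser = InlineBlockSplitter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text after the last tag
    return parser.blocks


if __name__ == "__main__":
    sample = (
        "<div><p>Intro <b>text</b>.</p><ul>"
        "<li>Smith, J. <i>A Title</i>. 2001.</li>"
        "<li>Doe, A. <i>Other</i>. 2002.</li></ul></div>"
    )
    for block in split_into_blocks(sample):
        print(block)
```

In a pipeline of the kind the abstract describes, each resulting block would then be turned into a feature vector and classified as reference or non-reference; that classification step is not shown here.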

Paper Citation


in Harvard Style

Lindner S. (2013). Using Conditional Random Fields with Constraints to Train Support Vector Machines - Locating and Parsing Bibliographic References. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 28-36. DOI: 10.5220/0004546100280036

in Bibtex Style

@conference{kdir13,
author={Sebastian Lindner},
title={Using Conditional Random Fields with Constraints to Train Support Vector Machines - Locating and Parsing Bibliographic References},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={28-36},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004546100280036},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Using Conditional Random Fields with Constraints to Train Support Vector Machines - Locating and Parsing Bibliographic References
SN - 978-989-8565-75-4
AU - Lindner S.
PY - 2013
SP - 28
EP - 36
DO - 10.5220/0004546100280036