Transfer Learning for Bibliographic Information Extraction

Quang-Hong Vuong, Takasu Atsuhiro

2015

Abstract

This paper discusses the problems of analyzing title-page layouts and extracting bibliographic information from academic papers. Information extraction is an important task for making digital libraries easy to use. Sequence analyzers are usually used to extract information from pages. Because new layouts arrive often and existing layouts also change, a mechanism for self-training a new analyzer is needed to achieve good extraction accuracy. Such a mechanism also makes management easier. For example, when a new layout is input, the problem is how to learn automatically and efficiently to create a new analyzer. This paper focuses on learning a new sequence analyzer automatically by using a transfer learning approach. We evaluated its efficiency by testing on three academic journals. The results show that the proposed method is effective for self-training a new sequence analyzer.


Paper Citation


in Harvard Style

Vuong Q. and Atsuhiro T. (2015). Transfer Learning for Bibliographic Information Extraction. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 374-379. DOI: 10.5220/0005283003740379

in Bibtex Style

@conference{icpram15,
author={Quang-Hong Vuong and Takasu Atsuhiro},
title={Transfer Learning for Bibliographic Information Extraction},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={374-379},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005283003740379},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Transfer Learning for Bibliographic Information Extraction
SN - 978-989-758-076-5
AU - Vuong Q.
AU - Atsuhiro T.
PY - 2015
SP - 374
EP - 379
DO - 10.5220/0005283003740379