A Multi-fonts Kanji Character Recognition Method for Early-modern Japanese Printed Books with Ruby Characters

Taeka Awazu, Manami Fukuo, Masami Takata, Kazuki Joe

2014

Abstract

The website of the National Diet Library of Japan makes a large number of early-modern (AD 1868-1945) Japanese printed books available to the public, but full-text search of them is essentially impossible. To enable advanced search of this historical literature, automatic textualization of the page images is required. However, the ruby system, which is peculiar to Japanese books, poses a serious obstacle to textualization. When existing OCR software is applied to early-modern Japanese printed books, the recognition rate is extremely low. To address this problem, we previously proposed a multi-font Kanji character recognition method using the PDC feature and an SVM. In this paper, we propose a ruby character removal method for early-modern Japanese printed books using genetic programming, and evaluate our multi-font Kanji character recognition method on 1,000 types of early-modern Japanese printed Kanji characters.

Paper Citation


in Harvard Style

Awazu T., Fukuo M., Takata M. and Joe K. (2014). A Multi-fonts Kanji Character Recognition Method for Early-modern Japanese Printed Books with Ruby Characters. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 637-645. DOI: 10.5220/0004825306370645

in Bibtex Style

@conference{icpram14,
author={Taeka Awazu and Manami Fukuo and Masami Takata and Kazuki Joe},
title={A Multi-fonts Kanji Character Recognition Method for Early-modern Japanese Printed Books with Ruby Characters},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={637-645},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004825306370645},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A Multi-fonts Kanji Character Recognition Method for Early-modern Japanese Printed Books with Ruby Characters
SN - 978-989-758-018-5
AU - Awazu T.
AU - Fukuo M.
AU - Takata M.
AU - Joe K.
PY - 2014
SP - 637
EP - 645
DO - 10.5220/0004825306370645