Extraction of Homogeneous Regions in Historical Document Images

Maroua Mehri, Pierre Héroux, Nabil Sliti, Petra Gomez-Krämer, Najoua Essoukri Ben Amara, Rémy Mullot

2015

Abstract

Interest in historical document image segmentation has been rising, driven by the need to index and retrieve digitized resources and to offer structured access to large collections of cultural heritage documents. There is a real need for automatic algorithms that identify homogeneous regions, or groups of similar pixels sharing some visual characteristics, in historical documents (i.e. distinguishing graphic types, separating graphical regions from textual ones, and discriminating text across a variety of fonts and scales). Determining graphic regions can help to segment and analyze the graphical content of historical heritage documents, while finding text zones can serve as a pre-processing stage for character recognition, text line extraction, handwriting recognition, etc. We therefore propose in this article an automatic segmentation method for historical document images based on the extraction of homogeneous or similar content regions. The proposed algorithm relies on simple linear iterative clustering (SLIC) superpixels, Gabor filters, multi-scale analysis, a majority voting technique, connected component analysis, color layer separation, and an adaptive run-length smoothing algorithm (ARLSA). It has been evaluated on 1000 pages of historical documents and achieved interesting results.
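As a rough illustration of the run-length smoothing idea mentioned in the abstract, the sketch below implements the classical fixed-threshold horizontal RLSA in plain Python. It is not the adaptive variant (ARLSA) used in the paper, whose thresholding rules are not described on this page. The function names `rlsa_row` and `rlsa_horizontal` and the parameter `max_gap` are assumptions made for this example, as is the choice to fill only gaps bounded by foreground pixels on both sides.

```python
def rlsa_row(row, max_gap):
    """Smooth one binary row (1 = foreground, 0 = background).

    Background runs of length <= max_gap that lie between two
    foreground pixels are filled with foreground. Leading and
    trailing background runs are left untouched (an assumption
    of this sketch).
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, value in enumerate(row):
        if value:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= max_gap:
                    # Merge the two foreground pixels by filling the gap.
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, max_gap):
    """Apply fixed-threshold horizontal RLSA to every row of a binary image."""
    return [rlsa_row(row, max_gap) for row in image]


if __name__ == "__main__":
    page = [
        [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
        [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    ]
    for smoothed_row in rlsa_horizontal(page, max_gap=2):
        print(smoothed_row)
```

With `max_gap=2`, the first row becomes `[1, 1, 1, 1, 0, 0, 0, 0, 1, 0]`: the two-pixel gap is merged, while the four-pixel gap is kept as a separator. An adaptive variant would presumably choose the threshold per region rather than globally, but that behaviour is not sketched here.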


Paper Citation


in Harvard Style

Mehri M., Héroux P., Sliti N., Gomez-Krämer P., Ben Amara N. and Mullot R. (2015). Extraction of Homogeneous Regions in Historical Document Images. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-091-8, pages 47-54. DOI: 10.5220/0005265500470054

in Bibtex Style

@conference{visapp15,
author={Maroua Mehri and Pierre Héroux and Nabil Sliti and Petra Gomez-Krämer and Najoua Essoukri Ben Amara and Rémy Mullot},
title={Extraction of Homogeneous Regions in Historical Document Images},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={47-54},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005265500470054},
isbn={978-989-758-091-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)
TI - Extraction of Homogeneous Regions in Historical Document Images
SN - 978-989-758-091-8
AU - Mehri M.
AU - Héroux P.
AU - Sliti N.
AU - Gomez-Krämer P.
AU - Ben Amara N.
AU - Mullot R.
PY - 2015
SP - 47
EP - 54
DO - 10.5220/0005265500470054