EXTRACTION OF OBJECTS AND PAGE SEGMENTATION OF COMPOSITE DOCUMENTS WITH NON-UNIFORM BACKGROUND

Yasser Alginahi, Maher Sid-Ahmed, Majid Ahmadi

2005

Abstract

In designing page segmentation systems for documents with complex backgrounds and poor illumination, separating the background from the objects (text and images) is crucial to the success of the system. A new local, neural-network-based binarization technique developed by the authors is used to extract the objects from document images with complex backgrounds. This algorithm uses statistical and textural feature measures to obtain a feature vector for each pixel from a window of size (2n + 1) × (2n + 1), where n ≥ 1. These features provide a local understanding of pixels from their neighbourhoods, making it easier to classify each pixel into its proper class. A Multi-Layer Perceptron Neural Network (MLP NN) is then used to classify each pixel in the image. The results of thresholding are then passed to a block segmentation stage. The block segmentation technique developed is a feature-based method that uses a Neural Network classifier to automatically segment and classify the image contents into text and halftone images. The results of page segmentation are then ready to be passed to an OCR system that will convert the text image into a format that can be stored and modified.
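
As a rough illustration of the per-pixel feature vector described in the abstract, the sketch below gathers the (2n + 1) × (2n + 1) neighbourhood of a pixel and computes a few simple statistics from it. The abstract does not list the actual statistical and textural measures used, so the three features here (pixel value, local mean, local standard deviation), the border clamping, and the names `window_values` and `pixel_features` are assumptions made for illustration only, not the authors' method.

```python
from statistics import fmean, pstdev


def window_values(image, row, col, n):
    """Collect the (2n+1) x (2n+1) neighbourhood of a pixel.

    Coordinates outside the image are clamped to the nearest border
    pixel (an assumed border policy; the abstract does not specify one).
    """
    height, width = len(image), len(image[0])
    values = []
    for dr in range(-n, n + 1):
        for dc in range(-n, n + 1):
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            values.append(image[r][c])
    return values


def pixel_features(image, row, col, n=1):
    """Return a hypothetical feature vector for one pixel:
    [pixel value, local mean, local standard deviation]."""
    if n < 1:
        raise ValueError("n must be >= 1")
    values = window_values(image, row, col, n)
    return [float(image[row][col]), fmean(values), pstdev(values)]
```

In a pipeline like the one the abstract outlines, a vector of this kind would be computed for every pixel and passed to the MLP classifier; the classifier itself is omitted here because its architecture and training data are not given in the abstract.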


Paper Citation


in Harvard Style

Alginahi Y., Sid-Ahmed M. and Ahmadi M. (2005). EXTRACTION OF OBJECTS AND PAGE SEGMENTATION OF COMPOSITE DOCUMENTS WITH NON-UNIFORM BACKGROUND. In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 3: ICINCO, ISBN 972-8865-31-7, pages 344-347. DOI: 10.5220/0001167903440347

in Bibtex Style

@conference{icinco05,
author={Yasser Alginahi and Maher Sid-Ahmed and Majid Ahmadi},
title={EXTRACTION OF OBJECTS AND PAGE SEGMENTATION OF COMPOSITE DOCUMENTS WITH NON-UNIFORM BACKGROUND},
booktitle={Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 3: ICINCO},
year={2005},
pages={344-347},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001167903440347},
isbn={972-8865-31-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 3: ICINCO
TI - EXTRACTION OF OBJECTS AND PAGE SEGMENTATION OF COMPOSITE DOCUMENTS WITH NON-UNIFORM BACKGROUND
SN - 972-8865-31-7
AU - Alginahi Y.
AU - Sid-Ahmed M.
AU - Ahmadi M.
PY - 2005
SP - 344
EP - 347
DO - 10.5220/0001167903440347