PaperClip: Automated Dossier Reorganizing

Wessel Stoop, Iris Hendrickx, Tom van Ees

2017

Abstract

We investigate the creation of a robust algorithm for document identification and page ordering in a digital mail room in the banking sector. PaperClip is a system that takes files containing pages of various documents as input, and returns multiple files that each contain all the pages of one document in the correct order. PaperClip performs (1) document type classification and (2) page number classification on each page, and then (3) merges the results. We experimented with various algorithms and methods for these three steps and performed an extensive evaluation to measure different aspects of the methods. The best performing setup achieved a cut F-score of 86% and a V-measure of 0.91. This is high enough to fulfill the business needs of the banking sector.
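
The three steps named in the abstract can be illustrated with a minimal sketch. It assumes steps (1) and (2) have already produced a predicted document type and page number for every page, and shows only one possible form of step (3). The `merge_pages` function, the tuple layout, and the splitting rule are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Tuple

# (page_id, predicted_doc_type, predicted_page_number); all values are hypothetical
Page = Tuple[str, str, int]


def merge_pages(pages: List[Page]) -> List[List[str]]:
    """Group a stream of classified pages into documents and order each group.

    Assumed rule (not from the paper): a page joins the current document only
    if its predicted type matches and its predicted page number has not been
    seen in that document yet; otherwise it starts a new document.
    """
    groups: List[List[Page]] = []
    for page in pages:
        _, doc_type, number = page
        current = groups[-1] if groups else None
        same_type = current is not None and current[-1][1] == doc_type
        seen = current is not None and any(p[2] == number for p in current)
        if same_type and not seen:
            current.append(page)
        else:
            groups.append([page])
    # Within each document, order pages by their predicted page number.
    return [[pid for pid, _, _ in sorted(g, key=lambda p: p[2])] for g in groups]


if __name__ == "__main__":
    scanned = [
        ("a", "letter", 2),
        ("b", "letter", 1),
        ("c", "contract", 1),
        ("d", "contract", 2),
        ("e", "contract", 1),
    ]
    print(merge_pages(scanned))
```

In this toy input the two letter pages arrive out of order and are reordered, while the repeated page number 1 of type contract is treated as the start of a second contract. A real system would have to cope with classifier errors in both predictions, which this rule does not attempt.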

Paper Citation


in Harvard Style

Stoop W., Hendrickx I. and van Ees T. (2017). PaperClip: Automated Dossier Reorganizing. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 471-478. DOI: 10.5220/0006195904710478

in Bibtex Style

@conference{icpram17,
author={Wessel Stoop and Iris Hendrickx and Tom van Ees},
title={PaperClip: Automated Dossier Reorganizing},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={471-478},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006195904710478},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - PaperClip: Automated Dossier Reorganizing
SN - 978-989-758-222-6
AU - Stoop W.
AU - Hendrickx I.
AU - van Ees T.
PY - 2017
SP - 471
EP - 478
DO - 10.5220/0006195904710478