Alternative PPM Model for Quality Score Compression

Mete Akgün, Mahmut Şamil Sağıroğlu

2013

Abstract

Next Generation Sequencing (NGS) platforms generate header data and quality information for each nucleotide sequence. These platforms may produce gigabyte-scale datasets, and storing these datasets is one of the major bottlenecks of NGS technology. Information produced by NGS platforms is stored in FASTQ format. In this paper, we propose an algorithm to compress the quality score information stored in a FASTQ file. We search for a model that gives the lowest entropy on quality score data, and we combine this statistical model with arithmetic coding to compress the quality scores as compactly as possible. We compare its performance to general-purpose text compression utilities such as bzip2, gzip and ppmd, and to existing compression algorithms for quality scores. We show that our compression algorithm outperforms both groups.
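
To make the idea of "a model that gives the lowest entropy" concrete, the following is a minimal sketch, not the model proposed in the paper. It assumes a plain fixed-order context model and measures the empirical conditional entropy of a quality string, which approximates the bits per symbol an ideal arithmetic coder would spend under that model (ignoring the cost of learning the statistics). The function name `conditional_entropy` and the toy input are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(quality: str, order: int) -> float:
    """Empirical bits per symbol when each quality character is
    predicted from the `order` characters that precede it.

    Assumption: a simple fixed-order context model, used here only
    to illustrate how candidate models can be compared by entropy.
    """
    # Count how often each symbol follows each context.
    context_counts = defaultdict(Counter)
    for i, symbol in enumerate(quality):
        context = quality[max(0, i - order):i]
        context_counts[context][symbol] += 1

    # Sum the ideal code length -log2(p) over all symbol occurrences.
    bits = 0.0
    for counts in context_counts.values():
        n = sum(counts.values())
        for c in counts.values():
            bits -= c * math.log2(c / n)

    return bits / len(quality) if quality else 0.0


if __name__ == "__main__":
    toy = "AB" * 50  # hypothetical, perfectly alternating quality string
    for k in range(3):
        print(k, conditional_entropy(toy, k))
```

On real quality strings, longer contexts usually lower this figure but also spread the counts over many more contexts, which is the trade-off a PPM-style model has to manage.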


Paper Citation


in Harvard Style

Akgün M. and Şamil Sağıroğlu M. (2013). Alternative PPM Model for Quality Score Compression. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 122-126. DOI: 10.5220/0004221601220126

in Bibtex Style

@conference{bioinformatics13,
author={Mete Akgün and Mahmut Şamil Sağıroğlu},
title={Alternative PPM Model for Quality Score Compression},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={122-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004221601220126},
isbn={978-989-8565-35-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - Alternative PPM Model for Quality Score Compression
SN - 978-989-8565-35-8
AU - Akgün M.
AU - Şamil Sağıroğlu M.
PY - 2013
SP - 122
EP - 126
DO - 10.5220/0004221601220126