Alternative PPM Model for Quality Score Compression
Mete Akgün, Mahmut Şamil Sağıroğlu
2013
Abstract
Next Generation Sequencing (NGS) platforms generate header data and quality information for each nucleotide sequence, and they may produce gigabyte-scale datasets. Storing these datasets is one of the major bottlenecks of NGS technology. The information produced by NGS platforms is stored in the FASTQ format. In this paper, we propose an algorithm to compress the quality score information stored in a FASTQ file. We search for a model that gives the lowest entropy on quality score data, and we combine this statistical model with arithmetic coding to achieve the smallest compressed size. We compare its performance with general-purpose text compression utilities (bzip2, gzip and ppmd) and with existing compression algorithms for quality scores, and we show that our algorithm outperforms both groups.
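The abstract ties compressed size to model entropy. An arithmetic coder driven by a probability model emits close to -log2 p bits for each symbol it encodes, so a model that predicts quality scores more accurately yields a smaller output. The sketch below illustrates that relationship and is not the authors' PPM variant. It makes several assumptions. It uses a simple adaptive order-k context model with add-one smoothing instead of PPM escape probabilities. It assumes Phred+33 quality characters, which give 94 printable symbols from `!` to `~`. The function name `adaptive_code_length` is hypothetical.

```python
import math
from collections import defaultdict


def adaptive_code_length(qualities, order=2, alphabet_size=94):
    """Ideal code length in bits of `qualities` under an adaptive
    order-`order` context model with add-one (Laplace) smoothing.

    Assumption: this simple smoothed model stands in for a real PPM
    model with escape probabilities. An arithmetic coder driven by the
    same model would produce output within a few bits of this total.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, symbol in enumerate(qualities):
        # The context is the previous `order` quality symbols (fewer at the start).
        context = tuple(qualities[max(0, i - order):i])
        # Probability assigned *before* the symbol is seen, as a decoder would.
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        bits -= math.log2(p)
        # Update the statistics so encoder and decoder stay in sync.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits


if __name__ == "__main__":
    read = "IIIIIIIHHHHGGGFFFEEEDDD###"  # toy Phred+33 quality string
    for k in (0, 1, 2):
        rate = adaptive_code_length(read, order=k) / len(read)
        print(f"order {k}: {rate:.3f} bits/symbol")
```

Comparing the bits per symbol across context orders is one way to search for the lowest-entropy model. On real data, larger contexts do not always help: each context is seen less often, so its statistics are learned more slowly.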
Paper Citation
in Harvard Style
Akgün M. and Şamil Sağıroğlu M. (2013). Alternative PPM Model for Quality Score Compression. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 122-126. DOI: 10.5220/0004221601220126
in Bibtex Style
@conference{bioinformatics13,
author={Mete Akgün and Mahmut Şamil Sağıroğlu},
title={Alternative PPM Model for Quality Score Compression},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={122-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004221601220126},
isbn={978-989-8565-35-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - Alternative PPM Model for Quality Score Compression
SN - 978-989-8565-35-8
AU - Akgün M.
AU - Şamil Sağıroğlu M.
PY - 2013
SP - 122
EP - 126
DO - 10.5220/0004221601220126