Two Novel Techniques for Space Compaction on Biological Sequences

George Volis, Christos Makris, Andreas Kanavos

2016

Abstract

The number and size of genomic databases have grown rapidly in recent years, and the number of Internet-accessible databases has grown with them. There is therefore a need for satisfactory methods of managing this growing volume of information, and considerable effort has been devoted to this direction. Contributing to this effort, this paper presents two algorithms that reduce the amount of space needed to store genomic information. Our first algorithm is based on the classic n-grams/2L technique for indexing a DNA sequence and converts the inverted index of that algorithm to a more compressed format. Researchers have revealed the existence of repeated and palindromic patterns in the DNA of living organisms. This observation is the main motivation for the technique, which proposes an alternative data structure for handling such sequences. Our experimental results show that our algorithm achieves a more efficient index than the n-grams/2L algorithm and can be adopted by any algorithm that is based on n-grams/2L. The second algorithm is based on the n-grams technique. Treating the four symbols of the DNA alphabet as the vertices of a square, it represents a DNA sequence as a relation between the vertices, sides, and diagonals of that square. The experimental results show that this second idea compresses our index structure even further.
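To make the starting point concrete, the following is a minimal sketch of a plain n-gram inverted index over a DNA string, the kind of structure both algorithms aim to compress. It is not the n-grams/2L method nor either of the algorithms proposed in the paper; the function name `build_ngram_index` and the choice of storing start offsets as a Python list are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_ngram_index(sequence: str, n: int) -> Dict[str, List[int]]:
    """Map every n-gram of `sequence` to the list of its start offsets.

    Illustrative only: a plain (uncompressed) inverted index, assumed
    here as a baseline; it is not the paper's data structure.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    index: Dict[str, List[int]] = defaultdict(list)
    # Slide a window of width n over the sequence, one position at a time.
    for start in range(len(sequence) - n + 1):
        index[sequence[start:start + n]].append(start)
    return dict(index)


# Usage: the repeated pattern "ACG" is recorded once as a key,
# with both of its occurrences in the posting list.
index = build_ngram_index("ACGTACGT", 3)
print(index["ACG"])  # [0, 4]
```

Because repeated n-grams share a single key, highly repetitive sequences produce long posting lists under few keys, which is where a more compact representation of the index can save space.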

Paper Citation


in Harvard Style

Volis G., Makris C. and Kanavos A. (2016). Two Novel Techniques for Space Compaction on Biological Sequences. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-186-1, pages 105-112. DOI: 10.5220/0005801101050112

in Bibtex Style

@conference{webist16,
author={George Volis and Christos Makris and Andreas Kanavos},
title={Two Novel Techniques for Space Compaction on Biological Sequences},
booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2016},
pages={105-112},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005801101050112},
isbn={978-989-758-186-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Two Novel Techniques for Space Compaction on Biological Sequences
SN - 978-989-758-186-1
AU - Volis G.
AU - Makris C.
AU - Kanavos A.
PY - 2016
SP - 105
EP - 112
DO - 10.5220/0005801101050112