SINGULAR VALUE DECOMPOSITION (SVD) AND BLAST - Quite Different Methods Achieving Similar Results

Bráulio Roberto Gonçalves Marinho Couto, Marcelo Matos Santoro, Marcos Augusto dos Santos

2011

Abstract

The dominant methods for finding relevant patterns in protein sequences are based on character-by-character matching, performed by software known as BLAST. In this paper, sequences are recoded as a p-peptide frequency matrix that is reduced by singular value decomposition (SVD). The objective is to evaluate the association between the statistics used by BLAST and the similarity metrics used with SVD (Euclidean distance and cosine). We chose BLAST as the standard because this string-matching program is widely used for searching nucleotide and protein databases. Three datasets were used: mitochondrial-gene sequences, non-identical PDB sequences, and a Swiss-Prot protein collection. We built scatter plots and calculated the Spearman correlation (ρ) between the metrics produced by BLAST and by SVD. Euclidean distance was negatively correlated with bit score (>-0.6) and positively correlated with E value (>+0.7). Cosine was negatively correlated with E value (>-0.7) and positively correlated with bit score (>+0.8). We also ran agreement tests between SVD and BLAST in classifying protein families. For the mitochondrial gene database, we achieved a kappa coefficient of 1.0. For the Swiss-Prot sample, agreement was higher than 80%. The strong correlation between SVD and BLAST results suggests that SVD could serve as a core technique within a broader algorithm.
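The recoding and reduction steps described in the abstract can be illustrated with a minimal sketch. It assumes p = 2, raw counts with no weighting, and a rank-k truncation; the abstract does not state these choices, and the function names and toy sequences below are hypothetical, not taken from the paper.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def peptide_frequency_matrix(sequences, p=2):
    """Count overlapping p-peptides; one row per p-peptide, one column per sequence."""
    index = {"".join(t): i for i, t in enumerate(product(AMINO_ACIDS, repeat=p))}
    matrix = np.zeros((len(index), len(sequences)))
    for col, seq in enumerate(sequences):
        for start in range(len(seq) - p + 1):
            row = index.get(seq[start:start + p])
            if row is not None:  # skip windows with non-standard residues
                matrix[row, col] += 1
    return matrix


def reduce_with_svd(matrix, k):
    """Return rank-k coordinates, one row per sequence (assumed truncation scheme)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:k, None] * vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two reduced sequence vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean(a, b):
    """Euclidean distance between two reduced sequence vectors."""
    return float(np.linalg.norm(a - b))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy_sequences = ["MKVLAAGIVGL", "MKVLAAGIVGL", "WWPHHCCDDEE"]
    coords = reduce_with_svd(peptide_frequency_matrix(toy_sequences, p=2), k=2)
    print("cosine(0, 1):", cosine(coords[0], coords[1]))
    print("euclidean(0, 1):", euclidean(coords[0], coords[1]))
    print("cosine(0, 2):", cosine(coords[0], coords[2]))
```

In this toy case the first two sequences are identical, so they map to the same point in the reduced space, while the third shares no dipeptide with them and ends up orthogonal. Comparing such metrics against BLAST bit scores and E values would additionally require real BLAST output, which is not reproduced here.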

Paper Citation


in Harvard Style

Roberto Gonçalves Marinho Couto B., Matos Santoro M. and Augusto dos Santos M. (2011). SINGULAR VALUE DECOMPOSITION (SVD) AND BLAST - Quite Different Methods Achieving Similar Results. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011) ISBN 978-989-8425-36-2, pages 189-195. DOI: 10.5220/0003162301890195

in Bibtex Style

@conference{bioinformatics11,
author={Bráulio Roberto Gonçalves Marinho Couto and Marcelo Matos Santoro and Marcos Augusto dos Santos},
title={SINGULAR VALUE DECOMPOSITION (SVD) AND BLAST - Quite Different Methods Achieving Similar Results},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)},
year={2011},
pages={189-195},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003162301890195},
isbn={978-989-8425-36-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)
TI - SINGULAR VALUE DECOMPOSITION (SVD) AND BLAST - Quite Different Methods Achieving Similar Results
SN - 978-989-8425-36-2
AU - Roberto Gonçalves Marinho Couto B.
AU - Matos Santoro M.
AU - Augusto dos Santos M.
PY - 2011
SP - 189
EP - 195
DO - 10.5220/0003162301890195