A Novel Feature Generation Method for Sequence Classification - Mutated Subsequence Generation

Hao Wan, Carolina Ruiz, Joseph Beck

2014

Abstract

In this paper, we present a new feature generation algorithm for sequence data sets, called Mutated Subsequence Generation (MSG). Given a data set of sequences, the MSG algorithm generates features from these sequences by incorporating mutative positions in subsequences. We compare this algorithm with other sequence-based feature generation algorithms, including position-based, k-grams, and k-gapped pairs. Our experiments show that the MSG algorithm outperforms these other algorithms in domains in which the presence, not the specific location, of sequential patterns discriminates among classes in a data set.
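As a rough illustration of the three baseline feature types named in the abstract, the following Python sketch may help. It is not taken from the paper: the function names, the reading of a k-gapped pair as two symbols with exactly k positions between them, and the toy sequences are all assumptions made for this example. MSG itself is not sketched, since its details are not given above.

```python
from collections import Counter


def position_features(seq):
    """Position-based features: one binary feature per (index, symbol)."""
    return {(i, sym): 1 for i, sym in enumerate(seq)}


def kgram_counts(seq, k):
    """k-gram features: counts of every contiguous substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kgapped_pair_counts(seq, k):
    """k-gapped pair features (assumed definition): counts of symbol
    pairs (a, b) with exactly k positions between a and b."""
    return Counter((seq[i], seq[i + k + 1]) for i in range(len(seq) - k - 1))


# Toy sequences (hypothetical): the same motif, shifted by one position.
a, b = "ACG", "TACG"

# Position-based features depend on location, so the shift removes all overlap.
shared_positions = set(position_features(a)) & set(position_features(b))

# k-grams record presence only, so the shared motif is still visible.
shared_2grams = set(kgram_counts(a, 2)) & set(kgram_counts(b, 2))
```

In this toy case `shared_positions` is empty while `shared_2grams` contains both "AC" and "CG", which is the kind of presence-versus-location distinction the abstract refers to.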

Paper Citation


in Harvard Style

Wan H., Ruiz C. and Beck J. (2014). A Novel Feature Generation Method for Sequence Classification - Mutated Subsequence Generation. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014) ISBN 978-989-758-012-3, pages 68-79. DOI: 10.5220/0004808200680079

in Bibtex Style

@conference{bioinformatics14,
author={Hao Wan and Carolina Ruiz and Joseph Beck},
title={A Novel Feature Generation Method for Sequence Classification - Mutated Subsequence Generation},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)},
year={2014},
pages={68-79},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004808200680079},
isbn={978-989-758-012-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2014)
TI - A Novel Feature Generation Method for Sequence Classification - Mutated Subsequence Generation
SN - 978-989-758-012-3
AU - Wan H.
AU - Ruiz C.
AU - Beck J.
PY - 2014
SP - 68
EP - 79
DO - 10.5220/0004808200680079