A Methodology for Optimizing the Cost Matrix in Cost Sensitive Learning Models applied to Prediction of Molecular Functions in Embryophyta Plants

S. García-López, J. A. Jaramillo-Garzón, L. Duque-Muñoz, C. G. Castellanos-Domínguez

2013

Abstract

Genomics and proteomics research generates large amounts of data, and computational methods have become an important support tool for analyzing it. However, tools based on machine learning face several problems associated with the nature of the data, one of which is the class-imbalance problem. Several balancing techniques, such as boosting and resampling, can improve prediction performance, but they have multiple weaknesses in difficult data spaces. Cost sensitive learning is an alternative solution, yet obtaining an appropriate cost matrix to induce a good prediction model is complex and remains an open problem. In this paper, a methodology to obtain an optimal cost matrix for training models based on cost sensitive learning is proposed. The results show that cost sensitive learning with a proper cost matrix can be very competitive and can even outperform many state-of-the-art class-balance strategies. Tests were applied to the prediction of molecular functions in Embryophyta plants.
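To make the role of the cost matrix concrete, the following is a minimal sketch and not the methodology proposed in the paper, which the abstract does not detail. It assumes a binary problem, a classifier that already outputs class probabilities, a false-positive cost fixed at 1, and a plain grid search that scores each candidate false-negative cost by the geometric mean of sensitivity and specificity. The function names, the toy data, and the choice of score are all illustrative assumptions.

```python
from math import sqrt

def min_expected_cost_label(probs, cost):
    """Return the label with the lowest expected misclassification cost.

    probs[j] is the estimated probability that the true class is j;
    cost[j][i] is the cost of predicting class i when the true class is j.
    """
    n = len(probs)
    expected = [sum(probs[j] * cost[j][i] for j in range(n)) for i in range(n)]
    return min(range(n), key=expected.__getitem__)

def geometric_mean(y_true, y_pred):
    """Square root of sensitivity times specificity for binary labels 0/1.

    Assumes y_true contains at least one example of each class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return sqrt((tp / pos) * (tn / neg))

def search_false_negative_cost(y_true, pos_probs, candidates):
    """Hypothetical grid search: try each false-negative cost, keep the best score."""
    best = None
    for c_fn in candidates:
        # Rows are true classes, columns are predicted classes.
        cost = [[0.0, 1.0], [float(c_fn), 0.0]]
        y_pred = [min_expected_cost_label([1 - p, p], cost) for p in pos_probs]
        score = geometric_mean(y_true, y_pred)
        if best is None or score > best[1]:
            best = (c_fn, score)
    return best

# Toy imbalanced validation set (made-up numbers): six negatives, two positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
pos_probs = [0.05, 0.10, 0.20, 0.15, 0.30, 0.45, 0.35, 0.60]

best_cost, best_score = search_false_negative_cost(y_true, pos_probs, [1, 2, 3, 5, 10])
print(best_cost, round(best_score, 4))
```

With equal costs the decision rule reduces to the usual 0.5 probability threshold and misses one of the two positives in the toy data; raising the false-negative cost lowers the threshold, which is the mechanism by which a cost matrix compensates for class imbalance.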



Paper Citation


in Harvard Style

García-López S., Jaramillo-Garzón J. A., Duque-Muñoz L. and Castellanos-Domínguez C. G. (2013). A Methodology for Optimizing the Cost Matrix in Cost Sensitive Learning Models applied to Prediction of Molecular Functions in Embryophyta Plants. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013) ISBN 978-989-8565-35-8, pages 71-80. DOI: 10.5220/0004250900710080

in Bibtex Style

@conference{bioinformatics13,
author={S. García-López and J. A. Jaramillo-Garzón and L. Duque-Muñoz and C. G. Castellanos-Domínguez},
title={A Methodology for Optimizing the Cost Matrix in Cost Sensitive Learning Models applied to Prediction of Molecular Functions in Embryophyta Plants},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)},
year={2013},
pages={71-80},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004250900710080},
isbn={978-989-8565-35-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013)
TI - A Methodology for Optimizing the Cost Matrix in Cost Sensitive Learning Models applied to Prediction of Molecular Functions in Embryophyta Plants
SN - 978-989-8565-35-8
AU - García-López S.
AU - Jaramillo-Garzón J. A.
AU - Duque-Muñoz L.
AU - Castellanos-Domínguez C. G.
PY - 2013
SP - 71
EP - 80
DO - 10.5220/0004250900710080