EMBEDDED FEATURE SELECTION FOR SPAM AND PHISHING FILTERING USING SUPPORT VECTOR MACHINES

Sebastián Maldonado, Gastón L'Huillier

2012

Abstract

Today, the Internet is full of harmful and wasteful elements, such as phishing and spam messages, which must be properly classified before reaching end-users. This issue has attracted the pattern recognition community’s attention and motivated to determine which strategies achieve best classification results. Several methods use as many features as content-based properties the data set have, which leads to a high dimensional classification problem. In this context, this paper presents a feature selection approach that simultaneously determines a nonlinear classification function with minimal error and minimizes the number of features by penalizing their use in the dual formulation of binary Support Vector Machines (SVM). The method optimizes the width of an anisotropic RBF Kernel via successive gradient descent steps, eliminating features that have low relevance for the model. Experiments with two real-world Spam and Phishing data sets demonstrate that our approach accomplishes the best performance compared to well-known feature selection methods using consistently a small number of features.

Download


Paper Citation


in Harvard Style

Maldonado S. and L'Huillier G. (2012). EMBEDDED FEATURE SELECTION FOR SPAM AND PHISHING FILTERING USING SUPPORT VECTOR MACHINES . In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 445-450. DOI: 10.5220/0003782004450450

in Bibtex Style

@conference{icpram12,
author={Sebastián Maldonado and Gastón L'Huillier},
title={EMBEDDED FEATURE SELECTION FOR SPAM AND PHISHING FILTERING USING SUPPORT VECTOR MACHINES},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM,},
year={2012},
pages={445-450},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003782004450450},
isbn={978-989-8425-99-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM,
TI - EMBEDDED FEATURE SELECTION FOR SPAM AND PHISHING FILTERING USING SUPPORT VECTOR MACHINES
SN - 978-989-8425-99-7
AU - Maldonado S.
AU - L'Huillier G.
PY - 2012
SP - 445
EP - 450
DO - 10.5220/0003782004450450