On Selecting Helpful Unlabeled Data for Improving Semi-Supervised Support Vector Machines

Thanh-Binh Le, Sang-Woon Kim

2014

Abstract

Recent studies have demonstrated that Semi-Supervised Learning (SSL) approaches that use both labeled and unlabeled data are more effective and robust than those that use only labeled data. However, it is also well known that using unlabeled data is not always helpful in SSL algorithms. Thus, in order to select a small number of helpful unlabeled samples, various selection criteria have been proposed in the literature. One criterion is based on the prediction of an ensemble classifier and the similarity between pairs of training samples. However, because this criterion considers only the distance information among the samples, it sometimes does not work appropriately, particularly when the unlabeled samples are near the class boundary. To address this concern, a method of training semi-supervised support vector machines (S3VMs) is investigated that uses a selection criterion modified from the one used in SemiBoost. In addition to the quantities of the original criterion, the confidence values of the unlabeled data are first computed using the estimated conditional class probability. Then, the unlabeled samples with higher confidence values are selected and, together with the labeled data, used to retrain the ensemble classifier. The experimental results, obtained using artificial and real-life benchmark datasets, demonstrate that the proposed mechanism can compensate for the shortcomings of traditional S3VMs and, compared with previous approaches, can achieve further improved classification accuracy.
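The selection step described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes binary labels in {-1, +1}, an RBF similarity between samples, and the SemiBoost quantities p_i and q_i as they are usually written for SemiBoost. It also assumes a sigmoid of twice the ensemble score, sigmoid(2H), as a stand-in for the estimated conditional class probability. The function names (`rbf_similarity`, `semiboost_pq`, `class_probability`, `select_helpful`) are hypothetical. So is the rule that multiplies the normalised SemiBoost margin by the probability of the pseudo-label, because the abstract does not state how the two quantities are combined.

```python
import math


def rbf_similarity(a, b, sigma=1.0):
    """Gaussian (RBF) similarity exp(-||a - b||^2 / sigma^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / sigma ** 2)


def semiboost_pq(X_lab, y_lab, X_unl, H_unl, C=1.0, sigma=1.0):
    """SemiBoost-style quantities p_i and q_i for every unlabeled sample.

    y_lab holds labels in {-1, +1}; H_unl holds the current ensemble
    scores H(x_i) of the unlabeled samples.
    """
    p, q = [], []
    for i, xi in enumerate(X_unl):
        hi = H_unl[i]
        pi = qi = 0.0
        # Labeled terms: positive neighbours add to p_i, negative ones to q_i.
        for xj, yj in zip(X_lab, y_lab):
            s = rbf_similarity(xi, xj, sigma)
            if yj == 1:
                pi += s * math.exp(-2.0 * hi)
            else:
                qi += s * math.exp(2.0 * hi)
        # Unlabeled terms: similar unlabeled samples favour consistent scores.
        # (j == i is skipped; it would add the same amount to p_i and q_i.)
        for j, xj in enumerate(X_unl):
            if j == i:
                continue
            s = rbf_similarity(xi, xj, sigma)
            pi += 0.5 * C * s * math.exp(H_unl[j] - hi)
            qi += 0.5 * C * s * math.exp(hi - H_unl[j])
        p.append(pi)
        q.append(qi)
    return p, q


def class_probability(h):
    """Assumed estimate of P(y = +1 | x) from a boosting score: sigmoid(2h)."""
    t = 2.0 * h
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # numerically stable branch for negative t
    return e / (1.0 + e)


def select_helpful(X_lab, y_lab, X_unl, H_unl, k, C=1.0, sigma=1.0):
    """Return (index, pseudo_label, confidence) for the k most confident samples."""
    p, q = semiboost_pq(X_lab, y_lab, X_unl, H_unl, C, sigma)
    scored = []
    for i in range(len(X_unl)):
        z = 1 if p[i] >= q[i] else -1  # SemiBoost pseudo-label
        margin = abs(p[i] - q[i]) / max(p[i] + q[i], 1e-12)
        p_pos = class_probability(H_unl[i])
        prob_z = p_pos if z == 1 else 1.0 - p_pos
        # Hypothetical combination: normalised margin weighted by P(z | x_i).
        scored.append((i, z, margin * prob_z))
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    X_lab, y_lab = [[0.0], [4.0]], [-1, 1]
    X_unl = [[0.5], [3.5], [2.0]]
    H_unl = [-0.8, 0.9, 0.05]  # toy ensemble scores
    for idx, label, conf in select_helpful(X_lab, y_lab, X_unl, H_unl, k=2):
        print(idx, label, round(conf, 3))
```

On this toy data, the two samples lying close to a labeled point are ranked first, with pseudo-labels -1 and +1. The sample at x = 2.0, midway between the classes, receives the lowest confidence: its p_i and q_i are nearly equal and its estimated class probability is close to 0.5. This is the boundary case the abstract says a distance-only criterion handles poorly.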


Paper Citation


in Harvard Style

Le T. and Kim S. (2014). On Selecting Helpful Unlabeled Data for Improving Semi-Supervised Support Vector Machines. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 48-59. DOI: 10.5220/0004810500480059

in Bibtex Style

@conference{icpram14,
author={Thanh-Binh Le and Sang-Woon Kim},
title={On Selecting Helpful Unlabeled Data for Improving Semi-Supervised Support Vector Machines},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={48-59},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004810500480059},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - On Selecting Helpful Unlabeled Data for Improving Semi-Supervised Support Vector Machines
SN - 978-989-758-018-5
AU - Le T.
AU - Kim S.
PY - 2014
SP - 48
EP - 59
DO - 10.5220/0004810500480059
ER -