LEARNING TO PLAY K-ARMED BANDIT PROBLEMS

Francis Maes, Louis Wehenkel, Damien Ernst

2012

Abstract

We propose a learning approach to pre-compute policies for playing K-armed bandit problems by exploiting prior information that describes the class of problems targeted by the player. Our algorithm first samples a set of K-armed bandit problems from the given prior, and then selects, from a space of candidate policies, the one that gives the best average performance over these problems. The candidate policies use an index to rank the arms and, at each play, pick the arm with the highest index. The index of each arm is a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher-order moments), and an estimation of distribution algorithm is used to determine its optimal parameters, namely the feature weights. We carry out simulations in the case where the prior assumes a fixed number of Bernoulli arms, a fixed horizon, and uniformly distributed parameters of the Bernoulli arms. These simulations show that the learned strategies perform very well compared with several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and ε_n-GREEDY); they also highlight the robustness of these strategies to incorrect prior information.
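The pipeline summarised in the abstract (sample problems from a prior, score candidate index policies by average regret, search the feature weights with an estimation of distribution algorithm) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the three features, the Gaussian EDA variant (mean and standard deviation refit to the elite samples), the population sizes, and all function names (sample_problems, arm_features, play, average_regret, gaussian_eda) are hypothetical. With these features the UCB1 index, avg + sqrt(2 ln t / n), is the special case w = (1, √2, 0).

```python
import math
import random


def sample_problems(n_problems, n_arms, rng):
    """Draw Bernoulli bandit problems whose arm means follow a uniform prior."""
    return [[rng.random() for _ in range(n_arms)] for _ in range(n_problems)]


def arm_features(count, total, total_sq, t):
    """Hypothetical features of one arm's play history (not the paper's exact set)."""
    avg = total / count
    var = max(total_sq / count - avg * avg, 0.0)
    explore = math.sqrt(math.log(t) / count)
    return (avg, explore, math.sqrt(var) * explore)


def play(weights, means, horizon, rng):
    """Run the linear index policy on one problem; return the sum of per-pull gaps."""
    k = len(means)
    counts, totals, totals_sq = [0] * k, [0.0] * k, [0.0] * k
    best, regret = max(means), 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: draw every arm once
        else:
            # Index = weighted sum of features; play the arm with the highest index.
            scores = [
                sum(w * x for w, x in zip(weights, arm_features(counts[a], totals[a], totals_sq[a], t)))
                for a in range(k)
            ]
            arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        totals_sq[arm] += reward * reward
        regret += best - means[arm]  # expected loss of this pull versus the best arm
    return regret


def average_regret(weights, problems, horizon, seed=0):
    """Mean regret over the training problems, using a fixed seed per problem."""
    return sum(
        play(weights, means, horizon, random.Random(seed + i))
        for i, means in enumerate(problems)
    ) / len(problems)


def gaussian_eda(problems, horizon, dim, iterations=10, pop=20, n_elite=5, seed=0):
    """Hypothetical Gaussian EDA: sample weights, keep the elite, refit mean and std."""
    rng = random.Random(seed)
    mu, sigma = [0.0] * dim, [1.0] * dim
    best_w, best_score = None, math.inf
    for _ in range(iterations):
        population = [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(pop)]
        scored = sorted(
            ((average_regret(w, problems, horizon), w) for w in population),
            key=lambda pair: pair[0],
        )
        if scored[0][0] < best_score:
            best_score, best_w = scored[0]
        # Refit the sampling distribution to the elite weight vectors.
        cols = list(zip(*(w for _, w in scored[:n_elite])))
        mu = [sum(c) / n_elite for c in cols]
        sigma = [max(math.sqrt(sum((x - m) ** 2 for x in c) / n_elite), 1e-3)
                 for c, m in zip(cols, mu)]
    return best_w, best_score
```

For example, `gaussian_eda(sample_problems(100, 2, random.Random(42)), horizon=100, dim=3)` returns learned weights and their average training regret, which can be compared with `average_regret((1.0, math.sqrt(2), 0.0), ...)` on the same problems, i.e. UCB1 in this feature space. Fixing one seed per problem makes the objective deterministic, so all candidates are scored against the same random streams. Scoring the learned weights on problems drawn from a different prior gives a rough check of robustness to incorrect prior information.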

Paper Citation


in Harvard Style

Maes F., Wehenkel L. and Ernst D. (2012). LEARNING TO PLAY K-ARMED BANDIT PROBLEMS. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-95-9, pages 74-81. DOI: 10.5220/0003733500740081

in Bibtex Style

@conference{icaart12,
author={Francis Maes and Louis Wehenkel and Damien Ernst},
title={LEARNING TO PLAY K-ARMED BANDIT PROBLEMS},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2012},
pages={74-81},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003733500740081},
isbn={978-989-8425-95-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - LEARNING TO PLAY K-ARMED BANDIT PROBLEMS
SN - 978-989-8425-95-9
AU - Maes F.
AU - Wehenkel L.
AU - Ernst D.
PY - 2012
SP - 74
EP - 81
DO - 10.5220/0003733500740081
ER -