Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

Saba Q. Yahyaa, Madalina M. Drugan, Bernard Manderick

2014

Abstract

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto optimal arms efficiently. We consider two partial order relations for ordering the mean vectors: Pareto dominance and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarization-KG transforms each multi-objective arm into a single-objective arm to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the knowledge gradient policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
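
The two ways of ordering mean vectors mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' KG policy: it only shows Pareto dominance and one possible scalarization. It assumes that every objective is maximized and that the scalarization is a linear weighted sum. The function names, the example mean vectors, and the weights are all hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v, assuming all objectives are maximized."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(means: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of arms whose mean vector no other arm dominates."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a mean vector into one number with a weighted sum (an assumed choice)."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(means: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the arm with the largest scalarized mean."""
    return max(range(len(means)), key=lambda i: linear_scalarize(means[i], weights))


# Hypothetical two-objective mean vectors for four arms.
means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]

front = pareto_front(means)                              # arm 3 is dominated by arm 1
favour_first = best_scalarized_arm(means, (0.7, 0.3))    # weights favour objective 1
favour_second = best_scalarized_arm(means, (0.2, 0.8))   # weights favour objective 2
```

In this sketch the Pareto order keeps several incomparable arms, whereas each weight vector selects a single arm, so different weights recover different members of the Pareto front.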

Paper Citation


in Harvard Style

Yahyaa S. Q., Drugan M. M. and Manderick B. (2014). Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 74-83. DOI: 10.5220/0004796600740083

in Bibtex Style

@conference{icaart14,
author={Saba Q. Yahyaa and Madalina M. Drugan and Bernard Manderick},
title={Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={74-83},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004796600740083},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms
SN - 978-989-758-015-4
AU - Yahyaa S. Q.
AU - Drugan M. M.
AU - Manderick B.
PY - 2014
SP - 74
EP - 83
DO - 10.5220/0004796600740083