Approximate Bayes Optimal Policy Search using Neural Networks

Michael Castronovo, Vincent François-Lavet, Raphaël Fonteneau, Damien Ernst, Adrien Couëtoux

2017

Abstract

Bayesian Reinforcement Learning (BRL) agents aim to maximise the expected rewards collected while interacting with an unknown Markov Decision Process (MDP), using some prior knowledge. State-of-the-art BRL agents rely on frequent updates of the belief over the MDP as new observations of the environment are made. This offers theoretical guarantees of convergence to an optimum, but is computationally intractable, even on small-scale problems. In this paper, we present a method that circumvents this issue by training a parametric policy able to recommend an action directly from raw observations. Artificial Neural Networks (ANNs) are used to represent this policy, and are trained on trajectories sampled from the prior. The trained model is then used online, and is able to act on the real MDP at a very low computational cost. Our new algorithm shows strong empirical performance on a wide range of test problems, and is robust to inaccuracies of the prior distribution.


Paper Citation


in Harvard Style

Castronovo M., François-Lavet V., Fonteneau R., Ernst D. and Couëtoux A. (2017). Approximate Bayes Optimal Policy Search using Neural Networks. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 142-153. DOI: 10.5220/0006191701420153

in Bibtex Style

@conference{icaart17,
author={Michael Castronovo and Vincent François-Lavet and Raphaël Fonteneau and Damien Ernst and Adrien Couëtoux},
title={Approximate Bayes Optimal Policy Search using Neural Networks},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2017},
pages={142-153},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006191701420153},
isbn={978-989-758-220-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Approximate Bayes Optimal Policy Search using Neural Networks
SN - 978-989-758-220-2
AU - Castronovo M.
AU - François-Lavet V.
AU - Fonteneau R.
AU - Ernst D.
AU - Couëtoux A.
PY - 2017
SP - 142
EP - 153
DO - 10.5220/0006191701420153