Online Knowledge Gradient Exploration in an Unknown Environment

Saba Q. Yahyaa, Bernard Manderick

2014

Abstract

We present online kernel-based LSPI (least-squares policy iteration), an extension of offline kernel-based LSPI. Online kernel-based LSPI combines characteristics of online LSPI and offline kernel-based LSPI to improve both the convergence rate and the performance of the resulting policy relative to online LSPI. It uses the knowledge gradient policy as its exploration policy and a kernel sparsification method based on approximate linear dependency to select features automatically. We compare the performance of the policies obtained by online kernel-based LSPI and online LSPI on five discrete Markov decision problems; online kernel-based LSPI outperforms online LSPI.
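To make the exploration rule named in the abstract concrete, here is a minimal sketch of the knowledge gradient policy in its standard online bandit form with independent Gaussian beliefs and known observation noise. It is an illustration under these assumptions, not the authors' implementation: how the paper applies the rule to the value estimates inside LSPI is not stated in the abstract, and the names `kg_values`, `online_kg_action`, `noise_var`, and `steps_left` are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_values(means, variances, noise_var):
    """Knowledge gradient value of one more observation of each action.

    Assumes independent Gaussian beliefs (means, variances) and a known
    observation noise variance. Requires at least two actions.
    """
    values = []
    for a, (mu, var) in enumerate(zip(means, variances)):
        best_other = max(m for i, m in enumerate(means) if i != a)
        # Standard deviation of the change in the posterior mean of action a.
        sigma_tilde = var / math.sqrt(var + noise_var)
        if sigma_tilde == 0.0:
            values.append(0.0)
            continue
        zeta = -abs(mu - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


def online_kg_action(means, variances, noise_var, steps_left):
    """Pick the action maximising mean + steps_left * KG value."""
    kg = kg_values(means, variances, noise_var)
    scores = [mu + steps_left * v for mu, v in zip(means, kg)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical beliefs: action 0 is well known, action 1 is very uncertain.
means = [1.0, 0.9]
variances = [0.01, 4.0]
greedy_choice = online_kg_action(means, variances, noise_var=1.0, steps_left=0)
exploring_choice = online_kg_action(means, variances, noise_var=1.0, steps_left=50)
```

With no steps left the rule reduces to the greedy choice, while a long remaining horizon weights the value of information more heavily and favours the uncertain action.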

Paper Citation


in Harvard Style

Yahyaa, S. Q. and Manderick, B. (2014). Online Knowledge Gradient Exploration in an Unknown Environment. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 5-13. DOI: 10.5220/0004718700050013

in Bibtex Style

@conference{icaart14,
author={Saba Q. Yahyaa and Bernard Manderick},
title={Online Knowledge Gradient Exploration in an Unknown Environment},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={5-13},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004718700050013},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Online Knowledge Gradient Exploration in an Unknown Environment
SN - 978-989-758-015-4
AU - Yahyaa, Saba Q.
AU - Manderick, Bernard
PY - 2014
SP - 5
EP - 13
DO - 10.5220/0004718700050013