Online Knowledge Gradient Exploration in an Unknown Environment
Saba Q. Yahyaa, Bernard Manderick
2014
Abstract
We present online kernel-based least squares policy iteration (LSPI), an extension of offline kernel-based LSPI. Online kernel-based LSPI combines characteristics of online LSPI and offline kernel-based LSPI to improve both the convergence rate and the optimal policy performance of online LSPI. It uses the knowledge gradient policy as its exploration policy and the approximate linear dependency based kernel sparsification method to select features automatically. We compare the optimal policy performance of online kernel-based LSPI and online LSPI on 5 discrete Markov decision problems, where online kernel-based LSPI outperforms online LSPI.
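The abstract names approximate linear dependency (ALD) based kernel sparsification as the feature selection step but gives no details. As a rough illustration only, the following sketch shows the standard ALD test from the kernel methods literature: a sample joins the dictionary when its kernel feature cannot be approximated, within a threshold, by a linear combination of the features already stored. The Gaussian kernel, the function names `gaussian_kernel` and `ald_dictionary`, and the parameters `width` and `threshold` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


def ald_dictionary(samples, kernel=gaussian_kernel, threshold=0.1):
    """Build a sparse dictionary of samples using the ALD test.

    A sample is added only if the squared residual of projecting its
    kernel feature onto the span of the dictionary features exceeds
    `threshold` (a hypothetical parameter name).
    """
    dictionary = []
    for x in samples:
        if not dictionary:
            dictionary.append(x)
            continue
        # Kernel matrix of the current dictionary and kernel vector of x.
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        k_vec = np.array([kernel(a, x) for a in dictionary])
        # Best linear combination of dictionary features approximating x.
        coeffs = np.linalg.solve(K, k_vec)
        # Squared approximation error in feature space.
        residual = kernel(x, x) - k_vec @ coeffs
        if residual > threshold:
            dictionary.append(x)
    return dictionary


samples = [[0.0], [0.0], [5.0], [5.0], [0.01]]
dictionary = ald_dictionary(samples)
```

In this toy run the repeated and near-duplicate samples are rejected, so only the two well-separated points are kept as features. A larger `threshold` would give a smaller dictionary at the cost of a coarser approximation.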
Paper Citation
in Harvard Style
Q. Yahyaa S. and Manderick B. (2014). Online Knowledge Gradient Exploration in an Unknown Environment. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 5-13. DOI: 10.5220/0004718700050013
in Bibtex Style
@conference{icaart14,
author={Saba Q. Yahyaa and Bernard Manderick},
title={Online Knowledge Gradient Exploration in an Unknown Environment},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={5-13},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004718700050013},
isbn={978-989-758-015-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Online Knowledge Gradient Exploration in an Unknown Environment
SN - 978-989-758-015-4
AU - Q. Yahyaa S.
AU - Manderick B.
PY - 2014
SP - 5
EP - 13
DO - 10.5220/0004718700050013