A Model-Based Approach To Exploration Of Continuous-State MDPs Using Divergence-To-Go
Matthew Emigh, Evan Kriminger, Jose Carlos Principe

In reinforcement learning, exploration is typically conducted by taking occasional random actions. The literature lacks an exploration method driven by uncertainty, in which exploratory actions explicitly seek to improve the learning process in a sequential decision problem. In this paper, we propose Divergence-to-Go, a model-based framework that uses a recursion similar to dynamic programming to quantify the uncertainty associated with each state-action pair. Information-theoretic estimators of uncertainty allow our method to function even in large, continuous spaces. Performance is demonstrated on maze and mountain car tasks.
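The dynamic-programming-style recursion mentioned above can be sketched in tabular form. This is a minimal illustration, not the paper's method: the paper operates in continuous spaces with information-theoretic divergence estimators, whereas here the state space, transition model, discount factor, and the per-pair `local_div` uncertainty signal are all illustrative assumptions standing in for those estimators.

```python
import numpy as np

# Hedged sketch of a "divergence-to-go" style recursion on a toy tabular MDP.
# All quantities below (grid size, gamma, local_div) are illustrative
# assumptions, not values from the paper.
n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)

# Assumed deterministic transition model: P[s, a] gives the next state.
P = rng.integers(0, n_states, size=(n_states, n_actions))

# Stand-in local uncertainty for each state-action pair; in the paper this
# role is played by an information-theoretic divergence estimate.
local_div = rng.random((n_states, n_actions))

# Value-iteration-like fixed point:
#   D(s, a) = d(s, a) + gamma * max_a' D(s', a'),  s' = P[s, a]
D = np.zeros((n_states, n_actions))
for _ in range(500):
    D_new = local_div + gamma * D[P].max(axis=-1)
    converged = np.abs(D_new - D).max() < 1e-8
    D = D_new
    if converged:
        break

# An exploratory policy would act greedily with respect to D, i.e.
# choose argmax_a D(s, a), steering the agent toward uncertain regions.
print(D.shape)
```

The recursion accumulates future uncertainty the same way value iteration accumulates future reward, so acting greedily on `D` directs exploration toward regions whose dynamics are still poorly learned.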