In this talk, I summarize recent advances in the analysis of regret and sample complexity in structured Reinforcement Learning (RL) problems with finite state and action spaces. Specifically, I explain how to derive generic information-theoretic performance limits, expressed in terms of regret or sample complexity. These limits are made explicit for Lipschitz RL problems, and reveal that it is theoretically possible to devise an algorithm whose regret does not scale with the sizes of the state and action spaces, while keeping a logarithmic dependence on the time horizon. An algorithm with such performance guarantees is proposed.
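To fix ideas, asymptotic limits of this kind are typically stated in the Graves–Lai form (the notation below is illustrative rather than taken from the talk): for any uniformly good algorithm $\pi$ and any model $\phi$,

$$\liminf_{T \to \infty} \frac{\mathbb{E}[R^\pi(T)]}{\log T} \;\ge\; K(\phi),$$

where $R^\pi(T)$ is the regret accumulated over $T$ steps and $K(\phi)$ is the value of an optimization problem over models that are statistically hard to distinguish from $\phi$ while respecting the structure (e.g., Lipschitz continuity of rewards and transitions). The key observation is that $K(\phi)$ may remain bounded even as the numbers of states and actions grow.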
This is joint work with Jungseul Ok (University of Washington).