Structured Value-based Planning and Reinforcement Learning

Harnessing Structures for Value-Based Planning and Reinforcement Learning

Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi

MIT CSAIL

Abstract

Value-based methods constitute a fundamental methodology in planning and deep reinforcement learning (RL). In this paper, we propose to exploit the underlying structures of the state-action value function, i.e., the Q function, for both planning and deep RL. In particular, if the underlying system dynamics lead to some global structure in the Q function, one should be able to infer the function more accurately by leveraging that structure. Specifically, we investigate the low-rank structure, which is common in large data matrices. We verify empirically that low-rank Q functions exist in control and deep RL tasks. As our key contribution, we leverage Matrix Estimation (ME) techniques to propose a general framework that exploits the underlying low-rank structure in Q functions. This leads to a more efficient planning procedure for classical control and, additionally, a simple scheme that can be applied to value-based RL techniques to consistently achieve better performance on "low-rank" tasks. Extensive experiments on control tasks and Atari games confirm the efficacy of our approach.
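To make the idea of recovering a low-rank Q function from partial observations concrete, here is a minimal sketch in Python. It is not the authors' implementation: the ME step is replaced by a simple iterative truncated-SVD imputation, and the matrix sizes, rank, observed fraction, and the function name `complete_low_rank` are all assumptions chosen for illustration on synthetic data.

```python
import numpy as np


def complete_low_rank(q_observed, mask, rank, n_iters=200):
    """Fill unobserved entries by iterative truncated-SVD imputation.

    Illustrative stand-in for a Matrix Estimation routine (hypothetical
    helper, not taken from the paper's code).
    """
    # Initialise missing entries with the mean of the observed entries.
    filled = np.where(mask, q_observed, q_observed[mask].mean())
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Best rank-`rank` approximation of the current filled matrix.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries fixed; refresh only the missing ones.
        filled = np.where(mask, q_observed, low_rank)
    return low_rank


# Synthetic "Q matrix" (states x actions) with exact rank 3 (assumption).
rng = np.random.default_rng(0)
n_states, n_actions, true_rank = 100, 20, 3
q_true = rng.normal(size=(n_states, true_rank)) @ rng.normal(size=(true_rank, n_actions))

# Observe roughly 70% of the state-action pairs (assumed fraction).
mask = rng.random(q_true.shape) < 0.7
q_hat = complete_low_rank(np.where(mask, q_true, 0.0), mask, rank=true_rank)

# Relative reconstruction error over the full matrix.
rel_err = np.linalg.norm(q_hat - q_true) / np.linalg.norm(q_true)
print(f"relative error: {rel_err:.4f}")
```

In a planning loop, one would presumably update only the sampled state-action pairs and use a reconstruction of this kind to fill in the rest; the sketch above shows only the reconstruction step.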

Paper

Harnessing Structures for Value-Based Planning and Reinforcement Learning
Yuzhe Yang, Guo Zhang, Zhi Xu, and Dina Katabi
International Conference on Learning Representations (ICLR 2020)
Oral Presentation (top 1.8%)
[Paper] • [OpenReview] • [arXiv] • [Code] • [Slides] • [BibTeX]

Representative Results

Figure 1. Policy heatmaps and metric comparison between the optimal policy and the SVP (Structured Value-based Planning) policy.

Figure 2. Results of SV-RL on various value-based deep RL techniques. First row: results on DQN. Second row: results on double DQN. Third row: results on dueling DQN.

Figure 3. Rank versus performance improvement across different games. More structured games (those with lower rank) can achieve better performance with SV-RL.
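One way to quantify how "low-rank" a Q matrix is can be sketched as follows. The definition used here is an assumption for illustration: the approximate rank is taken to be the smallest number of singular values whose squared sum reaches 99% of the total. The function name `approximate_rank` and the threshold are hypothetical and may differ from what was used to produce the figure.

```python
import numpy as np


def approximate_rank(matrix, energy=0.99):
    """Smallest k whose top-k squared singular values reach `energy` of the total.

    Assumed definition for illustration, not confirmed by this page.
    """
    s = np.linalg.svd(matrix, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative energy reaches the threshold.
    k = int(np.searchsorted(cumulative, energy)) + 1
    # Guard against floating-point rounding at the upper end.
    return min(k, len(s))


rng = np.random.default_rng(0)

# A structured matrix with exact rank 2 versus an unstructured noise matrix.
structured = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 10))
unstructured = rng.normal(size=(50, 10))

rank_structured = approximate_rank(structured)
rank_unstructured = approximate_rank(unstructured)
print(rank_structured, rank_unstructured)
```

Under this assumed definition, a lower value indicates a more structured matrix, which is the kind of game the caption describes as benefiting from SV-RL.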

Downloads

Source Code: GitHub Repo
Slides: Slides

Also Check Out

ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation
Yuzhe Yang, Guo Zhang, Zhi Xu, and Dina Katabi
ICML 2019 • [Paper] • [Slides] • [Poster] • [Talk]