Learning to Self-Modify Rewards with Implicit Gradients

EasyChair Preprint 8260, version 1
10 pages • Date: June 12, 2022

Abstract

Reward shaping is a powerful technique for efficient learning of optimal policies in sequential decision-making. However, designing auxiliary rewards that help the agent is challenging and often requires considerable time and effort from domain experts. In this paper, we build on the optimal-rewards methodology to adapt a given reward function. This problem can be naturally formulated as a meta-learning problem and solved in a bi-level optimization framework. However, the standard approaches used in the literature for such problems are not scalable, so we propose an implicit-gradient technique to solve it. We demonstrate the effectiveness of our method in both (a) learning optimal rewards and (b) adaptive reward shaping.

Keyphrases: Reinforcement Learning, Reward Shaping, Meta-Learning
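
As an illustrative sketch of the kind of bi-level formulation the abstract refers to (the notation below, including the symbols phi, theta, J_true, and J_phi, is assumed here for exposition and is not taken from the paper): the outer level adjusts reward parameters phi while the inner level optimizes policy parameters theta under the shaped reward, and the implicit function theorem supplies the outer gradient without unrolling the inner optimization.

% Bi-level formulation (sketch; notation assumed, not from the paper):
% outer: choose reward parameters \phi to maximize the true objective
% inner: policy parameters \theta^* optimize the shaped objective J_\phi
\begin{align}
  \phi^{*} &= \arg\max_{\phi} \; J_{\text{true}}\big(\theta^{*}(\phi)\big), \\
  \theta^{*}(\phi) &= \arg\max_{\theta} \; J_{\phi}(\theta).
\end{align}
% At an inner optimum, \nabla_{\theta} J_{\phi}(\theta^{*}) = 0; differentiating
% this stationarity condition with respect to \phi (implicit function theorem):
\begin{equation}
  \frac{d\theta^{*}}{d\phi}
  = -\Big[\nabla^{2}_{\theta\theta} J_{\phi}(\theta^{*})\Big]^{-1}
      \nabla^{2}_{\theta\phi} J_{\phi}(\theta^{*}),
\end{equation}
% so the outer (meta) gradient for adapting the reward is, by the chain rule,
\begin{equation}
  \nabla_{\phi} J_{\text{true}}
  = \Big(\frac{d\theta^{*}}{d\phi}\Big)^{\!\top}
    \nabla_{\theta} J_{\text{true}}\big(\theta^{*}(\phi)\big),
\end{equation}
% which avoids back-propagating through the inner policy-optimization trajectory.

The appeal of this route, and plausibly the scalability argument in the abstract, is that differentiating through the stationarity condition replaces storage and backpropagation over the entire inner training trajectory with (approximate) solves involving the inner Hessian at the solution.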