Learning to Self-Modify Rewards with Bi-Level Gradients
EasyChair Preprint 8260, version 2 • 10 pages • Date: January 21, 2023

Abstract
Reward shaping is a technique used to improve the efficiency of learning optimal policies in sequential decision-making problems. However, it is difficult to design auxiliary rewards that effectively guide the agent's learning, and doing so often requires significant time and expertise from domain experts. In this paper, we propose an approach, based on the optimal rewards methodology, that adapts a given reward function into a new reward function that improves learning. This can be formulated as a meta-learning problem, and we propose to solve it with a bi-level optimization framework. However, the standard methods used in the literature for this type of problem are not scalable, so we instead use an implicit-gradient technique. We show that our method is effective at both (a) learning optimal rewards and (b) adaptive reward shaping.

Keyphrases: reinforcement learning, reward shaping, meta-learning
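As a supplementary sketch of the bi-level formulation the abstract refers to, under assumed notation (the symbols $\phi$, $\theta$, $J_{\text{ext}}$, and $J_{\text{int}}$ are illustrative and not taken from the paper): the outer level optimizes reward parameters $\phi$ against the original task objective, while the inner level optimizes policy parameters $\theta$ against the learned reward,

\[
\phi^{*} = \arg\max_{\phi} \; J_{\text{ext}}\!\big(\theta^{*}(\phi)\big)
\quad \text{s.t.} \quad
\theta^{*}(\phi) = \arg\max_{\theta} \; J_{\text{int}}(\theta, \phi).
\]

Differentiating through the inner optimum with the implicit function theorem, assuming the inner stationarity condition $\nabla_{\theta} J_{\text{int}}(\theta^{*}, \phi) = 0$, gives the hypergradient

\[
\nabla_{\phi} J_{\text{ext}}
= -\,\big(\nabla^{2}_{\theta\phi} J_{\text{int}}\big)^{\!\top}
\big(\nabla^{2}_{\theta\theta} J_{\text{int}}\big)^{-1}
\nabla_{\theta} J_{\text{ext}},
\]

which avoids backpropagating through the entire inner optimization trajectory. This is the generic implicit-gradient construction, not necessarily the exact form derived in the paper.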