Learning to Self-Modify Rewards with Bi-Level Gradients
EasyChair Preprint 8260, version 2 • 10 pages • Date: January 21, 2023

Abstract
Reward shaping is a technique used to improve the efficiency of learning optimal policies in sequential decision-making problems. However, it is difficult to design auxiliary rewards that effectively guide the agent's learning, and doing so often requires significant time and expertise from domain experts. In this paper, we propose an approach, based on the optimal rewards methodology, that adapts a given reward function into a new reward function that improves learning. This can be formulated as a meta-learning problem, and we propose to solve it with a bi-level optimization framework. However, the standard methods used in the literature for this type of problem are not scalable, so we instead use an implicit-gradient technique. We show that our method is effective at both (a) learning optimal rewards and (b) adaptive reward shaping.

Keyphrases: reinforcement learning, reward shaping, meta-learning
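As a supplementary sketch of the bi-level formulation the abstract refers to, under assumed notation (the symbols $\phi$, $\theta$, $J_{\text{ext}}$, and $J_{\text{int}}$ are illustrative and not taken from the paper): the outer level optimizes reward parameters $\phi$ against the original task objective, while the inner level optimizes policy parameters $\theta$ against the learned reward,

\[
\phi^{*} = \arg\max_{\phi} \; J_{\text{ext}}\!\big(\theta^{*}(\phi)\big)
\quad \text{s.t.} \quad
\theta^{*}(\phi) = \arg\max_{\theta} \; J_{\text{int}}(\theta, \phi).
\]

Differentiating through the inner optimum with the implicit function theorem, assuming the inner stationarity condition $\nabla_{\theta} J_{\text{int}}(\theta^{*}, \phi) = 0$, gives the hypergradient

\[
\nabla_{\phi} J_{\text{ext}}
= -\,\big(\nabla^{2}_{\theta\phi} J_{\text{int}}\big)^{\!\top}
\big(\nabla^{2}_{\theta\theta} J_{\text{int}}\big)^{-1}
\nabla_{\theta} J_{\text{ext}},
\]

which avoids backpropagating through the entire inner optimization trajectory. This is the generic implicit-gradient construction, not necessarily the exact form derived in the paper.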