LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

This is the official code base of the paper LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning, which was presented at the Eleventh International Conference on Learning Representations (ICLR 2023) in Kigali, Rwanda. We also provide all baselines for the imitation learning benchmark LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion, presented at the Robot Learning workshop at NeurIPS 2023.


[Figure: Divergence_Minimization]

Method

In this work, we analyze the effect of a squared-norm regularizer on the implicit reward function in the inverse reinforcement learning setting. We build on previous work (IQ-Learn) and show that this regularizer results in a minimization of the Chi^2 divergence between the expert and a mixture distribution. We show that, unlike previously used divergences, this divergence is bounded, and the resulting reward function is also bounded. An example is given in the picture above, where the target distribution is blue, the current policy distribution is green, and the mixture is orange. As can be seen, the vanilla Chi^2 divergence can reach very high values, despite the support area being non-zero, while the divergence on the mixture is bounded. Both optimizations share the same optimal solution.
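The boundedness claim can be checked numerically on a toy example. The following is a minimal sketch, not code from this repository: it assumes discrete distributions, the Pearson form of the Chi^2 divergence, and an equal-weight mixture of expert and policy. The distributions and the helper name `chi2_divergence` are hypothetical.

```python
import numpy as np


def chi2_divergence(p, q):
    """Pearson Chi^2 divergence sum((p - q)^2 / q) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))


# Hypothetical toy distributions over three states (not taken from the paper).
expert = np.array([0.98, 0.01, 0.01])
policy = np.array([0.01, 0.01, 0.98])

# Assumption: the mixture weights expert and policy equally.
mixture = 0.5 * expert + 0.5 * policy

vanilla = chi2_divergence(expert, policy)      # grows as the overlap shrinks
on_mixture = chi2_divergence(expert, mixture)  # stays bounded

print(f"Chi^2(expert || policy):  {vanilla:.3f}")
print(f"Chi^2(expert || mixture): {on_mixture:.3f}")
```

Under these assumptions, each term of the mixture divergence reduces to (p - q)^2 / (2 (p + q)), which is at most (p + q) / 2, so the sum cannot exceed 1 no matter how little the two distributions overlap. Both divergences are zero exactly when the policy matches the expert.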

Also, this regularizer provides a particularly illuminating perspective: the original objective can be understood as squared Bellman error minimization with fixed rewards for the expert and the policy. This setting can be further used to stabilize training as shown in our paper.
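To illustrate the Bellman-error view, here is a minimal sketch of a squared Bellman error in which the reward is a fixed constant that depends only on whether a transition comes from the expert or from the policy. It is not the implementation in this repository: the function name, the reward values +1 and -1, and the discount factor are assumptions chosen for illustration.

```python
import numpy as np


def fixed_reward_bellman_loss(q, v_next, is_expert,
                              r_expert=1.0, r_policy=-1.0, gamma=0.99):
    """Mean squared Bellman error with a fixed reward per data source.

    q         : Q(s, a) for each transition in the batch
    v_next    : V(s') for each transition in the batch
    is_expert : True where the transition comes from the expert data
    r_expert, r_policy, gamma : assumed illustrative constants
    """
    q = np.asarray(q, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    is_expert = np.asarray(is_expert, dtype=bool)

    # The reward is not learned: it is a constant chosen by the data source.
    rewards = np.where(is_expert, r_expert, r_policy)
    targets = rewards + gamma * v_next
    return float(np.mean((q - targets) ** 2))


# Hypothetical batch: one expert transition and one policy transition.
v_next = np.array([0.0, 2.0])
is_expert = np.array([True, False])
consistent_q = np.where(is_expert, 1.0, -1.0) + 0.99 * v_next

example_loss = fixed_reward_bellman_loss(consistent_q + 1.0, v_next, is_expert)
print(f"loss for Q-values off by one: {example_loss:.3f}")
```

The loss vanishes when the Q-values are consistent with the fixed rewards, and every Q-value that deviates from its target contributes its squared deviation.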

Key Advantages

✅ Simple implementation on top of SAC
✅ Bounded objective with bounded reward yields stable and convenient training
✅ Retains performance even without expert actions
✅ Performs even when only 1 expert trajectory is given
✅ Works in complex and realistic environments such as the Atlas locomotion task
✅ Unlike previous methods, no survival bias!


Installation

You can install this repo by cloning and then

cd ls-iq
pip install -e .

Download the Datasets [not needed for LocoMuJoCo]

In order to run the examples and reproduce the results, you have to download the datasets used in our paper. To do so, you have to install gdown:

pip install gdown

Then you can just run the download script:

chmod u+x ./download_data.sh
./download_data.sh

Examples

You can find launcher files in the examples folder to launch all the different versions of LS-IQ and to reproduce the main results of the paper.

Here is how you run the training of LS-IQ with 5 expert trajectories on all MuJoCo Gym tasks:

cd examples/02_episode_5/
python launcher.py

To monitor the training, use TensorBoard. Once the training is launched, the directory logs is created, which contains the TensorBoard logging data. Here is how you run TensorBoard:

tensorboard --logdir logs

Some experiments, such as the Atlas locomotion task, were conducted on environments that are not yet available in Mushroom-RL but will be available soon! Once the environments are part of Mushroom-RL, the experiment files will be added here. Follow Mushroom-RL on Twitter @Mushroom_RL to get notified as soon as the new environment package is available!


Citation

@inproceedings{alhafez2023,
title={LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning},
author={Firas Al-Hafez and Davide Tateo and Oleg Arenz and Guoping Zhao and Jan Peters},
booktitle={Eleventh International Conference on Learning Representations (ICLR)},
year={2023},
url={https://openreview.net/pdf?id=o3Q4m8jg4BR}}
