Real World Offline Reinforcement Learning with Realistic Data Source

[Arxiv] [Code] [Data] [Project Page]

Abstract

Offline reinforcement learning (ORL) holds great promise for robot learning due to its ability to learn from arbitrary pre-generated experience. However, current ORL benchmarks are almost entirely in simulation and utilize contrived datasets like replay buffers of online RL agents or sub-optimal trajectories, and thus hold limited relevance for real-world robotics. In this work (Real-ORL), we posit that data collected from safe operations of closely related tasks are more practical data sources for real-world robot learning. Under these settings, we perform an extensive (6500+ trajectories collected over 800+ robot hours and 270+ human labor hours) empirical study evaluating generalization and transfer capabilities of representative ORL methods on four real-world tabletop manipulation tasks. Our study finds that ORL and imitation learning prefer different action spaces, and that ORL algorithms can generalize by leveraging heterogeneous offline data sources and outperform imitation learning.

Training Dataset

We release the Real-ORL dataset that we used for all of our experiments. The parsed and raw datasets can be downloaded here.

Tasks

Reaching

  • Random initial & goal positions in the air
  • 1000 trajectories

Sliding

  • Random initial & goal positions on the table
  • 731 trajectories

Lifting

  • Random initial positions on the table
  • 616 trajectories

Pick-n-place (pnp)

  • Random initial & goal positions on the table
  • 609 trajectories

Parsed Data

The data is parsed into a Python list of dictionaries, where each dictionary is one trajectory. Each dictionary has the following keys:

  • 'observations': numpy array of shape (horizon, 7). Contains the robot's joint states (as absolute joint angles) at each timestep
  • 'actions': numpy array of shape (horizon, 7). Contains the robot's actions (as absolute joint angles) at each timestep
  • 'rewards': numpy array of shape (horizon,). The reward is sparse: it is 0 at every step except the last. Rewards are not normalized in the current version
  • 'terminated': numpy array of shape (horizon,). Marks whether the trajectory has terminated at each timestep, so every entry is 0 except the last, which is 1
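As a minimal sketch of working with this layout, the snippet below checks that a trajectory dictionary has the keys and shapes listed above, then concatenates a list of trajectories into flat per-step arrays. The helper names (`check_trajectory`, `flatten`, `make_dummy_trajectory`) are hypothetical and not part of the released code, and the example runs on synthetic data because the on-disk file format is not described here.

```python
import numpy as np

# Keys documented for each parsed trajectory dictionary.
REQUIRED_KEYS = ("observations", "actions", "rewards", "terminated")


def check_trajectory(traj):
    """Return the horizon if `traj` matches the documented layout, else raise."""
    missing = [k for k in REQUIRED_KEYS if k not in traj]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    horizon = traj["observations"].shape[0]
    expected = {
        "observations": (horizon, 7),
        "actions": (horizon, 7),
        "rewards": (horizon,),
        "terminated": (horizon,),
    }
    for key, shape in expected.items():
        if traj[key].shape != shape:
            raise ValueError(f"{key}: expected shape {shape}, got {traj[key].shape}")
    # Only the final step should be flagged as terminated.
    if traj["terminated"][-1] != 1 or np.any(traj["terminated"][:-1] != 0):
        raise ValueError("'terminated' should be 0 everywhere except the last entry")
    return horizon


def flatten(trajectories):
    """Validate every trajectory, then concatenate each key along the time axis."""
    for traj in trajectories:
        check_trajectory(traj)
    return {
        key: np.concatenate([t[key] for t in trajectories], axis=0)
        for key in REQUIRED_KEYS
    }


def make_dummy_trajectory(horizon, seed=0):
    """Synthetic stand-in for one parsed trajectory (illustration only)."""
    rng = np.random.default_rng(seed)
    rewards = np.zeros(horizon)
    rewards[-1] = 1.0  # placeholder terminal reward, not the real reward scale
    terminated = np.zeros(horizon)
    terminated[-1] = 1
    return {
        "observations": rng.uniform(-1.0, 1.0, size=(horizon, 7)),
        "actions": rng.uniform(-1.0, 1.0, size=(horizon, 7)),
        "rewards": rewards,
        "terminated": terminated,
    }


if __name__ == "__main__":
    data = [make_dummy_trajectory(5, seed=0), make_dummy_trajectory(3, seed=1)]
    flat = flatten(data)
    print({key: value.shape for key, value in flat.items()})
```

Flat arrays of this kind (observations, actions, rewards, terminal flags) are the usual input to offline RL libraries, so a validation step like this can catch shape mismatches before training.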

Raw Data

Our raw data contains a Franka Panda robot arm's proprioceptions, actions (absolute joint angles), and image observations from two camera views at each time step. The cameras are calibrated and relevant objects are tracked by AprilTags so we can recover the ground truth position of the objects. We also provide scripts to parse the raw data:

python scripts/parse_pushing.py -f </path/to/raw/pushing/dataset/folder> 

We define our reward functions in rewards/<task>.py for training various offline reinforcement learning policies.
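For illustration only, a sparse reward with the shape described under Parsed Data (0 at every step except the last) could be sketched as follows. This is not the content of rewards/<task>.py: the function name `sparse_goal_reward`, its arguments, and the choice of negative Euclidean distance to the goal as the final-step value are all assumptions made for this example.

```python
import numpy as np


def sparse_goal_reward(final_position, goal_position, horizon):
    """Hypothetical sparse reward: zeros everywhere, one score at the last step.

    The final entry is the negative Euclidean distance between the object's
    final position and the goal (an assumed scoring rule, not the repo's).
    """
    rewards = np.zeros(horizon)
    distance = np.linalg.norm(np.asarray(final_position) - np.asarray(goal_position))
    rewards[-1] = -distance
    return rewards


if __name__ == "__main__":
    r = sparse_goal_reward([0.0, 0.0, 0.0], [3.0, 4.0, 0.0], horizon=4)
    print(r)  # three zeros, then the negative distance to the goal
```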

Agents

For all evaluations, we compare four algorithms:

  • Behavior Cloning (BC)
  • Model-based Offline REinforcement Learning (MOREL)
  • Advantage-Weighted Actor Critic (AWAC)
  • Implicit Q-Learning (IQL)

We refer to mjrl's implementation for MOREL and d3rlpy's implementation for AWAC and IQL, and release our code under agents/.

Citation

@misc{zhou2022real,
      title={Real World Offline Reinforcement Learning with Realistic Data Source}, 
      author={Gaoyue Zhou and Liyiming Ke and Siddhartha Srinivasa and Abhinav Gupta and Aravind Rajeswaran and Vikash Kumar},
      year={2022},
      eprint={2210.06479},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}
