google-research / rl-reliability-metrics

The RL Reliability Metrics library provides a set of metrics for measuring the reliability of reinforcement learning (RL) algorithms, as well as statistical tools for comparing algorithms and for computing confidence intervals on these metrics.
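Here is one kind of reliability measurement the description refers to: how much performance varies across independent training runs, measured as the interquartile range (IQR) at a few evaluation points. This is a minimal sketch of the idea, not the library's API. The function name `dispersion_across_runs`, the 2 × N curve layout (training steps in the first row, performance in the second), and the use of linear interpolation are all assumptions made for illustration. The library's own metric may differ, for example by smoothing curves first.

```python
import numpy as np


def dispersion_across_runs(curves, eval_points):
    # Hypothetical helper (not the library's API). Each curve is assumed to be a
    # 2 x N array: row 0 holds training steps (increasing), row 1 holds performance.
    # Read each run's performance at the eval points by linear interpolation.
    values = np.array([np.interp(eval_points, c[0], c[1]) for c in curves])
    # values has shape (num_runs, num_eval_points); take the IQR down each column.
    q75, q25 = np.percentile(values, [75, 25], axis=0)
    return q75 - q25


# Three toy runs whose performance grows at different rates.
steps = np.array([0, 100, 200, 300])
runs = [np.vstack([steps, steps * k]) for k in (1.0, 2.0, 3.0)]
print(dispersion_across_runs(runs, [100, 200]))  # [100. 200.]
```

A larger IQR at an evaluation point means the runs disagree more at that stage of training. The IQR is used here instead of the standard deviation because it is less sensitive to a single outlier run.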

License: Apache License 2.0

Shell 1.82% Python 98.18%

rl-reliability-metrics's Introduction

Google Research

This repository contains code released by Google Research.

All datasets in this repository are released under the CC BY 4.0 International license, which can be found here: https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this repository are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.

Because the repo is large, we recommend you download only the subdirectory of interest:

SUBDIR=foo
svn export https://github.com/google-research/google-research/trunk/$SUBDIR

If you'd like to submit a pull request, you'll need to clone the repository; we recommend making a shallow clone (without history).

git clone git@github.com:google-research/google-research.git --depth=1

Disclaimer: This is not an official Google product.

Updated in 2023.

rl-reliability-metrics's Issues

Non-RL task

Thanks for releasing the code.
I wonder how we could use your code for non-RL tasks, say, a conventional classification task on CIFAR-100.
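A non-RL training run also produces a performance curve, such as validation accuracy per epoch. The ValueError report further down this page prints a curve as a two-row array, with training steps in the first row and values in the second. The minimal sketch below arranges per-epoch metrics into that same layout. The helper `epochs_to_curve` and its `steps_per_epoch` parameter are hypothetical. The sketch only illustrates the data shape and does not establish that the reliability metrics are appropriate for supervised learning.

```python
import numpy as np


def epochs_to_curve(epoch_metrics, steps_per_epoch):
    # Hypothetical helper: stack a per-epoch metric into a 2 x N array.
    # Row 0 holds the training step at the end of each epoch; row 1 holds the metric.
    values = np.asarray(epoch_metrics, dtype=float)
    steps = np.arange(1, len(values) + 1) * steps_per_epoch
    return np.vstack([steps, values])


# For example, three epochs of validation accuracy with 390 steps per epoch.
curve = epochs_to_curve([0.41, 0.55, 0.62], steps_per_epoch=390)
print(curve.shape)  # (2, 3)
```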

where to place sample data

Hi -

To learn more about what is described in your paper, I am trying to replicate your results using your sample data, but I am having trouble getting it to run. Where should I place the sample data in the directory tree? Also, I am using Windows, so I will be using different shell commands, if that makes a difference.

Thanks!

Issues with the instructions for running the mujoco example

Hi there!

We're fifth-semester bachelor's students and would like to use your performance metrics to analyze different versions of our DQN implementation on the LunarLander-v2 Gym environment. Unfortunately, we got stuck trying to get the MuJoCo example to work, right at the beginning (around steps 0-3; we're super confused about where we currently are).

Would it be possible for you to guide us through this example? :-)

-- Philipp

ValueError when running evaluate_metrics.py

Description of Data

The data consists of training logs collected for different algorithms on the OpenAI Gym Box2D environments.
We ran 3 training runs for each algorithm-environment pair.
For some algorithm-environment pairs, the runs may not have lasted for exactly the same number of steps.

Python Version: 3.8.5
Operating System: Ubuntu 18.04 LTS

Error Log:

rl_reliability_metrics/metrics/metric_utils.py", line 466, in apply_window_fn
    raise ValueError(
ValueError: No timepoints exist in window of size 99000 at eval_point 50000 for curve 0 of 1: [[ 6.53530000e+04  9.81210000e+04  1.30754000e+05  1.63498000e+05
   1.96265000e+05  2.28937000e+05  2.61653000e+05  2.94417000e+05
   3.27138000e+05  6.54448000e+05  9.81823000e+05  1.30926000e+06
   1.63672200e+06  1.96418700e+06  2.29164200e+06  2.61909300e+06
   2.94655100e+06  3.27400300e+06  3.60144900e+06  3.92891300e+06
   4.25636600e+06  4.58379400e+06  4.91123400e+06  5.23866200e+06
   5.56607600e+06  5.89349700e+06  6.22091900e+06  6.54832800e+06
   6.87572300e+06  7.20312400e+06  7.53053400e+06  7.85794700e+06
   8.18535100e+06  8.51275300e+06  8.84014600e+06  9.16753100e+06
   9.49492300e+06  9.82231600e+06  1.01497040e+07  1.04771060e+07
   1.08044930e+07  1.11318840e+07  1.14592770e+07  1.17866620e+07
   1.21140380e+07  1.24414220e+07  1.27687940e+07  1.30961720e+07
   1.34235430e+07  1.37509110e+07  1.40782990e+07  1.44056710e+07
   1.47330410e+07  1.50604310e+07  1.53878030e+07  1.57151760e+07
   1.60425470e+07  1.63699210e+07]
 [ 6.00002990e-06  0.00000000e+00  6.98669525e-05  3.33035747e-05
  -7.52880862e-06  1.37550007e-04 -1.34304488e-04 -2.65021563e-05
   2.29437701e-04  2.22340290e-05  3.00221190e-05  4.47988013e-05
   1.10415555e-04  1.65926074e-04  9.07209870e-05  1.25995460e-04
   1.44716421e-04  2.37943390e-05  1.17302914e-04  9.28750596e-05
   2.74413375e-05 -5.95958079e-05  8.51775858e-05 -2.79818253e-05
  -4.69821981e-05 -2.83394488e-05  1.08697959e-04  2.30926150e-06
  -6.84152284e-05  2.49868200e-05  5.84322933e-06  6.10657285e-05
  -2.21012517e-05 -2.04874790e-05 -2.10860094e-05  4.52200997e-05
  -7.41860464e-05  7.65003209e-05 -2.19108675e-05  3.81947435e-06
   3.85752285e-05 -1.43368514e-05  6.82558777e-06 -2.37814975e-05
   4.64499658e-06 -1.33913837e-05  8.46934293e-05 -4.84939565e-06
  -3.44438250e-05  2.36883462e-05 -1.62917081e-06 -7.95428445e-05
   2.49484953e-05  5.61610500e-05  1.84892307e-06  3.13285886e-05
  -4.51176269e-05  1.72847243e-06]]
  In call to configurable 'evaluate' (<function Evaluator.evaluate at 0x7f2d8d0d5c10>)

Does the error mean that there is a missing training iteration log?
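A hedged reading of this error: in the logged curve above, the first recorded step is 65,353, which is already past the evaluation point of 50,000. Suppose each window ends at its evaluation point. This is an assumption; the actual rule lives in `apply_window_fn` in `metric_utils.py`. Under that assumption, no logged step can fall inside the window at 50,000. The error would then come from evaluation points that lie before the logged data, not from a missing iteration log. Runs of unequal length, as described in this issue, could cause similar gaps at the late end. The helpers `find_empty_windows` and `shared_step_range` are hypothetical and not part of the library. They sketch a check you could run on the curves before calling the evaluator.

```python
import numpy as np


def find_empty_windows(curves, eval_points, window_size):
    # Hypothetical diagnostic. ASSUMPTION: a window is the trailing interval
    # [eval_point - window_size, eval_point]; the library's definition may differ.
    # Returns (curve_index, eval_point) pairs whose window contains no logged step.
    empty = []
    for i, curve in enumerate(curves):
        steps = np.asarray(curve[0])
        for point in eval_points:
            in_window = (steps >= point - window_size) & (steps <= point)
            if not in_window.any():
                empty.append((i, point))
    return empty


def shared_step_range(curves):
    # Steps covered by every run: from the latest first step to the earliest last step.
    start = max(float(np.min(c[0])) for c in curves)
    end = min(float(np.max(c[0])) for c in curves)
    return start, end


# Example using the first four logged steps from the curve in the error above.
steps = np.array([65353, 98121, 130754, 163498])
curve = np.vstack([steps, np.zeros(4)])
print(find_empty_windows([curve], [50000, 100000], 99000))  # [(0, 50000)]
print(shared_step_range([curve]))  # (65353.0, 163498.0)
```

If this check flags points, one option is to choose evaluation points inside the range returned by `shared_step_range`, so that every run has logged data near each point.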
