openai / safety-starter-agents

Basic constrained RL agents used in experiments for the "Benchmarking Safe Exploration in Deep Reinforcement Learning" paper.

Home Page: https://openai.com/blog/safety-gym/

License: MIT License

Python 100.00%

safety-starter-agents's Issues

backtracking linesearch

placeholder_from_space only accepts Box and Discrete spaces

Hello!

I just started playing around with the code and I am trying to run a ppo_lagrangian agent for a custom environment. The issue is that my observation space is a dictionary that includes a variety of spaces, in particular 5 Box spaces and a MultiDiscrete space. I have changed run_agent.run_polopt_agent to accept my observation and action space as arguments and now I am getting a NotImplementedError from network.placeholder_from_space.

I was wondering whether a workaround is worth pursuing, or whether these techniques are only meant to be run on simple Box and Discrete spaces.
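One common workaround for this kind of limitation is to flatten the dictionary observation into a single vector before it reaches the agent, so the observation space looks like one Box. The sketch below is only an illustration under stated assumptions: `flatten_observation`, `one_hot`, and the example keys are hypothetical names that are not part of safety-starter-agents, each Box entry is assumed to be already flat, and MultiDiscrete entries are one-hot encoded. Some Gym versions also ship a `FlattenObservation` wrapper; whether it works with the Gym version this repository pins has not been checked here.

```python
from typing import Dict, List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def flatten_observation(
    obs: Dict[str, Sequence[float]],
    multidiscrete_sizes: Dict[str, Sequence[int]],
) -> List[float]:
    """Concatenate a dict observation into one flat list of floats.

    Keys listed in `multidiscrete_sizes` are treated as MultiDiscrete values
    and one-hot encoded per component; every other key is treated as an
    already-flat Box value. Keys are visited in sorted order so the layout
    of the output vector is stable across calls.
    """
    flat: List[float] = []
    for key in sorted(obs):
        values = obs[key]
        if key in multidiscrete_sizes:
            sizes = multidiscrete_sizes[key]
            if len(values) != len(sizes):
                raise ValueError(f"size mismatch for key {key!r}")
            for value, size in zip(values, sizes):
                flat.extend(one_hot(int(value), size))
        else:
            flat.extend(float(v) for v in values)
    return flat


# Hypothetical observation: one Box entry and one MultiDiscrete([3, 2]) entry.
example = flatten_observation(
    {"lidar": [0.1, 0.2], "mode": [1, 0]},
    {"mode": [3, 2]},
)
```

With a flat vector like this, the observation space exposed to the agent could be declared as a single Box of the matching length, which is the case `placeholder_from_space` already handles.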

sac-lagrangian shows poor performance on PointGoal1?

When running the Lagrangian version of SAC, I get the following curve for costs. I tried a range of values for the constraint limit and saw little benefit:

Am I doing something wrong, or is this expected for off-policy algorithms?

is this available on Windows?

Hyperparameters for each environment-agent combination

Hello

In the paper, you mention that the results are presented with hand-tuned hyperparameters for each algorithm class (Sec 5.2). Could you also share those hyperparameters? This would save the computational cost of a grid search and improve reproducibility.

Please provide conda environment.yml file

While I'm able to install and run safety-gym, I am unable to install safety-starter-agents. There seem to be conflicts caused by the older version of TensorFlow.
Could you please provide a Conda environment.yml file with all the necessary dependencies?

Method to continue from checkpoint

Hi,
Is there a way to continue or resume training from a previously saved state? We are saving checkpoints at the configured save frequency.
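One generic first step toward resuming is to locate the most recent saved epoch, so that a training loop can restart its epoch counter from there. The sketch below only illustrates that step and is not the repository's mechanism: the helper name `latest_checkpoint_epoch` and the `vars<epoch>.pkl` filename pattern are assumptions made for the example, and restoring the network weights and optimizer state is not shown.

```python
import re
from typing import Iterable, Optional


def latest_checkpoint_epoch(
    names: Iterable[str],
    pattern: str = r"vars(\d+)\.pkl$",  # assumed naming scheme, adjust as needed
) -> Optional[int]:
    """Return the largest epoch number found in `names`, or None if none match."""
    epochs = []
    for name in names:
        match = re.search(pattern, name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None


# Hypothetical directory listing of a run folder.
listing = ["progress.txt", "vars0.pkl", "vars10.pkl", "vars5.pkl"]
last_epoch = latest_checkpoint_epoch(listing)
start_epoch = 0 if last_epoch is None else last_epoch + 1
```

In practice `names` could come from `os.listdir` on the run's output directory, and `start_epoch` would be the first epoch of the resumed loop.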

[Discussion] Alternative code base for safe reinforcement learning research: OmniSafe

The safety-starter-agents codebase has been a valuable resource for early-stage research in the field of reinforcement learning. However, it has come to our attention that the author is no longer maintaining the library, resulting in some frustration due to the absence of updates for the latest algorithms and the lack of support for model-based and offline safe reinforcement learning algorithms.

In response to this issue, and inspired by the streamlined design philosophy of safety-starter-agents, we have developed an infrastructural framework, OmniSafe, aimed at accelerating safe reinforcement learning research. Our framework supports a range of algorithms, including on-policy, off-policy, model-based, offline, and control-based approaches, with continuous updates for the latest algorithms.

Thanks to safety-starter-agents, a superb codebase, we are able to build upon the achievements of our predecessors in the field of scientific research, and we hope that OmniSafe can provide support for further scientific research in safe reinforcement learning for everyone.

The OmniSafe git repository: https://github.com/OmniSafeAI/omnisafe

toy benchmarking experiment takes up too much GPU memory

I installed Safety Gym and this repository, then ran the experiment with the following command:

    python experiment.py --algo cpo --task goal1 --robot point --seed 0 --exp_name pointgoal1-cposeed0 --cpu 1

This command takes up too much GPU memory. My GPU is a Tesla P40, and this simple experiment uses almost 23 GB of memory, which is quite strange.

Could you please help me?
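A likely explanation is that TensorFlow 1.x, by default, reserves nearly all memory on every visible GPU when a session is created, regardless of how small the model is. A minimal sketch of one workaround is to hide the GPUs from the process so the run stays on the CPU. The helper name `hide_gpus` is hypothetical, and the call is assumed to happen before TensorFlow is imported.

```python
import os
from typing import MutableMapping, Optional


def hide_gpus(
    environ: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set CUDA_VISIBLE_DEVICES so that no CUDA device is visible.

    Defaults to modifying os.environ; a plain dict can be passed instead
    (for example, to build the environment for a subprocess).
    """
    env = os.environ if environ is None else environ
    env["CUDA_VISIBLE_DEVICES"] = "-1"  # an invalid index hides all GPUs
    return env


# Build an environment mapping that could be passed to a subprocess.
child_env = hide_gpus({})

# To affect the current process, call hide_gpus() with no arguments
# before the first `import tensorflow`.
```

An alternative, if the GPU should still be used, is to create the session with a `tf.ConfigProto` whose `gpu_options.allow_growth` is set to `True`, which asks TensorFlow 1.x to allocate memory on demand. That would require editing the place where the session is created, which is not shown here.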

ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

System: Ubuntu 18.04
IDE: PyCharm
There was a problem installing the dependency package mpi4py==3.0.2.
Has anyone had a similar problem? How can it be solved?
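This error usually means the dynamic linker cannot find a system MPI library with the version that mpi4py was built against. A quick diagnostic sketch, using only the standard library, is to ask Python whether any MPI shared library is discoverable at all. The helper name `find_mpi_library` is hypothetical, and the result depends entirely on the machine.

```python
import ctypes.util
from typing import Optional


def find_mpi_library() -> Optional[str]:
    """Return the name of a discoverable MPI shared library, or None."""
    return ctypes.util.find_library("mpi")


library = find_mpi_library()
if library is None:
    print("No MPI shared library found; a system MPI package is probably missing.")
else:
    print(f"Found MPI library: {library}")
```

If nothing is found, installing a system MPI implementation (on Ubuntu, for example, the `libopenmpi-dev` package) and then reinstalling mpi4py so that it is built against that library is a reasonable thing to try. If a library is found but under a different version than `libmpi.so.12`, rebuilding mpi4py against the installed version may help.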

same random seed for train_env and test_env

Hi there,

I have two questions regarding the test_env:

1. Why is there a test_env only for sac, and not for ppo and trpo?
2. In safe_rl.sac.sac.py line 273, env and test_env are seeded with the same seed, so test_env would be the same as the training env, right? Is the purpose of test_env only to test the deterministic actions, and not the generalization of the policy at all?

    # Setting seeds
    tf.set_random_seed(seed)
    np.random.seed(seed)
    env.seed(seed)
    test_env.seed(seed)

Thank you very much in advance.
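The concern can be illustrated with plain random number generators: two generators seeded with the same value produce the same stream, while an offset seed gives a different one. The sketch below is not the repository's code. `sample_stream` and the offset of 10000 are hypothetical choices for the example, and it uses Python's `random` module rather than the environments themselves.

```python
import random
from typing import List


def sample_stream(seed: int, steps: int = 5) -> List[float]:
    """Draw `steps` values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]


seed = 0
train_stream = sample_stream(seed)
same_seed_stream = sample_stream(seed)           # mirrors test_env.seed(seed)
offset_seed_stream = sample_stream(seed + 10000)  # assumed alternative

print(train_stream == same_seed_stream)    # identical streams
print(train_stream == offset_seed_stream)  # different streams
```

Whether identical seeds actually make the two environments generate identical layouts depends on how each environment consumes its random numbers, so this only shows why a distinct test seed would be the safer choice when generalization is the thing being measured.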
