eleurent / rl-agents Goto Github PK

View Code? Open in Web Editor NEW

562.0 19.0 150.0 1.03 MB

Implementations of Reinforcement Learning and Planning algorithms

License: MIT License

Python 100.00%

planning reinforcement-learning agents

rl-agents's Introduction

rl-agents

A collection of Reinforcement Learning agents

Installation
Usage
Monitoring
Agents
- Planning
  - Value Iteration
  - Cross-Entropy Method
  - Monte-Carlo Tree Search
- Safe planning
- Value-based
  - Deep Q-Network
  - Fitted-Q
- Safe value-based
  - Budgeted Fitted-Q
Citing

Installation

pip install --user git+https://github.com/eleurent/rl-agents

Usage

Most experiments can be started by moving to cd scripts and running python experiments.py

Usage:
  experiments evaluate <environment> <agent> (--train|--test)
                                             [--episodes <count>]
                                             [--seed <str>]
                                             [--analyze]
  experiments benchmark <benchmark> (--train|--test)
                                    [--processes <count>]
                                    [--episodes <count>]
                                    [--seed <str>]
  experiments -h | --help

Options:
  -h --help            Show this screen.
  --analyze            Automatically analyze the experiment results.
  --episodes <count>   Number of episodes [default: 5].
  --processes <count>  Number of running processes [default: 4].
  --seed <str>         Seed the environments and agents.
  --train              Train the agent.
  --test               Test the agent.

The evaluate command allows to evaluate a given agent on a given environment. For instance,

# Train a DQN agent on the CartPole-v0 environment
$ python3 experiments.py evaluate configs/CartPoleEnv/env.json configs/CartPoleEnv/DQNAgent.json --train --episodes=200

Every agent interacts with the environment following a standard interface:

action = agent.act(state)
next_state, reward, done, info = env.step(action)
agent.record(state, action, reward, next_state, done, info)

The environments are described by their gym id, and module for registration.

{
    "id": "CartPole-v0",
    "import_module": "gym"
}

And the agents by their class, and configuration dictionary.

{
    "__class__": "<class 'rl_agents.agents.deep_q_network.pytorch.DQNAgent'>",
    "model": {
        "type": "MultiLayerPerceptron",
        "layers": [512, 512]
    },
    "gamma": 0.99,
    "n_steps": 1,
    "batch_size": 32,
    "memory_capacity": 50000,
    "target_update": 1,
    "exploration": {
        "method": "EpsilonGreedy",
        "tau": 50000,
        "temperature": 1.0,
        "final_temperature": 0.1
    }
}

If keys are missing from these configurations, values in agent.default_config() will be used instead.

Finally, a batch of experiments can be scheduled in a benchmark. All experiments are then executed in parallel on several processes.

# Run a benchmark of several agents interacting with environments
$ python3 experiments.py benchmark cartpole_benchmark.json --test --processes=4

A benchmark configuration file contains a list of environment configurations and a list of agent configurations.

{
    "environments": ["envs/cartpole.json"],
    "agents": ["agents/dqn.json", "agents/mcts.json"]
}

Monitoring

There are several tools available to monitor the agent performances:

Run metadata: for the sake of reproducibility, the environment and agent configurations used for the run are merged and saved to a metadata.*.json file.
Gym Monitor: the main statistics (episode rewards, lengths, seeds) of each run are logged to an episode_batch.*.stats.json file. They can be automatically visualised by running scripts/analyze.py
Logging: agents can send messages through the standard python logging library. By default, all messages with log level INFO are saved to a logging.*.log file. Add the option scripts/experiments.py --verbose to save with log level DEBUG.
Tensorboard: by default, a tensoboard writer records information about useful scalars, images and model graphs to the run directory. It can be visualized by running: tensorboard --logdir <path-to-runs-dir>

Agents

The following agents are currently implemented:

Planning

`VI` Value Iteration

Perform a Value Iteration to compute the state-action value, and acts greedily with respect to it.

Only compatible with finite-mdp environments, or environments that handle an env.to_finite_mdp() conversion method.

Reference: Dynamic Programming, Bellman R., Princeton University Press (1957).

A sampling-based planning algorithm, in which sequences of actions are drawn from a prior gaussian distribution. This distribution is iteratively bootstraped by minimizing its cross-entropy to a target distribution approximated by the top-k candidates.

Only compatible with continuous action spaces. The environment is used as an oracle dynamics and reward model.

Reference: A Tutorial on the Cross-Entropy Method, De Boer P-T., Kroese D.P, Mannor S. and Rubinstein R.Y. (2005).

`MCTS` Monte-Carlo Tree Search

A world transition model is leveraged for trajectory search. A look-ahead tree is expanded so as to explore the trajectory space and quickly focus around the most promising moves.

References:

Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R., 2006.

`UCT` Upper Confidence bounds applied to Trees

The tree is traversed by iteratively applying an optimistic selection rule at each depth, and the value at leaves is estimated by sampling. Empirical evidence shows that this popular algorithms performs well in many applications, but it has been proved theoretically to achieve a much worse performance (doubly-exponential) than uniform planning in some problems.

References:

Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C. (2006).
Bandit Algorithms for Tree Search, Coquelin P-A., Munos R. (2007).

`OPD` Optimistic Planning for Deterministic systems

This algorithm is tailored for systems with deterministic dynamics and rewards. It exploits the reward structure to achieve a polynomial rate on regret, and behaves efficiently in numerical experiments with dense rewards.

Reference: Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).

`OLOP` Open Loop Optimistic Planning

References:

Open Loop Optimistic Planning, Bubeck S., Munos R. (2010).
Practical Open-Loop Optimistic Planning, Leurent E., Maillard O.-A. (2019).

Trailblazer

Reference: Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).

PlaTγPOOS

Reference: Scale-free adaptive planning for deterministic dynamics & discounted rewards, Bartlett P., Gabillon V., Healey J., Valko M. (2019).

Safe planning

`RVI` Robust Value Iteration

A list of possible finite-mdp models is provided in the agent configuration. The MDP ambiguity set is constrained to be rectangular: different models can be selected at every transition.The corresponding robust state-action value is computed so as to maximize the worst-case total reward.

References:

Robust Control of Markov Decision Processes with Uncertain Transition Matrices, Nilim A., El Ghaoui L. (2005).
Robust Dynamic Programming, Iyengar G. (2005).
Robust Markov Decision Processes, Wiesemann W. et al. (2012).

`DROP` Discrete Robust Optimistic Planning

The MDP ambiguity set is assumed to be finite, and is constructed from a list of modifiers to the true environment. The corresponding robust value is approximately computed by Deterministic Optimistic Planning so as to maximize the worst-case total reward.

References:

Approximate Robust Control of Uncertain Dynamical Systems, Leurent E. et al. (2018).

`IRP` Interval-based Robust Planning

We assume that the MDP is a parametrized dynamical system, whose parameter is uncertain and lies in a continuous ambiguity set. We use interval prediction to compute the set of states that can be reached at any time t, given that uncertainty, and leverage it to evaluate and improve a robust policy.

If the system is Linear Parameter-Varying (LPV) with polytopic uncertainty, an fast and stable interval predictor can be designed. Otherwise, sampling-based approaches can be used instead, with an increased computational load.

References:

Approximate Robust Control of Uncertain Dynamical Systems, Leurent E. et al. (2018).
Interval Prediction for Continuous-Time Systems with Parametric Uncertainties, Leurent E. et al (2019).

Value-based

`DQN` Deep Q-Network

A neural-network model is used to estimate the state-action value function and produce a greedy optimal policy.

Implemented variants:

Double DQN
Dueling architecture
N-step targets

References:

Playing Atari with Deep Reinforcement Learning, Mnih V. et al (2013).
Deep Reinforcement Learning with Double Q-learning, van Hasselt H. et al. (2015).
Dueling Network Architectures for Deep Reinforcement Learning, Wang Z. et al. (2015).

`FTQ` Fitted-Q

A Q-function model is trained by performing each step of Value Iteration as a supervised learning procedure applied to a batch of transitions covering most of the state-action space.

Reference: Tree-Based Batch Mode Reinforcement Learning, Ernst D. et al (2005).

Safe Value-based

`BFTQ` Budgeted Fitted-Q

An adaptation of FTQ in the budgeted setting: we maximise the expected reward r of a policy π under the constraint that an expected cost c remains under a given budget β. The policy π(a | s, β) is conditioned on this cost budget β, which can be changed online.

To that end, the Q-function model is trained to predict both the expected reward Qr and the expected cost Qc of the optimal constrained policy π.

This agent can only be used with environments that provide a cost signal in their info field:

>>> obs, reward, done, info = env.step(action)
>>> info
{'cost': 1.0}

Reference: Budgeted Reinforcement Learning in Continuous State Space, Carrara N., Leurent E., Laroche R., Urvoy T., Maillard O-A., Pietquin O. (2019).

Citing

If you use this project in your work, please consider citing it with:

@misc{rl-agents,
  author = {Leurent, Edouard},
  title = {rl-agents: Implementations of Reinforcement Learning algorithms},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/eleurent/rl-agents}},
}

rl-agents's People

Contributors

Stargazers

Watchers

Forkers

sebastopol06 itismouad jiamim allensmile yyht sprinterzzj sytzeandr practise-area wwxfromtju mathematicalmodels pankajk arita37 superxingzai nkcr7 chenpei-ws ying-wen ml-lab eemorsi kargarisaac zhangchi0605 zhijieliu blf11139 kongxianghan nikhildesa shitianyu-hue wunder2dream fagan2888 galleon saarbabi liuyuqi123 narutoten520 buster84 ashishrana160796 bhaskar-c gary-l-collins davidwitten qianmuluo colleyli hades-rp2010 srilalithaveerubhotla sritee wangwanlitype1 ofraam the-intelligence-of-information xczhanjun zhen-liu-github fredtoby mmahjoub5 mikael-ols pedromdorey gsr-ee wxd400 rvalienter90 northtiger guzonghua ouaburst niar9 ray3104 lars-x kassemm salihslh gari73 perkarlsson smethnani neskoc nambidiwakar alik604 pgkang yuzhouxianzhi chaveza9 yabing67 techthiyanes autonomous-vehicle rockwenjj reinforcement-learning-code mengyaowunotavailable raghavadevarajeurs yuchenwang9601 namjiwon1023 rl-code-lib taniagdn clvoloshin vasilis-se zhangfengyo guansuns twolige manaqeb walkacross allenhsu6 abhijeet1990 zamoraandres tibigg ningwak obadul024 2019getup aliigii thirteentj chrisyangsong lplenka chenjianxing1

rl-agents's Issues

Can't find cartpole_benchmark.json

I was trying to follow the benchmark command in the page but didn't manage to find the file. Could you point me to it?

Add saving of evaluation metadata

Through a to_json function ?

environment properties
agent properties

Implement an AlphaZero agent

stochastic transitions for tree search agents

Eleurent, thanks for developing this great project and sharing it.

To my understanding, currently the MCTS agent deterministically transitions to new states during the planning phase. I was wondering what class you would modify for considering stochastic transitions? For instance, in case gaussian noise was added to the actions executed by other agents.

Thank you in advance

Use Visdom for visualizations

Add unit tests

Automatically save generated figures

From RewardViewer and RunAnalyzer

Remove pygame dependency.

Use opengl instead

IndexError: tuple index out of range

When I try the value of print ("in_width ") I get this result
IndexError: tuple index out of range
I print ("env. Observation_space ")
get the box is (15,7)
Can a friend explain to me how to use in_width ?thanks

Hi，the problem of the reward function.

I'm very sorry to disturb you again.
In the highenv_env, the reward function define that the it's relate with the collision, the agent on the right lane ,and the high velocity reward. I can understand that the collision reward is that when the agent collide with other vehicle，we can get a collision reward and the simulation ends，but about the high velocity reward, I have two questions:
The first is that in the high velocity reward, i notice that a variable which named “SPEED_COUNT” and it‘s defined in the file vehicle/control.py, and it used to speed_to_index, but i don't know the meaning of the variable.
The second is that how often the high velocity reward is returned when the agent is trained in the environment ？It’s maybe a state gets rewarded once, a second, or depends on the last state?Can you tell me in which file do these problems are reflected? Thank you very much！

Add a Benchmark class to evaluate and compare several agents

Rename UCT and OPD

Currently named MCTS and DeterministicPlanner, which is unclear

ImportError: No module named 'rl_agents.agents.budgeted_ftq'

Hey, I installed the highway-env via :
pip3 install --user git+https://github.com/eleurent/highway-env
and the rl-agents with:
pip3 install --user git+https://github.com/eleurent/rl-agents

When I run
python3 experiments.py benchmark configs/RoundaboutEnv/benchmark_robust_control.json \ --test --episodes=100 --processes=4

I get the error:
ImportError: No module named 'rl_agents.agents.budgeted_ftq'

What could cause this problem?

Questions from the entrant

FileNotFoundError: [Errno 2] No such file or directory: 'envs/CartPole.json'

Hello, I used your example **_python3 experiments.py evaluate envs/CartPole.json agents/dqn.json --train --episodes=200_** to test run. I have installed openai gym in my own virtual environment. What's the reason for this error?

Analyze Data

Hello,
I was able to run an experiment with the HighwayEnv successfully and now I am trying to analyze the data. Unfortunately I cannot seem to figure it out. I am running the command

!python3 scripts/analyze.py run path/to/data/openaigym.episode_batch.0.953.stats.json

but I keep getting the following error

INFO:root:Fetching data in out/HighwayEnv/DQNAgent/run_20200413-214123_953/run/openaigym.episode_batch.0.953.stats
INFO:root:Found 0 runs.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'episode'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "rl-agents/scripts/analyze.py", line 131, in
main()
File "rl-agents/scripts/analyze.py", line 36, in main
RunAnalyzer(opts['<run_folder>'], out=opts["--out"], episodes_range=episodes_range)
File "rl-agents/scripts/analyze.py", line 49, in init
self.analyze()
File "rl-agents/scripts/analyze.py", line 97, in analyze
self.find_best_run()
File "rl-agents/scripts/analyze.py", line 111, in find_best_run
df = df[df["episode"] == df["episode"].max()].sort_values(criteria, ascending=ascending)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2800, in getitem
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'episode'

I assume the issue with pandas stems from the fact that it says "Found 0 runs" and therefore could not pass that data on. I'm not too familiar with json files and I've tried running

!python3 scripts/analyze.py run path/to/data/

among many others things to get it to work but have not been successful. Could you help me figure out how to analyze the data? Are there specific files that need to be in the same folder as the stats.json?

Possible issue about the robust value iteration

Hi,

I was trying to run the experiment file using RVI algorithm. But, I face an error as follows. I appreciate it if you let me know if I am doing sth wrong.
My bash command:
python3 experiments.py evaluate configs/FiniteMDPEnv/large/env_2.json configs/FiniteMDPEnv/large/agents/value_iteration.json --train

The error:
Traceback (most recent call last):
File "experiments.py", line 149, in
main()
File "experiments.py", line 44, in main
evaluate(opts[''], opts[''], opts)
File "experiments.py", line 60, in evaluate
env = load_environment(environment_config)
File "/.local/lib/python3.6/site-packages/rl_agents/agents/common/factory.py", line 69, in load_environment
import(env_config["import_module"])
ModuleNotFoundError: No module named 'finite_mdp'

Visualization for DQNAgents

Hi ！

Hello！I have a problem in the ego_attention.json,The problem is when use the ego_attention.json to train the agent in the env_obs_attention, the error is happend:
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200408-221242_4475
Traceback (most recent call last):
File "experiments.py", line 148, in
main()
File "experiments.py", line 43, in main
evaluate(opts[''], opts[''], opts)
File "experiments.py", line 75, in evaluate
display_rewards=not options['--no-display'])
File "/home/cfxgg/rl-agents-master/rl_agents/trainer/evaluation.py", line 82, in init
self.agent.set_writer(self.writer)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/deep_q_network/pytorch.py", line 98, in set_writer
dtype=torch.float, device=self.device))
File "/home/cfxgg/conda/envs/test/lib/python3.7/site-packages/tensorboardX/writer.py", line 804, in add_graph
self._get_file_writer().add_graph(graph(model, input_to_model, verbose, profile_with_cuda, **kwargs))
File "/home/cfxgg/conda/envs/test/lib/python3.7/site-packages/tensorboardX/pytorch_graph.py", line 344, in graph
result = model(*args)
File "/home/cfxgg/conda/envs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/common/models.py", line 284, in forward
ego_embedded_att, _ = self.forward_attention(x)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/common/models.py", line 296, in forward_attention
ego, others, mask = self.split_input(x)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/common/models.py", line 289, in split_input
ego = x[:, 0:1, :]
IndexError: too many indices for tensor of dimension 2
How can I solve the problem ? I look forward to your reply !

Questions about selection_rule in OPD

Hi, in selection_rule function in deterministic.py, it chooses the max value_upper, but in the OPD paper, the algorithm expands according to the max value_upper while it chooses action according to the value_lower.

Add boltzmann exploration

Flawed Management of internal search environments in tree search planning

Some tree search algorithms implemented might have flawed high performances because of the management of the environments inside the tree search. Specifically to conduct the search, the environment is copied and passed to the planners, but the environment seed is copied as well. This results in a kind of "foreseeing the future", because the planners optimize on the random realizations instead of in expectation. This happens in the OLOP planner and also in the deterministic planner (ODP). In the deterministic planner is not that serious since it is thought for deterministic transitions, but in practice if you run this planner with a stochastic environment, the effect is that it performs amazingly well (because it can "predict" the exact future realizations).
This can be easily fixed by setting a random seed to the environments after copying them for the planners, e.g. adding the seeding in the plan method of the planner.

MCTS with max backup

Evaluate and compare with average backup

Add a pipeline to train agents relying on workers to collect experiences

Such as A3C, PPO, BFTQ, etc.

Add custom env

Is it possible to add custom environmment.

Indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead

/pytorch/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. /pytorch/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.
When I test the environment env_easy.json with the agent baseline.json, just to find this tip occurs repetitively on my screen .How can I ignore that or let it disappeared frommy screen?

rl-agents compatible with continuous action spaces

I am wondering is cross entropy method the only one that is compatible with continuous action spaces?

I tried CEM agent, but I found it runs very slow(Animation update is very slow), how could I increase the running speed? Thanks😁

DeterministicPlannerAgent checkpoint saving error

I have tried the DeterministicPlannerAgent in the Highway Env. During training, in the terminal, it published that a checkpoint model is saved. However, I didn't find the saved checkpoint policy at claiming location

The car cannot perform like the GIF after training converged

Hi Edouard Leurent, it is a great project for Rlers. But now I encountered a problem, when I do
"python scripts/experiments.py evaluate scripts/configs/HighwayEnv/env.json scripts/configs/HighwayEnv/agents/DQNAgent/baseline.json --train --episodes=2000", the score slowly converges to about 30 like:

"[INFO] Episode 595 score: 30.1
[INFO] Episode 596 score: 29.8
[INFO] Episode 597 score: 30.7
[INFO] Episode 598 score: 29.1
[INFO] Episode 599 score: 30.3
[INFO] Episode 600 score: 29.9
[INFO] Episode 601 score: 30.7
[INFO] Episode 602 score: 30.5
[INFO] Episode 603 score: 30.0
[INFO] Episode 604 score: 29.0"

But everytime when the video begin to record the vehicle running, it run into another car or cannot accelerate to overtake. So is this baseline.json for the GIF you add in highway-env repo? or I misunderstand something?

I appreciated if you can give me any suggestion. Thank you!

DQN with n-step return

How to analyze the data after being successfully trained.

I'm sorry to bother you again, but I can't find the way to analyze the traning data . I wanna watch the results of the DQNAgent method ,and how good it is.

[rl_agents.trainer.evaluation:INFO] Saved DQNAgent model to /home/zhao/gym/rl-agents-master/scripts/out/HighwayEnv/DQNAgent/run_20200406-223934_3509/checkpoint-999.tar [rl_agents.trainer.evaluation:INFO] Saved DQNAgent model to /home/zhao/gym/rl-agents-master/scripts/out/HighwayEnv/DQNAgent/run_20200406-223934_3509/checkpoint-final.tar

So ,you can see it has been completely traind . Then , I use the command python3 analyze.py run out/HighwayEnv/DQNAgent/run_20200406-223934_3509 to analyze the trained model. But I get this error
NotADirectoryError: [Errno 20] Not a directory: out/HighwayEnv/DQNAgent/run_20200406-223934_3509/openaigym.video.0.3509.video000000.meta.json
Can you tell me how to analyze the trained model , and in which way can I see the results of the trained model ?

Implement a A3C/PPO agent

Add train vs test modes for agents

'Segmentation fault' when I was testing in the env ,env_medium with DQNAgent/1_step.

`(gymlab) root@iZ8vbhynnqk42im5ymgijyZ:~/rl-agents/scripts# python3 experiments.py evaluate configs/HighwayEnv/env_medium.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=1000
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), -1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), 1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200405-230154_7882
profiler execution failed
ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory

ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory

ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM default
INFO: Starting new video recorder writing to /root/rl-agents/scripts/out/HighwayEnv/DQNAgent/run_20200405-230154_7882/openaigym.video.0.7882.video000000.mp4
Segmentation fault`
When i was testing the env_medium with DQN , i got this fault. It should be noted that i was using the SSH to test it and the operated way is CPU instead of CUDA.Can u help me ?

Gridworld scenario running issues

Sorry, i find that MCTSAgent fails to run on the gridworld test environments including DummyEnv/gridenv.json (This seems like a toy gridworld implemented by yourself) and Gridworld/empty.json (this is imported from gym_minigrid). I am doing a research project about MCTS these days and really want to test its performance on the gridworld environment. Your open-source code has provided a wonderful platform for test tree-search algorithms. Could you please help fix the gridworld testing issues? Thanks.

How to find the env configure file -- .json file?

I want to test the env, highway_env.Wherase I can't find the configure file --.json file in my folder ,so i can't make t test with your tips: python3 experiments.py evaluate envs/cartpole.json agents/dqn.json --train --episodes=200
Can u help me solve the problem

Load agents and environments from files

Parse json configuration files, allowing to simply configure and run an experiment

recorded mp4 format video cannot play

I run the following code to start training progress.

Under the path of the script directory

python3 experiments.py evaluate configs/HighwayEnv/env_medium.json configs/HighwayEnv/agents/DQNAgent/ddqn.json --train --episodes=1000000

However, the recorded video cannot be played, as attached.

openaigym.video.0.28551.video000000.mp4

Might I request anyone to help?

Vehicle type changed in safe_deepcopy_env()

Hi, I'm using MCTS in your rl-agents repo under the env of your another repo highway_env. In agents/common/factory.py, I understand the function safe_deepcopy_env() copies the current state for simulation in MCTS. However, although I set the "other_vehicles_type": "highway_env.vehicle.behavior.IDMVehicle", the v in k, v in obj.__dict__.items(): has the type: highway_env.vehicle.controller.MDPVehicle(I print type(v) to the console).

I want to change some property of IDMVehicle when copy the state. But with the MDPVehicle, I cannot do that. In which function or .py file is the other vehicle's type changed to MDP class?

Use separate configurable loggers in agents

how to test the training data

Hello, I just start the reinforcement learning decision recently, this is an excellent project for me. But I have some questions when I run this project.
1 I train the data use this command: python experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/ego_attention.json --train --episodes=4000 --name-from-config. after training the data successfully. Where are the deep network parameters saved? saved_models/lastest.tar or in ego_attention_20200514-201107_29818 folder (which .xx file?)
2 I train agent ego_attention.json, dqn.json and ddqn.json and env.json together in different terminal. After that I run command: python experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/ego_attention.json --test --episodes=10 --recover-from=out/IntersectionEnv/DQNAgent/saved_models/latest.tar. The results are strange that ego-vehicle cash other cars all the time and its attention is always on one car even it passed. So only one deep networks weight parameters after training can be saved? or all networks parameters after training can be saved in different files or same file like lastest.tar?
Thank you very much.

Questions about MCTS

Hi, in update_branch function, you update the parent node value with the same total_reward with the current node. Why? I think they have different values, because action from parent node to current node will get a reward which should be added in parent total_reward.

    def update_branch(self, total_reward):
        self.update(total_reward)
        if self.parent:
            self.parent.update_branch(total_reward)

FileNotFoundError: [Errno 2] No such file or directory: 'configs\\logging.json'

(base) C:\Users\nikhi\Desktop\rl-agents>python scripts/experiments.py evaluate configs/CartPoleEnv/env.json configs/CartPoleEnv/DQNAgent.json --train --episodes=200

Traceback (most recent call last):
File "scripts/experiments.py", line 148, in
main()
File "scripts/experiments.py", line 43, in main
evaluate(opts[''], opts[''], opts)
File "scripts/experiments.py", line 56, in evaluate
logger.configure(LOGGING_CONFIG)
File "C:\Users\nikhi\AppData\Roaming\Python\Python37\site-packages\rl_agents\trainer\logger.py", line 50, in configure
with Path(config).open() as f:
File "C:\Users\nikhi\anaconda3\lib\pathlib.py", line 1203, in open
opener=self._opener)
File "C:\Users\nikhi\anaconda3\lib\pathlib.py", line 1058, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'configs\logging.json'

Please help @eleurent

hello，I am a beginner in the reinforcement learning！！

    Hello, first that I'm sorry to bother you again，when I run the command ：

“ python3 experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=10”，
the pygame environment named “highway-env” can only run 2 episode，and the 3 to 7 episode is vert quick ，when the eighty episode bagin，the pygame environment namd “ highway-env” rerun，why does this happend？Another question is that if i want to modify the reward function：i want to use a distance between the Self vehicle and the nearest vehicle which is on same lane to the Self vehicle，how can i modify the reward ？thanks for you help ！I‘m sorry i ask these question which you may feel very low，I am a beginner in the reinforcement learning，thanks a lot！！

Fill missing docstring

How to contribute?

I wanted to know if a contribution is welcomed here, and if it is, how to contribute? I mean, is there any guideline for how we should implement agents?

In fact, I wanted to implement agents like DDPG, SAC, and TD3. Are these in the scope of this project?

Choose a consistent configuration scheme

Explicit API
Configurations dict outside init ?
file: JSON or YAML ?

Some questions about ego attention DQN

Hi, I read your paper Social Attention for Autonomous Decision-Making in Dense Traffic, but I am not clear about why to fed tensors to several heads, and how to select the number of head? In HighwayEnv the number of heads is 2, I wonder if I should change this number in MergeEnv.

pip3 install --user git+https://github.com/eleurent/highway-env
pip3 install --user git+https://github.com/eleurent/rl-agents

Within Python, I was able to import highway_env but could not import rl_agents. Is there anything else I should do to set this up?

Interestingly, when I run pip3 list, rl-agents appears in the list.

eleurent / rl-agents Goto Github PK

rl-agents's Introduction

rl-agents

Installation

Usage

Monitoring

Agents

Planning

MCTS Monte-Carlo Tree Search

Safe planning

Value-based

Safe Value-based

Citing

rl-agents's People

Contributors

Stargazers

Watchers

Forkers

rl-agents's Issues

Recommend Projects

Recommend Topics

Recommend Org

`MCTS` Monte-Carlo Tree Search