
rl-agents's Introduction

rl-agents

A collection of Reinforcement Learning agents


Installation

pip install --user git+https://github.com/eleurent/rl-agents

Usage

Most experiments can be started by moving to the scripts directory (cd scripts) and running python experiments.py

Usage:
  experiments evaluate <environment> <agent> (--train|--test)
                                             [--episodes <count>]
                                             [--seed <str>]
                                             [--analyze]
  experiments benchmark <benchmark> (--train|--test)
                                    [--processes <count>]
                                    [--episodes <count>]
                                    [--seed <str>]
  experiments -h | --help

Options:
  -h --help            Show this screen.
  --analyze            Automatically analyze the experiment results.
  --episodes <count>   Number of episodes [default: 5].
  --processes <count>  Number of running processes [default: 4].
  --seed <str>         Seed the environments and agents.
  --train              Train the agent.
  --test               Test the agent.

The evaluate command allows you to evaluate a given agent on a given environment. For instance,

# Train a DQN agent on the CartPole-v0 environment
$ python3 experiments.py evaluate configs/CartPoleEnv/env.json configs/CartPoleEnv/DQNAgent.json --train --episodes=200

Every agent interacts with the environment following a standard interface:

action = agent.act(state)
next_state, reward, done, info = env.step(action)
agent.record(state, action, reward, next_state, done, info)
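
For concreteness, here is a minimal sketch of an episode loop built on this interface; it assumes an env and an agent have already been instantiated, and follows the legacy gym step signature shown above.

def run_episode(env, agent):
    """Run a single episode and return its total (undiscounted) reward."""
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(state)  # query the agent's current policy
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, reward, next_state, done, info)  # let the agent learn from the transition
        state = next_state
        total_reward += reward
    return total_reward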

The environments are described by their gym id, and the module that must be imported to register them.

{
    "id": "CartPole-v0",
    "import_module": "gym"
}
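
As an illustration, such a file could be turned into an environment instance as sketched below; make_env_from_config is a hypothetical helper written for this example (the library ships its own loader in rl_agents.agents.common.factory), but the idea is the same: import the registration module, then build the environment from its gym id.

import importlib
import json

import gym

def make_env_from_config(path):
    """Hypothetical helper: load an environment description and instantiate it."""
    with open(path) as f:
        config = json.load(f)
    importlib.import_module(config["import_module"])  # importing the module registers the gym id
    return gym.make(config["id"])

env = make_env_from_config("configs/CartPoleEnv/env.json")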

And the agents by their class and a configuration dictionary.

{
    "__class__": "<class 'rl_agents.agents.deep_q_network.pytorch.DQNAgent'>",
    "model": {
        "type": "MultiLayerPerceptron",
        "layers": [512, 512]
    },
    "gamma": 0.99,
    "n_steps": 1,
    "batch_size": 32,
    "memory_capacity": 50000,
    "target_update": 1,
    "exploration": {
        "method": "EpsilonGreedy",
        "tau": 50000,
        "temperature": 1.0,
        "final_temperature": 0.1
    }
}

If keys are missing from these configurations, values in agent.default_config() will be used instead.
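
This merge can be pictured as a recursive dictionary update, roughly as sketched below (an illustration of the behaviour, not the library's actual code): user-provided keys override the defaults, and nested dictionaries are merged key by key.

def merge_config(default, user):
    """Recursively override default configuration values with user-provided ones."""
    merged = dict(default)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested dictionaries
        else:
            merged[key] = value  # the user value takes precedence
    return merged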

Finally, a batch of experiments can be scheduled in a benchmark. All experiments are then executed in parallel on several processes.

# Run a benchmark of several agents interacting with environments
$ python3 experiments.py benchmark cartpole_benchmark.json --test --processes=4

A benchmark configuration file contains a list of environment configurations and a list of agent configurations.

{
    "environments": ["envs/cartpole.json"],
    "agents": ["agents/dqn.json", "agents/mcts.json"]
}
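
As a rough sketch, such a benchmark could be dispatched over a pool of worker processes as follows; run_experiment is a placeholder standing in for the evaluation of one agent on one environment.

import itertools
import json
from multiprocessing import Pool

def run_experiment(env_config, agent_config):
    # Placeholder: load the environment and agent from their JSON files,
    # run the evaluation, and return the collected statistics.
    return {"environment": env_config, "agent": agent_config}

def run_benchmark(benchmark_path, processes=4):
    with open(benchmark_path) as f:
        benchmark = json.load(f)
    jobs = list(itertools.product(benchmark["environments"], benchmark["agents"]))
    with Pool(processes=processes) as pool:
        return pool.starmap(run_experiment, jobs)  # one (environment, agent) experiment per job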

Monitoring

There are several tools available to monitor the agent's performance:

  • Run metadata: for the sake of reproducibility, the environment and agent configurations used for the run are merged and saved to a metadata.*.json file.
  • Gym Monitor: the main statistics (episode rewards, lengths, seeds) of each run are logged to an episode_batch.*.stats.json file. They can be automatically visualised by running scripts/analyze.py
  • Logging: agents can send messages through the standard python logging library. By default, all messages with log level INFO are saved to a logging.*.log file. Run scripts/experiments.py with the --verbose option to also save messages with log level DEBUG.
  • Tensorboard: by default, a tensorboard writer records useful scalars, images and model graphs to the run directory. It can be visualized by running: tensorboard --logdir <path-to-runs-dir>

Agents

The following agents are currently implemented:

Planning

Value Iteration

Performs Value Iteration to compute the state-action value function, and acts greedily with respect to it.

Only compatible with finite-mdp environments, or environments that handle an env.to_finite_mdp() conversion method.

Reference: Dynamic Programming, Bellman R., Princeton University Press (1957).
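
A minimal sketch of the underlying computation, assuming a tabular finite MDP given by transition probabilities P[s, a, s'] and rewards R[s, a] (the agent's actual implementation may differ in its stopping criterion and configuration):

import numpy as np

def value_iteration(P, R, gamma=0.99, iterations=100):
    """Return the optimal state-action value function of a finite MDP."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iterations):
        V = Q.max(axis=1)          # greedy state values
        Q = R + gamma * (P @ V)    # Bellman optimality backup
    return Q                       # act greedily: best_action = Q[state].argmax()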

Cross-Entropy Method (CEM)

A sampling-based planning algorithm, in which sequences of actions are drawn from a prior Gaussian distribution. This distribution is iteratively bootstrapped by minimizing its cross-entropy to a target distribution approximated by the top-k candidates.

Only compatible with continuous action spaces. The environment is used as an oracle dynamics and reward model.

Reference: A Tutorial on the Cross-Entropy Method, De Boer P-T., Kroese D.P, Mannor S. and Rubinstein R.Y. (2005).
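
The following sketch illustrates the idea for a one-dimensional continuous action, assuming an evaluate(sequence) oracle that simulates a copy of the environment and returns the total reward of a candidate action sequence; the agent's actual hyperparameters and details may differ.

import numpy as np

def cem_plan(evaluate, horizon=10, iterations=5, population=100, elite_frac=0.1):
    """Return the first action of the best action sequence found by cross-entropy search."""
    mean, std = np.zeros(horizon), np.ones(horizon)
    n_elite = max(1, int(elite_frac * population))
    for _ in range(iterations):
        candidates = np.random.randn(population, horizon) * std + mean   # sample sequences from the prior
        returns = np.array([evaluate(sequence) for sequence in candidates])
        elites = candidates[np.argsort(returns)[-n_elite:]]              # keep the top-k candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6       # refit the Gaussian distribution
    return mean[0]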

Monte-Carlo Tree Search (MCTS)

A world transition model is leveraged for trajectory search. A look-ahead tree is expanded so as to explore the trajectory space and quickly focus around the most promising moves.


Upper Confidence Trees (UCT)

The tree is traversed by iteratively applying an optimistic selection rule at each depth, and the value at the leaves is estimated by sampling. Empirical evidence shows that this popular algorithm performs well in many applications, but it has been proven theoretically to achieve a much worse performance (doubly-exponential) than uniform planning on some problems.
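
As an illustration, a generic UCB1-style selection rule could look as follows, assuming each child node stores a value estimate and a visit count; the exact rule and constants used in the implementation may differ.

import numpy as np

def select_child(children, exploration=1.0):
    """Pick the (action, node) pair maximizing an optimistic value estimate."""
    total_count = sum(child.count for child in children.values())
    def ucb(child):
        exploration_bonus = exploration * np.sqrt(np.log(total_count + 1) / (child.count + 1))
        return child.value + exploration_bonus
    return max(children.items(), key=lambda item: ucb(item[1]))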


Optimistic Planning for Deterministic systems (OPD)

This algorithm is tailored for systems with deterministic dynamics and rewards. It exploits the reward structure to achieve a polynomial regret rate, and behaves efficiently in numerical experiments with dense rewards.

Reference: Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).
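
A sketch of the optimistic expansion criterion, assuming each leaf stores the discounted sum of rewards collected along its path together with its depth, and that rewards are bounded in [0, 1] (illustrative only, not the repository's exact code):

def b_value(discounted_return, depth, gamma=0.95):
    """Optimistic upper bound on the value of any continuation below a leaf."""
    return discounted_return + gamma ** depth / (1 - gamma)

def select_leaf_to_expand(leaves, gamma=0.95):
    """Expand the leaf with the highest optimistic bound."""
    return max(leaves, key=lambda leaf: b_value(leaf.discounted_return, leaf.depth, gamma))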

References:

  • Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).
  • Scale-free adaptive planning for deterministic dynamics & discounted rewards, Bartlett P., Gabillon V., Healey J., Valko M. (2019).

Safe planning

Robust Value Iteration

A list of possible finite-mdp models is provided in the agent configuration. The MDP ambiguity set is constrained to be rectangular: different models can be selected at every transition. The corresponding robust state-action value is computed so as to maximize the worst-case total reward.
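
A sketch of the corresponding robust Bellman backup, assuming each candidate model is given as tabular transitions P[s, a, s'] and rewards R[s, a] (an illustration of the principle rather than the agent's actual code):

import numpy as np

def robust_value_iteration(models, gamma=0.99, iterations=100):
    """models: list of (P, R) pairs describing the rectangular ambiguity set."""
    n_states, n_actions, _ = models[0][0].shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iterations):
        V = Q.max(axis=1)
        # worst case over models, taken independently at each state-action pair
        Q = np.min([R + gamma * (P @ V) for P, R in models], axis=0)
    return Q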


The MDP ambiguity set is assumed to be finite, and is constructed from a list of modifiers to the true environment. The corresponding robust value is approximately computed by Deterministic Optimistic Planning so as to maximize the worst-case total reward.


We assume that the MDP is a parametrized dynamical system, whose parameter is uncertain and lies in a continuous ambiguity set. We use interval prediction to compute the set of states that can be reached at any time t, given that uncertainty, and leverage it to evaluate and improve a robust policy.

If the system is Linear Parameter-Varying (LPV) with polytopic uncertainty, a fast and stable interval predictor can be designed. Otherwise, sampling-based approaches can be used instead, with an increased computational load.
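
As a rough illustration of the sampling-based variant, the reachable set can be approximated by propagating sampled parameters and taking coordinate-wise bounds; step and sample_theta are placeholders for the parametrized dynamics and the ambiguity set, and the LPV-based predictor of the references is tighter and cheaper than this naive Monte-Carlo version.

import numpy as np

def predict_interval(state, actions, step, sample_theta, n_samples=100):
    """Return coordinate-wise lower/upper bounds on the reachable states at each time step."""
    trajectories = []
    for _ in range(n_samples):
        theta, x = sample_theta(), np.array(state, dtype=float)
        trajectory = []
        for action in actions:
            x = np.asarray(step(x, action, theta))   # propagate one sampled model
            trajectory.append(x.copy())
        trajectories.append(trajectory)
    trajectories = np.array(trajectories)                       # (samples, horizon, state_dim)
    return trajectories.min(axis=0), trajectories.max(axis=0)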


Value-based

Deep Q-Network (DQN)

A neural-network model is used to estimate the state-action value function and produce a greedy optimal policy.

Implemented variants:

  • Double DQN
  • Dueling architecture
  • N-step targets
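
For instance, a (double) DQN target can be computed as sketched below, assuming batched PyTorch tensors and two networks value_net and target_net mapping states to action values (illustrative, not the repository's exact code):

import torch

def compute_dqn_target(rewards, next_states, dones, value_net, target_net, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    with torch.no_grad():
        best_actions = value_net(next_states).argmax(dim=1, keepdim=True)
        next_values = target_net(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * next_values * (1 - dones)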


Fitted-Q (FTQ)

A Q-function model is trained by performing each step of Value Iteration as a supervised learning procedure applied to a batch of transitions covering most of the state-action space.

Reference: Tree-Based Batch Mode Reinforcement Learning, Ernst D. et al (2005).
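
A minimal sketch of this procedure with a tree-based regressor, assuming a fixed batch of transitions and a finite action set (an illustration of the principle, not the repository's implementation):

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(states, actions, rewards, next_states, dones,
                       n_actions, gamma=0.99, iterations=50):
    """Return a regressor approximating Q(s, a) from a fixed batch of transitions."""
    X = np.column_stack([states, actions])
    model = None
    for _ in range(iterations):
        if model is None:
            targets = rewards  # first iteration: Q is the immediate reward
        else:
            next_q = np.column_stack([
                model.predict(np.column_stack([next_states, np.full(len(next_states), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * (1 - dones) * next_q.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)  # one supervised regression per VI step
    return model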

Safe Value-based

Budgeted Fitted-Q (BFTQ)

An adaptation of FTQ to the budgeted setting: we maximise the expected reward r of a policy π under the constraint that its expected cost c remains under a given budget β. The policy π(a | s, β) is conditioned on this cost budget β, which can be changed online.

To that end, the Q-function model is trained to predict both the expected reward Qr and the expected cost Qc of the optimal constrained policy π.

This agent can only be used with environments that provide a cost signal in their info field:

>>> obs, reward, done, info = env.step(action)
>>> info
{'cost': 1.0}

Reference: Budgeted Reinforcement Learning in Continuous State Space, Carrara N., Leurent E., Laroche R., Urvoy T., Maillard O-A., Pietquin O. (2019).

Citing

If you use this project in your work, please consider citing it with:

@misc{rl-agents,
  author = {Leurent, Edouard},
  title = {rl-agents: Implementations of Reinforcement Learning algorithms},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/eleurent/rl-agents}},
}

rl-agents's People

Contributors

ashishrana160796, davidwitten, eleurent, gamenot, kexianshen, sritee


rl-agents's Issues

stochastic transitions for tree search agents

Eleurent, thanks for developing this great project and sharing it.

To my understanding, the MCTS agent currently transitions deterministically to new states during the planning phase. I was wondering which class you would modify to account for stochastic transitions? For instance, if Gaussian noise were added to the actions executed by other agents.

Thank you in advance

IndexError: tuple index out of range

When I try to print the value of in_width, I get this result:
IndexError: tuple index out of range
When I print env.observation_space, I get a Box of shape (15, 7).
Can someone explain to me how to use in_width? Thanks.

Hi, a question about the reward function.

I'm very sorry to disturb you again.
In highway-env, the reward function depends on collisions, on the agent being in the right lane, and on a high-velocity reward. I understand the collision reward: when the agent collides with another vehicle, we get a collision reward and the simulation ends. About the high-velocity reward, I have two questions:
The first is that the high-velocity reward uses a variable named "SPEED_COUNT", defined in the file vehicle/control.py and used by speed_to_index, but I don't know the meaning of this variable.
The second is: how often is the high-velocity reward returned while the agent is trained in the environment? Is it given once per state, once per second, or does it depend on the last state? Could you tell me in which files these questions are addressed? Thank you very much!

ImportError: No module named 'rl_agents.agents.budgeted_ftq'

Hey, I installed highway-env via:
pip3 install --user git+https://github.com/eleurent/highway-env
and the rl-agents with:
pip3 install --user git+https://github.com/eleurent/rl-agents

When I run
python3 experiments.py benchmark configs/RoundaboutEnv/benchmark_robust_control.json \ --test --episodes=100 --processes=4

I get the error:
ImportError: No module named 'rl_agents.agents.budgeted_ftq'

What could cause this problem?

Questions from the entrant

FileNotFoundError: [Errno 2] No such file or directory: 'envs/CartPole.json'

Hello, I used your example python3 experiments.py evaluate envs/CartPole.json agents/dqn.json --train --episodes=200 to do a test run. I have installed OpenAI Gym in my own virtual environment. What is the reason for this error?

Analyze Data

Hello,
I was able to run an experiment with the HighwayEnv successfully and now I am trying to analyze the data. Unfortunately I cannot seem to figure it out. I am running the command

!python3 scripts/analyze.py run path/to/data/openaigym.episode_batch.0.953.stats.json

but I keep getting the following error


INFO:root:Fetching data in out/HighwayEnv/DQNAgent/run_20200413-214123_953/run/openaigym.episode_batch.0.953.stats
INFO:root:Found 0 runs.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'episode'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "rl-agents/scripts/analyze.py", line 131, in
main()
File "rl-agents/scripts/analyze.py", line 36, in main
RunAnalyzer(opts['<run_folder>'], out=opts["--out"], episodes_range=episodes_range)
File "rl-agents/scripts/analyze.py", line 49, in init
self.analyze()
File "rl-agents/scripts/analyze.py", line 97, in analyze
self.find_best_run()
File "rl-agents/scripts/analyze.py", line 111, in find_best_run
df = df[df["episode"] == df["episode"].max()].sort_values(criteria, ascending=ascending)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2800, in getitem
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'episode'


I assume the issue with pandas stems from the fact that it says "Found 0 runs" and therefore could not pass that data on. I'm not too familiar with json files and I've tried running

!python3 scripts/analyze.py run path/to/data/

among many other things to get it to work, but I have not been successful. Could you help me figure out how to analyze the data? Are there specific files that need to be in the same folder as the stats.json?

Possible issue about the robust value iteration

Hi,

I was trying to run the experiment file using the RVI algorithm, but I get the following error. I would appreciate it if you could let me know whether I am doing something wrong.
My bash command:
python3 experiments.py evaluate configs/FiniteMDPEnv/large/env_2.json configs/FiniteMDPEnv/large/agents/value_iteration.json --train

The error:
Traceback (most recent call last):
File "experiments.py", line 149, in
main()
File "experiments.py", line 44, in main
evaluate(opts[''], opts[''], opts)
File "experiments.py", line 60, in evaluate
env = load_environment(environment_config)
File "/.local/lib/python3.6/site-packages/rl_agents/agents/common/factory.py", line 69, in load_environment
import(env_config["import_module"])
ModuleNotFoundError: No module named 'finite_mdp'

Hi !

Hello! I have a problem with ego_attention.json. When I use ego_attention.json to train the agent in env_obs_attention, the following error happens:
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200408-221242_4475
Traceback (most recent call last):
File "experiments.py", line 148, in
main()
File "experiments.py", line 43, in main
evaluate(opts[''], opts[''], opts)
File "experiments.py", line 75, in evaluate
display_rewards=not options['--no-display'])
File "/home/cfxgg/rl-agents-master/rl_agents/trainer/evaluation.py", line 82, in init
self.agent.set_writer(self.writer)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/deep_q_network/pytorch.py", line 98, in set_writer
dtype=torch.float, device=self.device))
File "/home/cfxgg/conda/envs/test/lib/python3.7/site-packages/tensorboardX/writer.py", line 804, in add_graph
self._get_file_writer().add_graph(graph(model, input_to_model, verbose, profile_with_cuda, **kwargs))
File "/home/cfxgg/conda/envs/test/lib/python3.7/site-packages/tensorboardX/pytorch_graph.py", line 344, in graph
result = model(*args)
File "/home/cfxgg/conda/envs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/common/models.py", line 284, in forward
ego_embedded_att, _ = self.forward_attention(x)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/common/models.py", line 296, in forward_attention
ego, others, mask = self.split_input(x)
File "/home/cfxgg/rl-agents-master/rl_agents/agents/common/models.py", line 289, in split_input
ego = x[:, 0:1, :]
IndexError: too many indices for tensor of dimension 2
How can I solve this problem? I look forward to your reply!

Questions about selection_rule in OPD

Hi, in the selection_rule function in deterministic.py, the agent chooses the action with the maximum value_upper, but in the OPD paper, the algorithm expands according to the maximum value_upper while it chooses the action according to value_lower.

Flawed Management of internal search environments in tree search planning

Some of the tree search algorithms implemented may show misleadingly high performance because of how the environments are managed inside the tree search. Specifically, to conduct the search, the environment is copied and passed to the planners, but the environment seed is copied as well. This results in a kind of "foreseeing the future", because the planners optimize over the actual random realizations instead of in expectation. This happens in the OLOP planner and also in the deterministic planner (OPD). In the deterministic planner this is not that serious, since it is intended for deterministic transitions, but in practice, if you run this planner on a stochastic environment, the effect is that it performs amazingly well (because it can "predict" the exact future realizations).
This can be easily fixed by setting a random seed to the environments after copying them for the planners, e.g. adding the seeding in the plan method of the planner.
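
A sketch of the suggested fix, with illustrative names since the actual planner API may differ: re-seed the copied environment before handing it to the planner, so that the search no longer optimizes over the exact realizations the true environment will produce.

import copy

def plan_with_fresh_seed(env, planner, seed=None):
    search_env = copy.deepcopy(env)   # the copy inherits the original environment's random state...
    search_env.seed(seed)             # ...so re-seed it before planning to avoid foreseeing the future
    return planner.plan(search_env)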

Indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead

/pytorch/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.

When I test the environment env_easy.json with the agent baseline.json, this warning appears repeatedly on my screen. How can I suppress it or make it disappear from my screen?

DeterministicPlannerAgent checkpoint saving error

I have tried the DeterministicPlannerAgent in the Highway Env. During training, the terminal reported that a checkpoint model was saved. However, I could not find the saved checkpoint policy at the claimed location.

The car cannot perform like the GIF after training converged

Hi Edouard Leurent, this is a great project for RL practitioners. But I have encountered a problem: when I run
"python scripts/experiments.py evaluate scripts/configs/HighwayEnv/env.json scripts/configs/HighwayEnv/agents/DQNAgent/baseline.json --train --episodes=2000", the score slowly converges to about 30, like:

"[INFO] Episode 595 score: 30.1
[INFO] Episode 596 score: 29.8
[INFO] Episode 597 score: 30.7
[INFO] Episode 598 score: 29.1
[INFO] Episode 599 score: 30.3
[INFO] Episode 600 score: 29.9
[INFO] Episode 601 score: 30.7
[INFO] Episode 602 score: 30.5
[INFO] Episode 603 score: 30.0
[INFO] Episode 604 score: 29.0"

But every time the video starts recording the vehicle, it runs into another car or cannot accelerate to overtake. Is this baseline.json the one used for the GIF you added to the highway-env repo, or am I misunderstanding something?

I would appreciate any suggestion you can give me. Thank you!

How to analyze the data after training has completed successfully.

I'm sorry to bother you again, but I can't find a way to analyze the training data. I want to see the results of the DQNAgent method and how good it is.

[rl_agents.trainer.evaluation:INFO] Saved DQNAgent model to /home/zhao/gym/rl-agents-master/scripts/out/HighwayEnv/DQNAgent/run_20200406-223934_3509/checkpoint-999.tar [rl_agents.trainer.evaluation:INFO] Saved DQNAgent model to /home/zhao/gym/rl-agents-master/scripts/out/HighwayEnv/DQNAgent/run_20200406-223934_3509/checkpoint-final.tar

So, you can see it has been completely trained. Then, I used the command python3 analyze.py run out/HighwayEnv/DQNAgent/run_20200406-223934_3509 to analyze the trained model, but I get this error:
NotADirectoryError: [Errno 20] Not a directory: out/HighwayEnv/DQNAgent/run_20200406-223934_3509/openaigym.video.0.3509.video000000.meta.json
Can you tell me how to analyze the trained model, and how I can see its results?

'Segmentation fault' when testing the env_medium environment with DQNAgent/1_step.

`(gymlab) root@iZ8vbhynnqk42im5ymgijyZ:~/rl-agents/scripts# python3 experiments.py evaluate configs/HighwayEnv/env_medium.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=1000
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), -1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), 1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200405-230154_7882
profiler execution failed
ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory

ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory

ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM default
INFO: Starting new video recorder writing to /root/rl-agents/scripts/out/HighwayEnv/DQNAgent/run_20200405-230154_7882/openaigym.video.0.7882.video000000.mp4
Segmentation fault`
When I was testing env_medium with DQN, I got this fault. It should be noted that I was testing it over SSH, and computation runs on the CPU instead of CUDA. Can you help me?

Gridworld scenario running issues

Sorry, I find that MCTSAgent fails to run on the gridworld test environments, including DummyEnv/gridenv.json (this seems to be a toy gridworld implemented by yourself) and Gridworld/empty.json (imported from gym_minigrid). I am doing a research project about MCTS these days and really want to test its performance on gridworld environments. Your open-source code has provided a wonderful platform for testing tree-search algorithms. Could you please help fix the gridworld testing issues? Thanks.

How to find the env configuration file (.json)?

I want to test the highway_env environment, but I can't find the configuration .json file in my folder, so I can't run the test following your tip: python3 experiments.py evaluate envs/cartpole.json agents/dqn.json --train --episodes=200
Can you help me solve this problem?

recorded mp4 format video cannot play

I ran the following command to start the training process, from the scripts directory:

python3 experiments.py evaluate configs/HighwayEnv/env_medium.json configs/HighwayEnv/agents/DQNAgent/ddqn.json --train --episodes=1000000

However, the recorded video cannot be played, as attached.

openaigym.video.0.28551.video000000.mp4

Might I request anyone to help?

Vehicle type changed in safe_deepcopy_env()

Hi, I'm using MCTS from your rl-agents repo in the environment of your other repo, highway_env. In agents/common/factory.py, I understand that the function safe_deepcopy_env() copies the current state for simulation in MCTS. However, although I set "other_vehicles_type": "highway_env.vehicle.behavior.IDMVehicle", the v in for k, v in obj.__dict__.items(): has the type highway_env.vehicle.controller.MDPVehicle (I printed type(v) to the console).

I want to change some properties of the IDMVehicle when copying the state, but with MDPVehicle I cannot do that. In which function or .py file is the other vehicles' type changed to the MDP class?

how to test the training data

Hello, I just started with reinforcement learning for decision-making recently, and this is an excellent project for me. But I have some questions when running it.
1. I train with this command: python experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/ego_attention.json --train --episodes=4000 --name-from-config. After training completes successfully, where are the deep network parameters saved? In saved_models/latest.tar, or in the ego_attention_20200514-201107_29818 folder (which file)?
2. I trained the agents ego_attention.json, dqn.json and ddqn.json with env.json together in different terminals. After that, I ran the command: python experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/ego_attention.json --test --episodes=10 --recover-from=out/IntersectionEnv/DQNAgent/saved_models/latest.tar. The results are strange: the ego vehicle crashes into other cars all the time, and its attention is always on one car even after it has passed. So can only one network's weights be saved after training, or can all networks' parameters be saved, either in different files or in the same file like latest.tar?
Thank you very much.

Questions about MCTS

Hi, in the update_branch function, you update the parent node's value with the same total_reward as the current node. Why? I think they should have different values, because the action from the parent node to the current node yields a reward that should be added to the parent's total_reward.

    def update_branch(self, total_reward):
        self.update(total_reward)
        if self.parent:
            self.parent.update_branch(total_reward)

FileNotFoundError: [Errno 2] No such file or directory: 'configs\\logging.json'

(base) C:\Users\nikhi\Desktop\rl-agents>python scripts/experiments.py evaluate configs/CartPoleEnv/env.json configs/CartPoleEnv/DQNAgent.json --train --episodes=200

Traceback (most recent call last):
File "scripts/experiments.py", line 148, in
main()
File "scripts/experiments.py", line 43, in main
evaluate(opts[''], opts[''], opts)
File "scripts/experiments.py", line 56, in evaluate
logger.configure(LOGGING_CONFIG)
File "C:\Users\nikhi\AppData\Roaming\Python\Python37\site-packages\rl_agents\trainer\logger.py", line 50, in configure
with Path(config).open() as f:
File "C:\Users\nikhi\anaconda3\lib\pathlib.py", line 1203, in open
opener=self._opener)
File "C:\Users\nikhi\anaconda3\lib\pathlib.py", line 1058, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'configs\logging.json'

Please help @eleurent

Hello, I am a beginner in reinforcement learning!

Hello, first of all, I'm sorry to bother you again. When I run the command:

"python3 experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=10",
the pygame environment named "highway-env" only runs 2 episodes, episodes 3 to 7 are very quick, and when the eighth episode begins, the pygame environment named "highway-env" reruns. Why does this happen? Another question: if I want to modify the reward function to use the distance between the ego vehicle and the nearest vehicle on the same lane, how can I modify the reward? Thanks for your help! I'm sorry if these questions seem very basic; I am a beginner in reinforcement learning. Thanks a lot!

How to contribute?

I wanted to know whether contributions are welcome here, and if so, how to contribute? I mean, are there any guidelines for how agents should be implemented?

In fact, I wanted to implement agents like DDPG, SAC, and TD3. Are these in the scope of this project?

Some questions about ego attention DQN

Hi, I read your paper Social Attention for Autonomous Decision-Making in Dense Traffic, but I am not clear on why tensors are fed to several heads, or on how to select the number of heads. In HighwayEnv the number of heads is 2; I wonder if I should change this number for MergeEnv.

Can't import rl_agents in Python 3.7.1

I followed the instructions listed in the README to import rl_agents, but I wasn't able to get it to work.

I'm running Python 3.7.1 on Mac, and I ran the following commands:

pip3 install --user git+https://github.com/eleurent/highway-env
pip3 install --user git+https://github.com/eleurent/rl-agents

Within Python, I was able to import highway_env but could not import rl_agents. Is there anything else I should do to set this up?

Interestingly, when I run pip3 list, rl-agents appears in the list.
