
tianhongdai / reinforcement-learning-algorithms


This repository contains PyTorch implementations of classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress.)

Python 100.00%
deep-reinforcement-learning ddpg ppo proximal-policy-optimization deep-learning actor-critic algorithm dqn flappy-bird trpo

reinforcement-learning-algorithms's People

Contributors

tianhongdai


reinforcement-learning-algorithms's Issues

Bug using SAC with torch version 1.8.0a0+963f762

I ran the SAC code with torch (a compiled-from-source version) and encountered the error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

How can I fix it?

The full error is listed below:

Logging to logs//HalfCheetah-v2/
Initial exploration has been finished!
Traceback (most recent call last):
  File "train.py", line 14, in <module>
    sac_trainer.learn()
  File "/home/reinforcement-learning-algorithms/rl_algorithms/sac/sac_agent.py", line 97, in learn
    qf1_loss, qf2_loss, actor_loss, alpha, alpha_loss = self._update_newtork()
  File "/home/reinforcement-learning-algorithms/rl_algorithms/sac/sac_agent.py", line 189, in _update_newtork
    actor_loss.backward()
  File "/home/admin/anaconda3/envs/pytorch_build/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/admin/anaconda3/envs/pytorch_build/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(

This is how I run the code:

python train.py --env-name HalfCheetah-v2 --cuda --seed 1
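
A common cause of this error on newer PyTorch versions is that an optimizer step updates the critic parameters in place before actor_loss.backward() runs, while the actor-loss graph still refers to those parameters. Below is a hedged sketch of the usual reordering workaround; the attribute names (self.qf1, self.actor_optim, self.alpha, etc.) are assumptions, not the repository's exact code, and this only illustrates the ordering inside a method like _update_newtork().

# Hedged sketch (hypothetical names): update the critics first, then rebuild the
# actor loss with a fresh forward pass, so its graph never refers to parameters
# that have already been modified in place by optimizer.step().

# 1. critic update
self.qf1_optim.zero_grad()
self.qf2_optim.zero_grad()
(qf1_loss + qf2_loss).backward()
self.qf1_optim.step()
self.qf2_optim.step()

# 2. freeze the critics so the actor step does not accumulate gradients in them
for param in list(self.qf1.parameters()) + list(self.qf2.parameters()):
    param.requires_grad = False

# 3. actor loss computed *after* the critic step (fresh forward pass)
actions, log_pi = self.actor_net(obs)
q_pi = torch.min(self.qf1(obs, actions), self.qf2(obs, actions))
actor_loss = (self.alpha * log_pi - q_pi).mean()
self.actor_optim.zero_grad()
actor_loss.backward()
self.actor_optim.step()

# 4. unfreeze the critics for the next update
for param in list(self.qf1.parameters()) + list(self.qf2.parameters()):
    param.requires_grad = True

As the error message suggests, running once with torch.autograd.set_detect_anomaly(True) will confirm which in-place operation (here most likely the critic optimizer step) breaks the graph.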

Retraining the saved model

I'm trying to retrain the saved model, but it behaves very strangely:

  1. it does not seem to start from the same behaviour that was saved
  2. it repeats only one type of action after resuming training

I guess this is a PyTorch issue, but maybe you have succeeded in retraining and know how it should be done?

Saving:

def save_model_for_training(self, episode, filepath):
    checkpoint = {
        'episode': episode,
        'state_dict': self.net.state_dict(),
        'optimizer': self.optimizer.state_dict()
    }
    torch.save(checkpoint, filepath)

self.save_model_for_training(episode, filepath=self.model_path + 'model.pt')

Loading saved model:

checkpoint = torch.load(self.model_path + 'model.pt')
self.start_episode = checkpoint['episode']
self.net.load_state_dict(checkpoint['state_dict'])
self.optimizer.load_state_dict(checkpoint['optimizer'])

Thanks a lot in advance
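
For reference, a fuller resume sketch, under the assumption that the odd behaviour comes from one of the usual suspects: the optimizer state being left on the wrong device, the network staying in eval mode, or the exploration schedule restarting from scratch. self.net, self.optimizer, self.start_episode and self.model_path match the snippets above; self.device is hypothetical.

# Hedged sketch of a resume routine (not the repository's code).
import torch

checkpoint = torch.load(self.model_path + 'model.pt', map_location='cpu')
self.start_episode = checkpoint['episode']
self.net.load_state_dict(checkpoint['state_dict'])
self.net.to(self.device)      # move parameters before touching the optimizer
self.net.train()              # make sure the network is not stuck in eval mode

self.optimizer.load_state_dict(checkpoint['optimizer'])
# the restored optimizer state keeps the device it was saved from,
# so move its tensors explicitly to the training device
for state in self.optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.to(self.device)

# if the agent uses an epsilon-greedy or noise schedule, it also needs to be
# saved and restored; otherwise the resumed agent explores as if starting fresh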

How to visualize reward-epoch?

How can I visualize reward vs. epoch?
Does it need extra post-processing code (parsing the printed information), or is there a tool for visualization? I could not find such code in the GitHub project.
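
One approach, assuming the training script only prints progress to stdout (and to the logs/ directory): save the printed lines to a file and parse them with a small post-processing script. The log format matched below is an assumption and would need to be adapted to the actual print statements.

# Hedged sketch: parse saved training output and plot mean reward against updates.
import re
import matplotlib.pyplot as plt

updates, rewards = [], []
with open('train_log.txt') as f:   # hypothetical file containing the saved stdout
    for line in f:
        match = re.search(r'Update:\s*(\d+).*Rewards?:\s*(-?\d+\.?\d*)', line)
        if match:
            updates.append(int(match.group(1)))
            rewards.append(float(match.group(2)))

plt.plot(updates, rewards)
plt.xlabel('update / epoch')
plt.ylabel('mean reward')
plt.savefig('reward_curve.png')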

A3C in description

Hi, just wanted to point out that it says "A3C" in the repo description instead of "A2C" (which is actually implemented).

the same code?

Thanks for your DRL code. But I find that dueling_agent.py for Dueling DQN is the same as ddqn_agent.py for Double DQN. So where is the Dueling DQN algorithm actually implemented in the code?
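
For context, the dueling architecture usually differs only in the network head (separate value and advantage streams), not in the agent or training loop, so identical agent files would still leave room for the difference to live in the model definition. A minimal, generic sketch of a dueling head (not this repository's models.py):

# Hedged sketch of a dueling Q-network head (generic illustration).
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim, num_actions, hidden=256):
        super().__init__()
        # separate value and advantage streams
        self.value = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, num_actions))

    def forward(self, features):
        v = self.value(features)          # (batch, 1)
        a = self.advantage(features)      # (batch, num_actions)
        # mean-subtracted advantage keeps the decomposition identifiable
        return v + a - a.mean(dim=1, keepdim=True)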

SAC agent still fails with "tuple index out of range" when using "MountainCar-v0"

I tried to use the "MountainCar-v0" env with the SAC agent, but it still fails with "tuple index out of range".
Could you tell me how to fix it?
Thanks

Traceback (most recent call last):
  File "D:\RL_library\sac\train.py", line 14, in <module>
    sac_trainer = sac_agent(env, args)
  File "D:\RL_library\sac\sac_agent.py", line 30, in __init__
    self.qf1 = flatten_mlp(self.env.observation_space.shape[0], self.args.hidden_size, self.env.action_space.shape[0])
IndexError: tuple index out of range
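
A likely explanation (an assumption based on the traceback): MountainCar-v0 has a discrete action space, whose shape is an empty tuple, so action_space.shape[0] raises IndexError, while this SAC implementation expects a continuous (Box) action space. A quick check, using the continuous variant of the environment:

# Hedged sketch: inspect the action space before building a continuous-control agent.
import gym

env = gym.make('MountainCar-v0')
print(env.action_space, env.action_space.shape)    # Discrete(3), shape () -> shape[0] fails

env = gym.make('MountainCarContinuous-v0')         # continuous-action variant
print(env.action_space, env.action_space.shape)    # Box(...), shape (1,) -> suitable for SAC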

BrokenPipeError: [Errno 32] Broken pipe

Hi, after installing with pip install -e ., I tried to run:

python train.py --env-name PongNoFrameskip-v4 --cuda 

in reinforcement-learning-algorithms/rl_algorithms/a2c, and I get the following error on Google Colab:

Logging to logs//PongNoFrameskip-v4/
(the line above is printed 16 times)
Traceback (most recent call last):
  File "train.py", line 20, in <module>
    a2c_trainer.learn()
  File "/content/reinforcement-learning-algorithms/rl_algorithms/a2c/a2c_agent.py", line 42, in learn
    _, pi = self.net(input_tensor)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/reinforcement-learning-algorithms/rl_algorithms/a2c/models.py", line 47, in forward
    x = self.cnn_layer(inputs / 255.0)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/reinforcement-learning-algorithms/rl_algorithms/a2c/models.py", line 28, in forward
    x = x.view(-1, 32 * 7 * 7)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Exception ignored in: <bound method SubprocVecEnv.__del__ of <rl_utils.env_wrapper.multi_envs_wrapper.SubprocVecEnv object at 0x7f772325ca58>>
Traceback (most recent call last):
  File "/content/reinforcement-learning-algorithms/rl_utils/env_wrapper/multi_envs_wrapper.py", line 105, in __del__
  File "/content/reinforcement-learning-algorithms/rl_utils/env_wrapper/__init__.py", line 96, in close
  File "/content/reinforcement-learning-algorithms/rl_utils/env_wrapper/multi_envs_wrapper.py", line 89, in close_extras
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
BrokenPipeError: [Errno 32] Broken pipe
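
The broken pipe is only a secondary symptom of the worker processes shutting down; the primary error is the RuntimeError raised in models.py. A minimal fix, following the hint in the error message (the original line is quoted from the traceback, the replacement is a suggestion):

# models.py, forward(): the activation tensor is not contiguous at this point,
# so view() cannot reinterpret its memory. Either make it contiguous first:
x = x.contiguous().view(-1, 32 * 7 * 7)
# or use reshape(), which copies only when necessary:
x = x.reshape(-1, 32 * 7 * 7)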

Add prioritized experience replay

Thanks for the excellent implementations of multiple classic RL agents; I have tried some of them and they worked very well.
Just curious: do you plan to add prioritized experience replay to the DQN? I have tried to embed the OpenAI Baselines PER files (https://github.com/openai/baselines/blob/master/baselines/deepq/replay_buffer.py) into the current DQN, which should be a really easy job, but sadly it doesn't work very well; there seem to be no obvious improvements over the vanilla DQN.
If you have any insights on this, or have tried a similar implementation, please let me know. I really appreciate it.
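
For what it's worth, the two details that most often make a bolted-on PER look no better than uniform replay are dropping the importance-sampling weights from the loss and never writing the new TD errors back as priorities. Below is a hedged sketch of how the Baselines buffer is usually wired into a DQN update; q_net, target_net and the optimizer are assumptions, not this repository's code.

# Hedged sketch: DQN update with OpenAI Baselines' PrioritizedReplayBuffer.
import numpy as np
import torch
from baselines.deepq.replay_buffer import PrioritizedReplayBuffer

buffer = PrioritizedReplayBuffer(int(1e5), alpha=0.6)
# during rollouts: buffer.add(obs, action, reward, next_obs, float(done))

def update(q_net, target_net, optimizer, batch_size=32, gamma=0.99, beta=0.4):
    # beta is usually annealed toward 1 over the course of training
    obs, actions, rewards, next_obs, dones, weights, idxes = buffer.sample(batch_size, beta)
    obs = torch.as_tensor(obs, dtype=torch.float32)
    next_obs = torch.as_tensor(next_obs, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)
    weights = torch.as_tensor(weights, dtype=torch.float32)

    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_obs).max(1)[0]

    td_error = q_values - target
    # importance-sampling weights correct the bias introduced by prioritized sampling
    loss = (weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # write the new priorities back (small epsilon keeps them strictly positive)
    buffer.update_priorities(idxes, np.abs(td_error.detach().numpy()) + 1e-6)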

Plotted Reward Scale

Hi, I am Oscar, and I really appreciate these source codes integrating various algorithms.

I have tried to run the nature DQN with the default settings on the Pong and BeamRider environments, and found that the reward scale is not as large as the one posted on the main page.
For the Pong environment,
I manually set clip_rewards = False and got a final mean around 27.430, which is far from the max level (around 300) posted.

Is it due to a different hyper-parameter setting, or maybe due to some plotting technique?
BTW, I would really appreciate it if you could update the plotting code. Thank you!

[Screenshot from 2021-10-27 13-15-38]

How to use the results?

Could you please tell me how to use the results?
[image]
I want the data as something like: mean_reward & time_step
