uvipen / super-mario-bros-ppo-pytorch Goto Github PK

View Code? Open in Web Editor NEW

1.0K 28.0 189.0 181.86 MB

Proximal Policy Optimization (PPO) algorithm for Super Mario Bros

License: MIT License

Dockerfile 1.81% Python 98.19%

reinforcement-learning ppo ppo2 pytorch gym python3 python deep-learning super-mario-bros mario

super-mario-bros-ppo-pytorch's People

Contributors

Stargazers

Watchers

Forkers

sudo-raza thuync pjain261 lehoailinh278 manishbaral1 muzammil-md-ai gplast johndpope akuma527 tanmaysgs jiaodaxiaozi chorseng hongthana choojoshua beyonddream-productions laguna-liu sachin-n-ai ntviet18 rodrigoalvort frankfan007 reedip gbhanuteja23 selvadevan hephaex shakyarujan llt1 online-articles cqllzp tungsontran jackthewhite a1tairai dank33r liangzai951 davincibj fmzwy jithinraj shashgpt sekmet wamba85 ztupidts krisller dut3062796s lving bifengjiao deeplearning2012 dimitriosp paulrolsen mohdimran043 hanclinto npg-andrewhuynh anotherother ashiquebiniqbal nlebang feiya1314 alomdaelmasry nisheethjaiswal czh513 ravi1g ducthangqd1998 zeta1999 numz mojomoon shadowkun aicomputervision tony23545 xiong3134 pralhadstha xrosliang thexu tungvuthanh pepsi7959 bluesky92 hjbplayer adv27 flyer822 panyicast zhl2013 masonyang bbpatil yakitoritabetai wang-xiaoyang zhukkang play3577 qiaoshiji dfenglei okap5811 tfgzs yw10 freegliboracle zhoutingchuan diegosiqueir4 coolemiao tuanlhbk mohitjuneja xiaolonghao yanchenta chaoninglin fil82 kalumsneha dieptran43

super-mario-bros-ppo-pytorch's Issues

I think the next step you can use one model weights to play all world

gym conflict

base ❯ pip install nes_py-8.1.2-cp37-cp37m-macosx_10_15_x86_64.whl
Processing ./nes_py-8.1.2-cp37-cp37m-macosx_10_15_x86_64.whl
Requirement already satisfied: numpy>=1.18.5 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from nes-py==8.1.2) (1.19.2)
Requirement already satisfied: tqdm>=4.32.2 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from nes-py==8.1.2) (4.49.0)
Collecting gym>=0.17.2
Using cached gym-0.18.0-py3-none-any.whl
Requirement already satisfied: numpy>=1.18.5 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from nes-py==8.1.2) (1.19.2)
Requirement already satisfied: Pillow<=7.2.0 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from gym>=0.17.2->nes-py==8.1.2) (7.2.0)
Requirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from gym>=0.17.2->nes-py==8.1.2) (1.6.0)
Requirement already satisfied: scipy in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from gym>=0.17.2->nes-py==8.1.2) (1.5.2)
Using cached gym-0.17.3-py3-none-any.whl
Using cached gym-0.17.2-py3-none-any.whl
INFO: pip is looking at multiple versions of nes-py to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install nes-py and nes-py==8.1.2 because these package versions have conflicting dependencies.

The conflict is caused by:
nes-py 8.1.2 depends on pyglet>=1.5.5
gym 0.18.0 depends on pyglet<=1.5.0 and >=1.4.0
nes-py 8.1.2 depends on pyglet>=1.5.5
gym 0.17.3 depends on pyglet<=1.5.0 and >=1.4.0
nes-py 8.1.2 depends on pyglet>=1.5.5
gym 0.17.2 depends on pyglet<=1.5.0 and >=1.4.0

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict

OS：macOS 10.15.7

a question regarding action sampling

Hi,
Thank you so much for your work, it inspires me a lot. One thing I'm not clear is whether I should use action sampling in the rollout phase. From the PPO blog's code (repo: CleanRL, The 37 Implementation Details of Proximal Policy Optimization), seems it should have. I hope you can help me explain. Thank you

Best regards,
Yichao

Please correct code in C#, this is same like super mario b game

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Artemismovement : MonoBehaviour
{

public float moveSpeed, jumpForce;

public bool Jumping;

public Rigidbody2D RG2D;
public int moveX, moveY;

// Start is called before the first frame update
void Start()
{
    RG2D= GetComponent<Rigidbody2D>();

    moveSpeed= 11f; 
    jumpForce= 16f; 

    Jumping= true;  
}

// Update is called once per frame
void Update()
{
   moveX= input.getAxisraw("Horizontal"); 
   moveY=  input.getAxisraw("Vertical");


 //Horizontal Movement (X-Axis)
    if (moveX !=0)
    {
        RG2D.velocity= new Vector2(moveSpeed * moveX, RG2D.velocity.y);
    }


    //Jumping 
    if(moveY==1 && !Jumping)
    {
        RG2D.velocity= new Vector2(RG2D.velocity.x, jumpForce);
        Jumping= true;

    }

    //Crouching
    if(moveY==-1)
    {
        transform.localScale= new Vector2(1f, 0.5f); 
    }

    else
    {
        transform.localScale= new Vector2(1f, 1f); 
    }
}
void OnCollisionEnter2D(Collision2D col)
{
    Jumping= false; 
}
// static void Main(string[] args){
//     Artemismovement m = new Artemismovement;
//     m.start();
//     m.Update();
//     m.OnCollisionEnter2D();
// }

}

Can you tell me I will use how mang GPU

I get a error
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.95 GiB total capacity; 2.69 GiB already allocated; 10.69 MiB free; 2.78 GiB reserved in total by PyTorch)

training from scratch?

i want to train the model for world 1-1 from scratch. How many update in the network need to get the result which is shown here?

'gym_super_mario_bros' not found

I tried to run train getting this error.

(env_pytorch) c:\GitHub\uvipen\Super-mario-bros-PPO-pytorch>conda deactivate

(base) c:\GitHub\uvipen\Super-mario-bros-PPO-pytorch>python train.py --world 5 --stage 2 --lr 1e-4
Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from src.env import MultipleEnvironments
  File "c:\GitHub\uvipen\Super-mario-bros-PPO-pytorch\src\env.py", line 5, in <module>
    import gym_super_mario_bros
ModuleNotFoundError: No module named 'gym_super_mario_bros'

OSError and EOFError

File "E:\anaconda\lib\multiprocessing\connection.py", line 170, in fileno
self._check_closed()
File "E:\anaconda\lib\multiprocessing\connection.py", line 136, in _check_closed
raise OSError("handle is closed")
OSError: handle is closed
Traceback (most recent call last):
File "", line 1, in
File "E:\anaconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "E:\anaconda\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

AttributeError: 'Monitor' object has no attribute 'pipe'

While testing the model i get this:
Traceback (most recent call last):
File "test.py", line 65, in
test(opt)
File "test.py", line 55, in test
state, reward, done, info = env.step(action)
File "C:\Users\johnm\DeepLearning\Super-mario-bros-PPO-pytorch\src\env.py", line 100, in step
state, reward, done, info = self.env.step(action)
File "C:\Users\johnm\DeepLearning\Super-mario-bros-PPO-pytorch\src\env.py", line 55, in step
self.monitor.record(state)
File "C:\Users\johnm\DeepLearning\Super-mario-bros-PPO-pytorch\src\env.py", line 27, in record
self.pipe.stdin.write(image_array.tostring())
AttributeError: 'Monitor' object has no attribute 'pipe'
Can you help please?

size issue on GAE process

While study your Mario PPO codes, https://github.com/uvipen/Super-mario-bros-PPO-pytorch/blob/master/train.py, it’s hard to understand the following codes:

################################################################################
values = torch.cat(values).detach() # torch.Size([4096])

states = torch.cat(states)
gae = 0
R = []
for value, reward, done in list(zip(values, rewards, dones))[::-1]: # len(list(zip(values, rewards, dones))[::-1]) is 512
gae = gae * opt.gamma * opt.tau
gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach()
next_value = value
R.append(gae + value)
##################################################################################

Question: with  --num_local_steps=512 and  —num_processes=8,  after 'values = torch.cat(values).detach()’, the values.shape is torch.Size([4096]). But this list:  "list(zip(values, rewards, dones))[::-1]”,  the length is 512,   which mean only the first 512 items “values" are used in the "for…loop”,  the others are discarded.

So, in every 512 local_steps, only the values of first 64(=512/8) steps are used to calculate GAE and R.  Is it a problem or I have misunderstanding?

Looking for your answer, thanks!

Fatal bug in implementation of GAE

Super-mario-bros-PPO-pytorch/train.py

Line 119 in ab4248d

gae = gae * opt.gamma * opt.tau

It should be

gae = gae * opt.gamma * opt.tau*(1 - done)

Suppose worker 1 has to sample 500 steps. The game prematurely ends at 250 steps, the worker will restart the game and continue sampling 250 steps. The trajectory would be s1,s2,...,s250,s1',s2',...s250'.
The wrong implementation forgets to reset GAE to zero when calculating GAE of s250. It will make GAE bigger than expected. This will cause the advantage of s250 become bigger and bigger, which will make the network think you should output a250 when seeing s250. (However, this is not true, performing s250 at a250 make you die).

Therefore, the critic loss diverges (advantage becomes bigger and bigger, network can't predict it right). Stuck at action that make you die. The agent does not learn anything.

Env custom reward explain

I see that you created a custom reward env. Can you explain why you use it and the impact of it compared with the standard reward from library?
Additionally, I see that you modified the custom reward (maybe to solve 7-4 and 4-4 levels) compared with your custom reward use in A3C.
I think you should explain the reason and importance of your custom reward. The more successful level may be because you use a better reward system not only by better algorithms?

ModuleNotFoundError: No module named 'gym_super_mario_bros'

there is a problem with the output video

I downloaded the code, and ran the code sucessfully, but there is a problem with the output video as attachment.

video_1_1.mp4

ModuleNotFoundError: No module named 'gym_super_mario_bros'

When I run the test code "python test.py --world 5 --stage 2", a error occurs. It showed "ModuleNotFoundError: No module named 'gym_super_mario_bros' ", could you mind telling me how to solve it? Thx.

What's going on? It's not dead?

`OSError: handle is closed` during training

First of all, thanks for this project!

I'm able to run it during the test phase, but not for the train phase. I'm encountering:

C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\Scripts\python.exe C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py --world 5 --stage 2 --lr 1e-4
Traceback (most recent call last):
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 154, in <module>
    train(opt)
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 55, in train
    envs = MultipleEnvironments(opt.world, opt.stage, opt.action_type, opt.num_processes)
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 122, in __init__
    process.start()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 939, in reduce_pipe_connection
    dh = reduction.DupHandle(conn.fileno(), access)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 170, in fileno
    self._check_closed()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 136, in _check_closed
    raise OSError("handle is closed")
OSError: handle is closed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

When setting the number of processes to 1:


C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\Scripts\python.exe C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py
Process Process-1:
Traceback (most recent call last):
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 135, in run
    self.env_conns[index].send(self.envs[index].reset())
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 92, in reset
    state = self.env.reset()
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 65, in reset
    return process_frame(self.env.reset())
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\nes_py\wrappers\joypad_space.py", line 78, in reset
    return self.env.reset()
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\gym\wrappers\time_limit.py", line 25, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\nes_py\nes_env.py", line 258, in reset
    self._restore()
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\nes_py\nes_env.py", line 220, in _restore
    _LIB.Restore(self._env)
OSError: exception: access violation reading 0x0000020D6BE8F220
Traceback (most recent call last):
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 154, in <module>
    train(opt)
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 64, in train
    curr_states = [agent_conn.recv() for agent_conn in envs.agent_conns]
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 64, in <listcomp>
    curr_states = [agent_conn.recv() for agent_conn in envs.agent_conns]
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

I'm on Python 3.6 in combination with Windows 10. I resolved my dependencies using:

pip install https://download.pytorch.org/whl/cu90/torchvision-0.3.0-cp36-cp36m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu90/torch-1.1.0-cp36-cp36m-win_amd64.whl
pip install gym-super-mario-bros
pip install opencv-python
choco install ffmpeg

Package versions :

(venv) C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch>pip list
Package              Version
-------------------- --------
cloudpickle          1.3.0
gym                  0.17.2
gym-super-mario-bros 7.3.2
nes-py               8.1.4
numpy                1.19.1
opencv-python        4.4.0.40
Pillow               7.2.0
pip                  19.0.3
pyglet               1.5.7
scipy                1.5.2
setuptools           40.8.0
six                  1.15.0
torch                1.1.0
torchvision          0.3.0
tqdm                 4.48.2

Any ideas?

about train.py

Hello, look at your code, feel some questions. It's on line 114 of train.py, i.e. values = torch.cat(values).detach(). I think this statement should come after line 123. In line 120, i.e., gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach(). It will become a constant（size=1） instead of a vector（size=8）. You can check it and solve my question. Thank you.

Why is the reward value so designed？

Hello, I'm curious about the logic of reward value design here, can you introduce it？

Super-mario-bros-PPO-pytorch/src/env.py

Line 54 in 9cd3fe4

reward += (info["score"] - self.curr_score) / 40.

Super-mario-bros-PPO-pytorch/src/env.py

Line 61 in 9cd3fe4

return state, reward / 10., done, info

18 stages completed with A3C

In fact, I used your code completed 18/32 stages with A3C, but anyway, 29/32 is much better.
Here is my A3C models you can test:
链接: https://pan.baidu.com/s/1F24higD2uXHn7TeMGRwbjw 密码: mgt3

Done world and stages

1-1,4

2-1,2,3,4

3-1,2,3,4

4-1

5-1

6-1,3

7-1,3

8-2,3

Not able to solve any level using provided trained models

Hi,

I have tried to use the provided models in trained_models. Using them agent was able to proceed however couldn't solve the complete level. Either it used to get stuck or die. Were you able to complete the levels using these models?

OSError and EOFError2

I learned to use your code on windows, but encountered the problem of oserror and eoferror. I checked it env.py I don't think the following paragraph is meaningful. Would you please explain the meaning of this code? Because my mother tongue is not English, can you understand what I mean??

#     for index in range(num_envs):
#
#         process = mp.Process(target=self.run, args=(index,))
#         process.start()
#
#         # self.env_conns[index].close()
#
# def run(self, index):
#
#     # self.agent_conns[index].close()
#     while True:
#
#         request, action = self.env_conns[index].recv()
#         print(request,action)
#         if request == "step":
#             self.env_conns[index].send(self.envs[index].step(action.item()))
#         elif request == "reset":
#             self.env_conns[index].send(self.envs[index].reset())
#
#         else:
#             raise NotImplementedError