
super-mario-bros-ppo-pytorch's People

Contributors

uvipen


super-mario-bros-ppo-pytorch's Issues

gym conflict

base ❯ pip install nes_py-8.1.2-cp37-cp37m-macosx_10_15_x86_64.whl
Processing ./nes_py-8.1.2-cp37-cp37m-macosx_10_15_x86_64.whl
Requirement already satisfied: numpy>=1.18.5 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from nes-py==8.1.2) (1.19.2)
Requirement already satisfied: tqdm>=4.32.2 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from nes-py==8.1.2) (4.49.0)
Collecting gym>=0.17.2
Using cached gym-0.18.0-py3-none-any.whl
Requirement already satisfied: numpy>=1.18.5 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from nes-py==8.1.2) (1.19.2)
Requirement already satisfied: Pillow<=7.2.0 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from gym>=0.17.2->nes-py==8.1.2) (7.2.0)
Requirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from gym>=0.17.2->nes-py==8.1.2) (1.6.0)
Requirement already satisfied: scipy in /Users/etsiva/anaconda3/lib/python3.7/site-packages (from gym>=0.17.2->nes-py==8.1.2) (1.5.2)
Using cached gym-0.17.3-py3-none-any.whl
Using cached gym-0.17.2-py3-none-any.whl
INFO: pip is looking at multiple versions of nes-py to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install nes-py and nes-py==8.1.2 because these package versions have conflicting dependencies.

The conflict is caused by:
nes-py 8.1.2 depends on pyglet>=1.5.5
gym 0.18.0 depends on pyglet<=1.5.0 and >=1.4.0
nes-py 8.1.2 depends on pyglet>=1.5.5
gym 0.17.3 depends on pyglet<=1.5.0 and >=1.4.0
nes-py 8.1.2 depends on pyglet>=1.5.5
gym 0.17.2 depends on pyglet<=1.5.0 and >=1.4.0

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

OS: macOS 10.15.7

a question regarding action sampling

Hi,
Thank you so much for your work; it inspires me a lot. One thing I'm not clear about is whether I should use action sampling in the rollout phase. From the PPO blog's code (repo: CleanRL, "The 37 Implementation Details of Proximal Policy Optimization"), it seems it should be used. I hope you can help explain. Thank you.

Best regards,
Yichao
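For what it's worth, a minimal sketch (plain PyTorch, not this repo's exact modules) of the convention the CleanRL write-up describes: sample actions from the categorical policy distribution during rollouts, and only act greedily (argmax) when evaluating a trained agent.

import torch
from torch.distributions import Categorical

def select_action(logits: torch.Tensor, greedy: bool = False) -> torch.Tensor:
    """Pick an action from policy logits of shape [batch, num_actions]."""
    dist = Categorical(logits=logits)
    if greedy:
        # Deterministic choice, typically only used when evaluating a trained agent.
        return logits.argmax(dim=-1)
    # Stochastic choice used during rollouts, so PPO keeps exploring and the
    # stored log-probabilities match the distribution the actions came from.
    return dist.sample()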

Please correct this C# code; it is similar to the Super Mario Bros game

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Artemismovement : MonoBehaviour
{

public float moveSpeed, jumpForce;

public bool Jumping;

public Rigidbody2D RG2D;
public int moveX, moveY;

// Start is called before the first frame update
void Start()
{
    RG2D= GetComponent<Rigidbody2D>();

    moveSpeed= 11f; 
    jumpForce= 16f; 

    Jumping= true;  
}

// Update is called once per frame
void Update()
{
   // Input.GetAxisRaw returns a float in {-1, 0, 1} for keyboard input; cast to int to match the int fields
   moveX = (int)Input.GetAxisRaw("Horizontal");
   moveY = (int)Input.GetAxisRaw("Vertical");


 //Horizontal Movement (X-Axis)
    if (moveX !=0)
    {
        RG2D.velocity= new Vector2(moveSpeed * moveX, RG2D.velocity.y);
    }


    //Jumping 
    if(moveY==1 && !Jumping)
    {
        RG2D.velocity= new Vector2(RG2D.velocity.x, jumpForce);
        Jumping= true;

    }

    //Crouching
    if(moveY==-1)
    {
        transform.localScale= new Vector2(1f, 0.5f); 
    }

    else
    {
        transform.localScale= new Vector2(1f, 1f); 
    }
}
void OnCollisionEnter2D(Collision2D col)
{
    Jumping= false; 
}
// static void Main(string[] args){
//     Artemismovement m = new Artemismovement;
//     m.start();
//     m.Update();
//     m.OnCollisionEnter2D();
// }

}

Can you tell me how many GPUs I will need?

I get an error:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.95 GiB total capacity; 2.69 GiB already allocated; 10.69 MiB free; 2.78 GiB reserved in total by PyTorch)
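As a rough way to size this, a small snippet using standard PyTorch APIs (nothing specific to this repo) prints how much memory the visible GPU has and how much this process has already allocated; the error above suggests the 3.95 GiB card is running out of room with the default settings.

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024 ** 3
    allocated_gib = torch.cuda.memory_allocated(0) / 1024 ** 3
    print(f"GPU 0: {props.name}, {total_gib:.2f} GiB total, "
          f"{allocated_gib:.2f} GiB currently allocated by this process")
else:
    print("No CUDA device visible to PyTorch")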

training from scratch?

I want to train the model for world 1-1 from scratch. How many updates does the network need to reach the result shown here?

'gym_super_mario_bros' not found

I tried to run train.py and got this error.

(env_pytorch) c:\GitHub\uvipen\Super-mario-bros-PPO-pytorch>conda deactivate

(base) c:\GitHub\uvipen\Super-mario-bros-PPO-pytorch>python train.py --world 5 --stage 2 --lr 1e-4
Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from src.env import MultipleEnvironments
  File "c:\GitHub\uvipen\Super-mario-bros-PPO-pytorch\src\env.py", line 5, in <module>
    import gym_super_mario_bros
ModuleNotFoundError: No module named 'gym_super_mario_bros'

OSError and EOFError

File "E:\anaconda\lib\multiprocessing\connection.py", line 170, in fileno
self._check_closed()
File "E:\anaconda\lib\multiprocessing\connection.py", line 136, in _check_closed
raise OSError("handle is closed")
OSError: handle is closed
Traceback (most recent call last):
File "", line 1, in
File "E:\anaconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "E:\anaconda\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

AttributeError: 'Monitor' object has no attribute 'pipe'

While testing the model I get this:
Traceback (most recent call last):
  File "test.py", line 65, in <module>
    test(opt)
  File "test.py", line 55, in test
    state, reward, done, info = env.step(action)
  File "C:\Users\johnm\DeepLearning\Super-mario-bros-PPO-pytorch\src\env.py", line 100, in step
    state, reward, done, info = self.env.step(action)
  File "C:\Users\johnm\DeepLearning\Super-mario-bros-PPO-pytorch\src\env.py", line 55, in step
    self.monitor.record(state)
  File "C:\Users\johnm\DeepLearning\Super-mario-bros-PPO-pytorch\src\env.py", line 27, in record
    self.pipe.stdin.write(image_array.tostring())
AttributeError: 'Monitor' object has no attribute 'pipe'
Can you help please?

size issue on GAE process

While studying your Mario PPO code, https://github.com/uvipen/Super-mario-bros-PPO-pytorch/blob/master/train.py, I find it hard to understand the following code:

################################################################################
values = torch.cat(values).detach() # torch.Size([4096])

states = torch.cat(states)
gae = 0
R = []
for value, reward, done in list(zip(values, rewards, dones))[::-1]: # len(list(zip(values, rewards, dones))[::-1]) is 512
    gae = gae * opt.gamma * opt.tau
    gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach()
    next_value = value
    R.append(gae + value)
##################################################################################

Question: with --num_local_steps=512 and --num_processes=8, after values = torch.cat(values).detach(), values.shape is torch.Size([4096]). But the list list(zip(values, rewards, dones))[::-1] has length 512, which means only the first 512 items of values are used in the for loop; the rest are discarded.

So, in every 512 local steps, only the values of the first 64 (= 512/8) steps are used to calculate GAE and R. Is this a problem, or do I have a misunderstanding?

Looking for your answer, thanks!
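As a minimal illustration of the truncation described above (plain Python/PyTorch, not the repo's code): zip stops at its shortest input, so pairing a flattened tensor of 4096 values with two length-512 lists only ever touches the first 512 entries of the tensor.

import torch

values = torch.arange(4096)   # stands in for torch.cat(values)
rewards = list(range(512))    # one entry per local step
dones = [0] * 512

pairs = list(zip(values, rewards, dones))
print(len(pairs))             # 512 -> only values[0:512] ever get paired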

Fatal bug in implementation of GAE

gae = gae * opt.gamma * opt.tau

It should be

gae = gae * opt.gamma * opt.tau * (1 - done)

Suppose worker 1 has to sample 500 steps. If the game ends prematurely at step 250, the worker restarts the game and continues sampling another 250 steps, so the trajectory would be s1, s2, ..., s250, s1', s2', ..., s250'.
The current implementation forgets to reset GAE to zero when calculating the GAE of s250, which makes the GAE larger than expected. The advantage of s250 then becomes bigger and bigger, which makes the network think it should output a250 when it sees s250 (but this is not true: performing a250 at s250 makes you die).

Therefore, the critic loss diverges (the advantage keeps growing and the network can't predict it correctly), the policy gets stuck on an action that kills the agent, and the agent does not learn anything.
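A minimal sketch of the corrected loop as a hypothetical helper (mirroring the variable names used in train.py but not the repo's exact code), with the (1 - done) factor applied to the accumulated GAE as proposed above:

def compute_gae_returns(values, rewards, dones, next_value, gamma, tau):
    """Return R_t = GAE_t + V(s_t), resetting the accumulator at episode ends.

    next_value is the (tensor) value estimate of the state following the last step.
    """
    gae = 0
    R = []
    for value, reward, done in list(zip(values, rewards, dones))[::-1]:
        # Reset the running GAE whenever done == 1 so advantages from a
        # finished episode do not leak into the episode that preceded it.
        gae = gae * gamma * tau * (1 - done)
        gae = gae + reward + gamma * next_value.detach() * (1 - done) - value.detach()
        next_value = value
        R.append(gae + value)
    return R[::-1]  # back to chronological order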

Explanation of the custom env reward

I see that you created a custom reward environment. Can you explain why you use it and its impact compared with the standard reward from the library?
Additionally, I see that you modified the custom reward (perhaps to solve levels 7-4 and 4-4) compared with the custom reward used in your A3C repo.
I think you should explain the reason for and importance of your custom reward. Could the additional levels solved be due to a better reward system rather than only a better algorithm?

`OSError: handle is closed` during training

First of all, thanks for this project!

I'm able to run it during the test phase, but not for the train phase. I'm encountering:

C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\Scripts\python.exe C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py --world 5 --stage 2 --lr 1e-4
Traceback (most recent call last):
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 154, in <module>
    train(opt)
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 55, in train
    envs = MultipleEnvironments(opt.world, opt.stage, opt.action_type, opt.num_processes)
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 122, in __init__
    process.start()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 939, in reduce_pipe_connection
    dh = reduction.DupHandle(conn.fileno(), access)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 170, in fileno
    self._check_closed()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 136, in _check_closed
    raise OSError("handle is closed")
OSError: handle is closed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

When setting the number of processes to 1:


C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\Scripts\python.exe C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py
Process Process-1:
Traceback (most recent call last):
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 135, in run
    self.env_conns[index].send(self.envs[index].reset())
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 92, in reset
    state = self.env.reset()
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\src\env.py", line 65, in reset
    return process_frame(self.env.reset())
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\nes_py\wrappers\joypad_space.py", line 78, in reset
    return self.env.reset()
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\gym\wrappers\time_limit.py", line 25, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\nes_py\nes_env.py", line 258, in reset
    self._restore()
  File "C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch\venv\lib\site-packages\nes_py\nes_env.py", line 220, in _restore
    _LIB.Restore(self._env)
OSError: exception: access violation reading 0x0000020D6BE8F220
Traceback (most recent call last):
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 154, in <module>
    train(opt)
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 64, in train
    curr_states = [agent_conn.recv() for agent_conn in envs.agent_conns]
  File "C:/Users/maarten/PycharmProjects/Super-mario-bros-PPO-pytorch/train.py", line 64, in <listcomp>
    curr_states = [agent_conn.recv() for agent_conn in envs.agent_conns]
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\maarten\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

I'm on Python 3.6 in combination with Windows 10. I resolved my dependencies using:

pip install https://download.pytorch.org/whl/cu90/torchvision-0.3.0-cp36-cp36m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu90/torch-1.1.0-cp36-cp36m-win_amd64.whl
pip install gym-super-mario-bros
pip install opencv-python
choco install ffmpeg

Package versions:

(venv) C:\Users\maarten\PycharmProjects\Super-mario-bros-PPO-pytorch>pip list
Package              Version
-------------------- --------
cloudpickle          1.3.0
gym                  0.17.2
gym-super-mario-bros 7.3.2
nes-py               8.1.4
numpy                1.19.1
opencv-python        4.4.0.40
Pillow               7.2.0
pip                  19.0.3
pyglet               1.5.7
scipy                1.5.2
setuptools           40.8.0
six                  1.15.0
torch                1.1.0
torchvision          0.3.0
tqdm                 4.48.2


Any ideas?

about train.py

Hello, looking at your code, I have a question. On line 114 of train.py, i.e. values = torch.cat(values).detach(), I think this statement should come after line 123. Otherwise, on line 120, i.e. gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach(), it will become a constant (size 1) instead of a vector (size 8). Could you check this and answer my question? Thank you.

Not able to solve any level using provided trained models

Hi,

I have tried the models provided in trained_models. Using them, the agent was able to make progress but couldn't complete the level: it either got stuck or died. Were you able to complete the levels using these models?

OSError and EOFError2

I am learning to use your code on Windows, but I encountered an OSError and an EOFError. I checked env.py, and I don't understand the purpose of the following section. Would you please explain what this code means? My mother tongue is not English, so I hope you can understand what I mean.

#     for index in range(num_envs):
#
#         process = mp.Process(target=self.run, args=(index,))
#         process.start()
#
#         # self.env_conns[index].close()
#
# def run(self, index):
#
#     # self.agent_conns[index].close()
#     while True:
#
#         request, action = self.env_conns[index].recv()
#         print(request,action)
#         if request == "step":
#             self.env_conns[index].send(self.envs[index].step(action.item()))
#         elif request == "reset":
#             self.env_conns[index].send(self.envs[index].reset())
#
#         else:
#             raise NotImplementedError
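For context, here is a minimal sketch of the parent/worker pipe pattern that loop implements (hypothetical names and a CartPole environment instead of Mario, not the repo's exact env.py): each worker process owns one environment and services "step"/"reset" requests arriving on its end of a multiprocessing.Pipe, which is why the loop calls recv() and then send().

import multiprocessing as mp
import gym

def make_env():
    # Any gym-style environment works; CartPole avoids needing the NES emulator here.
    return gym.make("CartPole-v1")

def worker(conn):
    env = make_env()
    while True:
        request, action = conn.recv()      # block until the parent sends a command
        if request == "step":
            conn.send(env.step(action))    # (state, reward, done, info) in the classic gym API
        elif request == "reset":
            conn.send(env.reset())
        else:
            raise NotImplementedError(request)

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=worker, args=(child_conn,))
    proc.start()

    parent_conn.send(("reset", None))
    state = parent_conn.recv()             # initial observation from the worker process
    parent_conn.send(("step", 0))
    state, reward, done, info = parent_conn.recv()
    proc.terminate()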

confused about detach()

Hello, thanks for your code.
I have read your PPO code carefully, and I'm really confused by one function: detach()
old_log_policies = torch.cat(old_log_policies).detach()
or
values = torch.cat(values).detach()

I understand that detach() stops autograd from backpropagating through a tensor,
but I'm still wondering why you detach them in your code.
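A small self-contained illustration (not the repo's code) of what detaching buys you: old_log_policies and values are rollout data that serve as fixed targets in the PPO loss, so detaching them keeps gradients from flowing back into them and lets the optimizer update only the freshly computed policy and value outputs.

import torch

# Pretend these came from the rollout: they are data, not things to optimize.
old_log_prob = torch.tensor([-1.2]).detach()
advantage = torch.tensor([0.7]).detach()

# This is the quantity the optimizer updates on the current pass.
new_log_prob = torch.tensor([-1.0], requires_grad=True)

ratio = torch.exp(new_log_prob - old_log_prob)   # no gradient flows into old_log_prob
clipped = torch.clamp(ratio, 0.8, 1.2)
policy_loss = -torch.min(ratio * advantage, clipped * advantage).mean()
policy_loss.backward()

print(new_log_prob.grad)           # defined: gradients reach the new log-probability
print(old_log_prob.requires_grad)  # False: detached rollout data stays a constant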
