atgambardella / pytorch-es

347.0 347.0 41.0 65 KB

Evolution Strategies in PyTorch

License: MIT License

Python 100.00%

machinelearning python pytorch


pytorch-es's Issues

Support Capability to Use GPUs

Hey Andrew, thanks immensely for putting this together. Very useful example of evolution strategies.

I was wondering what your thoughts are on including GPU support. My thought is that any action computed by a model can be run on the GPU while the environment is run on the CPU. Because GPUs work best on batches, one idea is to batch the actions, let the environments respond, and repeat this process.

I suspect the biggest bottleneck at this point would be the PCIe lanes, depending on how much bandwidth you have to the GPU. The bottom line is that the models would be stored and executed on the GPU while the env is run on the CPU. Does gym allow the actual env to run on the GPU?
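The batching idea described above can be sketched as a lockstep loop: collect one observation per environment, make a single policy call for the whole batch, then step each environment on the CPU. This is only a sketch under assumptions: `ToyEnv` and `batched_policy` are hypothetical stand-ins (not part of pytorch-es or gym), and the policy here is plain Python where a real version would run one forward pass on the GPU. Note also that in ES each population member has its own perturbed weights, so a real GPU version would need a forward pass batched over per-member parameters, not just over observations.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a CPU-side gym environment."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.random()

    def step(self, action):
        self.t += 1
        obs = self.rng.random()
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 5  # fixed-length toy episode
        return obs, reward, done, {}


def batched_policy(observations):
    """Stand-in for one batched forward pass over all active observations."""
    return [1 if obs > 0.5 else 0 for obs in observations]


def rollout_batched(envs):
    """Step all environments in lockstep, querying the policy once per step."""
    observations = [env.reset() for env in envs]
    returns = [0.0] * len(envs)
    active = list(range(len(envs)))
    while active:
        # One policy call per step for every environment still running.
        actions = batched_policy([observations[i] for i in active])
        still_active = []
        for i, action in zip(active, actions):
            obs, reward, done, _ = envs[i].step(action)
            observations[i] = obs
            returns[i] += reward
            if not done:
                still_active.append(i)
        active = still_active
    return returns
```

With this structure the host-to-device traffic per step is one batch of observations in and one batch of actions out, which is where the PCIe concern above would show up.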

Adding Prioritized Experience Replay

Hi,

I was wondering if you had any ideas on how a Prioritized Experience Replay buffer could be added to ES?

Something similar is done in "Leveraging Demonstrations for Deep Reinforcement
Learning on Robotics Problems with Sparse Rewards", which uses DDPG for a robotics application.

I'm guessing, though, that ES would be more general?

Perhaps OpenAI's prioritized replay_buffer from the baselines repo could be used?
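For reference, the core of a proportional prioritized buffer is small: items are sampled with probability proportional to `priority ** alpha`. The sketch below is a minimal illustration using only the standard library, not the baselines implementation; the class name, the `alpha` default, and the choice of what to store are all assumptions. What an item and its priority would mean for ES (for example a perturbation seed and its return) is left open.

```python
import random


class PrioritizedBuffer:
    """Minimal proportional prioritized buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 fully proportional
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, item, priority):
        # Evict the oldest entry once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(item)
        # Clamp so every weight stays strictly positive.
        self.priorities.append(max(float(priority), 1e-8))

    def sample(self, k):
        """Draw k items (with replacement), weighted by priority ** alpha."""
        weights = [p ** self.alpha for p in self.priorities]
        return self.rng.choices(self.items, weights=weights, k=k)
```

A full prioritized replay scheme would also correct for the sampling bias with importance weights and update priorities after each use; both are omitted here.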

Spawn processes outside of the training loop

As the code is now, n processes are spawned per gradient step. Because Python process startup takes a while (~30 ms per process), this causes non-negligible overhead.
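One way to avoid the per-step startup cost is to create a worker pool once, before the training loop, and reuse it for every gradient step. The sketch below assumes a hypothetical `evaluate` function standing in for a rollout; it is not the repository's `do_rollouts`, and a real version would also have to send the current parameters to the workers on each step.

```python
import multiprocessing as mp
import random


def evaluate(seed):
    """Hypothetical stand-in for one rollout: a noisy score for a seed."""
    rng = random.Random(seed)
    return rng.gauss(0.0, 1.0)


def train(pool, n_steps, n_workers):
    """Run n_steps steps, reusing the same pool workers for every step."""
    history = []
    for step in range(n_steps):
        seeds = [step * n_workers + i for i in range(n_workers)]
        # pool.map hands work to already-running workers; nothing is spawned here.
        returns = pool.map(evaluate, seeds)
        history.append(sum(returns) / len(returns))
    return history


def run_with_process_pool(n_steps=3, n_workers=4):
    """Spawn the workers once, train, then tear the pool down once."""
    with mp.Pool(processes=n_workers) as pool:
        return train(pool, n_steps, n_workers)
```

`run_with_process_pool()` should be called from under an `if __name__ == "__main__":` guard so that it also works with the spawn start method. Because `train` only needs an object with a `map` method, it can be exercised with `multiprocessing.pool.ThreadPool` in quick tests.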

Tic-tac-toe environment for the ES training process

Hello, @atgambardella.

Using your code as a base, I have developed a new Tic-tac-toe environment for the ES training process. As this game can be solved to full depth with a classical min-max tree, I've used this classic AI to play against our neural network model in the "step" phase and to return the reward.
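For readers unfamiliar with the min-max opponent mentioned above, here is a minimal sketch of full-depth minimax for Tic-tac-toe. It is not taken from the linked repository; the board encoding (a 9-character string of 'X', 'O' and ' ') and all function names are assumptions made for illustration.

```python
from functools import lru_cache

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value for X (+1 win, 0 draw, -1 loss) with `player` to move."""
    won = winner(board)
    if won is not None:
        return 1 if won == 'X' else -1
    if ' ' not in board:
        return 0
    other = 'O' if player == 'X' else 'X'
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == ' ']
    # X maximizes the value, O minimizes it.
    return max(values) if player == 'X' else min(values)


def best_move(board, player):
    """Pick the empty cell with the best minimax value for `player`."""
    other = 'O' if player == 'X' else 'X'

    def score(i):
        return minimax(board[:i] + player + board[i + 1:], other)

    moves = [i for i, cell in enumerate(board) if cell == ' ']
    return max(moves, key=score) if player == 'X' else min(moves, key=score)
```

An environment's `step` could apply the network's move, answer with `best_move`, and use the final outcome as the reward. Since perfect play from the empty board is a draw, the best a learned policy can do against this opponent is never lose.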

The end result is a model (a simple "Linear" one) that, thanks to the evolutionary computation, can play a perfect game against the classical AI's brute-force strategy.

My code is here: https://github.com/Zeta36/pytorch-es-tic-tac-toe

I also simplified your code a little and removed things I knew I was not going to need.

Thanks for your work, friend.

it stuck in selu?

Hi, I ran CartPole-v1 and it works fine.
But when I run with any other env-name, they all get stuck in the same place.

In model.py, I added some prints to help check where they get stuck:
def forward(self, inputs):
    if self.small_net:
        x = selu(self.linear1(inputs))
        x = selu(self.linear2(x))
        return self.actor_linear(x)
    else:
        print('model: !!!forward!!! big-net(4conv+1lstm)')
        inputs, (hx, cx) = inputs
        print('model: !!!after update: input, (hx,cx) = inputs')
        x = selu(self.conv1(inputs))
        x = selu(self.conv2(x))
        x = selu(self.conv3(x))
        x = selu(self.conv4(x))
        print('model: !!!after 4conv end selu process')
        x = x.view(-1, 32 * 3 * 3)
        print('model: !!!after x reshape: x.view(-1,32*3*3)')
        ......

Below is the output of the command "python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam":
(venv_openai-es) l00221575@F0817-S05:~/venv_openai-es/pytorch-es$ python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam
[2018-10-23 22:23:10,929] Making new env: PongDeterministic-v4
Preprocessing env
Num params in network 588710
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
(Variable(state.unsqueeze(0), volatile=True),
model: !!!forward!!! big-net(4conv+1lstm)
model: !!!after update: input, (hx,cx) = inputs
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
(Variable(state.unsqueeze(0), volatile=True),
model: !!!forward!!! big-net(4conv+1lstm)
model: !!!after update: input, (hx,cx) = inputs
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
(Variable(state.unsqueeze(0), volatile=True),

 
My guess is that they get stuck in selu, so I added some prints to selu and ran PongDeterministic-v4 again, but the output stays the same as above. Other environments, such as Kangaroo-ram-v0, Skiing-v0, Freeway-v0 and Gravitar-v0, all get stuck in the same place as PongDeterministic-v4.

Please help.

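import torch.nn.functional as F

def selu(x):
    # SELU constants
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    print('selu begin')
    out = scale * F.elu(x, alpha)
    print('selu ends')  # printed only after the computation has finished
    return out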
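Two observations may help narrow this down. First, Python evaluates a call's arguments before entering the function, so in `selu(self.conv1(inputs))` the convolution runs before anything inside selu is printed. Since the added selu prints never appear in the output, the hang is more likely in `self.conv1(inputs)` than in selu itself, although the log alone does not prove this. Second, selu is a simple elementwise formula, which the scalar reference below spells out. The names `selu_scalar`, `ALPHA` and `SCALE` are illustrative, and the constants are the ones from the snippet above, rounded to float precision.

```python
import math

ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu_scalar(x):
    """Reference SELU for one float.

    Computes scale * x for x > 0 and scale * alpha * (exp(x) - 1) otherwise,
    which is what scale * F.elu(x, alpha) does elementwise.
    """
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * math.expm1(x)
```

Comparing a few tensor entries against `selu_scalar` is a quick way to rule out the activation as the source of a problem.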

IndexError: too many indices for array

I run

python3 main.py --small-net --env-name CartPole-v1

and I get a screen full of errors like this:

IndexError: too many indices for array
Process Process-41:
Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ajay/PythonProjects/pytorch-es-master/train.py", line 49, in do_rollouts
    state, reward, done, _ = env.step(action[0, 0])
IndexError: too many indices for array
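The message means that `action` has fewer than two dimensions, so indexing it as `action[0, 0]` fails. One possible cause, stated here as an assumption and not a confirmed diagnosis, is a PyTorch version in which reductions such as `max(1)` no longer keep the reduced dimension by default, so the action array has shape `(1,)` where the code expects `(1, 1)`. The sketch below reproduces the error with NumPy and shows a shape-tolerant way to pull out the scalar; `first_action` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def first_action(action):
    """Return the first element of an array of shape (1, 1), (1,) or ()."""
    return np.asarray(action).reshape(-1)[0]


def reproduces_error():
    """Return True if indexing a 1-D array with two indices raises IndexError."""
    try:
        np.array([1])[0, 0]
    except IndexError:
        return True
    return False
```

With such a helper, the failing line could read `env.step(first_action(action))`, which behaves the same for either array shape.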

pytorch-es's Issues

Support Capability to Use GPUs

Adding Prioritized Experience Replay

Spawn processes outside of the training loop

Tic-tac-toe environment for the ES training process

it stuck in selu?

Learning rate

Performance on MountainCar

Question about action selection

IndexError: too many indices for array

Momentum-based optimizers

Remove numpy dependency
