atgambardella / pytorch-es
Evolution Strategies in PyTorch
License: MIT License
Hey Andrew, thanks immensely for putting this together. It is a very useful example of evolution strategies.
I was wondering what your thoughts are on including GPU support. My thought is that each model's action selection could run on the GPU while the environment runs on the CPU. Because GPUs work best on batches, one idea is to batch the actions, let the environments respond, and repeat this process.
I feel that the biggest bottleneck at that point would be the PCIe lanes, depending on how much bandwidth you have to the GPU. The bottom line is that the models would be stored and executed on the GPU while the env runs on the CPU. Does gym allow the actual env to run on the GPU?
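The batching idea above can be sketched as a lockstep loop: gather one observation per environment, make a single batched policy call, then step every environment on the CPU. This is only an illustration under stated assumptions. `ToyEnv`, `batched_policy`, and `lockstep_rollouts` are hypothetical names, `ToyEnv` does not follow the Gym API, and the policy is a plain-Python placeholder for what would be one GPU forward pass. In ES each population member has its own perturbed weights, so a real version would also need a batched forward over per-member parameters.

```python
import random


class ToyEnv:
    """Hypothetical CPU-side environment (not the Gym API)."""

    def __init__(self, seed, horizon=5):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.random()

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return self.rng.random(), reward, done


def batched_policy(observations):
    """Placeholder for one batched forward pass; a real version would run on the GPU."""
    return [1 if obs > 0.5 else 0 for obs in observations]


def lockstep_rollouts(envs):
    """Step all environments together, calling the policy once per time step."""
    observations = [env.reset() for env in envs]
    returns = [0.0] * len(envs)
    active = list(range(len(envs)))
    while active:
        # One policy call for every environment that is still running.
        actions = batched_policy([observations[i] for i in active])
        still_active = []
        for i, action in zip(active, actions):
            obs, reward, done = envs[i].step(action)
            observations[i] = obs
            returns[i] += reward
            if not done:
                still_active.append(i)
        active = still_active
    return returns
```

With this structure, the only data crossing the CPU/GPU boundary per step would be one batch of observations and one batch of actions.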
Hi,
I was wondering if you had any ideas on how a Prioritized Experience Replay buffer could be added to ES?
Something similar is done with DDPG for a robotics application in "Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards".
I'm guessing ES would be more general, though?
Perhaps OpenAI's prioritized replay_buffer from the baselines repo could be used?
As the code is now, n processes are spawned per gradient step. Because Python startup takes a while (~30 ms per process), this causes non-negligible overhead.
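One possible way to avoid paying the startup cost on every gradient step is to create a worker pool once and reuse it for each generation. The sketch below uses the standard library's `multiprocessing.Pool`. It is not the repository's code: `rollout` is a hypothetical stand-in that returns a fake score, and `run_generations` is an illustrative name.

```python
import multiprocessing as mp
import random


def rollout(seed):
    """Hypothetical stand-in for one perturbed-model episode; returns a fake score."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100))


def run_generations(map_fn, n_generations, population_size):
    """Evaluate every generation through the same map function (e.g. pool.map)."""
    history = []
    for generation in range(n_generations):
        seeds = [generation * population_size + i for i in range(population_size)]
        returns = list(map_fn(rollout, seeds))
        history.append(max(returns))
    return history


if __name__ == "__main__":
    # The workers are started once here and reused by every generation.
    with mp.Pool(processes=4) as pool:
        print(run_generations(pool.map, n_generations=3, population_size=8))
```

Passing the map function in as an argument keeps the loop testable with the built-in `map` and lets the same code run with or without worker processes.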
Hello, @atgambardella.
Using your code as a base, I have developed a new Tic-tac-toe environment for the ES training process. Because this game can be searched to full depth with a classical minimax tree, I used that classical AI to play against our neural network model in the "step" phase and to return the reward.
The final result is a model (a simple "Linear" one) that, thanks to evolutionary computation, can reach a zero-loss, perfect game against the classical AI's brute-force strategy.
My code is here: https://github.com/Zeta36/pytorch-es-tic-tac-toe
I also simplified your code a little and removed things I knew I was not going to need.
Thanks for your work, friend.
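For readers unfamiliar with the minimax opponent mentioned above, here is a minimal sketch of exhaustive minimax for Tic-tac-toe. It is not taken from the linked repository. The board encoding (a 9-tuple of 1, -1, and 0) and the names `winner`, `minimax`, and `best_move` are assumptions made for this illustration.

```python
from functools import lru_cache

# Index triples that form a winning line on a 3x3 board stored row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value from player 1's point of view, with `player` to move."""
    won = winner(board)
    if won != 0:
        return won
    moves = [i for i, v in enumerate(board) if v == 0]
    if not moves:
        return 0  # full board with no winner is a draw
    values = [minimax(board[:i] + (player,) + board[i + 1:], -player)
              for i in moves]
    return max(values) if player == 1 else min(values)


def best_move(board, player):
    """Pick the empty cell with the best minimax value for `player`."""
    moves = [i for i, v in enumerate(board) if v == 0]

    def score(i):
        return minimax(board[:i] + (player,) + board[i + 1:], -player)

    return max(moves, key=score) if player == 1 else min(moves, key=score)
```

Because perfect play from the empty board is a draw, the best a learned policy can do against such an opponent is to never lose.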
Hi, I ran CartPole-v1 and it works fine.
But when I run with any other env-name, they all get stuck in the same place.
In model.py, I added some print statements to check where they get stuck:
def forward(self, inputs):
    if self.small_net:
        x = selu(self.linear1(inputs))
        x = selu(self.linear2(x))
        return self.actor_linear(x)
    else:
        print('model: !!!forward!!! big-net(4conv+1lstm)')
        inputs, (hx, cx) = inputs
        print('model: !!!after update: input, (hx,cx) = inputs')
        x = selu(self.conv1(inputs))
        x = selu(self.conv2(x))
        x = selu(self.conv3(x))
        x = selu(self.conv4(x))
        print('model: !!!after 4conv end selu process')
        x = x.view(-1, 3233)
        print('model: !!!after x reshape: x.view(-1,3233)')
        ......
Below is the output of the "python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam" command:
(venv_openai-es) l00221575@F0817-S05:~/venv_openai-es/pytorch-es$ python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam
[2018-10-23 22:23:10,929] Making new env: PongDeterministic-v4
Preprocessing env
Num params in network 588710
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
(Variable(state.unsqueeze(0), volatile=True),
model: !!!forward!!! big-net(4conv+1lstm)
model: !!!after update: input, (hx,cx) = inputs
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
(Variable(state.unsqueeze(0), volatile=True),
model: !!!forward!!! big-net(4conv+1lstm)
model: !!!after update: input, (hx,cx) = inputs
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
(Variable(state.unsqueeze(0), volatile=True),
I guessed that they were stuck in selu, so I added some prints to selu and ran PongDeterministic-v4 again, but the output stayed the same as above. Other env-names such as Kangaroo-ram-v0, Skiing-v0, Freeway-v0 and Gravitar-v0 all get stuck in the same place as PongDeterministic-v4.
Please help!
def selu(x):
    print('selu begin')
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    print('selu ends')
    return scale * F.elu(x, alpha)
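When print statements stop showing progress, one general-purpose option is the standard library's `faulthandler` module, which can dump the Python stack of every thread if the process has not finished within a timeout. The sketch below is a debugging aid, not a diagnosis of this particular hang. The helper names are hypothetical, and calling the watchdog at the start of each worker process is an assumption about where it would be useful. Flushing prints is included because output from child processes may be buffered, in which case the last line seen is not necessarily the last line executed.

```python
import faulthandler
import sys


def install_hang_watchdog(timeout_s=30.0, stream=None):
    """Dump all thread stacks every `timeout_s` seconds until cancelled."""
    if stream is None:
        stream = sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)


def cancel_hang_watchdog():
    """Stop the periodic stack dumps once the work has completed."""
    faulthandler.cancel_dump_traceback_later()


def debug_print(*args):
    """Print and flush immediately so buffered output cannot hide progress."""
    print(*args, flush=True)
```

The dumped traceback shows the exact line each process is blocked on, which is usually more precise than bracketing the code with prints.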
Does the default learning rate of 0.3 correspond to 12 threads or to 40?
Should the learning rate scale linearly with batch size?
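As background for the scaling question, the ES estimator described by Salimans et al. (2017) averages the noise vectors weighted by their returns: theta <- theta + lr * (1 / (n * sigma)) * sum_i F_i * eps_i. The sketch below implements that formula in plain Python. Whether this repository normalizes by n in the same way is an assumption, and the sketch does not settle which thread count the default of 0.3 was tuned for.

```python
def es_update(theta, noises, returns, lr, sigma):
    """Apply one ES step using the 1 / (n * sigma) normalized estimator."""
    n = len(noises)
    new_theta = list(theta)
    for j in range(len(theta)):
        # Return-weighted average of the j-th noise component.
        grad_j = sum(f * eps[j] for f, eps in zip(returns, noises)) / (n * sigma)
        new_theta[j] = theta[j] + lr * grad_j
    return new_theta
```

Under this normalization, a larger population lowers the variance of the estimate rather than increasing the step length, so duplicating every sample leaves the update unchanged.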
Has anyone trained the model on MountainCar-v0? I can only obtain the minimum reward of -200. I tried both smaller and larger values of sigma, but neither worked.
With policy gradients, we learn a softmax policy and sample actions stochastically according to its probabilities.
What about ES with a discrete action space? Does the method follow a greedy policy or a softmax policy? From the code, it looks like a greedy policy. Is that the intended behavior?
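To make the two options in the question concrete, here is a minimal sketch of both selection rules applied to the same network outputs. The function names are hypothetical and nothing here is taken from the repository. Greedy selection takes the argmax, while softmax selection samples in proportion to the exponentiated outputs.

```python
import math
import random


def greedy_action(logits):
    """Deterministic choice: index of the largest output."""
    return max(range(len(logits)), key=lambda i: logits[i])


def softmax(logits):
    """Convert outputs to probabilities; subtract the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Stochastic choice: sample an index according to the softmax probabilities."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]
```

In ES the exploration comes from perturbing the parameters, so a deterministic greedy policy can still explore across the population, whereas policy gradients rely on sampling actions.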
Hi,
I am trying to run CartPole using this command:
python3 main.py --small-net --env-name CartPole-v1
and I get a screen full of errors like this:
IndexError: too many indices for array
Process Process-41:
Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ajay/PythonProjects/pytorch-es-master/train.py", line 49, in do_rollouts
    state, reward, done, _ = env.step(action[0, 0])
IndexError: too many indices for array
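NumPy raises this IndexError when an array is indexed with more indices than it has dimensions, so `action[0, 0]` fails if `action` is one-dimensional. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the tensor holding the chosen action has a different shape in the installed PyTorch version than the code expects. A shape-agnostic way to extract the action, using a hypothetical helper name, could look like this:

```python
import numpy as np


def to_scalar_action(action):
    """Return the first element of `action` as a Python int, whatever its shape."""
    return int(np.asarray(action).reshape(-1)[0])
```

A call such as `env.step(to_scalar_action(action))` would then work for 0-D, 1-D, and 2-D action arrays alike.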