
pytorch-es's People

Contributors

atgambardella, lolz0r

pytorch-es's Issues

Support Capability to Use GPUs

Hey Andrew, thanks immensely for putting this together. Very useful example of evolution strategies.

I was wondering what your thoughts are on including GPU support. My idea is that the forward pass of each model could run on the GPU while the environment runs on the CPU. Given how GPU batching works, you would batch the actions, let the environments respond, and repeat the process.

I suspect the biggest bottleneck at that point would be PCIe lanes, depending on how much bandwidth you have to the GPU. The bottom line is that the models would be stored and executed on the GPU while the env runs on the CPU. Does Gym allow the actual env to run on the GPU?
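The batching idea above could be sketched roughly as follows; every name here (`batched_rollout_step`, `models`, `envs`) is hypothetical and not from this repo, the models are assumed to already live on `device`, and a real implementation would fuse the perturbed models into one batched forward pass:

```python
import torch

# Sketch of the batching idea (all names hypothetical): keep each perturbed
# model on the GPU, batch the observations for the forward pass, and step
# the Gym environments on the CPU.
def batched_rollout_step(models, states, envs, device="cuda"):
    # Stack per-env observations into one batch and move it to the GPU once.
    batch = torch.stack(
        [torch.as_tensor(s, dtype=torch.float32) for s in states]
    ).to(device)
    with torch.no_grad():
        # One forward per model on its own observation; a real implementation
        # could fuse the perturbed models into a single batched forward.
        logits = torch.stack(
            [m(batch[i:i + 1]).squeeze(0) for i, m in enumerate(models)]
        )
    # Transfer back over PCIe once per step, then let the CPU envs respond.
    actions = logits.argmax(dim=1).cpu().numpy()
    return [env.step(int(a)) for env, a in zip(envs, actions)]
```

The single host-to-device and device-to-host transfer per step is what keeps the PCIe cost bounded in this scheme.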

Tic-tac-toe environment for the ES training process

Hello, @atgambardella.

Using your code as a base, I have developed a new Tic-tac-toe environment for the ES training process. Since this game can be solved exactly by a classical minimax tree search, I used that classical AI as the opponent in the "step" phase, playing against our neural network model and returning the reward.

The end result is a model (a simple "Linear" one) that, thanks to evolutionary computation, achieves a perfect game (never losing) against the classical brute-force minimax AI.

My code is here: https://github.com/Zeta36/pytorch-es-tic-tac-toe
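Since the opponent described above is a standard minimax search, here is a minimal sketch of that idea for reference; the board encoding (a 9-element list with 1 for the minimax player, -1 for the network, 0 for empty) is an assumption for illustration, not taken from the linked repository:

```python
# Minimal minimax sketch for tic-tac-toe (board encoding is an assumption:
# 9-element list, 1 = minimax player, -1 = opponent, 0 = empty).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    # Return 1 or -1 if that player has three in a line, else 0.
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player):
    # Best achievable terminal score for the side to move
    # (player 1 maximizes, player -1 minimizes).
    w = winner(board)
    if w != 0:
        return w
    moves = [i for i, v in enumerate(board) if v == 0]
    if not moves:
        return 0  # draw
    scores = []
    for i in moves:
        board[i] = player
        scores.append(minimax(board, -player))
        board[i] = 0
    return max(scores) if player == 1 else min(scores)

def best_move(board, player=1):
    # Pick the move whose subtree score is best for `player`.
    moves = [i for i, v in enumerate(board) if v == 0]

    def score(i):
        board[i] = player
        s = minimax(board, -player)
        board[i] = 0
        return s

    choose = max if player == 1 else min
    return choose(moves, key=score)
```

Because tic-tac-toe's full game tree is tiny, this exhaustive search plays perfectly, which is what makes it a clean fixed opponent for scoring an evolved policy.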

I also simplified your code a little and removed things I knew I would not need.

Thanks for your work, friend.

Is it stuck in selu?

Hi~ I ran CartPole-v1 and it works fine.
But when I run any other env name, they all get stuck in the same place:

Here in model.py I added some prints to help check where they get stuck:

def forward(self, inputs):
    if self.small_net:
        x = selu(self.linear1(inputs))
        x = selu(self.linear2(x))
        return self.actor_linear(x)
    else:
        print('model: !!!forward!!! big-net(4conv+1lstm)')
        inputs, (hx, cx) = inputs
        print('model: !!!after update: input, (hx,cx) = inputs')
        x = selu(self.conv1(inputs))
        x = selu(self.conv2(x))
        x = selu(self.conv3(x))
        x = selu(self.conv4(x))
        print('model: !!!after 4conv end selu process')
        x = x.view(-1, 32*3*3)
        print('model: !!!after x reshape: x.view(-1, 32*3*3)')
        ...

And below is the output of the "python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam" command:
(venv_openai-es) l00221575@F0817-S05:~/venv_openai-es/pytorch-es$ python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam
[2018-10-23 22:23:10,929] Making new env: PongDeterministic-v4
Preprocessing env
Num params in network 588710
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
(Variable(state.unsqueeze(0), volatile=True),
model: !!!forward!!! big-net(4conv+1lstm)
model: !!!after update: input, (hx,cx) = inputs
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
(Variable(state.unsqueeze(0), volatile=True),
model: !!!forward!!! big-net(4conv+1lstm)
model: !!!after update: input, (hx,cx) = inputs
/home/l00221575/venv_openai-es/pytorch-es/train.py:50: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
(Variable(state.unsqueeze(0), volatile=True),

I guess they are stuck in selu, so I added some prints inside selu and ran PongDeterministic-v4 again, but the output stayed the same as above. Other env names (Kangaroo-ram-v0, Skiing-v0, Freeway-v0, and Gravitar-v0) all get stuck in the same place as PongDeterministic-v4.

Please help~~~

def selu(x):
    print('selu begin')
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    print('selu ends')
    return scale * F.elu(x, alpha)
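As a side note, recent PyTorch versions ship SELU built in with exactly these constants, so the hand-written version above can be cross-checked against (or replaced by) `F.selu`:

```python
import torch
import torch.nn.functional as F

# The hand-rolled selu above should match PyTorch's built-in F.selu,
# which bakes in the same alpha and scale constants.
alpha = 1.6732632423543772848170429916717
scale = 1.0507009873554804934193349852946

x = torch.linspace(-3, 3, steps=7)
manual = scale * F.elu(x, alpha)
builtin = F.selu(x)
print(torch.allclose(manual, builtin))  # → True
```

If the two agree, the hang is unlikely to be inside selu itself; the activation is a cheap elementwise op, so a stall there would be surprising.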

Learning rate

Does the default learning rate of 0.3 correspond to 12 threads or to 40?
Should learning rate scale linearly with batch size?
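For reference, the linear scaling rule from large-batch training would look like the following; whether it applies to this repo's defaults is an assumption, and the helper name is hypothetical:

```python
def scaled_lr(base_lr, base_threads, new_threads):
    # Linear scaling rule (an assumption here, not confirmed by the repo):
    # keep the effective per-rollout step size constant as the
    # population/batch size changes.
    return base_lr * new_threads / base_threads

# If 0.3 was tuned for 40 threads, 12 threads would suggest roughly:
print(round(scaled_lr(0.3, 40, 12), 4))  # → 0.09
```

In ES the gradient estimate is averaged over the population, so a larger population lowers estimator variance, which is the usual argument for allowing a proportionally larger step.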

Performance on MountainCar

Has anyone trained the model on MountainCar-v0? I can only obtain the minimum reward of -200. I tried both a smaller and a larger sigma, but neither worked.

Question about action selection

In the case of policy gradients, we approximate a softmax policy and sample actions stochastically according to their probabilities.

What about ES with a discrete action space? Does the method follow a greedy policy or a softmax policy? From the code it appears to be a greedy policy; is that the intended behavior?
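The two selection rules in question can be sketched as follows (the function names and example logits are illustrative, not from the repo). A deterministic greedy policy is a common choice in ES because exploration comes from noise in parameter space rather than in action space:

```python
import torch

def greedy_action(logits):
    # Deterministic: pick the action with the largest output.
    return int(torch.argmax(logits))

def softmax_action(logits):
    # Stochastic alternative: sample from the softmax distribution.
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

logits = torch.tensor([0.1, 2.0, -1.0])
print(greedy_action(logits))  # → 1
```

With a greedy policy the whole rollout is a deterministic function of the perturbed parameters, which keeps the fitness evaluation low-variance.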

IndexError: too many indices for array

Hi,

I am trying to run CartPole using this command:

python3 main.py --small-net --env-name CartPole-v1

and I get a screen full of errors like this:

IndexError: too many indices for array
Process Process-41:
Traceback (most recent call last):
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/ajay/anaconda3/envs/py35_pytorch/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ajay/PythonProjects/pytorch-es-master/train.py", line 49, in do_rollouts
    state, reward, done, _ = env.step(action[0, 0])
IndexError: too many indices for array
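A hedged guess at the cause: in newer PyTorch versions, `prob.max(1)[1].data.numpy()` can yield a 1-D array, so the two-index access `action[0, 0]` in train.py raises this error. A shape-agnostic workaround might look like the following (the helper name is illustrative and this is untested against the repo):

```python
import numpy as np

# Hedged sketch of a fix: newer PyTorch/numpy may hand back a 1-D array
# where train.py expects 2-D, so action[0, 0] raises IndexError.
# Flattening first yields a scalar action for either shape.
def scalar_action(action):
    return int(np.asarray(action).flatten()[0])

print(scalar_action(np.array([1])))    # → 1  (1-D case, the failing one)
print(scalar_action(np.array([[2]])))  # → 2  (2-D case, the old behavior)
```

If this is the cause, replacing `action[0, 0]` in the `env.step` call with such a flatten-then-index pattern should make the rollout loop version-agnostic.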
