mishalaskin / curl
CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
License: MIT License
Will there be scripts for discrete/Atari environments?
When I run the code, errors occur.
CRITICAL:absl:Shadow framebuffer is not complete, error 0x8cd7
CRITICAL:absl:Could not allocate display lists
CRITICAL:absl:Could not allocate display lists
Must I run on a machine with a display? Can the code be changed to run without one?
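For headless machines, dm_control (which the DeepMind Control Suite tasks run on) selects its rendering backend from the MUJOCO_GL environment variable, with glfw, egl and osmesa as the documented values. A minimal sketch, assuming the variable is set before dm_control is first imported; the helper name configure_headless_rendering is hypothetical and not part of the repository. Whether egl or osmesa works depends on the drivers installed on the machine.

```python
import os

# Backends documented by dm_control; "glfw" needs a display, these two do not.
HEADLESS_BACKENDS = ("egl", "osmesa")


def configure_headless_rendering(backend="egl"):
    """Select an off-screen MuJoCo rendering backend (hypothetical helper)."""
    if backend not in HEADLESS_BACKENDS:
        raise ValueError(f"unsupported headless backend: {backend!r}")
    # Must run before the first `import dm_control` in the process.
    os.environ["MUJOCO_GL"] = backend
    return backend


configure_headless_rendering("egl")  # try "osmesa" for software rendering
```

The same setting can be applied from the shell by prefixing the launch command with MUJOCO_GL=egl or MUJOCO_GL=osmesa.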
While reading through the paper and the code, I noticed that the pseudocode in the paper detaches the key encoder's output from the graph, but the actual code does not set detach=True in z_pos = self.CURL.encode(obs_pos, ema=True). I wanted to know whether the paper or the code is correct. Or maybe I am missing some part of the computation.
This is what is in the code for curl_sac.py:
def update_cpc(self, obs_anchor, obs_pos, cpc_kwargs, L, step):
    z_a = self.CURL.encode(obs_anchor)
    z_pos = self.CURL.encode(obs_pos, ema=True)
    logits = self.CURL.compute_logits(z_a, z_pos)
    labels = torch.arange(logits.shape[0]).long().to(self.device)
    loss = self.cross_entropy_loss(logits, labels)
    self.encoder_optimizer.zero_grad()
    self.cpc_optimizer.zero_grad()
    loss.backward()
    self.encoder_optimizer.step()
    self.cpc_optimizer.step()
    if step % self.log_interval == 0:
        L.log('train/curl_loss', loss, step)
and this is what is in the pseudocode for the paper:
for x in loader:
    x_q = aug(x)
    x_k = aug(x)
    z_q = f_q.forward(x_q)
    z_k = f_k.forward(x_k)
    z_k = z_k.detach()
    proj_k = matmul(W, z_k.T)
    logits = matmul(z_q, proj_k)
    logits = logits - max(logits, axis=1)
    labels = arange(logits.shape[0])
    loss = CrossEntropyLoss(logits, labels)
    loss.backward()
    update(f_q.params)
    update(W)
    f_k.params = m*f_k.params+(1-m)*f_q.params
Hello! Thank you so much for putting up this valuable resource! I was wondering if I may ask for some kind advice about replicating the results, which I have been unable to do.
Mainly, I have been testing CURL (using the default settings + command listed on https://github.com/MishaLaskin/curl) against CURL with the following lines commented out (which should give me pixel SAC):
# if step % self.cpc_update_freq == 0 and self.encoder_type == 'pixel':
#     obs_anchor, obs_pos = cpc_kwargs["obs_anchor"], cpc_kwargs["obs_pos"]
#     self.update_cpc(obs_anchor, obs_pos,cpc_kwargs, L, step,0)
For [cartpole, swingup], I obtained ~850 for CURL, but strangely I also obtained ~850 (and very quickly too) for pixel SAC. This lack of difference was replicated over 5 seeds and is very robust. Is my code change correct, or have I modified the code in the wrong way?
For the task [finger,spin] I obtained ~ 350 for both CURL and pixel SAC, also no difference.
Thank you in advance for the kind help! :)
Hi, thank you for your great research!
I'm afraid there is a bug in the random_crop function in utils.py:
Lines 244 to 245 in 23b0880
crop_max should be changed to crop_max + 1.
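The off-by-one can be checked in isolation. NumPy's np.random.randint(low, high) excludes high, and the standard library's random.randrange(start, stop) uses the same half-open convention, so the sketch below uses the latter to stay dependency-free. The sizes (100-pixel input, 84-pixel crop) and the helper name crop_offsets are illustrative assumptions, not taken from utils.py.

```python
import random


def crop_offsets(img_size, output_size, inclusive, trials=10_000, seed=0):
    """Return the set of top-left crop offsets seen over many random draws."""
    rng = random.Random(seed)
    crop_max = img_size - output_size
    stop = crop_max + 1 if inclusive else crop_max  # randrange excludes `stop`
    return {rng.randrange(0, stop) for _ in range(trials)}


img_size, output_size = 100, 84
crop_max = img_size - output_size  # 16 is the last offset that still fits

without_fix = crop_offsets(img_size, output_size, inclusive=False)
with_fix = crop_offsets(img_size, output_size, inclusive=True)

print(crop_max in without_fix)  # False: the last valid offset is never drawn
print(max(with_fix) + output_size <= img_size)  # True: crops stay inside the image
```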
In CURL.encode, what is the arg "ema"?
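The name ema suggests an exponential moving average, which would match the momentum update f_k.params = m*f_k.params+(1-m)*f_q.params in the paper's pseudocode: the key encoder is not trained by gradients but slowly tracks the query encoder. A minimal pure-Python sketch of that kind of update, with plain lists of floats standing in for network weights; the function name ema_update and the value m = 0.95 are illustrative assumptions, not values read from the repository.

```python
def ema_update(key_params, query_params, m):
    """Move each key weight a fraction (1 - m) of the way toward its query weight."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


query = [1.0, -2.0]  # weights of the trained (query) encoder
key = [0.0, 0.0]     # weights of the momentum (key) encoder

for _ in range(3):
    key = ema_update(key, query, m=0.95)

print(key)  # drifts toward `query` without any gradient step
```

With m close to 1 the key encoder changes slowly, which is the usual reason for using a momentum encoder as a stable target.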
I understand that the response might be delayed, but I'm having difficulty locating the MoCo implementation in the CURL codebase. Could you kindly point me to the relevant section or file where MoCo is implemented? Thank you for your assistance.
Dear CURL authors,
Thanks for such high-impact work and for releasing the code!
Following the hyperparameters in Table 3 of the Implementation Details appendix, I ran each reported game for five seeds.
The results are:
500K steps score | Our results | CURL paper
Finger, Spin | 828 +/- 137 | 926 +/- 45
Cartpole, Swingup | 809 +/- 39 | 841 +/- 45
Reacher, Easy | 951 +/- 27 | 929 +/- 44
Cheetah, Run | 526 +/- 59 | 518 +/- 28
Walker, Walk | 892 +/- 49 | 902 +/- 43
Ball in Cup, Catch | 846 +/- 103 | 959 +/- 27
From the above results, we find that in some games (e.g. Finger, Spin and Ball in Cup, Catch), the mean score is lower than in your results and the standard deviation is relatively high.
Besides, I find that the 100K and 500K results of Pixel SAC are almost the same in Table 1 of the paper.
Have you run into these issues with the current codebase? Thank you so much!
Thanks for your great work. I ran into a problem when I tried to modify the code: the error message said that I must use retain_graph=True. Where might I have gone wrong?
Thanks for sharing your code, it's great to be able to go through the implementation.
Maybe I'm misunderstanding this, but it seems that if you intend self.cpc_optimizer to only optimise W, then
self.cpc_optimizer = torch.optim.Adam(
    self.CURL.parameters(), lr=encoder_lr
)
should be
self.cpc_optimizer = torch.optim.Adam(
    self.CURL.parameters(recurse=False), lr=encoder_lr
)
or
self.cpc_optimizer = torch.optim.Adam(
    [self.CURL.W], lr=encoder_lr
)
The code I'm referring to is here and the torch docs for parameter are here. And I'm comparing it to section 4.7 of your paper.
As it stands, it seems that the encoder is optimised twice, once in encoder_optimizer and again in cpc_optimizer.
Or am I missing something?
First of all, thank you so much for kindly sharing your great research and also the code.
However, I have one question regarding the labels generation from the logits using the following code (curl_sac.py line 424):
labels = torch.arange(logits.shape[0]).long().to(self.device)
What if, for example, the batch sampled from the replay buffer contains several identical observations? Won't the code treat identical features as different classes, since we use torch.arange?
Please correct me if I am wrong. Thank you so much.
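The concern can be made concrete with a toy batch. In the sketch below, plain dot products stand in for the bilinear similarity and augmentation is ignored, both simplifying assumptions. When two batch entries are identical, the row for the first one sees the same logit for its own positive and for the duplicate, so its softmax probability for the "correct" label cannot exceed 0.5 and its loss cannot drop below log 2.

```python
import math


def row_cross_entropy(logits_row, label):
    """Cross-entropy of one row of logits against an integer class label."""
    peak = max(logits_row)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits_row))
    return log_sum_exp - logits_row[label]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Entries 0 and 1 are the same observation; entry 2 is different.
features = [[3.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
logits = [[dot(q, k) for k in features] for q in features]
labels = list(range(len(features)))  # same role as torch.arange(batch_size)

losses = [row_cross_entropy(row, y) for row, y in zip(logits, labels)]
print(losses[0] >= math.log(2))  # True: the duplicate acts as a false negative
print(losses[2] < 0.01)          # True: the distinct entry is classified easily
```

With random crops the two copies would usually not be pixel-identical, and exact repeats within one sampled batch are presumably rare, so the practical effect may be small.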
Great work and thanks a lot for releasing the code! It’s awesome to see this simple contrastive loss term performing so well without the need for reconstruction.
Quick question regarding the environment step count: if we consider a DMC episode of standard length 1000 steps and we use a frameskip of 4, do the reported results consider the episode to have 1000 steps or 250 steps? Put differently, do the 100k step results mean 100k “low-level DMC” steps or 100k “agent-applying-an-action” steps?
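The two readings differ by exactly the action-repeat factor, so it helps to state the conversion explicitly. A small sketch using the numbers from the question (1000-step episodes, frameskip 4); it does not settle which convention the reported results use, and the function names are made up for illustration.

```python
def agent_steps(env_steps, action_repeat):
    """Number of agent decisions needed to consume `env_steps` low-level steps."""
    return env_steps // action_repeat


def low_level_steps(decisions, action_repeat):
    """Number of low-level simulator steps produced by `decisions` agent actions."""
    return decisions * action_repeat


action_repeat = 4
print(agent_steps(1_000, action_repeat))        # 250 decisions per 1000-step episode
print(agent_steps(100_000, action_repeat))      # 25000 decisions if 100k counts low-level steps
print(low_level_steps(100_000, action_repeat))  # 400000 low-level steps if 100k counts decisions
```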
Hello, I followed the README to run
bash scripts/run.sh
This is what I got:
FileNotFoundError: [Errno 2] No such file or directory: './tmp/cartpole/cartpole-swingup-05-10-im84-b128-s482469-pixel/args.json'
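This error is what one would expect if the parent directory ./tmp does not exist when the run directory is created: os.mkdir only creates the last path component and raises if a parent is missing, whereas os.makedirs creates the whole chain. Whether that is the cause here is an assumption; the sketch below only demonstrates the difference, using a temporary directory and a made-up run name.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    run_dir = os.path.join(root, "tmp", "cartpole", "example-run")  # made-up run name

    try:
        os.mkdir(run_dir)  # fails: the parents "tmp/cartpole" are missing
        mkdir_worked = True
    except FileNotFoundError:
        mkdir_worked = False

    os.makedirs(run_dir, exist_ok=True)  # creates every missing parent
    args_path = os.path.join(run_dir, "args.json")
    with open(args_path, "w") as f:
        json.dump({"seed": 1}, f)

    args_written = os.path.isfile(args_path)

print(mkdir_worked, args_written)  # False True
```

If that is the cause, creating the directory once with mkdir -p ./tmp before running scripts/run.sh may be enough.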
Hi, I was going through the code and couldn't find where the momentum encoder is updated. I think it is initialized only once at the beginning and then isn't trained at all.
Edit: Never mind, had missing dependencies.
Hi, thank you for sharing your code.
When I run run.sh on an Ubuntu server, I get an error:
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
CRITICAL:absl:OpenGL version 1.5 or higher required
CRITICAL:absl:OpenGL ARB_framebuffer_object required
CRITICAL:absl:OpenGL ARB_vertex_buffer_object required
./scripts/run.sh: line 10: 4276 Segmentation fault (core dumped)
But when I run it on my own computer, the error doesn't appear. I can't fix it on the server, so I'd appreciate any help with this error.
I'm looking into the code and find that in def update_cpc(), both self.encoder_optimizer.step() and self.cpc_optimizer.step() are called. However, the parameters of critic.encoder are held by both optimizers. Isn't it true that, in def update_cpc(), critic.encoder is updated twice using the same gradient?
I got WARN: Box bound precision lowered by casting to float32 when I run the code. Should this be fixed?
Hi,
I see that you use the cross-entropy (CE) loss for contrastive learning. As far as I understand, this does not penalize the negative samples, since the CE loss gives zero weight to the off-diagonal entries of the [B, B] matrix. Am I making a mistake?
Best,
Sherwin
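Writing out the gradient of the loss for one row helps here. With logits \ell_{ij} for anchor i against key j, and the label for row i equal to i, the cross-entropy and its derivative are as follows (standard softmax algebra, not specific to this codebase):

```latex
\mathcal{L}_i = -\ell_{ii} + \log \sum_{j=1}^{B} \exp(\ell_{ij}),
\qquad
\frac{\partial \mathcal{L}_i}{\partial \ell_{ij}}
  = \frac{\exp(\ell_{ij})}{\sum_{k=1}^{B} \exp(\ell_{ik})} - \mathbb{1}[i = j].
```

For j ≠ i the derivative is the softmax probability, which is strictly positive, so gradient descent lowers the logits of the negatives. Only the one-hot target is zero off the diagonal; the gradient is not.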
Hi! Great paper btw!
When I run the code the RAM usage continuously increases. I am running the default code with no changes. The memory consumption keeps on increasing and after 8k iterations, the OS kills the process. My PC specs: Intel i7 processor, Nvidia RTX2070 GPU, 16GB RAM. Can you please help me out? Thank you.
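One thing that may be worth checking, as a guess rather than a confirmed diagnosis, is the size of the replay buffer. If the buffer stores uint8 image stacks for both the observation and the next observation, its footprint can be estimated as below. The capacity, image size and frame-stack values are assumptions chosen for illustration and should be replaced with the values actually passed to train.py; the function name is made up.

```python
def replay_buffer_bytes(capacity, image_size, frame_stack, channels=3, copies=2):
    """Bytes needed for `copies` uint8 image arrays of shape (capacity, C, H, W)."""
    stacked_channels = channels * frame_stack
    per_observation = stacked_channels * image_size * image_size  # 1 byte per uint8
    return copies * capacity * per_observation


total = replay_buffer_bytes(capacity=100_000, image_size=100, frame_stack=3)
print(f"{total / 1e9:.1f} GB")  # 18.0 GB with these assumed settings
```

A preallocated array typically consumes physical memory only as it is written to, which would look like steadily growing RAM usage. If the estimate exceeds the machine's 16GB, a smaller buffer capacity is one thing to try.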
I've been trying to apply CURL to an environment other than the DeepMind Control Suite, namely the Google Football environment, but I've been getting errors regarding action_shape, obs_shape and channels.
1) Issue with channels:
RuntimeError: Given groups=1, weight of size 32 6 3 3, expected input[1, 144, 84, 3] to have 6 channels, but got 144 channels instead.
2) Issue while assigning value of action shape from that of the environments:
Traceback (most recent call last):
File "train.py", line 291, in <module>
main()
File "train.py", line 226, in main
device=device
File "train.py", line 148, in make_agent
curl_latent_dim=args.curl_latent_dim
File "/home/atharva/CURL/curl/curl_sac.py", line 285, in __init__
num_layers, num_filters
File "/home/atharva/CURL/curl/curl_sac.py", line 73, in __init__
nn.Linear(hidden_dim, 2 * action_shape[0])
IndexError: tuple index out of range
3) Issue with PixelEncoder :
Traceback (most recent call last):
File "train.py", line 292, in <module>
main()
File "train.py", line 240, in main
evaluate(env, agent, video, args.num_eval_episodes, L, step,args)
File "train.py", line 116, in evaluate
run_eval_loop(sample_stochastically=False)
File "train.py", line 101, in run_eval_loop
action = agent.select_action(obs)
File "/home/atharva/CURL/curl/curl_sac.py", line 355, in select_action
obs, compute_pi=False, compute_log_pi=False
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/atharva/CURL/curl/curl_sac.py", line 82, in forward
obs = self.encoder(obs, detach=detach_encoder)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/atharva/CURL/curl/encoder.py", line 67, in forward
h_fc = self.fc(h)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [1 x 672], m2: [39200 x 50] at /tmp/pip-req-build-ocx5vxk7/aten/src/THC/generic/THCTensorMathBlas.cu:290
4) Issue with Padded input :
Traceback (most recent call last):
File "train.py", line 292, in <module>
main()
File "train.py", line 240, in main
evaluate(env, agent, video, args.num_eval_episodes, L, step,args)
File "train.py", line 116, in evaluate
run_eval_loop(sample_stochastically=False)
File "train.py", line 101, in run_eval_loop
action = agent.select_action(obs)
File "/home/atharva/CURL/curl/curl_sac.py", line 355, in select_action
obs, compute_pi=False, compute_log_pi=False
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/atharva/CURL/curl/curl_sac.py", line 82, in forward
obs = self.encoder(obs, detach=detach_encoder)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/atharva/CURL/curl/encoder.py", line 62, in forward
h = self.forward_conv(obs)
File "/home/atharva/CURL/curl/encoder.py", line 55, in forward_conv
conv = torch.relu(self.convs[i](conv))
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Calculated padded input size per channel: (21 x 1). Kernel size: (3 x 3). Kernel size can't be greater than actual input size.
5)Issue with Action in action set :
Traceback (most recent call last):
File "train.py", line 292, in <module>
main()
File "train.py", line 240, in main
evaluate(env, agent, video, args.num_eval_episodes, L, step,args)
File "train.py", line 116, in evaluate
run_eval_loop(sample_stochastically=False)
File "train.py", line 102, in run_eval_loop
obs, reward, done, _ = env.step(action)
File "/home/atharva/CURL/curl/utils.py", line 226, in step
obs, reward, done, info = self.env.step(action)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 234, in step
return self.env.step(action)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 280, in step
observation, reward, done, info = self.env.step(action)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 268, in step
observation, reward, done, info = self.env.step(action)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gym/core.py", line 268, in step
observation, reward, done, info = self.env.step(action)
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_env.py", line 177, in step
_, reward, done, info = self._env.step(self.get_actions())
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_env_core.py", line 160, in step
for a in action
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_env_core.py", line 160, in
for a in action
File "/home/atharva/anaconda3/envs/curl/lib/python3.6/site-packages/gfootball/env/football_action_set.py", line 217, in named_action_from_action_set
assert False, "Action {} not found in action set".format(action)
AssertionError: Action -0.049828674644231796 not found in action set
It seems that solving one issue just gives rise to another. Could you please let me know how these issues could be resolved?
Thank you.
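Several of these errors come down to tensor shapes, and the arithmetic can be checked by hand. The tracebacks show 3x3 kernels with 32 filters and a fully connected layer expecting 39200 inputs. Assuming four unpadded conv layers with stride 2 on the first and stride 1 on the rest (an assumption about the encoder, as are the helper names below), an 84x84 input reproduces that number. Feeding a channels-last image, so that the 3 colour channels are read as the width, fails after the first layer.

```python
def conv_out(size, kernel=3, stride=1):
    """Output length of an unpadded convolution along one axis."""
    if size < kernel:
        raise ValueError(f"kernel {kernel} is larger than input {size}")
    return (size - kernel) // stride + 1


def encoder_flat_dim(height, width, num_layers=4, num_filters=32):
    """Flattened feature size, assuming 3x3 unpadded convs with the first at stride 2."""
    strides = [2] + [1] * (num_layers - 1)
    for stride in strides:
        height = conv_out(height, stride=stride)
        width = conv_out(width, stride=stride)
    return num_filters * height * width


print(encoder_flat_dim(84, 84))  # 39200, the fc input size in the traceback

try:
    encoder_flat_dim(84, 3)  # width 3 is really the channel axis of an HWC image
except ValueError as err:
    print("shape error:", err)
```

If that is what is happening, transposing observations to channels-first (C, H, W) and resizing them to the size the encoder was built for may address issues 1, 3 and 4. Issue 5 looks separate: SAC outputs continuous actions, while the assertion shows that the football environment expects actions from a discrete set.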
Hi, thank you for your code. I'm a little confused by the infinite bootstrap in
Line 269 in 8416d6e
FileNotFoundError: [Errno 2] No such file or directory: './tmp/cartpole/cartpole-swingup-06-22-im84-b128-s202969-pixel/args.json'
Hi, can we merge the update_critic and update_cpc functions by adding critic_loss and cpc_loss together?
That way, we would only need two optimizers.
Is it feasible?
self.cpc_optimizer = torch.optim.Adam([self.CURL.W], lr=encoder_lr)
self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), lr=critic_lr, betas=(critic_beta, 0.999))
loss = critic_loss + cpc_loss
loss.backward()
self.critic_optimizer.step()
self.cpc_optimizer.step()
Hi, thanks for sharing your code. I would like to ask: what is the configuration of the machine on which the code was run?