Comments (22)

luweiqing avatar luweiqing commented on August 18, 2024

Hello,
I have the same problem during the training process. Did you solve it?
Can you give me some advice?

Thank you in advance.
Lu

jianye0428 avatar jianye0428 commented on August 18, 2024

Sorry, I switched to V-REP 3.6.2, but it still does not seem to work.

luweiqing avatar luweiqing commented on August 18, 2024

Me too.
Also, if we switch to another control mode ("end_position") or give a random target in the "joint_velocity" control mode, the SAC network breaks and the reward value is always unstable.

quantumiracle avatar quantumiracle commented on August 18, 2024

Hi,
The Sawyer simulation in V-REP seems to be unstable sometimes, which leads to a broken gripper during exploration. This is why the code restarts the environment every 20 episodes during training; with that in place, on my side the agent can smoothly finish the training process over thousands of episodes.
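
(For reference, here is a minimal, self-contained sketch of such a periodic restart, assuming an environment object with reinit()/reset()/step() methods like the wrapper in this chapter; the dummy class and numbers are illustrative, not the repository's exact training script.)

    import random

    class DummyEnv:
        """Stand-in for the chapter's Sawyer environment (hypothetical)."""
        def reinit(self):
            print('Relaunching the V-REP scene...')    # heals a gripper broken during exploration
        def reset(self):
            return [0.0] * 17                           # placeholder observation
        def step(self, action):
            return [0.0] * 17, random.random(), True    # observation, reward, done

    RESTART_INTERVAL = 20  # matches the restart-every-20-episodes scheme described above

    env = DummyEnv()
    for episode in range(100):
        if episode > 0 and episode % RESTART_INTERVAL == 0:
            env.reinit()  # periodic restart so simulation damage does not persist
        obs = env.reset()
        done = False
        while not done:
            action = [random.uniform(-1, 1) for _ in range(7)]  # stand-in for the SAC policy output
            obs, reward, done = env.step(action)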

So could you @jianye0428 check if the gripper is still broken after the restarting code line above, via visualization of the robot scene?

I'm not sure if this problem is caused by the package version. To make sure this project works well, we recommend using V-REP 3.6.2 and the compatible PyRep that we forked here, rather than directly installing the latest version.

As for the "end_position" mode @luweiqing, this project is not a solution for that. You may need to change the code a bit and fine-tune it to make that work.

Best,
Zihan

jianye0428 avatar jianye0428 commented on August 18, 2024

Hello,
Thanks for the reply. I'll give it a try and report back later.
best,
Jian

jianye0428 avatar jianye0428 commented on August 18, 2024

Hello,
I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.

Best,
Jian

quantumiracle avatar quantumiracle commented on August 18, 2024

Hello,
I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.

Best,
Jian

Can you check whether the gripper breaks during exploration and whether it is still broken after restarting the environment with the code?

luweiqing avatar luweiqing commented on August 18, 2024

Hi,
Thank you for your sincere reply.
I have solved the error "Gripper position is nan" that occurred over thousands of episodes.
I have trained for more than 80,000 episodes, but the reward value is still unstable and does not converge, and the success rate is very low.
Do I need more episodes of training?
Can you give me some advice on how many episodes are needed for the value function to become stable?

Best

Lu

luweiqing avatar luweiqing commented on August 18, 2024

As for the error "Gripper position is nan": it occurs because the output of the policy network is [nan, nan, nan, nan, nan, nan, nan], which triggers this check:

    import math

    if math.isnan(ax):  # capture the broken gripper cases during exploration
        print('Gripper position is nan.')
        self.reinit()
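
(A minimal, self-contained sketch of guarding against nan actions before they reach the simulator; the function name and fallback behavior are illustrative assumptions, not the repository's actual fix.)

    import math

    def sanitize_action(action, fallback=0.0):
        # Replace nan entries from a diverged policy with a safe fallback,
        # so the simulator never receives an invalid joint command.
        return [fallback if math.isnan(a) else a for a in action]

    print(sanitize_action([0.1, float('nan'), -0.3]))  # [0.1, 0.0, -0.3]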

jianye0428 avatar jianye0428 commented on August 18, 2024

Hello,
I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.
Best,
Jian

Can you check whether the gripper breaks during exploration and whether it is still broken after restarting the environment with the code?

I think it is still broken when the environment is reinitialized. In my training process the reward drops to zero once the episode count exceeds 20, and I get the same error.

Best,
Jian

jianye0428 avatar jianye0428 commented on August 18, 2024

Hi,
Thank you for your sincere reply.
I have solved the error "Gripper position is nan" that occurred over thousands of episodes.
I have trained for more than 80,000 episodes, but the reward value is still unstable and does not converge, and the success rate is very low.
Do I need more episodes of training?
Can you give me some advice on how many episodes are needed for the value function to become stable?

Best

Lu

Hello,

I tried with the forked PyRep package but still get the same error.

Can I ask how you solved the problem? Did you just use the forked PyRep, or did you change other things?

Best,
Jian

luweiqing avatar luweiqing commented on August 18, 2024

Hi,
I changed the robot from Sawyer to Baxter.
The reward value is still unstable and does not converge, even though I trained for more than 80,000 episodes.

quantumiracle avatar quantumiracle commented on August 18, 2024

Hi,
I changed the robot from Sawyer to Baxter.
The reward value is still unstable and does not converge, even though I trained for more than 80,000 episodes.

Did you change the environment script after you changed the robot from Sawyer to Baxter? Since the environment is basically customized for Sawyer, I'm not sure it would work directly with Baxter. For Sawyer, it only takes thousands of episodes to get some preliminary learning results, as shown by the learning curve in the Readme.

quantumiracle avatar quantumiracle commented on August 18, 2024

Hello,
I have tried with V-REP 3.6.2 and the forked PyRep package, but it still did not work.
Best,
Jian

Can you check whether the gripper breaks during exploration and whether it is still broken after restarting the environment with the code?

I think it is still broken when the environment is reinitialized. In my training process the reward drops to zero once the episode count exceeds 20, and I get the same error.

Best,
Jian

If so, I would say the reinitialization behaves differently on your side and mine. In my tests, the gripper is intact and works well after reinitialization, even if it was broken before. I currently do not know what causes this difference.

luweiqing avatar luweiqing commented on August 18, 2024

Today the error "Gripper position is nan" appeared again; I am sure that I changed the environment script. On the other hand, I set the target object to a random position. Is this project a solution for that,
or should I make other changes to the network?

luweiqing avatar luweiqing commented on August 18, 2024

I found the reason the error occurs:
the operator in the code should be -=, not =-.
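
(For clarity, a tiny self-contained example of the difference: in Python, "x =- y" parses as "x = -y", an assignment of the negation, rather than the in-place subtraction "x -= y".)

    x = 10
    x -= 3       # in-place subtraction: x is now 7
    y = 10
    y =- 3       # parses as y = -3: assigns the negation, a classic typo
    print(x, y)  # 7 -3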

luweiqing avatar luweiqing commented on August 18, 2024

I find that reducing the number of threads can avoid the error "Gripper position is nan".

quantumiracle avatar quantumiracle commented on August 18, 2024

I find that reducing the number of threads can avoid the error "Gripper position is nan".

You mean processes? How many did you use when you met the error?

luweiqing avatar luweiqing commented on August 18, 2024

I find that reducing the number of threads can avoid the error "Gripper position is nan".

You mean processes? How many did you use when you met the error?

Yes, I used more than 4 processes when I met the error. And as the number of training episodes increases, the value function converges when I use only 1 process.

luweiqing avatar luweiqing commented on August 18, 2024

And I have a question: how do you aggregate the final training results of multi-threaded training? A3C uses multi-threaded sampling, but the training itself is still single-threaded.

quantumiracle avatar quantumiracle commented on August 18, 2024

And I have a question: how do you aggregate the final training results of multi-threaded training? A3C uses multi-threaded sampling, but the training itself is still single-threaded.

If you use multi-threading, variables and objects can be shared across threads within a process, in which case you can log the results easily by reading these shared objects; if you use multiprocessing, a queue can be used to send information across processes.
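
(A minimal, self-contained sketch of the queue approach using Python's multiprocessing; the worker loop and reward values are placeholders, not this repository's training code.)

    import multiprocessing as mp
    import random

    def worker(worker_id, queue, n_episodes):
        # Stand-in training loop: pushes per-episode rewards to the logger.
        for episode in range(n_episodes):
            episode_reward = random.random()   # placeholder for a real rollout
            queue.put((worker_id, episode, episode_reward))
        queue.put((worker_id, None, None))     # sentinel: this worker is done

    if __name__ == '__main__':
        queue = mp.Queue()
        n_workers, n_episodes = 2, 3
        procs = [mp.Process(target=worker, args=(i, queue, n_episodes))
                 for i in range(n_workers)]
        for p in procs:
            p.start()
        finished = 0
        while finished < n_workers:            # central logger in the main process
            worker_id, episode, reward = queue.get()
            if episode is None:
                finished += 1
            else:
                print(f'worker {worker_id} episode {episode}: reward {reward:.3f}')
        for p in procs:
            p.join()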

Mr-Trigg avatar Mr-Trigg commented on August 18, 2024

Hi, I've been trying to train the sac_learn file, but I was getting the "Gripper position is nan" error. I tried the suggestions here: I was using 4 parallel processes, then 2, and both cases crashed with the gripper-position error. Now I've been running the training with just 1 process; it has been 13 hours so far, episode 24k+, and the episode reward is still around -3 to -2. Sometimes there's a 7, but that is quite rare.

I'm using Ubuntu 18.04 as the OS, Python 3.6.9, V-REP Pro Edu 3.6.2, this GitHub's PyRep version, and PyTorch 1.8.1 with CUDA 11.1, on an RTX 2080 Super GPU and a Ryzen 7 3800X CPU.
