kairproject / kair_algorithms_draft
Reinforcement learning algorithms for robot control tasks
Proceeding in PR #56.
In reinforcement learning the random seed also affects performance, so it would be good to include the seed in the information logged to wandb.
We should add target_position to the state so that the agent knows the goal.
Training currently proceeds per episode, and params are saved per episode as well. It looks like both should be driven per time-step instead.
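A minimal sketch of what a time-step driven loop could look like. This is not the repository's agent code: `run_training`, `env_step`, `reset`, and `save_fn` are hypothetical names, and the toy environment below simply ends an episode every 7 steps.

```python
def run_training(env_step, reset, max_steps, save_interval, save_fn):
    """Drive training by a global time-step counter instead of by episode."""
    total_steps = 0
    episodes = 0
    state = reset()
    while total_steps < max_steps:
        state, done = env_step(state)
        total_steps += 1
        # Checkpoint on the step counter, independent of episode length.
        if total_steps % save_interval == 0:
            save_fn(total_steps)
        if done:
            episodes += 1
            state = reset()
    return total_steps, episodes


# Toy stand-in for an environment: the state is a step counter per episode.
EPISODE_LENGTH = 7
saved = []
total, episodes = run_training(
    env_step=lambda s: (s + 1, s + 1 >= EPISODE_LENGTH),
    reset=lambda: 0,
    max_steps=50,
    save_interval=20,
    save_fn=saved.append,
)
```

With this structure the training budget and the save schedule stay the same even when episode lengths vary between environments.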
The random seed is currently set in two places. Wouldn't it be better to move it out into a single place?
kair_algorithms_draft/scripts/run_lunarlander_continuous.py
Lines 56 to 58 in 7f4756a
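One possible shape for a single seeding entry point, as a sketch only. `set_global_seed` is a hypothetical helper, not something that exists in the repository; numpy and torch are seeded only if they are importable.

```python
import random


def set_global_seed(seed):
    """Seed the available RNGs once, at program start."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed
    # Returned so the caller can log the seed with the other hyperparameters.
    return {"seed": seed}


seed_config = set_global_seed(777)
first_draw = random.random()
set_global_seed(777)
second_draw = random.random()
```

The returned dictionary could be merged into whatever hyperparameter dictionary is sent to the experiment logger, for example through the `config` argument of `wandb.init`.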
Dear All:
I have been trying to run sacfd in the Gazebo environment. To do this, I first had to copy a sacfd config (e.g. scripts/config/agent/lunarlander_continuous_v2/sacfd.py) to scripts/config/agent/open_manipulator_reacher_v0/sacfd.py
However, when I run sacfd with the command "python run_open_manipulator_reacher_v0.py --algo sacfd --off-render", I get the following error:
Traceback (most recent call last):
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/run_open_manipulator_reacher_v0.py", line 78, in <module>
    main()
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/run_open_manipulator_reacher_v0.py", line 74, in main
    agent.train()
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/sac/agent.py", line 372, in train
    loss = self.update_model(experiences)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/fd/sac_agent.py", line 88, in update_model
    new_actions, log_prob, pre_tanh_value, mu, std = self.actor(states)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/common/networks/mlp.py", line 186, in forward
    mu, _, std = super(TanhGaussianDistParams, self).get_dist_params(x)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/common/networks/mlp.py", line 152, in get_dist_params
    hidden = super(GaussianDist, self).get_last_activation(x)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/common/networks/mlp.py", line 79, in get_last_activation
    x = self.hidden_activation(hidden_layer(x))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [64 x 11], m2: [25 x 256] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:136
I found that the problem is a dimension mismatch: the states in the default demo data have dimension 11, but the observation space of the environment has dimension 25. The default demo data is in "scripts/data/reacher_demo.pkl".
So I suspect the default demo data is wrong. I used the script run_open_manipulator_demo.py to regenerate the demo data, but its output is in JSON format. Does agent training require a pkl data file?
Thank you!
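A mismatch like the one in the traceback (batch states of width 11 against a first layer expecting 25 inputs) could be caught before training with a small check such as the sketch below. It assumes each demo transition is a tuple whose first element is the state, which may not match the actual layout of reacher_demo.pkl. `check_demo_state_dim` is a hypothetical helper, and the action width of 4 is an arbitrary placeholder.

```python
def check_demo_state_dim(demo, obs_dim):
    """Return indices of transitions whose state length differs from obs_dim."""
    bad = []
    for i, transition in enumerate(demo):
        state = transition[0]  # assumed layout: (state, action, reward, next_state, done)
        if len(state) != obs_dim:
            bad.append(i)
    return bad


# One fake transition with an 11-dimensional state, checked against
# a 25-dimensional observation space.
demo = [([0.0] * 11, [0.0] * 4, 0.0, [0.0] * 11, False)]
mismatched = check_demo_state_dim(demo, obs_dim=25)
```

Running a check like this right after loading the demo file gives a clearer failure than the size mismatch raised inside the actor's first linear layer.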
DDPG currently has no test function.
The min_action and max_action ranges are currently hard-coded and passed in. It would be better to obtain them from the environment and manage them there, or to take them from scripts/examples/env/.
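As a sketch of reading the bounds from the environment: Gym `Box` action spaces expose `low` and `high`, so the limits can be taken from `env.action_space` rather than hard-coded. The helper names below are hypothetical, and `fake_env` is only a stand-in object so that the example runs without Gym installed.

```python
from types import SimpleNamespace


def get_action_bounds(env):
    """Read per-dimension action limits from the environment's action space."""
    space = env.action_space
    return space.low, space.high


def rescale_action(action, low, high):
    """Map an action in [-1, 1] (e.g. a tanh output) onto [low, high]."""
    return [l + (a + 1.0) * 0.5 * (h - l) for a, l, h in zip(action, low, high)]


# Stand-in for a Gym environment with a 2-dimensional Box action space.
fake_env = SimpleNamespace(
    action_space=SimpleNamespace(low=[-2.0, -2.0], high=[2.0, 2.0])
)
low, high = get_action_bounds(fake_env)
scaled = rescale_action([0.0, 1.0], low, high)
```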
Dear All:
After running the command "rosrun kair_algorithms run_open_manipulator_reacher_v0.py --algo ddpgfd --off-render --log", I got the error:
Traceback (most recent call last):
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/run_open_manipulator_reacher_v0.py", line 16, in <module>
    from envs.open_manipulator.open_manipulator_reacher_env import OpenManipulatorReacherEnv
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/__init__.py", line 1, in <module>
    from .open_manipulator import OpenManipulatorReacherEnv
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/open_manipulator/__init__.py", line 1, in <module>
    from .open_manipulator_reacher_env import OpenManipulatorReacherEnv
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/open_manipulator/open_manipulator_reacher_env.py", line 7, in <module>
    from ros_interface import (
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/open_manipulator/ros_interface.py", line 16, in <module>
    from pykdl_utils.kdl_kinematics import KDLKinematics
ImportError: No module named pykdl_utils.kdl_kinematics
Thanks for helping!
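The traceback says that pykdl_utils is not importable from the Python interpreter that rosrun uses. A quick way to confirm which dependencies are missing before launching is sketched below. `missing_modules` is a hypothetical helper, and the second module name in the example is deliberately fake.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "json" ships with Python; the second name does not exist on purpose.
result = missing_modules(["json", "no_such_pkg_for_demo.sub"])
```

Running the same check with "pykdl_utils.kdl_kinematics" inside the sourced ROS workspace shows whether the package is visible on that interpreter's path.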
We should implement dict -> pkl conversion code so that (real) demonstration data can be used for training (sim).
Training produced a low score.
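A minimal sketch of such a conversion, assuming the demo is stored as a JSON list of dictionaries and that the trainer wants a pickled list of transition tuples. The field names in `DEFAULT_KEYS` and the function `json_demo_to_pkl` are assumptions for illustration, so the real demo format may need a different mapping.

```python
import json
import os
import pickle
import tempfile

# Assumed field names; adjust to the keys actually present in the demo file.
DEFAULT_KEYS = ("state", "action", "reward", "next_state", "done")


def json_demo_to_pkl(json_path, pkl_path, keys=DEFAULT_KEYS):
    """Convert a JSON list of dict records into a pickled list of tuples."""
    with open(json_path) as f:
        records = json.load(f)
    transitions = [tuple(record[k] for k in keys) for record in records]
    with open(pkl_path, "wb") as f:
        # Protocol 2 is the highest protocol that Python 2.7 can read.
        pickle.dump(transitions, f, protocol=2)
    return len(transitions)


# Round-trip example in a temporary directory.
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "demo.json")
pkl_path = os.path.join(tmp_dir, "demo.pkl")
with open(json_path, "w") as f:
    json.dump([{"state": [0.1, 0.2], "action": [0.0], "reward": 1.0,
                "next_state": [0.2, 0.3], "done": False}], f)
count = json_demo_to_pkl(json_path, pkl_path)
with open(pkl_path, "rb") as f:
    loaded = pickle.load(f)
```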
https://app.wandb.ai/kairproject/kair_algorithms_draft-scripts/runs/yp4ye7fc?workspace=user-kairproject
When resuming training, even if the saved params are loaded, the run cannot be resumed under identical conditions unless the replay memory is loaded as well.
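A sketch of a checkpoint that stores the replay memory next to the params. The function names and checkpoint layout are hypothetical: the plain `params` dictionary stands in for network weights, and the `deque` stands in for the agent's replay buffer.

```python
import os
import pickle
import tempfile
from collections import deque


def save_checkpoint(path, params, memory, total_steps):
    """Save params together with the replay memory and the step counter."""
    checkpoint = {
        "params": params,
        "memory": list(memory),  # store transitions as a plain list
        "total_steps": total_steps,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)


def load_checkpoint(path, memory_size):
    """Load a checkpoint and rebuild the bounded replay memory."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    checkpoint["memory"] = deque(checkpoint["memory"], maxlen=memory_size)
    return checkpoint


# Round-trip example with two fake transitions.
memory = deque(maxlen=100)
memory.append(([0.0], [1.0], 0.5, [0.1], False))
memory.append(([0.1], [0.0], 1.0, [0.2], True))
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
save_checkpoint(ckpt_path, {"w": [0.1, 0.2]}, memory, total_steps=2)
restored = load_checkpoint(ckpt_path, memory_size=100)
```

Large buffers make such checkpoints big, so saving the memory less often than the params is a reasonable trade-off.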
@Curt-Park @MrSyee When training DDPG, actor_loss keeps decreasing into negative values while critic_loss stays in the 2000~3000 range, and the score varies a great deal. Is this expected?
I think we need to compare against the performance of a baseline algorithm.
I checked against https://github.com/medipixel/reinforcement_learning_examples and trained for about 300 episodes.
Working on an LSTM network that can perform policy adaptation during demonstration, with reference to the paper at http://proceedings.mlr.press/v87/golemo18a.html.