kairproject / kair_algorithms_draft


28.0 28.0 10.0 2.67 MB

Reinforcement learning algorithms for robot control tasks

CMake 3.60% Makefile 0.29% Python 94.26% Dockerfile 1.52% Shell 0.34%

python reinforcement-learning robotics

kair_algorithms_draft's People

Contributors

Stargazers

Watchers

Forkers

mch5048 pjy953 gracedgl focusssss wassname siweiyong wwchung91 julie-nuaa rl-code-lib maybemind

kair_algorithms_draft's Issues

Segment tree size when defining PrioritizedReplayBufferfD
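Sum/min segment trees like the ones used by prioritized replay are usually sized to a power of two. The sketch below assumes that the fD buffer keeps demo transitions in the same tree as agent transitions, so the capacity has to cover both. The helper name `segment_tree_capacity` is hypothetical.

```python
def segment_tree_capacity(buffer_size, demo_size=0):
    """Smallest power of two that holds buffer_size + demo_size leaves.

    Hypothetical helper: assumes demo and agent transitions share one tree.
    """
    required = buffer_size + demo_size
    capacity = 1
    while capacity < required:
        capacity *= 2  # assumed power-of-two requirement of the tree
    return capacity


print(segment_tree_capacity(1000, 100))  # 2048
```

Sizing the tree from `buffer_size` alone would leave demo indices outside the tree whenever `buffer_size + demo_size` crosses the next power of two.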

Obs doesn't change in the first episode

Add random seed to configurations in wandb

In reinforcement learning the random seed also affects performance, so it would be good to include the seed in the wandb configuration as well.
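One way to do this is to merge the seed into the config dict that is passed to `wandb.init(config=...)`. In the minimal sketch below, `build_wandb_config` is a hypothetical helper, the hyperparameter values are placeholders, and `Namespace(seed=777)` stands in for the parsed CLI arguments.

```python
from argparse import Namespace


def build_wandb_config(hyper_params, args):
    """Return a wandb config dict that also records the run's random seed."""
    config = dict(hyper_params)  # copy so the caller's dict is not mutated
    config["seed"] = args.seed
    return config


args = Namespace(seed=777)  # stands in for argparse output
config = build_wandb_config({"gamma": 0.99, "batch_size": 64}, args)
# With wandb installed (project name is hypothetical):
# wandb.init(project="my-project", config=config)
```

Logging the seed this way lets runs that differ only by seed be grouped or filtered in the wandb UI.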

Add target_position to state

Should add target_position to state to let agent know the goal.
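A minimal sketch of appending the goal to the observation. The function name `augment_state` is hypothetical, and it assumes the observation and target position are flat numeric vectors.

```python
import numpy as np


def augment_state(observation, target_position):
    """Append the goal position to the observation so the policy can see it."""
    return np.concatenate([
        np.asarray(observation, dtype=np.float32),
        np.asarray(target_position, dtype=np.float32),
    ])
```

The network's input size (`state_dim`) then has to grow by the length of `target_position`, for example by 3 for an xyz goal.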

Step processing time of ROS and Python differ

ROS publish/subscribe processing is faster than the Python side's per-step processing.
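A common mitigation is to run the agent loop at a fixed control rate, so that every step spans the same wall-clock time and ROS callbacks have time to deliver fresh messages. Inside a ROS node, `rospy.Rate(hz).sleep()` does this directly. The standalone class below is a hypothetical sketch of the same idea, for illustration only.

```python
import time


class FixedRate:
    """Keep a loop at a fixed period, similar in spirit to rospy.Rate."""

    def __init__(self, hz):
        self.period = 1.0 / hz
        self.next_tick = time.monotonic() + self.period

    def sleep(self):
        remaining = self.next_tick - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
            self.next_tick += self.period
        else:
            # The loop overran its period: resynchronise instead of bursting.
            self.next_tick = time.monotonic() + self.period
```

Call `rate.sleep()` once per `env.step()` so that each agent step lines up with a predictable number of ROS message updates.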

Episode-step exceeding max-episode-steps
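A minimal sketch of a rollout that can never run past the limit, because the step counter is part of the loop condition. It assumes the old 4-tuple gym `step()` API used in this repo's era. `run_episode` is a hypothetical name.

```python
def run_episode(env, policy, max_episode_steps):
    """Roll out one episode, forcing termination at max_episode_steps."""
    state = env.reset()
    score, episode_step, done = 0.0, 0, False
    # Checking the counter in the loop condition guarantees the episode never
    # exceeds max_episode_steps, even if the env never returns done=True.
    while not done and episode_step < max_episode_steps:
        action = policy(state)
        state, reward, done, _ = env.step(action)
        score += reward
        episode_step += 1
    return score, episode_step
```

For gym environments, `gym.wrappers.TimeLimit(env, max_episode_steps=...)` enforces the same limit at the environment level.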

Support for time-step management during training

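Currently, training progresses per episode and params are saved per episode as well. It seems necessary to do both per time step instead.

The TD3 actor update has a condition based on the elapsed time step.
Most papers compare performance with time steps on the x-axis.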
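A minimal sketch of a loop that is driven by a global time-step counter rather than by episodes. It assumes the TD3 condition in question is the delayed policy update, where the actor is updated once every `policy_delay` steps. The `agent` methods (`select_action`, `store`, `update_critic`, `update_actor`, `save`) are hypothetical.

```python
def train_by_timestep(env, agent, total_steps, save_every, policy_delay=2):
    """Train for a fixed number of env steps; checkpoints keyed by step."""
    state = env.reset()
    saved_at, actor_updates = [], 0
    for t in range(1, total_steps + 1):
        action = agent.select_action(state)
        next_state, reward, done, _ = env.step(action)
        agent.store(state, action, reward, next_state, done)
        agent.update_critic()
        # TD3-style delayed policy update: actor only every policy_delay steps.
        if t % policy_delay == 0:
            agent.update_actor()
            actor_updates += 1
        if t % save_every == 0:
            agent.save(t)  # checkpoint tagged with the global time step
            saved_at.append(t)
        state = env.reset() if done else next_state
    return saved_at, actor_updates
```

Because `t` keeps counting across episode boundaries, both logged scores and checkpoints can be plotted against time steps, which matches how most papers report results.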

Duplicate random seed location

The random seed is set in two places. It would probably be better to move both out to a single place.

kair_algorithms_draft/scripts/run_lunarlander_continuous.py

Lines 56 to 58 in 7f4756a

    
env.seed(args.seed)
torch.manual_seed(args.seed)
np.random.seed(args.seed)

kair_algorithms_draft/scripts/algorithms/common/noise.py

Line 26 in 7f4756a

random.seed(seed)
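A minimal sketch of one helper that seeds every RNG in one place, so that `noise.py` would no longer need its own `random.seed`. The name `set_random_seed` is hypothetical. `env.seed` is the gym API of that era (newer gym uses `env.reset(seed=...)`), and torch is imported lazily so the helper still works where torch is absent.

```python
import random

import numpy as np


def set_random_seed(seed, env=None):
    """Seed Python, NumPy, torch and (optionally) the env from one place."""
    random.seed(seed)  # covers the random module used by noise.py
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed: skip it
    if env is not None:
        env.seed(seed)
```

Calling this once at the top of each `run_*.py` script makes the seed a single command-line decision.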

success_count doesn't reset at the start of the episode
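A minimal sketch of the fix pattern: clear per-episode counters inside `reset()`. The class, the `update_success` method, and the distance threshold are all hypothetical and are not taken from the repo's env.

```python
class ReacherEnvSketch:
    """Hypothetical env fragment showing a per-episode counter being reset."""

    def __init__(self, success_threshold=0.02):
        self.success_threshold = success_threshold
        self.success_count = 0

    def reset(self):
        self.success_count = 0  # clear per-episode state at episode start
        return self._get_obs()

    def update_success(self, distance_to_goal):
        if distance_to_goal < self.success_threshold:
            self.success_count += 1
        return self.success_count

    def _get_obs(self):
        return [0.0]  # placeholder observation
```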

How to run sacfd with open_manipulator in Gazebo?

Dear All:

I have been trying to run sacfd in the Gazebo environment. To do this, I first copied a sacfd config (e.g. scripts/config/agent/lunarlander_continuous_v2/sacfd.py) to scripts/config/agent/open_manipulator_reacher_v0/sacfd.py.

However, when I run the sacfd with the command "python run_open_manipulator_reacher_v0.py --algo sacfd --off-render", I got the following error:
Traceback (most recent call last):
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/run_open_manipulator_reacher_v0.py", line 78, in <module>
    main()
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/run_open_manipulator_reacher_v0.py", line 74, in main
    agent.train()
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/sac/agent.py", line 372, in train
    loss = self.update_model(experiences)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/fd/sac_agent.py", line 88, in update_model
    new_actions, log_prob, pre_tanh_value, mu, std = self.actor(states)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/common/networks/mlp.py", line 186, in forward
    mu, _, std = super(TanhGaussianDistParams, self).get_dist_params(x)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/common/networks/mlp.py", line 152, in get_dist_params
    hidden = super(GaussianDist, self).get_last_activation(x)
  File "/home/yz/research/robotics/yumi_ws/src/kair_algorithms_draft/scripts/algorithms/common/networks/mlp.py", line 79, in get_last_activation
    x = self.hidden_activation(hidden_layer(x))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [64 x 11], m2: [25 x 256] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:136

I found that the problem is that the default demo data has dimension 11, but the observation space of the environment has dimension 25. The default demo data is in "scripts/data/reacher_demo.pkl".

So I suspect the default demo data is wrong. I used the script run_open_manipulator_demo.py to regenerate the demo data, but the output is in JSON format, while agent training seems to require a pkl data file. How should I convert it?

Thank you!
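A minimal sketch of converting regenerated JSON demos to a pickle file, checking the state dimension against the environment (25 here) so that a mismatch like `m1: [64 x 11]` is caught before training starts. It assumes, hypothetically, that the JSON holds a list of `[state, action, reward, next_state, done]` transitions. Adapt the indexing if the demo script writes a different layout.

```python
import json
import pickle


def convert_demo_json_to_pkl(json_path, pkl_path, expected_state_dim):
    """Convert demo transitions from JSON to pickle, validating state size."""
    with open(json_path) as f:
        demos = json.load(f)
    for i, transition in enumerate(demos):
        state, next_state = transition[0], transition[3]
        for name, vec in (("state", state), ("next_state", next_state)):
            if len(vec) != expected_state_dim:
                raise ValueError("transition %d: %s has dim %d, expected %d"
                                 % (i, name, len(vec), expected_state_dim))
    with open(pkl_path, "wb") as f:
        pickle.dump(demos, f)
    return len(demos)
```

Demos recorded with an 11-dimensional observation cannot simply be reused. They need to be re-recorded with the same 25-dimensional observation the agent sees.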

Implement test function for ddpg

There is currently no test function for ddpg.

Add action boundary value from environment

Currently the min_action and max_action range is hard-coded and passed in. It would be better to get it from the environment and manage it there, or to receive it from scripts/examples/env/.

kair_algorithms_draft/scripts/algorithms/ddpg/agent.py

Line 89 in 7f4756a

selected_action = torch.clamp(selected_action, -1.0, 1.0)
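Instead of clamping to a hard-coded [-1, 1], the actor's [-1, 1] output can be mapped onto the environment's own bounds. For a gym `Box` action space, those bounds are `env.action_space.low` and `env.action_space.high`. The helper `scale_action` below is a hypothetical sketch.

```python
import numpy as np


def scale_action(action, low, high):
    """Map an action in [-1, 1] (e.g. a tanh output) onto [low, high]."""
    action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    return low + (action + 1.0) * 0.5 * (high - low)
```

Usage: `scale_action(a, env.action_space.low, env.action_space.high)`. This keeps the bounds in one place, the environment, instead of in every agent.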

Add Docker image for Travis CI tests

ImportError: No module named pykdl_utils.kdl_kinematics

Dear All:

After running the command "rosrun kair_algorithms run_open_manipulator_reacher_v0.py --algo ddpgfd --off-render --log", I got the error:

Traceback (most recent call last):
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/run_open_manipulator_reacher_v0.py", line 16, in <module>
    from envs.open_manipulator.open_manipulator_reacher_env import OpenManipulatorReacherEnv
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/__init__.py", line 1, in <module>
    from .open_manipulator import OpenManipulatorReacherEnv
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/open_manipulator/__init__.py", line 1, in <module>
    from .open_manipulator_reacher_env import OpenManipulatorReacherEnv
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/open_manipulator/open_manipulator_reacher_env.py", line 7, in <module>
    from ros_interface import (
  File "/home/local/ha3/mi_ws/src/kair_algorithms_draft/scripts/envs/open_manipulator/ros_interface.py", line 16, in <module>
    from pykdl_utils.kdl_kinematics import KDLKinematics
ImportError: No module named pykdl_utils.kdl_kinematics

Thanks for helping!
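`pykdl_utils` appears to come from ROS-side packages rather than from pip, so a useful first step is to check which of the modules the env imports are visible to the interpreter that `rosrun` launches. The helper below is a hypothetical diagnostic. The module list is an assumption based on the traceback (`PyKDL` is the module name of the KDL Python bindings).

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


print(missing_modules(["PyKDL", "pykdl_utils.kdl_kinematics", "rospy"]))
```

Run it with the same Python that `rosrun` uses (Python 2.7 in this traceback), after sourcing the workspace's `setup.bash`.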

Add a feature to save memory when saving params
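A minimal sketch of saving the replay memory in the same checkpoint as the parameters, so that training can resume with a filled buffer. It assumes, hypothetically, that memory is a `deque` of picklable transitions. In the PyTorch code, `params` would be the models' `state_dict()`s, and `torch.save` can write the same dict because it also pickles arbitrary objects.

```python
import pickle
from collections import deque


def save_checkpoint(path, params, memory):
    """Save parameters together with the replay memory in one file."""
    with open(path, "wb") as f:
        pickle.dump({"params": params, "memory": list(memory)}, f)


def load_checkpoint(path, maxlen):
    """Restore parameters and rebuild the bounded replay memory."""
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["params"], deque(ckpt["memory"], maxlen=maxlen)
```

A full buffer can be large, so saving it less often than the params may be worth considering.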

DDPG performance for LunarLanderContinuous-v2

@Curt-Park @MrSyee When training DDPG, actor_loss keeps decreasing into negative values, while critic_loss stays around 2000-3000 and the score varies a lot. Is this normal?

I think we need to compare against the performance of baseline algorithms.

I checked this with https://github.com/medipixel/reinforcement_learning_examples and trained for about 300 episodes.
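In DDPG the actor loss is the negated mean critic value, -Q(s, mu(s)), averaged over the batch. It is not bounded below, so it drifts negative whenever the critic's Q estimates become positive. A falling actor loss therefore mostly tracks rising Q estimates (possibly overestimation) and is not in itself a sign of a bug. The NumPy sketch below only illustrates the sign. `ddpg_actor_loss` is a hypothetical name, and the Q values are made up.

```python
import numpy as np


def ddpg_actor_loss(q_values):
    """DDPG actor objective: minimise the negated mean of Q(s, mu(s))."""
    return -float(np.mean(q_values))


# Positive Q estimates give a negative actor loss.
print(ddpg_actor_loss([120.0, 80.0, 100.0]))  # -100.0
```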

Demo collector initialization jerk issue

Implement LSTM for demonstration

Implement an LSTM network that can perform policy adaptation during demonstration, following the paper http://proceedings.mlr.press/v87/golemo18a.html.
