
humanoid-gym's Introduction

Project Page | arXiv | Twitter

Xinyang Gu*, Yen-Jen Wang*, Jianyu Chen†

*: Equal contribution and project co-leads. †: Corresponding author.

Demo

Humanoid-Gym is an easy-to-use reinforcement learning (RL) framework based on NVIDIA Isaac Gym, designed to train locomotion skills for humanoid robots with an emphasis on zero-shot transfer from simulation to the real world. Humanoid-Gym also integrates a sim-to-sim framework, from Isaac Gym to Mujoco, that allows users to verify trained policies in different physics simulations to ensure their robustness and generalization.

This codebase is verified with RobotEra's XBot-S (a 1.2-meter-tall humanoid robot) and XBot-L (a 1.65-meter-tall humanoid robot) in a real-world environment with zero-shot sim-to-real transfer.

Features

1. Humanoid Robot Training

This repository offers comprehensive guidance and scripts for training humanoid robots. Humanoid-Gym features specialized rewards for humanoid robots, reducing the difficulty of sim-to-real transfer. In this repository, we use RobotEra's XBot-L as the primary example; the framework can also be used for other robots with minimal adjustments. Our resources cover setup, configuration, and execution, with the goal of fully preparing the robot for real-world locomotion through in-depth training and optimization.

  • Comprehensive Training Guidelines: We offer thorough walkthroughs for each stage of the training process.
  • Step-by-Step Configuration Instructions: Our guidance is clear and succinct, ensuring an efficient setup process.
  • Execution Scripts for Easy Deployment: Utilize our pre-prepared scripts to streamline the training workflow.

2. Sim2Sim Support

We also share our sim2sim pipeline, which allows you to transfer trained policies to highly accurate and carefully designed simulated environments. Once you acquire the robot, you can confidently deploy the RL-trained policies in real-world settings.

Our simulator settings, particularly with Mujoco, are finely tuned to closely mimic real-world scenarios. This careful calibration ensures that the performances in both simulated and real-world environments are closely aligned. This improvement makes our simulations more trustworthy and enhances our confidence in their applicability to real-world scenarios.

3. Denoising World Model Learning

Robotics: Science and Systems (RSS), 2024 (Best Paper Award Finalist)

Paper | Twitter

Xinyang Gu*, Yen-Jen Wang*, Xiang Zhu*, Chengming Shi*, Yanjiang Guo, Yichen Liu, Jianyu Chen†

*: Equal contribution and project co-leads. †: Corresponding author.

Denoising World Model Learning (DWL) presents an advanced sim-to-real framework that integrates state estimation and system identification. This dual approach ensures the robot's learning and adaptation are both practical and effective in real-world contexts.

  • Enhanced Sim-to-real Adaptability: Techniques to optimize the robot's transition from simulated to real environments.
  • Improved State Estimation Capabilities: Advanced tools for precise and reliable state analysis.

Perceptive Locomotion Learning for Humanoid Robots (Coming Soon!)

Twitter

Dexterous Hand Manipulation (Coming Soon!)

Twitter

Installation

  1. Create a new Python 3.8 virtual environment with conda create -n myenv python=3.8.
  2. For the best performance, we recommend NVIDIA driver version 525 (sudo apt install nvidia-driver-525). The minimum supported driver version is 515; if you cannot install version 525, ensure that your system has at least version 515 to maintain basic functionality.
  3. Install PyTorch 1.13 with Cuda-11.7:
    • conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
  4. Install numpy-1.23 with conda install numpy=1.23.
  5. Install Isaac Gym:
    • Download and install Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym.
    • cd isaacgym/python && pip install -e .
    • Run an example with cd examples && python 1080_balls_of_solitude.py.
    • Consult isaacgym/docs/index.html for troubleshooting.
  6. Install humanoid-gym:
    • Clone this repository.
    • cd humanoid-gym && pip install -e .
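
To verify the installation, a quick import check catches the most common pitfall: isaacgym must be imported before torch, or Isaac Gym raises an import-order error. A minimal sketch (the script name is illustrative):

# check_install.py -- minimal installation sanity check (illustrative)
import isaacgym  # must be imported before torch, or Isaac Gym raises an error
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")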

Usage Guide

Examples

# Launching PPO Policy Training for 'v1' Across 4096 Environments
# This command initiates the PPO algorithm-based training for the humanoid task.
python scripts/train.py --task=humanoid_ppo --run_name v1 --headless --num_envs 4096

# Evaluating the Trained PPO Policy 'v1'
# This command loads the 'v1' policy for performance assessment in its environment. 
# Additionally, it automatically exports a JIT model, suitable for deployment purposes.
python scripts/play.py --task=humanoid_ppo --run_name v1

# Implementing Simulation-to-Simulation Model Transformation
# This command facilitates a sim-to-sim transformation using the exported 'v1' policy.
python scripts/sim2sim.py --load_model /path/to/logs/XBot_ppo/exported/policies/policy_1.pt

# Run our trained policy
python scripts/sim2sim.py --load_model /path/to/logs/XBot_ppo/exported/policies/policy_example.pt

1. Default Tasks

  • humanoid_ppo

    • Purpose: Baseline PPO policy with multi-frame low-level control
    • Observation Space: $(47 \times H)$ dimensions, where $H$ is the number of stacked frames: $[O_{t-H}, \dots, O_t]$ (see the frame-stacking sketch below)
    • Privileged Information: $73$ dimensions
  • humanoid_dwl (coming soon)
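
For intuition, here is a minimal sketch of how a $(47 \times H)$ stacked observation $[O_{t-H}, \dots, O_t]$ can be assembled. This is illustrative only: the buffer names, $H = 15$, and the batch size are assumptions, not the repository's actual implementation.

# Illustrative frame stacking for the (47 x H)-dimensional observation.
# H = 15 and all names below are assumptions, not the repo's actual code.
from collections import deque
import torch

H = 15           # number of stacked frames (hypothetical value)
obs_dim = 47     # per-frame observation size, as listed above
num_envs = 4096

history = deque([torch.zeros(num_envs, obs_dim) for _ in range(H)], maxlen=H)

def stacked_obs() -> torch.Tensor:
    # concatenates [O_{t-H} ... O_t] into a (num_envs, 47 * H) tensor
    return torch.cat(list(history), dim=-1)

# per step: history.append(new_obs), then feed stacked_obs() to the policy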

2. PPO Policy

  • Training Command: For training the PPO policy, execute:
    python humanoid/scripts/train.py --task=humanoid_ppo --load_run log_file_path --name run_name
    
  • Running a Trained Policy: To deploy a trained PPO policy, use:
    python humanoid/scripts/play.py --task=humanoid_ppo --load_run log_file_path --name run_name
    
  • By default, the latest model of the last run in the experiment folder is loaded. Other runs and model iterations can be selected by adjusting load_run and checkpoint in the training config, as sketched below.
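
A hedged sketch of such an override (field names follow the legged_gym-style runner config that this codebase inherits; the exact class layout is an assumption, so check your own LeggedRobotCfgPPO subclass):

# Hypothetical training-config override selecting a specific run/checkpoint.
from humanoid.envs.base.legged_robot_config import LeggedRobotCfgPPO

class MyCfgPPO(LeggedRobotCfgPPO):
    class runner(LeggedRobotCfgPPO.runner):
        load_run = "Apr01_12-00-00_v1"  # -1 loads the most recent run
        checkpoint = 5000               # -1 loads the last saved model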

3. Sim-to-sim

  • Please note: Before initiating the sim-to-sim process, ensure that you run play.py to export a JIT policy.
  • Mujoco-based Sim2Sim Deployment: Utilize Mujoco for executing simulation-to-simulation (sim2sim) deployments with the command below:
    python scripts/sim2sim.py --load_model /path/to/export/model.pt
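
Conceptually, sim2sim runs the exported JIT policy against a Mujoco model. The skeleton below is illustrative only: the real scripts/sim2sim.py additionally handles observation construction, scaling, and joint reordering, and the paths and observation shape here are assumptions.

# Illustrative sim2sim skeleton (NOT the repo's sim2sim.py).
import mujoco
import torch

model = mujoco.MjModel.from_xml_path("/path/to/robot.xml")  # MJCF model
data = mujoco.MjData(model)
policy = torch.jit.load("/path/to/exported/policy_1.pt")

def build_obs(data) -> torch.Tensor:
    # Placeholder: must reproduce the training observation exactly
    # (assumed 47-dim observation x 15 stacked frames here).
    return torch.zeros(1, 47 * 15)

for _ in range(1000):
    with torch.no_grad():
        action = policy(build_obs(data)).numpy().squeeze()
    data.ctrl[:] = action  # assumes actuators consume policy actions directly
    mujoco.mj_step(model, data)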
    

4. Parameters

  • CPU and GPU Usage: To run simulations on the CPU, set both --sim_device=cpu and --rl_device=cpu. For GPU operations, specify --sim_device=cuda:{0,1,2...} and --rl_device={0,1,2...} accordingly. Please note that CUDA_VISIBLE_DEVICES is not applicable, and it's essential to match the --sim_device and --rl_device settings.
  • Headless Operation: Include --headless for operations without rendering.
  • Rendering Control: Press 'v' to toggle rendering during training.
  • Policy Location: Trained policies are saved in humanoid/logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt.
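
For example, the flags above can be combined freely; a CPU-only, render-free training run looks like this:

# CPU-only training without rendering (uses only the flags described above)
python scripts/train.py --task=humanoid_ppo --run_name v1 --headless --sim_device=cpu --rl_device=cpu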

5. Command-Line Arguments

For RL training, please refer to humanoid/utils/helpers.py#L161. For the sim-to-sim process, please refer to humanoid/scripts/sim2sim.py#L169.

Code Structure

  1. Every environment hinges on an env file (legged_robot.py) and a configuration file (legged_robot_config.py). The latter houses two classes: LeggedRobotCfg (encompassing all environmental parameters) and LeggedRobotCfgPPO (denoting all training parameters).
  2. Both env and config classes use inheritance.
  3. Each non-zero reward scale specified in cfg adds a reward function of the corresponding name to the total reward.
  4. Tasks must be registered with task_registry.register(name, EnvClass, EnvConfig, TrainConfig). Registration may occur within envs/__init__.py, or outside of this repository.
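
As an illustration of point 4, a registration typically looks like the following; the module paths and class names are placeholders for your own environment (only task_registry itself comes from this repo, see humanoid/utils/task_registry.py):

# Hypothetical registration, e.g. at the bottom of humanoid/envs/__init__.py.
from humanoid.utils.task_registry import task_registry
from .custom.my_robot_env import MyRobotEnv                    # placeholder
from .custom.my_robot_config import MyRobotCfg, MyRobotCfgPPO  # placeholder

task_registry.register("my_robot_ppo", MyRobotEnv, MyRobotCfg(), MyRobotCfgPPO())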

Add a new environment

The base environment legged_robot constructs a rough-terrain locomotion task. The corresponding configuration does not specify a robot asset (URDF/MJCF) or any reward scales.

  1. If you need to add a new environment, create a new folder in the envs/ directory with a configuration file named <your_env>_config.py. The new configuration should inherit from existing environment configurations.
  2. If proposing a new robot:
    • Insert the corresponding assets in the resources/ folder.
    • In the cfg file, set the path to the asset, define body names, default_joint_positions, and PD gains. Specify the desired train_cfg and the environment's name (python class).
    • In the train_cfg, set the experiment_name and run_name.
  3. If needed, create your environment in <your_env>.py. Inherit from existing environments, override desired functions and/or add your reward functions.
  4. Register your environment in humanoid/envs/__init__.py.
  5. Modify or tune other parameters in your cfg or cfg_train as needed. To remove a reward, set its scale to zero. Avoid modifying the parameters of other environments!
  6. If you want a new robot/environment to perform sim2sim, you may need to modify humanoid/scripts/sim2sim.py:
    • Check the joint mapping of the robot between MJCF and URDF.
    • Change the initial joint position of the robot according to your trained policy.
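
To make step 6 concrete, one simple way to check and fix the joint mapping is to build a permutation from joint names. Everything below is an illustrative sketch with placeholder joint names, not code from sim2sim.py:

# Illustrative joint remapping between MJCF (Mujoco) and URDF (training) order.
import numpy as np

train_order = ["hip_pitch", "knee_pitch", "ankle_pitch"]   # placeholder names
mujoco_order = ["knee_pitch", "ankle_pitch", "hip_pitch"]  # placeholder names

# permutation that reorders Mujoco joint states into the training order
perm = np.array([mujoco_order.index(name) for name in train_order])

qpos_mujoco = np.array([0.2, -0.1, 0.5])  # joint positions read from Mujoco
qpos_policy = qpos_mujoco[perm]           # same values, in the policy's order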

Troubleshooting

Observe the following cases:

# error
ImportError: libpython3.8.so.1.0: cannot open shared object file: No such file or directory

# solution
# set the correct path
export LD_LIBRARY_PATH="$HOME/miniconda3/envs/your_env/lib:$LD_LIBRARY_PATH"  # note: ~ does not expand inside quotes, so use $HOME

# OR
sudo apt install libpython3.8

# error
AttributeError: module 'distutils' has no attribute 'version'

# solution
# install pytorch 1.12.0
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch

# error, results from libstdc++ version distributed with conda differing from the one used on your system to build Isaac Gym
ImportError: /home/roboterax/anaconda3/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.20` not found (required by /home/roboterax/carbgym/python/isaacgym/_bindings/linux64/gym_36.so)

# solution
mkdir ${YOUR_CONDA_ENV}/lib/_unused
mv ${YOUR_CONDA_ENV}/lib/libstdc++* ${YOUR_CONDA_ENV}/lib/_unused

Citation

Please cite the following if you use this code or parts of it:

@article{gu2024humanoid,
  title={Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer},
  author={Gu, Xinyang and Wang, Yen-Jen and Chen, Jianyu},
  journal={arXiv preprint arXiv:2404.05695},
  year={2024}
}

@inproceedings{gu2024advancing,
  title={Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning},
  author={Gu, Xinyang and Wang, Yen-Jen and Zhu, Xiang and Shi, Chengming and Guo, Yanjiang and Liu, Yichen and Chen, Jianyu},
  booktitle={Robotics: Science and Systems},
  year={2024},
  url={https://enriquecoronadozu.github.io/rssproceedings2024/rss20/p058.pdf}
}

Acknowledgment

The implementation of Humanoid-Gym relies on resources from legged_gym and rsl_rl projects, created by the Robotic Systems Lab. We specifically utilize the LeggedRobot implementation from their research to enhance our codebase.

Any Questions?

If you have any more questions, please contact [email protected] or create an issue in this repository.

humanoid-gym's People

Contributors

deedive, jufrick, wangyenjen, zlw21gxy

humanoid-gym's Issues

Sim 2 sim transfer

Dear author.

I recently found this amazing project for enhancing sim-to-real transfer. I have reviewed your code and have some questions about your sim-to-sim framework.

First of all, I tested sim-to-sim transfer with my own robot model (a legged robot), but I found it is really hard to safely control the robot in a different simulator, I guess because the physics engines differ. Is there a specific way to enhance sim-to-sim transfer (Isaac Orbit to Mujoco), or any idea how to reduce the gap between different simulators?

I am also wondering about the two control terms in your Isaac Gym environment (following the code): one is the dynamics randomization term and the other is the PD control term. Which one did you use to train the policy?

Huge thanks for your open-source library again :)

How does the policy handle privileged observations in the real world?

Dear author.

Thank you for your awesome work. How does the policy handle privileged observations in the real world?
Also, why do the privileged observations have 73 dimensions in Table 3 but a different number in Table 1 of the paper? What does the privileged observation involve?

load new URDF

When I change the URDF path, I found that it references some .STL files. How can I get the .STL files for my own URDF?

Version compatibility

Hello, I am really interested in this project and would like to implement it.

During the installation setup, you mention using Python 3.8 in the first step; however, step 3 does not seem to be compatible with it. Here is the output:

Could not solve for environment specs
The following packages are incompatible
├─ pin-1 is installable and it requires
│ └─ python 3.11.* , which can be installed;
└─ torchvision 0.14.1 is not installable because there are no viable options
├─ torchvision 0.14.1 would require
│ └─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
├─ torchvision 0.14.1 would require
│ └─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
├─ torchvision 0.14.1 would require
│ └─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
└─ torchvision 0.14.1 would require
└─ python >=3.9,<3.10.0a0 , which conflicts with any installable versions previously reported.

It would be greatly appreciated if we could discuss the entire project further; please contact me if you are interested. My email address is [email protected]
Both Mandarin and English are fine with me.

Thank you in advance

All the best
Zhengting Li

Mapping between URDF and MJCF

Hey, thanks for the great contribution!
I have a question regarding the URDF-to-MJCF mapping. How did you convert the robot model between these two formats?

The function compute_ref_state() in humanoid_env.py

The function that computes the reference state is very hard to understand. Why is ref_dof_pos related to the sine of the phase? What do scale_1 and scale_2 stand for? Why are only joints 2, 3, 4 and 8, 9, 10 specified?

I really appreciate your help and answer.

    def compute_ref_state(self):
        phase = self._get_phase()
        sin_pos = torch.sin(2 * torch.pi * phase)
        sin_pos_l = sin_pos.clone()
        sin_pos_r = sin_pos.clone()
        self.ref_dof_pos = torch.zeros_like(self.dof_pos)
        scale_1 = self.cfg.rewards.target_joint_pos_scale
        scale_2 = 2 * scale_1
        # left foot stance phase set to default joint pos
        sin_pos_l[sin_pos_l > 0] = 0
        self.ref_dof_pos[:, 2] = sin_pos_l * scale_1
        self.ref_dof_pos[:, 3] = sin_pos_l * scale_2
        self.ref_dof_pos[:, 4] = sin_pos_l * scale_1
        # right foot stance phase set to default joint pos
        sin_pos_r[sin_pos_r < 0] = 0
        self.ref_dof_pos[:, 8] = sin_pos_r * scale_1
        self.ref_dof_pos[:, 9] = sin_pos_r * scale_2
        self.ref_dof_pos[:, 10] = sin_pos_r * scale_1
        # Double support phase
        self.ref_dof_pos[torch.abs(sin_pos) < 0.1] = 0

        self.ref_action = 2 * self.ref_dof_pos


kinematics and dynamics

Where is the code for robot kinematics and dynamics? Additionally, I would like to migrate our PPO project to ROS 2 Gazebo for simulation. Do you have any suggestions or references?

Controlling a small 10-DoF robot

Dear Author
Thanks for your great work! Can this be used to train a controller for a small 10-DoF robot that is 0.45 m tall, weighs 10 kg, and is missing the ankle-rotation joint?
I have tried this, and "train" and "play" are stable, but "sim2sim" is unstable. Which parameters do I need to adjust in humanoid_cfg.py?

Domain rand

I see that the code has many domain randomization settings, but only a small part of them are used. Can such settings ensure the robustness of sim2real?

Failed to create a PhysX CUDA Context Manager. Falling back to CPU.

Running the python train command gives the following error:
[Warning] [carb.gym.plugin] Failed to create a PhysX CUDA Context Manager. Falling back to CPU.
Physics Engine: PhysX
Physics Device: cpu
GPU Pipeline: disabled
/home/eit-robot2/anaconda3/envs/humanoid/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541702/work/aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[Error] [carb.gym.plugin] Gym cuda error: invalid resource handle: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6137
[Error] [carb.gym.plugin] Must enable GPU pipeline to use state tensors
[Error] [carb.gym.plugin] Must enable GPU pipeline to use state tensors
Traceback (most recent call last):
  File "scripts/train.py", line 43, in <module>
    train(args)
  File "scripts/train.py", line 37, in train
    env, env_cfg = task_registry.make_env(name=args.task, args=args)
  File "/home/eit-robot2/Documents/humanoid-gym/humanoid/utils/task_registry.py", line 97, in make_env
    env = task_class( cfg=env_cfg,
  File "/home/eit-robot2/Documents/humanoid-gym/humanoid/envs/custom/humanoid_env.py", line 78, in __init__
    super().__init__(cfg, sim_params, physics_engine, sim_device, headless)
  File "/home/eit-robot2/Documents/humanoid-gym/humanoid/envs/base/legged_robot.py", line 80, in __init__
    self._init_buffers()
  File "/home/eit-robot2/Documents/humanoid-gym/humanoid/envs/base/legged_robot.py", line 452, in _init_buffers
    self.base_euler_xyz = get_euler_xyz_tensor(self.base_quat)
  File "/home/eit-robot2/Documents/humanoid-gym/humanoid/envs/base/legged_robot.py", line 51, in get_euler_xyz_tensor
    r, p, w = get_euler_xyz(quat)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: CUDA driver error: an illegal memory access was encountered
It seems CUDA is not being used. Could someone please take a look?

About the simulation environment

The Isaac Gym Preview 4 version used in the project is old and has many known problems, such as with contact forces, coordinate systems, and model solving.

I noticed that legged_gym was ported to the new simulation platform Orbit in January this year. Will the authors consider porting in the future?

Inquiry about Including Upper Body Joints in Reinforcement Learning for Humanoid Robot

Dear Authors,

I have been exploring your humanoid robot project and have some questions about the details shown in your demo. I noticed that in the demonstration video, only the lower body joints of the robot are active, while the upper body joints remain static. I am curious about this design choice and would like to learn more.

  1. Does the current design of the project support motion for the upper body joints? If not, what are the reasons behind this decision?

  2. Would it be feasible if I added functionality (enabling arm swing movements, etc.) for the upper body joints within your framework and trained them using reinforcement learning? What parts of the code would need modification?

Thank you for your time and assistance. I look forward to your response.

Best Regards,
Eric Wang

The kp setting for ankle roll and ankle pitch

In the code, the kp for ankle roll and ankle pitch is set to 15, which completely fails to track the desired position. However, in the paper, the curves in Mujoco and on the real machine track quite well. Is such a small kp setting unique to Isaac Gym? When transferring to a real robot, do these two joints also fail to track the desired position as they do in simulation?

Model migration questions

Hello authors! Your open-source code solves the walking problem for our humanoid robot very well, but after several days of trying, there is one problem we haven't solved yet.
Our model walks normally when trained on "plane", but it cannot walk normally when trained on "trimesh", and even forgets how to walk. We had to increase the number of iterations to 10,000 before the robot could barely walk. Which modification have we overlooked, and what should we pay attention to when changing models?

I can't evaluate the Trained PPO Policy 'v1'

Dear author:

I run this command:
python scripts/play.py --task=humanoid_ppo --run_name v1
But it doesn't seem to run correctly.
Can you tell me how to solve this problem?

XBot sim2sim can't walk

Dear author:
Thank you very much for your work. I trained your robot in Isaac Gym and it can walk normally, but when I verified it in Mujoco, I found that it can't walk normally. Is there any important modification I overlooked? I didn't make any changes to your code during this process. I look forward to your reply.
