arrival-ltd / catalyst-rl-tutorial

Using Catalyst.RL to train a robot to perform peg-in-hole insertion in simulation.

License: MIT License

Shell 14.02% Python 85.98%

sim2real robotics deep-reinforcement-learning

catalyst-rl-tutorial's Introduction

Robotic Assembly using Deep Reinforcement Learning

Introduction

One of the most exciting advancements that has pushed the frontier of Artificial Intelligence (AI) in recent years is Deep Reinforcement Learning (DRL). DRL belongs to the family of machine learning algorithms. It assumes that intelligent machines can learn from their actions, similar to the way humans learn from experience. In recent years we have witnessed some impressive real-world applications of DRL, and the algorithms have enabled major progress in the field of robotics in particular. If you are interested in learning more about DRL, we encourage you to get familiar with the exceptional Introduction to RL by OpenAI. We believe this is the best place to start your adventure with DRL.

The goal of this tutorial is to show how you can apply DRL to solve your own robotic challenge. For this tutorial we have chosen one of the classic assembly tasks: peg-in-hole insertion. By the time you finish, you will understand how to create a complete, end-to-end pipeline for training a robot in simulation using DRL.

The accompanying code together with all the details of the implementation can be found in our GitHub repository.

Setup

Download the robot simulation platform, CoppeliaSim, from the official website. This tutorial is compatible with version 4.1.0.
Set up PyRep, a toolkit for robot learning research, from its GitHub repository. The PyRep library is built on top of CoppeliaSim to facilitate prototyping in Python.
Create an environment for the RL agent: it can be either a simulation or a real environment. We limit ourselves to simulation for faster prototyping and training. The agent interacts with the environment to collect experience. This allows it to learn a policy which maximizes the expected (discounted) sum of future rewards and hence solves the designed task. Most RL practitioners are familiar with OpenAI Gym, a toolkit with toy environments used for developing and benchmarking reinforcement learning algorithms. However, our use case, a robotic assembly task, is very specific: the goal is to train a robot to perform peg-in-hole insertion. This is why we created our own simulation environment in CoppeliaSim. The simulator comes with various robot manipulators and grippers. For our tutorial, we picked the UR5 robot with the RG2 gripper (Figure 1).

Figure 1: UR5 manipulator with a peg attached to its gripper. The mating part is placed on the ground in the scene.

CoppeliaSim caters to a variety of different robotic tasks. Feel free to come up with your own challenge and design your own simulation! RLBench (the robot learning benchmark and learning environment) also provides more off-the-shelf, advanced simulation environments.
Create a gym environment wrapped around the simulation scene:

import os
import cv2
import logging
import numpy as np

from gym import Space
from gym.spaces.box import Box
from gym.spaces.dict import Dict
from pyrep import PyRep, objects

from catalyst_rl.rl.core import EnvironmentSpec
from catalyst_rl.rl.utils import extend_space


class CoppeliaSimEnvWrapper(EnvironmentSpec):
    def __init__(self, visualize=True,
                 mode="train",
                 **params):
        super().__init__(visualize=visualize, mode=mode)

        # Scene selection
        scene_file_path = os.path.join(os.getcwd(), 'simulation/UR5.ttt')

        # Simulator launch
        self.env = PyRep()
        self.env.launch(scene_file_path, headless=False)
        self.env.start()
        self.env.step()

        # Task related initialisations in Simulator
        self.vision_sensor = objects.vision_sensor.VisionSensor("Vision_sensor")
        self.gripper = objects.dummy.Dummy("UR5_target")
        self.gripper_zero_pose = self.gripper.get_pose()
        self.goal = objects.dummy.Dummy("goal_target")
        self.goal_STL = objects.shape.Shape("goal")
        self.goal_STL_zero_pose = self.goal_STL.get_pose()
        self.grasped_STL = objects.shape.Shape("Peg")
        self.stacking_area = objects.shape.Shape("Plane")

        self.step_counter = 0
        self.max_step_count = 100
        self.target_pose = None
        self.initial_distance = None
        self.image_width, self.image_height = 320, 240
        self.vision_sensor.set_resolution((self.image_width, self.image_height))
        self._history_len = 1

        self._observation_space = Dict(
                {"cam_image": Box(0, 255,
                                  [self.image_height, self.image_width, 1],
                                  dtype=np.uint8)})

        self._action_space = Box(-1, 1, (3,))
        self._state_space = extend_space(self._observation_space, self._history_len)

    @property
    def history_len(self):
        return self._history_len

    @property
    def observation_space(self) -> Space:
        return self._observation_space

    @property
    def state_space(self) -> Space:
        return self._state_space

    @property
    def action_space(self) -> Space:
        return self._action_space

    def step(self, action):
        done = False
        info = {}
        prev_distance_to_goal = self.distance_to_goal()

        # Make a step in simulation
        self.apply_controls(action)
        self.env.step()
        self.step_counter += 1

        # Reward calculations
        success_reward = self.success_check()
        distance_reward = (prev_distance_to_goal - self.distance_to_goal()) / self.initial_distance

        reward = distance_reward + success_reward

        # Check reset conditions
        if self.step_counter > self.max_step_count:
            done = True
            logging.info('--------Reset: Timeout--------')
        elif self.distance_to_goal() > 0.8:
            done = True
            logging.info('--------Reset: Too far from target--------')
        elif self.collision_check():
            done = True
            logging.info('--------Reset: Collision--------')

        return self.get_observation(), reward, done, info

    def reset(self):
        logging.info("Episode reset...")
        self.step_counter = 0
        self.env.stop()
        self.env.start()
        self.env.step()
        self.setup_scene()
        observation = self.get_observation()
        return observation
# -------------- all methods above are required for any Gym environment, everything below is env-specific --------------

    def distance_to_goal(self):
        goal_pos = self.goal.get_position()
        tip_pos = self.gripper.get_position()
        return np.linalg.norm(np.array(tip_pos) - np.array(goal_pos))

    def setup_goal(self):
        goal_position = self.goal_STL_zero_pose[:3]
        # 2D goal randomization
        self.target_pose = [goal_position[0] + (2 * np.random.rand() - 1.) * 0.1,
                            goal_position[1] + (2 * np.random.rand() - 1.) * 0.1,
                            goal_position[2]]
        self.target_pose = np.append(self.target_pose,
                                     self.goal_STL_zero_pose[3:]).tolist()
        self.goal_STL.set_pose(self.target_pose)

        # Randomizing the RGB of the goal and the plane
        rgb_values_goal = list(np.random.rand(3,))
        rgb_values_plane = list(np.random.rand(3,))
        self.goal_STL.set_color(rgb_values_goal)
        self.stacking_area.set_color(rgb_values_plane)

        self.initial_distance = self.distance_to_goal()

    def setup_scene(self):
        self.setup_goal()
        self.gripper.set_pose(self.gripper_zero_pose)

    def get_observation(self):
        # capture_rgb() returns an RGB float image with values in [0, 1]
        cam_image = self.vision_sensor.capture_rgb()
        gray_image = np.uint8(cv2.cvtColor(cam_image, cv2.COLOR_RGB2GRAY) * 255)
        obs_image = np.expand_dims(gray_image, axis=2)
        return {"cam_image": obs_image}

    def collision_check(self):
        return self.grasped_STL.check_collision(
            self.stacking_area) or self.grasped_STL.check_collision(self.goal_STL)

    def success_check(self):
        success_reward = 0.
        if self.distance_to_goal() < 0.01:
            success_reward = 0.01
            logging.info('--------Success state--------')
        return success_reward

    def apply_controls(self, action):
        gripper_position = self.gripper.get_position()
        # The predicted action lies in [-1, 1]; dividing by 200 limits each
        # step to at most 0.005 scene units (5 mm) per axis
        new_position = [gripper_position[i] + (action[i] / 200.) for i in range(3)]
        self.gripper.set_position(new_position)
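
The wrapper above follows the usual reset()/step() contract, so any code that drives a Gym-style environment can drive it too. Below is a minimal sketch of such a rollout loop. Since running the real wrapper requires CoppeliaSim, the sketch uses StubReachEnv, a hypothetical one-dimensional stand-in written for this example only; run_episode and random_policy are likewise illustrative names and not part of the repository.

```python
import random


class StubReachEnv:
    """Hypothetical stand-in with the same reset()/step() contract as the wrapper.

    It tracks a 1-D distance to a goal instead of running CoppeliaSim.
    """

    def __init__(self, max_step_count=100):
        self.max_step_count = max_step_count
        self.step_counter = 0
        self.distance = 0.0

    def reset(self):
        self.step_counter = 0
        self.distance = 0.3
        return {"distance": self.distance}

    def step(self, action):
        prev = self.distance
        # Same scaling idea as apply_controls(): action in [-1, 1] -> small displacement
        self.distance = abs(self.distance - action / 200.0)
        self.step_counter += 1
        reward = prev - self.distance
        done = self.step_counter > self.max_step_count or self.distance < 0.01
        return {"distance": self.distance}, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return (total_reward, number_of_steps)."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


def random_policy(_observation):
    """Sample a uniformly random action, ignoring the observation."""
    return random.uniform(-1.0, 1.0)


random.seed(0)
print(run_episode(StubReachEnv(), random_policy))
```

With the real environment, the same loop would presumably be called as run_episode(CoppeliaSimEnvWrapper(), policy), with a policy that returns three values in [-1, 1] to match the action space.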

For our reinforcement learning project we use Catalyst RL, a distributed framework for reproducible RL research. It is just one element of the marvellous Catalyst project. Catalyst is a PyTorch ecosystem framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation and codebase reuse. This means that the user can seamlessly run a training loop with metrics, model checkpointing, advanced logging and distributed training support without the boilerplate code. We strongly encourage you to get familiar with the Intro to Catalyst and to incorporate the framework into your daily work!

We reuse the general Catalyst RL environment class (EnvironmentSpec) to create our custom environment. By inheriting from EnvironmentSpec, you can quickly design your own environment, be it an Atari game, a classic control task or a robotic simulation. Finally, we specify states/observations, actions and rewards using OpenAI's gym spaces types.

A brief summary of the `CoppeliaSimEnvWrapper` in `src/env.py`

This class wraps the general RL environment class to launch CoppeliaSim with our custom scene. Additionally, at the beginning of every episode, it initialises the properties of the mating part: its 2D position in the workspace (setup_goal() method) as well as its colour.

The environment wrapper contains the following methods:

get_observation(): captures a grayscale image as an observation.
distance_to_goal(): computes the distance between the target and the current position. The distance is used in the reward design.
success_check(): checks whether the goal state is reached and, if so, adds a bonus to the agent's reward.
collision_check(): checks whether the agent collided with any object.

Episode termination occurs when the robot gets too far from the target, collides with any object in the environment or exceeds the maximum number of time steps. These conditions are specified at the end of the step() method and are checked at each step the agent takes in the environment. Once the episode terminates, the whole cycle is repeated for the next episode.
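
The reward and termination rules can be isolated from the simulator to make them easier to reason about. The sketch below restates the logic of step() as two pure functions; the function names step_reward and termination_reason are made up for this example, while the thresholds (0.01, 0.8, 100) are the values used in the wrapper.

```python
def step_reward(prev_distance, distance, initial_distance,
                success_threshold=0.01, success_bonus=0.01):
    """Progress toward the goal, normalised by the initial distance, plus a bonus."""
    progress = (prev_distance - distance) / initial_distance
    bonus = success_bonus if distance < success_threshold else 0.0
    return progress + bonus


def termination_reason(step_counter, distance, collided,
                       max_step_count=100, max_distance=0.8):
    """Return why the episode ends, or None if it continues (same order as step())."""
    if step_counter > max_step_count:
        return "timeout"
    if distance > max_distance:
        return "too far from target"
    if collided:
        return "collision"
    return None


# Distances along a hypothetical episode that starts 0.3 away from the goal
distances = [0.3, 0.25, 0.1, 0.005]
rewards = [step_reward(p, d, distances[0]) for p, d in zip(distances, distances[1:])]
print(sum(rewards))
```

Because each step is rewarded for the reduction in distance, the progress terms telescope over an episode: their sum equals (initial distance - final distance) / initial distance, so it cannot exceed 1 regardless of the path taken.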

Defining the RL algorithm

So far we have created an environment and specified how the agent can act (action space) and what it observes (observation space). The intelligence of the robot, however, is determined by a neural network. This "brain" of the robot is trained using Deep Reinforcement Learning. The architecture of the agent's brain depends on the modality of the input (defined in the self.observation_space property of the environment wrapper): it could be a multi-layer perceptron (MLP) or a convolutional neural network (CNN). Catalyst provides an easy way to configure an agent using a YAML file. It also provides implementations of state-of-the-art RL algorithms such as PPO, DDPG, TD3 and SAC. You can pick the algorithm by changing the algorithm: variable in configs/config.yml. The hyper-parameters related to training can also be configured there.

In this tutorial we use TD3, an off-policy, model-free RL algorithm.

Figure 2: Architecture of the actor and critic in our TD3 algorithm.

As depicted in Figure 2, the actor and critic(s) (TD3 concurrently learns two value networks) are modelled as agent classes in Catalyst. We customize them and reference them in the config file by setting agent: UR5Actor and agent: UR5StateActionCritic. The details of the neural network architecture for both the actor and the critic(s) can be configured by further editing the YAML file.
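
To see why TD3 keeps two critics, it helps to look at how the learning target is formed. The sketch below shows the two ideas from the TD3 algorithm in plain Python: target policy smoothing and the clipped double-Q target. It is an illustration of the published algorithm, not Catalyst's implementation; the noise values 0.2 and 0.5 are commonly used defaults and are assumptions here, not values read from configs/config.yml.

```python
import random


def clip(value, low, high):
    return max(low, min(high, value))


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           action_low=-1.0, action_high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action range."""
    return [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             action_low, action_high)
        for a in target_action
    ]


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two critic estimates."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


# Both critics are regressed toward the same target value
print(td3_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0, gamma=0.5))  # 2.0
print(smoothed_target_action([0.9, -0.2, 0.0]))
```

Taking the minimum of the two estimates counteracts the tendency of a single critic to overestimate values, which is the reason two value networks are trained.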

The CNN image_net, used to process camera images, can be created as shown below. The layers of the network are defined by channels (a list of integers), bias, dropout and normalization flags (booleans) and the activation function (a string). These parameters are used by the function get_convolution_net in src/network.py.

image_net_params:
   history_len: *history_len
   channels: [16, 32, 32, 32, 16]
   use_bias: True
   use_groups: False
   use_normalization: True
   use_dropout: False
   activation: ReLU
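
A quick way to sanity-check this configuration is to compute the size of the feature vector the CNN produces. The sketch below assumes that every entry in channels becomes a 3x3 convolution with padding 1 followed by a 2x2 max-pool, which is the layout shown in the model printout quoted in the issues further down; the function name conv_stack_output_shape is made up for this example.

```python
def conv_stack_output_shape(height, width, channels, pool=2):
    """Shape after len(channels) blocks of [3x3 conv, padding 1] + [2x2 max-pool]."""
    for _ in channels:
        # A 3x3 convolution with stride 1 and padding 1 keeps height and width.
        # A 2x2 max-pool with stride 2 (ceil_mode=False) floors the halved size.
        height, width = height // pool, width // pool
    return channels[-1], height, width


# The camera image is 240 pixels high and 320 pixels wide
out_channels, out_h, out_w = conv_stack_output_shape(240, 320, [16, 32, 32, 32, 16])
flat_features = out_channels * out_h * out_w
print(out_channels, out_h, out_w, flat_features)  # 16 7 10 1120
```

The result, 1120, matches the in_features of the first linear layer of the actor's main_net in that printout. The critic's main_net takes 1184 inputs, which is 1120 image features plus the 64 outputs of action_net.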

An MLP can be created using the block shown below. In our example, main_net and action_net are created in a similar fashion through the get_linear_net function.

features: [64, 64]
use_bias: False
use_normalization: False
use_dropout: False
activation: ReLU
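
The features list fully determines the shape of the MLP once the input size is known. As a rough illustration (not the code of get_linear_net itself), the sketch below expands such a list into per-layer shapes and counts the parameters; linear_layer_shapes and parameter_count are hypothetical helper names, and the input size 3 is the dimension of the action space.

```python
def linear_layer_shapes(in_features, features):
    """Return (in, out) pairs for a stack of fully connected layers."""
    sizes = [in_features] + list(features)
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes, use_bias=False):
    """Number of weights (plus biases if enabled) in the stack."""
    return sum(i * o + (o if use_bias else 0) for i, o in shapes)


# action_net: a 3-dimensional action fed through features: [64, 64]
action_net = linear_layer_shapes(3, [64, 64])
print(action_net)                   # [(3, 64), (64, 64)]
print(parameter_count(action_net))  # 4288
```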

Once the actor and critic network architectures are defined, we are ready to start the training.

Training

Figure 3: Samplers explore the environment and collect the data. The trainer uses the collected data to train a policy. Source: Sample Efficient Ensemble Learning with Catalyst.RL.

Both the trainer and the samplers are also configurable in configs/config.yml. The sampler starts with a random policy and, after a certain number of transitions governed by the save_period variable, updates its policy with the latest trainer weights. As the training progresses, the sampler keeps gathering data with better policies while the trainer improves the policy until convergence. All the collected data is stored in a database.
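
The interplay between sampler and trainer can be pictured with a toy, single-process simulation. Everything in the sketch below is a simplification made for illustration: InMemoryDB stands in for the real database, the two roles run one after another instead of concurrently, and the assumption that new weights are published every save_period trainer updates is ours, not something taken from the Catalyst source.

```python
from collections import deque


class InMemoryDB:
    """Hypothetical stand-in for the database shared by sampler and trainer."""

    def __init__(self, capacity):
        self.transitions = deque(maxlen=capacity)  # oldest data is dropped first
        self.weights_version = 0                   # 0 means "random policy"


def sampler_episode(db, episode_len):
    """Collect one episode with whatever weights are currently published."""
    version = db.weights_version
    for t in range(episode_len):
        db.transitions.append({"t": t, "policy_version": version})
    return version


def trainer_updates(db, num_updates, updates_done, save_period):
    """Run training updates; publish new weights every `save_period` updates."""
    for _ in range(num_updates):
        updates_done += 1
        if updates_done % save_period == 0:
            db.weights_version += 1
    return updates_done


db = InMemoryDB(capacity=2000)
updates_done = 0
versions_used = []
for _ in range(5):
    versions_used.append(sampler_episode(db, episode_len=100))
    updates_done = trainer_updates(db, num_updates=100,
                                   updates_done=updates_done, save_period=50)
print(versions_used)        # [0, 2, 4, 6, 8]
print(len(db.transitions))  # 500
```

The first episode is collected with the random policy (version 0); every later episode uses a newer policy, so the buffer gradually fills with data from better policies.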

Once the parameters of the trainer and sampler (in this tutorial we use a single sampler) are configured, the training process can be started by launching scripts/run-training.sh.

This opens a tmux session, which starts the sampler, trainer, database and TensorBoard to monitor the training process.

Once you have cloned our repository and installed CoppeliaSim and PyRep, you are ready to start training. Even though Catalyst is very much focused on reproducibility, the asynchronous nature of the training means we cannot guarantee that the training pipeline converges. If you don't see the robot making progress after ~1h of training, try changing the random seed, noise and action step values. In any case, we encourage you to play with the parameters and alter the code to your liking.

Once training starts, the agent's progress can also be monitored visually in the CoppeliaSim simulation.

Final Results

Figure 4: Reward per episode, collected over around 10k episodes.

Once the policy converges, you can test it (run inference) either in the simulator or directly on the real robot. This can be done by editing configs/config_inference.yml and passing the path of the converged policy (.pth file) to the resume: variable. Finally, run scripts/run-inference.sh.

Inference on a real robot

About the Team

This tutorial is based on the research done at ARRIVAL by the outstanding robotics team:

The team is creating flexible factories of the future for the assembly of Arrival electric vehicles. One of the topics we are actively working on is transferring the knowledge obtained in simulation to the physical robot. We encourage you to check out our recent research publication: Sim2Real for Peg-Hole Insertion with Eye-in-Hand Camera. If you have any questions about the contents of this tutorial or simply want to chat about robots, feel free to reach out to us!

catalyst-rl-tutorial's Issues

After loading the model parameters, the simulation code is stuck

After loading the model parameters, the code is stuck and no training information is output. What could be the cause?

Error during training

When CUDA_VISIBLE_DEVICES='0' catalyst-rl run-trainer --config configs/config.yml is executed by run-training.sh, it fails with catalyst.utils.tools.registry.RegistryException: No factory with name 'CoppeliaSimEnvWrapper' was registered

What is the problem, and which version of Catalyst should I use?

Traceback (most recent call last):
File "/workspace/mazhengyu/anaconda3/envs/catalystenv/bin/catalyst-rl", line 8, in <module>
sys.exit(main())
File "/workspace/mazhengyu/anaconda3/envs/catalystenv/lib/python3.6/site-packages/catalyst/rl/main.py", line 44, in main
COMMANDS[args.command].main(args, uargs)
File "/workspace/mazhengyu/anaconda3/envs/catalystenv/lib/python3.6/site-packages/catalyst/rl/scripts/run_trainer.py", line 69, in main
env = ENVIRONMENTS.get_from_params(**config["environment"])
File "/workspace/mazhengyu/anaconda3/envs/catalystenv/lib/python3.6/site-packages/catalyst/utils/tools/registry.py", line 244, in get_from_params
return self.get_instance(name, meta_factory=meta_factory, **kwargs)
File "/workspace/mazhengyu/anaconda3/envs/catalystenv/lib/python3.6/site-packages/catalyst/utils/tools/registry.py", line 216, in get_instance
f = self.get(name)
File "/workspace/mazhengyu/anaconda3/envs/catalystenv/lib/python3.6/site-packages/catalyst/utils/tools/registry.py", line 187, in get
f"No factory with name '{name}' was registered"
catalyst.utils.tools.registry.RegistryException: No factory with name 'CoppeliaSimEnvWrapper' was registered
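
The exception says that the environment name from the config was looked up in a registry that has no entry under that name. The toy registry below is an illustration of this failure mode only; it is not Catalyst's implementation, and DemoEnv is a hypothetical class. A lookup succeeds only if the class was added to the same registry object before the config is resolved.

```python
class RegistryException(Exception):
    pass


class Registry:
    """Toy name -> factory registry (an illustration, not Catalyst's implementation)."""

    def __init__(self):
        self._factories = {}

    def add(self, factory):
        self._factories[factory.__name__] = factory
        return factory  # returning the factory allows use as a decorator

    def get_from_params(self, name, **kwargs):
        if name not in self._factories:
            raise RegistryException(f"No factory with name '{name}' was registered")
        return self._factories[name](**kwargs)


ENVIRONMENTS = Registry()

# Looking up a name before anything registered it reproduces the error message
try:
    ENVIRONMENTS.get_from_params("DemoEnv")
except RegistryException as error:
    print(error)  # No factory with name 'DemoEnv' was registered


@ENVIRONMENTS.add
class DemoEnv:  # hypothetical environment class
    def __init__(self, visualize=False):
        self.visualize = visualize


env = ENVIRONMENTS.get_from_params("DemoEnv", visualize=True)
print(type(env).__name__, env.visualize)  # DemoEnv True
```

One thing that may be worth checking in this report: the traceback paths point to a package installed as catalyst, whereas src/env.py imports from catalyst_rl. If the environment class registers itself with one package while the command line tool resolves the config with the other, the lookup would fail in exactly this way. This is a guess based on the paths shown, not a confirmed diagnosis.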

Package throws errors relating to catalyst_rl and CoppeliaSim

Dear Arrival team,
Thank you for the great package. I get the following error after trying to install the catalyst-rl-tutorial package on a fresh install of Ubuntu 18.04 LTS:

Machine specs are as follows :
OS: Ubuntu 18.04
Intel® Core™ i7-9750H CPU @ 2.60GHz × 12
Graphics card : GeForce RTX 2070 with Max-Q Design/PCIe/SSE2
gnome 3.28.2

This was attempted on 2 additional machines and it always throws the same error as below. The package launches CoppeliaSim, but it crashes within seconds and shows the following terminal messages.

Sampler terminal error:

Traceback (most recent call last):
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/catalyst_rl/utils/tools/registry.py", line 220, in get_instance
return f.get_from_params(*args, **kwargs)
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/src/actor.py", line 47, in get_from_params
state_net = StateNet.get_from_params(im_width, im_height, in_channels, **state_net_params)
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/src/network.py", line 149, in get_from_params
image_net = _get_convolution_net(in_channels, **image_net_params)
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/src/network.py", line 57, in _get_convolution_net
net.apply(utils.initialization.get_optimal_inner_init(activation_fn))
AttributeError: module 'catalyst.utils' has no attribute 'initialization'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/bin/catalyst-rl", line 8, in <module>
sys.exit(main())
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/catalyst_rl/rl/main.py", line 44, in main
COMMANDS[args.command].main(args, uargs)
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/catalyst_rl/rl/scripts/run_trainer.py", line 89, in main
algorithm = algorithm_fn.prepare_for_trainer(env_spec=env, config=config)
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/catalyst_rl/rl/offpolicy/algorithms/td3.py", line 442, in prepare_for_trainer
env_spec=env_spec,
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/catalyst_rl/utils/tools/registry.py", line 244, in get_from_params
return self.get_instance(name, meta_factory=meta_factory, **kwargs)
File "/home/lws1/Desktop/RL/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/catalyst_rl/utils/tools/registry.py", line 225, in get_instance
) from e
catalyst_rl.utils.tools.registry.RegistryException: Factory 'UR5Actor' call failed: args=() kwargs={'state_net_params': OrderedDict([('image_net_params', OrderedDict([('history_len', 1), ('channels', [16, 32, 32, 32, 16]), ('use_bias', True), ('use_groups', False), ('use_normalization', True), ('use_dropout', False), ('activation', 'ReLU')])), ('main_net_params', OrderedDict([('features', [256, 256]), ('use_bias', False), ('use_normalization', False), ('use_dropout', False), ('activation', 'ReLU')]))]), 'policy_head_params': OrderedDict([('in_features', 256), ('policy_type', None), ('out_activation', 'Tanh'), ('out_features', 3)]), 'env_spec': <src.env.CoppeliaSimEnvWrapper object at 0x7f4dc7ce80f0>}

Trainer terminal error:
Same error as above, in addition to the following error:
Error: signal 11:
/home/lws1/COPPELIASIM/libcoppeliaSim.so.1(_Z11_segHandleri+0x2b)[0x7f5fc8f91a4b]
/lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f60f2118040]
/home/lws1/COPPELIASIM/libQt5Core.so.5(_ZN6QMutex4lockEv+0x15)[0x7f5ff937d375]
/home/lws1/COPPELIASIM/libQt5Gui.so.5(+0x387b0e)[0x7f5fc3b3ab0e]
/home/lws1/COPPELIASIM/libQt5Gui.so.5(+0x37f3cf)[0x7f5fc3b323cf]
/home/lws1/COPPELIASIM/libQt5Gui.so.5(_ZN18QRasterPaintEngine11updateBrushERK6QBrush+0x87)[0x7f5fc3b35837]
/home/lws1/COPPELIASIM/libQt5Gui.so.5(_ZN18QRasterPaintEngine8fillRectERK6QRectFRK6QBrush+0x2a)[0x7f5fc3b3630a]
/home/lws1/COPPELIASIM/libQt5Gui.so.5(_ZN8QPainter8fillRectERK5QRectRK6QBrush+0xc5)[0x7f5fc3b49165]
/home/lws1/COPPELIASIM/libQt5Widgets.so.5(_ZNK12QFusionStyle11drawControlEN6QStyle14ControlElementEPK12QStyleOptionP8QPainterPK7QWidget+0x2da7)[0x7f5fc8509207]

terminate called after throwing an instance of 'c10::Error'

Hey there, I can successfully run the project but the training doesn't start. When sampling is done, the second CoppeliaSim thread closes, which makes me assume it's a parallelism issue. Great work btw!
Environment: Azure VM, Ubuntu 18.04.5 LTS (no GPU)

source catalystenv/bin/activate
CUDA_VISIBLE_DEVICES=0 catalyst-rl run-trainer --config ./configs/config.yml
root@Linux:~/Downloads/catalyst-rl-tutorial# source catalystenv/bin/activate
(catalystenv) root@Linux:~/Downloads/catalyst-rl-tutorial# CUDA_VISIBLE_DEVICES=0 catalyst-rl run-trainer --config ./configs/config.yml
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
-----------Actor------------
UR5Actor(
  (state_net): StateNet(
    (main_net): Sequential(
      (0): Linear(in_features=1120, out_features=256, bias=False)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=256, out_features=256, bias=False)
      (3): ReLU(inplace=True)
    )
    (image_net): Sequential(
      (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (2): ReLU(inplace=True)
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (5): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (6): ReLU(inplace=True)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (10): ReLU(inplace=True)
      (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (12): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (14): ReLU(inplace=True)
      (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (16): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (17): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (18): ReLU(inplace=True)
      (19): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
  )
  (head_net): PolicyHead(
    (head_net): SequentialNet(
      (net): Sequential(
        (block_0): Sequential(
          (layer): Linear(in_features=256, out_features=3, bias=True)
          (act): Tanh()
        )
      )
    )
  )
)
---------Critic--------
UR5StateActionCritic(
  (state_action_net): StateActionNet(
    (main_net): Sequential(
      (0): Linear(in_features=1184, out_features=256, bias=False)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=256, out_features=256, bias=False)
      (3): ReLU(inplace=True)
    )
    (image_net): Sequential(
      (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (2): ReLU(inplace=True)
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (5): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (6): ReLU(inplace=True)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (10): ReLU(inplace=True)
      (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (12): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (14): ReLU(inplace=True)
      (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (16): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (17): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (18): ReLU(inplace=True)
      (19): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
    (action_net): Sequential(
      (0): Linear(in_features=3, out_features=64, bias=False)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=64, out_features=64, bias=False)
      (3): ReLU(inplace=True)
    )
  )
  (head_net): ValueHead(
    (value_heads): ModuleList(
      (0): Linear(in_features=256, out_features=1, bias=True)
    )
  )
)
---------Critic--------
UR5StateActionCritic(
  (state_action_net): StateActionNet(
    (main_net): Sequential(
      (0): Linear(in_features=1184, out_features=256, bias=False)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=256, out_features=256, bias=False)
      (3): ReLU(inplace=True)
    )
    (image_net): Sequential(
      (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (2): ReLU(inplace=True)
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (5): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (6): ReLU(inplace=True)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (8): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (10): ReLU(inplace=True)
      (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (12): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (14): ReLU(inplace=True)
      (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (16): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (17): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      (18): ReLU(inplace=True)
      (19): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
    (action_net): Sequential(
      (0): Linear(in_features=3, out_features=64, bias=False)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=64, out_features=64, bias=False)
      (3): ReLU(inplace=True)
    )
  )
  (head_net): ValueHead(
    (value_heads): ModuleList(
      (0): Linear(in_features=256, out_features=1, bias=True)
    )
  )
)
================================================================================
Something go wrong with trajectory:
'NoneType' object has no attribute 'get'
None
================================================================================
================================================================================
Something go wrong with trajectory:
'NoneType' object has no attribute 'get'
None
================================================================================
================================================================================
Something go wrong with trajectory:
'NoneType' object has no attribute 'get'
None
================================================================================
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000000 | transitions: 000000000 | buffer size: 000000000/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000000 | transitions: 000000000 | buffer size: 000000000/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000000 | transitions: 000000000 | buffer size: 000000000/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000001 | transitions: 000000101 | buffer size: 000000101/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000002 | transitions: 000000202 | buffer size: 000000202/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000003 | transitions: 000000303 | buffer size: 000000303/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000004 | transitions: 000000404 | buffer size: 000000404/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000005 | transitions: 000000505 | buffer size: 000000505/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000006 | transitions: 000000606 | buffer size: 000000606/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000007 | transitions: 000000707 | buffer size: 000000707/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000008 | transitions: 000000808 | buffer size: 000000808/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000009 | transitions: 000000909 | buffer size: 000000909/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000010 | transitions: 000001010 | buffer size: 000001010/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000011 | transitions: 000001111 | buffer size: 000001111/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000012 | transitions: 000001212 | buffer size: 000001212/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000013 | transitions: 000001313 | buffer size: 000001313/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000014 | transitions: 000001414 | buffer size: 000001414/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000015 | transitions: 000001515 | buffer size: 000001515/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000016 | transitions: 000001616 | buffer size: 000001616/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000017 | transitions: 000001717 | buffer size: 000001717/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000018 | transitions: 000001818 | buffer size: 000001818/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000019 | transitions: 000001919 | buffer size: 000001919/000002000
--- fps:     0.0 | updates per sample:     0.0 | trajectories: 000000020 | transitions: 000002020 | buffer size: 000002020/000002000
terminate called after throwing an instance of 'c10::Error'
  what():  HIP error: hipErrorNoDevice
Exception raised from deviceCount at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:98 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7ff75f036d12 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x57d4f1 (0x7ff75fa2f4f1 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_hip.so)
frame #2: torch::autograd::Engine::start_device_threads() + 0x442 (0x7ff7910c2252 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xf907 (0x7ff7ab34d907 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: torch::autograd::Engine::initialize_device_threads_pool() + 0xd5 (0x7ff7910bf785 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>) + 0x2f (0x7ff7910c7faf in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::python::PythonEngine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>) + 0x3c (0x7ff79fbb9d1c in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0xacd (0x7ff7910c746d in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::python::PythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0x4e (0x7ff79fbb9b1e in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: THPEngine_run_backward(THPEngine*, _object*, _object*) + 0xe3f (0x7ff79fbbabef in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #10: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a4a5]
frame #11: _PyEval_EvalFrameDefault + 0x1226 (0x50cc96 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #12: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x507be4]
frame #13: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x509900]
frame #14: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #15: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #16: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x507be4]
frame #17: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x509900]
frame #18: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #19: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #20: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #21: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #22: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #23: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x507be4]
frame #24: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x509900]
frame #25: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #26: _PyEval_EvalFrameDefault + 0x1226 (0x50cc96 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #27: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x507be4]
frame #28: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x509900]
frame #29: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #30: _PyEval_EvalFrameDefault + 0x1226 (0x50cc96 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #31: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #32: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #33: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #34: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #35: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #36: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #37: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #38: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #39: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #40: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #41: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #42: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #43: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #44: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #45: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #46: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #47: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #48: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #49: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #50: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #51: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #52: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x5095c8]
frame #53: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x50a2fd]
frame #54: _PyEval_EvalFrameDefault + 0x444 (0x50beb4 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #55: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x507be4]
frame #56: PyEval_EvalCode + 0x23 (0x50ad03 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #57: /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python() [0x634e72]
frame #58: PyRun_FileExFlags + 0x97 (0x634f27 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #59: PyRun_SimpleFileExFlags + 0x17f (0x6386df in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #60: Py_Main + 0x591 (0x639281 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #61: main + 0xe0 (0x4b0dc0 in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)
frame #62: __libc_start_main + 0xe7 (0x7ff7ab57ebf7 in /lib/x86_64-linux-gnu/libc.so.6)
frame #63: _start + 0x2a (0x5b259a in /home/azurezyn/Downloads/catalyst-rl-tutorial/catalystenv/bin/python)

Aborted (core dumped)
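
One thing worth checking here: the trace passes through libtorch_hip.so, which suggests the installed PyTorch wheel is a ROCm (HIP) build, and hipErrorNoDevice indicates that no HIP device was found. The sketch below inspects the installed build. The helper names are hypothetical, and `torch.version.hip` is read with `getattr` on the assumption that older builds may not expose it.

```python
import importlib


def describe_torch_build():
    """Summarize the installed PyTorch build, or return None if torch is not installed."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    return {
        "version": torch.__version__,
        # Set on ROCm builds; read defensively because older builds may lack the attribute.
        "hip": getattr(torch.version, "hip", None),
        "cuda": getattr(torch.version, "cuda", None),
        # On ROCm builds the torch.cuda namespace is backed by HIP.
        "device_available": bool(torch.cuda.is_available()),
    }


def is_rocm_build_without_device(info):
    """True when the build targets ROCm but no device is visible to it."""
    return bool(info) and info["hip"] is not None and not info["device_available"]


if __name__ == "__main__":
    info = describe_torch_build()
    print(info)
    if is_rocm_build_without_device(info):
        print("ROCm build with no visible device; a CPU-only or CUDA build may be a better fit.")
```

If the check reports a ROCm build with no device, reinstalling a CPU-only or CUDA wheel that matches the machine is probably the first thing to try.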

Meet errors when running

Hello, your work on peg-in-hole insertion with reinforcement learning is excellent!
I am so happy to have a tutorial like this.
Unfortunately, I ran into trouble when running your example. The details are below:

(python:46275): Gtk-WARNING **: 16:25:05.465: Theme parsing error: gtk.css:8008:70: The :focused pseudo-class is deprecated. Use :focus instead.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

I have tried many times but could not solve the problem.
My machine runs Ubuntu 20.04 (CPU only) with the following conda environment:

python=3.6.5
torch=1.3.1
torchvision=0.4.2
catalyst=20.9
catalyst-rl=20.3

Thank you for your time. And I'm looking forward to your reply!
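
For reports like this one, the version list can be collected programmatically so that nothing is mistyped. This is a minimal sketch, assuming Python 3.8 or newer, where `importlib.metadata` is in the standard library. The report above uses Python 3.6, where `pip freeze` gives the same information. The function name is hypothetical, and the package names are taken from the list above.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["torch", "torchvision", "catalyst", "catalyst-rl"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}={version}")
```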

about reward visualization

Which file contains the visualization of the network structure and related parameters? There is a reward curve graph at the end of README.md. Also, is the convolution kernel of image_net 3*3 by default?

Error during installation

I tried to run the code using ./run-training.sh, but it shows the error:

Failed to build pyarrow
ERROR: Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly

About the shape of the goal

Thanks for your code. I have successfully launched your code.
I still have two questions. 1. How can I edit the "goal" in CoppeliaSim? For example, I want to change the shape of the goal's center, which is currently a circle, to a triangle, a rectangle, or another shape. I found that I can't edit this compound shape directly in CoppeliaSim. How can I create a new goal?
2. I noticed that the action space of the UR5 is three-dimensional: the goal position (x, y, z). Is it possible to change the control mode? For example, I want to control every joint of the UR5 instead of giving a target position.
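
To make the second question concrete, here is a rough sketch of what a joint-space action could look like: a six-dimensional action in [-1, 1] is scaled to a small per-step joint increment and clipped to joint limits. This is not the repository's code. The limits, the step size, and the function name are all hypothetical, and the resulting targets would still have to be passed to whatever joint-target call the simulator wrapper exposes.

```python
import math

# Hypothetical values: six revolute joints with symmetric limits and a small
# maximum change per control step. Check the robot's data sheet before use.
JOINT_LIMITS = [(-2.0 * math.pi, 2.0 * math.pi)] * 6
MAX_STEP_RAD = 0.05


def apply_joint_action(current, action, limits=JOINT_LIMITS, max_step=MAX_STEP_RAD):
    """Turn a normalized per-joint action into clipped joint position targets."""
    if len(current) != len(limits) or len(action) != len(limits):
        raise ValueError("current, action, and limits must have the same length")
    targets = []
    for q, a, (low, high) in zip(current, action, limits):
        a = max(-1.0, min(1.0, a))          # clip the action to [-1, 1]
        target = q + a * max_step           # scale to a joint increment
        targets.append(max(low, min(high, target)))  # respect joint limits
    return targets


if __name__ == "__main__":
    print(apply_joint_action([0.0] * 6, [1.0, -1.0, 2.0, 0.0, 0.5, -0.5]))
```

Compared with a 3-D target position, a joint-space action doubles the action dimension and leaves collision avoidance entirely to the policy, so training would likely need retuning.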

Questions about poses of the peg and the hole

First of all, thank you for the open-source project; it is easy to use.
In the training and later inference experiments, are the poses of the peg and the hole known (in the world coordinate system)? Have you considered whether a large deviation in the pose of the peg would have a significant impact on the validation of the algorithm (when using the real UR robot)?
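
One way to probe the sensitivity asked about here is to perturb the nominal peg pose before each evaluation episode and record the success rate per perturbation size. The sketch below only draws the perturbations. The bounds, the seed handling, and the function name are hypothetical and are not taken from the repository.

```python
import math
import random


def sample_pose_offsets(n, max_xy=0.005, max_yaw_deg=5.0, seed=0):
    """Draw n (dx, dy, dyaw) offsets uniformly within the given bounds.

    max_xy is in metres and max_yaw_deg in degrees; both defaults are
    hypothetical. dyaw is returned in radians.
    """
    rng = random.Random(seed)  # seeded so that evaluation runs are repeatable
    max_yaw = math.radians(max_yaw_deg)
    return [
        (
            rng.uniform(-max_xy, max_xy),
            rng.uniform(-max_xy, max_xy),
            rng.uniform(-max_yaw, max_yaw),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    for dx, dy, dyaw in sample_pose_offsets(3):
        print(f"dx={dx:+.4f} m, dy={dy:+.4f} m, dyaw={dyaw:+.4f} rad")
```

Sweeping `max_xy` and `max_yaw_deg` upward until the success rate drops gives a rough tolerance figure to compare against the pose uncertainty expected on the real robot.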

QMutex: destroying locked mutex

After running the run-training.sh file, the terminal displays a QMutex error. I'm not able to identify the cause, as there are very few resources about this error.

how to implement it on real system?

My question is about implementation. How can we implement this on a real robot? Could you please also provide a description?

It is a very interesting simulation example, and it is highly appreciated.

Thank you for sharing.

ERROR during install catalyst-rl

I tried to run pip install -r requirements.txt, but it shows the error:

ERROR: Failed building wheel for pyarrow
Failed to build pyarrow
ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects

The Python version is 3.8.5.
The platform is Ubuntu 18.04.

How to run the simulation

How do I run the simulation? The documentation does not mention how to run it, nor does it mention the version of each library.

terminate called after throwing an instance of 'c10::DistBackendError'

I'm running the SFT command below in a Linux EC2 virtual env:

deepspeed trainer_sft.py --configs llama2-7b-sft-RLAIF --wandb-entity tammosta --show_dataset_stats --deepspeed

However, I'm getting the following error:

nvs/ml_v4/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /opt/conda/envs/ml_v4/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[4/4] c++ cpu_adam.o cpu_adam_impl.o custom_cuda_kernel.cuda.o -shared -lcurand -L/opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 32.65589904785156 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 32.57991623878479 seconds
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 32.65556788444519 seconds
Time to load cpu_adam op: 32.653648376464844 seconds

Loading extension module cpu_adam...
Time to load cpu_adam op: 32.60007572174072 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 32.65678668022156 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 32.62584376335144 seconds
[rank7]:[E ProcessGroupNCCL.cpp:523] [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800599 milliseconds before timing out.
[rank7]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank7]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank7]:[E ProcessGroupNCCL.cpp:1182] [Rank 7] NCCL watchdog thread terminated with exception: [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800599 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f25395dcd87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f253a7846e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f253a787c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f253a788839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f258dcb8793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f28bdbdb609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f28bd99a353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 7] NCCL watchdog thread terminated with exception: [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800599 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f25395dcd87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f253a7846e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f253a787c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f253a788839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f258dcb8793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f28bdbdb609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f28bd99a353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f25395dcd87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7f253a4deb11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7f258dcb8793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7f28bdbdb609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7f28bd99a353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[E ProcessGroupNCCL.cpp:523] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800695 milliseconds before timing out.
[rank1]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank1]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank1]:[E ProcessGroupNCCL.cpp:1182] [Rank 1] NCCL watchdog thread terminated with exception: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800695 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f42a68f3d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f42a7a9b6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f42a7a9ec3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f42a7a9f839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f42fafce793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f462aef1609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f462acb0353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 1] NCCL watchdog thread terminated with exception: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800695 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f42a68f3d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f42a7a9b6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f42a7a9ec3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f42a7a9f839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f42fafce793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f462aef1609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f462acb0353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f42a68f3d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7f42a77f5b11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7f42fafce793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7f462aef1609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7f462acb0353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank3]:[E ProcessGroupNCCL.cpp:523] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800778 milliseconds before timing out.
[rank3]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank3]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank3]:[E ProcessGroupNCCL.cpp:1182] [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800778 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7efd2a176d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7efd2b31e6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7efd2b321c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7efd2b322839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7efd7e853793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f00ae776609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f00ae535353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 3] NCCL watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800778 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7efd2a176d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7efd2b31e6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7efd2b321c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7efd2b322839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7efd7e853793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f00ae776609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f00ae535353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7efd2a176d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7efd2b078b11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7efd7e853793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7f00ae776609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7f00ae535353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank4]:[E ProcessGroupNCCL.cpp:523] [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800792 milliseconds before timing out.
[rank4]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank4]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank4]:[E ProcessGroupNCCL.cpp:1182] [Rank 4] NCCL watchdog thread terminated with exception: [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800792 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd06b3a0d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7fd06c5486e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7fd06c54bc3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7fd06c54c839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7fd0bfa7c793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7fd3ef99f609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7fd3ef75e353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 4] NCCL watchdog thread terminated with exception: [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800792 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd06b3a0d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7fd06c5486e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7fd06c54bc3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7fd06c54c839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7fd0bfa7c793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7fd3ef99f609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7fd3ef75e353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd06b3a0d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7fd06c2a2b11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7fd0bfa7c793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7fd3ef99f609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7fd3ef75e353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank5]:[E ProcessGroupNCCL.cpp:523] [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800875 milliseconds before timing out.
[rank5]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank5]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank5]:[E ProcessGroupNCCL.cpp:1182] [Rank 5] NCCL watchdog thread terminated with exception: [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800875 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fee13cb6d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7fee14e5e6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7fee14e61c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7fee14e62839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7fee68391793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7ff1982b4609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7ff198073353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 5] NCCL watchdog thread terminated with exception: [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800875 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fee13cb6d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7fee14e5e6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7fee14e61c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7fee14e62839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7fee68391793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7ff1982b4609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7ff198073353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fee13cb6d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7fee14bb8b11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7fee68391793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7ff1982b4609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7ff198073353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank6]:[E ProcessGroupNCCL.cpp:523] [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800902 milliseconds before timing out.
[rank6]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank6]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank6]:[E ProcessGroupNCCL.cpp:1182] [Rank 6] NCCL watchdog thread terminated with exception: [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800902 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f3202f33d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f32040db6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f32040dec3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f32040df839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f3257610793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f3587533609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f35872f2353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 6] NCCL watchdog thread terminated with exception: [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800902 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f3202f33d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f32040db6e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f32040dec3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f32040df839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f3257610793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f3587533609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f35872f2353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f3202f33d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7f3203e35b11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7f3257610793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7f3587533609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7f35872f2353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank2]:[E ProcessGroupNCCL.cpp:523] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800918 milliseconds before timing out.
[rank2]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank2]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank2]:[E ProcessGroupNCCL.cpp:1182] [Rank 2] NCCL watchdog thread terminated with exception: [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800918 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f8600788d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f86019306e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f8601933c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f8601934839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f8654ee9793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f8984e0c609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f8984bcb353 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [Rank 2] NCCL watchdog thread terminated with exception: [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for 1800918 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f8600788d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f86019306e6 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f8601933c3d in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f8601934839 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe6793 (0x7f8654ee9793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f8984e0c609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f8984bcb353 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f8600788d87 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7f860168ab11 in /opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe6793 (0x7f8654ee9793 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x8609 (0x7f8984e0c609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: clone + 0x43 (0x7f8984bcb353 in /lib/x86_64-linux-gnu/libc.so.6)

[2024-01-30 21:38:11,090] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631563
[2024-01-30 21:38:13,081] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631564
[2024-01-30 21:38:13,081] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631565
[2024-01-30 21:38:13,099] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631566
[2024-01-30 21:38:13,115] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631567
[2024-01-30 21:38:13,130] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631568
[2024-01-30 21:38:13,144] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631569
[2024-01-30 21:38:13,159] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3631570
[2024-01-30 21:38:43,204] [ERROR] [launch.py:321:sigkill_handler] ['/opt/conda/envs/ml_v4/bin/python3.10', '-u', 'trainer_sft.py', '--local_rank=7', '--configs', 'llama2-7b-sft-RLAIF', '--wandb-entity', 'tammosta', '--show_dataset_stats', '--deepspeed'] exits with return code = -6
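
The watchdog messages above are easier to compare across ranks once they are reduced to one record per rank. Below is a minimal sketch that does this, assuming the log lines follow the `Watchdog caught collective operation timeout` format shown above. The function name `summarize_timeouts` and the record keys are hypothetical and are not part of PyTorch or DeepSpeed.

```python
import re

# Matches the "Watchdog caught collective operation timeout" lines as they
# appear in the log above. The pattern is an assumption based on that format.
WATCHDOG_RE = re.compile(
    r"\[Rank (?P<rank>\d+)\] Watchdog caught collective operation timeout: "
    r"WorkNCCL\(SeqNum=(?P<seq>\d+), OpType=(?P<op>\w+), "
    r"NumelIn=(?P<numel_in>\d+), NumelOut=(?P<numel_out>\d+), "
    r"Timeout\(ms\)=(?P<timeout_ms>\d+)\) ran for (?P<ran_ms>\d+) milliseconds"
)


def summarize_timeouts(log_text):
    """Return one record per (rank, SeqNum, OpType), sorted by rank.

    The same message is repeated several times per rank in the log,
    so repeats are collapsed into a single record.
    """
    seen = {}
    for m in WATCHDOG_RE.finditer(log_text):
        key = (int(m["rank"]), int(m["seq"]), m["op"])
        seen.setdefault(key, {
            "rank": key[0],
            "seq_num": key[1],
            "op_type": key[2],
            "numel": int(m["numel_in"]),
            "timeout_min": int(m["timeout_ms"]) / 60000,  # ms -> minutes
            "ran_ms": int(m["ran_ms"]),
        })
    return [seen[k] for k in sorted(seen)]


# Two lines copied from the log above, used here as sample input.
SAMPLE_LOG = (
    "[rank6]:[E ProcessGroupNCCL.cpp:523] [Rank 6] Watchdog caught collective "
    "operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, "
    "NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for "
    "1800902 milliseconds before timing out.\n"
    "[rank2]:[E ProcessGroupNCCL.cpp:523] [Rank 2] Watchdog caught collective "
    "operation timeout: WorkNCCL(SeqNum=880, OpType=BROADCAST, "
    "NumelIn=131137536, NumelOut=131137536, Timeout(ms)=1800000) ran for "
    "1800918 milliseconds before timing out.\n"
)

for record in summarize_timeouts(SAMPLE_LOG):
    print(record)
```

On the sample lines, both ranks report the same `SeqNum=880` `BROADCAST` with a 30-minute timeout, which suggests every rank was waiting on the same collective rather than failing at different points.
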

When I run nvidia-smi in real time, I get this:

(ml_v3) ubuntu@ip-172-31-43-52:/mnt/efs/data/tammosta/Open-Assistant/model/model_training$ nvidia-smi
Tue Jan 30 21:14:05 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:10:1C.0 Off |                    0 |
| N/A   29C    P0              72W / 400W |   2589MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  | 00000000:10:1D.0 Off |                    0 |
| N/A   28C    P0              78W / 400W |   2245MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  | 00000000:20:1C.0 Off |                    0 |
| N/A   31C    P0              77W / 400W |   2245MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  | 00000000:20:1D.0 Off |                    0 |
| N/A   28C    P0              75W / 400W |   2245MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  | 00000000:90:1C.0 Off |                    0 |
| N/A   30C    P0              83W / 400W |   2245MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  | 00000000:90:1D.0 Off |                    0 |
| N/A   28C    P0              75W / 400W |   2245MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  | 00000000:A0:1C.0 Off |                    0 |
| N/A   31C    P0              79W / 400W |   2245MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  | 00000000:A0:1D.0 Off |                    0 |
| N/A   30C    P0              81W / 400W |   2101MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   3631563      C   /opt/conda/envs/ml_v4/bin/python3.10       2572MiB |
|    1   N/A  N/A   3631564      C   /opt/conda/envs/ml_v4/bin/python3.10       2226MiB |
|    2   N/A  N/A   3631565      C   /opt/conda/envs/ml_v4/bin/python3.10       2226MiB |
|    3   N/A  N/A   3631566      C   /opt/conda/envs/ml_v4/bin/python3.10       2226MiB |
|    4   N/A  N/A   3631567      C   /opt/conda/envs/ml_v4/bin/python3.10       2226MiB |
|    5   N/A  N/A   3631568      C   /opt/conda/envs/ml_v4/bin/python3.10       2226MiB |
|    6   N/A  N/A   3631569      C   /opt/conda/envs/ml_v4/bin/python3.10       2226MiB |
|    7   N/A  N/A   3631570      C   /opt/conda/envs/ml_v4/bin/python3.10       2082MiB |
+---------------------------------------------------------------------------------------+

It looks like GPU 0 is inactive (0% utilization) while the other seven GPUs sit at 100%. Could anyone please help me solve this? Thanks.
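
To check this repeatedly without reading the full table, the per-GPU utilization can be parsed from the CSV output of `nvidia-smi`. Below is a minimal sketch, assuming the output of `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits`. The helper name `idle_gpus` and the 5% threshold are hypothetical choices, and the sample values are copied from the table above.

```python
import csv
import io

# Command whose output idle_gpus() expects; run it with subprocess on the
# training node, e.g. subprocess.run(QUERY_CMD, capture_output=True, text=True).
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def idle_gpus(csv_text, threshold=5):
    """Return indices of GPUs whose utilization (%) is below the threshold."""
    idle = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        index, util, _mem_mib = (field.strip() for field in row)
        if int(util) < threshold:
            idle.append(int(index))
    return idle


# Values taken from the nvidia-smi table above: index, GPU-Util (%), memory (MiB).
SAMPLE = """\
0, 0, 2589
1, 100, 2245
2, 100, 2245
3, 100, 2245
4, 100, 2245
5, 100, 2245
6, 100, 2245
7, 100, 2101
"""

print(idle_gpus(SAMPLE))
```

One possible reading, which is an assumption and is not confirmed by the logs, is that the ranks at 100% are spinning inside the pending collective while rank 0 has not reached it yet.
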

use the previous model for training

Can I continue training from the previous model? Which settings need to be changed? Thanks.