Discriminator hyperparameters about hierarchical_morphology_transfer HOT 3 OPEN

jhejna commented on July 26, 2024

Discriminator hyperparameters

from hierarchical_morphology_transfer.

Comments (3)

jhejna commented on July 26, 2024

All transfers in navigation environments using the discriminator were done using the point mass, so the point mass row contains the correct hyperparameters. As mentioned in the text, we decay the learning rate of the discriminator (hence --discrim-decay true) and do not collect online data for the discriminator (the corresponding online-data option set to false). discrim-time-limit refers to the episode length of the imitated agent, i.e. how long the point mass episodes are during data collection. For example, the point mass is much faster than the ant, so it doesn't make sense to collect data from the point mass while it is just sitting at the goal.
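To make the role of discrim-time-limit concrete, here is a minimal sketch. It is not the repository's code: `collect_states`, `fast_point`, and the 1-D toy dynamics are hypothetical. The only idea taken from the comment above is that data collection stops after a fixed number of steps, so a fast agent does not contribute long runs of "sitting at the goal" states.

```python
from typing import Callable, List


def collect_states(step_fn: Callable[[float], float],
                   start: float,
                   time_limit: int) -> List[float]:
    """Roll out a toy agent for at most `time_limit` steps; return visited states."""
    states = [start]
    state = start
    for _ in range(time_limit):
        state = step_fn(state)
        states.append(state)
    return states


def fast_point(state: float, goal: float = 5.0, speed: float = 1.0) -> float:
    """Toy point mass (hypothetical): moves toward the goal, then stays there."""
    return min(goal, state + speed)


def frac_at_goal(states: List[float], goal: float = 5.0) -> float:
    """Fraction of collected states in which the agent is already at the goal."""
    return sum(s == goal for s in states) / len(states)


# A long time limit fills the dataset with idle states at the goal;
# a short one keeps mostly the informative "moving" states.
long_rollout = collect_states(fast_point, start=0.0, time_limit=50)
short_rollout = collect_states(fast_point, start=0.0, time_limit=5)
print(frac_at_goal(long_rollout), frac_at_goal(short_rollout))
```

With the short limit, only the final state is at the goal, whereas the long rollout is dominated by idle states.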

Here's the general procedure for reproducing the maze results with the discriminator.

1. Train the point mass low level on PointMass_Low (I believe it's named something similar).
2. Train the point mass high level, PointMaze_High, using PointMass_Low.
3. Train the Ant low level with the discriminator using data from PointMass_Low.
4. Compose PointMaze_High with AntDiscrim_Low in a zero-shot manner. This can be done with the composition_test.py script. Note that depending on the type of maze evaluation you want to do, you may need to edit the compose_params function in utils/loader.py.
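The zero-shot composition in the last step can be illustrated with a toy sketch. Everything here is hypothetical (the `compose` function, the 1-D policies, the value of `k`) and is not how the repository implements it. It only shows the idea: a high-level policy proposes a subgoal every `k` steps, and a different low-level policy is swapped in to reach it without retraining the high level.

```python
from typing import Callable, List

HighPolicy = Callable[[float], float]         # state -> subgoal
LowPolicy = Callable[[float, float], float]   # (state, subgoal) -> action


def compose(high: HighPolicy, low: LowPolicy, start: float,
            k: int, num_steps: int) -> List[float]:
    """Run `low` under subgoals from `high`, re-querying `high` every `k` steps."""
    state, subgoal = start, start
    trajectory = [state]
    for t in range(num_steps):
        if t % k == 0:
            subgoal = high(state)
        # Toy dynamics: the low-level action is a displacement.
        state = state + low(state, subgoal)
        trajectory.append(state)
    return trajectory


def high_policy(state: float, goal: float = 10.0, reach: float = 3.0) -> float:
    """Toy high level: place the subgoal at most `reach` ahead, capped at the goal."""
    return min(goal, state + reach)


def point_low(state: float, subgoal: float) -> float:
    """Toy fast low level (stand-in for the point mass): up to 1.0 per step."""
    return max(-1.0, min(1.0, subgoal - state))


def ant_low(state: float, subgoal: float) -> float:
    """Toy slow low level (stand-in for the ant): up to 0.5 per step."""
    return max(-0.5, min(0.5, subgoal - state))


# Same high-level policy, two different low levels, no retraining.
point_traj = compose(high_policy, point_low, start=0.0, k=3, num_steps=12)
ant_traj = compose(high_policy, ant_low, start=0.0, k=3, num_steps=12)
print(point_traj[-1], ant_traj[-1])
```

In this toy setting the slower low level covers less ground per high-level decision, which is one reason the number of low-level steps per high-level step matters when swapping agents.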

chongyi-zheng commented on July 26, 2024

@jhejna Many thanks for the quick reply!

1. I found that PointMass_Low is named PointMassLargeMJ_Low, and I just want to confirm that with you.
2. I trained a PointMaze_High policy and an Ant_Low policy to do a zero-shot transfer, as mentioned in my other question. I didn't edit the compose_params function as you described. Do you mean that I need to edit it for Ant_Discrim?
3. The Ant sometimes gets stuck in one location during the zero-shot transfer, as you can see in this image. Do you have any idea why? (The Ant is not overturned.)

The Ant_Low policy itself looks correct.
4. Do I always need to set high-level-skips manually? I think you try to store it here, but it doesn't work now.

hierarchical_morphology_transfer/bot_transfer/utils/loader.py

Line 361 in 14202b6

k = low_params['env_args']['k'] if 'k' in low_params['env_args'] else high_params['env_args']['k']
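The fallback on that line can be written out as a small helper to make the lookup order explicit. This is a sketch under assumptions: `resolve_k` is a hypothetical name, and the explicit-override branch is my addition (suggested by the `k=None` argument of test_composition), not something the repository is confirmed to do this way.

```python
from typing import Any, Dict, Optional


def resolve_k(low_params: Dict[str, Any],
              high_params: Dict[str, Any],
              k: Optional[int] = None) -> int:
    """Pick the high-level skip: explicit value, then low-level, then high-level params."""
    if k is not None:
        # Hypothetical override: a value passed by hand wins over stored params.
        return k
    low_args = low_params.get('env_args', {})
    if 'k' in low_args:
        return low_args['k']
    return high_params['env_args']['k']


low = {'env_args': {'k': 20}}
high = {'env_args': {'k': 10}}
print(resolve_k(low, high))                 # low-level value is preferred
print(resolve_k({'env_args': {}}, high))    # falls back to the high-level value
print(resolve_k(low, high, k=35))           # explicit value overrides both
```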
I also found some minor bugs in the code:

hierarchical_morphology_transfer/bot_transfer/utils/cmd_util.py

Line 100 in 14202b6

parser.add_argument("--discrim-batch-size", "-db", default=None, type=float)

The type is int, right?
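A quick way to see the effect of the `type` argument is to parse the same value both ways. This is a standalone sketch using only argparse; `build_parser` is a hypothetical helper, and only the `--discrim-batch-size` / `-db` argument is taken from the line quoted above.

```python
import argparse


def build_parser(batch_type):
    """Build a parser with only the batch-size argument, using the given type."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--discrim-batch-size", "-db", default=None, type=batch_type)
    return parser


as_float = build_parser(float).parse_args(["--discrim-batch-size", "64"])
as_int = build_parser(int).parse_args(["--discrim-batch-size", "64"])

# With type=float the value arrives as 64.0; with type=int it arrives as 64.
print(type(as_float.discrim_batch_size).__name__, as_float.discrim_batch_size)
print(type(as_int.discrim_batch_size).__name__, as_int.discrim_batch_size)
```

A float batch size can fail later wherever an integer is required, for example `range(64.0)` raises a TypeError.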

hierarchical_morphology_transfer/bot_transfer/utils/tester.py

Line 84 in 14202b6

if 'frames' in info:

I got empty images with this code; the following should work:

hierarchical_morphology_transfer/bot_transfer/utils/tester.py

Line 87 in 14202b6

# frame = env.render(mode='rgb_array')
I updated the test_composition function to do on-screen rendering:

hierarchical_morphology_transfer/bot_transfer/utils/tester.py

Line 72 in 14202b6

def test_composition(low_name, high_name, env_name, g, k=None, num_ep=100):

def test_composition(low_name, high_name, env_name, g=0, k=None, num_ep=100):
    # Relies on names already available in tester.py: os, numpy as np,
    # compose_params, load, and RENDERS.
    # g == 0: render on screen until num_ep episodes have finished.
    # g > 0:  run for at most g steps, record frames, and save them as a GIF.
    params = compose_params(low_name, high_name, env_name, k=k)
    model, env = load(high_name, params, best=True)
    print("COMPOSED PARAMS", params)
    print("ENV", env)

    ep_rewards = list()  # total reward of each finished episode
    rewards = list()     # per-step rewards of the current episode
    obs = env.reset()
    if g == 0:
        while True:
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            rewards.append(reward)
            if done:
                ep_rewards.append(sum(rewards))
                print("REWARD", sum(rewards), len(rewards), "Ep to go:", num_ep, "cur avg", np.mean(ep_rewards))
                num_ep -= 1
                rewards = []
                if num_ep == 0:
                    break
                obs = env.reset()
            env.render()
    else:
        gif_frames = list()
        for _ in range(g):
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            # Render directly instead of reading info['frames'].
            frame = env.render(mode='rgb_array')
            gif_frames.append(frame)
            rewards.append(reward)
            if done:
                ep_rewards.append(sum(rewards))
                print("REWARD", sum(rewards), len(rewards), "Ep to go:", num_ep, "cur avg", np.mean(ep_rewards))
                num_ep -= 1
                rewards = []
                if num_ep == 0:
                    break
                obs = env.reset()

        import imageio
        render_path = os.path.join(RENDERS, 'composition_' + low_name + '.gif')
        os.makedirs(os.path.dirname(render_path), exist_ok=True)
        print("saving to ", render_path)
        # Keep every 4th frame to reduce the size of the GIF.
        imageio.mimsave(render_path, gif_frames[::4], subrectangles=True, duration=0.05)
        print("completed saving")

jhejna commented on July 26, 2024

1. Yeah, that should be the correct one.
2. In compose_params there is a line that disables sampling goals for the maze. This is the difference between the Maze and Maze End evaluations. Depending on the type of evaluation you want to run, you will need to comment or uncomment it: https://github.com/jhejna/hierarchical_morphology_transfer/blob/master/bot_transfer/utils/loader.py#L403
3. Hmm. We did observe this once or twice, but not to the extent seen here. I'm not sure exactly what would be causing it. Perhaps try training the Ant low level for more than 2.5 million timesteps, then make sure that you are using the "best" policy saved during training. Additionally, confirm that contact information is enabled in the environment and that the mujoco_py version is below 2.0. I can perhaps investigate this later when I have more time.
4. The code makes its best guess at what the skip level should be. When running evals, we set this by hand to 35.
5. Thanks for pointing that out in the parser! As far as rendering goes, this is only meant to be used when debug rendering is enabled for the high-level wrapper: https://github.com/jhejna/hierarchical_morphology_transfer/blob/master/bot_transfer/envs/hierarchical.py#L196. It's commented out because it makes everything run really slowly when enabled.
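For the mujoco_py version check suggested above, a small sketch like the following may help. It is an assumption-laden example, not part of the repository: `is_before_2_0` and `installed_mujoco_py_ok` are hypothetical helpers, and the distribution name "mujoco-py" is assumed to be how the package is registered in your environment.

```python
from importlib import metadata
from typing import Optional


def is_before_2_0(version: str) -> bool:
    """Return True if the major component of a dotted version string is below 2."""
    major = int(version.split(".")[0])
    return major < 2


def installed_mujoco_py_ok(dist_name: str = "mujoco-py") -> Optional[bool]:
    """Check the installed version; return None if the package is not found."""
    try:
        return is_before_2_0(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


print(is_before_2_0("1.50.1.68"))
print(is_before_2_0("2.0.2.13"))
print(installed_mujoco_py_ok())
```

The last call prints None when the package is not installed under the assumed name, rather than raising.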
