
Comments (3)

jhejna commented on July 26, 2024

All transfers in navigation environments that use the discriminator were done using the point mass, so the point mass row contains the correct hyperparameters. As mentioned in the text, we decay the learning rate of the discriminator (hence --discrim-decay true) and do not collect online data for the discriminator (--discrim-decay false). discrim-time-limit refers to the episode length of the imitated agent. For example, the point mass is much faster than the ant, so it doesn't make sense to collect data from the point mass while it is just sitting at the goal. In other words, discrim-time-limit sets how long the point mass episodes are during data collection.

Here's the general procedure for reproducing the maze results with the discriminator.

  1. Train the point mass low level on PointMass_Low (I believe it's named something similar).
  2. Train the point mass high level, PointMaze_High, using PointMass_Low.
  3. Train the Ant low level with the discriminator using data from PointMass_Low.
  4. Compose PointMaze_High with AntDiscrim_Low in a zero-shot manner. This can be done with the composition_test.py script. Note that depending on the type of maze evaluation you want to do, you may need to edit the compose_params function in utils/loader.py.
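The ordering constraints in the steps above can be sketched as a small dependency graph. This is only an illustration of which stage needs which earlier stage, not a script from the repository; the stage labels reuse the names mentioned in the steps, and the `deps` dictionary is a hypothetical structure introduced for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each stage -> the stages that must finish before it.
deps = {
    "PointMass_Low": set(),
    "PointMaze_High": {"PointMass_Low"},   # high level is trained on top of the low level
    "AntDiscrim_Low": {"PointMass_Low"},   # discriminator uses data from the point mass low level
    "compose(PointMaze_High, AntDiscrim_Low)": {"PointMaze_High", "AntDiscrim_Low"},
}

# static_order() yields the stages so that every dependency comes first.
order = list(TopologicalSorter(deps).static_order())
for stage in order:
    print(stage)
```

Steps 2 and 3 both depend only on PointMass_Low, so under this reading they could be run in either order before the zero-shot composition.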

from hierarchical_morphology_transfer.

chongyi-zheng commented on July 26, 2024

@jhejna Many thanks for the quick reply!

  1. I found the name of PointMass_Low was PointMassLargeMJ_Low, and I just want to confirm it with you.

  2. I trained a PointMaze_High policy and an Ant_Low policy to do a zero-shot transfer as mentioned in my other question. And I didn't edit the compose_params function as you said. Do you mean that I need to edit it with Ant_Discrim?

  3. The Ant sometimes gets stuck in one location during the zero-shot transfer, as you can see in this image (even though the Ant is not overturned). Do you have any idea what the reason might be?
    [image: Ant_Low_SAC_0]
    The Ant_Low policy on its own looks correct:
    [image: Ant_Low_SAC_0]

  4. Do I always need to set high-level-skips manually? I think you try to store it here, but it doesn't work now.

    k = low_params['env_args']['k'] if 'k' in low_params['env_args'] else high_params['env_args']['k']

  5. I found some minor bugs in the code

    parser.add_argument("--discrim-batch-size", "-db", default=None, type=float)

    The type is int, right?

    I got empty images with this code, the following should work

    # frame = env.render(mode='rgb_array')

  6. I updated the test_composition function to do on-screen rendering

    def test_composition(low_name, high_name, env_name, g, k=None, num_ep=100):

def test_composition(low_name, high_name, env_name, g=0, k=None, num_ep=100):
    params = compose_params(low_name, high_name, env_name, k=k)
    model, env = load(high_name, params, best=True)
    print("COMPOSED PARAMS", params)
    print("ENV", env)

    ep_rewards = list()
    rewards = list()
    obs = env.reset()
    if g == 0:
        while True:
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            rewards.append(reward)
            if done:
                ep_rewards.append(sum(rewards))
                print("REWARD", sum(rewards), len(rewards), "Ep to go:", num_ep, "cur avg", np.mean(ep_rewards))
                num_ep -= 1
                rewards = []
                if num_ep == 0:
                    break
                obs = env.reset()
            env.render()
    else:
        gif_frames = list()
        for _ in range(g):
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            frame = env.render(mode='rgb_array')
            gif_frames.append(frame)
            rewards.append(reward)
            if done:
                ep_rewards.append(sum(rewards))
                print("REWARD", sum(rewards), len(rewards), "Ep to go:", num_ep, "cur avg", np.mean(ep_rewards))
                num_ep -= 1
                rewards = []
                if num_ep == 0:
                    break
                obs = env.reset()

        import imageio
        render_path = os.path.join(RENDERS, 'composition_' + low_name + '.gif')
        os.makedirs(os.path.dirname(render_path), exist_ok=True)
        print("saving to ", render_path)
        imageio.mimsave(render_path, gif_frames[::4], subrectangles=True, duration=0.05)
        print("completed saving")
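As a quick check of the parser issue in item 5, the sketch below compares `type=float` and `type=int` for the same `--discrim-batch-size` flag. Only that one `add_argument` line comes from the thread; the two standalone parsers and the sample value 64 are assumptions made for illustration.

```python
import argparse

# Parser as quoted in item 5: the batch size is parsed as a float.
float_parser = argparse.ArgumentParser()
float_parser.add_argument("--discrim-batch-size", "-db", default=None, type=float)

# Same flag with the suggested fix: the batch size is parsed as an int.
int_parser = argparse.ArgumentParser()
int_parser.add_argument("--discrim-batch-size", "-db", default=None, type=int)

float_args = float_parser.parse_args(["-db", "64"])
int_args = int_parser.parse_args(["-db", "64"])

print(type(float_args.discrim_batch_size).__name__)  # float
print(type(int_args.discrim_batch_size).__name__)    # int
```

A float batch size can fail later wherever an integer is required, for example in `range()` or as a slice bound, which is why `int` looks like the intended type here.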


jhejna commented on July 26, 2024
  1. Yeah, that should be the correct one.
  2. In compose_params there is a line that disables sampling goals for the maze. This is the difference between the Maze and Maze End evaluations. Depending on the type of evaluation you want to run, you will need to comment / uncomment this. https://github.com/jhejna/hierarchical_morphology_transfer/blob/master/bot_transfer/utils/loader.py#L403
  3. Hmmmm. We did observe this once or twice but not to the extent that is seen here. I'm not sure exactly what would be causing this -- perhaps try training the Ant Low level for more than 2.5 million timesteps, then make sure that you are using the "best" policy saved during training. Additionally confirm that contact information is enabled in the environment and that the mujoco_py version is <2.0. I can perhaps investigate this later when I have more time.
  4. The code makes its best guess at what the skip level should be. When running evals, we set this by hand to 35.
  5. Thanks for pointing that out in the parser! As far as rendering goes, this is only meant to be used when debug rendering is enabled for the high level wrapper: https://github.com/jhejna/hierarchical_morphology_transfer/blob/master/bot_transfer/envs/hierarchical.py#L196. It's commented out because it makes everything run really slowly when enabled.
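A minimal sketch of the skip-level fallback discussed in item 4, assuming the parameter dictionaries have the `env_args` / `k` layout shown in the snippet quoted earlier in the thread. `resolve_skip` is a hypothetical helper written for this example and is not a function in the repository; it simply lets an explicit value (such as the 35 used for evals) take precedence over the stored guess.

```python
def resolve_skip(low_params, high_params, k=None):
    """Pick the high-level skip: explicit k first, then low-level, then high-level params."""
    if k is not None:
        # A value set by hand always wins over stored parameters.
        return k
    low_env_args = low_params.get("env_args", {})
    if "k" in low_env_args:
        return low_env_args["k"]
    # Fall back to the high-level params; raises KeyError if k is missing there too.
    return high_params["env_args"]["k"]


# Illustrative dictionaries; the values 10 and 20 are made up for the example.
low = {"env_args": {}}
high = {"env_args": {"k": 20}}

print(resolve_skip(low, high))                      # falls back to the high-level value
print(resolve_skip({"env_args": {"k": 10}}, high))  # low-level value takes priority
print(resolve_skip(low, high, k=35))                # explicit override
```

Failing with a `KeyError` when no value is stored anywhere makes a missing skip level visible instead of silently evaluating with a wrong one.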
