gym_solo's People

Contributors

goobta, mahajanrevant

Forkers

droneswat

gym_solo's Issues

Create a detailed readme

Since we are making this open to the general public, we should think about creating a detailed README that walks through the steps to create new observations, rewards, and terminations. We should also include a brief explanation of the package layout so it is easy to understand why the code is organized this particular way.

Remove and Refactor RewardsFactory

Just piggybacking onto the end of this PR because it'll be easier to merge in. Create an AdditiveReward and MultiplicativeReward to make reward function creation easier.

Note that this means that RewardFactory should be factored out soon.

Originally posted by @agupta231 in #51 (comment)

RewardsFactory is basically just a fancy AdditiveReward. The RewardsFactory should probably be taken out in favor of AdditiveReward, and maybe a new class like RewardWrapper could be implemented (that AdditiveReward and MultiplicativeReward inherit from) to handle all of the PyBullet passthrough business.
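
Rough sketch of what that hierarchy could look like (the class and method names here are just placeholders, not the existing gym_solo API):

# Placeholder sketch -- not the current gym_solo interface.
from abc import ABC, abstractmethod
from typing import List, Tuple


class Reward(ABC):
  @abstractmethod
  def compute(self) -> float:
    """Evaluate this reward term for the current simulation state."""


class RewardWrapper(Reward):
  """Owns the child rewards and handles the PyBullet client passthrough."""

  def __init__(self, rewards: List[Tuple[float, Reward]]):
    self._rewards = rewards  # (weight, reward) pairs

  def set_client(self, client):
    # Pass the physics client down to every wrapped reward.
    for _, reward in self._rewards:
      reward.client = client


class AdditiveReward(RewardWrapper):
  def compute(self) -> float:
    return sum(w * r.compute() for w, r in self._rewards)


class MultiplicativeReward(RewardWrapper):
  def compute(self) -> float:
    total = 1.0
    for w, r in self._rewards:
      total *= w * r.compute()
    return total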

Forcing One PyBullet Client per Environment

I guess that we didn't know how PyBullet worked when we started. When trying to train the models in one env and then create a new GUI-based environment for rendering, we ran into a lot of issues.

The only way to circumvent them was to restart the kernel, which also happens to be the only way to terminate the PyBullet physics server. I'm gonna conjecture that when we create a second environment, it causes conflicts because there is already a simulation running on the PyBullet server. I believe that if we force each environment to maintain its own physics client, we should be a-ok.

It's either that or find a way to train without having to render and be able to start rendering mid-PyBullet session.
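
A sketch of the per-environment client idea with pybullet_utils.bullet_client (the helper function here is just illustrative):

# Sketch: give each env instance its own physics server / client pair.
import pybullet as p
from pybullet_utils import bullet_client


def make_client(use_gui: bool = False) -> bullet_client.BulletClient:
  """Spin up a private physics server for a single environment."""
  mode = p.GUI if use_gui else p.DIRECT
  client = bullet_client.BulletClient(connection_mode=mode)
  client.setGravity(0, 0, -9.81)
  return client


# A headless training env and a GUI rendering env no longer share state:
train_client = make_client(use_gui=False)
render_client = make_client(use_gui=True)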

Motor Encoder Observation Has NaN Observation Space

Seems that the motor encoder observation is trying to get information from PyBullet and it's not getting back real numbers. This actually needs to be fixed relatively soon, as it otherwise breaks the entire normalization behavior of the observation factory.

With that being said, stable-baselines has an automatically-normalizing environment wrapper, so I'm going to be using that for the time being. However, that wrapper performs the normalization via running mean/std standardization, which is just a hack when we should be using min-max normalization.
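
For reference, the min-max version we'd eventually want is tiny; it just needs the observation space to report real bounds, which circles back to the NaN problem above. A sketch:

import numpy as np


def min_max_normalize(obs: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
  """Rescale an observation into [0, 1] using the space's low/high bounds.

  Returns NaN/inf if the bounds themselves are NaN or infinite, which is why
  the motor encoder observation space needs real numbers first.
  """
  return (obs - low) / (high - low)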

Create Rewards Factory

Similar to the observation factory: the reward factory should be able to take in Reward objects (analogous to Observation objects), evaluate each reward for the current state, and combine them.

I'm thinking of making the final reward a linear combination of the registered rewards. With that in mind, consider the following example:

r1 = Reward()
r2 = Reward()
r3 = Reward()

rf = RewardFactory()

rf.register_reward(-1, r1)
rf.register_reward(.1, r2)
rf.register_reward(.9, r3)

Notice that register_reward() has two args: weight: float and reward: gym_solo.core.rewards.Reward. Thus, the final reward would evaluate to: -r1() + 0.1 * r2() + 0.9 * r3(). Figure that if you need functionality more elaborate than a linear combination, you should be offloading that processing into a Reward class.
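
For reference, a sketch of what the factory could look like internally, using a compute() method as the evaluation hook (the exact method name is a placeholder):

class RewardFactory:
  def __init__(self):
    self._rewards = []  # (weight, reward) pairs

  def register_reward(self, weight: float, reward):
    self._rewards.append((weight, reward))

  def get_reward(self) -> float:
    # Weighted sum of all registered rewards, i.e. -r1 + 0.1 * r2 + 0.9 * r3
    # for the example above.
    return sum(weight * reward.compute() for weight, reward in self._rewards)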

@mahajanrevant thoughts on this? lmk if you think we need anything stronger than linear combinations in the RewardFactory.

Convert Motors to Position Control

Motors were initially made to be torque controlled. Since the Arduino is going to implement a PID controller, we can abstract it out to position control.
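
In PyBullet terms the switch is roughly the following (the joint index, target angle, and torque limit are placeholders):

import pybullet as p


def apply_position_control(robot_id: int, joint_index: int,
                           target_angle: float, max_torque: float = 2.0):
  """Command a joint by position instead of raw torque.

  PyBullet runs its own PD loop toward targetPosition, which roughly mirrors
  the PID controller the Arduino will implement on the real robot.
  """
  p.setJointMotorControl2(robot_id, joint_index,
                          controlMode=p.POSITION_CONTROL,
                          targetPosition=target_angle,
                          force=max_torque)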

Create Testing Module

We have a lot of "testing objects", such as core.test_obs_factory.CompliantObs and envs.test_solo8v2vanilla.SimpleReward, that can be used across tests. We should move these into a separate testing and/or util module so they can be shared.

Convert the Solo8VanillaEnv to be an ABC and subclass combination

The Solo8VanillaEnv was never meant to be an all-in-one package. It's intended to be a two-part system, where a BaseEnv does all of the duplicated work and Solo8VanillaEnv is just a subclass of it. Additionally, any new models and/or envs could inherit from BaseEnv and should (in theory) just work.

Unfortunately, when there was only one model, it was easier to debug by putting everything together. This should get resolved before we start heavy design iterations, however.
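
Rough shape of the split (the method names are guesses at where the shared logic would live):

from abc import ABC, abstractmethod

import gym


class Solo8BaseEnv(gym.Env, ABC):
  """Does the duplicated work: client setup, factories, step/reset plumbing."""

  def __init__(self):
    self.client = self._create_client()
    self.load_bodies()

  @abstractmethod
  def load_bodies(self):
    """Load the model-specific URDFs into the simulation."""

  def _create_client(self):
    ...  # shared PyBullet client setup


class Solo8VanillaEnv(Solo8BaseEnv):
  def load_bodies(self):
    ...  # load the Solo 8 v2 vanilla URDF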

Time Based Stopping Criteria

This should be an implementation of #19.

Figure that this would be our "basic" stopping criterion--after n steps, terminate the episode.
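
Roughly, something like this (the interface is a guess pending the termination factory design):

class TimeBasedTermination:
  """Terminate the episode after a fixed number of steps."""

  def __init__(self, max_steps: int):
    self.max_steps = max_steps
    self.step_count = 0

  def reset(self):
    self.step_count = 0

  def is_terminated(self) -> bool:
    self.step_count += 1
    return self.step_count >= self.max_steps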

Make Rewards Environment Independent

Basically just an extension of #18. In the Observation class, there is a PyBullet client that can be set at Observation instantiation time:

@property
def client(self) -> bullet_client.BulletClient:
  """Get the Observation's physics client.

  Raises:
    ValueError: If the PyBullet client hasn't been set yet.

  Returns:
    bullet_client.BulletClient: The active client for the observation.
  """
  if not self._client:
    raise ValueError('PyBullet client needs to be set')
  return self._client

This basically just needs to extend to Rewards for the same reasons.

Note that we might need to extend this to Terminations in the future. However, all of our Terminations are just time based and this is a relatively painless change to implement, so I think it might be wise to leave it off until we know we're going to actually need it.
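
Concretely, the Reward base class would just grow the same property (sketch; the setter is an assumption about how the factory injects the client):

from pybullet_utils import bullet_client


class Reward:
  _client = None

  @property
  def client(self) -> bullet_client.BulletClient:
    """Get the Reward's physics client."""
    if not self._client:
      raise ValueError('PyBullet client needs to be set')
    return self._client

  @client.setter
  def client(self, client: bullet_client.BulletClient):
    self._client = client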

VRAM Memory Leak in env.reset()

Currently the environment is reset by removing all of the bodies in the simulation (via removeBody()) and then reloading them.

Running it over and over again causes the bodies to gradually disappear--leading me to believe that there is a VRAM memory leak in PyBullet's implementation.

Instead, I basically copied and pasted all of the env initialization code and plopped it right after a call to resetSimulation(). This logic should be encompassed in the Solo8BaseEnv, so it should be moved there.
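
Roughly, the leak-free reset path looks like this (the URDF paths and the observation hook are placeholders):

class Solo8BaseEnv:
  def reset(self):
    # Wipe the whole simulation and reload, instead of removeBody() + re-add.
    self.client.resetSimulation()
    self.client.setGravity(0, 0, -9.81)
    self.plane = self.client.loadURDF('plane.urdf')          # placeholder path
    self.robot = self.client.loadURDF('solo8_URDF_v2.urdf')  # placeholder path
    return self.obs_factory.get_obs()  # hypothetical hook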

Motion Based Stopping Criteria

Implementation of #19.

From the most basic RL testing, it seems as if the model pretty quickly just ends up on the ground twitching. Obviously, once it's in this state, it can be very difficult to get out of. Figure that it could be interesting to track the derivative of position over time and terminate the episode when it approaches 0.
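
One possible sketch (the threshold, patience, and how the base position gets read are all placeholders):

import numpy as np


class MotionBasedTermination:
  """Terminate once the robot has effectively stopped moving."""

  def __init__(self, client, robot_id: int, threshold: float = 1e-3,
               patience: int = 50):
    self.client = client
    self.robot_id = robot_id
    self.threshold = threshold  # minimum per-step displacement to count as moving
    self.patience = patience    # how many still steps before terminating
    self._last_pos = None
    self._still_steps = 0

  def is_terminated(self) -> bool:
    pos, _ = self.client.getBasePositionAndOrientation(self.robot_id)
    pos = np.array(pos)
    if self._last_pos is not None:
      # Discrete derivative of the base position over one step.
      delta = np.linalg.norm(pos - self._last_pos)
      self._still_steps = self._still_steps + 1 if delta < self.threshold else 0
    self._last_pos = pos
    return self._still_steps >= self.patience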

Rename TestRewardsFactory

Currently the file is named test_rewards_factory.py. Since it contains more tests than just the factory, it should probably be renamed to something like test_rewards.py.

Create Motor Encoder Observation for Solo 8 v2 Vanilla

Need to implement a gym_solo.core.obs.Observation that will return the motor encoder values for the Solo8v2 Vanilla model (already implemented in the environment).

At the very least, these should return the position of the motor encoders (labelled for each joint name, of course). According to the documentation, it seems that we can get the position, velocity, and applied angular forces on all of the joints.

I'm not really sure if we need the velocity / applied angular forces in the observation-- @andrew103 or @mahajanrevant could one of you pitch in and give some insight into what will be realistic in terms of the robot? Worst comes to worst, we can add those as another observation, because I think that information could be useful in the RL aspect of the project.
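
A sketch of a positions-only version built on getJointStates() (the Observation interface shown here is an approximation):

import numpy as np


class MotorEncoderObs:
  def __init__(self, client, robot_id: int):
    self.client = client
    self.robot_id = robot_id

  def compute(self) -> np.ndarray:
    num_joints = self.client.getNumJoints(self.robot_id)
    states = self.client.getJointStates(self.robot_id, list(range(num_joints)))
    # Each joint state is (position, velocity, reaction forces, applied torque);
    # start with just the positions and tack on velocity/torque later if they
    # turn out to be realistic to read off the real robot.
    return np.array([state[0] for state in states])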

Create Episode Termination Factory

After running the first couple of simulations, the robot usually ends up on the ground twitching. This is going to be an artifact of RL, and we need a way to terminate an episode early so that the agent can start fresh.

I figure that the interface to this factory would be similar to our ObservationFactory and RewardsFactory: create your env, register your stopping criteria, and you are off to the races!
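
Provisional sketch of the factory itself, ORing together whatever criteria get registered (names are guesses):

class TerminationFactory:
  def __init__(self):
    self._terminations = []

  def register_termination(self, *terminations):
    self._terminations.extend(terminations)

  def is_terminated(self) -> bool:
    # The episode ends as soon as any registered criterion fires.
    return any(t.is_terminated() for t in self._terminations)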

Create Never Ending Termination

A couple of PRs have this issue (#33, #35), but if a user wants to run the gym indefinitely, they would need to create a TimeBasedTermination with a really big number. While that works, it's pretty janky, and we should just make a NoTermination that goes on forever instead.
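
The fix is tiny, something along these lines (assuming the same interface as TimeBasedTermination):

class NoTermination:
  """Termination criterion that never fires, for running the env indefinitely."""

  def reset(self):
    pass

  def is_terminated(self) -> bool:
    return False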

A reward for getting the robot from flat on the ground to the home position

It would be easier to create a reward that is geared towards getting the robot from flat on the ground to the home position.
The home position would be when all the legs are straight and the body is parallel to the ground. In theory, this is a much easier task compared to getting the robot to stand completely up on two legs starting from flat on the ground.

Create Upright Reward

At the very least, we know that our reward will need to be somewhat correlated with how upright the robot is. I'm reaching out to a roommate right now to verify the math, but I feel like the cosine similarity between the torso IMU vector and straight upright should basically be it, correct? @andrew103 and @mahajanrevant. Choosing cosine similarity over the inner product because it's automatically normalized to be in [-1, 1].
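
A sketch of that math using the torso orientation from PyBullet (client wiring and the robot id lookup are omitted). The body z-axis in world coordinates falls out of the rotation matrix, and since both it and world-up are unit vectors, their dot product is exactly the cosine similarity:

def upright_reward(client, robot_id: int) -> float:
  _, orientation = client.getBasePositionAndOrientation(robot_id)
  rot = client.getMatrixFromQuaternion(orientation)  # row-major 3x3 as a 9-tuple
  # Third column of the rotation matrix = body z-axis expressed in the world frame.
  body_up = (rot[2], rot[5], rot[8])
  # Dot product with world up (0, 0, 1): already bounded to [-1, 1].
  return body_up[2]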
