wpi-mmr / gym_solo
A custom OpenAI Gym environment for Solo experimentation.
License: MIT License
Since we are making this open to the general public, we should think about creating a detailed README that walks through the steps to create new observations, rewards, and terminations. We should also briefly explain the package layout so it is easy to understand why we have our code organized in this particular way.
Just piggybacking onto the end of this PR 'cause it'll be easier to merge in. Create an `AdditiveReward` and a `MultiplicitiveReward` to make reward function creation easier. Note that this means that `RewardFactory` should be factored out soon.
Originally posted by @agupta231 in #51 (comment)
`RewardsFactory` is basically just a fancy `AdditiveReward`. The `RewardsFactory` should probably be taken out in favor of `AdditiveReward`, and maybe a new class like `RewardWrapper` could be implemented (that `AdditiveReward` and `MultiplicitiveReward` inherit from) to handle all of the PyBullet passthrough business.
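A minimal sketch of what that hierarchy could look like, using the class names from this comment (the `Reward` interface and the passthrough details are assumptions, not the repo's actual code):

```python
from abc import ABC, abstractmethod


class Reward(ABC):
    """Stand-in for gym_solo.core.rewards.Reward."""
    @abstractmethod
    def compute(self) -> float:
        ...


class RewardWrapper(Reward):
    """Hypothetical base class that would own the PyBullet passthrough
    (client handles, robot ids, etc.) for every wrapped reward."""
    def __init__(self, *rewards: Reward):
        self._rewards = rewards

    def client_setup(self, client):
        # Push the shared PyBullet client down to each wrapped reward.
        for r in self._rewards:
            r.client = client


class AdditiveReward(RewardWrapper):
    def compute(self) -> float:
        return sum(r.compute() for r in self._rewards)


class MultiplicitiveReward(RewardWrapper):
    def compute(self) -> float:
        out = 1.0
        for r in self._rewards:
            out *= r.compute()
        return out
```

With this split, `RewardsFactory` really does collapse into an `AdditiveReward` plus whatever weighting scheme we bolt on.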
I guess that we didn't know how PyBullet worked when we started. When trying to train the models in one env and then creating a new GUI-based environment for rendering, we ran into a lot of issues.
The only way to circumvent them was to restart the kernel, which also happens to be the only way to terminate the PyBullet physics server. I'm gonna conjecture that when we create a second environment, it causes conflicts because there is already a simulation running on the PyBullet server. I believe if we forced each environment to maintain its own physics client, then we should be a-ok.
It's either that or find a way to train without having to render and be able to start rendering mid-PyBullet session.
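The "one physics client per env" idea can be sketched like this. The `connect` callable is injected so the pattern is visible without PyBullet installed; in the real env it would be something like `lambda: bullet_client.BulletClient(connection_mode=pybullet.DIRECT)` from `pybullet_utils` (class and env names here are hypothetical):

```python
class IsolatedClientEnv:
    """Sketch: every env instance owns its own physics client, so a
    training env and a GUI rendering env never share one global
    simulation (and can't clobber each other's state)."""

    def __init__(self, connect):
        # `connect` creates a *private* client, e.g. a BulletClient;
        # all subsequent pybullet calls go through self.client.
        self.client = connect()

    def close(self):
        # Tears down only this env's simulation, not a shared server.
        self.client.disconnect()
```

Two envs built this way each get their own simulation, so creating a second GUI env mid-session should no longer require a kernel restart.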
gym_solo/gym_solo/envs/solo8v2vanilla.py
Lines 95 to 98 in 88d9d64
It's just been neglected for a while... now that the observation factory is done, we just need to plop in some tests.
Seems that the motor encoder observation is trying to get information from PyBullet and isn't getting back real numbers. This actually needs to be fixed relatively soon, as it otherwise breaks the entire normalization behavior of the observation factory.
With that being said, `stable-baselines` has an automatically-normalizing environment wrapper, so I'm going to be using that for the time being. However, that performs the normalization via rms standardization, which is just a hack when we should be using min-max normalization.
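For reference, the min-max normalization we actually want (as opposed to the running-mean/std that stable-baselines' `VecNormalize` wrapper does) is just a rescale into [0, 1] using the observation space's bounds. A tiny sketch, names hypothetical:

```python
class MinMaxNormalizer:
    """Min-max normalize observations into [0, 1] given per-dimension
    bounds, e.g. an observation space's `low` / `high` arrays."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __call__(self, obs):
        # (x - low) / (high - low), element-wise.
        return [(o - lo) / (hi - lo)
                for o, lo, hi in zip(obs, self.low, self.high)]
```

This only works once the observations report sane bounds, which is exactly why the broken motor encoder numbers block it.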
Similar to the observation factory, the reward factory should be able to take in `Reward` objects (similar to `Observation` objects), evaluate the rewards for the state, and combine them.
I'm thinking of making the final reward a linear combination of the registered rewards. With that in mind, consider the following example:
r1 = Reward()
r2 = Reward()
r3 = Reward()
rf = RewardFactory()
rf.register_reward(-1, r1)
rf.register_reward(.1, r2)
rf.register_reward(.9, r3)
Notice that `register_reward()` has two args: `weight: float` and `reward: gym_solo.core.rewards.Reward`. Thus, the final reward would evaluate to `-r1() + 0.1 * r2() + 0.9 * r3()`. Figure that if you need functionality more elaborate than a linear combination, you should be offloading that processing into a `Reward` class.
@mahajanrevant thoughts on this? lmk if you think we need anything stronger than linear combinations in the `RewardFactory`.
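The linear-combination factory described above fits in a few lines. A sketch (the real `gym_solo` interface may differ; here rewards are just zero-arg callables returning floats):

```python
from typing import Callable, List, Tuple


class RewardFactory:
    """Sketch of the proposed factory: the final reward is a linear
    combination of the registered (weight, reward) pairs."""

    def __init__(self):
        self._rewards: List[Tuple[float, Callable[[], float]]] = []

    def register_reward(self, weight: float,
                        reward: Callable[[], float]) -> None:
        self._rewards.append((weight, reward))

    def get_reward(self) -> float:
        # e.g. -1 * r1() + 0.1 * r2() + 0.9 * r3() for the example above.
        return sum(w * r() for w, r in self._rewards)
```

Anything fancier than this weighted sum would live inside an individual `Reward`, per the comment above.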
It seems as if the Solo Env is still sending out a hard-coded reward--this needs to be switched to the dynamic model asap:
gym_solo/gym_solo/envs/solo8v2vanilla.py
Line 77 in 88d9d64
^^ Here the 0.0 needs to be dynamically computed for the reward.
The motors were initially made to be torque-controlled. Since the Arduino is going to implement a PID controller, we can abstract it out to position control.
We have a lot of "testing objects", such as `core.test_obs_factory.CompliantObs` and `envs.test_solo8v2vanilla.SimpleReward`, that can be used across tests. We should move everything to a separate module and/or a util module.
The `Solo8VanillaEnv` was never meant to be an all-in-one package. It's intended to be a two-part system, where there is a `BaseEnv` that does a lot of the duplicated work, and `Solo8VanillaEnv` would just be a subclass of that. Additionally, any new models and/or envs could inherit from `BaseEnv` and it should (in theory) just work.
Unfortunately, when there was only one model, it was easier to debug by putting everything together. This should get resolved before we start heavy design iterations, however.
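The two-part split could look something like this (the `load_model` hook and its return value are illustrative assumptions, not the repo's actual API):

```python
import abc


class Solo8BaseEnv(abc.ABC):
    """Sketch: the shared plumbing (client setup, factories, the
    step/reset loop) lives here once instead of per-env."""

    def __init__(self):
        # Common init path; model-specific loading is deferred to the
        # subclass hook below.
        self.robot = self.load_model()

    @abc.abstractmethod
    def load_model(self):
        """Load this env's robot model; the only model-specific part."""


class Solo8VanillaEnv(Solo8BaseEnv):
    def load_model(self):
        # Placeholder for the real URDF load via the PyBullet client.
        return 'solo8v2vanilla.urdf'
```

A new model then only has to implement `load_model()` (plus whatever other hooks we pull out) and inherits the rest for free.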
This should be an implementation of #19.
Figure that this would be our "basic" stopping criterion--after n steps, terminate the episode.
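The step-count criterion is about as small as a termination can get. A sketch, with the class name and interface being assumptions about the eventual `Termination` API:

```python
class TimeBasedTermination:
    """Hypothetical 'basic' stopping criterion: terminate the episode
    after max_steps calls."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> None:
        # Called at the start of every episode.
        self.steps = 0

    def is_terminated(self) -> bool:
        # Called once per env step; flips to True on step max_steps.
        self.steps += 1
        return self.steps >= self.max_steps
```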
Basically just an extension of #18. In the `Observation` class, there is a PyBullet client that can be set at `Observation` instantiation time:
Lines 70 to 82 in 0ee3f1b
This basically just needs to extend to `Rewards` for the same reasons.
Note that we might need to extend this to `Terminations` in the future. However, all of our `Termination`s are just time-based and this is a relatively painless change to implement, so I think it might be wise to leave it off until we know we're actually going to need it.
Currently the environment is reset by removing all of the bodies in the simulation (via `removeBody()`) and reloading all of the bodies again.
Running it over and over again causes the bodies to gradually disappear--leading me to believe that there is a VRAM memory leak in PyBullet's implementation.
Instead, I basically copied and pasted all of the env initiation code and plopped it right after an `env.resetSimulation()`. This logic should be encompassed in `Solo8BaseEnv`, so it should be moved there.
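Once the load logic is shared, the copy-paste disappears: reset just wipes the simulation and re-runs the same load path used at construction. A sketch with an injected client standing in for the per-env PyBullet client (names hypothetical):

```python
class ResettableEnv:
    """Sketch: reset() calls resetSimulation() and re-runs the one
    shared load routine, instead of removeBody()-ing everything."""

    def __init__(self, client):
        self.client = client
        self._load()

    def _load(self):
        # The single load path (would live in Solo8BaseEnv): loadURDF,
        # gravity, joint configuration, etc.
        self.robot = self.client.loadURDF('solo8.urdf')

    def reset(self):
        # Wipe the whole sim, then rebuild it identically.
        self.client.resetSimulation()
        self._load()
```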
Lines 205 to 206 in e4bcb11
It doesn't really make sense that they would report one value in radians and one in degrees??
Implementation of #19.
From the most basic RL testing, it seems as if the model pretty quickly just ends up on the ground twitching. Obviously, once it's in this state, it can be very difficult to get out of. Figure that it could be interesting to track the derivative of position over time and terminate the episode when it approaches 0.
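The "terminate when the derivative of position approaches 0" idea could be sketched as a finite-difference check with a small patience window (class name, threshold, and patience are all hypothetical tuning knobs):

```python
class StuckTermination:
    """Hypothetical criterion: end the episode once the robot's base
    position has been (approximately) unchanged for `patience`
    consecutive steps, i.e. it's on the ground twitching."""

    def __init__(self, threshold: float = 1e-3, patience: int = 10):
        self.threshold = threshold
        self.patience = patience
        self._last_pos = None
        self._still = 0

    def is_terminated(self, pos) -> bool:
        if self._last_pos is not None:
            # Finite-difference the base position; ~0 means stuck.
            speed = max(abs(a - b)
                        for a, b in zip(pos, self._last_pos))
            self._still = self._still + 1 if speed < self.threshold else 0
        self._last_pos = pos
        return self._still >= self.patience
```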
Currently the file is named `test_rewards_factory.py`. Since it contains more tests than just the factory, it should probably be renamed to something like `test_rewards.py`.
Need to implement a `gym_solo.core.obs.Observation` that will return the motor encoder values for the Solo8v2 Vanilla model (already implemented in the environment).
At the very least, these should return the positions of the motor encoders (labelled for each joint name, of course). According to the documentation, it seems that we can get the position, velocity, and applied angular forces on all of the joints.
I'm not really sure if we need the velocity / applied angular forces in the observation -- @andrew103 or @mahajanrevant, could one of you pitch in and give some insight into what will be realistic in terms of the robot? Worst comes to worst, we can add those as another observation, 'cause I think that information could be useful in the RL aspect of the project.
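The positions-only version of this observation is a thin wrapper over `getJointStates`, which in PyBullet returns a `(position, velocity, reaction forces, applied torque)` tuple per joint. A sketch with an injected client so it stays testable (class name and constructor are assumptions):

```python
class MotorEncoderObs:
    """Sketch of the proposed motor encoder Observation. `client` is a
    per-env PyBullet client (or anything with its getJointStates API)."""

    def __init__(self, client, body_id, joint_ids):
        self.client = client
        self.body_id = body_id
        self.joint_ids = joint_ids

    def compute(self):
        states = self.client.getJointStates(self.body_id, self.joint_ids)
        # Positions only for now; s[1] (velocity) and s[3] (applied
        # torque) could become a second observation if they turn out to
        # be realistic to sense on the real robot.
        return [s[0] for s in states]
```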
After running the first couple of simulations, the robot usually ends up on the ground twitching. This is an artifact of RL, and we need a way to early-terminate an episode so that the agent can start fresh.
I figure that the interface to this factory would be similar to our `ObservationFactory` and `RewardsFactory`: create your env, register your stopping criteria, and you're off to the races!
Was never added.
It would be easier to create a reward that is geared towards getting the robot from flat on the ground to the home position.
The home position is when all the legs are straight and the body is parallel to the ground. In theory, this is a much easier task than getting the robot to stand up completely on two legs starting from flat on the ground.
At the very least, we know that our reward will need to be somewhat correlated with how upright the robot is. I'm reaching out to a roommate rn to verify the math, but I feel like the cosine similarity between the torso IMU vector and straight upright should basically be it, correct? @andrew103 and @mahajanrevant. Choosing cosine similarity over the inner product because it's automatically normalized to be in [-1, 1].
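For the record, the cosine-similarity reward is just a normalized dot product between the torso's up vector and world up (function name and the world-up convention are assumptions):

```python
import math


def upright_reward(torso_up, world_up=(0.0, 0.0, 1.0)) -> float:
    """Cosine similarity between the torso's up vector (from the IMU)
    and world up: 1 when fully upright, -1 when upside down, 0 when
    lying on its side. Magnitude-invariant, so raw IMU vectors are fine."""
    dot = sum(a * b for a, b in zip(torso_up, world_up))
    norm = (math.sqrt(sum(a * a for a in torso_up))
            * math.sqrt(sum(b * b for b in world_up)))
    return dot / norm
```

Since the output is already in [-1, 1], it can plug straight into a linear-combination reward without extra normalization.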