berkeleydeeprlcourse / homework
Assignments for CS294-112.
License: MIT License
OpenAI has re-implemented the MuJoCo environments in Bullet, making them available to everyone. It would be really helpful for people following the course after the fact if HW1 could be ported over to use these new environments.
I could probably spend some time working on this, but I'm not sure how I would generate new expert policies.
Hi,
It seems the cost function for the half-cheetah environment (cost_functions.py) has a mistake. I'm getting the error:
---> 50 score -= (next_state[17] - state[17]) / 0.01 #+ 0.1 * (np.sum(action**2))
IndexError: index 17 is out of bounds for axis 0 with size 17
Is the dimension 17 correct for the environment? Thanks :)
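For what it's worth, the failing line can be made to fail loudly with an explicit dimension check. Below is a defensive sketch of the same computation; the index 17 and dt of 0.01 are taken from the traceback above and are not verified against the actual environment, and the function name is made up for illustration:

```python
import numpy as np

def cheetah_forward_progress(state, next_state, x_index=17, dt=0.01):
    # Mirrors the line in the traceback above: score forward progress as
    # the change in the x-position component between consecutive states.
    # `x_index` and `dt` come from that line and may need adjusting to
    # your environment's actual observation layout.
    if next_state.shape[0] <= x_index:
        raise IndexError(
            "observation has only %d dims; index %d is out of range"
            % (next_state.shape[0], x_index))
    return (next_state[x_index] - state[x_index]) / dt

# With an 18-dim (or larger) observation the call succeeds:
s = np.zeros(18)
s2 = np.zeros(18)
s2[17] = 0.02
print(cheetah_forward_progress(s, s2))  # → 2.0
```

The explicit check turns the silent shape mismatch into a readable message, which makes it easier to tell whether the environment's observation dimensionality or the hard-coded index is at fault.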
Hi all,
I'm trying to figure out how to port the cost function of HalfCheetah MuJoCo to HalfCheetah pybullet for model-based RL. The state vector is not the same for the two environments: 26-dim for pybullet instead of 20-dim for MuJoCo. Any idea how to implement the cost function so it produces the same behaviour in both environments?
When I use conda search, I can find versions 1.10.0, 1.11.0, and so on. I don't know if it matters if I choose version 1.10.0. Or, where could I find version 1.10.5?
Sorry, I am new to tf.
In tf_utils.py of HW1, I found the dropout implementation as follows:
def dropout(x, pkeep, phase=None, mask=None):
    mask = tf.floor(pkeep + tf.random_uniform(tf.shape(x))) if mask is None else mask
    if phase is None:
        return mask * x
    else:
        return switch(phase, mask * x, pkeep * x)
In the test phase, shouldn't it be x/pkeep instead of x*pkeep (based on inverted dropout)? If not, why?
Thanks for your explanation.
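For context on the two conventions: "vanilla" dropout (as in the snippet above) applies the raw mask at train time and scales by pkeep at test time, while inverted dropout instead scales by 1/pkeep at train time so the test-time pass is the identity. A minimal NumPy sketch of the inverted form, illustrative only and not the course's tf_utils code:

```python
import numpy as np

def inverted_dropout(x, pkeep, train=True, rng=None):
    # Inverted dropout: scale kept activations by 1/pkeep at train time
    # so the expected activation is unchanged and the test-time pass is
    # simply the identity (no x*pkeep or x/pkeep needed at test time).
    if not train:
        return x
    # Fixed seed by default just to make this sketch deterministic.
    rng = np.random.default_rng(0) if rng is None else rng
    mask = (rng.random(x.shape) < pkeep).astype(x.dtype)
    return mask * x / pkeep

x = np.ones((4, 4))
assert np.allclose(inverted_dropout(x, 0.5, train=False), x)  # identity at test time
train_out = inverted_dropout(x, 0.5, train=True)
assert set(np.unique(train_out)) <= {0.0, 2.0}  # kept units scaled by 1/pkeep
```

Both conventions preserve the expected activation; the inverted form is just more convenient because inference needs no knowledge of pkeep.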
Hi,
At the end of page three of the HW2 instructions (Problem 1.a), a few equations sum over i's. Shouldn't they be t's?
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-005c8955d609> in <module>()
12 frame_history_len=4,
13 target_update_freq=10000,
---> 14 grad_norm_clipping=10
15 )
/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
135 for t in count():
136 ### Check stopping criterion
--> 137 if stopping_criterion is not None and stopping_criterion(env, t):
138 break
139
<ipython-input-11-ab71394a8fd6> in stopping_criterion(env, t)
2 # notice that here t is the number of steps of the wrapped env,
3 # which is different from the number of steps in the underlying env
----> 4 return get_wrapper_by_name(env, "Monitor").get_total_steps() >= num_timesteps
AttributeError: '_Monitor' object has no attribute 'get_total_steps'
AttributeError Traceback (most recent call last)
<ipython-input-9-005c8955d609> in <module>()
12 frame_history_len=4,
13 target_update_freq=10000,
---> 14 grad_norm_clipping=10
15 )
/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
201
202 ### 4. Log progress
--> 203 episode_rewards = get_wrapper_by_name(env, "Monitor").get_episode_rewards()
204 if len(episode_rewards) > 0:
205 mean_episode_reward = np.mean(episode_rewards[-100:])
AttributeError: '_Monitor' object has no attribute 'get_episode_rewards'
It seems like none of the expected attributes are available when calling get_wrapper_by_name(env, "Monitor").
I have updated my gym version, but it didn't work.
Thx for the amazing work.
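In case it helps others debug this: the tracebacks above suggest the name-based wrapper lookup is finding a wrapper class whose name or API differs across gym versions (note the `_Monitor` class name). A self-contained sketch of how such a lookup can walk the wrapper chain; the stand-in classes below are hypothetical illustrations, not gym's real ones:

```python
def get_wrapper_by_name(env, classname):
    # Walk the wrapper chain (each wrapper exposes the wrapped env as
    # `.env`, as gym-style wrappers do) until a wrapper whose class name
    # contains `classname` is found.
    current = env
    while current is not None:
        if classname in type(current).__name__:
            return current
        current = getattr(current, "env", None)
    raise ValueError("No wrapper named %r in the chain" % classname)

# Hypothetical stand-ins for illustration only:
class CoreEnv:
    env = None

class _Monitor:
    def __init__(self, env):
        self.env = env

wrapped = _Monitor(CoreEnv())
# Substring matching still finds the wrapper even when the class
# is named `_Monitor` rather than `Monitor`:
assert get_wrapper_by_name(wrapped, "Monitor") is wrapped
```

If the lookup succeeds but the attribute error persists, the wrapper that was found simply does not implement those methods in your installed gym version, so pinning gym to the version the homework targets is the likely fix.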
For now, I am personally writing lecture notes based on the cs294-fall2017 course. I am doing this to clear up my thoughts on some of the ideas and math. I am also planning to open-source them (here is the repo), but I don't know if there are any copyright issues I should be aware of. Do you think it's appropriate?
Hi folks, first of all, thanks for the amazing material and work you open sourced, outstanding and generous gift to the community.
I wondered if it would be interesting to add a side note regarding MuJoCo 1.3 on recent Macs.
This version doesn't support NVMe disks, and there is still an open issue with more details regarding it.
Hi!
Looking at the _encode_observation function it seems you have a subtle bug in there.
Namely, you're only handling the start_idx edge case where the buffer is still not full and start_idx is negative.
But even in the case where the buffer is full but start_idx crosses the buffer's head pointer boundary you'll be stacking fresh experience with super old experience (especially in a 1M slot buffer).
The bigger the buffer, the less probable this event is, and even if it happens, since it's a low-frequency event it won't affect Q-learning much, but I still thought I'd flag it.
Since I'm implementing my own version of DQN here is a snippet of how I handle the start index:
def _handle_start_index_edge_cases(self, start_index, end_index):
    # Edge case 1: buffer not yet full and the start index underflows
    if not self._buffer_full() and start_index < 0:
        start_index = 0
    # Edge case 2: handle the case where the start index crosses the buffer
    # head pointer - the data before and after the head pointer belongs to
    # completely different episodes
    if self._buffer_full():
        if 0 < (self.current_free_slot_index - start_index) % self.max_buffer_size < self.num_previous_frames_to_fetch:
            start_index = self.current_free_slot_index
where my num_previous_frames_to_fetch is your frame_history_len.
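To see why the modulo condition catches the wraparound, here is a quick sanity check with concrete numbers; the names mirror the snippet above and the values are made up purely for illustration:

```python
# Ring buffer of 10 slots; the head (next free slot) is at index 2,
# so slots 2..9 hold the oldest data and slots 0..1 the newest.
max_buffer_size = 10
num_previous_frames_to_fetch = 4
current_free_slot_index = 2

def crosses_head(start_index):
    # True when a window of num_previous_frames_to_fetch frames starting
    # at start_index would straddle the head pointer, mixing fresh frames
    # with frames from a much older episode.
    return 0 < (current_free_slot_index - start_index) % max_buffer_size \
        < num_previous_frames_to_fetch

assert crosses_head(0)      # window 0,1,2,3 mixes new (0,1) with old (2,3)
assert not crosses_head(2)  # starts exactly at the head: all old data
assert not crosses_head(5)  # window 5,6,7,8 sits safely in the old region
```

The `0 <` lower bound is what allows a window starting exactly at the head pointer, since in that case every fetched frame comes from the same (old) side of the boundary.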
I am not a student of this class; I study it by watching the videos of the class.
I am finishing the homework for practice. I know it's not appropriate to release answers, but would it be possible to release some experiment results so I can self-check whether my answers are right?
For example:
when I follow the guide, complete the code for an assignment, and run the experiments as the guide suggests, the guide could tell me what result to expect; if my result deviates too much, then I am probably wrong.