berkeleydeeprlcourse / homework
Assignments for CS294-112.
License: MIT License
OpenAI has re-implemented the MuJoCo environments in Bullet, making them available to everyone. It would be really helpful for people following the course after the fact if HW1 could be ported over to use these new environments.
I could probably spend some time working on this, but I'm not sure how I would generate new expert policies.
Hi,
It seems the cost function for the half-cheetah environment (cost_functions.py) has a mistake. I'm getting the error:
---> 50 score -= (next_state[17] - state[17]) / 0.01 #+ 0.1 * (np.sum(action**2))
IndexError: index 17 is out of bounds for axis 0 with size 17
Is the dimension 17 correct for the environment? Thanks :)
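For what it's worth, the failing line can be made to fail loudly with an explicit dimension check. Below is a defensive sketch of the same computation; the index 17 and dt of 0.01 are taken from the traceback above and are not verified against the actual environment, and the function name is made up for illustration:

```python
import numpy as np

def cheetah_forward_progress(state, next_state, x_index=17, dt=0.01):
    # Mirrors the line in the traceback above: score forward progress as
    # the change in the x-position component between consecutive states.
    # `x_index` and `dt` come from that line and may need adjusting to
    # your environment's actual observation layout.
    if next_state.shape[0] <= x_index:
        raise IndexError(
            "observation has only %d dims; index %d is out of range"
            % (next_state.shape[0], x_index))
    return (next_state[x_index] - state[x_index]) / dt

# With an 18-dim (or larger) observation the call succeeds:
s = np.zeros(18)
s2 = np.zeros(18)
s2[17] = 0.02
print(cheetah_forward_progress(s, s2))  # → 2.0
```

The explicit check turns the silent shape mismatch into a readable message, which makes it easier to tell whether the environment's observation dimensionality or the hard-coded index is at fault.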
Hi all,
I'm trying to figure out how to port the cost function of HalfCheetah MuJoCo to HalfCheetah pybullet for model-based RL. The state vector is not the same for the two environments: 26-dim for pybullet instead of 20-dim for MuJoCo. Any idea how to implement the cost function so it produces the same behaviour in both environments?
When I use conda search, I can find versions 1.10.0, 1.11.0, and so on. I don't know if it matters if I choose version 1.10.0. Or, where could I find version 1.10.5?
Sorry, I am new to tf.
In tf_utils.py of HW1, I found the dropout implementation as follows:
def dropout(x, pkeep, phase=None, mask=None):
    mask = tf.floor(pkeep + tf.random_uniform(tf.shape(x))) if mask is None else mask
    if phase is None:
        return mask * x
    else:
        return switch(phase, mask * x, pkeep * x)
In the test phase, shouldn't it be x/pkeep instead of x*pkeep (based on inverted dropout)? If not, why?
Thanks for your explanation.
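For context on the two conventions: "vanilla" dropout (as in the snippet above) applies the raw mask at train time and scales by pkeep at test time, while inverted dropout instead scales by 1/pkeep at train time so the test-time pass is the identity. A minimal NumPy sketch of the inverted form, illustrative only and not the course's tf_utils code:

```python
import numpy as np

def inverted_dropout(x, pkeep, train=True, rng=None):
    # Inverted dropout: scale kept activations by 1/pkeep at train time
    # so the expected activation is unchanged and the test-time pass is
    # simply the identity (no x*pkeep or x/pkeep needed at test time).
    if not train:
        return x
    # Fixed seed by default just to make this sketch deterministic.
    rng = np.random.default_rng(0) if rng is None else rng
    mask = (rng.random(x.shape) < pkeep).astype(x.dtype)
    return mask * x / pkeep

x = np.ones((4, 4))
assert np.allclose(inverted_dropout(x, 0.5, train=False), x)  # identity at test time
train_out = inverted_dropout(x, 0.5, train=True)
assert set(np.unique(train_out)) <= {0.0, 2.0}  # kept units scaled by 1/pkeep
```

Both conventions preserve the expected activation; the inverted form is just more convenient because inference needs no knowledge of pkeep.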
Hi,
At the end of page three of the HW2 instructions (Problem 1.a), a few equations sum over i's. Shouldn't they be t's?
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-005c8955d609> in <module>()
12 frame_history_len=4,
13 target_update_freq=10000,
---> 14 grad_norm_clipping=10
15 )
/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
135 for t in count():
136 ### Check stopping criterion
--> 137 if stopping_criterion is not None and stopping_criterion(env, t):
138 break
139
<ipython-input-11-ab71394a8fd6> in stopping_criterion(env, t)
2 # notice that here t is the number of steps of the wrapped env,
3 # which is different from the number of steps in the underlying env
----> 4 return get_wrapper_by_name(env, "Monitor").get_total_steps() >= num_timesteps
AttributeError: '_Monitor' object has no attribute 'get_total_steps'
AttributeError Traceback (most recent call last)
<ipython-input-9-005c8955d609> in <module>()
12 frame_history_len=4,
13 target_update_freq=10000,
---> 14 grad_norm_clipping=10
15 )
/Users/mac/Projects/ml-playground/reinforcement/deep_q_learning.py in dqn_learing(env, q_func, optimizer_spec, exploration, stopping_criterion, replay_buffer_size, batch_size, gamma, learning_starts, learning_freq, frame_history_len, target_update_freq, grad_norm_clipping)
201
202 ### 4. Log progress
--> 203 episode_rewards = get_wrapper_by_name(env, "Monitor").get_episode_rewards()
204 if len(episode_rewards) > 0:
205 mean_episode_reward = np.mean(episode_rewards[-100:])
AttributeError: '_Monitor' object has no attribute 'get_episode_rewards'
It seems like none of the expected attributes are available when calling get_wrapper_by_name(env, "Monitor").
I have updated my gym version, but it didn't work.
Thx for the amazing work.
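In case it helps others debug this: the tracebacks above suggest the name-based wrapper lookup is finding a wrapper class whose name or API differs across gym versions (note the `_Monitor` class name). A self-contained sketch of how such a lookup can walk the wrapper chain; the stand-in classes below are hypothetical illustrations, not gym's real ones:

```python
def get_wrapper_by_name(env, classname):
    # Walk the wrapper chain (each wrapper exposes the wrapped env as
    # `.env`, as gym-style wrappers do) until a wrapper whose class name
    # contains `classname` is found.
    current = env
    while current is not None:
        if classname in type(current).__name__:
            return current
        current = getattr(current, "env", None)
    raise ValueError("No wrapper named %r in the chain" % classname)

# Hypothetical stand-ins for illustration only:
class CoreEnv:
    env = None

class _Monitor:
    def __init__(self, env):
        self.env = env

wrapped = _Monitor(CoreEnv())
# Substring matching still finds the wrapper even when the class
# is named `_Monitor` rather than `Monitor`:
assert get_wrapper_by_name(wrapped, "Monitor") is wrapped
```

If the lookup succeeds but the attribute error persists, the wrapper that was found simply does not implement those methods in your installed gym version, so pinning gym to the version the homework targets is the likely fix.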
For now, I am personally writing lecture notes based on the cs294-fall2017 course. I am doing this to clear up my thoughts on some of the ideas and math. I am also planning to open-source them (here is the repo), but I don't know if there are any copyright issues I should be aware of. Do you think it's appropriate?
Hi folks, first of all, thanks for the amazing material and work you open sourced, outstanding and generous gift to the community.
I wondered if it would be interesting to add a side note regarding MuJoCo 1.3 on recent Macs.
This version doesn't support NVMe disks, and there is still an open issue with more details regarding it.
Hi!
Looking at the _encode_observation function it seems you have a subtle bug in there.
Namely, you're only handling the start_idx edge case where the buffer is still not full and start_idx is negative.
But even in the case where the buffer is full but start_idx crosses the buffer's head pointer boundary you'll be stacking fresh experience with super old experience (especially in a 1M slot buffer).
The bigger the buffer, the less probable this event is, and even if it happens, since it's a low-frequency event it won't affect Q-learning much, but I still thought I'd flag it.
Since I'm implementing my own version of DQN here is a snippet of how I handle the start index:
def _handle_start_index_edge_cases(self, start_index, end_index):
    # Edge case 1: buffer not yet full and the start index underflows
    if not self._buffer_full() and start_index < 0:
        start_index = 0
    # Edge case 2: handle the case where the start index crosses the buffer
    # head pointer - the data before and after the head pointer belongs to
    # completely different episodes
    if self._buffer_full():
        if 0 < (self.current_free_slot_index - start_index) % self.max_buffer_size < self.num_previous_frames_to_fetch:
            start_index = self.current_free_slot_index
where my num_previous_frames_to_fetch is your frame_history_len.
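To see why the modulo condition catches the wraparound, here is a quick sanity check with concrete numbers; the names mirror the snippet above and the values are made up purely for illustration:

```python
# Ring buffer of 10 slots; the head (next free slot) is at index 2,
# so slots 2..9 hold the oldest data and slots 0..1 the newest.
max_buffer_size = 10
num_previous_frames_to_fetch = 4
current_free_slot_index = 2

def crosses_head(start_index):
    # True when a window of num_previous_frames_to_fetch frames starting
    # at start_index would straddle the head pointer, mixing fresh frames
    # with frames from a much older episode.
    return 0 < (current_free_slot_index - start_index) % max_buffer_size \
        < num_previous_frames_to_fetch

assert crosses_head(0)      # window 0,1,2,3 mixes new (0,1) with old (2,3)
assert not crosses_head(2)  # starts exactly at the head: all old data
assert not crosses_head(5)  # window 5,6,7,8 sits safely in the old region
```

The `0 <` lower bound is what allows a window starting exactly at the head pointer, since in that case every fetched frame comes from the same (old) side of the boundary.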
I am not a student of this class; I study it by watching the videos of the class.
I am finishing the homework for practice. I know it's not appropriate to release answers, but would it be possible to release some experiment results so I can self-check whether my answers are right?
For example:
when I follow the guide, complete the code for an assignment, and run the experiments as the guide suggests, the guide could tell me what result to expect; if my result deviates too much, then I am probably wrong.