Hi there! I am trying to reproduce the result of homework 1, problem 1(b). I installed all my dependencies from requirements.txt. When I ran the command, I got the following output:
Loading expert policy from... cs285/policies/experts/HalfCheetah.pkl
obs (1, 17) (1, 17)
Done restoring expert policy...
********** Iteration 0 ************
Training agent using sampled data from replay buffer...
Beginning logging procedure...
Collecting data for eval...
Eval_AverageReturn : 4.991946220397949
Eval_StdReturn : 17.147544860839844
Eval_MaxReturn : 32.29301452636719
Eval_MinReturn : -9.376068115234375
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 4.198240041732788
Initial_DataCollection_AverageReturn : 4205.7783203125
Done logging...
Saving agent's actor...
So the evaluation average return is only about 4.99, compared with a training-data average return of about 4205.78, and it does not match the result provided in the folder ./hw1/run_logs/bc_test_bc_hcheetah_HalfCheetah-v2_16-09-2019_00-58-58/. Could you help me figure out which part I've done wrong? Many thanks!
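To put a number on the gap, here is a minimal sketch that parses two of the metric lines from the log above and computes the ratio of the evaluation return to the training-data return. The helper name `parse_metrics` is hypothetical (it is not part of the homework code), and I am assuming that Train_AverageReturn reflects the expert data that the agent was trained on.

```python
import re

# Two metric lines copied verbatim from the log output above.
LOG = """\
Eval_AverageReturn : 4.991946220397949
Train_AverageReturn : 4205.7783203125
"""


def parse_metrics(text):
    """Parse 'Name : value' lines into a dict of floats.

    Hypothetical helper for illustration; not part of the homework code.
    """
    metrics = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\w+)\s*:\s*(-?\d+(?:\.\d+)?)\s*$", line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


metrics = parse_metrics(LOG)

# Assumption: Train_AverageReturn is the return of the expert data.
ratio = metrics["Eval_AverageReturn"] / metrics["Train_AverageReturn"]
print(f"Eval return is {ratio:.2%} of the training-data return")
```

By this measure the evaluated policy reaches well under 1% of the training-data return, so the gap is far too large to be explained by run-to-run variance.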