Comments (6)
(Sample Report 1)
Summary
- Total run steps: 48M
- Average Score (last 16 lines): 610 - 710
- Learning curve:
- Experimenter: Itsukara
- Web site: http://itsukara.hateblo.jp/
Run Structure
- 0M - 48M STEP: --train-episode-steps=30 --lives-lost-reward=-0.03 --reset-max-reward=True --psc-use=True --randomness-time=3000 --color-maximizing-in-gs=False --color-averaging-in-ale=True
Details
- 0M - 48M STEP: --train-episode-steps=30 --lives-lost-reward=-0.03 --reset-max-reward=True --psc-use=True --randomness-time=3000 --color-maximizing-in-gs=False --color-averaging-in-ale=True
******************** options ********************
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=True, color_maximizing_in_gs=False, display=True, end_mega_step=80, end_time_step=80000000, entropy_beta=0.01, frames_skip_in_ale=4, frames_skip_in_gs=1, gamma=0.99, grad_norm_clip=40.0, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=-0.03, lives_lost_rratio=1.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=100, max_play_steps=4500, max_play_time=300, max_time_step=100000000, max_to_keep=5, no_reward_steps=225, no_reward_time=15, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=2.2222222222222223e-05, randomness_log_interval=1500.0, randomness_log_num=30, randomness_steps=45000, randomness_time=3000.0, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, terminate_on_lives_lost=False, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
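Several fields in these Namespace dumps appear to be derived from the time-based flags rather than set directly: with ALE running at 60 fps and frames_skip_in_ale=4, one agent step corresponds to 1/15 of a second, so `--randomness-time=3000` yields randomness_steps=45000 and randomness=1/45000 ≈ 2.222e-05, and no_reward_time=15 yields no_reward_steps=225. The sketch below reproduces that relationship; it is an inference from the dumps in this report, not code confirmed against the async_deep_reinforce source.

```python
# Sketch of how the derived Namespace fields appear to follow from the
# time-based flags (inferred from the dumps above, not from the source).

STEPS_PER_SECOND = 15  # 60 fps ALE frames / frames_skip_in_ale=4

def derived_options(randomness_time, no_reward_time):
    """Reproduce the derived step-count fields seen in the Namespace dumps."""
    randomness_steps = randomness_time * STEPS_PER_SECOND
    return {
        "randomness_steps": randomness_steps,
        # e.g. 1/45000 = 2.2222222222222223e-05, matching the dump
        "randomness": 1.0 / randomness_steps,
        "no_reward_steps": no_reward_time * STEPS_PER_SECOND,
    }

# Values from this 48M-step run (--randomness-time=3000, no_reward_time=15):
d = derived_options(3000, 15)
assert d["randomness_steps"] == 45000
assert d["no_reward_steps"] == 225
```

The same rule matches the later runs in this thread, where `--no-reward-time=600` produces no_reward_steps=9000 and randomness_time=300 produces randomness_steps=4500.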
<Average Episode score (last 16 lines)>
@@@ Average Episode score = 698.000000, s= 47597186,th=3
@@@ Average Episode score = 641.000000, s= 47601045,th=2
@@@ Average Episode score = 665.000000, s= 47609651,th=4
@@@ Average Episode score = 675.000000, s= 47616326,th=7
@@@ Average Episode score = 610.000000, s= 47674768,th=0
@@@ Average Episode score = 692.000000, s= 47705725,th=5
@@@ Average Episode score = 698.000000, s= 47758888,th=3
@@@ Average Episode score = 661.000000, s= 47791572,th=1
@@@ Average Episode score = 701.000000, s= 47798075,th=4
@@@ Average Episode score = 672.000000, s= 47815701,th=6
@@@ Average Episode score = 700.000000, s= 47821598,th=2
@@@ Average Episode score = 610.000000, s= 47858878,th=0
@@@ Average Episode score = 689.000000, s= 47859319,th=7
@@@ Average Episode score = 710.000000, s= 47928128,th=5
@@@ Average Episode score = 661.000000, s= 47993088,th=3
@@@ Average Episode score = 665.000000, s= 47993704,th=1
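The summary range at the top of each report (here "610 - 710") can be recovered by scanning the `@@@ Average Episode score` lines for the minimum and maximum. A minimal parsing sketch, assuming only the line format shown above:

```python
# Minimal sketch: recover the summary score range from the
# "@@@ Average Episode score" log lines quoted in these reports.
import re

LINE_RE = re.compile(r"@@@ Average Episode score = ([\d.]+), s= ?(\d+),th=(\d+)")

def score_range(log_lines):
    """Return (min, max) of the average episode scores found in log_lines."""
    scores = [float(m.group(1))
              for line in log_lines
              if (m := LINE_RE.search(line))]
    return min(scores), max(scores)

sample = [
    "@@@ Average Episode score = 698.000000, s= 47597186,th=3",
    "@@@ Average Episode score = 610.000000, s= 47674768,th=0",
    "@@@ Average Episode score = 710.000000, s= 47928128,th=5",
]
assert score_range(sample) == (610.0, 710.0)
```

The `s=` and `th=` fields (global step and thread index) are captured but unused here; they could be kept to plot a learning curve per thread.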
from async_deep_reinforce.
(Sample Report 2)
Summary
- Total run steps: 26M
- Average Score (last 16 lines): 359 - 424
- Learning curve:
- Experimenter: Itsukara
- Web site: http://itsukara.hateblo.jp/
Run Structure
- 0M - 3M STEP: --train-episode-steps=30 --lives-lost-reward=-0.02 --reset-max-reward=True --psc-use=True --randomness-time=3000 --color-maximizing-in-gs=False --color-averaging-in-ale=True
- 3M - 26M STEP: --train-episode-steps=30 --lives-lost-reward=-0.02 --reset-max-reward=True --psc-use=True --randomness-time=3000 --color-maximizing-in-gs=True --color-averaging-in-ale=False
Details
- 0M - 3M STEP: --train-episode-steps=30 --lives-lost-reward=-0.02 --reset-max-reward=True --psc-use=True --randomness-time=3000 --color-maximizing-in-gs=False --color-averaging-in-ale=True
******************** options ********************
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=True, color_maximizing_in_gs=False, display=True, end_mega_step=3, end_time_step=3000000, entropy_beta=0.01, frames_skip_in_ale=4, frames_skip_in_gs=1, gamma=0.99, grad_norm_clip=40.0, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=-0.02, lives_lost_rratio=1.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=100, max_play_steps=4500, max_play_time=300, max_time_step=100000000, max_to_keep=5, no_reward_steps=225, no_reward_time=15, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=2.2222222222222223e-05, randomness_log_interval=1500.0, randomness_log_num=30, randomness_steps=45000, randomness_time=3000.0, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, terminate_on_lives_lost=False, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
- 3M - 26M STEP: --train-episode-steps=30 --lives-lost-reward=-0.02 --reset-max-reward=True --psc-use=True --randomness-time=3000 --color-maximizing-in-gs=True --color-averaging-in-ale=False
******************** options ********************
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=False, color_maximizing_in_gs=True, display=True, end_mega_step=80, end_time_step=80000000, entropy_beta=0.01, frames_skip_in_ale=1, frames_skip_in_gs=4, gamma=0.99, grad_norm_clip=40.0, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=-0.02, lives_lost_rratio=1.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=100, max_play_steps=4500, max_play_time=300, max_time_step=100000000, max_to_keep=5, no_reward_steps=225, no_reward_time=15, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=2.2222222222222223e-05, randomness_log_interval=1500.0, randomness_log_num=30, randomness_steps=45000, randomness_time=3000.0, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, terminate_on_lives_lost=False, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
<Average Episode score (last 16 lines)>
@@@ Average Episode score = 400.000000, s= 25752794,th=6
@@@ Average Episode score = 419.000000, s= 25781176,th=1
@@@ Average Episode score = 409.000000, s= 25791626,th=2
@@@ Average Episode score = 362.000000, s= 25795054,th=3
@@@ Average Episode score = 381.000000, s= 25810532,th=4
@@@ Average Episode score = 376.000000, s= 25812920,th=7
@@@ Average Episode score = 396.000000, s= 25814558,th=5
@@@ Average Episode score = 424.000000, s= 25949909,th=1
@@@ Average Episode score = 359.000000, s= 25950264,th=0
@@@ Average Episode score = 373.000000, s= 25951475,th=7
@@@ Average Episode score = 397.000000, s= 25957375,th=6
@@@ Average Episode score = 369.000000, s= 25963558,th=3
@@@ Average Episode score = 386.000000, s= 25996207,th=5
@@@ Average Episode score = 392.000000, s= 26015968,th=4
@@@ Average Episode score = 388.000000, s= 26053439,th=6
@@@ Average Episode score = 415.000000, s= 26065448,th=2
Summary
- Total run steps: 43M
- Average Score (last 20 lines): 298 - 350
- Learning curve: http://tflare.com/async_deep_reinforce/a3c_20160912.png
- Experimenter: tflare
- Web site: http://tflare.com
Details
- --train-episode-steps=30 --lives-lost-reward=-0.02 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --repeat-action-probability=0.0 --randomness-time=3000 --end-mega-step=80 --log-interval=800 --max-to-keep=5 --color-maximizing-in-gs=False --color-averaging-in-ale=True
options
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=True, color_maximizing_in_gs=False, display=True, end_mega_step=80, end_time_step=80000000, entropy_beta=0.01, frames_skip_in_ale=4, frames_skip_in_gs=1, gamma=0.99, grad_norm_clip=40.0, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=-0.02, lives_lost_rratio=0.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=100, max_play_steps=4500, max_play_time=300, max_time_step=100000000, max_to_keep=5, no_reward_steps=225, no_reward_time=15, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=2.2222222222222223e-05, randomness_log_interval=1500.0, randomness_log_num=30, randomness_steps=45000, randomness_time=3000.0, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, terminate_on_lives_lost=False, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
<Average Episode score (last 20 lines)>
@@@ Average Episode score = 317.000000, s= 43205428,th=5
@@@ Average Episode score = 332.000000, s= 43229035,th=4
@@@ Average Episode score = 321.000000, s= 43232792,th=6
@@@ Average Episode score = 312.000000, s= 43250069,th=3
@@@ Average Episode score = 336.000000, s= 43256872,th=2
@@@ Average Episode score = 298.000000, s= 43301059,th=0
@@@ Average Episode score = 350.000000, s= 43349114,th=7
@@@ Average Episode score = 331.000000, s= 43352754,th=1
@@@ Average Episode score = 315.000000, s= 43362133,th=6
@@@ Average Episode score = 307.000000, s= 43393477,th=5
@@@ Average Episode score = 329.000000, s= 43425115,th=4
@@@ Average Episode score = 340.000000, s= 43439061,th=2
@@@ Average Episode score = 301.000000, s= 43456306,th=3
@@@ Average Episode score = 329.000000, s= 43512264,th=1
@@@ Average Episode score = 307.000000, s= 43519510,th=0
@@@ Average Episode score = 298.000000, s= 43525898,th=6
@@@ Average Episode score = 337.000000, s= 43529651,th=7
@@@ Average Episode score = 323.000000, s= 43626017,th=4
@@@ Average Episode score = 346.000000, s= 43626579,th=2
@@@ Average Episode score = 312.000000, s= 43634385,th=5
Summary
- Total run steps: 84M (within an 85M run)
- Average Score (last 16 lines): 1007 - 1280
- Learning curve:
- Checkpoints data: https://github.com/Itsukara/async_deep_reinforce/tree/master/checkpoints.montezuma-b-rap000-avg-84M
- Experimenter: Itsukara
- Web site: http://itsukara.hateblo.jp/
Run Structure
- 0M - 84M STEP: --train-episode-steps=30 --tes-extend=True --lives-lost-reward=0.0 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --repeat-action-probability=0.0 --randomness-time=3000 --end-mega-step=100 --log-interval=800 --color-maximizing-in-gs=False --color-averaging-in-ale=True --max-mega-step=200
Details
- 0M - 84M STEP: --train-episode-steps=30 --tes-extend=True --lives-lost-reward=0.0 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --repeat-action-probability=0.0 --randomness-time=3000 --end-mega-step=100 --log-interval=800 --color-maximizing-in-gs=False --color-averaging-in-ale=True --max-mega-step=200
******************** options ********************
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=True, color_maximizing_in_gs=False, display=True, end_mega_step=100, end_time_step=100000000, entropy_beta=0.01, frames_skip_in_ale=4, frames_skip_in_gs=1, gamma=0.99, grad_norm_clip=40.0, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=0.0, lives_lost_rratio=0.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=200, max_play_steps=4500, max_play_time=300, max_time_step=200000000, max_to_keep=None, no_reward_steps=225, no_reward_time=15, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=2.2222222222222223e-05, randomness_log_interval=1500.0, randomness_log_num=30, randomness_steps=45000, randomness_time=3000.0, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, terminate_on_lives_lost=False, tes_extend=True, tes_extend_ratio=5.0, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
<Average Episode score (last 16 lines just before 84M STEP)>
@@@ Average Episode score = 1114.000000, s= 83401163,th=4
@@@ Average Episode score = 1280.000000, s= 83447970,th=3
@@@ Average Episode score = 1130.000000, s= 83486089,th=7
@@@ Average Episode score = 1193.000000, s= 83510661,th=6
@@@ Average Episode score = 1213.000000, s= 83570385,th=2
@@@ Average Episode score = 1081.000000, s= 83572954,th=5
@@@ Average Episode score = 1023.000000, s= 83623526,th=1
@@@ Average Episode score = 1012.000000, s= 83633493,th=0
@@@ Average Episode score = 1156.000000, s= 83691654,th=4
@@@ Average Episode score = 1265.000000, s= 83771600,th=3
@@@ Average Episode score = 1007.000000, s= 83772564,th=1
@@@ Average Episode score = 1169.000000, s= 83777307,th=6
@@@ Average Episode score = 1054.000000, s= 83779671,th=5
@@@ Average Episode score = 1065.000000, s= 83787417,th=7
@@@ Average Episode score = 1255.000000, s= 83913801,th=2
@@@ Average Episode score = 1017.000000, s= 83935966,th=0
Summary
- Total run steps: 65.2M
- Average Score (last 16 lines): 1090 - 1573
- Learning curve:
- Checkpoints data:
- Experimenter: Itsukara
- Web site: http://itsukara.hateblo.jp/
Run Structure
- 0M - 62.5M STEP: --train-episode-steps=30 --tes-extend=True --lives-lost-reward=0.0 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --no-reward-time=600 --end-mega-step=100 --log-interval=800 --color-maximizing-in-gs=False --color-averaging-in-ale=True --max-mega-step=200 --greediness=0.01 --repeat-action-ratio=0.25
Details
- 0M - 62.5M STEP: --train-episode-steps=30 --tes-extend=True --lives-lost-reward=0.0 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --no-reward-time=600 --end-mega-step=100 --log-interval=800 --color-maximizing-in-gs=False --color-averaging-in-ale=True --max-mega-step=200 --greediness=0.01 --repeat-action-ratio=0.25
******************** options ********************
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=True, color_maximizing_in_gs=False, display=True, end_mega_step=100, end_time_step=100000000, entropy_beta=0.01, frames_skip_in_ale=4, frames_skip_in_gs=1, gamma=0.99, grad_norm_clip=40.0, greediness=0.01, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=0.0, lives_lost_rratio=0.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=200, max_play_steps=4500, max_play_time=300, max_time_step=200000000, max_to_keep=None, no_reward_steps=9000, no_reward_time=600, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=0.00022222222222222223, randomness_log_interval=150.0, randomness_log_num=30, randomness_steps=4500, randomness_time=300, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, repeat_action_ratio=0.25, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, terminate_on_lives_lost=False, tes_extend=True, tes_extend_ratio=5.0, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
<Average Episode score (last 16 lines, up to 65.2M STEP)>
@@@ Average Episode score = 1369.000000, s= 64695930,th=2
@@@ Average Episode score = 1441.000000, s= 64752913,th=6
@@@ Average Episode score = 1183.000000, s= 64781579,th=5
@@@ Average Episode score = 1217.000000, s= 64783400,th=3
@@@ Average Episode score = 1431.000000, s= 64823778,th=0
@@@ Average Episode score = 1252.000000, s= 64847953,th=4
@@@ Average Episode score = 1508.000000, s= 64858656,th=7
@@@ Average Episode score = 1090.000000, s= 64887300,th=1
@@@ Average Episode score = 1522.000000, s= 64988743,th=6
@@@ Average Episode score = 1349.000000, s= 65035796,th=2
@@@ Average Episode score = 1202.000000, s= 65065045,th=5
@@@ Average Episode score = 1201.000000, s= 65143090,th=3
@@@ Average Episode score = 1492.000000, s= 65156749,th=0
@@@ Average Episode score = 1573.000000, s= 65180094,th=7
@@@ Average Episode score = 1357.000000, s= 65206441,th=4
@@@ Average Episode score = 1191.000000, s= 65217488,th=1
Summary
- Total run steps: 100M
- Best Average Score: 1520 - 1754 (36 lines @ 78.459M - 79.864M STEP)
- Learning curve:
- Checkpoints data: https://github.com/Itsukara/async_deep_reinforce/blob/master/checkpoints.montezuma-c-avg-greedy-rar025-78M.tgz
- Experimenter: Itsukara
- Web site: http://itsukara.hateblo.jp/
Run Structure
- 0M - 100M STEP: --train-episode-steps=30 --tes-extend=True --lives-lost-reward=0.0 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --no-reward-time=600 --end-mega-step=100 --log-interval=800 --color-maximizing-in-gs=False --color-averaging-in-ale=True --max-mega-step=200 --greediness=0.01 --repeat-action-ratio=0.25
Details
- 0M - 100M STEP: --train-episode-steps=30 --tes-extend=True --lives-lost-reward=0.0 --lives-lost-rratio=0.0 --reset-max-reward=True --psc-use=True --no-reward-time=600 --end-mega-step=100 --log-interval=800 --color-maximizing-in-gs=False --color-averaging-in-ale=True --max-mega-step=200 --greediness=0.01 --repeat-action-ratio=0.25
******************** options ********************
Namespace(action_size=18, average_score_log_interval=10, basic_income=0.0, basic_income_time=100000000000000000000, checkpoint_dir='checkpoints', color_averaging_in_ale=True, color_maximizing_in_gs=False, display=True, end_mega_step=100, end_time_step=100000000, entropy_beta=0.01, frames_skip_in_ale=4, frames_skip_in_gs=1, gamma=0.99, grad_norm_clip=40.0, greediness=0.01, initial_alpha_high=0.01, initial_alpha_log_rate=0.4226, initial_alpha_low=0.0001, lives_lost_reward=0.0, lives_lost_rratio=0.0, lives_lost_weight=1.0, local_t_max=5, log_file='tmp/a3c_log', log_interval=800, max_mega_step=200, max_play_steps=4500, max_play_time=300, max_time_step=200000000, max_to_keep=None, no_reward_steps=9000, no_reward_time=600, num_experiments=1, parallel_size=8, performance_log_interval=1500, psc_beta=0.01, psc_frsize=42, psc_maxval=127, psc_use=True, randomness=0.00022222222222222223, randomness_log_interval=150.0, randomness_log_num=30, randomness_steps=4500, randomness_time=300, record_gs_screen_dir=None, record_new_record_dir=None, record_screen_dir=None, repeat_action_probability=0.0, repeat_action_ratio=0.25, reset_max_reward=True, reward_clip=1.0, rmsp_alpha=0.99, rmsp_epsilon=0.1, rom='montezuma_revenge.bin', save_mega_interval=3, save_time_interval=3000000, score_averaging_length=100, score_highest_ratio=0.5, score_log_interval=900, stack_frames_in_gs=False, terminate_on_lives_lost=False, tes_extend=True, tes_extend_ratio=5.0, train_episode_steps=30, train_in_eval=False, use_gpu=True, use_lstm=False, verbose=True)
<Average Episode score (36 lines @ 78.459M - 79.864M STEP)>
@@@ Average Episode score = 1683.000000, s= 78459385,th=2
@@@ Average Episode score = 1570.000000, s= 78474074,th=5
@@@ Average Episode score = 1600.000000, s= 78534232,th=3
@@@ Average Episode score = 1528.000000, s= 78534463,th=1
@@@ Average Episode score = 1561.000000, s= 78602286,th=0
@@@ Average Episode score = 1730.000000, s= 78649996,th=7
@@@ Average Episode score = 1596.000000, s= 78763734,th=4
@@@ Average Episode score = 1539.000000, s= 78787510,th=6
@@@ Average Episode score = 1585.000000, s= 78790811,th=5
@@@ Average Episode score = 1723.000000, s= 78799980,th=2
@@@ Average Episode score = 1532.000000, s= 78864929,th=1
@@@ Average Episode score = 1561.000000, s= 78889484,th=3
@@@ Average Episode score = 1541.000000, s= 78962613,th=0
@@@ Average Episode score = 1754.000000, s= 78976248,th=7
@@@ Average Episode score = 1621.000000, s= 78996270,th=4
@@@ Average Episode score = 1619.000000, s= 79100351,th=5
@@@ Average Episode score = 1664.000000, s= 79115006,th=2
@@@ Average Episode score = 1529.000000, s= 79128643,th=6
@@@ Average Episode score = 1552.000000, s= 79202649,th=3
@@@ Average Episode score = 1573.000000, s= 79225666,th=1
@@@ Average Episode score = 1604.000000, s= 79308819,th=4
@@@ Average Episode score = 1734.000000, s= 79311025,th=7
@@@ Average Episode score = 1561.000000, s= 79323126,th=0
@@@ Average Episode score = 1680.000000, s= 79414686,th=5
@@@ Average Episode score = 1703.000000, s= 79427641,th=2
@@@ Average Episode score = 1600.000000, s= 79437383,th=6
@@@ Average Episode score = 1569.000000, s= 79558727,th=1
@@@ Average Episode score = 1592.000000, s= 79562370,th=3
@@@ Average Episode score = 1562.000000, s= 79625564,th=4
@@@ Average Episode score = 1520.000000, s= 79657699,th=0
@@@ Average Episode score = 1636.000000, s= 79668274,th=7
@@@ Average Episode score = 1678.000000, s= 79721917,th=5
@@@ Average Episode score = 1553.000000, s= 79739129,th=6
@@@ Average Episode score = 1668.000000, s= 79739390,th=2
@@@ Average Episode score = 1520.000000, s= 79784754,th=1
@@@ Average Episode score = 1546.000000, s= 79864012,th=3