zfw1226 / active_tracking_rl

Active visual tracking library based on PyTorch.

License: MIT License

Python 100.00%

active-tracking adversarial-reinforcement-learning multi-agent-learning

active_tracking_rl's Introduction

AD-VAT

This repository contains the code for AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking (ICLR 2019).

It contains the PyTorch code for training and testing, plus the 2D environments. The 3D environments are hosted in gym-unrealcv.

Dependencies

This repository requires:

Python >= 3.6
PyTorch >= 1.0
OpenCV >= 3.4
NumPy == 1.14.0
setproctitle, scikit-image, imageio, tensorboardX

See requirements.txt for more details.

Installation

To download the repository and install the requirements, run:

git clone https://github.com/zfw1226/active_tracking_rl.git
cd active_tracking_rl
pip install -r requirements.txt

Note that you need to install OpenCV, PyTorch, and the 2D/3D environments separately.

Prepare the 2D/3D Environments

We provide various 2D and 3D environments to validate the effectiveness of AD-VAT.

The 2D environment is a matrix map with randomly placed obstacles. The 2D experiments run on a CPU-only machine and let you evaluate and quantify the effectiveness of AD-VAT in a few minutes.

To install the 2D environments (gym-track2d), run:

pip install -e envs/gym_track2d

The 3D environments are built on Unreal Engine (UE4) and can be flexibly customized to simulate real-world active tracking scenarios. Running the 3D environments requires a GPU.

To install 3D environments, please follow the instructions in gym-unrealcv.

Running on 2D Environments

Training

You can try AD-VAT in 2D environments by running:

python main.py --shared-optimizer --workers 16 --split --train-mode -1 --env Track2D-BlockPartialPZR-v0

Adjust --workers to match the number of CPU cores on your machine. Limit the number of worker processes to the number of available cores: running more than one process per core reduces both training speed and effectiveness. The --split flag saves the tracker and target models separately for later evaluation. You can also run the two baseline methods described in the paper.

To train the tracker with the Ram target:

python main.py --shared-optimizer --workers 16 --split --train-mode 0 --env Track2D-BlockPartialRam-v0

To train the tracker with the Nav target:

python main.py --shared-optimizer --workers 16 --split --train-mode 0 --env Track2D-BlockPartialNav-v0

To train the tracker and target under Naive dueling:

python main.py --shared-optimizer  --workers 16 --split --network maze-lstm --entropy-target 0.01 --aux none --env Track2D-BlockPartialAdv-v0
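
The advice on --workers can be applied automatically before launching a run. The following is a minimal sketch, not part of the repository: the helper names `capped_workers` and `training_command` are hypothetical, and the command it builds uses only the flags from the AD-VAT and baseline commands above (the Naive dueling command takes different flags and is not covered).

```python
import os


def capped_workers(requested, cores=None):
    """Return a worker count no larger than the number of CPU cores."""
    if cores is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cores = os.cpu_count() or 1
    return max(1, min(requested, cores))


def training_command(env_name, requested_workers=16, train_mode=-1, cores=None):
    """Build the 2D training command from this README as an argument list."""
    workers = capped_workers(requested_workers, cores)
    return [
        "python", "main.py",
        "--shared-optimizer",
        "--workers", str(workers),
        "--split",
        "--train-mode", str(train_mode),
        "--env", env_name,
    ]


if __name__ == "__main__":
    # AD-VAT run (train mode -1) and a baseline run (train mode 0).
    for env, mode in [("Track2D-BlockPartialPZR-v0", -1),
                      ("Track2D-BlockPartialRam-v0", 0)]:
        # Print the command instead of running it, so it can be inspected first.
        print(" ".join(training_command(env, train_mode=mode)))
```

Printing the command keeps the sketch free of side effects; the same list can be passed to `subprocess.run` to start training.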

Evaluation

You can evaluate the tracker by running:

python gym_eval.py --env {ENV_NAME} --network tat-maze-lstm --load-tracker {PATH_TO_YOUR_TRACKER}

The ENV_NAME values we used for evaluation in the paper are:

Track2D-BlockPartialNav-v0 (Block-Nav),
Track2D-BlockPartialRam-v0 (Block-Ram),
Track2D-MazePartialNav-v0 (Maze-Nav),
Track2D-MazePartialRam-v0 (Maze-Ram).

If you used the default settings while training, PATH_TO_YOUR_TRACKER should be logs/{ENV_NAME}/{DATE}/tracker-best.dat
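
When several runs exist for one environment, the newest saved tracker can be located programmatically. This is a sketch under the assumption that every run writes to its own logs/{ENV_NAME}/{DATE}/ folder as described above; the helper name `latest_tracker` is hypothetical, and it chooses by file modification time because the exact {DATE} naming scheme is not specified here.

```python
from pathlib import Path


def latest_tracker(env_name, log_root="logs", filename="tracker-best.dat"):
    """Return the newest logs/{env_name}/{DATE}/tracker-best.dat, or None."""
    env_dir = Path(log_root) / env_name
    # Each run is assumed to write into its own {DATE} sub-directory.
    candidates = list(env_dir.glob("*/" + filename))
    if not candidates:
        return None
    # Pick by modification time rather than by parsing the folder name.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    path = latest_tracker("Track2D-BlockPartialRam-v0")
    print(path if path is not None else "no saved tracker found")
```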

You can also evaluate the effectiveness of the tracker-aware target by pairing it with different trackers:

python gym_eval.py --env Track2D-BlockPartialAdv-v0 --network tat-maze-lstm --load-tracker {PATH_TO_YOUR_TRACKER} --load-target {PATH_TO_YOUR_TARGET}
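
To evaluate one tracker on all four 2D environments, the commands above can be generated in a loop. The sketch below only assembles and prints the command lines; the helper name `eval_command` and the example tracker path are hypothetical, while the flags are the ones shown in this section.

```python
# 2D evaluation environments listed in this README.
EVAL_ENVS_2D = [
    "Track2D-BlockPartialNav-v0",  # Block-Nav
    "Track2D-BlockPartialRam-v0",  # Block-Ram
    "Track2D-MazePartialNav-v0",   # Maze-Nav
    "Track2D-MazePartialRam-v0",   # Maze-Ram
]


def eval_command(env_name, tracker_path, target_path=None):
    """Build a gym_eval.py command as an argument list."""
    cmd = [
        "python", "gym_eval.py",
        "--env", env_name,
        "--network", "tat-maze-lstm",
        "--load-tracker", tracker_path,
    ]
    if target_path is not None:
        # Only needed when a learned target is evaluated against the tracker.
        cmd += ["--load-target", target_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical path; replace with your own tracker-best.dat.
    tracker = "logs/ENV_NAME/DATE/tracker-best.dat"
    for env in EVAL_ENVS_2D:
        print(" ".join(eval_command(env, tracker)))
```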

Running on 3D Environments

Training

You can try AD-VAT in 3D environments by running:

python main.py --shared-optimizer  --workers 6  --split --network tat-cnn-lstm --rnn-out 256 --entropy-target 0.05 --sleep-time 30 --env UnrealTrack-DuelingRoomPZR-DiscreteColor-v4 --env-base UnrealTrack-DuelingRoomNav-DiscreteColor-v4 --gray --rescale --lr 0.0001

To train the baselines, change --env to UnrealTrack-DuelingRoomRam-DiscreteColor-v4 or UnrealTrack-DuelingRoomNav-DiscreteColor-v4 and set --train-mode to 0.

Evaluation

You can evaluate the tracker by running:

python gym_eval.py --env {ENV_NAME} --network cnn-lstm --gray --rescale --rnn-out 256 --load-tracker {PATH_TO_YOUR_TRACKER}

The ENV_NAME values we used for evaluation in the paper are:

UnrealTrack-DuelingRoomNav-DiscreteColor-v4 (DR Room),
UnrealTrack-UrbanCityNav-DiscreteColor-v1 (Urban City),
UnrealTrack-SnowForestNav-DiscreteColor-v1 (Snow Village),
UnrealTrack-GarageNav-DiscreteColor-v0 (Parking Lot).

Visualization

You can monitor performance during training with TensorBoard:

tensorboard --logdir {PATH_TO_LOGS}

If you used the default settings while training, PATH_TO_LOGS should be logs/{ENV_NAME}/{DATE}

Citation

If you find AD-VAT useful, please consider citing:

@inproceedings{zhong2018advat,
  title={{AD}-{VAT}: An Asymmetric Dueling mechanism for learning Visual Active Tracking},
  author={Fangwei Zhong and Peng Sun and Wenhan Luo and Tingyun Yan and Yizhou Wang},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=HkgYmhR9KX},
}

@article{zhong2021advat,
  author={Zhong, Fangwei and Sun, Peng and Luo, Wenhan and Yan, Tingyun and Wang, Yizhou},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={AD-VAT+: An Asymmetric Dueling Mechanism for Learning and Understanding Visual Active Tracking},
  year={2021},
  volume={43},
  number={5},
  pages={1467-1482},
  doi={10.1109/TPAMI.2019.2952590}
}

Contact

If you have any suggestions or questions, get in touch at [email protected].

active_tracking_rl's Issues

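Hello, I retrained the model in the 3D environment following the instructions in the README. The TensorBoard logs show that the variance of the training reward and of the various losses is quite large. What causes this? The best reward, however, is about the same as the one you report in the paper.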

Log Files Sharing

Thanks for your excellent work! I am trying to reproduce the results in your paper. However, the success rate of my trained model is very low, only 0.32 for UrbanCity. Could you share your terminal commands and log files, including step 1 and step 2? That would help me check my own procedure. Thanks again!

Issues Reproducing Experimental Performance in 2D Environment Algorithm

Dear @zfw1226 ,

I hope this message finds you well. I am writing to discuss some challenges I've encountered while attempting to replicate the experimental performance outlined in your paper, specifically within a 2D environment.

Despite conducting several trials using the provided training commands under the "Running on 2D Environments" section, I have observed minimal variance in the training outcomes. For instance, executing the command:

    python main.py --shared-optimizer --workers 16 --split --train-mode 0 --env Track2D-BlockPartialRam-v0

yielded a prompt "Save best! Tmax_score -3.1261218051310204" at iteration 1454, which was approximately 54 minutes into the process, and the average episode length recorded was 28.42. Post this iteration, up until iteration 17722, the script did not prompt any additional "save best!" messages, and the average episode length was noted to be 18.52 at iteration 17722.

This outcome has left me somewhat perplexed. I have closely adhered to the instructions provided in the README.MD for training and have not made any alterations to the codebase in the repository.

The experiments were conducted in an environment running Ubuntu 20.04, with Python version 3.9.16, and torch 1.12.0+cu113.

I am eager to understand the potential causes of these discrepancies and would greatly appreciate any insights or suggestions you might have. Your guidance in this matter would be invaluable in helping me resolve these issues.

Thank you for your time and assistance. I look forward to your response.
Best regards

Here are the parameters and the history information for the top few iterations of my logger:
2023-11-23 22:05:50,750 : lr: 0.001
2023-11-23 22:05:50,750 : gamma: 0.9
2023-11-23 22:05:50,750 : tau: 1.0
2023-11-23 22:05:50,750 : entropy: 0.01
2023-11-23 22:05:50,750 : entropy_target: 0.2
2023-11-23 22:05:50,750 : seed: 1
2023-11-23 22:05:50,750 : workers: 8
2023-11-23 22:05:50,750 : num_steps: 20
2023-11-23 22:05:50,750 : test_eps: 100
2023-11-23 22:05:50,750 : env: Track2D-BlockPartialRam-v0
2023-11-23 22:05:50,750 : env_base: Track2D-BlockPartialNav-v0
2023-11-23 22:05:50,750 : optimizer: Adam
2023-11-23 22:05:50,750 : amsgrad: True
2023-11-23 22:05:50,750 : load_model_dir: None
2023-11-23 22:05:50,751 : log_dir: logs/Track2D-BlockPartialRam-v0/Nov23_22-05
2023-11-23 22:05:50,751 : network: tat-maze-lstm
2023-11-23 22:05:50,751 : aux: reward
2023-11-23 22:05:50,751 : gpu_ids: [-1]
2023-11-23 22:05:50,751 : obs: img
2023-11-23 22:05:50,751 : single: False
2023-11-23 22:05:50,751 : gray: False
2023-11-23 22:05:50,751 : crop: False
2023-11-23 22:05:50,751 : inv: False
2023-11-23 22:05:50,751 : rescale: False
2023-11-23 22:05:50,751 : render: False
2023-11-23 22:05:50,751 : shared_optimizer: True
2023-11-23 22:05:50,751 : split: True
2023-11-23 22:05:50,751 : train_mode: 0
2023-11-23 22:05:50,751 : stack_frames: 1
2023-11-23 22:05:50,751 : input_size: 80
2023-11-23 22:05:50,751 : rnn_out: 128
2023-11-23 22:05:50,751 : sleep_time: 0
2023-11-23 22:05:50,751 : max_step: 150000
2023-11-23 22:05:50,751 : init_step: -1
2023-11-23 22:05:53,148 : Time 00h 00m 02s, ave eps reward [-10.59194057 10.59194057], ave eps length 18.03, reward step [-0.58746204 0.58746204]
2023-11-23 22:05:53,149 : Save best! Tmax_score -100
2023-11-23 22:05:55,198 : Time 00h 00m 04s, ave eps reward [-10.44377069 10.44377069], ave eps length 17.98, reward step [-0.58085488 0.58085488]
2023-11-23 22:05:55,198 : Save best! Tmax_score -10.591940573185248
2023-11-23 22:05:57,403 : Time 00h 00m 06s, ave eps reward [-9.93790386 9.93790386], ave eps length 18.45, reward step [-0.53863978 0.53863978]
2023-11-23 22:05:57,403 : Save best! Tmax_score -10.443770690788794
2023-11-23 22:05:59,681 : Time 00h 00m 08s, ave eps reward [-9.60469202 9.60469202], ave eps length 19.85, reward step [-0.48386358 0.48386358]
2023-11-23 22:05:59,681 : Save best! Tmax_score -9.937903862911584
2023-11-23 22:06:01,980 : Time 00h 00m 11s, ave eps reward [-9.783564 9.783564], ave eps length 19.52, reward step [-0.50120717 0.50120717]
2023-11-23 22:06:04,170 : Time 00h 00m 13s, ave eps reward [-9.78404724 9.78404724], ave eps length 18.55, reward step [-0.5274419 0.5274419]
2023-11-23 22:06:06,242 : Time 00h 00m 15s, ave eps reward [-10.05376462 10.05376462], ave eps length 19.74, reward step [-0.50930925 0.50930925]
2023-11-23 22:06:08,505 : Time 00h 00m 17s, ave eps reward [-10.63108069 10.63108069], ave eps length 18.31, reward step [-0.58061609 0.58061609]
2023-11-23 22:06:10,833 : Time 00h 00m 20s, ave eps reward [-9.57825452 9.57825452], ave eps length 19.89, reward step [-0.48156131 0.48156131]
2023-11-23 22:06:10,833 : Save best! Tmax_score -9.604692023687148
2023-11-23 22:06:13,035 : Time 00h 00m 22s, ave eps reward [-9.72288102 9.72288102], ave eps length 19.65, reward step [-0.49480311 0.49480311]
2023-11-23 22:06:15,133 : Time 00h 00m 24s, ave eps reward [-10.65068184 10.65068184], ave eps length 18.44, reward step [-0.57758578 0.57758578]
2023-11-23 22:06:17,880 : Time 00h 00m 27s, ave eps reward [-4.95734629 4.95734629], ave eps length 22.88, reward step [-0.21666723 0.21666723]
2023-11-23 22:06:17,880 : Save best! Tmax_score -9.578254515302184
2023-11-23 22:06:20,104 : Time 00h 00m 29s, ave eps reward [-9.73633368 9.73633368], ave eps length 19.15, reward step [-0.50842474 0.50842474]
2023-11-23 22:06:22,367 : Time 00h 00m 31s, ave eps reward [-10.34160097 10.34160097], ave eps length 18.41, reward step [-0.56173824 0.56173824]
2023-11-23 22:06:24,708 : Time 00h 00m 33s, ave eps reward [-9.63335653 9.63335653], ave eps length 19.61, reward step [-0.49124715 0.49124715]
2023-11-23 22:06:26,922 : Time 00h 00m 36s, ave eps reward [-11.0302941 11.0302941], ave eps length 17.15, reward step [-0.64316584 0.64316584]
2023-11-23 22:06:29,055 : Time 00h 00m 38s, ave eps reward [-10.23326248 10.23326248], ave eps length 18.7, reward step [-0.54723329 0.54723329]
2023-11-23 22:06:31,333 : Time 00h 00m 40s, ave eps reward [-9.95012408 9.95012408], ave eps length 19.09, reward step [-0.5212218 0.5212218]
2023-11-23 22:06:33,625 : Time 00h 00m 42s, ave eps reward [-10.26096099 10.26096099], ave eps length 18.53, reward step [-0.55374857 0.55374857]
2023-11-23 22:06:35,697 : Time 00h 00m 44s, ave eps reward [-10.2174475 10.2174475], ave eps length 17.95, reward step [-0.56921713 0.56921713]
2023-11-23 22:06:37,972 : Time 00h 00m 47s, ave eps reward [-9.58882095 9.58882095], ave eps length 19.53, reward step [-0.49097906 0.49097906]
2023-11-23 22:06:40,224 : Time 00h 00m 49s, ave eps reward [-9.85489348 9.85489348], ave eps length 18.88, reward step [-0.52197529 0.52197529]
2023-11-23 22:06:42,476 : Time 00h 00m 51s, ave eps reward [-10.09319288 10.09319288], ave eps length 18.36, reward step [-0.54973817 0.54973817]
2023-11-23 22:06:44,525 : Time 00h 00m 53s, ave eps reward [-11.03300383 11.03300383], ave eps length 16.86, reward step [-0.65438931 0.65438931]
2023-11-23 22:06:46,963 : Time 00h 00m 56s, ave eps reward [-5.14335535 5.14335535], ave eps length 22.97, reward step [-0.22391621 0.22391621]
2023-11-23 22:06:49,267 : Time 00h 00m 58s, ave eps reward [-9.95429311 9.95429311], ave eps length 19.25, reward step [-0.51710614 0.51710614]
2023-11-23 22:06:51,415 : Time 00h 01m 00s, ave eps reward [-9.90422546 9.90422546], ave eps length 18.59, reward step [-0.53277168 0.53277168]
2023-11-23 22:06:53,637 : Time 00h 01m 02s, ave eps reward [-10.02535256 10.02535256], ave eps length 18.45, reward step [-0.54337954 0.54337954]
2023-11-23 22:06:55,697 : Time 00h 01m 04s, ave eps reward [-9.89726635 9.89726635], ave eps length 18.79, reward step [-0.52673051 0.52673051]
2023-11-23 22:06:57,904 : Time 00h 01m 07s, ave eps reward [-9.82420788 9.82420788], ave eps length 18.62, reward step [-0.52761589 0.52761589]
2023-11-23 22:07:00,146 : Time 00h 01m 09s, ave eps reward [-9.88615193 9.88615193], ave eps length 19.71, reward step [-0.50158051 0.50158051]
2023-11-23 22:07:02,578 : Time 00h 01m 11s, ave eps reward [-9.82887571 9.82887571], ave eps length 18.83, reward step [-0.52197959 0.52197959]
2023-11-23 22:07:04,903 : Time 00h 01m 14s, ave eps reward [-9.8480646 9.8480646], ave eps length 18.77, reward step [-0.52467046 0.52467046]
2023-11-23 22:07:07,092 : Time 00h 01m 16s, ave eps reward [-10.31012244 10.31012244], ave eps length 18.19, reward step [-0.56680167 0.56680167]
2023-11-23 22:07:09,325 : Time 00h 01m 18s, ave eps reward [-9.66492162 9.66492162], ave eps length 19.09, reward step [-0.50628191 0.50628191]
2023-11-23 22:07:11,566 : Time 00h 01m 20s, ave eps reward [-10.328663 10.328663], ave eps length 18.51, reward step [-0.55800448 0.55800448]
2023-11-23 22:07:13,788 : Time 00h 01m 23s, ave eps reward [-10.26148816 10.26148816], ave eps length 18.7, reward step [-0.54874268 0.54874268]
2023-11-23 22:07:15,900 : Time 00h 01m 25s, ave eps reward [-9.8965475 9.8965475], ave eps length 18.05, reward step [-0.54828518 0.54828518]
2023-11-23 22:07:18,059 : Time 00h 01m 27s, ave eps reward [-9.90949309 9.90949309], ave eps length 19.22, reward step [-0.51558237 0.51558237]
2023-11-23 22:07:20,272 : Time 00h 01m 29s, ave eps reward [-10.8216789 10.8216789], ave eps length 17.12, reward step [-0.63210741 0.63210741]
2023-11-23 22:07:22,456 : Time 00h 01m 31s, ave eps reward [-10.15613436 10.15613436], ave eps length 18.49, reward step [-0.54927714 0.54927714]
2023-11-23 22:07:24,791 : Time 00h 01m 34s, ave eps reward [-9.84071719 9.84071719], ave eps length 19.82, reward step [-0.4965044 0.4965044]
2023-11-23 22:07:27,118 : Time 00h 01m 36s, ave eps reward [-10.19293859 10.19293859], ave eps length 18.35, reward step [-0.55547349 0.55547349]
2023-11-23 22:07:29,268 : Time 00h 01m 38s, ave eps reward [-9.67124668 9.67124668], ave eps length 19.17, reward step [-0.50449904 0.50449904]
2023-11-23 22:07:31,559 : Time 00h 01m 40s, ave eps reward [-9.36901214 9.36901214], ave eps length 19.4, reward step [-0.48293877 0.48293877]
2023-11-23 22:07:33,835 : Time 00h 01m 43s, ave eps reward [-10.51284057 10.51284057], ave eps length 18.3, reward step [-0.57447216 0.57447216]
2023-11-23 22:07:36,118 : Time 00h 01m 45s, ave eps reward [-9.86678838 9.86678838], ave eps length 18.74, reward step [-0.52650952 0.52650952]
2023-11-23 22:07:38,340 : Time 00h 01m 47s, ave eps reward [-10.51069635 10.51069635], ave eps length 17.74, reward step [-0.5924857 0.5924857]
2023-11-23 22:07:40,596 : Time 00h 01m 49s, ave eps reward [-10.40731206 10.40731206], ave eps length 18.25, reward step [-0.57026367 0.57026367]
2023-11-23 22:07:43,002 : Time 00h 01m 52s, ave eps reward [-9.91075942 9.91075942], ave eps length 18.26, reward step [-0.54275791 0.54275791]
2023-11-23 22:07:45,290 : Time 00h 01m 54s, ave eps reward [-9.30873419 9.30873419], ave eps length 19.99, reward step [-0.46566954 0.46566954]
2023-11-23 22:07:47,501 : Time 00h 01m 56s, ave eps reward [-10.5279217 10.5279217], ave eps length 18.35, reward step [-0.5737287 0.5737287]
2023-11-23 22:07:49,795 : Time 00h 01m 59s, ave eps reward [-9.25809926 9.25809926], ave eps length 20.11, reward step [-0.46037291 0.46037291]
2023-11-23 22:07:52,074 : Time 00h 02m 01s, ave eps reward [-9.98898441 9.98898441], ave eps length 19.44, reward step [-0.51383665 0.51383665]
2023-11-23 22:07:54,372 : Time 00h 02m 03s, ave eps reward [-10.30038445 10.30038445], ave eps length 18.4, reward step [-0.5598035 0.5598035]
2023-11-23 22:07:56,472 : Time 00h 02m 05s, ave eps reward [-9.70765399 9.70765399], ave eps length 18.44, reward step [-0.52644544 0.52644544]
2023-11-23 22:07:58,783 : Time 00h 02m 08s, ave eps reward [-9.18472154 9.18472154], ave eps length 20.07, reward step [-0.45763436 0.45763436]
2023-11-23 22:08:00,947 : Time 00h 02m 10s, ave eps reward [-10.21715298 10.21715298], ave eps length 18.06, reward step [-0.56573383 0.56573383]
2023-11-23 22:08:03,248 : Time 00h 02m 12s, ave eps reward [-9.66952486 9.66952486], ave eps length 18.63, reward step [-0.51902978 0.51902978]
2023-11-23 22:08:05,632 : Time 00h 02m 14s, ave eps reward [-9.45807768 9.45807768], ave eps length 19.56, reward step [-0.4835418 0.4835418]
2023-11-23 22:08:07,791 : Time 00h 02m 17s, ave eps reward [-9.38695248 9.38695248], ave eps length 19.26, reward step [-0.48738071 0.48738071]
2023-11-23 22:08:10,209 : Time 00h 02m 19s, ave eps reward [-9.90569462 9.90569462], ave eps length 20.56, reward step [-0.48179449 0.48179449]
2023-11-23 22:08:12,436 : Time 00h 02m 21s, ave eps reward [-10.18932061 10.18932061], ave eps length 17.94, reward step [-0.56796659 0.56796659]
2023-11-23 22:08:14,656 : Time 00h 02m 23s, ave eps reward [-9.22837091 9.22837091], ave eps length 19.86, reward step [-0.46467124 0.46467124]
2023-11-23 22:08:16,952 : Time 00h 02m 26s, ave eps reward [-10.63702786 10.63702786], ave eps length 18.36, reward step [-0.57935882 0.57935882]
2023-11-23 22:08:19,199 : Time 00h 02m 28s, ave eps reward [-10.90153386 10.90153386], ave eps length 17.08, reward step [-0.63826311 0.63826311]
2023-11-23 22:08:21,690 : Time 00h 02m 30s, ave eps reward [-9.78557616 9.78557616], ave eps length 19.6, reward step [-0.49926409 0.49926409]
2023-11-23 22:08:23,860 : Time 00h 02m 33s, ave eps reward [-10.2292158 10.2292158], ave eps length 19.43, reward step [-0.52646504 0.52646504]
2023-11-23 22:08:26,233 : Time 00h 02m 35s, ave eps reward [-9.96857569 9.96857569], ave eps length 19.39, reward step [-0.51410911 0.51410911]
2023-11-23 22:08:28,458 : Time 00h 02m 37s, ave eps reward [-10.5571899 10.5571899], ave eps length 18.25, reward step [-0.57847616 0.57847616]
2023-11-23 22:08:30,941 : Time 00h 02m 40s, ave eps reward [-9.1183381 9.1183381], ave eps length 20.28, reward step [-0.44962219 0.44962219]
2023-11-23 22:08:33,289 : Time 00h 02m 42s, ave eps reward [-10.0139093 10.0139093], ave eps length 19.62, reward step [-0.51039293 0.51039293]
2023-11-23 22:08:35,466 : Time 00h 02m 44s, ave eps reward [-10.43872621 10.43872621], ave eps length 18.03, reward step [-0.57896429 0.57896429]
2023-11-23 22:08:37,874 : Time 00h 02m 47s, ave eps reward [-9.31679891 9.31679891], ave eps length 20.17, reward step [-0.46191368 0.46191368]
2023-11-23 22:08:40,171 : Time 00h 02m 49s, ave eps reward [-10.10328893 10.10328893], ave eps length 18.38, reward step [-0.54968928 0.54968928]
2023-11-23 22:08:42,612 : Time 00h 02m 51s, ave eps reward [-9.77867819 9.77867819], ave eps length 18.38, reward step [-0.53202819 0.53202819]
2023-11-23 22:08:44,822 : Time 00h 02m 54s, ave eps reward [-9.89588798 9.89588798], ave eps length 18.49, reward step [-0.53520216 0.53520216]
2023-11-23 22:08:47,142 : Time 00h 02m 56s, ave eps reward [-10.44159944 10.44159944], ave eps length 18.16, reward step [-0.57497794 0.57497794]
2023-11-23 22:08:49,246 : Time 00h 02m 58s, ave eps reward [-10.11243347 10.11243347], ave eps length 18.36, reward step [-0.55078614 0.55078614]
2023-11-23 22:08:51,460 : Time 00h 03m 00s, ave eps reward [-10.48383165 10.48383165], ave eps length 18.35, reward step [-0.57132598 0.57132598]
2023-11-23 22:08:53,520 : Time 00h 03m 02s, ave eps reward [-10.82003369 10.82003369], ave eps length 17.13, reward step [-0.63164236 0.63164236]
2023-11-23 22:08:55,743 : Time 00h 03m 04s, ave eps reward [-10.55494393 10.55494393], ave eps length 17.76, reward step [-0.59430991 0.59430991]
2023-11-23 22:08:57,918 : Time 00h 03m 07s, ave eps reward [-10.40406824 10.40406824], ave eps length 19.05, reward step [-0.54614531 0.54614531]
2023-11-23 22:09:00,262 : Time 00h 03m 09s, ave eps reward [-9.86211541 9.86211541], ave eps length 18.77, reward step [-0.52541904 0.52541904]
2023-11-23 22:09:02,569 : Time 00h 03m 11s, ave eps reward [-9.63801261 9.63801261], ave eps length 19.1, reward step [-0.50460799 0.50460799]
2023-11-23 22:09:04,711 : Time 00h 03m 13s, ave eps reward [-10.36858659 10.36858659], ave eps length 18.4, reward step [-0.56351014 0.56351014]
2023-11-23 22:09:07,018 : Time 00h 03m 16s, ave eps reward [-9.84495775 9.84495775], ave eps length 18.9, reward step [-0.52089724 0.52089724]
2023-11-23 22:09:09,512 : Time 00h 03m 18s, ave eps reward [-9.75935242 9.75935242], ave eps length 20.05, reward step [-0.48675074 0.48675074]
2023-11-23 22:09:11,876 : Time 00h 03m 21s, ave eps reward [-9.85333055 9.85333055], ave eps length 19.06, reward step [-0.51696383 0.51696383]
2023-11-23 22:09:14,180 : Time 00h 03m 23s, ave eps reward [-9.64193431 9.64193431], ave eps length 19.49, reward step [-0.49471187 0.49471187]
2023-11-23 22:09:16,563 : Time 00h 03m 25s, ave eps reward [-9.31726943 9.31726943], ave eps length 19.3, reward step [-0.48276007 0.48276007]
2023-11-23 22:09:19,280 : Time 00h 03m 28s, ave eps reward [-5.57140946 5.57140946], ave eps length 23.84, reward step [-0.23370006 0.23370006]
2023-11-23 22:09:21,595 : Time 00h 03m 30s, ave eps reward [-9.6879347 9.6879347], ave eps length 19.1, reward step [-0.50722171 0.50722171]
2023-11-23 22:09:23,860 : Time 00h 03m 33s, ave eps reward [-9.99483687 9.99483687], ave eps length 18.09, reward step [-0.55250618 0.55250618]
2023-11-23 22:09:26,148 : Time 00h 03m 35s, ave eps reward [-9.84945411 9.84945411], ave eps length 19.14, reward step [-0.51460053 0.51460053]
2023-11-23 22:09:28,631 : Time 00h 03m 37s, ave eps reward [-9.98298347 9.98298347], ave eps length 19.75, reward step [-0.50546752 0.50546752]
2023-11-23 22:09:30,891 : Time 00h 03m 40s, ave eps reward [-9.57795742 9.57795742], ave eps length 19.27, reward step [-0.49703982 0.49703982]
2023-11-23 22:09:33,116 : Time 00h 03m 42s, ave eps reward [-10.08319149 10.08319149], ave eps length 18.44, reward step [-0.54681082 0.54681082]
2023-11-23 22:09:35,427 : Time 00h 03m 44s, ave eps reward [-9.99879274 9.99879274], ave eps length 18.23, reward step [-0.54848013 0.54848013]
2023-11-23 22:09:37,506 : Time 00h 03m 46s, ave eps reward [-10.46280038 10.46280038], ave eps length 17.8, reward step [-0.58779777 0.58779777]
2023-11-23 22:09:39,740 : Time 00h 03m 48s, ave eps reward [-10.64790686 10.64790686], ave eps length 17.88, reward step [-0.59552052 0.59552052]
2023-11-23 22:09:42,059 : Time 00h 03m 51s, ave eps reward [-10.02871632 10.02871632], ave eps length 18.09, reward step [-0.55437901 0.55437901]
2023-11-23 22:09:44,389 : Time 00h 03m 53s, ave eps reward [-10.20433077 10.20433077], ave eps length 17.92, reward step [-0.5694381 0.5694381]
2023-11-23 22:09:46,939 : Time 00h 03m 56s, ave eps reward [-9.7519367 9.7519367], ave eps length 19.59, reward step [-0.49780177 0.49780177]
2023-11-23 22:09:49,205 : Time 00h 03m 58s, ave eps reward [-9.06315246 9.06315246], ave eps length 18.73, reward step [-0.48388427 0.48388427]
2023-11-23 22:09:51,640 : Time 00h 04m 00s, ave eps reward [-10.16455878 10.16455878], ave eps length 18.89, reward step [-0.53809205 0.53809205]
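
A log in this format is easier to inspect once it has been parsed into numbers. The following is a rough sketch based only on the two line shapes visible above ("Time ... ave eps reward [...]" and "Save best! Tmax_score ..."); the function name `parse_log` is hypothetical, and which of the two reward entries belongs to the tracker or the target is left open here.

```python
import re

# A float as printed in the log: "-10.59194057", "18.7", "-100", "1e-05".
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

STEP_RE = re.compile(
    r"Time (\d+)h (\d+)m (\d+)s, "
    r"ave eps reward \[\s*(" + FLOAT + r")\s+(" + FLOAT + r")\s*\], "
    r"ave eps length (" + FLOAT + r")"
)
BEST_RE = re.compile(r"Save best! Tmax_score (" + FLOAT + r")")


def parse_log(lines):
    """Return (evaluations, best_scores) extracted from training log lines."""
    evals, best_scores = [], []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            hours, minutes, seconds = (int(m.group(i)) for i in (1, 2, 3))
            evals.append({
                "seconds": hours * 3600 + minutes * 60 + seconds,
                "rewards": (float(m.group(4)), float(m.group(5))),
                "eps_length": float(m.group(6)),
            })
            continue
        m = BEST_RE.search(line)
        if m:
            best_scores.append(float(m.group(1)))
    return evals, best_scores


if __name__ == "__main__":
    demo = [
        "2023-11-23 22:05:53,148 : Time 00h 00m 02s, ave eps reward "
        "[-10.59194057 10.59194057], ave eps length 18.03, "
        "reward step [-0.58746204 0.58746204]",
        "2023-11-23 22:05:53,149 : Save best! Tmax_score -100",
    ]
    evals, best = parse_log(demo)
    print(len(evals), "evaluations,", len(best), "best-score updates")
```

Plotting `eps_length` against `seconds` from the parsed records makes it easier to see whether the episode length ever departs from its initial range.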

Environment configuration problem

Hello, I installed gym-unrealcv and ran this command:

python main.py --model simple-pos-act-lstm --tracker none --env UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --env-base UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --rnn-out 128 --seed 4 --seed-test 2 --train-mode -1 --test-eps 25 --norm-reward --aux reward --lr 0.001 --gpu-id 0
But it fails with this error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/sy/jia/cv/gym-unrealcv-1.0/gym_unrealcv/envs/UnrealEnv/textures'
Which environment do I need to download? What should it be named after downloading, and which directory should it go in?
Thanks!

Is it feasible to run 'UnrealTrack..' on Windows?

Is it feasible to run 'UnrealTrack-DuelingRoomPZR-DiscreteColor-v4' on Windows? Only Linux binaries are provided in load_env.py. Should I download a Windows build somewhere, or do I have to build a new environment from scratch?

The problem about the fine-tuning of meta strategy

Hello, I have a question about the training described in the distractor paper. The second step of the meta strategy is to fine-tune the teacher tracker. Following your tutorial, I moved the "*.pth" files from the /log folder into a new folder and put the "nest.pth" file in another folder. I then trained with the command you give:
python main.py --model simple-pos-act-lstm --tracker none --env UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --env-base UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --rnn-out 128 --seed 4 --seed-test 2 --train-mode 0 --norm-reward --aux attack-reward --lr 0.0005 --gpu-id 0 --old old_oppent/ --load-model-dir new_model/new-204.pth
But it did not produce any results, and I don't know what is wrong.

The ./old_oppent folder currently holds the file.

The ./new_model folder currently holds the file.

After running for several hours, nothing had happened. I aborted the program and the following exception occurred:

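How is the expected position of the target mentioned in the paper obtained?

Thank you for your very nice work and for generously sharing the code.
I have a few questions:
1) How is the expected position of the target mentioned in the paper obtained?

2) The code implementation of Equation (1) in the paper has only the distance component and does not consider the angle component. Why can it be simplified this way?
Implementation of Equation (1)
3) The code for this environment has only two rewards and no reward for the distractor, which is inconsistent with the paper. I find this confusing.
Code implementation of the reward
4) How is self.exp_distance determined?
self.exp_distance
5) The angle computation differs from the one in Equation (1): what is the expected angle?
def get_direction
I would appreciate your guidance on the questions above.
Thank you!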
