
ssl_detection's Introduction

STAC is a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentation.
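
In pseudocode, the resulting objective is a supervised detection loss plus a weighted loss on strongly augmented unlabeled images whose pseudo boxes passed a confidence threshold. The sketch below is illustrative only; detector_loss and strong_augment are hypothetical placeholders, not functions from this repository:

# Illustrative sketch of the STAC objective (not the repository's code).
# detector_loss(model, images, boxes) and strong_augment(image, boxes)
# are hypothetical placeholders.
def stac_loss(model, labeled, unlabeled_images, pseudo_labels,
              confidence=0.9, wu=2.0):
    # Supervised term on labeled images (default augmentation).
    l_sup = detector_loss(model, labeled.images, labeled.boxes)

    # Unsupervised term: keep pseudo boxes above the confidence threshold,
    # apply strong augmentation to the image and boxes jointly, and train
    # against the augmented pseudo boxes.
    l_unsup = 0.0
    for image, preds in zip(unlabeled_images, pseudo_labels):
        boxes = [p for p in preds if p.score > confidence]
        if boxes:
            aug_image, aug_boxes = strong_augment(image, boxes)
            l_unsup += detector_loss(model, [aug_image], [aug_boxes])

    # confidence and wu correspond to TRAIN.CONFIDENCE and TRAIN.WU below.
    return l_sup + wu * l_unsup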

This code is only used for research. This is not an official Google product.

Instructions

Install dependencies

Set global environment variables.

export PRJROOT=/path/to/your/project/directory/STAC
export DATAROOT=/path/to/your/dataroot
export COCODIR=$DATAROOT/coco
export VOCDIR=$DATAROOT/voc
export PYTHONPATH=$PYTHONPATH:${PRJROOT}/third_party/FasterRCNN:${PRJROOT}/third_party/auto_augment:${PRJROOT}/third_party/tensorpack

Install a virtual environment in the root folder of the project

cd ${PRJROOT}

sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r requirements.txt

# Make sure your TensorFlow version is 1.14, not only in the virtual environment but also
# on your machine; 1.15 can cause OOM issues.
python -c 'import tensorflow as tf; print(tf.__version__)'

# install coco apis
pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

(Optional) Install tensorpack

A compatible version of tensorpack is already included at third_party/tensorpack. To install the latest version instead:

cd ${PRJROOT}/third_party
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git

Download COCO/PASCAL VOC data and pre-trained models

Download data

See DATA.md

Download backbone model

cd ${COCODIR}
wget http://models.tensorpack.com/FasterRCNN/ImageNet-R50-AlignPadding.npz

Training

There are three steps:

  1. Train a standard detector on labeled data (detection/scripts/coco/train_stg1.sh).
  2. Predict pseudo boxes and labels of unlabeled data using the trained detector (detection/scripts/coco/eval_stg1.sh).
  3. Use labeled data and unlabeled data with pseudo labels to train a STAC detector (detection/scripts/coco/train_stg2.sh).

Besides the step-by-step instructions here, detection/scripts/coco/train_stac.sh provides a combined script to train STAC.

detection/scripts/voc/train_stac.sh is a combined script to train STAC on PASCAL VOC.

The following example uses 10% of train2017 as labeled data and the remaining 90% of train2017 as unlabeled data.

Step 0: Set variables

cd ${PRJROOT}/detection

# Labeled and Unlabeled datasets
DATASET=coco_train2017.1@10
UNLABELED_DATASET=${DATASET}-unlabeled

# PATH to save trained models
CKPT_PATH=result/${DATASET}

# PATH to save pseudo labels for unlabeled data
PSEUDO_PATH=${CKPT_PATH}/PSEUDO_DATA

# Train with 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

Step 1: Train FasterRCNN on labeled data

Run . scripts/coco/train_stg1.sh

Set TRAIN.AUGTYPE_LAB=strong to apply strong data augmentation.

# --simple_path makes train_log/${DATASET}/${EXPNAME} the exact save location
python3 train_stg1.py \
    --logdir ${CKPT_PATH} --simple_path --config \
    BACKBONE.WEIGHTS=${COCODIR}/ImageNet-R50-AlignPadding.npz \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${DATASET}',)" \
    MODE_MASK=False \
    FRCNN.BATCH_PER_IM=64 \
    PREPROC.TRAIN_SHORT_EDGE_SIZE="[500,800]" \
    TRAIN.EVAL_PERIOD=20 \
    TRAIN.AUGTYPE_LAB='default'

Step 2: Generate pseudo labels of unlabeled data

Run . scripts/coco/eval_stg1.sh

Evaluate using COCO metrics and save eval.json

# Check pseudo path
if [ ! -d ${PSEUDO_PATH} ]; then
    mkdir -p ${PSEUDO_PATH}
fi

# Evaluate the model for sanity check
# model-180000 is the last checkpoint
# save eval.json at $PSEUDO_PATH

python3 predict.py \
    --evaluate ${PSEUDO_PATH}/eval.json \
    --load "${CKPT_PATH}"/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${UNLABELED_DATASET}',)"

Generate pseudo labels for unlabeled data

Set EVAL.PSEUDO_INFERENCE=True to use original images rather than resized ones for inference.

# Extract pseudo label
python3 predict.py \
    --predict_unlabeled ${PSEUDO_PATH} \
    --load "${CKPT_PATH}"/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${UNLABELED_DATASET}',)" \
    EVAL.PSEUDO_INFERENCE=True

Step 3: Train STAC

Run . scripts/coco/train_stg2.sh

The dataloader loads pseudo labels from ${PSEUDO_PATH}/pseudo_data.npy.
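
To sanity-check the generated pseudo labels before stage 2, the file can be opened with numpy. Note that the exact structure of the stored object is an assumption here, not a documented interface:

# Hypothetical inspection of the pseudo-label dump; the stored layout
# (a pickled Python object) is an assumption to verify against the code.
import numpy as np
pseudo = np.load('result/coco_train2017.1@10/PSEUDO_DATA/pseudo_data.npy',
                 allow_pickle=True)
print(type(pseudo))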

Apply default augmentation on labeled data and strong augmentation on unlabeled data.

TRAIN.CONFIDENCE and TRAIN.WU are two major parameters of the method.
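
They map onto the loss from the paper: TRAIN.CONFIDENCE is the threshold tau used to select pseudo boxes, and TRAIN.WU is the weight on the unsupervised term, so that, sketched in the paper's notation,

\mathcal{L} = \mathcal{L}_s + \lambda_u \, \mathcal{L}_u,
\qquad \lambda_u = \texttt{TRAIN.WU}, \quad \tau = \texttt{TRAIN.CONFIDENCE}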

python3 train_stg2.py \
    --logdir=${CKPT_PATH}/STAC --simple_path \
    --pseudo_path=${PSEUDO_PATH} \
    --config \
    BACKBONE.WEIGHTS=${COCODIR}/ImageNet-R50-AlignPadding.npz \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${DATASET}',)" \
    DATA.UNLABEL="('${UNLABELED_DATASET}',)" \
    MODE_MASK=False \
    FRCNN.BATCH_PER_IM=64 \
    PREPROC.TRAIN_SHORT_EDGE_SIZE="[500,800]" \
    TRAIN.EVAL_PERIOD=20 \
    TRAIN.AUGTYPE_LAB='default' \
    TRAIN.AUGTYPE='strong' \
    TRAIN.CONFIDENCE=0.9 \
    TRAIN.WU=2

Tensorboard

All training logs and TensorBoard summaries are written under ${PRJROOT}/detection/train_log. Visualize them with

tensorboard --logdir=${PRJROOT}/detection/train_log

Citation

@inproceedings{sohn2020detection,
  title={A Simple Semi-Supervised Learning Framework for Object Detection},
  author={Kihyuk Sohn and Zizhao Zhang and Chun-Liang Li and Han Zhang and Chen-Yu Lee and Tomas Pfister},
  year={2020},
  booktitle={arXiv:2005.04757}
}

Acknowledgement

ssl_detection's People

Contributors

zizhaozhang


ssl_detection's Issues

update pseudo label iteratively during self-training

Hi, have you tried updating the pseudo labels in an iterative manner?
Say the procedure goes like this:
teacher model M0 trained with labeled data ---> generate pseudo labels ---> semi-supervised training gives model M1 ---> update pseudo labels using M1 ---> semi-supervised training gives model M2, and so on.
This is common practice in semi-supervised classification, so I wonder if you have tried this iterative approach in object detection. Would you mind sharing your insight on this? Thanks a lot.
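
For reference, the loop being proposed would look roughly like this; train_detector, predict_pseudo_labels, and train_stac are placeholders for the stage-1/2/3 scripts above, not functions in this repository:

# Sketch of iterative self-training (placeholder functions, not repo API).
model = train_detector(labeled_data)                       # stage 1: teacher M0
for _ in range(num_rounds):
    pseudo = predict_pseudo_labels(model, unlabeled_data)  # stage 2
    model = train_stac(labeled_data, unlabeled_data, pseudo)  # stage 3: M1, M2, ...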

Confirmation for some training details

Hi, I want to confirm some details of the second (self-training) stage. Are all the hyper-parameters (batch size, thresholds for positives and negatives, number of proposals in the RCNN head, etc.) the same for both the supervised and unsupervised losses? Also, is the unsupervised loss imposed on both the RPN and the RCNN head? Thanks.

Skipping cancelled dequeue attempt with queue not closed

  1. ERROR LOG (first epoch)
    [1210 18:09:10 @param.py:158] [HyperParamSetter] At global_step=0, learning_rate is set to 0.001000
    [1210 18:09:11 @prof.py:294] [HostMemoryTracker] Free RAM in before_train() is 238.12 GB.
    [1210 18:09:11 @stac_helper.py:83] ----------------------------------------------------------------------------------------------------
    [1210 18:09:11 @stac_helper.py:84] Model save path: result/VOC2007/instances_trainval
    [1210 18:09:11 @stac_helper.py:85] ----------------------------------------------------------------------------------------------------
    [1210 18:09:11 @eval.py:313] [EvalCallback] Will evaluate every 20 epochs
    [1210 18:09:28 @base.py:273] Start Epoch 1 ...
    2021-12-10 18:09:43.544891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
    2021-12-10 18:10:23.596973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    0%| |0/500[02:46<?,?it/s]
    2021-12-10 18:12:16.766932: W tensorflow/core/kernels/queue_base.cc:277] _0_QueueInput/input_queue: Skipping cancelled enqueue attempt with queue not closed
    Traceback (most recent call last):
    File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
    File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
    File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
    tensorflow.python.framework.errors_impl.DeadlineExceededError: Timed out waiting for notification

  2. Environment Information:


sys.platform linux
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Tensorpack v0.9.8-61-g4ac2e22b-dirty
Numpy 1.16.4
TensorFlow 1.14.0/v1.14.0-rc1-22-gaf24dc91b5
TF Compiler Version 4.8.5
TF CUDA support True
TF MKL support False
TF XLA support False
Nvidia Driver /usr/lib64/libnvidia-ml.so.460.73.01
CUDA /mnt/lustre/share/cuda-10.0/lib64/libcudart.so.10.0.130
CUDNN /mnt/lustre/share/cuda-10.0/lib64/libcudnn.so.7.4.1
NCCL
CUDA_VISIBLE_DEVICES 1,2,3,4
GPU 0,1,2,3,4,5,6,7 Tesla V100-SXM2-32GB
Free RAM 344.40/376.39 GB
CPU Count 48
cv2 4.1.1
msgpack 1.0.3
python-prctl False


Training schedule about VOC.

Dear authors, really thanks for your great work! I am stuck on a detail of the VOC stage-1 training schedule. May I confirm the following:

  1. The batch size is 1, and the total number of GPUs is 1?
  2. The LR schedule is [7500, 10000], so the total is 10000 * 8 / 5011 ≈ 16 epochs?
  3. The learning rate is 0.001, and the decay rate is 0.5?

Thanks in advance!

Data augmentation for label images

Thank you for your novel work. I have a question: why not apply strong data augmentation to the labeled data as well? I think the quality of the resulting pseudo labels would also improve. Looking forward to your response, thank you.

Considering label scores in loss function

I can't seem to find where in the code the score of a pseudo box is taken into account. Specifically, where can we see the effect of zero-scored boxes (those that didn't pass the confidence threshold)?
As far as I can tell, it is missing from the code, though it is emphasized in the paper.
Thanks!

Docker

Do you happen to have a Dockerfile for this? It would be greatly appreciated, and I am sure many would benefit from it, since the project seems to require particular versions of TF and Keras.

About the augmentation in teacher model

In the first stage, you train the teacher model with weak augmentation. However, in your experiments a model trained with strong augmentation outperforms one trained with weak augmentation. Why not use the strongly augmented model as the teacher?

the total_cost and wd_cost become nan.

Hi, when I run your code with train_stg1.sh to train the teacher model, the logs show that total_cost and wd_cost become NaN. I did not change any code.
The data and GPUs are as follows:
DATASET='coco_train2017.1@10'
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

VOC Training and Data Scripts missing

Hi @zizhaozhang , thanks for providing the code to this paper.

I'm trying to replicate the VOC results, but the instructions here are incomplete. When will you provide the scripts needed to train the VOC model, and is it possible for us to adapt prepare_coco_data.py to do so in the meantime?

Thanks for the help.

GPU only using half of available memory

Hello,

Whether I use 1 or 2 GPUs (RTX 2080 Ti), only half of the capacity is used:

1 GPU
Screenshot from 2020-09-07 15-12-21

2 GPUs
Screenshot from 2020-09-07 15-10-33

I tried to increase the batch size (FRCNN.BATCH_PER_IM in train_stg1.sh) and also the number of workers (_C.DATA.NUM_WORKERS in third_party/FasterRCNN/FasterRCNN/config.py) but it doesn't seem to make a difference.

Since TensorFlow is supposed to use all available memory, is this being done on purpose?

Thank you for your time.
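
For reference, TF 1.x reserves nearly all GPU memory by default unless the session is created with allow_growth or a memory fraction; this standard TF 1.14 snippet (independent of this repository's config) shows the two options that produce half-empty-looking GPUs:

# Standard TF 1.x session options (not specific to STAC).
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate on demand instead of all at once
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap at 50%
sess = tf.Session(config=config)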

loss function

Hi, I'm confused with the (supervised) RPN loss function used in your paper (equation 3). It looks like there is no penalty for an anchor box being positive when there isn't an object. Take e.g. the extreme case of an image with no groundtruth objects in it, then there is no penalty for an anchor having high p_i.

Contrast this with the loss in the Faster RCNN paper which does impose a penalty in this case.
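
For reference, the RPN objective from the Faster R-CNN paper being contrasted here is

L(\{p_i\},\{t_i\}) =
\frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*})
+ \lambda \, \frac{1}{N_{reg}} \sum_i p_i^{*} \, L_{reg}(t_i, t_i^{*}),

where the classification sum runs over both positive and negative anchors, so a confident positive prediction on a background anchor (p_i^{*} = 0) is penalized by L_{cls}.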

Training on a single GPU (Losses keep fluctuating and do not converge)

Hi,

I am training the Faster RCNN model on 10% of the labelled COCO data. It seems that when training with 1 GPU the losses don't converge, and based on an earlier issue (#12) I understand that with 1 GPU and a batch size of 1, due to tensorpack constraints, the batch size may be too small for the network to converge. If that's the case, what are the alternatives? Is the only option to move away from tensorpack in order to use a larger batch size?

Any inputs/suggestions are more than welcome as I am a bit stuck at the moment and do not have access to more than 1 GPU.

Regards,
Chandra

about using my own data

Hi,
I want to use my own COCO-format data with your framework. However, it seems that I need to prepare annotation files (JSON) for the unlabeled data and put them under "$COCODIR/annotations/semi_supervised". Is that true?
But my unlabeled data has no labels. What should I do?
Looking forward to a practical answer from anyone who can help!
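
One common workaround, assuming the dataloader only needs image entries for unlabeled data (an assumption to verify against prepare_coco_data.py), is to generate a COCO-style JSON with an empty annotations list:

# Hypothetical generator of an images-only COCO-style annotation file;
# whether STAC accepts an empty "annotations" list is an assumption.
import json, os
from PIL import Image

def write_unlabeled_json(image_dir, out_path, categories):
    images = []
    for i, name in enumerate(sorted(os.listdir(image_dir))):
        width, height = Image.open(os.path.join(image_dir, name)).size
        images.append({'id': i, 'file_name': name,
                       'width': width, 'height': height})
    with open(out_path, 'w') as f:
        json.dump({'images': images, 'annotations': [],
                   'categories': categories}, f)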

Question about Table 1 in the paper

Hi,

Thanks for providing this interesting work and releasing the code. I am curious about the implementation details in Table 1 of the main paper.

(1) Are the results shown in Table 1 produced by using the COCO2017 validation set (5000 instances)?

(2) In Table 1, does 100% COCO mean that you use the full supervised COCO2017 data as the labeled set, the external COCO2017_unlabeled data as the unlabeled set, and the COCO2017 validation set as the evaluation set?

Thank you!

Have you ever tried to generate pseudo-labels online?

In STAC, the pseudo-labels are predicted in an offline manner, i.e., a network is first trained on labeled data and then used to predict pseudo-labels; this is a multi-stage training procedure. However, in FixMatch (also your work), the pseudo-labels are generated in an online manner, i.e., within each mini-batch, which is a one-stage training procedure. Have you tried the online variant for detection?

The anchor setting for VOC is mismatched with the paper.

Hi, thanks for your awesome work!

In your paper, you mentioned that you use RPN Anchor Sizes: [8, 16, 32] for VOC.

I noticed that you configure it here:

'TRAIN.LR_SCHEDULE=[7500,40000]', 'RPN.ANCHOR_SIZES=(8,16,32)',

However, according to this file:

sizes=cfg.RPN.ANCHOR_SIZES,

However, RPN.ANCHOR_SIZES only takes effect when using ResNetC4Model, which I think you never use in your experiments; your model is ResNetFPNModel, so you would need to set FPN.ANCHOR_SIZES instead.

Correct me if I'm wrong.

dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory

Can you please help me interpret the following error? Is it a problem with the CUDA version? I am not very experienced and would like to understand it so that I can fix it and continue.

WARNING: NVIDIA binaries may not be bound with --writable
[0706 13:49:52 @voc.py:279] Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval']
[0706 13:49:52 @coco.py:271] Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100']
[0706 13:49:52 @coco.py:205] Register dataset [the same list plus every coco_train2017.{1-5}@{1,2,5,10,20,30,40,50} split and its -unlabeled counterpart, 'coco_train2017.0@100-extra', 'coco_train2017.0@100-extra-unlabeled', 'coco_unlabeled2017']
[0706 13:49:52 @coco.py:260] Register dataset [the same list plus 'coco_unlabeledtrainval20class']
[0706 13:49:52 @logger.py:138] Directory '/home/vlamp/Documents/STAC/RESULTS' backuped to '/home/vlamp/Documents/STAC/RESULTS0706-134952'
[0706 13:49:52 @logger.py:92] Argv: /home/vlamp/Documents/STAC/detection/train_stg1_bdd.py --logdir /home/vlamp/Documents/STAC/RESULTS/ --simple_path --config BACKBONE.WEIGHTS=/home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz DATA.BASEDIR=/home/vlamp/Documents/STAC/DATA_STAC/coco MODE_MASK=False FRCNN.BATCH_PER_IM=64 PREPROC.TRAIN_SHORT_EDGE_SIZE=[500,800] TRAIN.EVAL_PERIOD=20 TRAIN.AUGTYPE_LAB=default
[0706 13:49:54 @train_stg1_bdd.py:87] Environment Information:


sys.platform linux
Python 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
Tensorpack v0.10.1-9-g9c1b1b7b-dirty
Numpy 1.16.4
TensorFlow 1.14.0/v1.14.0-rc1-22-gaf24dc91b5
TF Compiler Version 4.8.5
TF CUDA support True
TF MKL support False
TF XLA support False
Nvidia Driver /.singularity.d/libs/libnvidia-ml.so
CUDA /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.243
CUDNN /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
NCCL
CUDA_VISIBLE_DEVICES 0,1
GPU 0,1 Tesla T4
Free RAM 369.15/376.54 GB
CPU Count 40
cv2 4.2.0
msgpack 1.0.0
python-prctl False


list(_C.DATA.TRAIN) = ['train2017']
list(_C.DATA.VAL) = ('val2017',)
datasets = ['train2017', 'val2017']
_C.DATA.CLASS_NAMES = ['BG', 'car', 'pedestrian', 'big vehicle', 'bicycle', 'motorcycle']
[0706 13:49:54 @config.py:352] Config: ------------------------------------------
{'BACKBONE': {'FREEZE_AFFINE': False,
'FREEZE_AT': 2,
'NORM': 'FreezeBN',
'RESNET_NUM_BLOCKS': [3, 4, 6, 3],
'STRIDE_1X1': False,
'TF_PAD_MODE': False,
'WEIGHTS': '/home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz'},
'CASCADE': {'BBOX_REG_WEIGHTS': [[10.0, 10.0, 5.0, 5.0], [20.0, 20.0, 10.0, 10.0],
[30.0, 30.0, 15.0, 15.0]],
'IOUS': [0.5, 0.6, 0.7]},
'DATA': {'ABSOLUTE_COORD': True,
'BASEDIR': '/home/vlamp/Documents/STAC/DATA_STAC/coco',
'CLASS_NAMES': ['BG', 'car', 'pedestrian', 'big vehicle', 'bicycle', 'motorcycle'],
'NUM_CATEGORY': 5,
'NUM_WORKERS': 24,
'TRAIN': ('train2017',),
'UNLABEL': ('',),
'VAL': ('val2017',)},
'EVAL': {'PSEUDO_INFERENCE': False},
'FPN': {'ANCHOR_SIZES': (32, 64, 128, 256, 512),
'ANCHOR_STRIDES': (4, 8, 16, 32, 64),
'CASCADE': False,
'FRCNN_CONV_HEAD_DIM': 256,
'FRCNN_FC_HEAD_DIM': 1024,
'FRCNN_HEAD_FUNC': 'fastrcnn_2fc_head',
'MRCNN_HEAD_FUNC': 'maskrcnn_up4conv_head',
'NORM': 'None',
'NUM_CHANNEL': 256,
'PROPOSAL_MODE': 'Level',
'RESOLUTION_REQUIREMENT': 32},
'FRCNN': {'BATCH_PER_IM': 64,
'BBOX_REG_WEIGHTS': [10.0, 10.0, 5.0, 5.0],
'FG_RATIO': 0.25,
'FG_THRESH': 0.5},
'MODE_FPN': True,
'MODE_MASK': False,
'MRCNN': {'ACCURATE_PASTE': True, 'HEAD_DIM': 256},
'PREPROC': {'MAX_SIZE': 1344.0,
'PIXEL_MEAN': [123.675, 116.28, 103.53],
'PIXEL_STD': [58.395, 57.12, 57.375],
'TEST_SHORT_EDGE_SIZE': 800,
'TRAIN_SHORT_EDGE_SIZE': [500, 800]},
'RPN': {'ANCHOR_RATIOS': (0.5, 1.0, 2.0),
'ANCHOR_SIZES': (32, 64, 128, 256, 512),
'ANCHOR_STRIDE': 16,
'BATCH_PER_IM': 256,
'CROWD_OVERLAP_THRESH': 9.99,
'FG_RATIO': 0.5,
'HEAD_DIM': 1024,
'MIN_SIZE': 0,
'NEGATIVE_ANCHOR_THRESH': 0.3,
'NUM_ANCHOR': 15,
'POSITIVE_ANCHOR_THRESH': 0.7,
'PROPOSAL_NMS_THRESH': 0.7,
'TEST_PER_LEVEL_NMS_TOPK': 1000,
'TEST_POST_NMS_TOPK': 1000,
'TEST_PRE_NMS_TOPK': 6000,
'TRAIN_PER_LEVEL_NMS_TOPK': 2000,
'TRAIN_POST_NMS_TOPK': 2000,
'TRAIN_PRE_NMS_TOPK': 12000},
'TEST': {'FRCNN_NMS_THRESH': 0.5,
'RESULTS_PER_IM': 100,
'RESULT_SCORE_THRESH': 0.05,
'RESULT_SCORE_THRESH_VIS': 0.5},
'TRAIN': {'AUGTYPE': 'strong',
'AUGTYPE_LAB': 'default',
'BASE_LR': 0.01,
'CHECKPOINT_PERIOD': 20,
'CONFIDENCE': 0.9,
'EVAL_PERIOD': 20,
'GAMMA': 0.1,
'LR_SCHEDULE': [120000, 160000, 180000],
'NO_PRN_LOSS': False,
'NUM_GPUS': 2,
'STAGE': 1,
'STARTING_EPOCH': 1,
'STEPS_PER_EPOCH': 500,
'WARMUP': 1000,
'WARMUP_INIT_LR': 0.0033000000000000004,
'WEIGHT_DECAY': 0.0001,
'WU': 2.0},
'TRAINER': 'replicated'}
[0706 13:49:54 @train_stg1_bdd.py:106] Warm Up Schedule (steps, value): [(0, 0.0033000000000000004), (1000, 0.01)]
[0706 13:49:54 @train_stg1_bdd.py:107] LR Schedule (epochs, value): [(2, 0.01), (960.0, 0.001), (1280.0, 0.00010000000000000002)]
loading annotations into memory...
Done (t=5.18s)
creating index...
index created!
[0706 13:49:59 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_train2017.json.

100%|##########| 69403/69403 [00:03<00:00, 20915.84it/s]
[0706 13:50:03 @timer.py:45] Load annotations for instances_train2017.json finished, time:3.3659 sec.
[0706 13:50:05 @data.py:79] Ground-Truth category distribution:
| class | #box | class | #box | class | #box |
|:-------:|:-------|:----------:|:-------|:-----------:|:-------|
| car | 713210 | pedestrian | 91349 | big vehicle | 41643 |
| bicycle | 7210 | motorcycle | 3002 | | |
| total | 856414 | | | | |
[0706 13:50:05 @data.py:416] Filtered 0 images which contain no non-crowd groudtruth boxes. Total #images for training: 69403
[0706 13:50:05 @augmentation.py:171] ----------------------------------------------------------------------------------------------------
[0706 13:50:05 @augmentation.py:172] Augmentation type default: []
[0706 13:50:05 @augmentation.py:173] ----------------------------------------------------------------------------------------------------
[0706 13:50:05 @data.py:107] Use affine-enabled TrainingDataPreprocessor_aug
[0706 13:50:05 @train_stg1_bdd.py:112] Total passes of the training set is: 20.748
[0706 13:50:05 @sessinit.py:294] Loading dictionary from /home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz ...
[0706 13:50:06 @training.py:48] [DataParallel] Training a model of 2 towers.
[0706 13:50:06 @interface.py:41] Automatically applying StagingInput on the DataFlow.
[0706 13:50:06 @input_source.py:221] Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...
[0706 13:50:06 @training.py:108] Building graph for training tower 0 on device /gpu:0 ...
[0706 13:50:06 @argtools.py:138] WRN Some BatchNorm layer uses moving_mean/moving_variance in training.
[0706 13:50:06 @registry.py:90] 'conv0': [1, 3, ?, ?] --> [1, 64, ?, ?]
(... per-layer shape log for the ResNet-50 backbone, FPN, RPN, and fastrcnn heads ...)
[0706 13:50:10 @registry.py:93] 'fastrcnn/outputs' output: [?, 6], [?, 6, 4]
[0706 13:50:10 @regularize.py:97] regularize_cost() found 57 variables to regularize.
[0706 13:50:10 @regularize.py:21] The following tensors will be regularized: (57 conv/fc weight tensors, W:0)
[0706 13:50:12 @training.py:108] Building graph for training tower 1 on device /gpu:1 ...
[0706 13:50:14 @regularize.py:97] regularize_cost() found 57 variables to regularize.
[0706 13:50:16 @collection.py:152] Size of these collections were changed in tower1: (tf.GraphKeys.MODEL_VARIABLES: 161->194)
[0706 13:50:16 @collection.py:165] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 76->77)
[0706 13:50:20 @training.py:350] 'sync_variables_from_main_tower' includes 607 operations.
[0706 13:50:20 @model_utils.py:67] List of Trainable Variables:
name shape #elements


group1/block0/conv1/W [1, 1, 256, 128] 32768
(... per-variable shape table for groups 1-3, FPN, RPN, and fastrcnn heads ...)
fastrcnn/outputs/box/b [24] 24
Number of trainable variables: 156
Number of parameters (elements): 41147437
Storage space needed for all trainable variables: 156.97MB
[0706 13:50:20 @base.py:207] Setup callbacks graph ...

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[0706 13:50:27 @argtools.py:138] WRN "import prctl" failed! Install python-prctl so that processes can be cleaned with guarantee.
[0706 13:50:29 @prof.py:291] [HostMemoryTracker] Free RAM in setup_graph() is 364.27 GB.
[0706 13:50:29 @tower.py:135] Building graph for predict tower 'tower-pred-0' on device /gpu:0 ...
[0706 13:50:30 @collection.py:152] Size of these collections were changed in tower-pred-0: (tf.GraphKeys.MODEL_VARIABLES: 194->227)
[0706 13:50:30 @collection.py:165] These collections were modified but restored in tower-pred-0: (tf.GraphKeys.SUMMARIES: 76->77)
[0706 13:50:30 @tower.py:135] Building graph for predict tower 'tower-pred-1' on device /gpu:1 with variable scope 'tower1'...
[0706 13:50:31 @collection.py:152] Size of these collections were changed in tower-pred-1: (tf.GraphKeys.MODEL_VARIABLES: 227->260)
[0706 13:50:31 @collection.py:165] These collections were modified but restored in tower-pred-1: (tf.GraphKeys.SUMMARIES: 76->77)
loading annotations into memory...
Done (t=0.75s)
creating index...
index created!
[0706 13:50:31 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.
100%|##########| 9921/9921 [00:00<00:00, 725119.19it/s]
[0706 13:50:31 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0151 sec.
[0706 13:50:31 @data.py:456] Found 9921 images for inference.
(the same instances_val2017.json loading block repeats three more times)
[0706 13:50:34 @summary.py:47] [MovingAverageSummary] 73 operations in collection 'MOVING_SUMMARY_OPS' will be run with session hooks.
[0706 13:50:34 @summary.py:94] Summarizing collection 'summaries' of size 76.
[0706 13:50:34 @base.py:228] Creating the session ...
2020-07-06 13:50:34.737615: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-06 13:50:34.743032: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-07-06 13:50:34.887781: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14c78d20 executing computations on platform CUDA. Devices:
2020-07-06 13:50:34.887822: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-07-06 13:50:34.887827: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): Tesla T4, Compute Capability 7.5
2020-07-06 13:50:34.890055: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2494125000 Hz
2020-07-06 13:50:34.893901: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14a0c4f0 executing computations on platform Host. Devices:
2020-07-06 13:50:34.893919: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2020-07-06 13:50:34.896069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:3b:00.0
2020-07-06 13:50:34.896771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:d8:00.0
2020-07-06 13:50:34.897783: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898069: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898242: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898401: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898538: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898705: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.901746: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-07-06 13:50:34.901764: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-07-06 13:50:34.901834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-06 13:50:34.901840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1
2020-07-06 13:50:34.901845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y
2020-07-06 13:50:34.901848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N

MultiProcessMapDataZMQ successfully cleaned-up.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node AllReduceGrads/NcclAllReduce}}with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
device='GPU'

 [[AllReduceGrads/NcclAllReduce]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/vlamp/Documents/STAC/detection/train_stg1_bdd.py", line 180, in
launch_train_with_config(traincfg, trainer)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/interface.py", line 99, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 342, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 313, in train
self.initialize(session_creator, session_init)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/tower.py", line 147, in initialize
super(TowerTrainer, self).initialize(session_creator, session_init)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 230, in initialize
self.sess = session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py", line 88, in create_session
run(tf.global_variables_initializer())
File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py", line 86, in run
sess.run(op)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node AllReduceGrads/NcclAllReduce (defined at usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/utils.py:154) with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
device='GPU'

 [[AllReduceGrads/NcclAllReduce]]

Errors may have originated from an input operation.
Input Source operations connected to node AllReduceGrads/NcclAllReduce:
tower0/gradients/AddN_126 (defined at usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/optimizer.py:29)
/cm/local/apps/slurm/var/spool/job18434303/slurm_script: line 29: t: command not found

Some question about Table 3 in paper

Hi, I'm pretty excited to see such excellent work! I'm currently trying to reproduce the results but have met some problems with Table 3.

To be more specific, I use 100% COCO under the standard 1x setting, training with C+{G,B}, but obtain a result of 34 mAP, which is much lower than yours (36.39). However, separately using the Color augmentation or Cutout doesn't seem to bring a negative impact, so I think there might be a problem with the geometric transforms.

Could you please share the config for the 36.39 result? Or could you tell me whether the geometry augmentation policy in Table 3 and Table 1 is exactly the same (rotation angle, translate percent, etc.)? Thank you very much for your time and consideration.

About Cutout size

Hi!

I found that in your code you set cutout_op as iaa.Cutout(nb_iterations=(1, 5), size=[0, 0.2], squared=True). Here size=[0, 0.2] means the size is either 0 or 0.2 (under imgaug 0.4.0), which differs from what is described in the paper.

Maybe we should change it to iaa.Cutout(nb_iterations=(1, 5), size=(0, 0.2), squared=True)?
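
For what it's worth, imgaug does treat the two differently for stochastic parameters: a tuple denotes a continuous uniform range, while a list denotes a discrete set of choices, so the two calls below behave differently:

import imgaug.augmenters as iaa

# List: size is drawn from the discrete set {0, 0.2}.
cutout_discrete = iaa.Cutout(nb_iterations=(1, 5), size=[0, 0.2], squared=True)

# Tuple: size is drawn uniformly from the continuous interval [0, 0.2],
# matching the behavior described in the paper.
cutout_uniform = iaa.Cutout(nb_iterations=(1, 5), size=(0, 0.2), squared=True)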
