DenseCL's Issues

Semantic segmentation on PASCAL VOC

@WXinlong,

Thanks for sharing your great work!
I was able to reproduce your object detection result on Pascal VOC.
However, when I tested semantic segmentation on Pascal VOC using your pre-trained model on ImageNet1k "densecl_r50_imagenet_200ep.pth", I got mIoU 0.62, which is worse than the 0.69 reported in your paper. My test procedure is explained below,

  1. Install your modified mmsegmentation
  2. Download "densecl_r50_imagenet_200ep.pth" from your website
  3. Update the 5th line of code in fcn_r50-d8.py to pretrained='/pretrained/densecl_r50_imagenet_200ep.pth' (see the sketch after this list)
  4. Run ./tools/dist_train.sh configs/densecl/fcn_r50-d8_512x512_20k_voc12aug.py 2 --work-dir models/fcn_r50-d8_512x512_20k_voc12aug (running on 2 GPUs)
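
For reference, a minimal sketch of the edit in step 3, assuming an older-style mmsegmentation config where `pretrained` sits at the model level; the exact line and layout depend on the fork being used:

    # fcn_r50-d8.py (illustrative only; layout depends on the mmsegmentation version)
    model = dict(
        type='EncoderDecoder',
        pretrained='/pretrained/densecl_r50_imagenet_200ep.pth',  # DenseCL checkpoint
        backbone=dict(type='ResNetV1c', depth=50),
        # ... decode_head and the rest of the config unchanged
    )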

I got a result of mIoU 0.62, mAcc: 0.75, aAcc: 0.91 at the end of the training. I ran 3 rounds and got similar results. Attached are my configuration file and training log.

Do you know a possible reason?
Thanks!

20230213_204945.log
fcn_r50-d8_512x512_20k_voc12aug.zip

The performance of detection in VOC

(8 GPUs) When I use the network pretrained with coco-800ep-resnet50 for the detection task on VOC, the AP is only 44.76, while you report 56.7. I don't know why the gap is so large. Note that I changed the batch size from 16 to 8 and, accordingly, lowered the base lr from 0.02 to 0.01.
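
For reference, the linear LR scaling rule being applied here (a common convention rather than a DenseCL-specific requirement) can be written as:

    # Linear LR scaling: scale the base LR with the global batch size.
    def scaled_lr(base_lr=0.02, base_batch=16, batch=8):
        return base_lr * batch / base_batch

    print(scaled_lr())  # 0.01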

The performance of detection in COCO

Based on MMDetection, trained on COCO2017 train and evaluated on COCO2017 val.

Faster R-CNN, R50 from torchvision://resnet50

       1x: bbox_mAP: 0.3750

Faster R-CNN, R50 from my reproduced model pretrained on ImageNet

       1x: bbox_mAP: 0.3580

Faster R-CNN, R50 from your pretrained model on ImageNet

       1x: bbox_mAP: 0.3550

These results are not as good as expected. Could you help?

Training a pretrained model on the object detection task on a single GPU

Hi @WXinlong thanks for the wonderful work.

I want to train the pre-trained model on the downstream task of object detection. I used the MoCo v2 pre-trained model (800 epochs) from here.

I followed this process:
Step 1: Install detectron2.

Step 2: Convert a pre-trained MoCo model to detectron2's format:

python3 convert-pretrain-to-detectron2.py input.pth.tar output.pkl
Put dataset under "./datasets" directory, following the directory structure required by detectron2.

Step 3: Run training:

python train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml \
 --num-gpus 1 MODEL.WEIGHTS ./output.pkl

The only change I made was using a single GPU rather than 8 GPUs.

I am getting the following error:

[08/31 12:42:12] fvcore.common.checkpoint WARNING: Some model parameters or buffers are not found in the checkpoint:
  proposal_generator.rpn_head.anchor_deltas.{bias, weight}
  proposal_generator.rpn_head.conv.{bias, weight}
  proposal_generator.rpn_head.objectness_logits.{bias, weight}
  roi_heads.box_predictor.bbox_pred.{bias, weight}
  roi_heads.box_predictor.cls_score.{bias, weight}
  roi_heads.res5.norm.{bias, running_mean, running_var, weight}
[08/31 12:42:12] fvcore.common.checkpoint WARNING: The checkpoint state_dict contains keys that are not used by the model:
  stem.fc.0.{bias, weight}
  stem.fc.2.{bias, weight}
[08/31 12:42:12] d2.engine.train_loop INFO: Starting training from iteration 0
[08/31 12:42:13] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/defaults.py", line 493, in run_step
    self._trainer.run_step()
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 154, in forward
    features = self.backbone(images.tensor)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/layers/wrappers.py", line 88, in forward
    x = self.norm(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
    return _get_group_size(group)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    assert _default_pg is not None, \
AssertionError: Default process group is not initialized
[08/31 12:42:13] d2.engine.hooks INFO: Total training time: 0:00:00 (0:00:00 on hooks)
[08/31 12:42:13] d2.utils.events INFO:  iter: 0    lr: N/A  max_mem: 207M

How can we run the training on a single GPU?
Attached are the logs for details:
log 3.23.54 PM.txt
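
A hedged workaround sketch (not an official fix): the traceback shows SyncBatchNorm asking for a distributed process group, which detectron2 does not initialize when only a single process is launched. One option is to override the norm layers to plain BN via the config, assuming the detection config sets MODEL.RESNETS.NORM to "SyncBN" as the MoCo/DenseCL detectron2 configs do:

    # Hedged sketch: switch SyncBN to plain BN so a single, non-distributed
    # process can run the backbone.
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file("configs/pascal_voc_R_50_C4_24k_moco.yaml")
    cfg.merge_from_list([
        "MODEL.WEIGHTS", "./output.pkl",
        "MODEL.RESNETS.NORM", "BN",   # plain BatchNorm instead of SyncBN
    ])

The same override can be passed as trailing options to train_net.py; note that plain BN with a small per-GPU batch may behave differently from the multi-GPU SyncBN setting.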

Have you tried on keypoint matching task?

Since the model is trained with a dense matching loss, it would be natural to evaluate its performance on the keypoint matching task and compare it with the state of the art. May I know whether you have conducted such experiments or plan to? Thank you!

GPU training problem

Are the weights trained on 2 GPUs different from those trained on 8 GPUs when used for downstream tasks, given that the overall batch size differs? Hope to get a reply.

How to get negative key t_

In the paper, each negative key t− is the pooled feature vector of a view from a different image. I still don't understand the exact meaning of 'pooled feature vector'. Can you explain it? Thank you.
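
One reading of that phrase (my own illustration, not the authors' code): the pooled feature vector is the backbone feature map of a view collapsed by global average pooling, giving a single vector per view:

    import torch
    import torch.nn.functional as F

    feat = torch.randn(1, 2048, 7, 7)                   # backbone output of one view
    t_neg = F.adaptive_avg_pool2d(feat, 1).flatten(1)   # (1, 2048) pooled feature vector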

About the dense correspondence

Hi, thanks for your contribution, very interesting approach!

Have you tried to compute the dense correspondence directly from the geometric transformation (resize / crop / flip) between the views?

Neck weights

I only found the ResNet (backbone) weights after pretraining; it would be useful to have access to the neck weights as well.

Is that possible?

Thanks for your work

The checkpoint with neck.

Hello, thanks for your interesting work. I notice that the checkpoints you released only contain the backbone. I would like a checkpoint that also includes the neck and the other training state. Could you release one (pretrained on ImageNet for 200 epochs)?

[Err]: RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 279, in forward
    return self.forward_train(img, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 200, in forward_train
    im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 132, in _batch_shuffle_ddp
    x_gather = concat_all_gather(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 297, in concat_all_gather
    for _ in range(torch.distributed.get_world_size())
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
python-BaseException
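
A hedged note: DenseCL's _batch_shuffle_ddp / concat_all_gather assume torch.distributed has already been initialized, which tools/dist_train.sh does even for a single GPU. If you must run without that launcher, the equivalent single-process initialization looks roughly like:

    # Hedged sketch: manually create the default process group for a single
    # process so the distributed collectives in densecl.py can run.
    import os
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", rank=0, world_size=1)  # use "gloo" on CPU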

Is it possible to gain dense correspondence from the known data augmentation?

Hi, Thank you very much for the nice work!

I have a question about the dense correspondence between views. In the paper, the correspondence is obtained by computing the similarity between feature vectors from the backbone. Since the data augmentation (e.g. rotation, cropping, flipping) applied to each view of the same image is known, it should be possible to obtain the correspondence directly from these transformations.

For example, suppose Image A is a left-right flipped copy of Image B. The two images are encoded into 3x3 feature maps, which can be represented as:

fa1, fa2, fa3
fa4, fa5, fa6
fa7, fa8, fa9

and

fb1, fb2, fb3
fb4, fb5, fb6
fb7, fb8, fb9

Since A and B are flipped views of the same image, the correspondence could be (fa1, fb3), (fa2, fb2), (fa3, fb1), ... .

From my perspective, the transformation-based correspondence is more straightforward, but the paper doesn't use it. Is there any intuition behind this choice?

Thank you again!
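
For context, the matching the paper describes (and which this question contrasts with transformation-based correspondence) amounts to an argmax over pairwise cosine similarities of the backbone features; a minimal sketch of my own:

    import torch
    import torch.nn.functional as F

    fa = F.normalize(torch.randn(9, 128), dim=1)  # 3x3 grid of view A, flattened
    fb = F.normalize(torch.randn(9, 128), dim=1)  # 3x3 grid of view B, flattened
    sim = fa @ fb.t()                             # (9, 9) pairwise cosine similarities
    match = sim.argmax(dim=1)                     # matched index in B for each cell of A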

KeyError: 'GaussianBlur is already registered in pipeline'

Hi,
I am trying to run the code to train self-supervised on COCO (train2017). I tried installing several times following the instructions, but when I run training it keeps printing KeyError: 'GaussianBlur is already registered in pipeline' and then stops immediately.

Command: bash tools/dist_train.sh configs/selfsup/densecl/densecl_coco_800ep.py 8

I am using torch version 1.7.1, CUDA 9.2. torch.cuda.is_available() = True

Have you tried reproducing the results on a completely new machine and encountered this error?

Could you give me some suggestions on this bug?

Training speed

Could you provide the training log? My training process is extremely slow. Thank you.

Performance of Semantic Segmentation on Pascal VOC

Hi, I tried to reproduce your results on VOC semantic segmentation, but only got mIoU = 46.87 (while you report 69.4).
Can you give me some help?

Here are the steps I followed for reproduction.

  1. Download your pretrained model DenseCL IN/200Epoch
  2. Follow your steps in README.md

I did not modify any settings for batch size or learning rate.
Is there anything I have missed?

dataset preparing

I've downloaded ImageNet from Kaggle and can't find train.txt. Could you tell me where to download this file? @WXinlong
Here are the files that can be downloaded from Kaggle (screenshot attached).
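
A hedged sketch for generating such a file list yourself from an ImageNet-style folder layout (one class per sub-directory); the exact format OpenSelfSup expects may include an extra label column, so treat this as a starting point rather than the official recipe:

    import os

    root = "data/imagenet/train"   # hypothetical path to the extracted training images
    with open("data/imagenet/meta/train.txt", "w") as f:
        for cls in sorted(os.listdir(root)):
            for name in sorted(os.listdir(os.path.join(root, cls))):
                f.write(f"{cls}/{name}\n")   # one relative image path per line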

The performance of DenseCL on classification task

Hi, @WXinlong . Thanks for the great work.
Since the paper states that the proposed method mainly targets dense prediction tasks (e.g., detection and segmentation), I wonder whether you have tried DenseCL on the classification task and, if so, what the performance is.

Evaluation setting on Semantic segmentation

Thanks for your outstanding work. I have a question about the evaluation setting for semantic segmentation.
Did you use "two extra 3×3 convolutions of 256 channels, with BN and ReLU, and then a 1×1 convolution for per-pixel classification. The total stride is 16 (FCN-16s [43]). We set dilation = 6 in the two extra 3×3 convolutions, following the large field-of-view design in [6]" during the evaluation of semantic segmentation (the same as in MoCo), or just a classic FCN?
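
To make the quoted setting concrete, here is a hedged sketch of the head it describes (two dilated 3×3 convs with BN and ReLU, then a 1×1 per-pixel classifier); whether DenseCL used exactly this head or a classic FCN is precisely the question, and the channel counts here are assumptions:

    import torch.nn as nn

    def eval_head(in_ch=2048, mid_ch=256, num_classes=21, dilation=6):
        # two 3x3 convs (dilation=6) with BN + ReLU, then a 1x1 classifier
        return nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, num_classes, 1),
        )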

About the loss of Denscl

I tried training with your algorithm and found the loss a bit strange: it rose from 8.0 at the beginning to 9.3 and then slowly dropped to 7.3. What could be the reason? Is this normal?

Semi-supervised object detection

Hi, thanks for your excellent work! Could you kindly release the corresponding 10% training data list and config for semi-supervised object detection in Table 3 of your paper? Thanks in advance!

Details about loss_lambda warmup

Thank you for your great work.
Could you give the implementation details or code for the loss_lambda warm-up setting described in the DenseCL paper?
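
For reference, one possible shape of such a warm-up (a linear ramp of the loss weight over the first iterations); the numbers here are placeholders, since the paper's exact schedule is precisely what is being asked for:

    # Placeholder sketch of a linear loss_lambda warm-up; values are illustrative.
    def loss_lambda(step, warmup_iters=1000, final_lambda=0.5):
        if step >= warmup_iters:
            return final_lambda
        return final_lambda * step / warmup_iters

    # total loss: (1 - lam) * loss_single + lam * loss_dense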

The model and loaded state dict do not match exactly

Hi, when I try to use extract.py to extract features, I download the pretrained model from the link and run it, but it prints the following:

The model and loaded state dict do not match exactly

unexpected key in source state_dict: conv1.weight, bn1.weight, bn1.bias, bn1.running_mean, bn1.running_var, bn1.num_batches_tracked, layer1.0.conv1.weight, layer1.0.bn1.weight, layer1.0.bn1.bias, layer1.0.bn1.running_mean, layer1.0.bn1.running_var, layer1.0.bn1.num_batches_tracked, layer1.0.conv2.weight, layer1.0.bn2.weight, layer1.0.bn2.bias, layer1.0.bn2.running_mean, layer1.0.bn2.running_var, layer1.0.bn2.num_batches_tracked, layer1.0.conv3.weight, layer1.0.bn3.weight, layer1.0.bn3.bias, layer1.0.bn3.running_mean, layer1.0.bn3.running_var, layer1.0.bn3.num_batches_tracked, layer1.0.downsample.0.weight, layer1.0.downsample.1.weight, layer1.0.downsample.1.bias, layer1.0.downsample.1.running_mean, layer1.0.downsample.1.running_var, layer1.0.downsample.1.num_batches_tracked, layer1.1.conv1.weight, layer1.1.bn1.weight, layer1.1.bn1.bias, layer1.1.bn1.running_mean, layer1.1.bn1.running_var, layer1.1.bn1.num_batches_tracked, layer1.1.conv2.weight, layer1.1.bn2.weight, layer1.1.bn2.bias, layer1.1.bn2.running_mean, layer1.1.bn2.running_var, layer1.1.bn2.num_batches_tracked, layer1.1.conv3.weight, layer1.1.bn3.weight, layer1.1.bn3.bias, layer1.1.bn3.running_mean, layer1.1.bn3.running_var, layer1.1.bn3.num_batches_tracked, layer1.2.conv1.weight, layer1.2.bn1.weight, layer1.2.bn1.bias, layer1.2.bn1.running_mean, layer1.2.bn1.running_var, layer1.2.bn1.num_batches_tracked, layer1.2.conv2.weight, layer1.2.bn2.weight, layer1.2.bn2.bias, layer1.2.bn2.running_mean, layer1.2.bn2.running_var, layer1.2.bn2.num_batches_tracked, layer1.2.conv3.weight, layer1.2.bn3.weight, layer1.2.bn3.bias, layer1.2.bn3.running_mean, layer1.2.bn3.running_var, layer1.2.bn3.num_batches_tracked, layer2.0.conv1.weight, layer2.0.bn1.weight, layer2.0.bn1.bias, layer2.0.bn1.running_mean, layer2.0.bn1.running_var, layer2.0.bn1.num_batches_tracked, layer2.0.conv2.weight, layer2.0.bn2.weight, layer2.0.bn2.bias, layer2.0.bn2.running_mean, layer2.0.bn2.running_var, layer2.0.bn2.num_batches_tracked, layer2.0.conv3.weight, layer2.0.bn3.weight, layer2.0.bn3.bias, layer2.0.bn3.running_mean, layer2.0.bn3.running_var, layer2.0.bn3.num_batches_tracked, layer2.0.downsample.0.weight, layer2.0.downsample.1.weight, layer2.0.downsample.1.bias, layer2.0.downsample.1.running_mean, layer2.0.downsample.1.running_var, layer2.0.downsample.1.num_batches_tracked, layer2.1.conv1.weight, layer2.1.bn1.weight, layer2.1.bn1.bias, layer2.1.bn1.running_mean, layer2.1.bn1.running_var, layer2.1.bn1.num_batches_tracked, layer2.1.conv2.weight, layer2.1.bn2.weight, layer2.1.bn2.bias, layer2.1.bn2.running_mean, layer2.1.bn2.running_var, layer2.1.bn2.num_batches_tracked, layer2.1.conv3.weight, layer2.1.bn3.weight, layer2.1.bn3.bias, layer2.1.bn3.running_mean, layer2.1.bn3.running_var, layer2.1.bn3.num_batches_tracked, layer2.2.conv1.weight, layer2.2.bn1.weight, layer2.2.bn1.bias, layer2.2.bn1.running_mean, layer2.2.bn1.running_var, layer2.2.bn1.num_batches_tracked, layer2.2.conv2.weight, layer2.2.bn2.weight, layer2.2.bn2.bias, layer2.2.bn2.running_mean, layer2.2.bn2.running_var, layer2.2.bn2.num_batches_tracked, layer2.2.conv3.weight, layer2.2.bn3.weight, layer2.2.bn3.bias, layer2.2.bn3.running_mean, layer2.2.bn3.running_var, layer2.2.bn3.num_batches_tracked, layer2.3.conv1.weight, layer2.3.bn1.weight, layer2.3.bn1.bias, layer2.3.bn1.running_mean, layer2.3.bn1.running_var, layer2.3.bn1.num_batches_tracked, layer2.3.conv2.weight, layer2.3.bn2.weight, layer2.3.bn2.bias, layer2.3.bn2.running_mean, layer2.3.bn2.running_var, layer2.3.bn2.num_batches_tracked, 
layer2.3.conv3.weight, layer2.3.bn3.weight, layer2.3.bn3.bias, layer2.3.bn3.running_mean, layer2.3.bn3.running_var, layer2.3.bn3.num_batches_tracked, layer3.0.conv1.weight, layer3.0.bn1.weight, layer3.0.bn1.bias, layer3.0.bn1.running_mean, layer3.0.bn1.running_var, layer3.0.bn1.num_batches_tracked, layer3.0.conv2.weight, layer3.0.bn2.weight, layer3.0.bn2.bias, layer3.0.bn2.running_mean, layer3.0.bn2.running_var, layer3.0.bn2.num_batches_tracked, layer3.0.conv3.weight, layer3.0.bn3.weight, layer3.0.bn3.bias, layer3.0.bn3.running_mean, layer3.0.bn3.running_var, layer3.0.bn3.num_batches_tracked, layer3.0.downsample.0.weight, layer3.0.downsample.1.weight, layer3.0.downsample.1.bias, layer3.0.downsample.1.running_mean, layer3.0.downsample.1.running_var, layer3.0.downsample.1.num_batches_tracked, layer3.1.conv1.weight, layer3.1.bn1.weight, layer3.1.bn1.bias, layer3.1.bn1.running_mean, layer3.1.bn1.running_var, layer3.1.bn1.num_batches_tracked, layer3.1.conv2.weight, layer3.1.bn2.weight, layer3.1.bn2.bias, layer3.1.bn2.running_mean, layer3.1.bn2.running_var, layer3.1.bn2.num_batches_tracked, layer3.1.conv3.weight, layer3.1.bn3.weight, layer3.1.bn3.bias, layer3.1.bn3.running_mean, layer3.1.bn3.running_var, layer3.1.bn3.num_batches_tracked, layer3.2.conv1.weight, layer3.2.bn1.weight, layer3.2.bn1.bias, layer3.2.bn1.running_mean, layer3.2.bn1.running_var, layer3.2.bn1.num_batches_tracked, layer3.2.conv2.weight, layer3.2.bn2.weight, layer3.2.bn2.bias, layer3.2.bn2.running_mean, layer3.2.bn2.running_var, layer3.2.bn2.num_batches_tracked, layer3.2.conv3.weight, layer3.2.bn3.weight, layer3.2.bn3.bias, layer3.2.bn3.running_mean, layer3.2.bn3.running_var, layer3.2.bn3.num_batches_tracked, layer3.3.conv1.weight, layer3.3.bn1.weight, layer3.3.bn1.bias, layer3.3.bn1.running_mean, layer3.3.bn1.running_var, layer3.3.bn1.num_batches_tracked, layer3.3.conv2.weight, layer3.3.bn2.weight, layer3.3.bn2.bias, layer3.3.bn2.running_mean, layer3.3.bn2.running_var, layer3.3.bn2.num_batches_tracked, layer3.3.conv3.weight, layer3.3.bn3.weight, layer3.3.bn3.bias, layer3.3.bn3.running_mean, layer3.3.bn3.running_var, layer3.3.bn3.num_batches_tracked, layer3.4.conv1.weight, layer3.4.bn1.weight, layer3.4.bn1.bias, layer3.4.bn1.running_mean, layer3.4.bn1.running_var, layer3.4.bn1.num_batches_tracked, layer3.4.conv2.weight, layer3.4.bn2.weight, layer3.4.bn2.bias, layer3.4.bn2.running_mean, layer3.4.bn2.running_var, layer3.4.bn2.num_batches_tracked, layer3.4.conv3.weight, layer3.4.bn3.weight, layer3.4.bn3.bias, layer3.4.bn3.running_mean, layer3.4.bn3.running_var, layer3.4.bn3.num_batches_tracked, layer3.5.conv1.weight, layer3.5.bn1.weight, layer3.5.bn1.bias, layer3.5.bn1.running_mean, layer3.5.bn1.running_var, layer3.5.bn1.num_batches_tracked, layer3.5.conv2.weight, layer3.5.bn2.weight, layer3.5.bn2.bias, layer3.5.bn2.running_mean, layer3.5.bn2.running_var, layer3.5.bn2.num_batches_tracked, layer3.5.conv3.weight, layer3.5.bn3.weight, layer3.5.bn3.bias, layer3.5.bn3.running_mean, layer3.5.bn3.running_var, layer3.5.bn3.num_batches_tracked, layer4.0.conv1.weight, layer4.0.bn1.weight, layer4.0.bn1.bias, layer4.0.bn1.running_mean, layer4.0.bn1.running_var, layer4.0.bn1.num_batches_tracked, layer4.0.conv2.weight, layer4.0.bn2.weight, layer4.0.bn2.bias, layer4.0.bn2.running_mean, layer4.0.bn2.running_var, layer4.0.bn2.num_batches_tracked, layer4.0.conv3.weight, layer4.0.bn3.weight, layer4.0.bn3.bias, layer4.0.bn3.running_mean, layer4.0.bn3.running_var, layer4.0.bn3.num_batches_tracked, layer4.0.downsample.0.weight, 
layer4.0.downsample.1.weight, layer4.0.downsample.1.bias, layer4.0.downsample.1.running_mean, layer4.0.downsample.1.running_var, layer4.0.downsample.1.num_batches_tracked, layer4.1.conv1.weight, layer4.1.bn1.weight, layer4.1.bn1.bias, layer4.1.bn1.running_mean, layer4.1.bn1.running_var, layer4.1.bn1.num_batches_tracked, layer4.1.conv2.weight, layer4.1.bn2.weight, layer4.1.bn2.bias, layer4.1.bn2.running_mean, layer4.1.bn2.running_var, layer4.1.bn2.num_batches_tracked, layer4.1.conv3.weight, layer4.1.bn3.weight, layer4.1.bn3.bias, layer4.1.bn3.running_mean, layer4.1.bn3.running_var, layer4.1.bn3.num_batches_tracked, layer4.2.conv1.weight, layer4.2.bn1.weight, layer4.2.bn1.bias, layer4.2.bn1.running_mean, layer4.2.bn1.running_var, layer4.2.bn1.num_batches_tracked, layer4.2.conv2.weight, layer4.2.bn2.weight, layer4.2.bn2.bias, layer4.2.bn2.running_mean, layer4.2.bn2.running_var, layer4.2.bn2.num_batches_tracked, layer4.2.conv3.weight, layer4.2.bn3.weight, layer4.2.bn3.bias, layer4.2.bn3.running_mean, layer4.2.bn3.running_var, layer4.2.bn3.num_batches_tracked

missing keys in source state_dict: queue, queue_ptr, queue2, queue2_ptr, encoder_q.0.conv1.weight, encoder_q.0.bn1.weight, encoder_q.0.bn1.bias, encoder_q.0.bn1.running_mean, encoder_q.0.bn1.running_var, encoder_q.0.layer1.0.conv1.weight, encoder_q.0.layer1.0.bn1.weight, encoder_q.0.layer1.0.bn1.bias, encoder_q.0.layer1.0.bn1.running_mean, encoder_q.0.layer1.0.bn1.running_var, encoder_q.0.layer1.0.conv2.weight, encoder_q.0.layer1.0.bn2.weight, encoder_q.0.layer1.0.bn2.bias, encoder_q.0.layer1.0.bn2.running_mean, encoder_q.0.layer1.0.bn2.running_var, encoder_q.0.layer1.0.conv3.weight, encoder_q.0.layer1.0.bn3.weight, encoder_q.0.layer1.0.bn3.bias, encoder_q.0.layer1.0.bn3.running_mean, encoder_q.0.layer1.0.bn3.running_var, encoder_q.0.layer1.0.downsample.0.weight, encoder_q.0.layer1.0.downsample.1.weight, encoder_q.0.layer1.0.downsample.1.bias, encoder_q.0.layer1.0.downsample.1.running_mean, encoder_q.0.layer1.0.downsample.1.running_var, encoder_q.0.layer1.1.conv1.weight, encoder_q.0.layer1.1.bn1.weight, encoder_q.0.layer1.1.bn1.bias, encoder_q.0.layer1.1.bn1.running_mean, encoder_q.0.layer1.1.bn1.running_var, encoder_q.0.layer1.1.conv2.weight, encoder_q.0.layer1.1.bn2.weight, encoder_q.0.layer1.1.bn2.bias, encoder_q.0.layer1.1.bn2.running_mean, encoder_q.0.layer1.1.bn2.running_var, encoder_q.0.layer1.1.conv3.weight, encoder_q.0.layer1.1.bn3.weight, encoder_q.0.layer1.1.bn3.bias, encoder_q.0.layer1.1.bn3.running_mean, encoder_q.0.layer1.1.bn3.running_var, encoder_q.0.layer1.2.conv1.weight, encoder_q.0.layer1.2.bn1.weight, encoder_q.0.layer1.2.bn1.bias, encoder_q.0.layer1.2.bn1.running_mean, encoder_q.0.layer1.2.bn1.running_var, encoder_q.0.layer1.2.conv2.weight, encoder_q.0.layer1.2.bn2.weight, encoder_q.0.layer1.2.bn2.bias, encoder_q.0.layer1.2.bn2.running_mean, encoder_q.0.layer1.2.bn2.running_var, encoder_q.0.layer1.2.conv3.weight, encoder_q.0.layer1.2.bn3.weight, encoder_q.0.layer1.2.bn3.bias, encoder_q.0.layer1.2.bn3.running_mean, encoder_q.0.layer1.2.bn3.running_var, encoder_q.0.layer2.0.conv1.weight, encoder_q.0.layer2.0.bn1.weight, encoder_q.0.layer2.0.bn1.bias, encoder_q.0.layer2.0.bn1.running_mean, encoder_q.0.layer2.0.bn1.running_var, encoder_q.0.layer2.0.conv2.weight, encoder_q.0.layer2.0.bn2.weight, encoder_q.0.layer2.0.bn2.bias, encoder_q.0.layer2.0.bn2.running_mean, encoder_q.0.layer2.0.bn2.running_var, encoder_q.0.layer2.0.conv3.weight, encoder_q.0.layer2.0.bn3.weight, encoder_q.0.layer2.0.bn3.bias, encoder_q.0.layer2.0.bn3.running_mean, encoder_q.0.layer2.0.bn3.running_var, encoder_q.0.layer2.0.downsample.0.weight, encoder_q.0.layer2.0.downsample.1.weight, encoder_q.0.layer2.0.downsample.1.bias, encoder_q.0.layer2.0.downsample.1.running_mean, encoder_q.0.layer2.0.downsample.1.running_var, encoder_q.0.layer2.1.conv1.weight, encoder_q.0.layer2.1.bn1.weight, encoder_q.0.layer2.1.bn1.bias, encoder_q.0.layer2.1.bn1.running_mean, encoder_q.0.layer2.1.bn1.running_var, encoder_q.0.layer2.1.conv2.weight, encoder_q.0.layer2.1.bn2.weight, encoder_q.0.layer2.1.bn2.bias, encoder_q.0.layer2.1.bn2.running_mean, encoder_q.0.layer2.1.bn2.running_var, encoder_q.0.layer2.1.conv3.weight, encoder_q.0.layer2.1.bn3.weight, encoder_q.0.layer2.1.bn3.bias, encoder_q.0.layer2.1.bn3.running_mean, encoder_q.0.layer2.1.bn3.running_var, encoder_q.0.layer2.2.conv1.weight, encoder_q.0.layer2.2.bn1.weight, encoder_q.0.layer2.2.bn1.bias, encoder_q.0.layer2.2.bn1.running_mean, encoder_q.0.layer2.2.bn1.running_var, encoder_q.0.layer2.2.conv2.weight, encoder_q.0.layer2.2.bn2.weight, 
encoder_q.0.layer2.2.bn2.bias, encoder_q.0.layer2.2.bn2.running_mean, encoder_q.0.layer2.2.bn2.running_var, encoder_q.0.layer2.2.conv3.weight, encoder_q.0.layer2.2.bn3.weight, encoder_q.0.layer2.2.bn3.bias, encoder_q.0.layer2.2.bn3.running_mean, encoder_q.0.layer2.2.bn3.running_var, encoder_q.0.layer2.3.conv1.weight, encoder_q.0.layer2.3.bn1.weight, encoder_q.0.layer2.3.bn1.bias, encoder_q.0.layer2.3.bn1.running_mean, encoder_q.0.layer2.3.bn1.running_var, encoder_q.0.layer2.3.conv2.weight, encoder_q.0.layer2.3.bn2.weight, encoder_q.0.layer2.3.bn2.bias, encoder_q.0.layer2.3.bn2.running_mean, encoder_q.0.layer2.3.bn2.running_var, encoder_q.0.layer2.3.conv3.weight, encoder_q.0.layer2.3.bn3.weight, encoder_q.0.layer2.3.bn3.bias, encoder_q.0.layer2.3.bn3.running_mean, encoder_q.0.layer2.3.bn3.running_var, encoder_q.0.layer3.0.conv1.weight, encoder_q.0.layer3.0.bn1.weight, encoder_q.0.layer3.0.bn1.bias, encoder_q.0.layer3.0.bn1.running_mean, encoder_q.0.layer3.0.bn1.running_var, encoder_q.0.layer3.0.conv2.weight, encoder_q.0.layer3.0.bn2.weight, encoder_q.0.layer3.0.bn2.bias, encoder_q.0.layer3.0.bn2.running_mean, encoder_q.0.layer3.0.bn2.running_var, encoder_q.0.layer3.0.conv3.weight, encoder_q.0.layer3.0.bn3.weight, encoder_q.0.layer3.0.bn3.bias, encoder_q.0.layer3.0.bn3.running_mean, encoder_q.0.layer3.0.bn3.running_var, encoder_q.0.layer3.0.downsample.0.weight, encoder_q.0.layer3.0.downsample.1.weight, encoder_q.0.layer3.0.downsample.1.bias, encoder_q.0.layer3.0.downsample.1.running_mean, encoder_q.0.layer3.0.downsample.1.running_var, encoder_q.0.layer3.1.conv1.weight, encoder_q.0.layer3.1.bn1.weight, encoder_q.0.layer3.1.bn1.bias, encoder_q.0.layer3.1.bn1.running_mean, encoder_q.0.layer3.1.bn1.running_var, encoder_q.0.layer3.1.conv2.weight, encoder_q.0.layer3.1.bn2.weight, encoder_q.0.layer3.1.bn2.bias, encoder_q.0.layer3.1.bn2.running_mean, encoder_q.0.layer3.1.bn2.running_var, encoder_q.0.layer3.1.conv3.weight, encoder_q.0.layer3.1.bn3.weight, encoder_q.0.layer3.1.bn3.bias, encoder_q.0.layer3.1.bn3.running_mean, encoder_q.0.layer3.1.bn3.running_var, encoder_q.0.layer3.2.conv1.weight, encoder_q.0.layer3.2.bn1.weight, encoder_q.0.layer3.2.bn1.bias, encoder_q.0.layer3.2.bn1.running_mean, encoder_q.0.layer3.2.bn1.running_var, encoder_q.0.layer3.2.conv2.weight, encoder_q.0.layer3.2.bn2.weight, encoder_q.0.layer3.2.bn2.bias, encoder_q.0.layer3.2.bn2.running_mean, encoder_q.0.layer3.2.bn2.running_var, encoder_q.0.layer3.2.conv3.weight, encoder_q.0.layer3.2.bn3.weight, encoder_q.0.layer3.2.bn3.bias, encoder_q.0.layer3.2.bn3.running_mean, encoder_q.0.layer3.2.bn3.running_var, encoder_q.0.layer3.3.conv1.weight, encoder_q.0.layer3.3.bn1.weight, encoder_q.0.layer3.3.bn1.bias, encoder_q.0.layer3.3.bn1.running_mean, encoder_q.0.layer3.3.bn1.running_var, encoder_q.0.layer3.3.conv2.weight, encoder_q.0.layer3.3.bn2.weight, encoder_q.0.layer3.3.bn2.bias, encoder_q.0.layer3.3.bn2.running_mean, encoder_q.0.layer3.3.bn2.running_var, encoder_q.0.layer3.3.conv3.weight, encoder_q.0.layer3.3.bn3.weight, encoder_q.0.layer3.3.bn3.bias, encoder_q.0.layer3.3.bn3.running_mean, encoder_q.0.layer3.3.bn3.running_var, encoder_q.0.layer3.4.conv1.weight, encoder_q.0.layer3.4.bn1.weight, encoder_q.0.layer3.4.bn1.bias, encoder_q.0.layer3.4.bn1.running_mean, encoder_q.0.layer3.4.bn1.running_var, encoder_q.0.layer3.4.conv2.weight, encoder_q.0.layer3.4.bn2.weight, encoder_q.0.layer3.4.bn2.bias, encoder_q.0.layer3.4.bn2.running_mean, encoder_q.0.layer3.4.bn2.running_var, encoder_q.0.layer3.4.conv3.weight, 
encoder_q.0.layer3.4.bn3.weight, encoder_q.0.layer3.4.bn3.bias, encoder_q.0.layer3.4.bn3.running_mean, encoder_q.0.layer3.4.bn3.running_var, encoder_q.0.layer3.5.conv1.weight, encoder_q.0.layer3.5.bn1.weight, encoder_q.0.layer3.5.bn1.bias, encoder_q.0.layer3.5.bn1.running_mean, encoder_q.0.layer3.5.bn1.running_var, encoder_q.0.layer3.5.conv2.weight, encoder_q.0.layer3.5.bn2.weight, encoder_q.0.layer3.5.bn2.bias, encoder_q.0.layer3.5.bn2.running_mean, encoder_q.0.layer3.5.bn2.running_var, encoder_q.0.layer3.5.conv3.weight, encoder_q.0.layer3.5.bn3.weight, encoder_q.0.layer3.5.bn3.bias, encoder_q.0.layer3.5.bn3.running_mean, encoder_q.0.layer3.5.bn3.running_var, encoder_q.0.layer4.0.conv1.weight, encoder_q.0.layer4.0.bn1.weight, encoder_q.0.layer4.0.bn1.bias, encoder_q.0.layer4.0.bn1.running_mean, encoder_q.0.layer4.0.bn1.running_var, encoder_q.0.layer4.0.conv2.weight, encoder_q.0.layer4.0.bn2.weight, encoder_q.0.layer4.0.bn2.bias, encoder_q.0.layer4.0.bn2.running_mean, encoder_q.0.layer4.0.bn2.running_var, encoder_q.0.layer4.0.conv3.weight, encoder_q.0.layer4.0.bn3.weight, encoder_q.0.layer4.0.bn3.bias, encoder_q.0.layer4.0.bn3.running_mean, encoder_q.0.layer4.0.bn3.running_var, encoder_q.0.layer4.0.downsample.0.weight, encoder_q.0.layer4.0.downsample.1.weight, encoder_q.0.layer4.0.downsample.1.bias, encoder_q.0.layer4.0.downsample.1.running_mean, encoder_q.0.layer4.0.downsample.1.running_var, encoder_q.0.layer4.1.conv1.weight, encoder_q.0.layer4.1.bn1.weight, encoder_q.0.layer4.1.bn1.bias, encoder_q.0.layer4.1.bn1.running_mean, encoder_q.0.layer4.1.bn1.running_var, encoder_q.0.layer4.1.conv2.weight, encoder_q.0.layer4.1.bn2.weight, encoder_q.0.layer4.1.bn2.bias, encoder_q.0.layer4.1.bn2.running_mean, encoder_q.0.layer4.1.bn2.running_var, encoder_q.0.layer4.1.conv3.weight, encoder_q.0.layer4.1.bn3.weight, encoder_q.0.layer4.1.bn3.bias, encoder_q.0.layer4.1.bn3.running_mean, encoder_q.0.layer4.1.bn3.running_var, encoder_q.0.layer4.2.conv1.weight, encoder_q.0.layer4.2.bn1.weight, encoder_q.0.layer4.2.bn1.bias, encoder_q.0.layer4.2.bn1.running_mean, encoder_q.0.layer4.2.bn1.running_var, encoder_q.0.layer4.2.conv2.weight, encoder_q.0.layer4.2.bn2.weight, encoder_q.0.layer4.2.bn2.bias, encoder_q.0.layer4.2.bn2.running_mean, encoder_q.0.layer4.2.bn2.running_var, encoder_q.0.layer4.2.conv3.weight, encoder_q.0.layer4.2.bn3.weight, encoder_q.0.layer4.2.bn3.bias, encoder_q.0.layer4.2.bn3.running_mean, encoder_q.0.layer4.2.bn3.running_var, encoder_q.1.mlp.0.weight, encoder_q.1.mlp.0.bias, encoder_q.1.mlp.2.weight, encoder_q.1.mlp.2.bias, encoder_q.1.mlp2.0.weight, encoder_q.1.mlp2.0.bias, encoder_q.1.mlp2.2.weight, encoder_q.1.mlp2.2.bias, encoder_k.0.conv1.weight, encoder_k.0.bn1.weight, encoder_k.0.bn1.bias, encoder_k.0.bn1.running_mean, encoder_k.0.bn1.running_var, encoder_k.0.layer1.0.conv1.weight, encoder_k.0.layer1.0.bn1.weight, encoder_k.0.layer1.0.bn1.bias, encoder_k.0.layer1.0.bn1.running_mean, encoder_k.0.layer1.0.bn1.running_var, encoder_k.0.layer1.0.conv2.weight, encoder_k.0.layer1.0.bn2.weight, encoder_k.0.layer1.0.bn2.bias, encoder_k.0.layer1.0.bn2.running_mean, encoder_k.0.layer1.0.bn2.running_var, encoder_k.0.layer1.0.conv3.weight, encoder_k.0.layer1.0.bn3.weight, encoder_k.0.layer1.0.bn3.bias, encoder_k.0.layer1.0.bn3.running_mean, encoder_k.0.layer1.0.bn3.running_var, encoder_k.0.layer1.0.downsample.0.weight, encoder_k.0.layer1.0.downsample.1.weight, encoder_k.0.layer1.0.downsample.1.bias, encoder_k.0.layer1.0.downsample.1.running_mean, encoder_k.0.layer1.0.downsample.1.running_var, 
encoder_k.0.layer1.1.conv1.weight, encoder_k.0.layer1.1.bn1.weight, encoder_k.0.layer1.1.bn1.bias, encoder_k.0.layer1.1.bn1.running_mean, encoder_k.0.layer1.1.bn1.running_var, encoder_k.0.layer1.1.conv2.weight, encoder_k.0.layer1.1.bn2.weight, encoder_k.0.layer1.1.bn2.bias, encoder_k.0.layer1.1.bn2.running_mean, encoder_k.0.layer1.1.bn2.running_var, encoder_k.0.layer1.1.conv3.weight, encoder_k.0.layer1.1.bn3.weight, encoder_k.0.layer1.1.bn3.bias, encoder_k.0.layer1.1.bn3.running_mean, encoder_k.0.layer1.1.bn3.running_var, encoder_k.0.layer1.2.conv1.weight, encoder_k.0.layer1.2.bn1.weight, encoder_k.0.layer1.2.bn1.bias, encoder_k.0.layer1.2.bn1.running_mean, encoder_k.0.layer1.2.bn1.running_var, encoder_k.0.layer1.2.conv2.weight, encoder_k.0.layer1.2.bn2.weight, encoder_k.0.layer1.2.bn2.bias, encoder_k.0.layer1.2.bn2.running_mean, encoder_k.0.layer1.2.bn2.running_var, encoder_k.0.layer1.2.conv3.weight, encoder_k.0.layer1.2.bn3.weight, encoder_k.0.layer1.2.bn3.bias, encoder_k.0.layer1.2.bn3.running_mean, encoder_k.0.layer1.2.bn3.running_var, encoder_k.0.layer2.0.conv1.weight, encoder_k.0.layer2.0.bn1.weight, encoder_k.0.layer2.0.bn1.bias, encoder_k.0.layer2.0.bn1.running_mean, encoder_k.0.layer2.0.bn1.running_var, encoder_k.0.layer2.0.conv2.weight, encoder_k.0.layer2.0.bn2.weight, encoder_k.0.layer2.0.bn2.bias, encoder_k.0.layer2.0.bn2.running_mean, encoder_k.0.layer2.0.bn2.running_var, encoder_k.0.layer2.0.conv3.weight, encoder_k.0.layer2.0.bn3.weight, encoder_k.0.layer2.0.bn3.bias, encoder_k.0.layer2.0.bn3.running_mean, encoder_k.0.layer2.0.bn3.running_var, encoder_k.0.layer2.0.downsample.0.weight, encoder_k.0.layer2.0.downsample.1.weight, encoder_k.0.layer2.0.downsample.1.bias, encoder_k.0.layer2.0.downsample.1.running_mean, encoder_k.0.layer2.0.downsample.1.running_var, encoder_k.0.layer2.1.conv1.weight, encoder_k.0.layer2.1.bn1.weight, encoder_k.0.layer2.1.bn1.bias, encoder_k.0.layer2.1.bn1.running_mean, encoder_k.0.layer2.1.bn1.running_var, encoder_k.0.layer2.1.conv2.weight, encoder_k.0.layer2.1.bn2.weight, encoder_k.0.layer2.1.bn2.bias, encoder_k.0.layer2.1.bn2.running_mean, encoder_k.0.layer2.1.bn2.running_var, encoder_k.0.layer2.1.conv3.weight, encoder_k.0.layer2.1.bn3.weight, encoder_k.0.layer2.1.bn3.bias, encoder_k.0.layer2.1.bn3.running_mean, encoder_k.0.layer2.1.bn3.running_var, encoder_k.0.layer2.2.conv1.weight, encoder_k.0.layer2.2.bn1.weight, encoder_k.0.layer2.2.bn1.bias, encoder_k.0.layer2.2.bn1.running_mean, encoder_k.0.layer2.2.bn1.running_var, encoder_k.0.layer2.2.conv2.weight, encoder_k.0.layer2.2.bn2.weight, encoder_k.0.layer2.2.bn2.bias, encoder_k.0.layer2.2.bn2.running_mean, encoder_k.0.layer2.2.bn2.running_var, encoder_k.0.layer2.2.conv3.weight, encoder_k.0.layer2.2.bn3.weight, encoder_k.0.layer2.2.bn3.bias, encoder_k.0.layer2.2.bn3.running_mean, encoder_k.0.layer2.2.bn3.running_var, encoder_k.0.layer2.3.conv1.weight, encoder_k.0.layer2.3.bn1.weight, encoder_k.0.layer2.3.bn1.bias, encoder_k.0.layer2.3.bn1.running_mean, encoder_k.0.layer2.3.bn1.running_var, encoder_k.0.layer2.3.conv2.weight, encoder_k.0.layer2.3.bn2.weight, encoder_k.0.layer2.3.bn2.bias, encoder_k.0.layer2.3.bn2.running_mean, encoder_k.0.layer2.3.bn2.running_var, encoder_k.0.layer2.3.conv3.weight, encoder_k.0.layer2.3.bn3.weight, encoder_k.0.layer2.3.bn3.bias, encoder_k.0.layer2.3.bn3.running_mean, encoder_k.0.layer2.3.bn3.running_var, encoder_k.0.layer3.0.conv1.weight, encoder_k.0.layer3.0.bn1.weight, encoder_k.0.layer3.0.bn1.bias, encoder_k.0.layer3.0.bn1.running_mean, 
encoder_k.0.layer3.0.bn1.running_var, encoder_k.0.layer3.0.conv2.weight, encoder_k.0.layer3.0.bn2.weight, encoder_k.0.layer3.0.bn2.bias, encoder_k.0.layer3.0.bn2.running_mean, encoder_k.0.layer3.0.bn2.running_var, encoder_k.0.layer3.0.conv3.weight, encoder_k.0.layer3.0.bn3.weight, encoder_k.0.layer3.0.bn3.bias, encoder_k.0.layer3.0.bn3.running_mean, encoder_k.0.layer3.0.bn3.running_var, encoder_k.0.layer3.0.downsample.0.weight, encoder_k.0.layer3.0.downsample.1.weight, encoder_k.0.layer3.0.downsample.1.bias, encoder_k.0.layer3.0.downsample.1.running_mean, encoder_k.0.layer3.0.downsample.1.running_var, encoder_k.0.layer3.1.conv1.weight, encoder_k.0.layer3.1.bn1.weight, encoder_k.0.layer3.1.bn1.bias, encoder_k.0.layer3.1.bn1.running_mean, encoder_k.0.layer3.1.bn1.running_var, encoder_k.0.layer3.1.conv2.weight, encoder_k.0.layer3.1.bn2.weight, encoder_k.0.layer3.1.bn2.bias, encoder_k.0.layer3.1.bn2.running_mean, encoder_k.0.layer3.1.bn2.running_var, encoder_k.0.layer3.1.conv3.weight, encoder_k.0.layer3.1.bn3.weight, encoder_k.0.layer3.1.bn3.bias, encoder_k.0.layer3.1.bn3.running_mean, encoder_k.0.layer3.1.bn3.running_var, encoder_k.0.layer3.2.conv1.weight, encoder_k.0.layer3.2.bn1.weight, encoder_k.0.layer3.2.bn1.bias, encoder_k.0.layer3.2.bn1.running_mean, encoder_k.0.layer3.2.bn1.running_var, encoder_k.0.layer3.2.conv2.weight, encoder_k.0.layer3.2.bn2.weight, encoder_k.0.layer3.2.bn2.bias, encoder_k.0.layer3.2.bn2.running_mean, encoder_k.0.layer3.2.bn2.running_var, encoder_k.0.layer3.2.conv3.weight, encoder_k.0.layer3.2.bn3.weight, encoder_k.0.layer3.2.bn3.bias, encoder_k.0.layer3.2.bn3.running_mean, encoder_k.0.layer3.2.bn3.running_var, encoder_k.0.layer3.3.conv1.weight, encoder_k.0.layer3.3.bn1.weight, encoder_k.0.layer3.3.bn1.bias, encoder_k.0.layer3.3.bn1.running_mean, encoder_k.0.layer3.3.bn1.running_var, encoder_k.0.layer3.3.conv2.weight, encoder_k.0.layer3.3.bn2.weight, encoder_k.0.layer3.3.bn2.bias, encoder_k.0.layer3.3.bn2.running_mean, encoder_k.0.layer3.3.bn2.running_var, encoder_k.0.layer3.3.conv3.weight, encoder_k.0.layer3.3.bn3.weight, encoder_k.0.layer3.3.bn3.bias, encoder_k.0.layer3.3.bn3.running_mean, encoder_k.0.layer3.3.bn3.running_var, encoder_k.0.layer3.4.conv1.weight, encoder_k.0.layer3.4.bn1.weight, encoder_k.0.layer3.4.bn1.bias, encoder_k.0.layer3.4.bn1.running_mean, encoder_k.0.layer3.4.bn1.running_var, encoder_k.0.layer3.4.conv2.weight, encoder_k.0.layer3.4.bn2.weight, encoder_k.0.layer3.4.bn2.bias, encoder_k.0.layer3.4.bn2.running_mean, encoder_k.0.layer3.4.bn2.running_var, encoder_k.0.layer3.4.conv3.weight, encoder_k.0.layer3.4.bn3.weight, encoder_k.0.layer3.4.bn3.bias, encoder_k.0.layer3.4.bn3.running_mean, encoder_k.0.layer3.4.bn3.running_var, encoder_k.0.layer3.5.conv1.weight, encoder_k.0.layer3.5.bn1.weight, encoder_k.0.layer3.5.bn1.bias, encoder_k.0.layer3.5.bn1.running_mean, encoder_k.0.layer3.5.bn1.running_var, encoder_k.0.layer3.5.conv2.weight, encoder_k.0.layer3.5.bn2.weight, encoder_k.0.layer3.5.bn2.bias, encoder_k.0.layer3.5.bn2.running_mean, encoder_k.0.layer3.5.bn2.running_var, encoder_k.0.layer3.5.conv3.weight, encoder_k.0.layer3.5.bn3.weight, encoder_k.0.layer3.5.bn3.bias, encoder_k.0.layer3.5.bn3.running_mean, encoder_k.0.layer3.5.bn3.running_var, encoder_k.0.layer4.0.conv1.weight, encoder_k.0.layer4.0.bn1.weight, encoder_k.0.layer4.0.bn1.bias, encoder_k.0.layer4.0.bn1.running_mean, encoder_k.0.layer4.0.bn1.running_var, encoder_k.0.layer4.0.conv2.weight, encoder_k.0.layer4.0.bn2.weight, encoder_k.0.layer4.0.bn2.bias, 
encoder_k.0.layer4.0.bn2.running_mean, encoder_k.0.layer4.0.bn2.running_var, encoder_k.0.layer4.0.conv3.weight, encoder_k.0.layer4.0.bn3.weight, encoder_k.0.layer4.0.bn3.bias, encoder_k.0.layer4.0.bn3.running_mean, encoder_k.0.layer4.0.bn3.running_var, encoder_k.0.layer4.0.downsample.0.weight, encoder_k.0.layer4.0.downsample.1.weight, encoder_k.0.layer4.0.downsample.1.bias, encoder_k.0.layer4.0.downsample.1.running_mean, encoder_k.0.layer4.0.downsample.1.running_var, encoder_k.0.layer4.1.conv1.weight, encoder_k.0.layer4.1.bn1.weight, encoder_k.0.layer4.1.bn1.bias, encoder_k.0.layer4.1.bn1.running_mean, encoder_k.0.layer4.1.bn1.running_var, encoder_k.0.layer4.1.conv2.weight, encoder_k.0.layer4.1.bn2.weight, encoder_k.0.layer4.1.bn2.bias, encoder_k.0.layer4.1.bn2.running_mean, encoder_k.0.layer4.1.bn2.running_var, encoder_k.0.layer4.1.conv3.weight, encoder_k.0.layer4.1.bn3.weight, encoder_k.0.layer4.1.bn3.bias, encoder_k.0.layer4.1.bn3.running_mean, encoder_k.0.layer4.1.bn3.running_var, encoder_k.0.layer4.2.conv1.weight, encoder_k.0.layer4.2.bn1.weight, encoder_k.0.layer4.2.bn1.bias, encoder_k.0.layer4.2.bn1.running_mean, encoder_k.0.layer4.2.bn1.running_var, encoder_k.0.layer4.2.conv2.weight, encoder_k.0.layer4.2.bn2.weight, encoder_k.0.layer4.2.bn2.bias, encoder_k.0.layer4.2.bn2.running_mean, encoder_k.0.layer4.2.bn2.running_var, encoder_k.0.layer4.2.conv3.weight, encoder_k.0.layer4.2.bn3.weight, encoder_k.0.layer4.2.bn3.bias, encoder_k.0.layer4.2.bn3.running_mean, encoder_k.0.layer4.2.bn3.running_var, encoder_k.1.mlp.0.weight, encoder_k.1.mlp.0.bias, encoder_k.1.mlp.2.weight, encoder_k.1.mlp.2.bias, encoder_k.1.mlp2.0.weight, encoder_k.1.mlp2.0.bias, encoder_k.1.mlp2.2.weight, encoder_k.1.mlp2.2.bias, backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, backbone.bn1.running_var, backbone.layer1.0.conv1.weight, backbone.layer1.0.bn1.weight, backbone.layer1.0.bn1.bias, backbone.layer1.0.bn1.running_mean, backbone.layer1.0.bn1.running_var, backbone.layer1.0.conv2.weight, backbone.layer1.0.bn2.weight, backbone.layer1.0.bn2.bias, backbone.layer1.0.bn2.running_mean, backbone.layer1.0.bn2.running_var, backbone.layer1.0.conv3.weight, backbone.layer1.0.bn3.weight, backbone.layer1.0.bn3.bias, backbone.layer1.0.bn3.running_mean, backbone.layer1.0.bn3.running_var, backbone.layer1.0.downsample.0.weight, backbone.layer1.0.downsample.1.weight, backbone.layer1.0.downsample.1.bias, backbone.layer1.0.downsample.1.running_mean, backbone.layer1.0.downsample.1.running_var, backbone.layer1.1.conv1.weight, backbone.layer1.1.bn1.weight, backbone.layer1.1.bn1.bias, backbone.layer1.1.bn1.running_mean, backbone.layer1.1.bn1.running_var, backbone.layer1.1.conv2.weight, backbone.layer1.1.bn2.weight, backbone.layer1.1.bn2.bias, backbone.layer1.1.bn2.running_mean, backbone.layer1.1.bn2.running_var, backbone.layer1.1.conv3.weight, backbone.layer1.1.bn3.weight, backbone.layer1.1.bn3.bias, backbone.layer1.1.bn3.running_mean, backbone.layer1.1.bn3.running_var, backbone.layer1.2.conv1.weight, backbone.layer1.2.bn1.weight, backbone.layer1.2.bn1.bias, backbone.layer1.2.bn1.running_mean, backbone.layer1.2.bn1.running_var, backbone.layer1.2.conv2.weight, backbone.layer1.2.bn2.weight, backbone.layer1.2.bn2.bias, backbone.layer1.2.bn2.running_mean, backbone.layer1.2.bn2.running_var, backbone.layer1.2.conv3.weight, backbone.layer1.2.bn3.weight, backbone.layer1.2.bn3.bias, backbone.layer1.2.bn3.running_mean, backbone.layer1.2.bn3.running_var, backbone.layer2.0.conv1.weight, 
backbone.layer2.0.bn1.weight, backbone.layer2.0.bn1.bias, backbone.layer2.0.bn1.running_mean, backbone.layer2.0.bn1.running_var, backbone.layer2.0.conv2.weight, backbone.layer2.0.bn2.weight, backbone.layer2.0.bn2.bias, backbone.layer2.0.bn2.running_mean, backbone.layer2.0.bn2.running_var, backbone.layer2.0.conv3.weight, backbone.layer2.0.bn3.weight, backbone.layer2.0.bn3.bias, backbone.layer2.0.bn3.running_mean, backbone.layer2.0.bn3.running_var, backbone.layer2.0.downsample.0.weight, backbone.layer2.0.downsample.1.weight, backbone.layer2.0.downsample.1.bias, backbone.layer2.0.downsample.1.running_mean, backbone.layer2.0.downsample.1.running_var, backbone.layer2.1.conv1.weight, backbone.layer2.1.bn1.weight, backbone.layer2.1.bn1.bias, backbone.layer2.1.bn1.running_mean, backbone.layer2.1.bn1.running_var, backbone.layer2.1.conv2.weight, backbone.layer2.1.bn2.weight, backbone.layer2.1.bn2.bias, backbone.layer2.1.bn2.running_mean, backbone.layer2.1.bn2.running_var, backbone.layer2.1.conv3.weight, backbone.layer2.1.bn3.weight, backbone.layer2.1.bn3.bias, backbone.layer2.1.bn3.running_mean, backbone.layer2.1.bn3.running_var, backbone.layer2.2.conv1.weight, backbone.layer2.2.bn1.weight, backbone.layer2.2.bn1.bias, backbone.layer2.2.bn1.running_mean, backbone.layer2.2.bn1.running_var, backbone.layer2.2.conv2.weight, backbone.layer2.2.bn2.weight, backbone.layer2.2.bn2.bias, backbone.layer2.2.bn2.running_mean, backbone.layer2.2.bn2.running_var, backbone.layer2.2.conv3.weight, backbone.layer2.2.bn3.weight, backbone.layer2.2.bn3.bias, backbone.layer2.2.bn3.running_mean, backbone.layer2.2.bn3.running_var, backbone.layer2.3.conv1.weight, backbone.layer2.3.bn1.weight, backbone.layer2.3.bn1.bias, backbone.layer2.3.bn1.running_mean, backbone.layer2.3.bn1.running_var, backbone.layer2.3.conv2.weight, backbone.layer2.3.bn2.weight, backbone.layer2.3.bn2.bias, backbone.layer2.3.bn2.running_mean, backbone.layer2.3.bn2.running_var, backbone.layer2.3.conv3.weight, backbone.layer2.3.bn3.weight, backbone.layer2.3.bn3.bias, backbone.layer2.3.bn3.running_mean, backbone.layer2.3.bn3.running_var, backbone.layer3.0.conv1.weight, backbone.layer3.0.bn1.weight, backbone.layer3.0.bn1.bias, backbone.layer3.0.bn1.running_mean, backbone.layer3.0.bn1.running_var, backbone.layer3.0.conv2.weight, backbone.layer3.0.bn2.weight, backbone.layer3.0.bn2.bias, backbone.layer3.0.bn2.running_mean, backbone.layer3.0.bn2.running_var, backbone.layer3.0.conv3.weight, backbone.layer3.0.bn3.weight, backbone.layer3.0.bn3.bias, backbone.layer3.0.bn3.running_mean, backbone.layer3.0.bn3.running_var, backbone.layer3.0.downsample.0.weight, backbone.layer3.0.downsample.1.weight, backbone.layer3.0.downsample.1.bias, backbone.layer3.0.downsample.1.running_mean, backbone.layer3.0.downsample.1.running_var, backbone.layer3.1.conv1.weight, backbone.layer3.1.bn1.weight, backbone.layer3.1.bn1.bias, backbone.layer3.1.bn1.running_mean, backbone.layer3.1.bn1.running_var, backbone.layer3.1.conv2.weight, backbone.layer3.1.bn2.weight, backbone.layer3.1.bn2.bias, backbone.layer3.1.bn2.running_mean, backbone.layer3.1.bn2.running_var, backbone.layer3.1.conv3.weight, backbone.layer3.1.bn3.weight, backbone.layer3.1.bn3.bias, backbone.layer3.1.bn3.running_mean, backbone.layer3.1.bn3.running_var, backbone.layer3.2.conv1.weight, backbone.layer3.2.bn1.weight, backbone.layer3.2.bn1.bias, backbone.layer3.2.bn1.running_mean, backbone.layer3.2.bn1.running_var, backbone.layer3.2.conv2.weight, backbone.layer3.2.bn2.weight, backbone.layer3.2.bn2.bias, 
backbone.layer3.2.bn2.running_mean, backbone.layer3.2.bn2.running_var, backbone.layer3.2.conv3.weight, backbone.layer3.2.bn3.weight, backbone.layer3.2.bn3.bias, backbone.layer3.2.bn3.running_mean, backbone.layer3.2.bn3.running_var, backbone.layer3.3.conv1.weight, backbone.layer3.3.bn1.weight, backbone.layer3.3.bn1.bias, backbone.layer3.3.bn1.running_mean, backbone.layer3.3.bn1.running_var, backbone.layer3.3.conv2.weight, backbone.layer3.3.bn2.weight, backbone.layer3.3.bn2.bias, backbone.layer3.3.bn2.running_mean, backbone.layer3.3.bn2.running_var, backbone.layer3.3.conv3.weight, backbone.layer3.3.bn3.weight, backbone.layer3.3.bn3.bias, backbone.layer3.3.bn3.running_mean, backbone.layer3.3.bn3.running_var, backbone.layer3.4.conv1.weight, backbone.layer3.4.bn1.weight, backbone.layer3.4.bn1.bias, backbone.layer3.4.bn1.running_mean, backbone.layer3.4.bn1.running_var, backbone.layer3.4.conv2.weight, backbone.layer3.4.bn2.weight, backbone.layer3.4.bn2.bias, backbone.layer3.4.bn2.running_mean, backbone.layer3.4.bn2.running_var, backbone.layer3.4.conv3.weight, backbone.layer3.4.bn3.weight, backbone.layer3.4.bn3.bias, backbone.layer3.4.bn3.running_mean, backbone.layer3.4.bn3.running_var, backbone.layer3.5.conv1.weight, backbone.layer3.5.bn1.weight, backbone.layer3.5.bn1.bias, backbone.layer3.5.bn1.running_mean, backbone.layer3.5.bn1.running_var, backbone.layer3.5.conv2.weight, backbone.layer3.5.bn2.weight, backbone.layer3.5.bn2.bias, backbone.layer3.5.bn2.running_mean, backbone.layer3.5.bn2.running_var, backbone.layer3.5.conv3.weight, backbone.layer3.5.bn3.weight, backbone.layer3.5.bn3.bias, backbone.layer3.5.bn3.running_mean, backbone.layer3.5.bn3.running_var, backbone.layer4.0.conv1.weight, backbone.layer4.0.bn1.weight, backbone.layer4.0.bn1.bias, backbone.layer4.0.bn1.running_mean, backbone.layer4.0.bn1.running_var, backbone.layer4.0.conv2.weight, backbone.layer4.0.bn2.weight, backbone.layer4.0.bn2.bias, backbone.layer4.0.bn2.running_mean, backbone.layer4.0.bn2.running_var, backbone.layer4.0.conv3.weight, backbone.layer4.0.bn3.weight, backbone.layer4.0.bn3.bias, backbone.layer4.0.bn3.running_mean, backbone.layer4.0.bn3.running_var, backbone.layer4.0.downsample.0.weight, backbone.layer4.0.downsample.1.weight, backbone.layer4.0.downsample.1.bias, backbone.layer4.0.downsample.1.running_mean, backbone.layer4.0.downsample.1.running_var, backbone.layer4.1.conv1.weight, backbone.layer4.1.bn1.weight, backbone.layer4.1.bn1.bias, backbone.layer4.1.bn1.running_mean, backbone.layer4.1.bn1.running_var, backbone.layer4.1.conv2.weight, backbone.layer4.1.bn2.weight, backbone.layer4.1.bn2.bias, backbone.layer4.1.bn2.running_mean, backbone.layer4.1.bn2.running_var, backbone.layer4.1.conv3.weight, backbone.layer4.1.bn3.weight, backbone.layer4.1.bn3.bias, backbone.layer4.1.bn3.running_mean, backbone.layer4.1.bn3.running_var, backbone.layer4.2.conv1.weight, backbone.layer4.2.bn1.weight, backbone.layer4.2.bn1.bias, backbone.layer4.2.bn1.running_mean, backbone.layer4.2.bn1.running_var, backbone.layer4.2.conv2.weight, backbone.layer4.2.bn2.weight, backbone.layer4.2.bn2.bias, backbone.layer4.2.bn2.running_mean, backbone.layer4.2.bn2.running_var, backbone.layer4.2.conv3.weight, backbone.layer4.2.bn3.weight, backbone.layer4.2.bn3.bias, backbone.layer4.2.bn3.running_mean, backbone.layer4.2.bn3.running_var
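
A hedged observation: the released checkpoint stores bare backbone keys (conv1.*, layer1.*, ...), while the model built here expects them under a backbone. prefix (plus queue/encoder_* buffers that the release omits). If the goal is only to load the backbone weights, a key remap along these lines may help (an assumption about the mismatch, not an official fix):

    import torch

    ckpt = torch.load("densecl_r50_imagenet_200ep.pth", map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # Prefix the bare backbone keys so they match the prefixed names;
    # queue/encoder_q/encoder_k entries will still be reported as missing.
    remapped = {"backbone." + k: v for k, v in state.items()}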

Dimensions of data

Thanks for the wonderful work on DenseCL. I have a code-related question I'd like to consult you about: why must the input x of DenseCLNeck satisfy len(x) == 1, and what is the dimension of x[0]?

    assert len(x) == 1
    x = x[0]
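
A hedged illustration of the shapes involved: the backbone returns a tuple of feature maps, and with only the last ResNet-50 stage selected in the config that tuple has length 1, with x[0] of shape (N, 2048, 7, 7) for 224×224 crops:

    import torch

    x = (torch.randn(8, 2048, 7, 7),)  # stand-in for the backbone output tuple
    assert len(x) == 1
    feat = x[0]                        # (N, C, H, W) = (8, 2048, 7, 7)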

2.3 negative sample

I have a question and hope to hear from you:
In Section 2.3: "Each negative key t− is the pooled feature vector of a view from a different image."
Why not use the other parts of the two views of the same image as negative samples?
That seems to make more sense.

DenseNeck design

Have you tried different output channels for the single (global) projection and the dense projection? In the DenseCLNeck implementation, the single MLP and the dense MLP use the same hidden and output channels. As far as I know, the projection of the instance-level representation requires more channels than the projection of the dense representation, so treating them identically might lose useful information from the instance representation. What do you think about this? Also, most instance discrimination methods design the projector as fc-bn-relu-fc, so I wonder why you drop the BN in DenseCLNeck. Is it just for simplicity?

        self.mlp = nn.Sequential(
            nn.Linear(in_channels, hid_channels), nn.ReLU(inplace=True),
            nn.Linear(hid_channels, out_channels))
        ...
        self.mlp2 = nn.Sequential(
            nn.Conv2d(in_channels, hid_channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hid_channels, out_channels, 1))

Inferior performance on PASCAL VOC12 with DeepLabV3+

Thanks for releasing your code; the results are impressive.

I've tried the downloaded DenseCL pretrained models and tested them on the VOC semantic segmentation dataset. With the same FCN architecture, the performance matches expectations: the DenseCL ImageNet pretrained model outperforms the ImageNet classification model. However, when replacing the backbone of DeepLabV3+, the DenseCL models show inferior performance. The comparison is below:

Arch   Dataset   Pretrained Model    mIoU
dv3+   VOC12     Sup ImageNet        71.33
dv3+   VOC12     DenseCL COCO        67.51
dv3+   VOC12     DenseCL ImageNet    69.5

The configs are borrowed from the official MMSeg configs, and I was careful not to make many modifications. Have you ever noticed the same behavior with any other models or datasets?

Why use argmax for matching?

Hey there, thanks for sharing the code!

I just have a quick question:
Why is argmax used to match features from different views, rather than using the spatial correspondence, which we have access to since we know which data augmentations were applied to the images?

Thanks.
