DenseCL's Issues

Semantic segmentation on PASCAL VOC

@WXinlong,

Thanks for sharing your great work!
I was able to reproduce your object detection result on Pascal VOC.
However, when I tested semantic segmentation on Pascal VOC using your pre-trained model on ImageNet1k "densecl_r50_imagenet_200ep.pth", I got mIoU 0.62, which is worse than the 0.69 reported in your paper. My test procedure is explained below,

  1. Install your modified mmsegmentation
  2. Download "densecl_r50_imagenet_200ep.pth" from your website
  3. Update the 5th line of code in fcn_r50-d8.py to pretrained='/pretrained/densecl_r50_imagenet_200ep.pth' (see the sketch after this list)
  4. Run ./tools/dist_train.sh configs/densecl/fcn_r50-d8_512x512_20k_voc12aug.py 2 --work-dir models/fcn_r50-d8_512x512_20k_voc12aug (running on 2 GPUs)
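
For reference, a minimal sketch of the edit in step 3, assuming an older-style mmsegmentation config where `pretrained` sits at the model level; the exact line and layout depend on the fork being used:

    # fcn_r50-d8.py (illustrative only; layout depends on the mmsegmentation version)
    model = dict(
        type='EncoderDecoder',
        pretrained='/pretrained/densecl_r50_imagenet_200ep.pth',  # DenseCL checkpoint
        backbone=dict(type='ResNetV1c', depth=50),
        # ... decode_head and the rest of the config unchanged
    )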

I got a result of mIoU 0.62, mAcc: 0.75, aAcc: 0.91 at the end of the training. I ran 3 rounds and got similar results. Attached are my configuration file and training log.

Do you know a possible reason?
Thanks!

20230213_204945.log
fcn_r50-d8_512x512_20k_voc12aug.zip

The performance of detection in VOC

(8 GPUs) When I use the network pretrained with coco-800ep-resnet50 for the detection task on VOC, the AP is only 44.76, while you report 56.7. I don't know why the gap is so large. Note that I changed the batch size from 16 to 8 and, accordingly, lowered the base lr from 0.02 to 0.01.
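
For reference, the linear LR scaling rule being applied here (a common convention rather than a DenseCL-specific requirement) can be written as:

    # Linear LR scaling: scale the base LR with the global batch size.
    def scaled_lr(base_lr=0.02, base_batch=16, batch=8):
        return base_lr * batch / base_batch

    print(scaled_lr())  # 0.01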

The performance of detection in COCO

Based on MMDetection, trained on COCO2017 train and evaluated on COCO2017 val.

Faster R-CNN, R50 from torchvision://resnet50

       1x: bbox_mAP: 0.3750

Faster R-CNN, R50 from my reproduced model pretrained on ImageNet

       1x: bbox_mAP: 0.3580

Faster R-CNN, R50 from your pretrained model on ImageNet

       1x: bbox_mAP: 0.3550

These results are not as good as expected. Could you help?

Training a pretrained model on the object detection task on a single GPU

Hi @WXinlong thanks for the wonderful work.

I want to train the pre-trained model on the downstream task of object detection. I used the MoCo v2 pre-trained model (800 epochs) from here.

I followed this process:
Step 1: Install detectron2.

Step 2: Convert a pre-trained MoCo model to detectron2's format:

python3 convert-pretrain-to-detectron2.py input.pth.tar output.pkl
Put dataset under "./datasets" directory, following the directory structure required by detectron2.

Step 3: Run training:

python train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml \
 --num-gpus 1 MODEL.WEIGHTS ./output.pkl

The only change I made was using a single GPU rather than 8 GPUs.

I am getting the following error:

[08/31 12:42:12] fvcore.common.checkpoint WARNING: Some model parameters or buffers are not found in the checkpoint:
  proposal_generator.rpn_head.anchor_deltas.{bias, weight}
  proposal_generator.rpn_head.conv.{bias, weight}
  proposal_generator.rpn_head.objectness_logits.{bias, weight}
  roi_heads.box_predictor.bbox_pred.{bias, weight}
  roi_heads.box_predictor.cls_score.{bias, weight}
  roi_heads.res5.norm.{bias, running_mean, running_var, weight}
[08/31 12:42:12] fvcore.common.checkpoint WARNING: The checkpoint state_dict contains keys that are not used by the model:
  stem.fc.0.{bias, weight}
  stem.fc.2.{bias, weight}
[08/31 12:42:12] d2.engine.train_loop INFO: Starting training from iteration 0
[08/31 12:42:13] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/defaults.py", line 493, in run_step
    self._trainer.run_step()
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 154, in forward
    features = self.backbone(images.tensor)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/layers/wrappers.py", line 88, in forward
    x = self.norm(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
    return _get_group_size(group)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    assert _default_pg is not None, \
AssertionError: Default process group is not initialized
[08/31 12:42:13] d2.engine.hooks INFO: Total training time: 0:00:00 (0:00:00 on hooks)
[08/31 12:42:13] d2.utils.events INFO:  iter: 0    lr: N/A  max_mem: 207M

How can we run the training on a single GPU?
Attached are the logs for details:
log 3.23.54 PM.txt
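
A hedged workaround sketch (not an official fix): the traceback shows SyncBatchNorm asking for a distributed process group, which detectron2 does not initialize when only a single process is launched. One option is to override the norm layers to plain BN via the config, assuming the detection config sets MODEL.RESNETS.NORM to "SyncBN" as the MoCo/DenseCL detectron2 configs do:

    # Hedged sketch: switch SyncBN to plain BN so a single, non-distributed
    # process can run the backbone.
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file("configs/pascal_voc_R_50_C4_24k_moco.yaml")
    cfg.merge_from_list([
        "MODEL.WEIGHTS", "./output.pkl",
        "MODEL.RESNETS.NORM", "BN",   # plain BatchNorm instead of SyncBN
    ])

The same override can be passed as trailing options to train_net.py; note that plain BN with a small per-GPU batch may behave differently from the multi-GPU SyncBN setting.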

Have you tried on keypoint matching task?

Since the model is trained with a dense matching loss, it would be natural to evaluate its performance on the keypoint matching task and compare it with the state of the art. May I know whether you have conducted such experiments or plan to? Thank you!

GPU training problem

Are the weights trained on 2 GPUs different from those trained on 8 GPUs when used for downstream tasks, given that the overall batch size differs? Hope to get a reply.

How to get negative key t_

In the paper, each negative key t− is the pooled feature vector of a view from a different image. I still don't understand the exact meaning of 'pooled feature vector'. Can you explain it? Thank you.
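
One reading of that phrase (my own illustration, not the authors' code): the pooled feature vector is the backbone feature map of a view collapsed by global average pooling, giving a single vector per view:

    import torch
    import torch.nn.functional as F

    feat = torch.randn(1, 2048, 7, 7)                   # backbone output of one view
    t_neg = F.adaptive_avg_pool2d(feat, 1).flatten(1)   # (1, 2048) pooled feature vector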

About the dense correspondence

Hi, thanks for your contribution, very interesting approach!

Have you tried to compute the dense correspondence directly from the geometric transformation (resize / crop / flip) between the views?

Neck weights

I only found the ResNet (backbone) weights after pretraining; it would be useful to have access to the neck weights as well.

Is that possible?

Thanks for your work

The checkpoint with neck.

Hello, thanks for your interesting work. I notice that the checkpoints you released only contain the backbone. I would like a checkpoint that also includes the neck and the other training state. Could you release one (pretrained on ImageNet for 200 epochs)?

[Err]: RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 279, in forward
    return self.forward_train(img, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 200, in forward_train
    im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 132, in _batch_shuffle_ddp
    x_gather = concat_all_gather(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 297, in concat_all_gather
    for _ in range(torch.distributed.get_world_size())
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
python-BaseException
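
A hedged note: DenseCL's _batch_shuffle_ddp / concat_all_gather assume torch.distributed has already been initialized, which tools/dist_train.sh does even for a single GPU. If you must run without that launcher, the equivalent single-process initialization looks roughly like:

    # Hedged sketch: manually create the default process group for a single
    # process so the distributed collectives in densecl.py can run.
    import os
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", rank=0, world_size=1)  # use "gloo" on CPU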

Is it possible to gain dense correspondence from the known data augmentation?

Hi, Thank you very much for the nice work!

I have a question about the dense correspondence between views. In the paper, the correspondence is obtained by computing the similarity between feature vectors from the backbone. Since the data augmentation (e.g. rotation, cropping, flipping) applied to each view of the same image is known, it should be possible to obtain the correspondence directly from these transformations.

For example, suppose Image A is a left-right flipped copy of Image B. The two images are encoded into 3x3 feature maps, which can be represented as:

fa1, fa2, fa3
fa4, fa5, fa6
fa7, fa8, fa9

and

fb1, fb2, fb3
fb4, fb5, fb6
fb7, fb8, fb9

Since A and B are flipped views of the same image, the correspondence could be (fa1, fb3), (fa2, fb2), (fa3, fb1), ... .

From my perspective, the transformation-based correspondence is more straightforward, but the paper doesn't use it. Is there any intuition behind this choice?

Thank you again!
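
For context, the matching the paper describes (and which this question contrasts with transformation-based correspondence) amounts to an argmax over pairwise cosine similarities of the backbone features; a minimal sketch of my own:

    import torch
    import torch.nn.functional as F

    fa = F.normalize(torch.randn(9, 128), dim=1)  # 3x3 grid of view A, flattened
    fb = F.normalize(torch.randn(9, 128), dim=1)  # 3x3 grid of view B, flattened
    sim = fa @ fb.t()                             # (9, 9) pairwise cosine similarities
    match = sim.argmax(dim=1)                     # matched index in B for each cell of A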

KeyError: 'GaussianBlur is already registered in pipeline'

Hi,
I am trying to run the code to train self-supervised on COCO (train2017). I tried installing several times following the instructions, but when I run training it keeps printing KeyError: 'GaussianBlur is already registered in pipeline' and then stops immediately.

Command: bash tools/dist_train.sh configs/selfsup/densecl/densecl_coco_800ep.py 8

I am using torch version 1.7.1, CUDA 9.2. torch.cuda.is_available() = True

Have you tried reproducing the results on a completely new machine and encountered this error?

Could you give me some suggestions on this bug?

Training speed

Could you provide the training log? My training process is extremely slow. Thank you.

Performance of Semantic Segmentation on Pascal VOC

Hi, I tried to reproduce your results on VOC semantic segmentation, but only got mIoU = 46.87 (while you report 69.4).
Can you give me some help?

Here are the steps I followed for reproduction.

  1. Download your pretrained model DenseCL IN/200Epoch
  2. Follow your steps in README.md

I did not modify any settings for batch size or learning rate.
Is there anything I have missed?

dataset preparing

I've downloaded ImageNet from Kaggle and can't find train.txt. Could you tell me where to download this file? @WXinlong
Here are the files that can be downloaded from Kaggle (screenshot attached).
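
A hedged sketch for generating such a file list yourself from an ImageNet-style folder layout (one class per sub-directory); the exact format OpenSelfSup expects may include an extra label column, so treat this as a starting point rather than the official recipe:

    import os

    root = "data/imagenet/train"   # hypothetical path to the extracted training images
    with open("data/imagenet/meta/train.txt", "w") as f:
        for cls in sorted(os.listdir(root)):
            for name in sorted(os.listdir(os.path.join(root, cls))):
                f.write(f"{cls}/{name}\n")   # one relative image path per line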

The performance of DenseCL on classification task

Hi, @WXinlong . Thanks for the great work.
Since the paper states that the proposed method mainly targets dense prediction tasks (e.g., detection and segmentation), I wonder whether you have tried DenseCL on the classification task and, if so, what the performance is.

Evaluation setting on Semantic segmentation

Thanks for your outstanding work. I have a question about the evaluation setting for semantic segmentation.
Did you use "two extra 3×3 convolutions of 256 channels, with BN and ReLU, and then a 1×1 convolution for per-pixel classification. The total stride is 16 (FCN-16s [43]). We set dilation = 6 in the two extra 3×3 convolutions, following the large field-of-view design in [6]" during the evaluation of semantic segmentation (the same as in MoCo), or just a classic FCN?
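
To make the quoted setting concrete, here is a hedged sketch of the head it describes (two dilated 3×3 convs with BN and ReLU, then a 1×1 per-pixel classifier); whether DenseCL used exactly this head or a classic FCN is precisely the question, and the channel counts here are assumptions:

    import torch.nn as nn

    def eval_head(in_ch=2048, mid_ch=256, num_classes=21, dilation=6):
        # two 3x3 convs (dilation=6) with BN + ReLU, then a 1x1 classifier
        return nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, num_classes, 1),
        )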

About the loss of Denscl

I tried training with your algorithm and found the loss a bit strange: it rose from 8.0 at the beginning to 9.3 and then slowly dropped to 7.3. What could be the reason? Is this normal?

Semi-supervised object detection

Hi, thanks for your excellent work! Could you kindly release the corresponding 10% training data list and config for semi-supervised object detection in Table 3 of your paper? Thanks in advance!

Details about loss_lambda warmup

Thank you for your great work.
Could you give the implementation details or code for the loss_lambda warm-up setting described in the DenseCL paper?
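
For reference, one possible shape of such a warm-up (a linear ramp of the loss weight over the first iterations); the numbers here are placeholders, since the paper's exact schedule is precisely what is being asked for:

    # Placeholder sketch of a linear loss_lambda warm-up; values are illustrative.
    def loss_lambda(step, warmup_iters=1000, final_lambda=0.5):
        if step >= warmup_iters:
            return final_lambda
        return final_lambda * step / warmup_iters

    # total loss: (1 - lam) * loss_single + lam * loss_dense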

The model and loaded state dict do not match exactly

Hi, when I try to use extract.py to extract features, I download the pretrained model from the link and run it, but it prints the following:

The model and loaded state dict do not match exactly

unexpected key in source state_dict: conv1.weight, bn1.weight, bn1.bias, bn1.running_mean, bn1.running_var, bn1.num_batches_tracked, layer1.0.conv1.weight, layer1.0.bn1.weight, layer1.0.bn1.bias, layer1.0.bn1.running_mean, layer1.0.bn1.running_var, layer1.0.bn1.num_batches_tracked, layer1.0.conv2.weight, layer1.0.bn2.weight, layer1.0.bn2.bias, layer1.0.bn2.running_mean, layer1.0.bn2.running_var, layer1.0.bn2.num_batches_tracked, layer1.0.conv3.weight, layer1.0.bn3.weight, layer1.0.bn3.bias, layer1.0.bn3.running_mean, layer1.0.bn3.running_var, layer1.0.bn3.num_batches_tracked, layer1.0.downsample.0.weight, layer1.0.downsample.1.weight, layer1.0.downsample.1.bias, layer1.0.downsample.1.running_mean, layer1.0.downsample.1.running_var, layer1.0.downsample.1.num_batches_tracked, layer1.1.conv1.weight, layer1.1.bn1.weight, layer1.1.bn1.bias, layer1.1.bn1.running_mean, layer1.1.bn1.running_var, layer1.1.bn1.num_batches_tracked, layer1.1.conv2.weight, layer1.1.bn2.weight, layer1.1.bn2.bias, layer1.1.bn2.running_mean, layer1.1.bn2.running_var, layer1.1.bn2.num_batches_tracked, layer1.1.conv3.weight, layer1.1.bn3.weight, layer1.1.bn3.bias, layer1.1.bn3.running_mean, layer1.1.bn3.running_var, layer1.1.bn3.num_batches_tracked, layer1.2.conv1.weight, layer1.2.bn1.weight, layer1.2.bn1.bias, layer1.2.bn1.running_mean, layer1.2.bn1.running_var, layer1.2.bn1.num_batches_tracked, layer1.2.conv2.weight, layer1.2.bn2.weight, layer1.2.bn2.bias, layer1.2.bn2.running_mean, layer1.2.bn2.running_var, layer1.2.bn2.num_batches_tracked, layer1.2.conv3.weight, layer1.2.bn3.weight, layer1.2.bn3.bias, layer1.2.bn3.running_mean, layer1.2.bn3.running_var, layer1.2.bn3.num_batches_tracked, layer2.0.conv1.weight, layer2.0.bn1.weight, layer2.0.bn1.bias, layer2.0.bn1.running_mean, layer2.0.bn1.running_var, layer2.0.bn1.num_batches_tracked, layer2.0.conv2.weight, layer2.0.bn2.weight, layer2.0.bn2.bias, layer2.0.bn2.running_mean, layer2.0.bn2.running_var, layer2.0.bn2.num_batches_tracked, layer2.0.conv3.weight, layer2.0.bn3.weight, layer2.0.bn3.bias, layer2.0.bn3.running_mean, layer2.0.bn3.running_var, layer2.0.bn3.num_batches_tracked, layer2.0.downsample.0.weight, layer2.0.downsample.1.weight, layer2.0.downsample.1.bias, layer2.0.downsample.1.running_mean, layer2.0.downsample.1.running_var, layer2.0.downsample.1.num_batches_tracked, layer2.1.conv1.weight, layer2.1.bn1.weight, layer2.1.bn1.bias, layer2.1.bn1.running_mean, layer2.1.bn1.running_var, layer2.1.bn1.num_batches_tracked, layer2.1.conv2.weight, layer2.1.bn2.weight, layer2.1.bn2.bias, layer2.1.bn2.running_mean, layer2.1.bn2.running_var, layer2.1.bn2.num_batches_tracked, layer2.1.conv3.weight, layer2.1.bn3.weight, layer2.1.bn3.bias, layer2.1.bn3.running_mean, layer2.1.bn3.running_var, layer2.1.bn3.num_batches_tracked, layer2.2.conv1.weight, layer2.2.bn1.weight, layer2.2.bn1.bias, layer2.2.bn1.running_mean, layer2.2.bn1.running_var, layer2.2.bn1.num_batches_tracked, layer2.2.conv2.weight, layer2.2.bn2.weight, layer2.2.bn2.bias, layer2.2.bn2.running_mean, layer2.2.bn2.running_var, layer2.2.bn2.num_batches_tracked, layer2.2.conv3.weight, layer2.2.bn3.weight, layer2.2.bn3.bias, layer2.2.bn3.running_mean, layer2.2.bn3.running_var, layer2.2.bn3.num_batches_tracked, layer2.3.conv1.weight, layer2.3.bn1.weight, layer2.3.bn1.bias, layer2.3.bn1.running_mean, layer2.3.bn1.running_var, layer2.3.bn1.num_batches_tracked, layer2.3.conv2.weight, layer2.3.bn2.weight, layer2.3.bn2.bias, layer2.3.bn2.running_mean, layer2.3.bn2.running_var, layer2.3.bn2.num_batches_tracked, 
layer2.3.conv3.weight, layer2.3.bn3.weight, layer2.3.bn3.bias, layer2.3.bn3.running_mean, layer2.3.bn3.running_var, layer2.3.bn3.num_batches_tracked, layer3.0.conv1.weight, layer3.0.bn1.weight, layer3.0.bn1.bias, layer3.0.bn1.running_mean, layer3.0.bn1.running_var, layer3.0.bn1.num_batches_tracked, layer3.0.conv2.weight, layer3.0.bn2.weight, layer3.0.bn2.bias, layer3.0.bn2.running_mean, layer3.0.bn2.running_var, layer3.0.bn2.num_batches_tracked, layer3.0.conv3.weight, layer3.0.bn3.weight, layer3.0.bn3.bias, layer3.0.bn3.running_mean, layer3.0.bn3.running_var, layer3.0.bn3.num_batches_tracked, layer3.0.downsample.0.weight, layer3.0.downsample.1.weight, layer3.0.downsample.1.bias, layer3.0.downsample.1.running_mean, layer3.0.downsample.1.running_var, layer3.0.downsample.1.num_batches_tracked, layer3.1.conv1.weight, layer3.1.bn1.weight, layer3.1.bn1.bias, layer3.1.bn1.running_mean, layer3.1.bn1.running_var, layer3.1.bn1.num_batches_tracked, layer3.1.conv2.weight, layer3.1.bn2.weight, layer3.1.bn2.bias, layer3.1.bn2.running_mean, layer3.1.bn2.running_var, layer3.1.bn2.num_batches_tracked, layer3.1.conv3.weight, layer3.1.bn3.weight, layer3.1.bn3.bias, layer3.1.bn3.running_mean, layer3.1.bn3.running_var, layer3.1.bn3.num_batches_tracked, layer3.2.conv1.weight, layer3.2.bn1.weight, layer3.2.bn1.bias, layer3.2.bn1.running_mean, layer3.2.bn1.running_var, layer3.2.bn1.num_batches_tracked, layer3.2.conv2.weight, layer3.2.bn2.weight, layer3.2.bn2.bias, layer3.2.bn2.running_mean, layer3.2.bn2.running_var, layer3.2.bn2.num_batches_tracked, layer3.2.conv3.weight, layer3.2.bn3.weight, layer3.2.bn3.bias, layer3.2.bn3.running_mean, layer3.2.bn3.running_var, layer3.2.bn3.num_batches_tracked, layer3.3.conv1.weight, layer3.3.bn1.weight, layer3.3.bn1.bias, layer3.3.bn1.running_mean, layer3.3.bn1.running_var, layer3.3.bn1.num_batches_tracked, layer3.3.conv2.weight, layer3.3.bn2.weight, layer3.3.bn2.bias, layer3.3.bn2.running_mean, layer3.3.bn2.running_var, layer3.3.bn2.num_batches_tracked, layer3.3.conv3.weight, layer3.3.bn3.weight, layer3.3.bn3.bias, layer3.3.bn3.running_mean, layer3.3.bn3.running_var, layer3.3.bn3.num_batches_tracked, layer3.4.conv1.weight, layer3.4.bn1.weight, layer3.4.bn1.bias, layer3.4.bn1.running_mean, layer3.4.bn1.running_var, layer3.4.bn1.num_batches_tracked, layer3.4.conv2.weight, layer3.4.bn2.weight, layer3.4.bn2.bias, layer3.4.bn2.running_mean, layer3.4.bn2.running_var, layer3.4.bn2.num_batches_tracked, layer3.4.conv3.weight, layer3.4.bn3.weight, layer3.4.bn3.bias, layer3.4.bn3.running_mean, layer3.4.bn3.running_var, layer3.4.bn3.num_batches_tracked, layer3.5.conv1.weight, layer3.5.bn1.weight, layer3.5.bn1.bias, layer3.5.bn1.running_mean, layer3.5.bn1.running_var, layer3.5.bn1.num_batches_tracked, layer3.5.conv2.weight, layer3.5.bn2.weight, layer3.5.bn2.bias, layer3.5.bn2.running_mean, layer3.5.bn2.running_var, layer3.5.bn2.num_batches_tracked, layer3.5.conv3.weight, layer3.5.bn3.weight, layer3.5.bn3.bias, layer3.5.bn3.running_mean, layer3.5.bn3.running_var, layer3.5.bn3.num_batches_tracked, layer4.0.conv1.weight, layer4.0.bn1.weight, layer4.0.bn1.bias, layer4.0.bn1.running_mean, layer4.0.bn1.running_var, layer4.0.bn1.num_batches_tracked, layer4.0.conv2.weight, layer4.0.bn2.weight, layer4.0.bn2.bias, layer4.0.bn2.running_mean, layer4.0.bn2.running_var, layer4.0.bn2.num_batches_tracked, layer4.0.conv3.weight, layer4.0.bn3.weight, layer4.0.bn3.bias, layer4.0.bn3.running_mean, layer4.0.bn3.running_var, layer4.0.bn3.num_batches_tracked, layer4.0.downsample.0.weight, 
layer4.0.downsample.1.weight, layer4.0.downsample.1.bias, layer4.0.downsample.1.running_mean, layer4.0.downsample.1.running_var, layer4.0.downsample.1.num_batches_tracked, layer4.1.conv1.weight, layer4.1.bn1.weight, layer4.1.bn1.bias, layer4.1.bn1.running_mean, layer4.1.bn1.running_var, layer4.1.bn1.num_batches_tracked, layer4.1.conv2.weight, layer4.1.bn2.weight, layer4.1.bn2.bias, layer4.1.bn2.running_mean, layer4.1.bn2.running_var, layer4.1.bn2.num_batches_tracked, layer4.1.conv3.weight, layer4.1.bn3.weight, layer4.1.bn3.bias, layer4.1.bn3.running_mean, layer4.1.bn3.running_var, layer4.1.bn3.num_batches_tracked, layer4.2.conv1.weight, layer4.2.bn1.weight, layer4.2.bn1.bias, layer4.2.bn1.running_mean, layer4.2.bn1.running_var, layer4.2.bn1.num_batches_tracked, layer4.2.conv2.weight, layer4.2.bn2.weight, layer4.2.bn2.bias, layer4.2.bn2.running_mean, layer4.2.bn2.running_var, layer4.2.bn2.num_batches_tracked, layer4.2.conv3.weight, layer4.2.bn3.weight, layer4.2.bn3.bias, layer4.2.bn3.running_mean, layer4.2.bn3.running_var, layer4.2.bn3.num_batches_tracked

missing keys in source state_dict: queue, queue_ptr, queue2, queue2_ptr, encoder_q.0.conv1.weight, encoder_q.0.bn1.weight, encoder_q.0.bn1.bias, encoder_q.0.bn1.running_mean, encoder_q.0.bn1.running_var, encoder_q.0.layer1.0.conv1.weight, encoder_q.0.layer1.0.bn1.weight, encoder_q.0.layer1.0.bn1.bias, encoder_q.0.layer1.0.bn1.running_mean, encoder_q.0.layer1.0.bn1.running_var, encoder_q.0.layer1.0.conv2.weight, encoder_q.0.layer1.0.bn2.weight, encoder_q.0.layer1.0.bn2.bias, encoder_q.0.layer1.0.bn2.running_mean, encoder_q.0.layer1.0.bn2.running_var, encoder_q.0.layer1.0.conv3.weight, encoder_q.0.layer1.0.bn3.weight, encoder_q.0.layer1.0.bn3.bias, encoder_q.0.layer1.0.bn3.running_mean, encoder_q.0.layer1.0.bn3.running_var, encoder_q.0.layer1.0.downsample.0.weight, encoder_q.0.layer1.0.downsample.1.weight, encoder_q.0.layer1.0.downsample.1.bias, encoder_q.0.layer1.0.downsample.1.running_mean, encoder_q.0.layer1.0.downsample.1.running_var, encoder_q.0.layer1.1.conv1.weight, encoder_q.0.layer1.1.bn1.weight, encoder_q.0.layer1.1.bn1.bias, encoder_q.0.layer1.1.bn1.running_mean, encoder_q.0.layer1.1.bn1.running_var, encoder_q.0.layer1.1.conv2.weight, encoder_q.0.layer1.1.bn2.weight, encoder_q.0.layer1.1.bn2.bias, encoder_q.0.layer1.1.bn2.running_mean, encoder_q.0.layer1.1.bn2.running_var, encoder_q.0.layer1.1.conv3.weight, encoder_q.0.layer1.1.bn3.weight, encoder_q.0.layer1.1.bn3.bias, encoder_q.0.layer1.1.bn3.running_mean, encoder_q.0.layer1.1.bn3.running_var, encoder_q.0.layer1.2.conv1.weight, encoder_q.0.layer1.2.bn1.weight, encoder_q.0.layer1.2.bn1.bias, encoder_q.0.layer1.2.bn1.running_mean, encoder_q.0.layer1.2.bn1.running_var, encoder_q.0.layer1.2.conv2.weight, encoder_q.0.layer1.2.bn2.weight, encoder_q.0.layer1.2.bn2.bias, encoder_q.0.layer1.2.bn2.running_mean, encoder_q.0.layer1.2.bn2.running_var, encoder_q.0.layer1.2.conv3.weight, encoder_q.0.layer1.2.bn3.weight, encoder_q.0.layer1.2.bn3.bias, encoder_q.0.layer1.2.bn3.running_mean, encoder_q.0.layer1.2.bn3.running_var, encoder_q.0.layer2.0.conv1.weight, encoder_q.0.layer2.0.bn1.weight, encoder_q.0.layer2.0.bn1.bias, encoder_q.0.layer2.0.bn1.running_mean, encoder_q.0.layer2.0.bn1.running_var, encoder_q.0.layer2.0.conv2.weight, encoder_q.0.layer2.0.bn2.weight, encoder_q.0.layer2.0.bn2.bias, encoder_q.0.layer2.0.bn2.running_mean, encoder_q.0.layer2.0.bn2.running_var, encoder_q.0.layer2.0.conv3.weight, encoder_q.0.layer2.0.bn3.weight, encoder_q.0.layer2.0.bn3.bias, encoder_q.0.layer2.0.bn3.running_mean, encoder_q.0.layer2.0.bn3.running_var, encoder_q.0.layer2.0.downsample.0.weight, encoder_q.0.layer2.0.downsample.1.weight, encoder_q.0.layer2.0.downsample.1.bias, encoder_q.0.layer2.0.downsample.1.running_mean, encoder_q.0.layer2.0.downsample.1.running_var, encoder_q.0.layer2.1.conv1.weight, encoder_q.0.layer2.1.bn1.weight, encoder_q.0.layer2.1.bn1.bias, encoder_q.0.layer2.1.bn1.running_mean, encoder_q.0.layer2.1.bn1.running_var, encoder_q.0.layer2.1.conv2.weight, encoder_q.0.layer2.1.bn2.weight, encoder_q.0.layer2.1.bn2.bias, encoder_q.0.layer2.1.bn2.running_mean, encoder_q.0.layer2.1.bn2.running_var, encoder_q.0.layer2.1.conv3.weight, encoder_q.0.layer2.1.bn3.weight, encoder_q.0.layer2.1.bn3.bias, encoder_q.0.layer2.1.bn3.running_mean, encoder_q.0.layer2.1.bn3.running_var, encoder_q.0.layer2.2.conv1.weight, encoder_q.0.layer2.2.bn1.weight, encoder_q.0.layer2.2.bn1.bias, encoder_q.0.layer2.2.bn1.running_mean, encoder_q.0.layer2.2.bn1.running_var, encoder_q.0.layer2.2.conv2.weight, encoder_q.0.layer2.2.bn2.weight, 
encoder_q.0.layer2.2.bn2.bias, encoder_q.0.layer2.2.bn2.running_mean, encoder_q.0.layer2.2.bn2.running_var, encoder_q.0.layer2.2.conv3.weight, encoder_q.0.layer2.2.bn3.weight, encoder_q.0.layer2.2.bn3.bias, encoder_q.0.layer2.2.bn3.running_mean, encoder_q.0.layer2.2.bn3.running_var, encoder_q.0.layer2.3.conv1.weight, encoder_q.0.layer2.3.bn1.weight, encoder_q.0.layer2.3.bn1.bias, encoder_q.0.layer2.3.bn1.running_mean, encoder_q.0.layer2.3.bn1.running_var, encoder_q.0.layer2.3.conv2.weight, encoder_q.0.layer2.3.bn2.weight, encoder_q.0.layer2.3.bn2.bias, encoder_q.0.layer2.3.bn2.running_mean, encoder_q.0.layer2.3.bn2.running_var, encoder_q.0.layer2.3.conv3.weight, encoder_q.0.layer2.3.bn3.weight, encoder_q.0.layer2.3.bn3.bias, encoder_q.0.layer2.3.bn3.running_mean, encoder_q.0.layer2.3.bn3.running_var, encoder_q.0.layer3.0.conv1.weight, encoder_q.0.layer3.0.bn1.weight, encoder_q.0.layer3.0.bn1.bias, encoder_q.0.layer3.0.bn1.running_mean, encoder_q.0.layer3.0.bn1.running_var, encoder_q.0.layer3.0.conv2.weight, encoder_q.0.layer3.0.bn2.weight, encoder_q.0.layer3.0.bn2.bias, encoder_q.0.layer3.0.bn2.running_mean, encoder_q.0.layer3.0.bn2.running_var, encoder_q.0.layer3.0.conv3.weight, encoder_q.0.layer3.0.bn3.weight, encoder_q.0.layer3.0.bn3.bias, encoder_q.0.layer3.0.bn3.running_mean, encoder_q.0.layer3.0.bn3.running_var, encoder_q.0.layer3.0.downsample.0.weight, encoder_q.0.layer3.0.downsample.1.weight, encoder_q.0.layer3.0.downsample.1.bias, encoder_q.0.layer3.0.downsample.1.running_mean, encoder_q.0.layer3.0.downsample.1.running_var, encoder_q.0.layer3.1.conv1.weight, encoder_q.0.layer3.1.bn1.weight, encoder_q.0.layer3.1.bn1.bias, encoder_q.0.layer3.1.bn1.running_mean, encoder_q.0.layer3.1.bn1.running_var, encoder_q.0.layer3.1.conv2.weight, encoder_q.0.layer3.1.bn2.weight, encoder_q.0.layer3.1.bn2.bias, encoder_q.0.layer3.1.bn2.running_mean, encoder_q.0.layer3.1.bn2.running_var, encoder_q.0.layer3.1.conv3.weight, encoder_q.0.layer3.1.bn3.weight, encoder_q.0.layer3.1.bn3.bias, encoder_q.0.layer3.1.bn3.running_mean, encoder_q.0.layer3.1.bn3.running_var, encoder_q.0.layer3.2.conv1.weight, encoder_q.0.layer3.2.bn1.weight, encoder_q.0.layer3.2.bn1.bias, encoder_q.0.layer3.2.bn1.running_mean, encoder_q.0.layer3.2.bn1.running_var, encoder_q.0.layer3.2.conv2.weight, encoder_q.0.layer3.2.bn2.weight, encoder_q.0.layer3.2.bn2.bias, encoder_q.0.layer3.2.bn2.running_mean, encoder_q.0.layer3.2.bn2.running_var, encoder_q.0.layer3.2.conv3.weight, encoder_q.0.layer3.2.bn3.weight, encoder_q.0.layer3.2.bn3.bias, encoder_q.0.layer3.2.bn3.running_mean, encoder_q.0.layer3.2.bn3.running_var, encoder_q.0.layer3.3.conv1.weight, encoder_q.0.layer3.3.bn1.weight, encoder_q.0.layer3.3.bn1.bias, encoder_q.0.layer3.3.bn1.running_mean, encoder_q.0.layer3.3.bn1.running_var, encoder_q.0.layer3.3.conv2.weight, encoder_q.0.layer3.3.bn2.weight, encoder_q.0.layer3.3.bn2.bias, encoder_q.0.layer3.3.bn2.running_mean, encoder_q.0.layer3.3.bn2.running_var, encoder_q.0.layer3.3.conv3.weight, encoder_q.0.layer3.3.bn3.weight, encoder_q.0.layer3.3.bn3.bias, encoder_q.0.layer3.3.bn3.running_mean, encoder_q.0.layer3.3.bn3.running_var, encoder_q.0.layer3.4.conv1.weight, encoder_q.0.layer3.4.bn1.weight, encoder_q.0.layer3.4.bn1.bias, encoder_q.0.layer3.4.bn1.running_mean, encoder_q.0.layer3.4.bn1.running_var, encoder_q.0.layer3.4.conv2.weight, encoder_q.0.layer3.4.bn2.weight, encoder_q.0.layer3.4.bn2.bias, encoder_q.0.layer3.4.bn2.running_mean, encoder_q.0.layer3.4.bn2.running_var, encoder_q.0.layer3.4.conv3.weight, 
encoder_q.0.layer3.4.bn3.weight, encoder_q.0.layer3.4.bn3.bias, encoder_q.0.layer3.4.bn3.running_mean, encoder_q.0.layer3.4.bn3.running_var, encoder_q.0.layer3.5.conv1.weight, encoder_q.0.layer3.5.bn1.weight, encoder_q.0.layer3.5.bn1.bias, encoder_q.0.layer3.5.bn1.running_mean, encoder_q.0.layer3.5.bn1.running_var, encoder_q.0.layer3.5.conv2.weight, encoder_q.0.layer3.5.bn2.weight, encoder_q.0.layer3.5.bn2.bias, encoder_q.0.layer3.5.bn2.running_mean, encoder_q.0.layer3.5.bn2.running_var, encoder_q.0.layer3.5.conv3.weight, encoder_q.0.layer3.5.bn3.weight, encoder_q.0.layer3.5.bn3.bias, encoder_q.0.layer3.5.bn3.running_mean, encoder_q.0.layer3.5.bn3.running_var, encoder_q.0.layer4.0.conv1.weight, encoder_q.0.layer4.0.bn1.weight, encoder_q.0.layer4.0.bn1.bias, encoder_q.0.layer4.0.bn1.running_mean, encoder_q.0.layer4.0.bn1.running_var, encoder_q.0.layer4.0.conv2.weight, encoder_q.0.layer4.0.bn2.weight, encoder_q.0.layer4.0.bn2.bias, encoder_q.0.layer4.0.bn2.running_mean, encoder_q.0.layer4.0.bn2.running_var, encoder_q.0.layer4.0.conv3.weight, encoder_q.0.layer4.0.bn3.weight, encoder_q.0.layer4.0.bn3.bias, encoder_q.0.layer4.0.bn3.running_mean, encoder_q.0.layer4.0.bn3.running_var, encoder_q.0.layer4.0.downsample.0.weight, encoder_q.0.layer4.0.downsample.1.weight, encoder_q.0.layer4.0.downsample.1.bias, encoder_q.0.layer4.0.downsample.1.running_mean, encoder_q.0.layer4.0.downsample.1.running_var, encoder_q.0.layer4.1.conv1.weight, encoder_q.0.layer4.1.bn1.weight, encoder_q.0.layer4.1.bn1.bias, encoder_q.0.layer4.1.bn1.running_mean, encoder_q.0.layer4.1.bn1.running_var, encoder_q.0.layer4.1.conv2.weight, encoder_q.0.layer4.1.bn2.weight, encoder_q.0.layer4.1.bn2.bias, encoder_q.0.layer4.1.bn2.running_mean, encoder_q.0.layer4.1.bn2.running_var, encoder_q.0.layer4.1.conv3.weight, encoder_q.0.layer4.1.bn3.weight, encoder_q.0.layer4.1.bn3.bias, encoder_q.0.layer4.1.bn3.running_mean, encoder_q.0.layer4.1.bn3.running_var, encoder_q.0.layer4.2.conv1.weight, encoder_q.0.layer4.2.bn1.weight, encoder_q.0.layer4.2.bn1.bias, encoder_q.0.layer4.2.bn1.running_mean, encoder_q.0.layer4.2.bn1.running_var, encoder_q.0.layer4.2.conv2.weight, encoder_q.0.layer4.2.bn2.weight, encoder_q.0.layer4.2.bn2.bias, encoder_q.0.layer4.2.bn2.running_mean, encoder_q.0.layer4.2.bn2.running_var, encoder_q.0.layer4.2.conv3.weight, encoder_q.0.layer4.2.bn3.weight, encoder_q.0.layer4.2.bn3.bias, encoder_q.0.layer4.2.bn3.running_mean, encoder_q.0.layer4.2.bn3.running_var, encoder_q.1.mlp.0.weight, encoder_q.1.mlp.0.bias, encoder_q.1.mlp.2.weight, encoder_q.1.mlp.2.bias, encoder_q.1.mlp2.0.weight, encoder_q.1.mlp2.0.bias, encoder_q.1.mlp2.2.weight, encoder_q.1.mlp2.2.bias, encoder_k.0.conv1.weight, encoder_k.0.bn1.weight, encoder_k.0.bn1.bias, encoder_k.0.bn1.running_mean, encoder_k.0.bn1.running_var, encoder_k.0.layer1.0.conv1.weight, encoder_k.0.layer1.0.bn1.weight, encoder_k.0.layer1.0.bn1.bias, encoder_k.0.layer1.0.bn1.running_mean, encoder_k.0.layer1.0.bn1.running_var, encoder_k.0.layer1.0.conv2.weight, encoder_k.0.layer1.0.bn2.weight, encoder_k.0.layer1.0.bn2.bias, encoder_k.0.layer1.0.bn2.running_mean, encoder_k.0.layer1.0.bn2.running_var, encoder_k.0.layer1.0.conv3.weight, encoder_k.0.layer1.0.bn3.weight, encoder_k.0.layer1.0.bn3.bias, encoder_k.0.layer1.0.bn3.running_mean, encoder_k.0.layer1.0.bn3.running_var, encoder_k.0.layer1.0.downsample.0.weight, encoder_k.0.layer1.0.downsample.1.weight, encoder_k.0.layer1.0.downsample.1.bias, encoder_k.0.layer1.0.downsample.1.running_mean, encoder_k.0.layer1.0.downsample.1.running_var, 
encoder_k.0.layer1.1.conv1.weight, encoder_k.0.layer1.1.bn1.weight, encoder_k.0.layer1.1.bn1.bias, encoder_k.0.layer1.1.bn1.running_mean, encoder_k.0.layer1.1.bn1.running_var, encoder_k.0.layer1.1.conv2.weight, encoder_k.0.layer1.1.bn2.weight, encoder_k.0.layer1.1.bn2.bias, encoder_k.0.layer1.1.bn2.running_mean, encoder_k.0.layer1.1.bn2.running_var, encoder_k.0.layer1.1.conv3.weight, encoder_k.0.layer1.1.bn3.weight, encoder_k.0.layer1.1.bn3.bias, encoder_k.0.layer1.1.bn3.running_mean, encoder_k.0.layer1.1.bn3.running_var, encoder_k.0.layer1.2.conv1.weight, encoder_k.0.layer1.2.bn1.weight, encoder_k.0.layer1.2.bn1.bias, encoder_k.0.layer1.2.bn1.running_mean, encoder_k.0.layer1.2.bn1.running_var, encoder_k.0.layer1.2.conv2.weight, encoder_k.0.layer1.2.bn2.weight, encoder_k.0.layer1.2.bn2.bias, encoder_k.0.layer1.2.bn2.running_mean, encoder_k.0.layer1.2.bn2.running_var, encoder_k.0.layer1.2.conv3.weight, encoder_k.0.layer1.2.bn3.weight, encoder_k.0.layer1.2.bn3.bias, encoder_k.0.layer1.2.bn3.running_mean, encoder_k.0.layer1.2.bn3.running_var, encoder_k.0.layer2.0.conv1.weight, encoder_k.0.layer2.0.bn1.weight, encoder_k.0.layer2.0.bn1.bias, encoder_k.0.layer2.0.bn1.running_mean, encoder_k.0.layer2.0.bn1.running_var, encoder_k.0.layer2.0.conv2.weight, encoder_k.0.layer2.0.bn2.weight, encoder_k.0.layer2.0.bn2.bias, encoder_k.0.layer2.0.bn2.running_mean, encoder_k.0.layer2.0.bn2.running_var, encoder_k.0.layer2.0.conv3.weight, encoder_k.0.layer2.0.bn3.weight, encoder_k.0.layer2.0.bn3.bias, encoder_k.0.layer2.0.bn3.running_mean, encoder_k.0.layer2.0.bn3.running_var, encoder_k.0.layer2.0.downsample.0.weight, encoder_k.0.layer2.0.downsample.1.weight, encoder_k.0.layer2.0.downsample.1.bias, encoder_k.0.layer2.0.downsample.1.running_mean, encoder_k.0.layer2.0.downsample.1.running_var, encoder_k.0.layer2.1.conv1.weight, encoder_k.0.layer2.1.bn1.weight, encoder_k.0.layer2.1.bn1.bias, encoder_k.0.layer2.1.bn1.running_mean, encoder_k.0.layer2.1.bn1.running_var, encoder_k.0.layer2.1.conv2.weight, encoder_k.0.layer2.1.bn2.weight, encoder_k.0.layer2.1.bn2.bias, encoder_k.0.layer2.1.bn2.running_mean, encoder_k.0.layer2.1.bn2.running_var, encoder_k.0.layer2.1.conv3.weight, encoder_k.0.layer2.1.bn3.weight, encoder_k.0.layer2.1.bn3.bias, encoder_k.0.layer2.1.bn3.running_mean, encoder_k.0.layer2.1.bn3.running_var, encoder_k.0.layer2.2.conv1.weight, encoder_k.0.layer2.2.bn1.weight, encoder_k.0.layer2.2.bn1.bias, encoder_k.0.layer2.2.bn1.running_mean, encoder_k.0.layer2.2.bn1.running_var, encoder_k.0.layer2.2.conv2.weight, encoder_k.0.layer2.2.bn2.weight, encoder_k.0.layer2.2.bn2.bias, encoder_k.0.layer2.2.bn2.running_mean, encoder_k.0.layer2.2.bn2.running_var, encoder_k.0.layer2.2.conv3.weight, encoder_k.0.layer2.2.bn3.weight, encoder_k.0.layer2.2.bn3.bias, encoder_k.0.layer2.2.bn3.running_mean, encoder_k.0.layer2.2.bn3.running_var, encoder_k.0.layer2.3.conv1.weight, encoder_k.0.layer2.3.bn1.weight, encoder_k.0.layer2.3.bn1.bias, encoder_k.0.layer2.3.bn1.running_mean, encoder_k.0.layer2.3.bn1.running_var, encoder_k.0.layer2.3.conv2.weight, encoder_k.0.layer2.3.bn2.weight, encoder_k.0.layer2.3.bn2.bias, encoder_k.0.layer2.3.bn2.running_mean, encoder_k.0.layer2.3.bn2.running_var, encoder_k.0.layer2.3.conv3.weight, encoder_k.0.layer2.3.bn3.weight, encoder_k.0.layer2.3.bn3.bias, encoder_k.0.layer2.3.bn3.running_mean, encoder_k.0.layer2.3.bn3.running_var, encoder_k.0.layer3.0.conv1.weight, encoder_k.0.layer3.0.bn1.weight, encoder_k.0.layer3.0.bn1.bias, encoder_k.0.layer3.0.bn1.running_mean, 
encoder_k.0.layer3.0.bn1.running_var, encoder_k.0.layer3.0.conv2.weight, encoder_k.0.layer3.0.bn2.weight, encoder_k.0.layer3.0.bn2.bias, encoder_k.0.layer3.0.bn2.running_mean, encoder_k.0.layer3.0.bn2.running_var, encoder_k.0.layer3.0.conv3.weight, encoder_k.0.layer3.0.bn3.weight, encoder_k.0.layer3.0.bn3.bias, encoder_k.0.layer3.0.bn3.running_mean, encoder_k.0.layer3.0.bn3.running_var, encoder_k.0.layer3.0.downsample.0.weight, encoder_k.0.layer3.0.downsample.1.weight, encoder_k.0.layer3.0.downsample.1.bias, encoder_k.0.layer3.0.downsample.1.running_mean, encoder_k.0.layer3.0.downsample.1.running_var, encoder_k.0.layer3.1.conv1.weight, encoder_k.0.layer3.1.bn1.weight, encoder_k.0.layer3.1.bn1.bias, encoder_k.0.layer3.1.bn1.running_mean, encoder_k.0.layer3.1.bn1.running_var, encoder_k.0.layer3.1.conv2.weight, encoder_k.0.layer3.1.bn2.weight, encoder_k.0.layer3.1.bn2.bias, encoder_k.0.layer3.1.bn2.running_mean, encoder_k.0.layer3.1.bn2.running_var, encoder_k.0.layer3.1.conv3.weight, encoder_k.0.layer3.1.bn3.weight, encoder_k.0.layer3.1.bn3.bias, encoder_k.0.layer3.1.bn3.running_mean, encoder_k.0.layer3.1.bn3.running_var, encoder_k.0.layer3.2.conv1.weight, encoder_k.0.layer3.2.bn1.weight, encoder_k.0.layer3.2.bn1.bias, encoder_k.0.layer3.2.bn1.running_mean, encoder_k.0.layer3.2.bn1.running_var, encoder_k.0.layer3.2.conv2.weight, encoder_k.0.layer3.2.bn2.weight, encoder_k.0.layer3.2.bn2.bias, encoder_k.0.layer3.2.bn2.running_mean, encoder_k.0.layer3.2.bn2.running_var, encoder_k.0.layer3.2.conv3.weight, encoder_k.0.layer3.2.bn3.weight, encoder_k.0.layer3.2.bn3.bias, encoder_k.0.layer3.2.bn3.running_mean, encoder_k.0.layer3.2.bn3.running_var, encoder_k.0.layer3.3.conv1.weight, encoder_k.0.layer3.3.bn1.weight, encoder_k.0.layer3.3.bn1.bias, encoder_k.0.layer3.3.bn1.running_mean, encoder_k.0.layer3.3.bn1.running_var, encoder_k.0.layer3.3.conv2.weight, encoder_k.0.layer3.3.bn2.weight, encoder_k.0.layer3.3.bn2.bias, encoder_k.0.layer3.3.bn2.running_mean, encoder_k.0.layer3.3.bn2.running_var, encoder_k.0.layer3.3.conv3.weight, encoder_k.0.layer3.3.bn3.weight, encoder_k.0.layer3.3.bn3.bias, encoder_k.0.layer3.3.bn3.running_mean, encoder_k.0.layer3.3.bn3.running_var, encoder_k.0.layer3.4.conv1.weight, encoder_k.0.layer3.4.bn1.weight, encoder_k.0.layer3.4.bn1.bias, encoder_k.0.layer3.4.bn1.running_mean, encoder_k.0.layer3.4.bn1.running_var, encoder_k.0.layer3.4.conv2.weight, encoder_k.0.layer3.4.bn2.weight, encoder_k.0.layer3.4.bn2.bias, encoder_k.0.layer3.4.bn2.running_mean, encoder_k.0.layer3.4.bn2.running_var, encoder_k.0.layer3.4.conv3.weight, encoder_k.0.layer3.4.bn3.weight, encoder_k.0.layer3.4.bn3.bias, encoder_k.0.layer3.4.bn3.running_mean, encoder_k.0.layer3.4.bn3.running_var, encoder_k.0.layer3.5.conv1.weight, encoder_k.0.layer3.5.bn1.weight, encoder_k.0.layer3.5.bn1.bias, encoder_k.0.layer3.5.bn1.running_mean, encoder_k.0.layer3.5.bn1.running_var, encoder_k.0.layer3.5.conv2.weight, encoder_k.0.layer3.5.bn2.weight, encoder_k.0.layer3.5.bn2.bias, encoder_k.0.layer3.5.bn2.running_mean, encoder_k.0.layer3.5.bn2.running_var, encoder_k.0.layer3.5.conv3.weight, encoder_k.0.layer3.5.bn3.weight, encoder_k.0.layer3.5.bn3.bias, encoder_k.0.layer3.5.bn3.running_mean, encoder_k.0.layer3.5.bn3.running_var, encoder_k.0.layer4.0.conv1.weight, encoder_k.0.layer4.0.bn1.weight, encoder_k.0.layer4.0.bn1.bias, encoder_k.0.layer4.0.bn1.running_mean, encoder_k.0.layer4.0.bn1.running_var, encoder_k.0.layer4.0.conv2.weight, encoder_k.0.layer4.0.bn2.weight, encoder_k.0.layer4.0.bn2.bias, 
encoder_k.0.layer4.0.bn2.running_mean, encoder_k.0.layer4.0.bn2.running_var, encoder_k.0.layer4.0.conv3.weight, encoder_k.0.layer4.0.bn3.weight, encoder_k.0.layer4.0.bn3.bias, encoder_k.0.layer4.0.bn3.running_mean, encoder_k.0.layer4.0.bn3.running_var, encoder_k.0.layer4.0.downsample.0.weight, encoder_k.0.layer4.0.downsample.1.weight, encoder_k.0.layer4.0.downsample.1.bias, encoder_k.0.layer4.0.downsample.1.running_mean, encoder_k.0.layer4.0.downsample.1.running_var, encoder_k.0.layer4.1.conv1.weight, encoder_k.0.layer4.1.bn1.weight, encoder_k.0.layer4.1.bn1.bias, encoder_k.0.layer4.1.bn1.running_mean, encoder_k.0.layer4.1.bn1.running_var, encoder_k.0.layer4.1.conv2.weight, encoder_k.0.layer4.1.bn2.weight, encoder_k.0.layer4.1.bn2.bias, encoder_k.0.layer4.1.bn2.running_mean, encoder_k.0.layer4.1.bn2.running_var, encoder_k.0.layer4.1.conv3.weight, encoder_k.0.layer4.1.bn3.weight, encoder_k.0.layer4.1.bn3.bias, encoder_k.0.layer4.1.bn3.running_mean, encoder_k.0.layer4.1.bn3.running_var, encoder_k.0.layer4.2.conv1.weight, encoder_k.0.layer4.2.bn1.weight, encoder_k.0.layer4.2.bn1.bias, encoder_k.0.layer4.2.bn1.running_mean, encoder_k.0.layer4.2.bn1.running_var, encoder_k.0.layer4.2.conv2.weight, encoder_k.0.layer4.2.bn2.weight, encoder_k.0.layer4.2.bn2.bias, encoder_k.0.layer4.2.bn2.running_mean, encoder_k.0.layer4.2.bn2.running_var, encoder_k.0.layer4.2.conv3.weight, encoder_k.0.layer4.2.bn3.weight, encoder_k.0.layer4.2.bn3.bias, encoder_k.0.layer4.2.bn3.running_mean, encoder_k.0.layer4.2.bn3.running_var, encoder_k.1.mlp.0.weight, encoder_k.1.mlp.0.bias, encoder_k.1.mlp.2.weight, encoder_k.1.mlp.2.bias, encoder_k.1.mlp2.0.weight, encoder_k.1.mlp2.0.bias, encoder_k.1.mlp2.2.weight, encoder_k.1.mlp2.2.bias, backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, backbone.bn1.running_var, backbone.layer1.0.conv1.weight, backbone.layer1.0.bn1.weight, backbone.layer1.0.bn1.bias, backbone.layer1.0.bn1.running_mean, backbone.layer1.0.bn1.running_var, backbone.layer1.0.conv2.weight, backbone.layer1.0.bn2.weight, backbone.layer1.0.bn2.bias, backbone.layer1.0.bn2.running_mean, backbone.layer1.0.bn2.running_var, backbone.layer1.0.conv3.weight, backbone.layer1.0.bn3.weight, backbone.layer1.0.bn3.bias, backbone.layer1.0.bn3.running_mean, backbone.layer1.0.bn3.running_var, backbone.layer1.0.downsample.0.weight, backbone.layer1.0.downsample.1.weight, backbone.layer1.0.downsample.1.bias, backbone.layer1.0.downsample.1.running_mean, backbone.layer1.0.downsample.1.running_var, backbone.layer1.1.conv1.weight, backbone.layer1.1.bn1.weight, backbone.layer1.1.bn1.bias, backbone.layer1.1.bn1.running_mean, backbone.layer1.1.bn1.running_var, backbone.layer1.1.conv2.weight, backbone.layer1.1.bn2.weight, backbone.layer1.1.bn2.bias, backbone.layer1.1.bn2.running_mean, backbone.layer1.1.bn2.running_var, backbone.layer1.1.conv3.weight, backbone.layer1.1.bn3.weight, backbone.layer1.1.bn3.bias, backbone.layer1.1.bn3.running_mean, backbone.layer1.1.bn3.running_var, backbone.layer1.2.conv1.weight, backbone.layer1.2.bn1.weight, backbone.layer1.2.bn1.bias, backbone.layer1.2.bn1.running_mean, backbone.layer1.2.bn1.running_var, backbone.layer1.2.conv2.weight, backbone.layer1.2.bn2.weight, backbone.layer1.2.bn2.bias, backbone.layer1.2.bn2.running_mean, backbone.layer1.2.bn2.running_var, backbone.layer1.2.conv3.weight, backbone.layer1.2.bn3.weight, backbone.layer1.2.bn3.bias, backbone.layer1.2.bn3.running_mean, backbone.layer1.2.bn3.running_var, backbone.layer2.0.conv1.weight, 
backbone.layer2.0.bn1.weight, backbone.layer2.0.bn1.bias, backbone.layer2.0.bn1.running_mean, backbone.layer2.0.bn1.running_var, backbone.layer2.0.conv2.weight, backbone.layer2.0.bn2.weight, backbone.layer2.0.bn2.bias, backbone.layer2.0.bn2.running_mean, backbone.layer2.0.bn2.running_var, backbone.layer2.0.conv3.weight, backbone.layer2.0.bn3.weight, backbone.layer2.0.bn3.bias, backbone.layer2.0.bn3.running_mean, backbone.layer2.0.bn3.running_var, backbone.layer2.0.downsample.0.weight, backbone.layer2.0.downsample.1.weight, backbone.layer2.0.downsample.1.bias, backbone.layer2.0.downsample.1.running_mean, backbone.layer2.0.downsample.1.running_var, backbone.layer2.1.conv1.weight, backbone.layer2.1.bn1.weight, backbone.layer2.1.bn1.bias, backbone.layer2.1.bn1.running_mean, backbone.layer2.1.bn1.running_var, backbone.layer2.1.conv2.weight, backbone.layer2.1.bn2.weight, backbone.layer2.1.bn2.bias, backbone.layer2.1.bn2.running_mean, backbone.layer2.1.bn2.running_var, backbone.layer2.1.conv3.weight, backbone.layer2.1.bn3.weight, backbone.layer2.1.bn3.bias, backbone.layer2.1.bn3.running_mean, backbone.layer2.1.bn3.running_var, backbone.layer2.2.conv1.weight, backbone.layer2.2.bn1.weight, backbone.layer2.2.bn1.bias, backbone.layer2.2.bn1.running_mean, backbone.layer2.2.bn1.running_var, backbone.layer2.2.conv2.weight, backbone.layer2.2.bn2.weight, backbone.layer2.2.bn2.bias, backbone.layer2.2.bn2.running_mean, backbone.layer2.2.bn2.running_var, backbone.layer2.2.conv3.weight, backbone.layer2.2.bn3.weight, backbone.layer2.2.bn3.bias, backbone.layer2.2.bn3.running_mean, backbone.layer2.2.bn3.running_var, backbone.layer2.3.conv1.weight, backbone.layer2.3.bn1.weight, backbone.layer2.3.bn1.bias, backbone.layer2.3.bn1.running_mean, backbone.layer2.3.bn1.running_var, backbone.layer2.3.conv2.weight, backbone.layer2.3.bn2.weight, backbone.layer2.3.bn2.bias, backbone.layer2.3.bn2.running_mean, backbone.layer2.3.bn2.running_var, backbone.layer2.3.conv3.weight, backbone.layer2.3.bn3.weight, backbone.layer2.3.bn3.bias, backbone.layer2.3.bn3.running_mean, backbone.layer2.3.bn3.running_var, backbone.layer3.0.conv1.weight, backbone.layer3.0.bn1.weight, backbone.layer3.0.bn1.bias, backbone.layer3.0.bn1.running_mean, backbone.layer3.0.bn1.running_var, backbone.layer3.0.conv2.weight, backbone.layer3.0.bn2.weight, backbone.layer3.0.bn2.bias, backbone.layer3.0.bn2.running_mean, backbone.layer3.0.bn2.running_var, backbone.layer3.0.conv3.weight, backbone.layer3.0.bn3.weight, backbone.layer3.0.bn3.bias, backbone.layer3.0.bn3.running_mean, backbone.layer3.0.bn3.running_var, backbone.layer3.0.downsample.0.weight, backbone.layer3.0.downsample.1.weight, backbone.layer3.0.downsample.1.bias, backbone.layer3.0.downsample.1.running_mean, backbone.layer3.0.downsample.1.running_var, backbone.layer3.1.conv1.weight, backbone.layer3.1.bn1.weight, backbone.layer3.1.bn1.bias, backbone.layer3.1.bn1.running_mean, backbone.layer3.1.bn1.running_var, backbone.layer3.1.conv2.weight, backbone.layer3.1.bn2.weight, backbone.layer3.1.bn2.bias, backbone.layer3.1.bn2.running_mean, backbone.layer3.1.bn2.running_var, backbone.layer3.1.conv3.weight, backbone.layer3.1.bn3.weight, backbone.layer3.1.bn3.bias, backbone.layer3.1.bn3.running_mean, backbone.layer3.1.bn3.running_var, backbone.layer3.2.conv1.weight, backbone.layer3.2.bn1.weight, backbone.layer3.2.bn1.bias, backbone.layer3.2.bn1.running_mean, backbone.layer3.2.bn1.running_var, backbone.layer3.2.conv2.weight, backbone.layer3.2.bn2.weight, backbone.layer3.2.bn2.bias, 
backbone.layer3.2.bn2.running_mean, backbone.layer3.2.bn2.running_var, backbone.layer3.2.conv3.weight, backbone.layer3.2.bn3.weight, backbone.layer3.2.bn3.bias, backbone.layer3.2.bn3.running_mean, backbone.layer3.2.bn3.running_var, backbone.layer3.3.conv1.weight, backbone.layer3.3.bn1.weight, backbone.layer3.3.bn1.bias, backbone.layer3.3.bn1.running_mean, backbone.layer3.3.bn1.running_var, backbone.layer3.3.conv2.weight, backbone.layer3.3.bn2.weight, backbone.layer3.3.bn2.bias, backbone.layer3.3.bn2.running_mean, backbone.layer3.3.bn2.running_var, backbone.layer3.3.conv3.weight, backbone.layer3.3.bn3.weight, backbone.layer3.3.bn3.bias, backbone.layer3.3.bn3.running_mean, backbone.layer3.3.bn3.running_var, backbone.layer3.4.conv1.weight, backbone.layer3.4.bn1.weight, backbone.layer3.4.bn1.bias, backbone.layer3.4.bn1.running_mean, backbone.layer3.4.bn1.running_var, backbone.layer3.4.conv2.weight, backbone.layer3.4.bn2.weight, backbone.layer3.4.bn2.bias, backbone.layer3.4.bn2.running_mean, backbone.layer3.4.bn2.running_var, backbone.layer3.4.conv3.weight, backbone.layer3.4.bn3.weight, backbone.layer3.4.bn3.bias, backbone.layer3.4.bn3.running_mean, backbone.layer3.4.bn3.running_var, backbone.layer3.5.conv1.weight, backbone.layer3.5.bn1.weight, backbone.layer3.5.bn1.bias, backbone.layer3.5.bn1.running_mean, backbone.layer3.5.bn1.running_var, backbone.layer3.5.conv2.weight, backbone.layer3.5.bn2.weight, backbone.layer3.5.bn2.bias, backbone.layer3.5.bn2.running_mean, backbone.layer3.5.bn2.running_var, backbone.layer3.5.conv3.weight, backbone.layer3.5.bn3.weight, backbone.layer3.5.bn3.bias, backbone.layer3.5.bn3.running_mean, backbone.layer3.5.bn3.running_var, backbone.layer4.0.conv1.weight, backbone.layer4.0.bn1.weight, backbone.layer4.0.bn1.bias, backbone.layer4.0.bn1.running_mean, backbone.layer4.0.bn1.running_var, backbone.layer4.0.conv2.weight, backbone.layer4.0.bn2.weight, backbone.layer4.0.bn2.bias, backbone.layer4.0.bn2.running_mean, backbone.layer4.0.bn2.running_var, backbone.layer4.0.conv3.weight, backbone.layer4.0.bn3.weight, backbone.layer4.0.bn3.bias, backbone.layer4.0.bn3.running_mean, backbone.layer4.0.bn3.running_var, backbone.layer4.0.downsample.0.weight, backbone.layer4.0.downsample.1.weight, backbone.layer4.0.downsample.1.bias, backbone.layer4.0.downsample.1.running_mean, backbone.layer4.0.downsample.1.running_var, backbone.layer4.1.conv1.weight, backbone.layer4.1.bn1.weight, backbone.layer4.1.bn1.bias, backbone.layer4.1.bn1.running_mean, backbone.layer4.1.bn1.running_var, backbone.layer4.1.conv2.weight, backbone.layer4.1.bn2.weight, backbone.layer4.1.bn2.bias, backbone.layer4.1.bn2.running_mean, backbone.layer4.1.bn2.running_var, backbone.layer4.1.conv3.weight, backbone.layer4.1.bn3.weight, backbone.layer4.1.bn3.bias, backbone.layer4.1.bn3.running_mean, backbone.layer4.1.bn3.running_var, backbone.layer4.2.conv1.weight, backbone.layer4.2.bn1.weight, backbone.layer4.2.bn1.bias, backbone.layer4.2.bn1.running_mean, backbone.layer4.2.bn1.running_var, backbone.layer4.2.conv2.weight, backbone.layer4.2.bn2.weight, backbone.layer4.2.bn2.bias, backbone.layer4.2.bn2.running_mean, backbone.layer4.2.bn2.running_var, backbone.layer4.2.conv3.weight, backbone.layer4.2.bn3.weight, backbone.layer4.2.bn3.bias, backbone.layer4.2.bn3.running_mean, backbone.layer4.2.bn3.running_var
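
A hedged observation: the released checkpoint stores bare backbone keys (conv1.*, layer1.*, ...), while the model built here expects them under a backbone. prefix (plus queue/encoder_* buffers that the release omits). If the goal is only to load the backbone weights, a key remap along these lines may help (an assumption about the mismatch, not an official fix):

    import torch

    ckpt = torch.load("densecl_r50_imagenet_200ep.pth", map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # Prefix the bare backbone keys so they match the prefixed names;
    # queue/encoder_q/encoder_k entries will still be reported as missing.
    remapped = {"backbone." + k: v for k, v in state.items()}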

Dimensions of data

Thanks for the wonderful work on DenseCL. I have a code-related question I'd like to consult you about: why must the input x of DenseCLNeck satisfy len(x) == 1, and what is the dimension of x[0]?

    assert len(x) == 1
    x = x[0]
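
A hedged illustration of the shapes involved: the backbone returns a tuple of feature maps, and with only the last ResNet-50 stage selected in the config that tuple has length 1, with x[0] of shape (N, 2048, 7, 7) for 224×224 crops:

    import torch

    x = (torch.randn(8, 2048, 7, 7),)  # stand-in for the backbone output tuple
    assert len(x) == 1
    feat = x[0]                        # (N, C, H, W) = (8, 2048, 7, 7)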

2.3 negative sample

I have a question and hope to hear from you:
In Section 2.3: "Each negative key t− is the pooled feature vector of a view from a different image."
Why not use the other parts of the two views of the same image as negative samples?
That seems to make more sense.

DenseNeck design

Have you tried different output channels for the single (global) projection and the dense projection? In the DenseCLNeck implementation, the single MLP and the dense MLP use the same hidden and output channels. As far as I know, the projection of the instance-level representation requires more channels than the projection of the dense representation, so treating them identically might lose useful information from the instance representation. What do you think about this? Also, most instance discrimination methods design the projector as fc-bn-relu-fc, so I wonder why you drop the BN in DenseCLNeck. Is it just for simplicity?

        self.mlp = nn.Sequential(
            nn.Linear(in_channels, hid_channels), nn.ReLU(inplace=True),
            nn.Linear(hid_channels, out_channels))
        ...
        self.mlp2 = nn.Sequential(
            nn.Conv2d(in_channels, hid_channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hid_channels, out_channels, 1))

Inferior performance on PASCAL VOC12 with DeepLabV3+

Thanks for releasing your code; the results are impressive.

I've tried the downloaded DenseCL pretrained models and tested them on the VOC semantic segmentation dataset. With the same FCN architecture, the performance matches expectations: the DenseCL ImageNet pretrained model outperforms the ImageNet classification model. However, when replacing the backbone of DeepLabV3+, the DenseCL models show inferior performance. The comparison is below:

Arch   Dataset   Pretrained Model    mIoU
dv3+   VOC12     Sup ImageNet        71.33
dv3+   VOC12     DenseCL COCO        67.51
dv3+   VOC12     DenseCL ImageNet    69.5

The configs are borrowed from the official MMSeg configs, and I was careful not to make many modifications. Have you ever noticed the same behavior with any other models or datasets?

Why use argmax for matching?

Hey there, thanks for sharing the code!

I just have a quick question:
Why is argmax used to match features from different views, rather than using the spatial correspondence, which we have access to since we know which data augmentations were applied to the images?

Thanks.
