wxinlong / densecl
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021 Oral.
License: GNU General Public License v3.0
What about linear classification results on ImageNet?
This is great work.
Could you give more details about the visualization of dense correspondence?
Thanks for sharing your great work!
I was able to reproduce your object detection result on Pascal VOC.
However, when I tested semantic segmentation on Pascal VOC using your ImageNet-1k pre-trained model "densecl_r50_imagenet_200ep.pth", I got mIoU 0.62, which is worse than the 0.69 reported in your paper. My test procedure is explained below.
At the end of training I got mIoU 0.62, mAcc 0.75, aAcc 0.91. I ran 3 rounds and got similar results. Attached are my configuration file and training log.
Do you know a possible reason?
Thanks!
(8 GPUs) When I use the network pretrained on COCO for 800 epochs (coco-800ep-resnet50) for the detection task on VOC, the AP is only 44.76, while you achieve 56.7. I don't know why the gap is so large. Note that I changed the batch size from 16 to 8 and, accordingly, the base lr from 0.02 to 0.01.
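The lr adjustment described above follows the linear scaling rule (learning rate proportional to total batch size). A minimal sketch, with an illustrative helper name not taken from the DenseCL codebase:

```python
# Hypothetical helper illustrating the linear learning-rate scaling rule:
# lr scales proportionally with the total batch size, matching the change
# above (batch size 16 -> 8 implies base lr 0.02 -> 0.01).
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Scale the learning rate linearly with the total batch size."""
    return base_lr * batch_size / base_batch_size

print(scale_lr(0.02, 16, 8))   # halving the batch size halves the lr
```

Whether linear scaling alone fully closes such a gap is a separate question; queue-based methods can also be sensitive to the per-GPU batch composition.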
Hi!
Are the provided checkpoints after pretraining or fine-tuning?
Thank you.
Based on MMDetection, trained on COCO2017 train and evaluated on COCO2017 val:
Faster R-CNN, R-50 from torchvision://resnet50
1x: bbox_mAP: 0.3750
Faster R-CNN, R-50 from my reproduced model pretrained on ImageNet
1x: bbox_mAP: 0.3580
Faster R-CNN, R-50 from your pretrained model on ImageNet
1x: bbox_mAP: 0.3550
These results are not as good as expected. Could you help?
Hi @WXinlong thanks for the wonderful work.
I want to fine-tune the pre-trained model on the downstream task of object detection. I used the MoCo v2 pre-trained model with 800 epochs from here.
I followed this process:
step 1: Install detectron2.
step 2: Convert a pre-trained MoCo model to detectron2's format:
python3 convert-pretrain-to-detectron2.py input.pth.tar output.pkl
Put dataset under "./datasets" directory, following the directory structure required by detectron2.
step 3: Run training:
python train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml \
--num-gpus 1 MODEL.WEIGHTS ./output.pkl
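The conversion in step 2 can be sketched as follows. This is a simplified illustration of the kind of key renaming such a script performs (keep only the query-encoder backbone weights, map torchvision-style names toward detectron2-style ones), not the actual convert-pretrain-to-detectron2.py; the toy dict and function name are ours:

```python
# Simplified sketch of a MoCo -> detectron2 checkpoint key conversion.
def convert_keys(state_dict):
    renamed = {}
    for key, value in state_dict.items():
        if not key.startswith("module.encoder_q.") or "fc" in key:
            continue  # drop momentum encoder, queue, and projection head
        key = key[len("module.encoder_q."):]
        # torchvision "layerN" roughly corresponds to detectron2 "res{N+1}"
        for n in range(1, 5):
            key = key.replace(f"layer{n}", f"res{n + 1}")
        renamed[key] = value
    return renamed

toy = {
    "module.encoder_q.conv1.weight": 1,
    "module.encoder_q.layer1.0.conv1.weight": 2,
    "module.encoder_q.fc.0.weight": 3,   # projection head: dropped
    "module.encoder_k.conv1.weight": 4,  # momentum encoder: dropped
}
print(convert_keys(toy))
```

The real script also handles BN parameter names and saves the result as a pickle that detectron2 can load.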
The only change I made was using a single GPU rather than 8 GPUs.
I am getting the following error:
[08/31 12:42:12] fvcore.common.checkpoint WARNING: Some model parameters or buffers are not found in the checkpoint:
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
roi_heads.res5.norm.{bias, running_mean, running_var, weight}
[08/31 12:42:12] fvcore.common.checkpoint WARNING: The checkpoint state_dict contains keys that are not used by the model:
stem.fc.0.{bias, weight}
stem.fc.2.{bias, weight}
[08/31 12:42:12] d2.engine.train_loop INFO: Starting training from iteration 0
[08/31 12:42:13] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/defaults.py", line 493, in run_step
self._trainer.run_step()
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 154, in forward
features = self.backbone(images.tensor)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 445, in forward
x = self.stem(x)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 356, in forward
x = self.conv1(x)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/layers/wrappers.py", line 88, in forward
x = self.norm(x)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
return _get_group_size(group)
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
_check_default_pg()
File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
assert _default_pg is not None, \
AssertionError: Default process group is not initialized
[08/31 12:42:13] d2.engine.hooks INFO: Total training time: 0:00:00 (0:00:00 on hooks)
[08/31 12:42:13] d2.utils.events INFO: iter: 0 lr: N/A max_mem: 207M
How can we run the training on a single GPU?
Attached are the logs for details:
log 3.23.54 PM.txt
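The assertion above comes from SyncBN layers, which call torch.distributed internals that require an initialized default process group. One possible workaround (an assumption on my part, not an official fix) is to initialize a trivial one-process group before building the model:

```python
# Possible single-GPU workaround: give SyncBN a one-process default group.
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

if not dist.is_initialized():
    # "gloo" also works on CPU-only machines; "nccl" is the usual GPU choice.
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

print(dist.get_world_size())  # 1
```

Alternatively, changing the backbone norm in the config from "SyncBN" to "BN" avoids the distributed calls entirely on a single GPU.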
Since the model is trained with a dense matching loss, it would be natural to evaluate its performance on keypoint matching tasks and compare with the state of the art. May I know if you have conducted such experiments or plan to? Thank you!
Are the weights trained with 2 GPUs different from those trained with 8 GPUs on downstream tasks, since the overall batch size differs? Hoping for a reply.
In the paper, each negative key t− is the pooled feature vector of a view from a different image. I still don't understand the exact meaning of 'pooled feature vector'. Can you explain it? Thank you.
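In this context, "pooled feature vector" usually refers to global average pooling: a C × H × W feature map is collapsed to a single C-dimensional vector, one scalar per channel. A minimal sketch using plain lists (names and shapes are illustrative, not from the paper's code):

```python
# Global average pooling: collapse each H x W channel grid to its mean,
# yielding one C-dimensional vector per feature map.
def global_average_pool(feature_map):
    """feature_map: list of C channels, each an H x W grid of floats."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

# Two channels of a 2 x 2 feature map -> a 2-d pooled vector.
fmap = [
    [[1.0, 3.0], [5.0, 7.0]],   # channel 0, mean 4.0
    [[2.0, 2.0], [2.0, 2.0]],   # channel 1, mean 2.0
]
print(global_average_pool(fmap))  # [4.0, 2.0]
```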
Hi, thanks for your contribution, very interesting approach!
Have you tried to compute the dense correspondence directly from the geometric transformation (resize / crop / flip) between the views?
I only found ResNet weights after pretraining; it would be useful to have access to the neck weights as well.
Is that possible?
Thanks for your work
Hello, thanks for your interesting work. I notice that the checkpoints you released only contain the backbone. I would like a checkpoint with the neck and other components as well. Could you release one (pretrained on ImageNet for 200 epochs)?
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 279, in forward
return self.forward_train(img, **kwargs)
File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 200, in forward_train
im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 132, in _batch_shuffle_ddp
x_gather = concat_all_gather(x)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 297, in concat_all_gather
for _ in range(torch.distributed.get_world_size())
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
return _get_group_size(group)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
default_pg = _get_default_group()
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Hi, Thank you very much for the nice work!
I have a question about the dense correspondence of views. In the paper, the correspondence is obtained by computing the similarity between feature vectors from the backbone. Since the data augmentation (e.g. rotating, cropping, flipping) applied to each view of the same image is known, it would be possible to obtain the correspondence directly from these transformations.
For example, Image A is a left-right flipped copy of Image B. The two images are encoded to 3x3 feature maps, which can be represented as:
fa1, fa2, fa3
fa4, fa5, fa6
fa7, fa8, fa9
and
fb1, fb2, fb3
fb4, fb5, fb6
fb7, fb8, fb9
Since A and B are flipped views of the same image, the correspondence could be (fa1, fb3), (fa2, fb2), (fa3, fb1), ...
From my perspective, the transformation-motivated correspondence is more straightforward but the paper doesn't use it. Are there any intuitions behind this?
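The flip mapping above can be sketched directly: for a pure left-right flip, cell (r, c) in view A matches cell (r, W-1-c) in view B. Indices below follow the fa1..fa9 / fb1..fb9 numbering (row-major, 1-based); the function name is ours:

```python
# Transformation-derived correspondence for a horizontal flip.
def flip_correspondence(height, width):
    """Return (a_index, b_index) pairs for a left-right flip, 1-based row-major."""
    pairs = []
    for r in range(height):
        for c in range(width):
            a = r * width + c + 1
            b = r * width + (width - 1 - c) + 1
            pairs.append((a, b))
    return pairs

print(flip_correspondence(3, 3)[:3])  # [(1, 3), (2, 2), (3, 1)]
```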
Thank you again!
Hi,
I am trying to run the code to train self-supervised on COCO (train2017). I tried installing several times following the instructions, but when running training it kept printing many messages saying KeyError: 'GaussianBlur is already registered in pipeline', and the code instantly stopped.
Command: bash tools/dist_train.sh configs/selfsup/densecl/densecl_coco_800ep.py 8
I am using torch version 1.7.1, CUDA 9.2. torch.cuda.is_available() = True
Have you tried reproducing the results on an entirely new machine and faced this error?
Could you give me some suggestions on this bug?
Could you provide the training log? My training process is extremely slow. Thank you.
Hi, I tried to reproduce your results on VOC semantic segmentation, but only got mIoU 46.87 (while you achieve 69.4).
Can you give me some help?
Here are the steps I followed for reproduction.
I did not modify any setting about batch size or learning rate.
Is there anything I have ignored?
I've downloaded ImageNet from Kaggle and can't find the train.txt. Could you tell me where to download this file? @WXinlong
Here are the files that can be downloaded from Kaggle:
Hi, @WXinlong . Thanks for the great work.
Since the article claims that the proposed method mainly aims at dense prediction tasks (e.g., detection and segmentation), I wonder if you have tried DenseCL on the classification task and what the performance is.
Thanks for your outstanding work. Here is a question about the evaluation setting for semantic segmentation.
Did you use "two extra 3×3 convolutions of 256 channels, with BN and ReLU, and then a 1×1 convolution for per-pixel classification. The total stride is 16 (FCN-16s [43]). We set dilation = 6 in the two extra 3×3 convolutions, following the large field-of-view design in [6]" during the evaluation of semantic segmentation (the same as MoCo), or just a classic FCN?
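The head quoted above can be sketched as follows. This is a hedged illustration, not the authors' evaluation code; the channel counts and class number are assumptions (ResNet-50 stage-5 output, PASCAL VOC classes):

```python
# Sketch of the quoted evaluation head: two extra 3x3 convs of 256 channels
# with dilation=6, BN and ReLU, then a 1x1 conv for per-pixel classification.
import torch
import torch.nn as nn

num_classes = 21      # e.g. PASCAL VOC: 20 classes + background (assumption)
in_channels = 2048    # ResNet-50 stage-5 output channels

head = nn.Sequential(
    nn.Conv2d(in_channels, 256, kernel_size=3, padding=6, dilation=6),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=6, dilation=6),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, num_classes, kernel_size=1),
)

# dilation=6 with padding=6 preserves spatial size; output is per-pixel logits.
x = torch.randn(1, in_channels, 32, 32)
print(head(x).shape)  # torch.Size([1, 21, 32, 32])
```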
I tried your algorithm for training and found that the loss is a bit strange. It rose from 8.0 at the beginning to 9.3 and then slowly dropped to 7.3. What is the reason? Is this normal?
Hi, thanks for your excellent work! Could you kindly release the corresponding 10% training data list and config for semi-supervised object detection in Table 3 of your paper? Thanks in advance!
How can I evaluate the performance of the model pretrained on COCO (800 epochs)?
Thank you for your great work.
Could you give the implementation detail or code of the loss_lambda warmup setting stated in the DenseCL paper?
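One common shape for such a warm-up is a linear ramp of the loss weight over the first iterations. The sketch below is an assumption about the schedule, not the authors' exact implementation; the function name, target value, and warm-up length are illustrative:

```python
# Hedged sketch of a loss-weight warm-up: ramp lambda linearly from 0 to its
# target over the first warmup_iters iterations, then hold it constant.
def loss_lambda(iteration, target=0.5, warmup_iters=1000):
    if iteration >= warmup_iters:
        return target
    return target * iteration / warmup_iters

# The total loss would then combine the two terms as
# (1 - lam) * loss_global + lam * loss_dense at each iteration.
print(loss_lambda(0), loss_lambda(500), loss_lambda(2000))  # 0.0 0.25 0.5
```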
Hi, when I try to use extract.py to extract features, I downloaded the pretrained model from the link and ran it, but it shows the following:
The model and loaded state dict do not match exactly
unexpected key in source state_dict: conv1.weight, bn1.weight, bn1.bias, bn1.running_mean, bn1.running_var, bn1.num_batches_tracked, layer1.0.conv1.weight, layer1.0.bn1.weight, layer1.0.bn1.bias, layer1.0.bn1.running_mean, layer1.0.bn1.running_var, layer1.0.bn1.num_batches_tracked, layer1.0.conv2.weight, layer1.0.bn2.weight, layer1.0.bn2.bias, layer1.0.bn2.running_mean, layer1.0.bn2.running_var, layer1.0.bn2.num_batches_tracked, layer1.0.conv3.weight, layer1.0.bn3.weight, layer1.0.bn3.bias, layer1.0.bn3.running_mean, layer1.0.bn3.running_var, layer1.0.bn3.num_batches_tracked, layer1.0.downsample.0.weight, layer1.0.downsample.1.weight, layer1.0.downsample.1.bias, layer1.0.downsample.1.running_mean, layer1.0.downsample.1.running_var, layer1.0.downsample.1.num_batches_tracked, layer1.1.conv1.weight, layer1.1.bn1.weight, layer1.1.bn1.bias, layer1.1.bn1.running_mean, layer1.1.bn1.running_var, layer1.1.bn1.num_batches_tracked, layer1.1.conv2.weight, layer1.1.bn2.weight, layer1.1.bn2.bias, layer1.1.bn2.running_mean, layer1.1.bn2.running_var, layer1.1.bn2.num_batches_tracked, layer1.1.conv3.weight, layer1.1.bn3.weight, layer1.1.bn3.bias, layer1.1.bn3.running_mean, layer1.1.bn3.running_var, layer1.1.bn3.num_batches_tracked, layer1.2.conv1.weight, layer1.2.bn1.weight, layer1.2.bn1.bias, layer1.2.bn1.running_mean, layer1.2.bn1.running_var, layer1.2.bn1.num_batches_tracked, layer1.2.conv2.weight, layer1.2.bn2.weight, layer1.2.bn2.bias, layer1.2.bn2.running_mean, layer1.2.bn2.running_var, layer1.2.bn2.num_batches_tracked, layer1.2.conv3.weight, layer1.2.bn3.weight, layer1.2.bn3.bias, layer1.2.bn3.running_mean, layer1.2.bn3.running_var, layer1.2.bn3.num_batches_tracked, layer2.0.conv1.weight, layer2.0.bn1.weight, layer2.0.bn1.bias, layer2.0.bn1.running_mean, layer2.0.bn1.running_var, layer2.0.bn1.num_batches_tracked, layer2.0.conv2.weight, layer2.0.bn2.weight, layer2.0.bn2.bias, layer2.0.bn2.running_mean, layer2.0.bn2.running_var, layer2.0.bn2.num_batches_tracked, 
layer2.0.conv3.weight, layer2.0.bn3.weight, layer2.0.bn3.bias, layer2.0.bn3.running_mean, layer2.0.bn3.running_var, layer2.0.bn3.num_batches_tracked, layer2.0.downsample.0.weight, layer2.0.downsample.1.weight, layer2.0.downsample.1.bias, layer2.0.downsample.1.running_mean, layer2.0.downsample.1.running_var, layer2.0.downsample.1.num_batches_tracked, layer2.1.conv1.weight, layer2.1.bn1.weight, layer2.1.bn1.bias, layer2.1.bn1.running_mean, layer2.1.bn1.running_var, layer2.1.bn1.num_batches_tracked, layer2.1.conv2.weight, layer2.1.bn2.weight, layer2.1.bn2.bias, layer2.1.bn2.running_mean, layer2.1.bn2.running_var, layer2.1.bn2.num_batches_tracked, layer2.1.conv3.weight, layer2.1.bn3.weight, layer2.1.bn3.bias, layer2.1.bn3.running_mean, layer2.1.bn3.running_var, layer2.1.bn3.num_batches_tracked, layer2.2.conv1.weight, layer2.2.bn1.weight, layer2.2.bn1.bias, layer2.2.bn1.running_mean, layer2.2.bn1.running_var, layer2.2.bn1.num_batches_tracked, layer2.2.conv2.weight, layer2.2.bn2.weight, layer2.2.bn2.bias, layer2.2.bn2.running_mean, layer2.2.bn2.running_var, layer2.2.bn2.num_batches_tracked, layer2.2.conv3.weight, layer2.2.bn3.weight, layer2.2.bn3.bias, layer2.2.bn3.running_mean, layer2.2.bn3.running_var, layer2.2.bn3.num_batches_tracked, layer2.3.conv1.weight, layer2.3.bn1.weight, layer2.3.bn1.bias, layer2.3.bn1.running_mean, layer2.3.bn1.running_var, layer2.3.bn1.num_batches_tracked, layer2.3.conv2.weight, layer2.3.bn2.weight, layer2.3.bn2.bias, layer2.3.bn2.running_mean, layer2.3.bn2.running_var, layer2.3.bn2.num_batches_tracked, layer2.3.conv3.weight, layer2.3.bn3.weight, layer2.3.bn3.bias, layer2.3.bn3.running_mean, layer2.3.bn3.running_var, layer2.3.bn3.num_batches_tracked, layer3.0.conv1.weight, layer3.0.bn1.weight, layer3.0.bn1.bias, layer3.0.bn1.running_mean, layer3.0.bn1.running_var, layer3.0.bn1.num_batches_tracked, layer3.0.conv2.weight, layer3.0.bn2.weight, layer3.0.bn2.bias, layer3.0.bn2.running_mean, layer3.0.bn2.running_var, 
layer3.0.bn2.num_batches_tracked, layer3.0.conv3.weight, layer3.0.bn3.weight, layer3.0.bn3.bias, layer3.0.bn3.running_mean, layer3.0.bn3.running_var, layer3.0.bn3.num_batches_tracked, layer3.0.downsample.0.weight, layer3.0.downsample.1.weight, layer3.0.downsample.1.bias, layer3.0.downsample.1.running_mean, layer3.0.downsample.1.running_var, layer3.0.downsample.1.num_batches_tracked, layer3.1.conv1.weight, layer3.1.bn1.weight, layer3.1.bn1.bias, layer3.1.bn1.running_mean, layer3.1.bn1.running_var, layer3.1.bn1.num_batches_tracked, layer3.1.conv2.weight, layer3.1.bn2.weight, layer3.1.bn2.bias, layer3.1.bn2.running_mean, layer3.1.bn2.running_var, layer3.1.bn2.num_batches_tracked, layer3.1.conv3.weight, layer3.1.bn3.weight, layer3.1.bn3.bias, layer3.1.bn3.running_mean, layer3.1.bn3.running_var, layer3.1.bn3.num_batches_tracked, layer3.2.conv1.weight, layer3.2.bn1.weight, layer3.2.bn1.bias, layer3.2.bn1.running_mean, layer3.2.bn1.running_var, layer3.2.bn1.num_batches_tracked, layer3.2.conv2.weight, layer3.2.bn2.weight, layer3.2.bn2.bias, layer3.2.bn2.running_mean, layer3.2.bn2.running_var, layer3.2.bn2.num_batches_tracked, layer3.2.conv3.weight, layer3.2.bn3.weight, layer3.2.bn3.bias, layer3.2.bn3.running_mean, layer3.2.bn3.running_var, layer3.2.bn3.num_batches_tracked, layer3.3.conv1.weight, layer3.3.bn1.weight, layer3.3.bn1.bias, layer3.3.bn1.running_mean, layer3.3.bn1.running_var, layer3.3.bn1.num_batches_tracked, layer3.3.conv2.weight, layer3.3.bn2.weight, layer3.3.bn2.bias, layer3.3.bn2.running_mean, layer3.3.bn2.running_var, layer3.3.bn2.num_batches_tracked, layer3.3.conv3.weight, layer3.3.bn3.weight, layer3.3.bn3.bias, layer3.3.bn3.running_mean, layer3.3.bn3.running_var, layer3.3.bn3.num_batches_tracked, layer3.4.conv1.weight, layer3.4.bn1.weight, layer3.4.bn1.bias, layer3.4.bn1.running_mean, layer3.4.bn1.running_var, layer3.4.bn1.num_batches_tracked, layer3.4.conv2.weight, layer3.4.bn2.weight, layer3.4.bn2.bias, layer3.4.bn2.running_mean, 
layer3.4.bn2.running_var, layer3.4.bn2.num_batches_tracked, layer3.4.conv3.weight, layer3.4.bn3.weight, layer3.4.bn3.bias, layer3.4.bn3.running_mean, layer3.4.bn3.running_var, layer3.4.bn3.num_batches_tracked, layer3.5.conv1.weight, layer3.5.bn1.weight, layer3.5.bn1.bias, layer3.5.bn1.running_mean, layer3.5.bn1.running_var, layer3.5.bn1.num_batches_tracked, layer3.5.conv2.weight, layer3.5.bn2.weight, layer3.5.bn2.bias, layer3.5.bn2.running_mean, layer3.5.bn2.running_var, layer3.5.bn2.num_batches_tracked, layer3.5.conv3.weight, layer3.5.bn3.weight, layer3.5.bn3.bias, layer3.5.bn3.running_mean, layer3.5.bn3.running_var, layer3.5.bn3.num_batches_tracked, layer4.0.conv1.weight, layer4.0.bn1.weight, layer4.0.bn1.bias, layer4.0.bn1.running_mean, layer4.0.bn1.running_var, layer4.0.bn1.num_batches_tracked, layer4.0.conv2.weight, layer4.0.bn2.weight, layer4.0.bn2.bias, layer4.0.bn2.running_mean, layer4.0.bn2.running_var, layer4.0.bn2.num_batches_tracked, layer4.0.conv3.weight, layer4.0.bn3.weight, layer4.0.bn3.bias, layer4.0.bn3.running_mean, layer4.0.bn3.running_var, layer4.0.bn3.num_batches_tracked, layer4.0.downsample.0.weight, layer4.0.downsample.1.weight, layer4.0.downsample.1.bias, layer4.0.downsample.1.running_mean, layer4.0.downsample.1.running_var, layer4.0.downsample.1.num_batches_tracked, layer4.1.conv1.weight, layer4.1.bn1.weight, layer4.1.bn1.bias, layer4.1.bn1.running_mean, layer4.1.bn1.running_var, layer4.1.bn1.num_batches_tracked, layer4.1.conv2.weight, layer4.1.bn2.weight, layer4.1.bn2.bias, layer4.1.bn2.running_mean, layer4.1.bn2.running_var, layer4.1.bn2.num_batches_tracked, layer4.1.conv3.weight, layer4.1.bn3.weight, layer4.1.bn3.bias, layer4.1.bn3.running_mean, layer4.1.bn3.running_var, layer4.1.bn3.num_batches_tracked, layer4.2.conv1.weight, layer4.2.bn1.weight, layer4.2.bn1.bias, layer4.2.bn1.running_mean, layer4.2.bn1.running_var, layer4.2.bn1.num_batches_tracked, layer4.2.conv2.weight, layer4.2.bn2.weight, layer4.2.bn2.bias, 
layer4.2.bn2.running_mean, layer4.2.bn2.running_var, layer4.2.bn2.num_batches_tracked, layer4.2.conv3.weight, layer4.2.bn3.weight, layer4.2.bn3.bias, layer4.2.bn3.running_mean, layer4.2.bn3.running_var, layer4.2.bn3.num_batches_tracked
missing keys in source state_dict: queue, queue_ptr, queue2, queue2_ptr, encoder_q.0.conv1.weight, encoder_q.0.bn1.weight, encoder_q.0.bn1.bias, encoder_q.0.bn1.running_mean, encoder_q.0.bn1.running_var, encoder_q.0.layer1.0.conv1.weight, encoder_q.0.layer1.0.bn1.weight, encoder_q.0.layer1.0.bn1.bias, encoder_q.0.layer1.0.bn1.running_mean, encoder_q.0.layer1.0.bn1.running_var, encoder_q.0.layer1.0.conv2.weight, encoder_q.0.layer1.0.bn2.weight, encoder_q.0.layer1.0.bn2.bias, encoder_q.0.layer1.0.bn2.running_mean, encoder_q.0.layer1.0.bn2.running_var, encoder_q.0.layer1.0.conv3.weight, encoder_q.0.layer1.0.bn3.weight, encoder_q.0.layer1.0.bn3.bias, encoder_q.0.layer1.0.bn3.running_mean, encoder_q.0.layer1.0.bn3.running_var, encoder_q.0.layer1.0.downsample.0.weight, encoder_q.0.layer1.0.downsample.1.weight, encoder_q.0.layer1.0.downsample.1.bias, encoder_q.0.layer1.0.downsample.1.running_mean, encoder_q.0.layer1.0.downsample.1.running_var, encoder_q.0.layer1.1.conv1.weight, encoder_q.0.layer1.1.bn1.weight, encoder_q.0.layer1.1.bn1.bias, encoder_q.0.layer1.1.bn1.running_mean, encoder_q.0.layer1.1.bn1.running_var, encoder_q.0.layer1.1.conv2.weight, encoder_q.0.layer1.1.bn2.weight, encoder_q.0.layer1.1.bn2.bias, encoder_q.0.layer1.1.bn2.running_mean, encoder_q.0.layer1.1.bn2.running_var, encoder_q.0.layer1.1.conv3.weight, encoder_q.0.layer1.1.bn3.weight, encoder_q.0.layer1.1.bn3.bias, encoder_q.0.layer1.1.bn3.running_mean, encoder_q.0.layer1.1.bn3.running_var, encoder_q.0.layer1.2.conv1.weight, encoder_q.0.layer1.2.bn1.weight, encoder_q.0.layer1.2.bn1.bias, encoder_q.0.layer1.2.bn1.running_mean, encoder_q.0.layer1.2.bn1.running_var, encoder_q.0.layer1.2.conv2.weight, encoder_q.0.layer1.2.bn2.weight, encoder_q.0.layer1.2.bn2.bias, encoder_q.0.layer1.2.bn2.running_mean, encoder_q.0.layer1.2.bn2.running_var, encoder_q.0.layer1.2.conv3.weight, encoder_q.0.layer1.2.bn3.weight, encoder_q.0.layer1.2.bn3.bias, encoder_q.0.layer1.2.bn3.running_mean, 
encoder_q.0.layer1.2.bn3.running_var, encoder_q.0.layer2.0.conv1.weight, encoder_q.0.layer2.0.bn1.weight, encoder_q.0.layer2.0.bn1.bias, encoder_q.0.layer2.0.bn1.running_mean, encoder_q.0.layer2.0.bn1.running_var, encoder_q.0.layer2.0.conv2.weight, encoder_q.0.layer2.0.bn2.weight, encoder_q.0.layer2.0.bn2.bias, encoder_q.0.layer2.0.bn2.running_mean, encoder_q.0.layer2.0.bn2.running_var, encoder_q.0.layer2.0.conv3.weight, encoder_q.0.layer2.0.bn3.weight, encoder_q.0.layer2.0.bn3.bias, encoder_q.0.layer2.0.bn3.running_mean, encoder_q.0.layer2.0.bn3.running_var, encoder_q.0.layer2.0.downsample.0.weight, encoder_q.0.layer2.0.downsample.1.weight, encoder_q.0.layer2.0.downsample.1.bias, encoder_q.0.layer2.0.downsample.1.running_mean, encoder_q.0.layer2.0.downsample.1.running_var, encoder_q.0.layer2.1.conv1.weight, encoder_q.0.layer2.1.bn1.weight, encoder_q.0.layer2.1.bn1.bias, encoder_q.0.layer2.1.bn1.running_mean, encoder_q.0.layer2.1.bn1.running_var, encoder_q.0.layer2.1.conv2.weight, encoder_q.0.layer2.1.bn2.weight, encoder_q.0.layer2.1.bn2.bias, encoder_q.0.layer2.1.bn2.running_mean, encoder_q.0.layer2.1.bn2.running_var, encoder_q.0.layer2.1.conv3.weight, encoder_q.0.layer2.1.bn3.weight, encoder_q.0.layer2.1.bn3.bias, encoder_q.0.layer2.1.bn3.running_mean, encoder_q.0.layer2.1.bn3.running_var, encoder_q.0.layer2.2.conv1.weight, encoder_q.0.layer2.2.bn1.weight, encoder_q.0.layer2.2.bn1.bias, encoder_q.0.layer2.2.bn1.running_mean, encoder_q.0.layer2.2.bn1.running_var, encoder_q.0.layer2.2.conv2.weight, encoder_q.0.layer2.2.bn2.weight, encoder_q.0.layer2.2.bn2.bias, encoder_q.0.layer2.2.bn2.running_mean, encoder_q.0.layer2.2.bn2.running_var, encoder_q.0.layer2.2.conv3.weight, encoder_q.0.layer2.2.bn3.weight, encoder_q.0.layer2.2.bn3.bias, encoder_q.0.layer2.2.bn3.running_mean, encoder_q.0.layer2.2.bn3.running_var, encoder_q.0.layer2.3.conv1.weight, encoder_q.0.layer2.3.bn1.weight, encoder_q.0.layer2.3.bn1.bias, encoder_q.0.layer2.3.bn1.running_mean, 
encoder_q.0.layer2.3.bn1.running_var, encoder_q.0.layer2.3.conv2.weight, encoder_q.0.layer2.3.bn2.weight, encoder_q.0.layer2.3.bn2.bias, encoder_q.0.layer2.3.bn2.running_mean, encoder_q.0.layer2.3.bn2.running_var, encoder_q.0.layer2.3.conv3.weight, encoder_q.0.layer2.3.bn3.weight, encoder_q.0.layer2.3.bn3.bias, encoder_q.0.layer2.3.bn3.running_mean, encoder_q.0.layer2.3.bn3.running_var, encoder_q.0.layer3.0.conv1.weight, encoder_q.0.layer3.0.bn1.weight, encoder_q.0.layer3.0.bn1.bias, encoder_q.0.layer3.0.bn1.running_mean, encoder_q.0.layer3.0.bn1.running_var, encoder_q.0.layer3.0.conv2.weight, encoder_q.0.layer3.0.bn2.weight, encoder_q.0.layer3.0.bn2.bias, encoder_q.0.layer3.0.bn2.running_mean, encoder_q.0.layer3.0.bn2.running_var, encoder_q.0.layer3.0.conv3.weight, encoder_q.0.layer3.0.bn3.weight, encoder_q.0.layer3.0.bn3.bias, encoder_q.0.layer3.0.bn3.running_mean, encoder_q.0.layer3.0.bn3.running_var, encoder_q.0.layer3.0.downsample.0.weight, encoder_q.0.layer3.0.downsample.1.weight, encoder_q.0.layer3.0.downsample.1.bias, encoder_q.0.layer3.0.downsample.1.running_mean, encoder_q.0.layer3.0.downsample.1.running_var, encoder_q.0.layer3.1.conv1.weight, encoder_q.0.layer3.1.bn1.weight, encoder_q.0.layer3.1.bn1.bias, encoder_q.0.layer3.1.bn1.running_mean, encoder_q.0.layer3.1.bn1.running_var, encoder_q.0.layer3.1.conv2.weight, encoder_q.0.layer3.1.bn2.weight, encoder_q.0.layer3.1.bn2.bias, encoder_q.0.layer3.1.bn2.running_mean, encoder_q.0.layer3.1.bn2.running_var, encoder_q.0.layer3.1.conv3.weight, encoder_q.0.layer3.1.bn3.weight, encoder_q.0.layer3.1.bn3.bias, encoder_q.0.layer3.1.bn3.running_mean, encoder_q.0.layer3.1.bn3.running_var, encoder_q.0.layer3.2.conv1.weight, encoder_q.0.layer3.2.bn1.weight, encoder_q.0.layer3.2.bn1.bias, encoder_q.0.layer3.2.bn1.running_mean, encoder_q.0.layer3.2.bn1.running_var, encoder_q.0.layer3.2.conv2.weight, encoder_q.0.layer3.2.bn2.weight, encoder_q.0.layer3.2.bn2.bias, encoder_q.0.layer3.2.bn2.running_mean, 
encoder_q.0.layer3.2.bn2.running_var, encoder_q.0.layer3.2.conv3.weight, encoder_q.0.layer3.2.bn3.weight, encoder_q.0.layer3.2.bn3.bias, encoder_q.0.layer3.2.bn3.running_mean, encoder_q.0.layer3.2.bn3.running_var, encoder_q.0.layer3.3.conv1.weight, encoder_q.0.layer3.3.bn1.weight, encoder_q.0.layer3.3.bn1.bias, encoder_q.0.layer3.3.bn1.running_mean, encoder_q.0.layer3.3.bn1.running_var, encoder_q.0.layer3.3.conv2.weight, encoder_q.0.layer3.3.bn2.weight, encoder_q.0.layer3.3.bn2.bias, encoder_q.0.layer3.3.bn2.running_mean, encoder_q.0.layer3.3.bn2.running_var, encoder_q.0.layer3.3.conv3.weight, encoder_q.0.layer3.3.bn3.weight, encoder_q.0.layer3.3.bn3.bias, encoder_q.0.layer3.3.bn3.running_mean, encoder_q.0.layer3.3.bn3.running_var, encoder_q.0.layer3.4.conv1.weight, encoder_q.0.layer3.4.bn1.weight, encoder_q.0.layer3.4.bn1.bias, encoder_q.0.layer3.4.bn1.running_mean, encoder_q.0.layer3.4.bn1.running_var, encoder_q.0.layer3.4.conv2.weight, encoder_q.0.layer3.4.bn2.weight, encoder_q.0.layer3.4.bn2.bias, encoder_q.0.layer3.4.bn2.running_mean, encoder_q.0.layer3.4.bn2.running_var, encoder_q.0.layer3.4.conv3.weight, encoder_q.0.layer3.4.bn3.weight, encoder_q.0.layer3.4.bn3.bias, encoder_q.0.layer3.4.bn3.running_mean, encoder_q.0.layer3.4.bn3.running_var, encoder_q.0.layer3.5.conv1.weight, encoder_q.0.layer3.5.bn1.weight, encoder_q.0.layer3.5.bn1.bias, encoder_q.0.layer3.5.bn1.running_mean, encoder_q.0.layer3.5.bn1.running_var, encoder_q.0.layer3.5.conv2.weight, encoder_q.0.layer3.5.bn2.weight, encoder_q.0.layer3.5.bn2.bias, encoder_q.0.layer3.5.bn2.running_mean, encoder_q.0.layer3.5.bn2.running_var, encoder_q.0.layer3.5.conv3.weight, encoder_q.0.layer3.5.bn3.weight, encoder_q.0.layer3.5.bn3.bias, encoder_q.0.layer3.5.bn3.running_mean, encoder_q.0.layer3.5.bn3.running_var, encoder_q.0.layer4.0.conv1.weight, encoder_q.0.layer4.0.bn1.weight, encoder_q.0.layer4.0.bn1.bias, encoder_q.0.layer4.0.bn1.running_mean, encoder_q.0.layer4.0.bn1.running_var, 
[checkpoint key dump truncated: the list continues through every encoder_q.0.layer4.* conv/bn parameter, the encoder_q.1.mlp and encoder_q.1.mlp2 projection weights, the full mirrored encoder_k.* branch, and all backbone.* ResNet-50 parameters (layer1 through layer4: conv weights, bn weights/biases, and bn running statistics)]
Thanks for the wonderful work on DenseCL. I have a code-related question and would like to consult you: why must the input of DenseCLNeck satisfy len(x) == 1, and what is the dimension of x[0]?
assert len(x) == 1
x = x[0]
I have one question and hope to hear from you:
In Section 2.3: ''Each negative key t_ is the pooled feature vector of a view from a different image.''
Why not use the other parts of the two views of the same image as negative samples?
That seems to make more sense.
Thanks for the great work!
I have a question about equations (1) and (2) in the paper.
In the denominators of these equations, why is the temperature hyper-parameter \tau not applied inside "exp(q·k_+)" and "exp(r^s·t^s_+)"?
In https://github.com/WXinlong/DenseCL/blob/main/openselfsup/models/heads/contrastive_head.py#L34, it seems that \tau is applied to all key features.
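For what it's worth, that implementation style can be sketched in NumPy as below (a minimal illustration assuming unit-normalized feature vectors; the function name and toy values are mine, not the repo's): dividing the concatenated logits by \tau before the softmax cross-entropy puts \tau inside every exp() term of eq. (1), positive included.

```python
import numpy as np

def info_nce(q, k_pos, negatives, tau=0.2):
    """InfoNCE in the style of contrastive_head.py: tau divides EVERY logit."""
    l_pos = q @ k_pos                                   # positive logit
    l_neg = negatives @ q                               # one logit per negative key
    logits = np.concatenate([[l_pos], l_neg]) / tau     # temperature on all logits
    logits -= logits.max()                              # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])                                # positive sits at index 0
```

With q aligned to k_pos the loss is near zero; with q aligned to a negative instead, it is large, as expected for a softmax cross-entropy over temperature-scaled similarities.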
Have you tried different output channels for the single (global) projection and the dense projection? In particular, you use the same hidden and output channels for the single MLP and the dense MLP in the DenseCLNeck implementation. As far as I know, projecting the instance representation calls for more channels than projecting the dense representation, so treating them identically might discard useful information from the instance representation. What do you think about this? Also, most instance discrimination methods design the projector as fc-bn-relu-fc, so I wonder why you drop BN in DenseCLNeck. Is it just for simplicity?
self.mlp = nn.Sequential(
    nn.Linear(in_channels, hid_channels), nn.ReLU(inplace=True),
    nn.Linear(hid_channels, out_channels))
...
self.mlp2 = nn.Sequential(
    nn.Conv2d(in_channels, hid_channels, 1), nn.ReLU(inplace=True),
    nn.Conv2d(hid_channels, out_channels, 1))
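For reference, a toy NumPy sketch of how the two branches above transform one backbone feature map (the channel sizes here are hypothetical stand-ins, and biases are omitted): self.mlp average-pools to a global vector before fc-relu-fc, while self.mlp2's 1x1 convolutions apply the same fc-relu-fc independently at every spatial cell.

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_hid, C_out, S = 4, 3, 2, 7      # toy stand-ins for in/hid/out_channels
W1 = rng.normal(size=(C_hid, C_in))     # first linear / 1x1 conv weight
W2 = rng.normal(size=(C_out, C_hid))    # second linear / 1x1 conv weight

x0 = rng.normal(size=(C_in, S, S))      # one backbone feature map

# global branch (self.mlp): average-pool, then fc -> relu -> fc
g = W2 @ np.maximum(W1 @ x0.mean(axis=(1, 2)), 0)           # shape (C_out,)

# dense branch (self.mlp2): 1x1 convs, i.e. the same MLP at every cell
d = np.einsum('hc,cij->hij', W1, x0)                        # (C_hid, S, S)
d = np.einsum('oh,hij->oij', W2, np.maximum(d, 0))          # (C_out, S, S)

# a 1x1 conv really is a per-cell linear map:
assert np.allclose(d[:, 0, 0], W2 @ np.maximum(W1 @ x0[:, 0, 0], 0))
```

The two branches share the same fc-relu-fc shape; only the pooling (one global vector versus one vector per cell) differs.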
Hi, thanks so much for providing the detection configs.
When I run detectron2 based on "Base-RCNN-C4-BN.yaml"
(https://github.com/WXinlong/DenseCL/blob/main/benchmarks/detection/configs/Base-RCNN-C4-BN.yaml),
the process reports that "Res5ROIHeadsExtraNorm" has not been registered.
I carefully read the detectron2 documentation:
it has 'Res5ROIHeads' but no 'Res5ROIHeadsExtraNorm'.
Is Res5ROIHeadsExtraNorm the same as Res5ROIHeads?
Hi, the link for downloading the MoCo v2 model pretrained on COCO is dead. Could you re-upload the weights?
Thanks for releasing your code; the results are impressive.
I've tried the downloaded DenseCL pretrained models on the VOC semantic segmentation dataset. With the same FCN architecture, the performance matches expectations: the DenseCL ImageNet-pretrained model outperforms the supervised ImageNet classification model. However, when replacing the backbone of DeepLabV3+, the DenseCL models show inferior performance. The comparison is below:

| Arch | Dataset | Pretrained Model | mIoU |
|---|---|---|---|
| dv3+ | VOC12 | Sup ImageNet | 71.33 |
| dv3+ | VOC12 | DenseCL COCO | 67.51 |
| dv3+ | VOC12 | DenseCL ImageNet | 69.5 |

The configs are borrowed from the official MMSegmentation configs, and I was careful not to make many modifications. Have you noticed the same behavior on any other models or datasets?
Hey there, thanks for sharing the code!
I just have a quick question:
Why is the argmax used to match features from different views, rather than the spatial correspondence we have access to, since we know which data augmentations were applied to the images?
Thanks.
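For context, the argmax matching the question refers to can be sketched as follows (a minimal NumPy illustration with a hypothetical function name, assuming dense feature grids flattened to one row per cell): each cell of one view's grid is matched to its most similar cell in the other view by cosine similarity of the backbone features themselves, with no use of the known augmentation geometry.

```python
import numpy as np

def dense_match(f1, f2):
    """For each spatial cell in view 1, index of the most similar cell in view 2.

    f1: (S1, C), f2: (S2, C) -- dense feature grids flattened to one row
    per cell (e.g. a 7x7 map reshaped to 49 x C).
    """
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    f2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    sim = f1 @ f2.T            # (S1, S2) cosine-similarity matrix
    return sim.argmax(axis=1)  # feature-driven correspondence, no geometry used
```

On a toy grid whose cells are simply permuted between views, dense_match recovers the permutation from feature similarity alone.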