facebookresearch / swav

PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882

License: Other

Python 84.42% Shell 15.58%

swav's Issues

TypeError: optimizers must be either a single optimizer or a list of optimizers.

Hello,

I'm trying to run main_swav.py with the following command:

python -m torch.distributed.launch --nproc_per_node=1 main_swav.py --images_path=<path to data directory> --train_annotations_path <path to data file> --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15

Some of those parameters have been added to accommodate our data. The only changes I have made to the code are minor changes to the dataset and additional/changed arguments. When I run this command I get the following error:

Traceback (most recent call last):
  File "main_swav.py", line 380, in <module>
    main()
  File "main_swav.py", line 189, in main
    model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 158, in _initialize
    raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
TypeError: optimizers must be either a single optimizer or a list of optimizers.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--images_path=/data/computer_vision_projects/rare_planes/classification_data/images/', '--train_annotations_path', '/data/computer_vision_projects/rare_planes/classification_data/annotations/instances_train_role_mislabel_category_id_033_chipped.json', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.
make: *** [Makefile:69: train-rare-planes] Error 1

Immediately before the line that throws the error I placed a couple print statements:
print("type(OPTIMIZER)", type(optimizer))
print("OPTIMIZER", optimizer)

The output from those is:
type(OPTIMIZER) <class 'apex.parallel.LARC.LARC'>
OPTIMIZER SGD (
Parameter Group 0
    dampening: 0
    lr: 0.6
    momentum: 0.9
    nesterov: False
    weight_decay: 1e-06
)

Here are some version numbers I'm using:
Python 3.6.9 :: Anaconda, Inc.
PyTorch == 1.5.0a0+8f84ded
torchvision == 0.6.0a0
CUDA == 10.2
apex == 0.1

Any ideas why I would be seeing this error? Thanks in advance!

The learning rate of linear classification

Thanks for your awesome work.
I wonder why the learning rate is so small in linear classification (0.3 in eval_linear.py).
In the linear classification of MoCo, the initial learning rate is 30 with a two-stage reduction. That is a 100x difference from this repo.
Have you ever run eval_linear.py with MoCo v2 weights, or evaluated SwAV weights with the code from MoCo?
I wonder about the performance impact of the learning rate.

Evaluate on custom dataset

Hi, how can you evaluate different models on a custom dataset?

Question regarding the license

Hi!
I wonder if I can use swav internally in a commercial company? We do not charge end-users directly, but of course the company is for profit and its profit may increase due to the usage of DL models.
As the name of the license suggests, I can't use it, but would like to clarify.
Thanks!

Problem with data_path in the training code?

main_swav.py
The dataset folder is taken from args.data_path, which is the ImageNet dataset root path (containing train, val, test).

train_dataset = MultiCropDataset(
        args.data_path,
        args.size_crops,
        args.nmb_crops,
        args.min_scale_crops,
        args.max_scale_crops,
    )

eval_linear.py
train_dataset = datasets.ImageFolder(os.path.join(args.data_path, "train"))

In main_swav.py, if I set args.data_path=/path/to/imagenet, it will use all of (train, val, test) for self-supervised pretraining, am I right?

Training time is too long; how can I accelerate training?

Hi, wonderful work, and thanks for sharing your code! I am running your code on ImageNet following your settings, but I found that it takes about an hour and a half to train just one epoch, which is too slow. So I want to know if I am missing some key points that are important for speeding up training?

Pre-trained checkpoint for larger architecture?

Hi! Thanks for sharing SwAV! I just wonder whether you have any follow-up plans to release the pre-trained weights for larger models (like Res50X4, Res152X4). This would be a great help for researchers, as retraining them may be too computationally demanding for us :(

Again, thanks for your work very much :)

Cannot load the pretrained models

Hi, I ran into a problem when I tried to load the pretrained resnet-50 model. It seems that the keys in the pre-trained model and keys in the torchvision resnet-50 are not the same. The same problem appears when I tried to load other models listed on the Model Zoo table. Could you please help me with this issue? Thanks.

Here is my code:
import torch, torchvision
model = torchvision.models.resnet50()
checkpoint = torch.load('.user/swav_800ep_pretrain.pth.tar')
model.load_state_dict(checkpoint, strict=False)

When I set strict=False, the model does not load any weights and acts like a randomly initialized model.
When I set strict=True, it raises the following error:

RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict:
"conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var",
......
"layer4.2.bn2.running_mean", "layer4.2.bn2.running_var", "layer4.2.conv3.weight", "layer4.2.bn3.weight", "layer4.2.bn3.bias", "layer4.2.bn3.running_mean", "layer4.2.bn3.running_var", "fc.weight", "fc.bias".
Unexpected key(s) in state_dict:
"module.conv1.weight", "module.bn1.weight", "module.bn1.bias", "module.bn1.running_mean", "module.bn1.running_var", "module.bn1.num_batches_tracked", "module.layer1.0.conv1.weight", "module.layer1.0.bn1.weight", "module.layer1.0.bn1.bias", "module.layer1.0.bn1.running_mean",
......
"module.projection_head.0.weight", "module.projection_head.0.bias", "module.projection_head.1.weight", "module.projection_head.1.bias", "module.projection_head.1.running_mean", "module.projection_head.1.running_var", "module.projection_head.1.num_batches_tracked", "module.projection_head.3.weight", "module.projection_head.3.bias", "module.prototypes.weight".

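The unexpected keys above all carry a `module.` prefix, which is what a model wrapped in a data-parallel container typically saves. A minimal sketch of one way to reconcile the two key sets, assuming the checkpoint is a flat mapping from parameter names to tensors; the helper name `strip_prefix` and its `keep` filter are hypothetical and not part of the repository:

```python
def strip_prefix(state_dict, prefix="module.", keep=None):
    """Return a copy of state_dict with `prefix` removed from every key.

    keep: optional collection of key names accepted by the target model;
    keys outside it (e.g. projection_head.*, prototypes.*) are dropped.
    """
    cleaned = {}
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        if keep is None or new_key in keep:
            cleaned[new_key] = value
    return cleaned


# Toy stand-in for a checkpoint; real values would be tensors.
checkpoint = {
    "module.conv1.weight": "w0",
    "module.projection_head.0.weight": "w1",
    "module.prototypes.weight": "w2",
}
backbone_keys = {"conv1.weight", "fc.weight"}  # hypothetical target-model keys
cleaned = strip_prefix(checkpoint, keep=backbone_keys)
print(cleaned)  # {'conv1.weight': 'w0'}
```

With a real model, `keep` could be `set(model.state_dict().keys())` and the result would be passed to `model.load_state_dict(cleaned, strict=False)`. Keys that the checkpoint does not contain at all would still be reported as missing.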
About the released pretrained model

I wonder whether the released pretrained models were trained on uncurated data (1 billion random public non-EU images from Instagram)?

Open sourcing imagenet-trained models

Thanks for open sourcing your codebase. Would it be possible to share the final model for the ImageNet downstream task that gets 75.3% top-1 accuracy? Thanks in advance

Choice of dimension of hidden variable

Hi, I'm curious about how you choose 2048 as the dimension in the projection layer, it seems that ResNet in this repo will output the tensor with 512 channels.

If my Resnet has to output tensor with 256 channels, do you think I need to decrease it from 2048?

Thx

Demo running on colab

So I was trying to get this working as a prototype on google colab. I installed apex, and when I run

python -m torch.distributed.launch main_swav.py \
--data_path /content/data/fer/images \
--epochs  20 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 48 48 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 0 \
--epoch_queue_starts 15

I get this error :

Traceback (most recent call last):
  File "main_swav.py", line 375, in <module>
    main()
  File "main_swav.py", line 123, in main
    init_distributed_mode(args)
  File "/content/swav/src/utils.py", line 65, in init_distributed_mode
    rank=args.rank,
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 391, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/rendezvous.py", line 79, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for ://
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'main_swav.py', '--local_rank=0', '--data_path', '/content/data/fer/images', '--epochs', '20', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '48', '48', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '0', '--epoch_queue_starts', '15']' returned non-zero exit status 1.

Thinking this might be related to python -m torch.distributed.launch, because I am obviously not using a distributed computing environment, I tried changing it to torch.launch, which obviously does not work.

Can I get any help? Thanks.
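The message in the traceback is built from `result.scheme`, so an empty scheme in front of `://` suggests that the init method URL reaching `init_process_group` had no scheme at all (for example an empty string). A small sketch of that check using only the standard library, under the assumption that the rendezvous code parses the URL the same way `urllib.parse.urlparse` does; the candidate URLs are illustrative:

```python
from urllib.parse import urlparse


def scheme_of(url):
    """Return the scheme part of a URL ('' when there is none)."""
    return urlparse(url).scheme


# An empty URL has no scheme, which would produce "No rendezvous handler for ://".
for url in ["", "env://", "tcp://localhost:40000"]:
    print(repr(url), "->", repr(scheme_of(url)))
```

If the script exposes a `--dist_url` option (the multinode command in another issue on this page passes one), giving it a value with an explicit scheme would be one thing to try.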

Deepclusterv2: why always freeze the prototypes during training?

Hi,
Thanks for your nice work!

I notice in your code that

swav/main_deepclusterv2.py

Line 280 in 82bddbb

if iteration < args.freeze_prototypes_niters:

The freeze_prototypes_niters is set to 300000 (in deepclusterv2_800ep_pretrain.sh), which is equal to about 1000 epochs when the batch size is 4096. It seems the parameters of the prototypes are never updated. I think the prototypes are the same as the fully connected layer in standard classification tasks, which should be optimized. So why are they always frozen in your code?
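A quick back-of-the-envelope check of that figure, assuming the ImageNet-1k training set (1,281,167 images) and that the incomplete final batch of each epoch is dropped; both are assumptions, not something stated in the issue:

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # assumption: ImageNet-1k train split
batch_size = 4096
freeze_prototypes_niters = 300_000
training_epochs = 800  # taken from the script name deepclusterv2_800ep_pretrain.sh

iters_per_epoch = IMAGENET_TRAIN_IMAGES // batch_size
frozen_epochs = freeze_prototypes_niters / iters_per_epoch

print(iters_per_epoch)                  # 312
print(round(frozen_epochs))             # 962
print(frozen_epochs > training_epochs)  # True
```

Under these assumptions the threshold lies beyond the end of an 800-epoch run, which matches the observation in the question.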

Problems running eval_linear.py with the pretrained SwAV model

Hi, thanks for your excellent work! I ran into some problems when running the code.
First, I trained the SwAV model with the command python -m torch.distributed.launch --nproc_per_node=2 main_swav.py ..., and the model parameters were saved in checkpoint.pth.tar. But when I run eval_linear.py with the pretrained SwAV model using the command python -m torch.distributed.launch --nproc_per_node=2 eval_linear.py --pretrained checkpoint.pth.tar, I get some errors. The logs are:

Traceback (most recent call last):
  File "/home/yc/codes/swav/src/utils.py", line 144, in restart_from_checkpoint
    msg = value.load_state_dict(checkpoint[key], strict=False)
TypeError: load_state_dict() got an unexpected keyword argument 'strict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eval_linear.py", line 397, in <module>
    main()
  File "eval_linear.py", line 201, in main
    scheduler=scheduler,
  File "/home/yc/codes/swav/src/utils.py", line 147, in restart_from_checkpoint
    msg = value.load_state_dict(checkpoint[key])
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/site-packages/torch/optim/optimizer.py", line 123, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
Traceback (most recent call last):
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/yc/anaconda3/envs/tf2/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)

Does this mean that something goes wrong when the optimizer is restored from the checkpoint? Could you help me? Thanks!

Some problems about swav experiment

I'd like to ask a few questions.

1. Due to limited GPU resources, I can only use a single GPU to run SwAV experiments. In this case, what needs to be adjusted in the experimental parameters? Will the performance of the pre-trained model decrease significantly?
2. How many instances are needed at a minimum to get a relatively good pre-training result?
3. With regard to the model hyperparameter args.nmb_prototypes, if the custom dataset has few actual categories (far fewer than 1k), is it necessary to adjust it accordingly?
4. In line 371 of the file main_swav.py, why does args.world_size appear in the code but not in the pseudo-code in the article?

Thanks again for being able to open source this code. I am looking forward to your reply.

Generate embedding without GPU

Hi, I love the project a lot, thanks for sharing it!

However, for some reason, a GPU is not available to me. I have downloaded the pretrained model, and my ultimate goal is to get visual embeddings from it, so I am wondering if there is an easy way to do the following: if I input a single image into the model, the model outputs a corresponding embedding. Note that I don't expect to re-train or fine-tune the model, so maybe it's still possible to do the job without using a GPU.

Also, if this is not too dumb, could you please specify the input/output size? Thank you very much!

prototype norm problems

Hi @mathildecaron31
As you said,
In the paper, the prototypes are indeed normalized along the first dimension because the prototypes matrix, C, is of dimension DxK (i.e, 128x3000).

On the contrary, in the code, w is of dimension KxD (i.e, 3000x128). You can easily check that the normalization is done correctly by printing:

print(torch.norm(w, dim=1).shape) # should give 3000
print(torch.norm(w, dim=1)) # should give a vector with 1 everywhere

But the code goes:
self.prototypes = nn.Linear(output_dim, nmb_prototypes, bias=False)
where output_dim = 128 and nmb_prototypes = 3000.
So w is of dimension D x K (128 x 3000), which means there are 3000 prototypes and each prototype is a vector of dim 128.

Thanks

How to download a model with projection head

Hello. Thanks for the work. As far as I can see, the pretrained models end with a fully connected layer with dim = 1000. Shouldn't the projection head (dim = 128) be there? I want to get embeddings; how can I do this? Do you have an appropriate pretrained model with a 128-dimensional projection head?

How to perform multinode training with torch.distributed.launch?

Hi, nice work! I tried to do pretraining with main_swav.py on multiple machines.

Here's the main code for distributed training.

python -m torch.distributed.launch main_swav.py --rank 0 \
--world_size 8 \
--dist_url 'tcp://172.31.11.200:23456' \

I commented out lines 55-59 in src/utils.py in order to set the rank for each machine. It runs fine.

But I found that during training only 1 GPU was used on each machine. I think it is caused by this line (swav/src/utils.py, line 68 in 77f7185):

args.gpu_to_work_on = args.rank % torch.cuda.device_count()

Could you help me figure it out?

Many thanks!
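The quoted line picks a GPU purely from the global rank, so one process per machine can only ever drive one GPU per machine. A small sketch of the arithmetic, assuming 2 machines with 8 GPUs each (hypothetical numbers) and the usual convention that the global rank is `node_rank * nproc_per_node + local_rank`:

```python
def gpu_assignments(nnodes, nproc_per_node, gpus_per_node):
    """Map each node to the GPU indices selected by rank % gpus_per_node."""
    used = {}
    for node_rank in range(nnodes):
        used[node_rank] = []
        for local_rank in range(nproc_per_node):
            rank = node_rank * nproc_per_node + local_rank
            used[node_rank].append(rank % gpus_per_node)
    return used


one_proc = gpu_assignments(nnodes=2, nproc_per_node=1, gpus_per_node=8)
eight_proc = gpu_assignments(nnodes=2, nproc_per_node=8, gpus_per_node=8)
print(one_proc)    # {0: [0], 1: [1]}
print(eight_proc)  # every node lists GPU indices 0 through 7
```

With `torch.distributed.launch`, several processes per machine are started with `--nproc_per_node`, used together with `--nnodes`, `--node_rank`, `--master_addr` and `--master_port`.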

SwAV with 100 256 2x224 + 6x96 ?

Can you provide a model or results for training SwAV with batch size 256 and input size 2x224 + 6x96 for 100 epochs?

The training time is too long.

Error caused by nn.BatchNorm1d in self.projection_head

SyncBatchNorm.convert_sync_batchnorm() causes ValueError: expected at least 3D input (got 2D input).
How can I solve it?

DeepCluster-v2 linear evaluation for fewer epochs

Thanks for sharing your code of such a wonderful work. Have you done experiments using DeepCluster-v2 for fewer training epochs, e.g. 100 or 200 epochs ? If so, can you provide the linear evaluation top-1 acc. for such settings? Many thanks.

Empty clusters?

Hi @mathildecaron31

I trained a network from scratch with my own dataset and wrote some code that sorts images in different folders regarding their cluster assignments. I did this with the following lines of code:

embedding, output = model(inputs)
p = softmax(output / args.temperature)
prediction = p.tolist()
prototyp = []
for i in range(len(prediction)): 
      prototyp.append(np.argmax(prediction[i]))

The problem is that when I save the images in different folders regarding their cluster assignment, some folders remain empty. The number of folders is the same as the number of prototypes. I always thought that the images are equally distributed between the different prototypes. What is the problem? Can you help me?
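One way to see how unbalanced the hard assignments are is to count them. A minimal sketch in plain Python, where `scores` stands in for the per-image prototype scores (toy values) and `cluster_histogram` is an illustrative helper name. Taking the argmax does not enforce any balance by itself: in the paper, the equal-partition constraint applies to the codes computed within a training batch, so argmax predictions at inference time are not guaranteed to cover every prototype.

```python
from collections import Counter


def cluster_histogram(scores):
    """Return (counts, empty): hard-assignment counts and unused cluster ids."""
    n_clusters = len(scores[0])
    # argmax over each row of scores
    assignments = [max(range(n_clusters), key=row.__getitem__) for row in scores]
    counts = Counter(assignments)
    empty = [k for k in range(n_clusters) if counts[k] == 0]
    return counts, empty


scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
]
counts, empty = cluster_histogram(scores)
print(dict(counts))  # {0: 2, 1: 1}
print(empty)         # [2]
```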

checkpoint loading failed

Hi,

When I try to load one of your provided checkpoint models in your main_swav.py file, I always receive the warnings:
WARNING - 01/05/21 11:22:46 - 0:00:09 - => failed to load optimizer from checkpoint ...
WARNING - 01/05/21 11:22:46 - 0:00:09 - => failed to load amp from checkpoint ...
WARNING - 01/05/21 11:22:46 - 0:00:09 - => failed to load state_dict from checkpoint ...

What is the reason for the warnings? Isn't it possible to use your provided models for finetuning?

About the pretrained model RN50-w5

Thank you for sharing this wonderful work.

I conducted several experiments using the released pretrained models. However, the pretrained resnet50w5 fails to load because the batchnorm layer of the projection_head is missing in the pretrained model.

So could I just ignore this batchnorm layer when using the pretrained resnet50w5?

Thanks very much!

update of prototypes

Hi Mathilde,
In your swav paper, I understand that the backbone as well as the prototypes are updated.

Therefore, I was wondering why you call embeddings.detach() (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L291) in your script. I thought when detaching a tensor, no gradient will be back-propagated along this variable.

Thanks in advance for your help!

prototype normalize bugs

Hi,
In the paper, the pseudo-code shows the prototypes are normalized along the first dimension:

    with torch.no_grad():
        C = normalize(C, dim=0, p=2)

But in the source code, the prototypes are normalized along the second dimension:

    # normalize the prototypes
    with torch.no_grad():
        w = model.module.prototypes.weight.data.clone()
        w = nn.functional.normalize(w, dim=1, p=2)
        model.module.prototypes.weight.copy_(w)

Since each column of the prototypes matrix is regarded as one cluster, shouldn't the prototypes be normalized along the first dimension (dim=0)?
Thanks
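The two conventions can be reconciled by writing out the shapes. A short derivation, assuming the paper's prototype matrix is $C \in \mathbb{R}^{D \times K}$ with one prototype $c_k$ per column, and that the weight of `nn.Linear(D, K, bias=False)` is stored with shape $K \times D$ (PyTorch keeps linear weights as `(out_features, in_features)`):

```latex
W = C^{\top} \in \mathbb{R}^{K \times D}, \qquad W_{k,:} = c_k^{\top}
\quad\Longrightarrow\quad
\frac{W_{k,:}}{\lVert W_{k,:} \rVert_2}
  = \left( \frac{c_k}{\lVert c_k \rVert_2} \right)^{\top}
```

Under these assumptions, normalizing $W$ with `dim=1` (each row) is the same operation as normalizing $C$ with `dim=0` (each column).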

Multi-Crop on MoCo

Good job! Thanks for sharing the code. However, I was wondering how much gain Multi-Crop can bring on MoCo. Have you tried it?

How to use the checkpoint after training?

I tried to train swav with a small dataset, and I got these generated files:

checkpoints
stats0.pkl
params.pkl
train.log

If I have the model after training, how can I use it? How do I assign an unseen image to one of those clusters, and how do I retrieve images from the same cluster?

I used this command for training:

python -m torch.distributed.launch --nproc_per_node=1 main_swav.py \
--data_path pics1 \
--epochs 5 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 224 96 \
--nmb_crops 2 6 \
--min_scale_crops 0.14 0.05 \
--max_scale_crops 1. 0.14 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 3840 \
--epoch_queue_starts 15
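For the question of assigning an unseen image to a cluster, one plausible approach is to compare the image's embedding with every prototype and pick the best match. A minimal sketch in plain Python, assuming the embedding and the prototypes are already available as lists of floats (in practice they would come from the trained network and its prototype layer); `assign_to_prototype` is a hypothetical helper, not a function from the repository:

```python
import math


def l2_normalize(vec):
    """Scale vec to unit length (assumes vec is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def assign_to_prototype(embedding, prototypes):
    """Return the index of the prototype with the highest cosine similarity."""
    z = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(z, l2_normalize(p))) for p in prototypes]
    return max(range(len(sims)), key=sims.__getitem__)


prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy prototypes
print(assign_to_prototype([0.2, 0.9], prototypes))  # 1
```

Images that map to the same index can then be treated as members of the same cluster.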

DeepCluster-v2: How to display the results?

I have used main_deepclusterv2.py to train a model on my custom dataset, and I want to see the clustering results. For example, put the images of the same class into the same folder. How can I do that? Thank you.
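Once a cluster id is available for each image (however it was obtained), sorting the files into folders needs only the standard library. A sketch under that assumption; `sort_into_folders` and the `cluster_<id>` folder naming are illustrative choices, and the demo uses empty placeholder files in a temporary directory:

```python
import shutil
import tempfile
from pathlib import Path


def sort_into_folders(image_paths, cluster_ids, out_dir):
    """Copy each image into out_dir/cluster_<id>/ and return the folder names.

    Files that share a name within one cluster would overwrite each other.
    """
    out_dir = Path(out_dir)
    for path, cluster_id in zip(image_paths, cluster_ids):
        target = out_dir / f"cluster_{cluster_id}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / Path(path).name)
    return sorted(p.name for p in out_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    paths = []
    for name in ["a.jpg", "b.jpg", "c.jpg"]:
        p = src / name
        p.write_bytes(b"")  # empty placeholder instead of a real image
        paths.append(p)
    folders = sort_into_folders(paths, [0, 2, 0], Path(tmp) / "out")
    print(folders)  # ['cluster_0', 'cluster_2']
```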

RuntimeError: No rendezvous handler for ://

Hello

I am trying to train a custom dataset.

I am trying to train in an environment where there is one gpu.

What's the problem?

Also, can you provide a tutorial for testing on a custom dataset?

export NGPU=1; python -m torch.distributed.launch --nproc_per_node=$NGPU main_swav.py --data_path /home/ubuntu/merge/src/swav/data/train --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15
Traceback (most recent call last):
  File "main_swav.py", line 374, in <module>
    main()
  File "main_swav.py", line 122, in main
    init_distributed_mode(args)
  File "/home/ubuntu/merge/src/swav/src/utils.py", line 65, in init_distributed_mode
    rank=args.rank,
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 391, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 79, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for ://
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/swav/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/swav/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--data_path', '/home/ubuntu/merge/src/swav/data/train', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.

Training loss history

Hi,

Thank you so much for sharing your codes!

May I know if you have a copy of your loss record?

When I trained your model from scratch, the loss was stuck around 8 for the first 2 epochs. (I am still training the model.)

Is it the same for you?

Thank you.

Benchmarking on CIFAR-10

Hi,

I wanted to benchmark SwAV on CIFAR-10.
Is there any recommended configuration for CIFAR-10? For example:

The number of prototypes could be set to 50, 100 etc.
Since CIFAR-10 images are 32x32, multicrop can be avoided.

Also, do you plan to publish any pretrained model on CIFAR-10?

A design problem which may lead model to cheat

Hi
The algorithm design is: view A --> code x, view B --> code y; then view B predicts code x and view A predicts code y.
But in some experiments (on the CIFAR-10 dataset), I found that the model learns to cheat by predicting nearly the same embeddings (z in the paper) for all images, including their augmentations. This way the loss decreases rapidly, but the model learns the wrong thing.

The full Pretrained model with MLP head?

Experiments of linear classification on different numbers of GPUs

Hello,

Thanks for your inspiring paper and code.

I trained SwAV with a batch size of 4096 for 200 epochs and then trained a linear classifier with your default setting (batch size of 256 on 8 GPUs), achieving 74.5% top-1 accuracy. I wanted to speed up the linear classifier training, so I tried training it with a batch size of 2048 on 64 GPUs, leaving all other settings the same. I observed 73.3% top-1 accuracy, a slight drop from your default setting.

So I am wondering how to train the linear classifier on 64 GPUs and achieve performance similar to training on 8 GPUs, e.g. by tuning some hyper-parameters? Looking forward to your reply.
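One commonly used heuristic when changing the total batch size is the linear scaling rule: scale the learning rate in proportion to the batch size. A sketch of the arithmetic, assuming the quoted batch sizes are totals and using the 0.3 learning rate mentioned for eval_linear.py in another issue on this page as the reference value. Whether this heuristic recovers the 8-GPU accuracy here is not something the thread confirms.

```python
def scaled_lr(reference_lr, reference_batch, new_batch):
    """Linear scaling rule: the learning rate grows proportionally with batch size."""
    return reference_lr * new_batch / reference_batch


new_lr = scaled_lr(reference_lr=0.3, reference_batch=256, new_batch=2048)
print(new_lr)  # 2.4
```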

Thanks

How is it ensured that only full resolution views are used for code computation?

Referring to this section of the paper:

In the code, this part is supposedly handled with crops_for_assign:

for i, crop_id in enumerate(args.crops_for_assign):
    with torch.no_grad():
        out = output[bs * crop_id: bs * (crop_id + 1)]

        # time to use the queue
        if queue is not None:
            if use_the_queue or not torch.all(queue[i, -1, :] == 0):
                use_the_queue = True
                out = torch.cat((torch.mm(
                    queue[i],
                    model.module.prototypes.weight.t()
                ), out))
            # fill the queue
            queue[i, bs:] = queue[i, :-bs].clone()
            queue[i, :bs] = embedding[crop_id * bs: (crop_id + 1) * bs]
        # get assignments
        q = torch.exp(out / args.epsilon).t()
        q = distributed_sinkhorn(q, args.sinkhorn_iterations)[-bs:]

I am not sure how this indexing out = output[bs * crop_id: bs * (crop_id + 1)] ensures we are only operating on full resolution views (224/160)?
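The slicing can be illustrated without any tensors. A sketch assuming that the per-crop batches are concatenated in crop order with the full-resolution crops listed first (as in `--size_crops 224 96 --nmb_crops 2 6`), and that `crops_for_assign` is `[0, 1]`; both are assumptions about the configuration rather than something shown in the quoted code:

```python
bs = 4              # images per batch (toy value)
sizes = [224, 96]   # crop resolutions, full resolution first
nmb_crops = [2, 6]  # 2 crops at 224, then 6 crops at 96

# Resolution of each crop index: [224, 224, 96, 96, 96, 96, 96, 96]
crop_sizes = [s for s, n in zip(sizes, nmb_crops) for _ in range(n)]
# Row i of the concatenated output belongs to crop i // bs.
row_resolution = [crop_sizes[crop] for crop in range(sum(nmb_crops)) for _ in range(bs)]

selected = {}
for crop_id in [0, 1]:  # assumed value of crops_for_assign
    rows = row_resolution[bs * crop_id: bs * (crop_id + 1)]
    selected[crop_id] = set(rows)
    print(crop_id, selected[crop_id])  # only 224 appears
```

With this layout, crop ids 0 and 1 address exactly the rows produced by the two 224 crops, while crop id 2 and above would address the 96 crops.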

[resnet50w2]Size mismatch between checkpoint model and config model

I tried to load checkpoint downloaded from resnet50w2 to do some experiments, but an error occurred. It seems the model you published doesn't match the config model in resnet.py for resnet50w2.

size mismatch for module.layer1.0.conv1.weight: copying a param with shape torch.Size([128, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
size mismatch for module.layer1.0.bn1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.bn1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.bn1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.bn1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for module.layer1.0.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for module.layer1.0.bn2.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).

The script is swav_RN50w2_400ep_pretrain.sh and the checkpoint is resnet50w2.

Fixed seed, but no reproducibility

Hi Mathilde,
Thanks for your great work. I enjoyed reading your paper!

When running main_swav.py, I experience no reproducibility of the results (although the seeds are set nicely in utils.fix_random_seeds).

RUN1:
INFO - 12/07/20 09:38:11 - 0:00:06 - Epoch: [0][0] Loss 3.5037 (3.5037)
INFO - 12/07/20 09:38:30 - 0:00:25 - Epoch: [0][50] Loss 2.9354 (3.0861)

RUN2:
INFO - 12/07/20 09:37:31 - 0:00:06 - Epoch: [0][0] Loss 3.5037 (3.5037)
INFO - 12/07/20 09:37:51 - 0:00:25 - Epoch: [0][50] Loss 2.9074 (3.0710)

Do you experience the same? If yes, do you have a clue why that is the case (maybe distributed training)?

Thanks in advance!

Finetuning with 1% and 10%

Hi,

Thanks for sharing this awesome repo.

I was wondering how the finetuning on 1% and 10% of ImageNet is done in the eval script?

Here it looks like the entire folder is used for the dataset:

swav/eval_linear.py

Line 103 in 139623b

train_dataset = datasets.ImageFolder(os.path.join(args.data_path, "train"))

Were the 1% and 10% subsets of the ImageNet training set preselected and placed into separate folders?

Non-distributed training

If I run main_deepclusterv2.py in a non-distributed training mode, what modifications do I need to make?

Traceback (most recent call last):
  File "main_deepclusterv2.py", line 426, in <module>
    main()
  File "main_deepclusterv2.py", line 119, in main
    init_distributed_mode(args)
  File "/remote_projects/ImageSimilarity/swav/src/utils.py", line 56, in init_distributed_mode
    args.rank = int(os.environ["RANK"])
  File "/root/software/anaconda3/envs/similarity/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'
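The traceback shows the script reading `RANK` from the environment, which a launcher such as `torch.distributed.launch` normally sets. A minimal sketch of providing single-process defaults before the distributed setup runs; `MASTER_ADDR` and `MASTER_PORT` are the variables PyTorch's `env://` initialization reads, while the helper name and the port number are arbitrary choices for illustration, and the script may read further variables not covered here:

```python
import os


def set_single_process_env(port=29500):
    """Fill in the variables a launcher would set, for one process on one machine."""
    defaults = {
        "RANK": "0",
        "WORLD_SIZE": "1",
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": str(port),  # arbitrary free port (assumption)
    }
    for name, value in defaults.items():
        os.environ.setdefault(name, value)  # keep values set by a real launcher
    return {name: os.environ[name] for name in defaults}


env = set_single_process_env()
print(sorted(env))
```

Alternatively, launching the script with `python -m torch.distributed.launch --nproc_per_node=1 ...`, as the commands in other issues on this page do, lets the launcher set these variables.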

Display clustering results?

Hi,
how can I display clustering results? When I forward an image through a pretrained network, I get a vector of numbers with the length of the number of prototype vectors. Do I have to pass this vector to the distributed_sinkhorn function?
The distributed_sinkhorn function returns the probabilities for every cluster, is this correct?
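To make the role of the Sinkhorn step more concrete, here is a simplified single-process sketch in plain Python. It is not the repository's `distributed_sinkhorn`: there is no cross-GPU reduction, the maximum score is subtracted before exponentiating for numerical safety, and the names `sinkhorn`, `epsilon` and `n_iters` are illustrative. It alternately rescales prototypes and samples so that, within the batch, every prototype receives roughly equal total mass while each sample's assignment sums to 1.

```python
import math


def sinkhorn(scores, epsilon=0.05, n_iters=3):
    """scores: B rows of K prototype scores. Returns B rows of soft assignments."""
    B, K = len(scores), len(scores[0])
    top = max(max(row) for row in scores)
    # Q is stored as K x B (prototypes by samples); entries are strictly positive
    # as long as the exponentials do not underflow.
    Q = [[math.exp((scores[b][k] - top) / epsilon) for b in range(B)] for k in range(K)]
    total = sum(sum(row) for row in Q)
    Q = [[v / total for v in row] for row in Q]
    for _ in range(n_iters):
        # each prototype (row) gets total mass 1/K
        for k in range(K):
            s = sum(Q[k])
            Q[k] = [v / (s * K) for v in Q[k]]
        # each sample (column) gets total mass 1/B
        for b in range(B):
            s = sum(Q[k][b] for k in range(K))
            for k in range(K):
                Q[k][b] /= s * B
    # rescale so that every sample's assignment sums to 1
    return [[Q[k][b] * B for k in range(K)] for b in range(B)]


scores = [
    [0.10, 0.20, 0.00],
    [0.00, 0.30, 0.10],
    [0.20, 0.10, 0.05],
    [0.05, 0.00, 0.20],
]
codes = sinkhorn(scores)
print([round(sum(row), 6) for row in codes])  # [1.0, 1.0, 1.0, 1.0]
```

In a multi-GPU run the batch is split across processes, so the total number of samples is the per-process batch size times the number of processes. That is a plausible reason for `args.world_size` appearing in the repository's version even though a single-process formulation does not need it.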

Regarding including our TensorFlow implementation

Hi @mathildecaron31.

I was wondering if you'd be interested in including our (@ayulockin and mine) implementation (in TensorFlow) of SwAV in the README. Many folks might find it helpful.

Pre-trained models (256 batch size, 200 epochs) without multi-crops

Hi, thanks for your excellent work! Could you kindly release the model and results pre-trained with a batch size of 256 for 200 epochs without multi-crop? I am asking because this seems to be a commonly used configuration in the literature, but it is missing from both the paper and the repo. Some researchers have raised this issue before, but it seems that it has not been resolved. Given limited computing resources, I think releasing this model would help a lot. Thank you very much!

About process_group in SyncBN

Hi,

I noticed that you use groups of 8 GPUs for SyncBN (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L158) when training with a large batch size of 4096, i.e. 512 training samples per group for sync batchnorm. I am wondering 1) why you don't use global SyncBN for training and 2) how much this affects the results?

Thanks!