donkeyshot21 / cassle
Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)
License: MIT License
Hi,
Thanks a lot for your amazing work and for releasing the code. I have been trying to reproduce your Table 4 for some time, using the code and the scripts directly with NO modification.
For example, in this Table, the BYOL fine-tuning performance on ImageNet-100 in the 5-task class-incremental setting is 66.0. Instead, I measured well below 60.0, at least 6% lower. Please see the full results table below if interested (a 5 x 5 table).
Any idea what might be causing the gap? Are there any nuances in the evaluation method? For example, for average accuracy I simply take the mean of the table below across all rows and columns (as also suggested by GEM, which you referenced).
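For concreteness, this is how I compute it (a minimal sketch; random values stand in for my real results):

```python
import numpy as np

# acc[i, j] = linear evaluation accuracy on task j after training on task i.
# Random values stand in here for my real 5 x 5 results table.
acc = np.random.rand(5, 5)

# Plain mean across all rows and columns, as I understood from GEM.
avg_accuracy = acc.mean()
print(f"average accuracy: {avg_accuracy:.3f}")
```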
Thanks a lot again for your response and your eye-opening work.
Hi,
Very interesting work.
Did you use the cleaned version of DomainNet or the original one?
The cleaned version excludes a lot of duplicate images.
Thanks
Hi,
I want to be sure that setting the distillation loss to zero for the first task isn't an error, since the first task has already been learned with plain SSL in the normal process.
What is the goal of this, please?
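To make my question concrete, here is the logic as I understand it (a toy sketch with placeholder models and losses, not your actual code):

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(32, 16)  # stand-in for the encoder + projector
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
tasks = [torch.randn(8, 100, 32) for _ in range(3)]  # three toy "tasks"

frozen_past = None  # no predecessor exists before the first task
for task_id, task_data in enumerate(tasks):
    for x in task_data:
        z = model(x)
        loss = z.pow(2).mean()  # stand-in for the SSL loss
        # Distillation only applies from task 1 onward: it needs a frozen
        # snapshot of the model trained on the previous task, which does not
        # exist for task 0. Hence the distillation term is zero there.
        if frozen_past is not None:
            with torch.no_grad():
                z_past = frozen_past(x)
            loss = loss + (z - z_past).pow(2).mean()  # stand-in distill loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    frozen_past = copy.deepcopy(model).eval()  # snapshot for the next task
```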
Thanks
Hi all,
Is there a link where I can access the checkpoints of the models trained with Barlow Twins and VICReg?
I would like to evaluate this approach using different models and need the last trained checkpoints of these models.
Thanks.
Hi,
This is exciting and enlightening work.
I wonder where the data for training the classifier comes from, for the linear evaluation accuracy.
The training data of the current task?
Hi,
This is exciting and enlightening work.
I am confused by the number of classifiers for Linear Evaluation Accuracy.
In the paper, you said, "For class-incremental and data-incremental, we use the task-agnostic setting, meaning that at evaluation time we do not assume to know the task ID". As I understand it, this means that you only maintain one classifier and continuously optimize it after learning each task for linear evaluation accuracy.
However, I found in #1 that you said, "as we operate in the class-incremental setting we train one linear classifier per task."
I would appreciate a clearer explanation.
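To make my confusion concrete, here are the two setups as I understand them (a toy sketch with assumed dimensions):

```python
import torch.nn as nn

feat_dim, classes_per_task, num_tasks = 512, 20, 5  # e.g. CIFAR-100, 5 tasks

# (a) Task-agnostic: a single linear classifier over all 100 classes,
#     with no task ID available at evaluation time.
single_head = nn.Linear(feat_dim, classes_per_task * num_tasks)

# (b) Task-aware: one linear classifier per task, selected by task ID.
per_task_heads = nn.ModuleList(
    nn.Linear(feat_dim, classes_per_task) for _ in range(num_tasks)
)
```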
Thanks.
Hello,
Congratulations on your excellent work. I have a question about the training setting.
Why did you load the checkpoint of task 0 before training CaSSLe? I see that the first task of CaSSLe is trained without distillers, so the setting is the same as the first task of fine-tuning. I think loading the checkpoint is unnecessary.
Looking forward to your reply,
Thanks.
Hi,
thanks for your interesting work.
I have problems reproducing the results; see Line 171 in b5b0929.
Thanks
Hi! We are following your excellent work.
We would like to know more clearly the details of your experiments on CIFAR-100 for calculating Forward Transfer, such as how the accuracy of the random model on each task is obtained.
If we understand correctly, since the random seed is fixed, the accuracy of the random model should be fixed as well. Would it be possible to provide the accuracy of the random model on the five tasks for reference?
Thanks!
Hi,
I found this work very interesting and plan to work on similar topics. However, I encountered some issues:
(1) For the fine-tuning example with Barlow Twins and CIFAR-100, should it be barlow.sh instead of barlow_distill.sh? Otherwise, we need to provide the pretrained model in order to successfully run the code.
(2) If I enable the KNN online evaluation by setting disable_knn_eval = False, I get an error about empty test features and an expected argument in base.py line 432. I saw a previous closed issue reporting something similar, but the error still appears even when I set a meaningful online_eval_batch_size = 256.
Thanks for your help!
I'm wondering whether the training procedure provided for DomainNet is correct: as far as I can see from main_pretrain.py, trainer.fit() is only used on the validation data, which looks like validation rather than training. Also, is the DomainNet data handled in the same way as the DALI data?
Hello,
I have read your paper. It is very impressive. I have a question about the class-incremental setting and wonder if you could answer it.
Did you train the classifier for each task only during that task's embedding training? Or did you re-train all classifiers after the embedding training for all tasks had finished? I see that the embeddings of a previous task may change after the next task is trained, so how does a classifier trained on the old embeddings handle the changed embeddings? Your paper mentions "a subset, e.g., 10% of the data"; does this mean using 10% of the data to retrain the classifier at the very end?
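For concreteness, my reading of the protocol looks like the sketch below (a toy example with assumed shapes and names, not your actual code):

```python
import torch
import torch.nn as nn

# The encoder is frozen after the LAST task, then ONE classifier is trained
# from scratch on a small labeled subset (e.g. 10% of all the data).
encoder = nn.Linear(32, 64).eval()  # stand-in for the final encoder
for p in encoder.parameters():
    p.requires_grad_(False)  # frozen: the embeddings no longer move

x_subset = torch.randn(100, 32)  # stand-in for the labeled 10% subset
y_subset = torch.randint(0, 10, (100,))

classifier = nn.Linear(64, 10)  # trained once, at the very end
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
for _ in range(10):
    logits = classifier(encoder(x_subset))
    loss = nn.functional.cross_entropy(logits, y_subset)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```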
Looking forward to your kind reply.
Thanks.
Hello,
Congrats on your paper! It touches on very interesting questions, and I'd love to further study the problem of CSSL!
I am trying to execute your script for training Barlow Twins (python job_launcher.py --script bash_files/continual/cifar/barlow_distill.sh), but I might have encountered a bug: if I train with the WeightedKNNClassifier as a performance monitor, your code calls its forward with only the train_features and target_features provided.
After that, the compute function breaks down at line 89 because self.test_features is an empty list.
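If I understand the metric correctly, compute() can only succeed once it has been fed both splits, i.e., something like the sketch below (my assumption about the intended usage, not verified against your code):

```python
import torch

# Placeholder features; knn is the WeightedKNNClassifier instance from the repo.
train_feats, train_targets = torch.randn(100, 64), torch.randint(0, 10, (100,))
test_feats, test_targets = torch.randn(20, 64), torch.randint(0, 10, (20,))

knn.update(train_features=train_feats, train_targets=train_targets)
knn.update(test_features=test_feats, test_targets=test_targets)

top1, top5 = knn.compute()  # fails if test_features was never populated
```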
Am I getting something wrong? I am working in a fresh conda env with the setup specified in your README file.
Many thanks!
Hi,
Thanks for your excellent work!
I'm curious about how to calculate the "Forward Transfer" after training. For example, I have successfully reproduced the class-IL results for Fine-tuning and CaSSLe (with BYOL) on CIFAR-100, but I don't know how to directly check the FT results. Does it need a separate run to obtain the "linear evaluation accuracy of a random network", as the paper states?
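For reference, this is my current understanding of the FT computation (based on the GEM-style definition in the paper; please correct me if I am wrong):

```python
import numpy as np

# acc[i, j]   = linear-eval accuracy on task j after training on task i
# rand_acc[j] = linear-eval accuracy of a randomly initialized network on task j
T = 5
acc = np.random.rand(T, T)    # placeholder for the real results matrix
rand_acc = np.random.rand(T)  # placeholder for the random-network baseline

# FT = mean over tasks i >= 2 of (accuracy on task i just BEFORE training on
# it) minus (random-network accuracy on task i).
ft = np.mean([acc[i - 1, i] - rand_acc[i] for i in range(1, T)])
print(f"forward transfer: {ft:.3f}")
```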
BTW, just to be sure, is it right to directly read the "val_acc1" values on the wandb board as the final linear evaluation accuracy?
Hi,
I have some questions regarding the calculation of upper and lower bounds, taking class incremental learning as an example:
In supervised learning, the lower bound (Fine-tuning) is performed sequentially, task by task, i.e., Task 1 fine-tuning -> Task 2 fine-tuning -> ...; whereas the upper bound (Offline) involves training a model on all the data together.
Regarding SimCLR, my understanding is that the lower bound (Fine-tuning) corresponds to the SSL stage, i.e., Task 1 SSL -> Task 2 SSL -> ..., followed by linear evaluation. The upper bound (Offline) involves performing SSL on the entire dataset and then conducting linear evaluation. Is my understanding correct?
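In pseudocode, what I mean is the following (a toy sketch; all names are placeholders, not repo code):

```python
def ssl_train(model, data):
    """Stand-in for a full SSL run (e.g. SimCLR) on the given data."""
    return model  # placeholder

def linear_eval(model):
    """Stand-in for freezing the encoder and training a linear probe."""
    return 0.0  # placeholder accuracy

tasks = [f"task_{i}_data" for i in range(5)]

# Lower bound (Fine-tuning): sequential SSL, Task 1 -> Task 2 -> ...,
# then a single linear evaluation at the end.
model = "encoder"
for task_data in tasks:
    model = ssl_train(model, task_data)
lower = linear_eval(model)

# Upper bound (Offline): one SSL run on the union of all task data,
# then linear evaluation.
upper = linear_eval(ssl_train("encoder", tasks))
```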
Hello,
Thank you for your fantastic project! I have some questions regarding model evaluation.
1) Taking CIFAR-10 as an example, if there are 2 tasks, each with 5 classes, is the process shown in the following figure correct?
2) If it is correct, after the self-supervised continual learning part is completed, a 10-class classifier will be trained. When training this 10-class classifier, is all the data from all categories used simultaneously?
3) Additionally, what is the overall process for Fine-tuning (using Table 2 as an example, Strategy 1 Fine-tuning)? Is it simply replacing CaSSLe with a non-continual SSL method?
Thanks!
Hi,
I have a few questions about the SimCLR code.
cassle/cassle/losses/simclr.py (line 21 in b5b0929)
cassle/cassle/distillers/contrastive.py (lines 65 to 68 in b5b0929)
In the paper the distillation loss is applied to the two views independently. Based on the code above, does it mean that we should use them jointly to reproduce the result?
cassle/cassle/losses/simclr.py (lines 30 to 33 in b5b0929)
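To be explicit, "independently" in my reading would look like the sketch below (a toy example; the modules and the loss are stand-ins, not your actual code):

```python
import torch
import torch.nn as nn

model = nn.Linear(32, 16)               # stand-in for the current encoder + projector
frozen_past = nn.Linear(32, 16).eval()  # stand-in for the frozen model from task t-1
predictor = nn.Linear(16, 16)           # maps current features to the past space

def contrastive_distill(z, z_past):
    # Stand-in for the contrastive distillation loss in the repo.
    return 1 - nn.functional.cosine_similarity(z, z_past).mean()

view1, view2 = torch.randn(8, 32), torch.randn(8, 32)
z1, z2 = model(view1), model(view2)
with torch.no_grad():
    z1_past, z2_past = frozen_past(view1), frozen_past(view2)

# Per view, independently, then averaged, rather than concatenating the
# two views into a single joint batch.
loss_distill = 0.5 * (
    contrastive_distill(predictor(z1), z1_past)
    + contrastive_distill(predictor(z2), z2_past)
)
```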
Thanks in advance!