sense-gvt / declip
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Hi, I want to reproduce the zero-shot result of DeCLIP-88M with ResNet50 on ImageNet-1K (62.5 in the table), but the evaluation result I got is 7.264, which is far too low. The ViT-B32 result is correct, however. I also found a problem while loading the ResNet50 checkpoint:
size mismatch for module.logit_scale: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
I didn't change any code of the model.
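For what it's worth, a minimal workaround sketch for the shape mismatch, assuming a standard PyTorch state_dict (the key name is taken from the error message; whether the rest of the checkpoint then loads cleanly is untested):

```python
import torch

# The checkpoint stores module.logit_scale as a 0-dim tensor (torch.Size([])),
# while the model expects shape [1]; reshaping it before load_state_dict avoids
# the size-mismatch error. The dict below stands in for the real checkpoint.
state = {"module.logit_scale": torch.tensor(4.6052)}  # 0-dim scalar

key = "module.logit_scale"
if key in state and state[key].dim() == 0:
    state[key] = state[key].reshape(1)  # now torch.Size([1])

print(state[key].shape)  # torch.Size([1])
```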
Another question: why does the run.sh of declip-88m-resnet50 use clip_solver while the other run.sh files use declip_solver? I ran the evaluation for DeCLIP-88M-ResNet50 with declip_solver by replacing the yaml file. The following figure shows the results reproduced on my own compute resources:
Do you have any ideas? Thanks!
What is the preprocessing like? For example, the resize shape, and the mean and std.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/Spring_Prototype-3.0.0b0-py3.8.egg/prototype/solver/declip_solver.py", line 22, in <module>
    from prototype.model import model_entry
  File "/opt/conda/lib/python3.8/site-packages/Spring_Prototype-3.0.0b0-py3.8.egg/prototype/model/__init__.py", line 5, in <module>
    from .declip import declip_res50, declip_vitb32
  File "/opt/conda/lib/python3.8/site-packages/Spring_Prototype-3.0.0b0-py3.8.egg/prototype/model/declip.py", line 11, in <module>
    from .image_encoder.visual_transformer import visual_transformer_B32, visual_transformer_B16
ModuleNotFoundError: No module named 'prototype.model.image_encoder'
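One hedged way to diagnose this: check where (if anywhere) the subpackage would be imported from. If it is absent from the installed Spring_Prototype egg but present in the cloned repo, running from the repo root (so the local `prototype` package shadows the installed one) may work around it. This is a guess at the cause, not a confirmed fix:

```python
import importlib.util

def locate(name):
    """Report where a (sub)module would be imported from, if anywhere."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return "parent package not importable"
    return spec.origin if spec else "missing -- likely absent from the installed egg"

# If this prints an .egg path without image_encoder (or "missing"), the egg is
# incomplete; running from the cloned repo root lets the local package win.
print(locate("prototype.model.image_encoder"))
```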
How can this problem be solved?
It would be awesome if you'd release your code for preprocessing and training :)
I am using nvidia-dali-cuda110 version 1.14.0 and get the error: module 'nvidia.dali.ops' has no attribute 'McReader'.
The requirements ask for nvidia-dali 0.14, but there is no nvidia-dali==0.14 release.
Hi~ @zlccccc @SlotherCui
I notice that there isn't a BPE file here. Your token embedding weight has shape [49409, 512], but the shape in CLIP is [49408, 512]. Is your BPE file consistent with CLIP's?
If I missed something, please comment~ Thanks a lot!
The get_started.md file is empty. When will you upload this file?
Hi, authors! Thanks for your awesome work!
I'm confused about the usage of the fused AdamW_SGD optimizer described in the paper's Appendix C, in the implementation-details paragraph.
It says you use AdamW with lr 1e-3 and weight decay 0.05 for the ViT vision encoder, and SGD with lr 0.02 and weight decay 1e-4 for the text transformer.
However, in your configuration, ViT-B/32 is also optimized by SGD rather than the fused AdamW_SGD. So which optimizer did you actually use in the experiments?
And if you did use the fused AdamW_SGD optimizer as stated in the paper, why? CLIP uses only AdamW. Is AdamW_SGD beneficial compared to that?
Looking forward to your reply! 😁
Hey,
Thanks for the great work.
I noted that the 15M subset of YFCC you use is significantly different from the subset that OpenAI uses and the Quality not Quantity paper uses. To compare the proportion of matching samples, I just did a quick test and saw that the overall stats for the three datasets are:
declip json: 15,388,848 samples
quality-not-quantity csv: 14,825,236 samples
open-ai csv: 14,829,396 samples
The difference between the quality-not-quantity and openai csvs can simply be attributed to link rot.
Further, when I take an intersection between your photo-ids and the photo-ids used by OpenAI / Quality not Quantity:
declip intersection with openai: 6,642,077 matches
declip intersection with quality-not-quantity: 6,640,264 matches
It is interesting that there are still so many matches (~40%). I just wanted to add this information here since I found it quite hard to figure out the exact differences and intersections between the different YFCC subsets. So, in case people are trying to use YFCC15M subsets based on the OpenAI-CLIP subset, it is useful to keep in mind that the DeCLIP subset is substantially different. This is also mentioned in the DeCLIP paper, Appendix F and Table 8.
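For anyone repeating this comparison, the overlap check reduces to a set intersection over photo-ids. A toy sketch, with made-up in-memory id sets standing in for the ids parsed from the real DeCLIP json and OpenAI / quality-not-quantity csv files:

```python
# Toy stand-ins for the photo-id sets parsed from the annotation files.
declip_ids = {"1001", "1002", "1003", "1004", "1005"}
openai_ids = {"1002", "1003", "1006"}

# Intersection of photo-ids, and the share of the DeCLIP subset it covers.
matches = declip_ids & openai_ids
share = len(matches) / len(declip_ids)
print(f"{len(matches)} matches ({share:.0%} of the DeCLIP ids)")
```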
The code here
https://github.com/Sense-GVT/DeCLIP/blob/main/prototype/model/image_encoder/modified_resnet.py#L103
calls an undefined method (new_group).
DeCLIP/prototype/model/clip.py, lines 136 to 146 at commit 9d9e25d
Hello authors, I am fine-tuning the yfcc15m pre-trained weights on my own dataset, but the loss never converges. What could be the cause?
It seems that the URL for downloading the model weights doesn't work.
Hi, I just realized that you have FILIP and DeFILIP implementations, very interesting.
Do you already have results on how they compare with CLIP and DeCLIP, with respect to benchmark scores and compute efficiency? :)
May I also ask about the details of the hardware you used for training?
… And, do you have code for multi-node training?
Can you provide the installation instructions or dockerfile?
Hello, I saw that you used 32 A100s. May I ask how much memory each A100 has? 80 GB?
If we use 8 V100 cards with 32 GB each, how long would it take to complete YFCC-15M training?
In fact, after reading the paper I still don't understand whether this code trains from scratch or fine-tunes CLIP with some parameters frozen. Sorry, I have just started learning deep learning; could some kind person tell me the answer?
Thank you for this exciting repository. Can you provide a simple example of how I might be able to load the models you provide in your model zoo?
Something along the lines of what is provided by the timm (pytorch-image-models) model repository:
import timm
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform

model_name = 'ghostnet_100'
model = timm.create_model(model_name, pretrained=True)
model.eval()

config = resolve_data_config({}, model=model)
transform = create_transform(**config)
Ideally, this would allow us to use the models in a jupyter notebook or other interactive context.
Thanks in advance!
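Not an official answer, but a hedged loader sketch along those lines. The builder names (`declip_res50`, `declip_vitb32` in `prototype.model.declip`) appear in this repo's import paths; the checkpoint layout (a possible "model" key, a possible "module." prefix, as seen in other issues here) is an assumption:

```python
import torch

def load_declip(builder, ckpt_path):
    """Hedged loader sketch: builder is a model constructor such as
    prototype.model.declip.declip_vitb32; the checkpoint layout handled
    below is an assumption, not a documented format."""
    model = builder()
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model", state)  # some checkpoints nest under "model"
    # Strip a possible DataParallel/DistributedDataParallel "module." prefix.
    state = {k[len("module."):] if k.startswith("module.") else k: v
             for k, v in state.items()}
    model.load_state_dict(state, strict=False)
    return model.eval()
```

Usage would then be e.g. `model = load_declip(declip_vitb32, "declip_vitb32.pth")`, after which the model can be used in a notebook like a timm model.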
Hi, thanks for the great work. After downloading the provided YFCC15M label file, I can see there are three keys in each label: caption, filename, and url. How should we find the corresponding YFCC image from your label, i.e., which key should we use to align with the YFCC data?
I use the following command to run zero-shot evaluation:
python -m prototype.solver.clip_solver --config ./experiments/declip_experiments/declip88m/declip88m_r50_declip/config.yaml --evaluate
And then it reports this error:
import FusedFP16SGD failed, FusedFP16AdamW replace
slurm
Traceback (most recent call last):
  File "/opt/conda/envs/openmmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/openmmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/apdcephfs/share_1227775/mingzhenzhu/DeCLIP/prototype/solver/clip_solver.py", line 769, in <module>
    main()
  File "/apdcephfs/share_1227775/mingzhenzhu/DeCLIP/prototype/utils/dist.py", line 11, in wrapper
    dist_init()
  File "/apdcephfs/share_1227775/mingzhenzhu/DeCLIP/prototype/utils/dist.py", line 21, in dist_init
    proc_id = int(os.environ['SLURM_PROCID'])
  File "/opt/conda/envs/openmmlab/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'SLURM_PROCID'
How to fix it? Thanks!
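A possible workaround sketch, assuming the evaluation is meant to run as a single process: `dist_init()` in the traceback reads `SLURM_PROCID` from the environment, so the SLURM variables can be faked before launching. Only `SLURM_PROCID` is confirmed by the traceback; the other variable names are guesses at what else the code may read:

```python
import os

# Fake the SLURM environment dist_init() expects before a single-process run.
# Shell equivalent: `export SLURM_PROCID=0` (etc.) before `python -m ...`.
os.environ.setdefault("SLURM_PROCID", "0")    # confirmed by the traceback
os.environ.setdefault("SLURM_NTASKS", "1")    # guess at another expected var
os.environ.setdefault("SLURM_NODELIST", "localhost")  # guess as well

print(os.environ["SLURM_PROCID"])  # 0
```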
Are there any plans on hosting the weights on huggingface? I am willing to help if need be.
In the paper "Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm", training on the YFCC_V2 dataset gives CLIP and DeCLIP zero-shot ImageNet performance of 31.3 and 41.9, but 37.3 and 44.4 are reported in "Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision". What is the difference between them?
Hello, thank you for your work. I wonder how to deal with this problem: how do I install dataflow==1.2.1?
When I install dataflow==1.2.1 with pip, the error is:
ERROR: Could not find a version that satisfies the requirement dataflow==1.2.1 (from versions: 0.1.1)
ERROR: No matching distribution found for dataflow==1.2.1
Hi, when I run run.sh, I got errors: