
declip's People

Contributors

SlotherCui, zlccccc

declip's Issues

Performance of Declip-88M checkpoint

Hi, I want to reproduce the zero-shot ImageNet-1K result of DeCLIP-88M with ResNet50 (reported as 62.5 in the table), but the result I get is 7.264, which is far too low. The ViT-B/32 result is correct. I also found a problem while loading the ResNet50 checkpoint:

size mismatch for module.logit_scale: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).

I didn't change any of the model code.

Another question: why does the run.sh for declip-88m-resnet50 use clip_solver while the other run.sh files use declip_solver? I evaluated DeCLIP-88M-ResNet50 with declip_solver by swapping in the corresponding yaml file. The figure below shows the results reproduced on my own compute resources:
(screenshot of reproduced results omitted)

Do you have any ideas? Thanks!
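
For what it's worth, the logit_scale shape mismatch can be worked around before calling load_state_dict. A minimal sketch, assuming the checkpoint stores its weights under a 'model' key and keeps the 'module.' prefix (the file name is hypothetical):

import torch

ckpt = torch.load('declip_88m_r50.pth.tar', map_location='cpu')  # hypothetical file name
state_dict = ckpt.get('model', ckpt)

# Reshape the checkpoint's scalar logit_scale (shape []) to match a model expecting shape [1].
key = 'module.logit_scale'
if key in state_dict and state_dict[key].dim() == 0:
    state_dict[key] = state_dict[key].reshape(1)

# model.load_state_dict(state_dict, strict=False)  # then inspect any remaining mismatches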

Environment issue

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/Spring_Prototype-3.0.0b0-py3.8.egg/prototype/solver/declip_solver.py", line 22, in <module>
    from prototype.model import model_entry
  File "/opt/conda/lib/python3.8/site-packages/Spring_Prototype-3.0.0b0-py3.8.egg/prototype/model/__init__.py", line 5, in <module>
    from .declip import declip_res50, declip_vitb32
  File "/opt/conda/lib/python3.8/site-packages/Spring_Prototype-3.0.0b0-py3.8.egg/prototype/model/declip.py", line 11, in <module>
    from .image_encoder.visual_transformer import visual_transformer_B32, visual_transformer_B16
ModuleNotFoundError: No module named 'prototype.model.image_encoder'

How can I solve this problem?
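
A quick diagnostic sketch (the cause suggested here is an assumption): check whether Python resolves prototype from the installed Spring_Prototype egg rather than a local DeCLIP checkout, and whether that location actually contains the image_encoder subpackage:

import importlib.util, os

spec = importlib.util.find_spec('prototype')
print(spec.origin)  # installed egg path, or your local DeCLIP checkout?
pkg_dir = os.path.dirname(spec.origin)
print(os.listdir(os.path.join(pkg_dir, 'model')))  # is image_encoder/ actually in there?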

Training code

It would be awesome if you'd release your code for preprocessing and training :)

About the BPE file

Hi~ @zlccccc @SlotherCui
I notice that there isn't a BPE file here. In your token embedding, the weight shape is [49409, 512], but the shape in CLIP is [49408, 512]. Is your BPE file consistent with CLIP's?
If I missed something, please comment. Thanks a lot!
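
A minimal sketch (checkpoint path and key layout are assumptions) to check the vocabulary size carried by a checkpoint's token embedding against CLIP's 49,408-token BPE vocabulary:

import torch

ckpt = torch.load('declip_88m_vitb32.pth.tar', map_location='cpu')  # hypothetical file name
state_dict = ckpt.get('model', ckpt)
emb_key = next(k for k in state_dict if 'token_embedding' in k)
print(emb_key, state_dict[emb_key].shape)  # e.g. torch.Size([49409, 512])
# CLIP's released tokenizer has 49,408 tokens; one extra row here would suggest an added
# special token rather than a different merge file, but that is only a guess.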

Fused AdamW_SGD optimizer issues

Hi, authors! Thanks for your awesome work!
I'm confused about the fused AdamW_SGD optimizer described in the paper's Appendix C, implementation details paragraph.
It says you use AdamW with lr 1e-3 and weight decay 0.05 for the ViT vision encoder, and SGD with lr 0.02 and weight decay 1e-4 for the text transformer.
However, in your configuration ViT-B/32 is also optimized by SGD instead of fused AdamW_SGD. So which optimizer did you actually use in the experiments?
And if you did use the fused AdamW_SGD optimizer as stated in the paper, why? CLIP only uses AdamW. Is this beneficial compared to CLIP's setup?
Looking forward to your reply! 😁
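
Not the repo's FusedFP16 implementation, just a minimal sketch of what a split AdamW/SGD setup could look like in plain PyTorch, using the hyperparameters quoted above (the attribute name model.visual is an assumption):

import torch

def build_adamw_sgd(model):
    # AdamW for the vision encoder, SGD for everything else (text transformer etc.)
    vision_params = list(model.visual.parameters())
    other_params = [p for n, p in model.named_parameters() if not n.startswith('visual')]
    opt_vision = torch.optim.AdamW(vision_params, lr=1e-3, weight_decay=0.05)
    opt_text = torch.optim.SGD(other_params, lr=0.02, weight_decay=1e-4)
    return opt_vision, opt_text

# In the training loop: zero_grad() on both, one backward(), then step() on both.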

Mismatched YFCC15M URL set compared to OpenAI's subset

Hey,

Thanks for the great work.

I noticed that the 15M subset of YFCC you use is significantly different from the subset that OpenAI and the Quality not Quantity paper use. To compare the proportion of matching samples, I did a quick test; the overall stats for the three datasets are:

declip json: 15,388,848 samples
quality-not-quantity csv: 14,825,236 samples
open-ai csv: 14,829,396 samples

The difference between the quality-not-quantity and openai csvs can simply be attributed to link rot.

Further, when I take an intersection between your photo-ids and the photo-ids used by OpenAI / Quality not Quantity:

declip intersection with openai: 6,642,077 matches
declip intersection with quality-not-quantity: 6,640,264 matches

It is interesting that there are still so many matches (~40%). I just wanted to add this information here since I found it quite hard to figure out the exact differences and intersections between the different YFCC subsets. So in case people are trying to use YFCC15M subsets based on the OpenAI-CLIP subset, it is useful to keep in mind that the DeCLIP subset is substantially different. This is also mentioned in the DeCLIP paper, appendix F and table 8.
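
For anyone who wants to repeat the check, here is a minimal sketch; the file names and field/column names are assumptions, and extracting the photo id depends on how the DeCLIP json encodes it:

import json
import pandas as pd

with open('declip_yfcc15m.json') as f:              # hypothetical DeCLIP label file name
    declip = json.load(f)
declip_ids = {item['filename'].split('.')[0] for item in declip}  # assumes filename encodes the photo id

openai = pd.read_csv('yfcc15m_openai.csv')          # hypothetical csv with a 'photoid' column
openai_ids = set(openai['photoid'].astype(str))

print(len(declip_ids), len(openai_ids), len(declip_ids & openai_ids))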

What is AllGather for? Why use AllGather?

if self.training and self.use_allgather or all_gather:
    # Gather image/text features from every distributed rank so that each GPU computes
    # contrastive logits against the full global batch of negatives, not just its local
    # mini-batch.
    gathered_image_features = self.all_gather(image_features)
    gathered_text_features = self.all_gather(text_features)
    logits_per_image = logit_scale * image_features @ gathered_text_features.t()
    logits_per_text = logit_scale * text_features @ gathered_image_features.t()
else:
    # Single-process (or evaluation) path: contrast only within the local batch.
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logit_scale * text_features @ image_features.t()
return logits_per_image, logits_per_text

weights of models

It seems that the URL links for downloading the model weights don't work.

Filip & DeFILIP? :)

Hi, I just realized that you have FILIP and DeFILIP implementations, very interesting.

Do you already have results on how they compare with CLIP and DeCLIP, with respect to benchmark scores and compute efficiency? :)

May I also ask about the details of the hardware you used for training?

… And, do you have code for multi-node training?

How long did it take to train at YFCC15M-V2?

Hello, I see that you used 32 A100s. May I ask how much memory each A100 has, 80 GB?
If we use 8 V100s with 32 GB each, how long would it take to complete YFCC-15M training?

train

In fact, after reading the paper I still don't understand whether this code trains from scratch or fine-tunes CLIP with some parameters frozen. Sorry, I have just started learning deep learning; could some kind person tell me the answer?

worked (simple) example of loading model and transforms?

Thank you for this exciting repository. Can you provide a simple example of how I might be able to load the models you provide in your model zoo?

Something along the lines of what is provided by the timm (pytorch-image-models) model repository:

import timm

model_name = 'ghostnet_100'
model = timm.create_model(model_name, pretrained=True)
model.eval()

from timm.data.transforms_factory import create_transform
from timm.data import resolve_data_config

config = resolve_data_config({}, model=model)  # pass the model so its default config is used
transform = create_transform(**config)

Ideally, this would allow us to use the models in a jupyter notebook or other interactive context.

Thanks in advance!
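
For anyone else looking for this, here is a rough sketch of what a timm-style loading flow could look like for this repo. It is not an official API: the no-argument constructor call, the checkpoint layout ('model' key, 'module.' prefix), the file name, and the CLIP-style preprocessing values are all assumptions; only the declip_vitb32 entry point itself appears in the repo's import paths.

import torch
from torchvision import transforms
from prototype.model import declip_vitb32

model = declip_vitb32()                            # real kwargs come from the experiment yaml
ckpt = torch.load('declip_88m_vitb32.pth.tar', map_location='cpu')  # hypothetical file name
state_dict = {k.replace('module.', ''): v for k, v in ckpt.get('model', ckpt).items()}
model.load_state_dict(state_dict, strict=False)
model.eval()

# Preprocessing is assumed to follow CLIP; check the experiment yaml for the real transforms.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])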

Filter YFCC data

Hi, thanks for the great work. After downloading the provided YFCC15M label file, I can see there are three keys (caption, filename, url) in each label. How should we find the corresponding YFCC image from your labels? i.e., which key should we use to align with the YFCC data?

KeyError: 'SLURM_PROCID'

I use the following command to run zero-shot evaluation:
python -m prototype.solver.clip_solver --config ./experiments/declip_experiments/declip88m/declip88m_r50_declip/config.yaml --evaluate
and it reports this error:
import FusedFP16SGD failed, FusedFP16AdamW replace slurm
Traceback (most recent call last):
  File "/opt/conda/envs/openmmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/openmmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/apdcephfs/share_1227775/mingzhenzhu/DeCLIP/prototype/solver/clip_solver.py", line 769, in <module>
    main()
  File "/apdcephfs/share_1227775/mingzhenzhu/DeCLIP/prototype/utils/dist.py", line 11, in wrapper
    dist_init()
  File "/apdcephfs/share_1227775/mingzhenzhu/DeCLIP/prototype/utils/dist.py", line 21, in dist_init
    proc_id = int(os.environ['SLURM_PROCID'])
  File "/opt/conda/envs/openmmlab/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'SLURM_PROCID'
How to fix it? Thanks!
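
One possible workaround, offered as an assumption rather than an official fix: dist_init reads SLURM environment variables, so a single-process, non-SLURM run needs them defined before dist_init is called (the shell equivalent is export SLURM_PROCID=0 and so on):

import os

# Provide the values prototype.utils.dist.dist_init expects when no SLURM scheduler is present.
os.environ.setdefault('SLURM_PROCID', '0')
os.environ.setdefault('SLURM_NTASKS', '1')
os.environ.setdefault('SLURM_NODELIST', 'localhost')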

Different results in 《Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision》 and 《SUPERVISION EXISTS EVERYWHERE: A DATA EFFICIENT CONTRASTIVE LANGUAGE-IMAGE PRE-TRAINING PARADIGM》

In the paper 《SUPERVISION EXISTS EVERYWHERE: A DATA EFFICIENT CONTRASTIVE LANGUAGE-IMAGE PRE-TRAINING PARADIGM》, CLIP and DeCLIP trained on the YFCC_V2 dataset reach 31.3 and 41.9 zero-shot ImageNet accuracy, but 《Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision》 reports 37.3 and 44.4. What is the difference between them?

how to install dataflow==1.2.1

How can I install dataflow==1.2.1?
When I install dataflow==1.2.1 with pip, I get this error:

ERROR: Could not find a version that satisfies the requirement dataflow==1.2.1 (from versions: 0.1.1)
ERROR: No matching distribution found for dataflow==1.2.1

lacking module

Hi, when I run run.sh, I get these errors:

  1. ModuleNotFoundError: No module named 'springvision'
  2. KeyError: 'SLURM_PROCID'
  3. KeyError: 'SLURM_NTASKS'
  4. KeyError: 'SLURM_NODELIST'

Could you tell me how to fix them?
