The crepe from raivnlab

Possible need to normalize the image and text features while calculating systematicity results.

Hi, thanks for the interesting work and for open-sourcing this repo so the results can be reproduced!

I found that the productivity calculations always normalize the image and text features, for example in crepe_prod_eval_clip.py:

            if one2many:
                image_emb = model.encode_image(images)
                image_emb /= image_emb.norm(dim = -1, keepdim = True)
                
                text_emb = model.encode_text(texts)
                text_emb /= text_emb.norm(dim = -1, keepdim = True)

I was wondering whether the same is required for systematicity as well, since it's common practice when training or running inference with CLIP. Currently I see no normalization in main.py:

            if one2many:
                image_features = model(images, None)
                all_text_features = []
                for text in texts:
                    text_features = model(None, text)
                    all_text_features.append(text_features)
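To illustrate why this matters, here is a minimal sketch of how L2 normalization can change a retrieval rank. It uses plain Python lists instead of tensors, and the vectors, the helper names (`l2_normalize`, `dot`, `rank_of_first`) and the convention that the first text is the ground truth are all assumptions made for the example; none of it is taken from main.py. Without normalization the score is a raw dot product, so a candidate with a large norm can outrank a better-aligned one. With normalization the score becomes cosine similarity.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (list analogue of x / x.norm(dim=-1, keepdim=True))."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_of_first(image, texts, normalize):
    """Return the 1-based rank of texts[0] (assumed ground truth) by similarity to image."""
    if normalize:
        image = l2_normalize(image)
        texts = [l2_normalize(t) for t in texts]
    scores = [dot(image, t) for t in texts]
    # Rank = 1 + number of candidates scoring strictly higher than the ground truth.
    return 1 + sum(s > scores[0] for s in scores[1:])

# Toy embeddings (hypothetical values, chosen only to show the effect).
image = [1.0, 0.0]
texts = [
    [1.0, 0.1],  # ground-truth caption: well aligned with the image, small norm
    [3.0, 3.0],  # negative caption: less aligned, but much larger norm
]

raw_rank = rank_of_first(image, texts, normalize=False)  # dot-product ranking
cos_rank = rank_of_first(image, texts, normalize=True)   # cosine-similarity ranking
print(raw_rank, cos_rank)
```

In this toy case normalization improves the rank, but the direction of the change depends on the embeddings; the point is only that the two scoring schemes need not agree.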

Adding normalization may change the results. I calculated the seen and unseen compounds accuracy using the rn50-quickgelu-cc12m checkpoint. With normalization, I got the following results:

seen = {'image_to_text_mean_rank': 2.238098900253788, 'image_to_text_rank_std': 1.5897088335794651, 'image_to_text_median_rank': 2.0, 'image_to_text_R@1': 0.4813120049219411, 'image_to_text_R@1_std': 0.4996506367853067, 'image_to_text_R@3': 0.8016611551180497, 'image_to_text_R@3_std': 0.39874872726172306, 'image_to_text_R@5': 0.9418211182034915, 'image_to_text_R@5_std': 0.23408139505184175, 'image_to_text_R@10': 1.0, 'image_to_text_R@10_std': 0.0}

unseen = {'image_to_text_mean_rank': 2.291608586562587, 'image_to_text_rank_std': 1.5839114294531287, 'image_to_text_median_rank': 2.0, 'image_to_text_R@1': 0.4549763033175355, 'image_to_text_R@1_std': 0.49796874072279423, 'image_to_text_R@3': 0.7925843323111235, 'image_to_text_R@3_std': 0.4054558033695585, 'image_to_text_R@5': 0.9439643155840536, 'image_to_text_R@5_std': 0.2299906226087988, 'image_to_text_R@10': 1.0, 'image_to_text_R@10_std': 0.0}

Without normalization, the results I get are as follows:

seen = {'image_to_text_mean_rank': 1.999884642005691, 'image_to_text_rank_std': 1.5084804704626469, 'image_to_text_median_rank': 1.0, 'image_to_text_R@1': 0.5763669922325617, 'image_to_text_R@1_std': 0.4941336686538895, 'image_to_text_R@3': 0.8412673998308082, 'image_to_text_R@3_std': 0.3654265477667424, 'image_to_text_R@5': 0.9523571483503807, 'image_to_text_R@5_std': 0.21300941372697987, 'image_to_text_R@10': 1.0, 'image_to_text_R@10_std': 0.0}

unseen = {'image_to_text_mean_rank': 2.0384722609422914, 'image_to_text_rank_std': 1.4923549091268753, 'image_to_text_median_rank': 1.0, 'image_to_text_R@1': 0.5508781711736828, 'image_to_text_R@1_std': 0.49740467599131133, 'image_to_text_R@3': 0.8410928352383608, 'image_to_text_R@3_std': 0.3655894934883338, 'image_to_text_R@5': 0.9559520490660719, 'image_to_text_R@5_std': 0.205201678727174, 'image_to_text_R@10': 1.0, 'image_to_text_R@10_std': 0.0}
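For readers comparing the numbers above, here is a minimal sketch of how metrics with these names can be derived from a list of 1-based ground-truth ranks. The function name `retrieval_metrics` and the use of the population standard deviation are assumptions, not the repo's code. The reported `R@k_std` values are at least consistent with the population standard deviation of a 0/1 hit indicator, sqrt(p(1 - p)): for example, sqrt(0.4813 * 0.5187) is approximately 0.4997.

```python
import statistics

def retrieval_metrics(ranks, ks=(1, 3, 5, 10)):
    """Summarize 1-based ground-truth ranks into mean/median rank and recall@k."""
    metrics = {
        "image_to_text_mean_rank": statistics.mean(ranks),
        "image_to_text_rank_std": statistics.pstdev(ranks),  # population std (assumed)
        "image_to_text_median_rank": statistics.median(ranks),
    }
    for k in ks:
        # A query is a hit at k if its ground truth is ranked within the top k.
        hits = [1.0 if r <= k else 0.0 for r in ranks]
        metrics[f"image_to_text_R@{k}"] = statistics.mean(hits)
        metrics[f"image_to_text_R@{k}_std"] = statistics.pstdev(hits)
    return metrics

# Hypothetical ranks for four queries, used only to exercise the function.
example = retrieval_metrics([1, 2, 1, 4])
print(example)
```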

Thanks again for your work.

Code to generate the hard negatives

Hey, I was wondering if you could provide the code used to generate the hard negatives from this dataset.
