msight-tech / research-ms-loss

MS-Loss: Multi-Similarity Loss for Deep Metric Learning

License: Other

Shell 1.02% Python 98.98%

research-ms-loss's Issues

No function named "build_memory_data"

In the XBM module, I can't seem to find the build_memory_data function.

Unused lr_mul for the backbone?

research-ms-loss/ret_benchmark/solver/build.py

Line 14 in b68507d

params += [{"params": [value], "lr_mul": lr_mul}]

torch.optim has no "lr_mul" option, so will the backbone's learning rate be the same as the embedding layer's?
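As far as I know, torch.optim optimizers keep unrecognized keys such as "lr_mul" inside a parameter group without acting on them, so the multiplier only matters if some other code reads it. Below is a minimal sketch of one way to make the multiplier take effect by writing an explicit "lr" into each group. The helper name `build_param_groups`, the parameter names and the 0.1 multiplier are assumptions for illustration, not the repository's code.

```python
def build_param_groups(named_params, base_lr, backbone_lr_mul=0.1):
    """Return optimizer parameter groups with an explicit per-group "lr".

    named_params: iterable of (name, parameter) pairs, e.g. model.named_parameters().
    backbone_lr_mul: hypothetical multiplier, not a value taken from the repository.
    """
    groups = []
    for name, param in named_params:
        # Parameters whose name contains "backbone" get a scaled learning rate.
        lr_mul = backbone_lr_mul if "backbone" in name else 1.0
        groups.append({"params": [param], "lr": base_lr * lr_mul})
    return groups


# Stand-in objects; with PyTorch these would be nn.Parameter instances.
demo = build_param_groups(
    [("backbone.conv1.weight", object()), ("head.fc.weight", object())],
    base_lr=0.01,
)
```

With a real model, the returned list could be passed as the first argument of an optimizer such as torch.optim.Adam, since a group's own "lr" overrides the optimizer-level default.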

Cars-196 experiment settings

I cannot find any experiment settings for Cars-196. Could you also share the YAML config?

When I use the CUB-200 YAML config to train on Cars-196 (following the paper's training split, 98 classes), I only get a best R@1 of 78.4% (embedding size 512), much lower than the 84.1% in your paper.

Thanks,
Adam

About multi-label tasks: I would appreciate your reply.

Hello, I found that this code only works for single-label tasks. I want to use MS loss for an image retrieval task, and I am not sure how to handle a multi-label dataset.

pos_pair_ = sim_mat[i][labels == labels[i]]
pos_pair_ = pos_pair_[pos_pair_ < 1 - ep]
neg_pair_ = sim_mat[i][labels != labels[i]]

This is the original code. My first idea was to change the "label==label[i]" part into label[i]; before that, I feed in one-hot labels and compute (label = label @ label.t() > 0).
However, the loss easily becomes INF or NaN.
I suspect the problem is on my side, so I am asking for your advice.
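One possible cause of the INF/NaN values (a guess, since the full modified loss is not shown) is that with multi-hot labels some anchors have no positives other than themselves, or no negatives, in the batch. The sketch below uses plain Python lists and hypothetical helper names to build a "shares at least one label" mask and to skip anchors with an empty positive or negative set. In tensor code, row i of `label @ label.t() > 0` would play the role of `labels == labels[i]`.

```python
def share_label_mask(multi_hot):
    """mask[i][j] is True when samples i and j share at least one label."""
    n = len(multi_hot)
    return [
        [any(a and b for a, b in zip(multi_hot[i], multi_hot[j])) for j in range(n)]
        for i in range(n)
    ]


def split_pairs(mask, i):
    """Indices of positives (excluding the anchor itself) and negatives for anchor i."""
    pos = [j for j, same in enumerate(mask[i]) if same and j != i]
    neg = [j for j, same in enumerate(mask[i]) if not same]
    return pos, neg


# Toy multi-hot labels: label sets {0}, {0, 1}, {2}, {1}.
toy_labels = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
mask = share_label_mask(toy_labels)

usable_anchors = []
for i in range(len(toy_labels)):
    pos, neg = split_pairs(mask, i)
    if not pos or not neg:
        continue  # nothing to mine for this anchor, so it contributes no loss
    usable_anchors.append(i)
```

Here sample 2 shares no label with any other sample, so it is skipped rather than fed into a min or max over an empty set.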

Models underfit on highly imbalanced dataset

I trained models with the same loss settings as in the paper: alpha=2 and beta=50. The models do not seem to produce good enough embedding features for the minority class (judging from a t-SNE visualization), while they do an obviously better job for the majority class, which leads to poor classification results. I'd like some advice on how to adjust the hyperparameters or the mining settings of the MS loss to better handle a highly imbalanced dataset (say, class 0 has 10x more samples than class 1). For additional details, I used an embedding size of 512 and a batch size of 8 (the maximum my GPU can handle, because the images are quite large).

Thanks in advance.

PyTorch implementation of ms-loss

import torch
import torch.nn as nn


class MultiSimilarityLoss(nn.Module):
    def __init__(self, configer=None):
        super(MultiSimilarityLoss, self).__init__()
        self.is_norm = True
        self.eps = 0.1
        self.lamb = 1
        self.alpha = 2
        self.beta = 50
        
            
    def forward(self, inputs, targets):
        n = inputs.size(0)
        if self.is_norm:
            inputs = inputs / torch.norm(inputs, dim=1, keepdim=True)
        similari_matrix = inputs.matmul(inputs.t())
        mask = targets.expand(n, n).eq(targets.expand(n, n).t())
        
        loss = None
        for i in range(n):
            temp_sim, temp_mask = similari_matrix[i], mask[i]
            min_ap, max_an = temp_sim[temp_mask].min(), temp_sim[temp_mask==0].max()
            temp_AP = temp_sim[(temp_mask==1) & (temp_sim < max_an + self.eps)]       # may be tensor([])
            temp_AN = temp_sim[(temp_mask==0) & (temp_sim > min_ap - self.eps)]  # torch.sum(tensor([])) = tensor(0.)
            L1 = torch.log(1 + torch.sum(torch.exp(-self.alpha * (temp_AP - self.lamb)))) / self.alpha
            L2 = torch.log(1 + torch.sum(torch.exp(self.beta * (temp_AN - self.lamb)))) / self.beta
            L = L1 + L2
            if loss is None:
                loss = L
            else:
                loss += L
        loss /= n

        return loss

How are PIXEL_MEAN and PIXEL_STD calculated?

First of all, thank you for the great work.

In ret_benchmark/data/transforms/build.py, you have:
...
normalize_transform = T.Normalize(mean=cfg.INPUT.PIXEL_MEAN,
                                  std=cfg.INPUT.PIXEL_STD)
if is_train:
    transform = T.Compose([
        T.Resize(size=cfg.INPUT.ORIGIN_SIZE),
        T.RandomResizedCrop(
            scale=cfg.INPUT.CROP_SCALE,
            size=cfg.INPUT.CROP_SIZE
        ),
        T.RandomHorizontalFlip(p=cfg.INPUT.FLIP_PROB),
        T.ToTensor(),
        normalize_transform,
    ])
...

I wonder how PIXEL_MEAN and PIXEL_STD are calculated. Are they computed after Resize(), RandomResizedCrop(), RandomHorizontalFlip() and ToTensor()? Or are they computed after applying only ToTensor() (which converts a PIL image from [0, 255] to [0, 1]) to all the images in the dataset?
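The thread does not say how the repository's values were obtained. One common convention, assumed here, is to compute per-channel statistics on pixel values scaled to [0, 1] (as ToTensor does), without any random augmentation. A minimal sketch on toy data, with a hypothetical helper name:

```python
import math


def channel_mean_std(images):
    """Per-channel mean and std over all pixels, after scaling 0..255 to 0..1.

    images: list of images, each a list of (r, g, b) pixels with values in 0..255.
    """
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for pixel in image:
            for c in range(3):
                v = pixel[c] / 255.0
                sums[c] += v
                sq_sums[c] += v * v
            count += 1
    mean = [s / count for s in sums]
    # Population std; max(..., 0.0) guards against tiny negative rounding errors.
    std = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(sq_sums, mean)]
    return mean, std


# Two toy "images" of two pixels each.
toy_images = [[(0, 0, 0), (255, 255, 255)], [(255, 0, 0), (0, 255, 0)]]
mean, std = channel_mean_std(toy_images)
```

The resulting lists have the same shape as the mean and std arguments of T.Normalize, which subtracts the mean and divides by the std per channel.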

No module named 'ret_benchmark'

Hi, I tried to run the code, but it showed "No module named 'ret_benchmark'". Why?

pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon] ?

Hi there, thanks for sharing the code and the beautiful work!
In multi_similarity_loss.py, line 35:
pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]
Why do we need this line?
And what is the logic behind using the output of average pooling as the network's embeddings?
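One reading of that filter (an interpretation, not confirmed by the authors in this thread): with L2-normalized embeddings, the anchor's similarity to itself is 1, and `labels == labels[i]` is also true for the anchor itself, so the threshold `1 - epsilon` drops the trivial self-pair (and exact duplicates) from the positives. A small sketch with toy vectors:

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


feats = [l2_normalize(v) for v in ([3.0, 4.0], [4.0, 3.0], [-1.0, 0.0])]
labels = [0, 0, 1]
epsilon = 1e-5

i = 0  # anchor index
# Similarities to every sample with the anchor's label, including the anchor itself.
same_label = [dot(feats[i], f) for f, lab in zip(feats, labels) if lab == labels[i]]
# The filter removes the self-similarity (about 1.0) and keeps the true positive.
pos_pair = [s for s in same_label if s < 1 - epsilon]
```

Before filtering there are two same-label similarities (about 1.0 and 0.96); after filtering only the 0.96 pair with the other sample remains.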

train.txt and test.txt

First of all, thank you for the great work!

Can you upload the train.txt and test.txt files for training on the CUB-200-2011 dataset?

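Can this be modified for two-class classification?

I want to adapt it for anomaly detection. There are 6 different products, but only two classes: OK and Anomaly.
I wrote my own dataloader so that each batch contains only the two classes of samples from a single product, with OK and Anomaly evenly distributed. OK is labeled 1 and Anomaly is labeled 0.
I modified the loss as shown below, but pos_pair_ quickly becomes 0 during training. Did I change something incorrectly?
One more question: would the XBM library be a better fit for my case?
Many thanks!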

@LOSS.register('ms_loss')
class MultiSimilarityLoss(nn.Module):
    def __init__(self, cfg):
        super(MultiSimilarityLoss, self).__init__()
        self.thresh = 0.5
        self.margin = 0.1
        self.scale_pos = cfg.LOSSES.MULTI_SIMILARITY_LOSS.SCALE_POS
        self.scale_neg = cfg.LOSSES.MULTI_SIMILARITY_LOSS.SCALE_NEG

    def forward(self, feats, labels):
        assert feats.size(0) == labels.size(0), \
            f"feats.size(0): {feats.size(0)} is not equal to labels.size(0): {labels.size(0)}"
        batch_size = feats.size(0)
        sim_mat = torch.matmul(feats, torch.t(feats))

        epsilon = 1e-5
        loss = list()

        for i in range(batch_size):
            pos_pair_ = sim_mat[i][labels == 1]  # modified here
            pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]
            neg_pair_ = sim_mat[i][labels == 0]  # modified here

            neg_pair = neg_pair_[neg_pair_ + self.margin > min(pos_pair_)]
            pos_pair = pos_pair_[pos_pair_ - self.margin < max(neg_pair_)]

            if len(neg_pair) < 1 or len(pos_pair) < 1:
                continue

            # weighting step
            pos_loss = 1.0 / self.scale_pos * torch.log(
                1 + torch.sum(torch.exp(-self.scale_pos * (pos_pair - self.thresh))))
            neg_loss = 1.0 / self.scale_neg * torch.log(
                1 + torch.sum(torch.exp(self.scale_neg * (neg_pair - self.thresh))))
            loss.append(pos_loss + neg_loss)

        if len(loss) == 0:
            return torch.zeros([], requires_grad=True)

        loss = sum(loss) / batch_size
        return loss
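One thing that stands out in the modified loop (offered as a possible cause, not a confirmed diagnosis) is that `labels == 1` and `labels == 0` do not depend on the anchor. For an Anomaly anchor (label 0), the "positives" are then OK samples and the "negatives" are other Anomaly samples. The sketch below compares the two masking schemes on a toy batch; the helper names are hypothetical.

```python
def fixed_masks(labels):
    """Masks as in the modified loop: independent of the anchor's own label."""
    pos = [lab == 1 for lab in labels]
    neg = [lab == 0 for lab in labels]
    return pos, neg


def anchor_relative_masks(labels, i):
    """Masks as in the original loop: defined relative to labels[i]."""
    pos = [lab == labels[i] for lab in labels]
    neg = [lab != labels[i] for lab in labels]
    return pos, neg


toy_labels = [1, 1, 0, 0]  # 1 = OK, 0 = Anomaly
anchor = 2                 # an Anomaly sample

fixed_pos, fixed_neg = fixed_masks(toy_labels)
rel_pos, rel_neg = anchor_relative_masks(toy_labels, anchor)
```

For the Anomaly anchor, `fixed_pos` marks the two OK samples as positives, while `rel_pos` marks the two Anomaly samples. Keeping the anchor-relative form works for two classes without any change, because it only compares labels for equality.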

_prepare_batch function in RandomIdentitySampler selects only positive pairs if available!

Hi,
I read your paper "Multi-similarity loss with general pair weighting for deep metric learning". Based on its content, I cannot understand the logic of the "_prepare_batch" function in "RandomIdentitySampler", because it selects only positive pairs (if available).
This would make training with your method worse!
For convenience, I copy the relevant part below:
for label in self.labels:
    idxs = copy.deepcopy(self.label_index_dict[label])
    # load all data indexes that equal to label to idxs
    if len(idxs) < self.K:
        idxs.extend(np.random.choice(idxs, size=self.K - len(idxs), replace=True))
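The quoted loop only shows how a label with fewer than K samples is padded. Assuming the sampler then fills each batch with K samples from each of several different labels (the usual design for this kind of sampler, not verified against the repository here), the batch contains negative pairs as well, namely every pair of samples drawn from different labels. A minimal sketch with a hypothetical helper; unlike the quoted code, it draws with replacement instead of padding:

```python
import random


def sample_batch(label_index_dict, num_classes, k, rng):
    """Pick num_classes labels, then k dataset indices per label.

    Labels with fewer than k samples are drawn with replacement.
    Returns a list of (index, label) pairs.
    """
    batch = []
    for label in rng.sample(sorted(label_index_dict), num_classes):
        idxs = label_index_dict[label]
        if len(idxs) >= k:
            chosen = rng.sample(idxs, k)
        else:
            chosen = [rng.choice(idxs) for _ in range(k)]
        batch.extend((idx, label) for idx in chosen)
    return batch


index = {"a": [0, 1, 2, 3], "b": [4, 5], "c": [6, 7, 8]}
batch = sample_batch(index, num_classes=2, k=3, rng=random.Random(0))
batch_labels = [label for _, label in batch]
```

With two labels and K = 3, each anchor has two same-label partners and three different-label partners in the batch, so both positive and negative pairs are available for mining.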

Overfitting on "iteration" parameters?

Hi, the unified framework for all kinds of pair-based losses proposed in the paper is great. However, I found that the best "test recall" appears to be selected using val_dataset, as in the original code below:

According to the figure above, the "val dataset" actually also plays the role of the "test dataset", which means the "test dataset" is visible during training.
So doesn't this amount to choosing a "best training iteration" parameter, which carries a risk of overfitting the training hyperparameters?
(I have found similar practice in several other papers, and I know that the datasets were built without a separate test set, e.g. the general protocol "construct query+gallery based on the raw val+test split in DeepFashion".)

Where is the ms-loss?

How to understand the recall@K in your code?

@mscottml
First of all thank you for sharing the code, this is really great work.

I ran the experiment and got good results, but I can't understand how recall@K is computed in your code. Can you explain it to me? The two lines I cannot understand are marked with a comment below.

def recall_k(self, k=1):
    m = len(self.sim_mat)
    match_counter = 0
    for i in range(m):
        pos_sim = self.sim_mat[i][self.gallery_labels == self.query_labels[i]]
        neg_sim = self.sim_mat[i][self.gallery_labels != self.query_labels[i]]
        thresh = np.sort(pos_sim)[-2] if self.is_equal_query else np.max(pos_sim)

        # The next two lines are the ones I cannot understand.
        if np.sum(neg_sim > thresh) < k:
            match_counter += 1

    return float(match_counter) / m

Thank you!
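One way to read those two lines: `thresh` is the similarity of the best-matching positive (the second best when the query set equals the gallery, since the best is then the query itself). `np.sum(neg_sim > thresh)` counts the wrong-label items ranked ahead of that positive. If fewer than k are ahead, the best positive falls within the top k results, so the query counts as a hit. A plain-Python sketch of the same rule, assuming the query set differs from the gallery:

```python
def recall_at_k(sim_rows, gallery_labels, query_labels, k=1):
    """Fraction of queries whose best positive ranks within the top k."""
    hits = 0
    for sims, q_label in zip(sim_rows, query_labels):
        pos = [s for s, g in zip(sims, gallery_labels) if g == q_label]
        neg = [s for s, g in zip(sims, gallery_labels) if g != q_label]
        best_pos = max(pos)
        # Number of wrong-label items ranked ahead of the best positive.
        negatives_ahead = sum(1 for s in neg if s > best_pos)
        if negatives_ahead < k:
            hits += 1
    return hits / len(sim_rows)


gallery_labels = ["cat", "dog", "dog", "cat"]
query_labels = ["cat", "dog"]
sim_rows = [
    [0.6, 0.9, 0.7, 0.2],  # best "cat" is 0.6; two "dog" items rank ahead of it
    [0.1, 0.8, 0.3, 0.5],  # best "dog" is 0.8; nothing ranks ahead of it
]
r1 = recall_at_k(sim_rows, gallery_labels, query_labels, k=1)
r3 = recall_at_k(sim_rows, gallery_labels, query_labels, k=3)
```

In this toy case the first query misses at k=1 (two negatives are ahead) but hits at k=3, so recall@1 is 0.5 and recall@3 is 1.0.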

I implemented ms-loss using TensorFlow.

https://github.com/geonm/tf_ms_loss

I uploaded only the ms-loss code.

I'm wondering whether my implementation is correct.

Please comment here if you guys find something wrong.

When will the first release be published?

Detailed hyperparameter settings

The only hyperparameter settings we can refer to are in example.yaml, but according to your paper you ran experiments on more than one dataset. How should the hyperparameters be set for those datasets? Please provide more config files for reference, thanks.

NMI results (or trained models)

Hi,

Do you have the NMI results of your method on the datasets used in the paper? If not, do you have the trained models?

I am interested in comparing our method with yours on the NMI metric.

Thanks in advance!

Loss is not stable and is sometimes NaN

I use ResNet-50 + MS loss to train on my own dataset, but the loss sometimes becomes NaN. It seems that the loss is not very stable.
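The report gives no details, so the cause is unknown. Two candidates worth checking (both guesses) are features that are not L2-normalized, which can make exp(scale * similarity) overflow, and anchors with an empty positive or negative set. For the first, the log(1 + sum(exp(x))) term can be evaluated with the standard max-shift trick. A minimal sketch with a hypothetical helper name:

```python
import math


def log1p_sum_exp(values):
    """Stable log(1 + sum(exp(v) for v in values)).

    The leading 1 is treated as exp(0), then the maximum is factored out
    so that no exponential can overflow.
    """
    terms = [0.0] + list(values)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


small = log1p_sum_exp([0.0])            # log(1 + 1)
large = log1p_sum_exp([1000.0, 999.0])  # a naive math.exp(1000.0) would overflow
```

In PyTorch, torch.logsumexp applies the same idea along a tensor dimension; appending a zero entry to the input reproduces the "1 +" term.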

Using a for loop in the loss calculation is a little slow.

Using a for loop in the loss calculation is a little slow.
It is possible to remove the for loop.
In my case, only the pairs on the diagonal are positive, so I removed the for loop as follows.

simi_mat = torch.matmul(y1, torch.t(y2))
simi_sub = simi_mat - ms_gama
pos_pair_sub = torch.unsqueeze(torch.diag(simi_sub), 1)
neg_pair_sub_plus1 = simi_sub
neg_pair_sub_plus1[range(batch_size), range(batch_size)] = 0
pos_loss = torch.log(1 + torch.sum(torch.exp(-ms_alpha * pos_pair_sub), dim = 1)) / ms_alpha
neg_loss = torch.log(torch.sum(torch.exp(ms_beta * neg_pair_sub_plus1), dim = 1)) / ms_beta
loss = torch.mean(pos_loss + neg_loss)

The link for downloading the dataset can't be opened in the normal way.

Hyperparams for CARS?

First of all thank you for sharing the code, this is really great work.

I am trying to reproduce the results on other datasets (CARS, SOP, In-Shop), with both resnet50 and inception-bn. Could you please share the hyperparameters that you used? I tried the default ones and a few tweaks, but I could not get the numbers in the paper.

Thank you!

What is the release date of ms-loss?

Dear @mscottml, I have just read the paper "Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning". It is one of the best works I have seen in metric learning.
Would you please release the code and share more details? Thank you very much.