
research-ms-loss's People

Contributors

chammika-become, dependabot[bot], kktsubota, mscottml


research-ms-loss's Issues

Cars-196 experiment settings

I cannot find any experiment settings for Cars-196; could you also share the yaml config?

When I use the CUB-200 yaml config to train on Cars-196 (following the paper's training split of 98 classes), I only get a best R@1 of 78.4% (with embedding size 512), much lower than the 84.1% reported in your paper.

Thanks,
Adam

About the multi-label task; I would appreciate your reply.

Hello, I found that this code only runs on single-label tasks. I want to use MS loss for an image retrieval task, but when I deal with a multi-label dataset I am not sure how to proceed.

pos_pair_ = sim_mat[i][labels == labels[i]]
pos_pair_ = pos_pair_[pos_pair_ < 1 - ep]
neg_pair_ = sim_mat[i][labels != labels[i]]

This is the original code. My first thought was to change the part "labels == labels[i]" for the multi-label case: I feed in one-hot labels and build the mask with (mask = labels @ labels.t() > 0),
but then the loss easily becomes INF or NaN.
I suspect the problem is on my side, so I'm asking you for advice.
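
Below is a minimal sketch of the masking I tried (the function and variable names are my own, and the normalization step is my assumption, not part of the repository code):

import torch
import torch.nn.functional as F

def multilabel_pairs(feats, onehot_labels, epsilon=1e-5):
    # L2-normalize so similarities stay in [-1, 1]
    feats = F.normalize(feats, dim=1)
    sim_mat = feats @ feats.t()
    # two samples form a positive pair if they share at least one label
    pos_mask = (onehot_labels.float() @ onehot_labels.float().t()) > 0
    for i in range(feats.size(0)):
        pos_pair_ = sim_mat[i][pos_mask[i]]
        pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]  # drop the self pair, as in the original code
        neg_pair_ = sim_mat[i][~pos_mask[i]]
        yield pos_pair_, neg_pair_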

Models underfit on highly imbalanced dataset

I trained models with the same loss settings as mentioned in the paper (alpha=2 and beta=50). It seems the models cannot produce good enough embedding features for the minority class (judging from t-SNE visualization), though they obviously do a better job on the majority class, which leads to poor classification results. I'd like some advice on how to adjust the hyperparameters or the mining setting of this MS loss to better handle a highly imbalanced dataset (say class 0 has 10x more samples than class 1). For additional detail, I used an embedding size of 512 and a batch size of 8 (the maximum capacity of my GPU, because the images are quite large).

Thanks in advance.

Pytorch implementation of ms-loss

import torch
import torch.nn as nn


class MultiSimilarityLoss(nn.Module):
    def __init__(self, configer=None):
        super(MultiSimilarityLoss, self).__init__()
        self.is_norm = True   # L2-normalize embeddings before computing similarities
        self.eps = 0.1        # mining margin
        self.lamb = 1         # similarity threshold (lambda in the paper)
        self.alpha = 2        # scale for positive pairs
        self.beta = 50        # scale for negative pairs

    def forward(self, inputs, targets):
        n = inputs.size(0)
        if self.is_norm:
            inputs = inputs / torch.norm(inputs, dim=1, keepdim=True)
        similarity_matrix = inputs.matmul(inputs.t())
        # mask[i][j] is True when samples i and j share the same label
        mask = targets.expand(n, n).eq(targets.expand(n, n).t())

        loss = None
        for i in range(n):
            temp_sim, temp_mask = similarity_matrix[i], mask[i]
            min_ap, max_an = temp_sim[temp_mask].min(), temp_sim[temp_mask == 0].max()
            # pair mining: keep positives below max_an + eps and negatives above min_ap - eps
            temp_AP = temp_sim[(temp_mask == 1) & (temp_sim < max_an + self.eps)]  # may be tensor([])
            temp_AN = temp_sim[(temp_mask == 0) & (temp_sim > min_ap - self.eps)]  # torch.sum(tensor([])) = tensor(0.)
            L1 = torch.log(1 + torch.sum(torch.exp(-self.alpha * (temp_AP - self.lamb)))) / self.alpha
            L2 = torch.log(1 + torch.sum(torch.exp(self.beta * (temp_AN - self.lamb)))) / self.beta
            L = L1 + L2
            loss = L if loss is None else loss + L
        loss /= n

        return loss
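
For reference, a minimal usage sketch (the embedding size, batch size, and labels below are my own assumptions, not from the repository):

criterion = MultiSimilarityLoss()
embeddings = torch.randn(8, 512, requires_grad=True)  # hypothetical batch of 8 embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])       # two samples per class
loss = criterion(embeddings, labels)
loss.backward()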

How are PIXEL_MEAN and PIXEL_STD calculated?

First of all thank you for the great work.

In ret_benchmark/data/transforms/build.py, you have:
.......
normalize_transform = T.Normalize(mean=cfg.INPUT.PIXEL_MEAN,
                                  std=cfg.INPUT.PIXEL_STD)
if is_train:
    transform = T.Compose([
        T.Resize(size=cfg.INPUT.ORIGIN_SIZE),
        T.RandomResizedCrop(
            scale=cfg.INPUT.CROP_SCALE,
            size=cfg.INPUT.CROP_SIZE
        ),
        T.RandomHorizontalFlip(p=cfg.INPUT.FLIP_PROB),
        T.ToTensor(),
        normalize_transform,
    ])
............

I wonder how PIXEL_MEAN and PIXEL_STD are calculated. Are they computed after Resize(), RandomResizedCrop(), RandomHorizontalFlip(), and ToTensor()? Or are they computed by applying only ToTensor() (which converts a PIL image from [0, 255] to [0, 1]) to all the images in the dataset?
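
To make the second option concrete, here is a minimal sketch of what I mean (the dataset path and the use of ImageFolder are my own assumptions): per-channel mean and std accumulated after applying only ToTensor():

import torch
from torchvision import transforms as T
from torchvision.datasets import ImageFolder

# Hypothetical dataset location; only ToTensor() is applied, so pixel values are in [0, 1].
dataset = ImageFolder("path/to/images", transform=T.ToTensor())

channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
n_pixels = 0
for image, _ in dataset:
    # image has shape (3, H, W); accumulate sums over the spatial dimensions
    channel_sum += image.sum(dim=(1, 2))
    channel_sq_sum += (image ** 2).sum(dim=(1, 2))
    n_pixels += image.size(1) * image.size(2)

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
print(mean, std)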

pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon] ?

Hi there, thanks for sharing the code and the beautiful work!
In multi_similarity_loss.py, line 35:
pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]
Why do we need this line?
And what is the logic of using the output of average pooling as the embedding of the network?

train.txt and test.txt

First of all, thank you for the great work!

Could you upload the train.txt and test.txt files for training on the CUB-200-2011 dataset?

Can this be modified into a binary classification setting?

I want to adapt this for anomaly detection. Although there are 6 different products, I only split them into two classes: OK and Anomaly.
I wrote my own dataloader so that each batch contains only the two classes of a single product, with OK and Anomaly evenly distributed (a sketch of that batch construction is shown after the code below). The OK label is 1 and the Anomaly label is 0.
I modified the loss as shown below, but during training pos_pair_ quickly becomes 0. Did I change something incorrectly?
One more question: would the XBM library be a better fit for my situation?
Many thanks!!!

@LOSS.register('ms_loss')
class MultiSimilarityLoss(nn.Module):
    def __init__(self, cfg):
        super(MultiSimilarityLoss, self).__init__()
        self.thresh = 0.5
        self.margin = 0.1
        self.scale_pos = cfg.LOSSES.MULTI_SIMILARITY_LOSS.SCALE_POS
        self.scale_neg = cfg.LOSSES.MULTI_SIMILARITY_LOSS.SCALE_NEG

    def forward(self, feats, labels):
        assert feats.size(0) == labels.size(0), \
            f"feats.size(0): {feats.size(0)} is not equal to labels.size(0): {labels.size(0)}"
        batch_size = feats.size(0)
        sim_mat = torch.matmul(feats, torch.t(feats))

        epsilon = 1e-5
        loss = list()

        for i in range(batch_size):
            pos_pair_ = sim_mat[i][labels == 1]  # modified here
            pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]
            neg_pair_ = sim_mat[i][labels == 0]  # modified here

            neg_pair = neg_pair_[neg_pair_ + self.margin > min(pos_pair_)]
            pos_pair = pos_pair_[pos_pair_ - self.margin < max(neg_pair_)]

            if len(neg_pair) < 1 or len(pos_pair) < 1:
                continue

            # weighting step
            pos_loss = 1.0 / self.scale_pos * torch.log(
                1 + torch.sum(torch.exp(-self.scale_pos * (pos_pair - self.thresh))))
            neg_loss = 1.0 / self.scale_neg * torch.log(
                1 + torch.sum(torch.exp(self.scale_neg * (neg_pair - self.thresh))))
            loss.append(pos_loss + neg_loss)

        if len(loss) == 0:
            return torch.zeros([], requires_grad=True)

        loss = sum(loss) / batch_size
        return loss
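
For context, here is a minimal sketch of the batch construction described above (the index structure and class names are my own assumptions, not the actual dataloader):

import random
from torch.utils.data import Sampler

class SingleProductBalancedSampler(Sampler):
    """Each batch is drawn from one product only, half OK (label 1) and half Anomaly (label 0)."""

    def __init__(self, ok_idx_by_product, anomaly_idx_by_product, batch_size, num_batches):
        # hypothetical dicts mapping product id -> list of dataset indices
        self.ok = ok_idx_by_product
        self.anomaly = anomaly_idx_by_product
        self.products = list(self.ok.keys())
        self.half = batch_size // 2
        self.num_batches = num_batches

    def __iter__(self):
        for _ in range(self.num_batches):
            product = random.choice(self.products)
            batch = random.choices(self.ok[product], k=self.half) + \
                    random.choices(self.anomaly[product], k=self.half)
            random.shuffle(batch)
            yield batch

    def __len__(self):
        return self.num_batches

This would be plugged in as DataLoader(dataset, batch_sampler=SingleProductBalancedSampler(...)).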

The _prepare_batch function in RandomIdentitySampler selects only positive pairs when available!!

Hi,
I read your good paper, "Multi-similarity loss with general pair weighting for deep metric learning", and based on its content I cannot understand the logic of the "_prepare_batch" function in "RandomIdentitySampler", because in it you select only positive pairs (when available).
This makes your method train badly!
For convenience, I copy the relevant part below:
for label in self.labels:
    idxs = copy.deepcopy(self.label_index_dict[label])
    # load all data indexes that match this label into idxs
    if len(idxs) < self.K:
        idxs.extend(np.random.choice(idxs, size=self.K - len(idxs), replace=True))

Overfitting on "iteration" parameters?

Hi, the unified framework for all kinds of pair-based losses proposed in the paper is great, but I found that the best "test recall" is actually decided by the val_dataset, as shown in the raw code below:
[screenshot of the relevant code]
According to the figure above, the "val dataset" also plays the role of a "test dataset", which means the "test dataset" is visible during training.
So doesn't this amount to choosing a "best train iteration" parameter, which risks overfitting on training hyperparameters?
(I have found a similar practice in several other papers, and I know the dataset construction lacks a separate test split, such as in the general protocol "construct query+gallery based on the raw val+test split in DeepFashion".)

How to understand the recall@K in your code?

@mscottml
First of all thank you for sharing the code, this is really great work.

I ran the experiment and got good results, but I cannot understand how recall@K is computed in your code. Could you explain it to me? The two lines I mean are marked below.

def recall_k(self, k=1):
    m = len(self.sim_mat)
    match_counter = 0
    for i in range(m):
        pos_sim = self.sim_mat[i][self.gallery_labels == self.query_labels[i]]
        neg_sim = self.sim_mat[i][self.gallery_labels != self.query_labels[i]]
        thresh = np.sort(pos_sim)[-2] if self.is_equal_query else np.max(pos_sim)

        if np.sum(neg_sim > thresh) < k:   # <-- the lines that I cannot understand
            match_counter += 1

    return float(match_counter) / m
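
Here is a toy example I ran while trying to follow that check (the numbers are made up by me):

import numpy as np

pos_sim = np.array([0.9, 0.6])   # similarities to gallery items sharing the query's label
neg_sim = np.array([0.8, 0.3])   # similarities to gallery items with other labels
thresh = np.max(pos_sim)         # 0.9, taking the case where is_equal_query is False

k = 1
print(np.sum(neg_sim > thresh) < k)  # True: no negative similarity exceeds the best positive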

Thank you!

Detailed setting of hyperparameters

The only hyperparameter setting we can refer to is example.yaml, but as you said in your paper, you experimented on more than one dataset. How should the hyperparameters be set for those datasets? Please provide more config files for reference, thanks.

NMI results (or trained models)

Hi,

Do you have the NMI results of your method on the datasets you experimented on in the paper? If not, do you have the trained models?

I am interested in comparing our method with yours on the NMI metric.

Thanks in advance!

Using a for loop in the loss calculation is a little bit slow.

Using a for loop in the loss calculation is a little bit slow.
You could find a way to remove the for loop.
In my case, only the pairs on the diagonal are positive, so I removed the for loop as follows.

# y1, y2: two batches of embeddings; only the diagonal pairs (i, i) are positive.
simi_mat = torch.matmul(y1, torch.t(y2))
simi_sub = simi_mat - ms_gama                           # subtract the similarity threshold
pos_pair_sub = torch.unsqueeze(torch.diag(simi_sub), 1)
neg_pair_sub_plus1 = simi_sub
neg_pair_sub_plus1[range(batch_size), range(batch_size)] = 0  # zeroed diagonal: exp(0) = 1 plays the role of the "+1"
pos_loss = torch.log(1 + torch.sum(torch.exp(-ms_alpha * pos_pair_sub), dim=1)) / ms_alpha
neg_loss = torch.log(torch.sum(torch.exp(ms_beta * neg_pair_sub_plus1), dim=1)) / ms_beta
loss = torch.mean(pos_loss + neg_loss)

Hyperparams for CARS?

First of all thank you for sharing the code, this is really great work.

I am trying to reproduce the results on the other datasets (CARS, SOP, In-Shop), with both ResNet-50 and Inception-BN. Could you please share the hyperparameters that you used? I tried the default ones and some tweaks, but I could not reach the numbers in the paper.

Thank you!

What is the release date of ms-loss?

Dear @mscottml, I have just read the paper "Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning". It is one of the best works I have seen in metric learning.
Would you please release the code and share more details? Thank you very much.
