dividemix's People

Contributors: lijunnan1992

dividemix's Issues

DivideMix without augmentation

First, thank you for sharing your code.
Could you please explain the details of DivideMix without augmentation in the ablation study? Are the unlabeled samples also used without data augmentation?

Evaluation 50 classes WebVision

Hi,

Many thanks for sharing the code of your cool work!

I am trying to evaluate your approach on the ImageNet val set when training with the first 50 classes of WebVision. I wonder how you get the right 50 classes from ImageNet to map to the first 50 classes of WebVision.
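(For what it's worth, a minimal sketch of one way such a mapping could be done, assuming, as the synsets.txt issue below suggests, that WebVision ships a synset list in class order and that the ImageNet val set is organized into per-synset subfolders; the file name and layout are assumptions, not this repo's documented interface.)

import os

# Assumed inputs: WebVision's info/synsets.txt (synsets listed in class order)
# and an ImageNet val directory with one subfolder per synset.
with open('info/synsets.txt') as f:
    first50 = [line.split()[0] for line in f][:50]   # keep only the wnid column

val_root = '/path/to/imagenet/val'                   # placeholder path
val_dirs = [os.path.join(val_root, wnid) for wnid in first50]
print(len(val_dirs), val_dirs[:2])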

Some questions in Figure 2

Hello, I'm reproducing your work. Can you explain how the pdf corresponding to each loss in Figure 2 was drawn, and which GMM function was used?
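(In case it helps other readers: a minimal sketch of fitting a two-component GMM to per-sample losses with scikit-learn and reading off the per-component posterior; whether this matches the paper's exact plotting procedure is an assumption.)

import numpy as np
from sklearn.mixture import GaussianMixture

losses = np.random.rand(50000, 1)           # placeholder for normalized per-sample losses

gmm = GaussianMixture(n_components=2, max_iter=10, tol=1e-2, reg_covar=5e-4)
gmm.fit(losses)
prob = gmm.predict_proba(losses)            # posterior over the two components
clean_prob = prob[:, gmm.means_.argmin()]   # component with the smaller mean loss = "clean"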

Some questions about noise_mode = asym

Hi Junnan,

I really like your paper and am running your code, but I have some questions regarding how you deal with asymmetric noise.

  1. In line 24 of dataloader_cifar.py, did you match similar classes manually? From the CIFAR official website, it seems you match similar classes like cats and dogs, deer and horses, birds and planes (see the sketch after this list). May I ask why you generate asymmetric noise this way?

  2. I didn't find the asymmetric class transition for CIFAR-100 in your code, and it is interesting that you didn't report asymmetric noise accuracy in Table 5 of your paper. So can you tell me how you generate asymmetric noise for CIFAR-100?
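(For reference, a sketch of the kind of class-transition map the first question describes, using the pairing commonly used for asymmetric CIFAR-10 noise: truck to automobile, bird to airplane, deer to horse, cat and dog swapped. Whether it matches this repo's line 24 exactly is an assumption.)

# CIFAR-10 class order: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
#                       5 dog, 6 frog, 7 horse, 8 ship, 9 truck
transition = {0: 0, 1: 1, 2: 0, 3: 5, 4: 7, 5: 3, 6: 6, 7: 7, 8: 8, 9: 1}

def asym_flip(label):
    # Map a clean label to its "similar class" noisy label.
    return transition[label]

print(asym_flip(9))  # truck -> automobile (1)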

Looking forward to your reply!

Ask for hyperparameters for Table 6

Hello, thanks for your excellent work!
I'm trying to reproduce your experiments in Table 6, and I noticed that the hyperparameters (e.g., lambda_u, the threshold tau) are not mentioned in the original paper. Can you share the hyperparameters for Table 6?

a question about Warmup dataloader mode

During the warmup process, the mode is 'all', which means all the training data are used for training, and the training data include both the labeled and the unlabeled data. This may have some influence on the input of the GMM, since the unlabeled data are trained with their (noisy) labels during warmup. Is my understanding right?

ImageNet val 50 classes

Dear Junnan,
I'm from NUS SoC and I want to follow this work. Can you provide synsets.txt for the selection of the 50 ImageNet val classes (as I saw in another issue)? Or can you provide a snippet to choose them? I'd appreciate that very much.

Data set used for module 'dataloader_cifar'

Hi. Thanks for your exciting paper!
I'd like to experiment with your code, but the dataset used by the module 'dataloader_cifar' is missing.
When I 'unpickle' the file, there is no data dict, so an error arises.
I think the CIFAR dataset used in this code should have some proper form, like a 'dictionary'.
Where could I get this dataset?

Lambda_u for CIFAR-100 with 40% asym noise

Hi, I haven't been able to find which hyper-parameters you use to train on CIFAR-100 with 40% asymmetric noise. Can you please tell me?

Thank you!

P.S: Awesome work!

Can you share how you organized the clothing1m dataset?

I have downloaded clothing1m dataset.
The dataset I downloaded has the following structure:

.
├─ clean_test
├─ clean_train
├─ clean_valid
└─ noisy_train

which contains only jpg files, not txt files such as noisy_label_kv.txt.

Can you share the files necessary for running the code?

Hyperparameter setting of GMM

Could you please explain how the GMM parameters should be set? What main factors need to be considered if we transfer the framework to new data?

Could you share the Clothing-1M dataset?

I did my best, but I couldn't find where to get the dataset.
Would you share the dataset you downloaded?
Or please point me to the website where I can get it.

How to specify gpu

I have 4 GPUs on my machine and I want to run your code on GPU 1. My command is:

python train_cifar.py --gpu_ids 1

But it raises an error:

model = model.to(device)
    RuntimeError: CUDA error: invalid device ordinal

My GPUs are working normally. Do you have any idea?
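(One generic workaround, not specific to this repo: mask the visible devices before CUDA is initialized, so that the desired physical GPU shows up as device 0. A minimal sketch:)

import os

# Must be set before the first CUDA call; physical GPU 1 then appears as cuda:0.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(4, 2).to(device)   # placeholder model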

74.48% (reproduced) VS 74.76% (claimed) on Clothing1M ?

First, thank you for sharing your code ~
Did I miss some important details needed to reproduce the result claimed in the paper? Or is there some fluctuation in the final result, with 74.76% being the best result from your experiments on the Clothing1M dataset?

About batch size on CIFAR

The paper says that the batch size for CIFAR is 128, but the code initializes the batch size to 64:

parser.add_argument('--batch_size', default=64, type=int, help='train batchsize')

Since no scripts are provided, I am confused: does 128 refer to two augmented batches of 64? Should I set --batch_size 128 when I train on CIFAR?

about number of data augmentations?

Hi, thanks for sharing this repo!

I have been reading your code together with your paper, and noticed that the paper mentions running M data augmentations for the unlabeled data, while in the dataloader:

img1 = self.transform(img)
img2 = self.transform(img)

there are 2 transformations. I am wondering if you have fixed M=2 here, or did I misunderstand something?
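(For context, a hedged sketch of what a generalization to M draws could look like; the helper name is hypothetical and not part of this repo.)

def augment_m_times(img, transform, m=2):
    # Return m independently sampled augmentations of img; m=2 reproduces
    # the two transform calls quoted above.
    return [transform(img) for _ in range(m)]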

BTW, do you think this method is applicable to regression problems as well?

Thank you!

Effectiveness of the mixup operation

Hi Junnan,

Thanks for your excellent work and code. In the ablation study (Table 5) of your paper, you conducted the experiment "DivideMix w/o augmentation". I would like to know what the augmentation refers to: does it mean the M transformations or the mixup operation? If this augmentation denotes the M transformations, have you ever evaluated the impact of the mixup operation?

Thanks a lot.
Xiaohan

Re-implementation of your result for P-correction

Hi Junnan, so nice of your work!
In your paper, you cited P-correction (Yi & Wu, 2019) and re-implemented it, getting a worse result than its paper reports. I have done the same thing and got an even worse result than your re-implementation.
Could you please share some experience with the re-implementation? Thank you a lot! ^_^

Ask for hyperparameters for experiments in Table 1.

Hi, @LiJunnan1992
Thanks for your excellent work; I am really interested in it. However, I cannot get the results claimed in Table 1 by using the default hyperparameters, as shown in the following table. Could you show the hyperparameter settings for the experiments in Table 1?

CIFAR ResNet-18, default setting (p_threshold=0.5, lambda_u=25, T=0.5, alpha=4):
  cifar10-sym-20%   90.77/91.06
  cifar10-sym-50%   94.87/94.87
  cifar10-sym-80%   92.87/93.05
  cifar10-sym-90%   error

PreAct ResNet-18 (reproduced vs. paper claim):
  cifar10-sym-20%   91.63/92      vs. 95.7/96.1
  cifar10-sym-50%   94.91/94.91   vs. 94.4/94.6
  cifar10-sym-80%   -             vs. 92.9/93.2
  cifar10-sym-90%   50.68/69.84   vs. 75.4/76.0   (overfit)

Question about intuition of fitting loss to GMM

Hello, I am new to the topic of label noise but very interested in your algorithm. I have two questions in mind; I'd appreciate any insights:

  1. Why fit the losses to a GMM instead of something else, such as dimension-reduced learnt representations? Have you experimented with other settings?

  2. Related to the first question: if the loss is used as input to the GMM, how is inference done if the validation set also contains noisy labels? Can we still separate clean/noisy labels without the loss posterior?

Thank you

Parameter setting for cifar10

Hi!
I trained on the cifar10 dataset with 40% asymmetric noise using the default parameter setting, and I got only 83.5% accuracy on the test set.
I noticed the sentence 'We choose λu from {0, 25, 50, 150} using a small validation set.' in your paper. So how should λu be chosen for different noise modes and ratios to get the best accuracy?
Thank you very much!

Some discussions about DivideMix implementation

Hi, this is excellent work! I have read the paper and source code a few times in the past two weeks. They are inspiring, thanks for sharing them! I have two questions about your implementation, would you take a look when possible?

The first question is about co-guessing and label refinement in the train function. Is it safer to use net.eval() and net2.eval() in this block, and then turn net.train() back on before calculating the logits in line 101? I feel both net and net2 are only used to prepare labels in this block, which is just doing evaluation.

DivideMix/Train_cifar.py

Lines 62 to 67 in d9d3058

with torch.no_grad():
    # label co-guessing of unlabeled samples
    outputs_u11 = net(inputs_u)
    outputs_u12 = net(inputs_u2)
    outputs_u21 = net2(inputs_u)
    outputs_u22 = net2(inputs_u2)
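(For concreteness, the pattern the question proposes would look roughly like this; a sketch of the suggestion, not the repo's current code.)

net.eval(); net2.eval()            # freeze batch-norm statistics while guessing labels
with torch.no_grad():
    outputs_u11 = net(inputs_u)
    outputs_u12 = net(inputs_u2)
    outputs_u21 = net2(inputs_u)
    outputs_u22 = net2(inputs_u2)
net.train()                        # restore training mode before the real forward pass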

The second question is about the linear_rampup function. I didn't understand the reason for multiplying lambda_u by the current epoch number current. Could you explain that?

DivideMix/Train_cifar.py

Lines 192 to 194 in d9d3058

def linear_rampup(current, warm_up, rampup_length=16):
    current = np.clip((current-warm_up) / rampup_length, 0.0, 1.0)
    return args.lambda_u*float(current)
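(To make the question concrete: the function reassigns current to the clipped ramp fraction in [0, 1] before the multiplication, so it is not multiplying by the raw epoch number. A self-contained version, with lambda_u passed explicitly instead of read from args, and the values it produces:)

import numpy as np

def linear_rampup(current, warm_up, lambda_u=25.0, rampup_length=16):
    frac = np.clip((current - warm_up) / rampup_length, 0.0, 1.0)
    return lambda_u * float(frac)    # ramps linearly from 0 up to lambda_u

for epoch in (10, 18, 26, 40):
    print(epoch, linear_rampup(epoch, warm_up=10))   # 0.0, 12.5, 25.0, 25.0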

Thank you very much!

Cannot get the correct image of GMM's result in the setting cifar10_asym_0.4

Hi, thanks for your idea and code!
I want to check the loss distribution in the DivideMix pipeline, so I want to plot distributions like the figure in the paper. My plots for cifar10-sym-0.5/0.8 look right (e.g., cifar10-sym-0.5 at epoch 13), but the plot looks wrong in the setting cifar10-asym-0.4 (lambda_u=0, initial learning rate=0.02, batch_size=128, warm_up=10 epochs, p_threshold=0.5), e.g. at epoch 10 and epoch 14 (screenshots omitted).

I don't change DivideMix's implementation, and I use the conf_penalty (noise_mode=asym). My plotting method is to save the noisy indices in dataloader_cifar.py:

noise_label = []
idx = list(range(50000))
random.shuffle(idx)
num_noise = int(self.r*50000)            
noise_idx = idx[:num_noise]
np.save('noiseidx_%s_%.1f.npy'%(noise_mode,r),np.array(noise_idx))

and plot the GMM's result in Train_cifar.py:

import os
import numpy as np
import matplotlib.pyplot as plt

pred1 = (prob1 > args.p_threshold)
pred2 = (prob2 > args.p_threshold)

noisy_idx = np.load('noiseidx_%s_%.1f.npy' % (args.noise_mode, args.r)).tolist()
noisy_set = set(noisy_idx)  # set membership test is O(1)
clean_idx = [i for i in range(50000) if i not in noisy_set]

clean_loss = all_loss[0][-1][clean_idx].numpy()
noisy_loss = all_loss[0][-1][noisy_idx].numpy()

plt.hist(clean_loss, bins=100, density=True, alpha=0.5, histtype='stepfilled', color='lightsteelblue', label='clean')
plt.hist(noisy_loss, bins=100, density=True, alpha=0.5, histtype='stepfilled', color='pink', label='noisy')
plt.title('Epoch %d' % epoch)
plt.legend(loc='upper right')
plt.xlabel('Normalized loss')
plt.ylabel('Empirical pdf')
plt.savefig(os.path.join(GMM_imgs_path, 'epoch_%d.svg' % epoch))
plt.cla()

Can you give me some advice? Thanks~~

A question about the paper

I noticed that you have made a comparison with Joint-opt under the asym setting, but there is no comparison with it under the sym setting. Why?

When r=0.1, the accuracy is 87.89.

Thank you for your code. However, I've met some problems: when I adjust r to 0.1, the accuracy is only 87.89. Which parameters need to be adjusted?

Errors when I ran the cifar-100 experiment

Hi, it's me again.

Love all your work and code :)

There were no problems when I ran clothing1m and cifar10. But when I ran the experiment on Cifar-100 using "python Train_cifar.py --data_path ./dataset/Cifar-100 --gpuid 0 --dataset cifar100", the following error came out:
'''
Warmup Net1
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorMath.cu line=26 error=59 : device-side assert triggered
Traceback (most recent call last):
File "Train_cifar.py", line 256, in
warmup(epoch,net1,optimizer1,warmup_trainloader)
File "Train_cifar.py", line 137, in warmup
L.backward()
File "/home/zhuwang/anaconda2/envs/dividemix/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/zhuwang/anaconda2/envs/dividemix/lib/python3.6/site-packages/torch/autograd/init.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorMath.cu:26
'''

I googled it, and it says that the label is out of range or might contain -1, but I can't figure it out. Has this ever occurred to you? Thank you kindly.

unlabeled set empty during training

Thanks for the nice implementation!

I am trying to run it on my own dataset with slightly imbalanced binary data (about 3:1).
During training, after a few steps, the unlabeled set becomes too small (0 or 1 samples) and it errors out.
Should I increase the p_threshold to encourage more samples in the unlabeled set, or do some other tuning?
Does it work on imbalanced data? I'm not sure if it's learning, because the unlabeled losses are so small the whole time.
This is how the loss goes in the first few steps:

Labeled loss: 0.63  Unlabeled loss: 0.04
Labeled loss: 0.65  Unlabeled loss: 0.03
Labeled loss: 0.60  Unlabeled loss: 0.03
Labeled loss: 0.63  Unlabeled loss: 0.01
Labeled loss: 0.65  Unlabeled loss: 0.02
Labeled loss: 0.30  Unlabeled loss: 0.00
Labeled loss: 0.22  Unlabeled loss: 0.02
Labeled loss: 0.63  Unlabeled loss: 0.03

Thanks so much!

Could DivideMix generalize to segmentation problems?

Thanks for the amazing work!
I see your work mainly uses object classification problems as benchmarks (as most research in this area does). Do you think the framework could be applied to segmentation problems as well?

Question about overfitting

Hi,

Thanks so much for sharing your code and work!
I wonder, have you tried asym noise at a low ratio? I tried some different noise modes, such as mixing asym and sym together, and sometimes the network seems to overfit quickly in the initial epochs of warmup. Do you have any suggestions about modifying the loss or regularization tricks in this condition? Actually, I'm curious and confused about the relation between noise mode and loss distribution. Any suggestions would be highly appreciated!

Best,
Chen

Plot Figure 2

Dear author

Thank you very much for your excellent code. My recent work is also trying to identify noise labels from correct labels.

I'm curious where the loss values (e.g., Fig 2(a)) are output in your code. Is it the value of display_loss in the following code? If not, could you tell me how to calculate it?

def warmup(epoch, net, optimizer, dataloader, args):  # make noise labels in asym and sym ways
    net.train()
    num_iter = (len(dataloader.dataset) // dataloader.batch_size) + 1
    CEloss = nn.CrossEntropyLoss()
    display_loss = []
    for batch_idx, (inputs, labels, path) in enumerate(dataloader):
        inputs, labels = inputs.cuda(), labels.cuda()
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = CEloss(outputs, labels)
        L = loss
        display_loss.append(L)
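(One caveat with this snippet: appending the tensor L keeps its autograd graph alive for the whole epoch; display_loss.append(L.item()) would record just the per-batch scalar, which is presumably what is wanted for plotting.)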

Thanks again for your help. Looking forward to your reply.

Usage on a custom dataset

I intend to use this repo on a custom dataset of fashion images with weak labels (stored row-wise in a CSV file). Can you suggest which files would be the best to edit for this use?

Can you share the log file of training cifar10 with asym noise?

Hi Junnan. I really appreciate your work and it helps me study a lot actually.

However, somehow, I cannot get the same result as you mentioned in your paper.
It says that asym 40% noise on cifar-10 should give 93% accuracy, but what I see is in the middle of the 80s.

Can you share your log file of asym0.4-cifar10? (Screenshot from the middle of the training process omitted.)

Need for kind help in replicating the paper

Hi Li,
I got poor performance on both cifar-10/100 when the noise ratio is higher than 80%, mainly meaning the 80% and 90% settings in Table 1. All goes well with the given hyperparameters in the low-noise conditions. Could you kindly give me some suggestions for this problem I encountered?
More precisely, my results are as follows:
c10 0.8 best:75.7 last:73.5
c10 0.9 best:44.0 last:41.2
c100 0.8 best:52.7 last:50.9
c100 0.9 best:22.6 last:21.4

Hyperparameter setting of GMM

Hello, thanks for your excellent work!
Can two input losses be passed to the GMM's fit()?
For example, gmm.fit(input_loss1, input_loss2)?
I am curious and want to try using two losses to model the GMM.
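(For what it's worth: scikit-learn's GaussianMixture.fit takes a single array of shape (n_samples, n_features), with the second positional argument being an ignored y, so two per-sample losses can be modeled jointly by stacking them as two features. A minimal sketch with placeholder losses:)

import numpy as np
from sklearn.mixture import GaussianMixture

loss1 = np.random.rand(50000)                 # placeholder per-sample losses
loss2 = np.random.rand(50000)
X = np.stack([loss1, loss2], axis=1)          # shape (50000, 2): one row per sample

gmm = GaussianMixture(n_components=2)
gmm.fit(X)
prob = gmm.predict_proba(X)                                # per-sample posterior
clean_prob = prob[:, gmm.means_.sum(axis=1).argmin()]      # component with lower mean losses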

Training on Webvision 1.0

Hi, thanks for your sharing such a cool code!

I am trying to evaluate your approach on WebVision 1.0, and there are two different kinds of datasets available (an original version and a resized version).

Which version did you choose in your paper?

Thanks a lot!

About the accuracy of the asym noise in cifar10

Hello, thanks for your nice work!
I ran the Train_cifar.py code with noise_mode set to asym and r (the noise rate) set to 0.4.
Then I found the highest accuracy is 83.+, differing from the roughly 92.1/93.4 your paper mentions.
Did I make some mistake, or do I need to change some hyperparameters?
Thanks!

....
Epoch:280 Accuracy:81.89
Epoch:281 Accuracy:81.77
Epoch:282 Accuracy:82.54
Epoch:283 Accuracy:82.56
Epoch:284 Accuracy:82.96
Epoch:285 Accuracy:82.65
Epoch:286 Accuracy:82.90
Epoch:287 Accuracy:82.20
Epoch:288 Accuracy:82.63
Epoch:289 Accuracy:82.06
Epoch:290 Accuracy:82.03
Epoch:291 Accuracy:82.68
Epoch:292 Accuracy:82.34
Epoch:293 Accuracy:82.37
Epoch:294 Accuracy:83.15
Epoch:295 Accuracy:82.32
Epoch:296 Accuracy:82.14
Epoch:297 Accuracy:82.18
Epoch:298 Accuracy:82.28
Epoch:299 Accuracy:82.63
Epoch:300 Accuracy:82.62

Can you share the cifar noise file?

Hello, I have repeated the experiments on cifar10 and cifar100, and I find that the accuracy is affected by the noise file when the noise ratio is high (0.8/0.9). I got only 54.6/27.9 on cifar100 with the 0.8/0.9 noise ratios, which is much lower than the 60.2/31.5 claimed in the paper.

So, can you share the noise file?
Thanks:)

Labeled data has a size of 0 after training a few epochs

Hi, thanks for the nice implementation!

I am trying to run it on my own dataset, but the labeled set becomes empty after a few epochs, with the error as follows:

labeled data has a size of 0
ValueError: num_samples should be a positive integer value, but got num_samples=0

What could I do to solve it?
Thanks for your reply!

Actual noise rate regarding symmetric noise

In the case of symmetric noise, it seems to me that some labels that were intended to be corrupted aren't actually corrupted.

Take 50% symmetric noise in CIFAR10 (10 classes) for example.
The code intends to apply noise to 25000 out of 50000 instances, but 10% of the randomly relabeled 25000 samples (since there are 10 classes) will be mapped back to their original labels, resulting in only 22500 noisy labels. Because of this, 50% symmetric noise in CIFAR10 actually ends up at a 45% noise rate (a 49.5% noise rate in CIFAR100).

Replacing the random draw in dataloader_cifar.py with the following loop makes the noise exactly 50%:

# original draw (can re-sample the clean label):
noiselabel = random.randint(0, 9)

# replacement: re-draw until the label actually changes
while True:
    noiselabel = random.randint(0, 9)
    if train_label[i] != noiselabel:
        break
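(An equivalent loop-free draw, my own variant with the same distribution: shift the true label by a nonzero offset modulo the class count, which is uniform over the nine wrong labels for CIFAR10.)

import random

def flip_uniform(label, num_classes=10):
    # Uniform over the num_classes - 1 labels different from label.
    return (label + random.randint(1, num_classes - 1)) % num_classes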

question about penalty

prior = torch.ones(args.num_class)/args.num_class
prior = prior.cuda()
pred_mean = torch.softmax(logits, dim=1).mean(0)
penalty = torch.sum(prior*torch.log(prior/pred_mean))

Since entropy is -sum p*log(p), why not penalty = torch.sum(pred_mean*torch.log(prior/pred_mean))?
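(For reference: the snippet above computes the KL divergence KL(prior || pred_mean), not an entropy, and the variant in the question equals -KL(pred_mean || prior). A minimal check of both forms:)

import torch

prior = torch.ones(10) / 10
pred_mean = torch.softmax(torch.randn(128, 10), dim=1).mean(0)

kl_code = torch.sum(prior * torch.log(prior / pred_mean))       # KL(prior || pred_mean), as in the snippet
variant = torch.sum(pred_mean * torch.log(prior / pred_mean))   # equals -KL(pred_mean || prior)
print(kl_code.item() >= 0, variant.item() <= 0)                 # KL is nonnegative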

Question about the clothing1M dataset.

I notice that there is a clean training set in Clothing1M (clean_train_key_list.txt). It seems to consist of images different from those in the noisy training set.
Does DivideMix use it? Or is it common practice to ignore it in the field of LNL?
