lijunnan1992 / dividemix Goto Github PK



Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning

License: MIT License

Python 100.00%

dividemix's People


reactivetype xjtushujun templeblock zbpjlc ml-lab xrosliang leesh6796 wang3702 swansealeo yuanwei0908 filiperobotic hiyoung-asr lihuikenny pjb7687 nothingeasy runqing-formost timosachsenberg am94ghiassi chaoso lancege hongxin001 cieusy zhangc2 mikeswf changchunli walter-pixel makototakamatsu013 huizhang0110 nickyfot camelliahmy gxinhu wh-forker masonwang025 nta-byte zhangxuemiao lee-jinhee fuweijie jeremyyang99 haolsun hhhhnwl pexure xhwxd zzx820302704 neonsign247 lisenbuaa ejbejaranosai kongyanlei ckghosted qiuxiaolong857 reihaneh-torkzadehmahani bzp92 sdgumdam jianzhu tks1998 aldakata tyrantyk thucbx99 chasemonsteraway roshankenia zhaoxin94 shwinshaker kangkai98 techthiyanes xyupeng nehaar onlyonewater morales97 lockinlucien7 hhhhhhao jasonshao55 ml-edu jiyang-zheng eternel22 kanelankai htdinh dongyyyyy siaer baekms bighan123 kakou34 steveli88 hyeokreal awj2021

dividemix's Issues

DivideMix without augmentation

First, thank you for sharing your code.
Could you please explain the details of DivideMix without augmentation in the ablation study? Are the unlabeled samples also left without data augmentation?

Evaluation 50 classes WebVision

Hi,

Many thanks for sharing the code of your cool work!

I am trying to evaluate your approach on the ImageNet val set when training with the first 50 classes of WebVision. How do you pick the 50 ImageNet classes that correspond to the first 50 classes of WebVision?

Why is the transform applied twice?

img1 = self.transform(image)
img2 = self.transform(image)
Hello, may I ask what the reason for doing this is?

Some questions in Figure 2

Hello, I'm reproducing your work. Can you explain how the pdf corresponding to each loss in Figure 2 was drawn, and which GMM function was used?
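One way such a plot could be produced, sketched under assumptions: collect the per-sample loss, min-max normalize it, fit a two-component Gaussian mixture, and read the posterior of the lower-mean component as the probability that a sample is clean. The sketch below hand-rolls 1-D EM in plain Python so it runs without dependencies. It is not the repository's code, and the name `fit_two_gaussians`, the initial values, and `min_var` are all hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(losses, n_iter=50, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture to per-sample losses with EM.

    Returns (weights, means, variances, clean_prob), where clean_prob[i] is the
    posterior that sample i belongs to the component with the smaller mean.
    """
    lo, hi = min(losses), max(losses)
    span = (hi - lo) or 1.0
    xs = [(v - lo) / span for v in losses]  # min-max normalize to [0, 1]
    # Hypothetical initialization, chosen only for illustration.
    weights, means, variances = [0.5, 0.5], [0.25, 0.75], [0.05, 0.05]

    def posteriors():
        # E-step: responsibility of each component for each sample.
        out = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300
            out.append((p[0] / total, p[1] / total))
        return out

    for _ in range(n_iter):
        resp = posteriors()
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)  # floor keeps a component from collapsing

    clean = 0 if means[0] <= means[1] else 1
    return weights, means, variances, [r[clean] for r in posteriors()]
```

Overlaying the two weighted component densities on a histogram of the normalized losses would then give a figure of the same kind; whether that matches the paper's exact procedure is not confirmed here.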

some question about when noise_mode = asym

Hi Junnan,

I really like your paper and am running your code, but I have some questions about how you deal with asymmetric noise.

In line 24 of dataloader_cifar.py, did you just match similar classes manually? I checked the official CIFAR website, and it seems you simply pair similar classes such as cats and dogs, deer and horses, birds and planes. May I ask why you generate asymmetric noise this way?
I didn't find the asymmetric class transition for CIFAR-100 in your code, and it is interesting that you didn't report asymmetric noise accuracy in Table 5 of your paper. Can you tell me how you generate asymmetric noise for CIFAR-100?

Looking forward to your reply!
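For readers unfamiliar with this kind of noise, here is a minimal sketch of class-dependent (asymmetric) label flipping on CIFAR-10. The transition map is an assumption based on the pairs named above (cat and dog, deer to horse, bird to airplane) plus truck to automobile, which is a common convention; it is not copied from the repository, and `add_asymmetric_noise` is a hypothetical helper.

```python
import random

# CIFAR-10 class indices: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
# 5 dog, 6 frog, 7 horse, 8 ship, 9 truck.
# Assumed transition map: bird -> airplane, cat <-> dog, deer -> horse,
# truck -> automobile; every other class maps to itself.
TRANSITION = {0: 0, 1: 1, 2: 0, 3: 5, 4: 7, 5: 3, 6: 6, 7: 7, 8: 8, 9: 1}


def add_asymmetric_noise(labels, noise_rate, seed=0):
    """Select a noise_rate fraction of samples and pass their labels through TRANSITION."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    selected = set(idx[: int(noise_rate * len(labels))])
    return [TRANSITION[y] if i in selected else y for i, y in enumerate(labels)]
```

With a map like this, selected samples from classes that map to themselves keep their label, so the fraction of labels that actually change is lower than `noise_rate`.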

Ask for hyperparameter for Table 6

Hello, thanks for your excellent work!
I'm trying to reproduce your experiments in Table 6, and I noticed that the hyperparameters (e.g., lambda_u, threshold tau) are not mentioned in the original paper. Can you share the hyperparameters for Table 6?

a question about Warmup dataloader mode

During warmup the mode is 'all', which means all of the training data is used, including both the labeled data and the unlabeled data. The unlabeled data is therefore trained with its labels during warmup, which may influence the input to the GMM. Is my understanding correct?

ImageNet val 50 classes

Dear Junnan,
I'm from NUS SoC and I want to follow this work. Can you provide synsets.txt for selection of ImageNet val 50 classes (as I saw in another issue)? Or can you provide a snippet to choose them? I'd appreciate that very much.

Data set used for module 'dataloader_cifar'

Hi. Thanks for your exciting paper!
I'd like to run your code, but the dataset used by the module 'dataloader_cifar' is not included.
When the file is unpickled, there is no data dict file, so an error is raised.
I think the CIFAR dataset used in this code must have a specific form, such as a dictionary.
Where can I get this dataset?

Lambda_u for CIFAR-100 on 40 asym noise

Hi, I haven't been able to find which hyper-parameters you use to train on CIFAR-100 with 40% asymmetric noise. Can you please tell me?

Thank you!

P.S: Awesome work!

I have some issues with the CIFAR-N dataset; has anyone tested DivideMix on it?

I couldn't reach the CIFAR10-N accuracy reported on the dataset's official website; my results are much worse. Any recommendations for hyperparameters?

Can you share how you organized clothing1m dataset?

I have downloaded the Clothing1M dataset.
The dataset I downloaded has the following structure:

.
├─clean_test
├─clean_train
├─clean_valid
├─noisy_train

which contains only jpg files, not txt files such as noisy_label_kv.txt.

Can you share these files necessary for running the code?

where is your noise_file?

Hyperparameter setting of GMM

Could you please explain how to consider the GMM parameter setting? What main factors need to be considered if we transfer the framework to new data?

Could you share the Clothing-1M dataset?

I did my best, but I couldn't find where to get the dataset.
Would you share the dataset you downloaded?
Or please tell me the website where I can get it.

How to specify gpu

I have 4 GPUs on my machine, and I want to run your code on GPU 1. My command is:

python train_cifar.py --gpu_ids 1

But it raises an error:

model = model.to(device)
    RuntimeError: CUDA error: invalid device ordinal

My GPU works normally. Do you have any idea?

74.48% (reproduced) VS 74.76% (claimed) on Clothing1M ?

First, thank you for sharing your code ~
Am I missing some important details needed to reproduce the result claimed in the paper? Or is there some fluctuation in the final result, with 74.76% being the best result in your experiments on the Clothing1M dataset?

About batch size on CIFAR

The paper says that the batch size for CIFAR is 128, but the code initializes the batch size to 64:

DivideMix/Train_cifar.py

Line 17 in d9d3058

    
           parser.add_argument('--batch_size', default=64, type=int, help='train batchsize')

Since no scripts are provided, I am confused about whether 128 means two augmented batches of 64. Should I set --batch_size 128 when I train on CIFAR?

about number of data augmentations?

Hi, thanks for sharing this repo!

I have been reading your code together with your paper, and noticed that the paper mentions running M data augmentations for unlabeled data, while in

DivideMix/dataloader_cifar.py

Lines 113 to 114 in d9d3058

    
    img1 = self.transform(img)
    img2 = self.transform(img)

, there are 2 transformations. I am wondering whether you have fixed M=2 here, or did I misunderstand something?

BTW, do you think this method is applicable to regression problems as well?

Thank you!
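Two calls to `self.transform` yield two augmented views per image, which is what M = 2 would look like; whether M is configurable elsewhere is not shown in the quoted lines. As a hedged illustration of how predictions over M views are typically combined in this family of methods, the sketch below averages class probabilities across views and then sharpens them with a temperature. The helper names and the temperature of 0.5 are illustrative assumptions, not the repository's code.

```python
def average_predictions(prob_lists):
    """Average class-probability vectors obtained from M augmented views."""
    m = len(prob_lists)
    num_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / m for c in range(num_classes)]


def sharpen(probs, temperature=0.5):
    """Raise each probability to 1/temperature and renormalize.

    A temperature below 1 makes the distribution more peaked.
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Two views (M = 2) of one unlabeled image, two classes.
views = [[0.6, 0.4], [0.8, 0.2]]
guessed = sharpen(average_predictions(views))
```

Adding more views only changes the length of `views`; the averaging and sharpening steps stay the same.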

about the learning rate schedule

DivideMix/Train_cifar.py

Line 244 in d9d3058

if epoch >= 150:

In this code, it seems that after epoch 150 the learning rate is reduced by a factor of 10 in every epoch. By epoch 160 the learning rate would then be very small (0.02 * 0.1^10).
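Whether the rate keeps shrinking depends on code that is not quoted here. If the learning rate is re-derived from the base value at the start of every epoch, the condition produces a single step down; if the division acts on a stored value, it compounds. The sketch below contrasts the two readings; both functions are hypothetical and only the base rate 0.02 and the epoch 150 threshold come from the issue.

```python
def lr_recomputed(epoch, base_lr=0.02, drop_epoch=150):
    """Learning rate when it is re-derived from base_lr every epoch."""
    lr = base_lr
    if epoch >= drop_epoch:
        lr /= 10
    return lr


def lr_compounded(epoch, base_lr=0.02, drop_epoch=150):
    """Learning rate if the division were applied cumulatively to a stored value."""
    lr = base_lr
    for e in range(epoch + 1):
        if e >= drop_epoch:
            lr /= 10
    return lr
```

Under the first reading the rate stays at 0.002 from epoch 150 onward; under the second it reaches roughly 0.02 * 0.1^10 after ten such epochs.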

Effectiveness of the mixup operation

Hi Junnan,

Thanks for your excellent work and code. In the ablation study (Table 5) of your paper, you conducted the experiment "DivideMix w/o augmentation". I would like to know what the augmentation refers to. Does it mean the M transformations or the mixup operation? If it denotes the M transformations, have you evaluated the impact of the mixup operation?

Thanks a lot.
Xiaohan

Re-implementation of your result about P-correct

Hi Junnan, very nice work!
In your paper, you cite P-correction (Yi & Wu, 2019) and re-implement it, obtaining worse results than those in the original paper. I have done the same thing and get even worse results than your re-implementation.
Could you please share some of your experience with the re-implementation? Thank you a lot! ^_^

Ask for hyperparameters for experiments in Table1.

Hi, @LiJunnan1992
Thanks for your excellent work; I am really interested in it. But I cannot get the results claimed in Table 1 using the default hyperparameters, as shown in the following table. Could you share the hyperparameter settings for the experiments in Table 1?

CIFAR_ResNet18	default setting (p_threshold=0.5, lambda_u=25, T=0.5, alpha=4)
cifar10-sym-20%	90.77/91.06
cifar10-sym-50%	94.87/94.87
cifar10-sym-80%	92.87/93.05
cifar10-sym-90%	error

Pre-ResNet18		paper claim
cifar10-sym-20%	91.63/92	95.7/96.1
cifar10-sym-50%	94.91/94.91	94.4/94.6
cifar10-sym-80%		92.9/93.2
cifar10-sym-90%	50.68/69.84	75.4/76.0	overfit

Question about intuition of fitting loss to GMM

Hello, I am new to the topic of label noise but very interested in your algorithm. I have two questions on which I hope you can provide some insight.

Why fit the loss with a GMM rather than something else, such as dimension-reduced learned representations? Have you experimented with other settings?
Related to the first question: if the loss is the input to the GMM, how is inference done when the validation set also contains noisy labels? Can we still separate clean and noisy labels without the posterior loss?

Thank you

Parameter setting for cifar10

Hi!
I trained on the CIFAR-10 dataset with 40% asymmetric noise using the default parameter settings, and I got only 83.5% accuracy on the test set.
I noticed the sentence 'We choose λu from {0, 25, 50, 150} using a small validation set.' in your paper. How should λu be chosen for different noise modes and ratios to get the best accuracy?
Thank you very much!

Some discussions about DivideMix implementation

Hi, this is excellent work! I have read the paper and source code a few times in the past two weeks. They are inspiring, thanks for sharing them! I have two questions about your implementation, would you take a look when possible?

The first question is about co-guessing and label refinement in the train function. Is it safer to use net.eval() and net2.eval() in this block, then turn on net.train() before calculating the logits in line 101? I feel both net and net2 are used to prepare some labels in this block, which is just doing the evaluation.

DivideMix/Train_cifar.py

Lines 62 to 67 in d9d3058

    
    with torch.no_grad():
        # label co-guessing of unlabeled samples
        outputs_u11 = net(inputs_u)
        outputs_u12 = net(inputs_u2)
        outputs_u21 = net2(inputs_u)
        outputs_u22 = net2(inputs_u2)

The second question is about the linear_rampup function. I don't understand the reason for multiplying lambda_u by the current epoch number, current. Could you explain that?

DivideMix/Train_cifar.py

Lines 192 to 194 in d9d3058

    
    def linear_rampup(current, warm_up, rampup_length=16):
        current = np.clip((current-warm_up) / rampup_length, 0.0, 1.0)
        return args.lambda_u*float(current)

Thank you very much!
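Read literally, the quoted function does not multiply lambda_u by the epoch number itself. It multiplies lambda_u by a fraction that grows linearly from 0 at the end of warmup to 1 after `rampup_length` epochs and is clipped to [0, 1] outside that range. The sketch below is a plain-Python equivalent with lambda_u passed in explicitly; the default values 25 and 10 used for illustration are the lambda_u and warm_up settings mentioned in other issues on this page.

```python
def linear_rampup(current, warm_up, lambda_u=25.0, rampup_length=16):
    """Weight of the unsupervised loss at training position `current` (in epochs)."""
    fraction = (current - warm_up) / rampup_length
    fraction = min(max(fraction, 0.0), 1.0)  # same effect as np.clip(..., 0.0, 1.0)
    return lambda_u * fraction


# Weight at a few epochs, assuming warm_up = 10.
schedule = {epoch: linear_rampup(epoch, warm_up=10) for epoch in (5, 10, 18, 26, 100)}
```

The weight is 0 during warmup, reaches half of lambda_u eight epochs later, and stays at lambda_u once the ramp is complete.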

Cannot get the correct image of GMM's result in the setting cifar10_asym_0.4

Hi, thanks for your idea and code!
I want to check the loss distribution in the DivideMix pipeline, so I want to plot the distribution like the figure in the paper:

I plotted cifar10-sym-0.5/0.8, and it looks right (e.g., cifar10-sym-0.5, epoch 13):

But it looks wrong in the cifar10-asym-0.4 setting (lambda_u=0, initial learning rate=0.02, batch_size=128, warm_up=10 epochs, p_threshold=0.5), e.g., epochs 10 and 14:

I haven't changed DivideMix's implementation, and I use conf_penalty (noise_mode=asym). To plot, I save the noisy indices in dataloader_cifar.py:

noise_label = []
idx = list(range(50000))
random.shuffle(idx)
num_noise = int(self.r*50000)            
noise_idx = idx[:num_noise]
np.save('noiseidx_%s_%.1f.npy'%(noise_mode,r),np.array(noise_idx))

and plot the GMM's result in Train_cifar.py:

pred1 = (prob1 > args.p_threshold)      
pred2 = (prob2 > args.p_threshold)      

all_idx=list(range(50000))
noisy_idx=np.load('noiseidx_%s_%.1f.npy'%(args.noise_mode,args.r)).tolist()
clean_idx=[]
for i in all_idx:
  if i not in noisy_idx:
    clean_idx.append(i)
clean_loss=all_loss[0][-1][clean_idx].numpy()
noisy_loss=all_loss[0][-1][noisy_idx].numpy()

import matplotlib.pyplot as plt
plt.hist(clean_loss, bins=100,density=True,alpha=0.5, histtype='stepfilled',color="lightsteelblue",label='clean')
plt.hist(noisy_loss, bins=100,density=True,alpha=0.5, histtype='stepfilled',color="pink",label='noisy')

plt.title('Epoch %d'%(epoch))
plt.legend(loc='upper right')
plt.xlabel('Normalized loss')
plt.ylabel('Empirical pdf')
svgname='epoch_'+str(epoch)+'.svg'
svg_path=os.path.join(GMM_imgs_path,svgname)
plt.savefig(svg_path)
plt.cla()

Can you give me some advice? Thanks~~

A question about the paper

I noticed that you have made a comparison with Joint-opt under the asym setting, but there is no comparison with it under the sym setting. Why?

When r=0.1, the accuracy is 87.89.

Thank you for your code. However, I have run into a problem. When I set r to 0.1, the accuracy is only 87.89. Which parameters need to be adjusted?

Errors when i ran cifar-100 experiment

Hi, it's me again

Love all your work and code :)

There were no problems when I ran Clothing1M and CIFAR-10. But when I ran the experiment on CIFAR-100 using "python Train_cifar.py --data_path ./dataset/Cifar-100 --gpuid 0 --dataset cifar100", the following error came out:
'''
Warmup Net1
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorMath.cu line=26 error=59 : device-side assert triggered
Traceback (most recent call last):
File "Train_cifar.py", line 256, in <module>
warmup(epoch,net1,optimizer1,warmup_trainloader)
File "Train_cifar.py", line 137, in warmup
L.backward()
File "/home/zhuwang/anaconda2/envs/dividemix/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/zhuwang/anaconda2/envs/dividemix/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorMath.cu:26
'''

I googled it, and it seems that a label is out of range or might be -1, but I can't figure it out. Has this ever happened to you? Thank you kindly.

unlabeled set empty during training

Thanks for the nice implementation!

I am trying to run it on my own dataset, which is slightly imbalanced binary data (about 3:1).
After a few training steps, the unlabeled set becomes too small (0 or 1 samples), and training errors out.
Should I increase p_threshold to encourage more samples in the unlabeled set, or do some other tuning?
Does it work on imbalanced data? I'm not sure it is learning, because the unlabeled losses are so small the whole time.
This is how the loss evolves over the first few steps:

Labeled loss: 0.63  Unlabeled loss: 0.04
Labeled loss: 0.65  Unlabeled loss: 0.03
Labeled loss: 0.60  Unlabeled loss: 0.03
Labeled loss: 0.63  Unlabeled loss: 0.01
Labeled loss: 0.65  Unlabeled loss: 0.02
Labeled loss: 0.30  Unlabeled loss: 0.00
Labeled loss: 0.22  Unlabeled loss: 0.02
Labeled loss: 0.63  Unlabeled loss: 0.03

Thanks so much!

Warmup dataloader batch size double

Hi,
I wonder why the dataloader batch size has to be doubled during warmup. Can I use the normal batch size?

Cheers

Could DivideMix generalize to Segmentation Problem?

Thanks for the amazing work!
I see your work mainly uses object classification as the benchmark (as most research in this area does). Do you think the framework could be applied to segmentation problems as well?

Question about overfitting

Hi,

Thanks so much for sharing your code and work!
I wonder whether you have tried asym noise at a low ratio. I tried some different noise modes, such as mixing asym and sym together, and sometimes the network seems to overfit quickly in the initial epochs of warmup. Do you have any suggestions about modifying the loss or using regularization tricks in this situation? Actually, I'm curious and confused about the relation between noise mode and loss distribution. Any suggestions will be highly appreciated!

Best,
Chen

Plot Figure 2

Dear author

Thank you very much for your excellent code. My recent work also tries to separate noisy labels from correct labels.

I'm curious where your code outputs the loss values (e.g., Fig. 2(a)). Is it the value of <display_loss> in the following code? If not, could you tell me how to calculate it?

def warmup(epoch, net, optimizer, dataloader, args): # make noise labels in asym and sym ways
    net.train()
    num_iter = (len(dataloader.dataset) // dataloader.batch_size) + 1
    CEloss = nn.CrossEntropyLoss()
    display_loss = []
    for batch_idx, (inputs, labels, path) in enumerate(dataloader):
        inputs, labels = inputs.cuda(), labels.cuda()
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = CEloss(outputs, labels)
        L = loss
        display_loss.append(L)

Thanks again for your help. Looking forward to your reply.

Usage on a custom dataset

I intend to use this repo on a custom dataset of fashion images with weak labels (stored row-wise in a CSV file). Can you suggest which files would be best to edit and use?
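As a starting point for this kind of adaptation, a minimal sketch of reading weak labels from a CSV file into (path, class index) pairs that a custom dataset class could consume. The column names `image_path` and `label`, the helper `read_weak_labels`, and the example rows are all hypothetical; nothing here comes from the repository.

```python
import csv
import io


def read_weak_labels(csv_file, path_column="image_path", label_column="label"):
    """Read (image path, weak label) rows and map label names to integer indices."""
    reader = csv.DictReader(csv_file)
    rows = [(row[path_column], row[label_column]) for row in reader]
    classes = sorted({label for _, label in rows})
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [(path, class_to_idx[label]) for path, label in rows]
    return samples, class_to_idx


# In-memory stand-in for a real CSV file opened with open(path, newline="").
example = io.StringIO("image_path,label\na.jpg,shirt\nb.jpg,dress\nc.jpg,shirt\n")
samples, class_to_idx = read_weak_labels(example)
```

The resulting integer labels play the role that the noisy labels play in the CIFAR experiments; the number of classes would have to match the model's output layer.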

Can you share log file of training cifar10 of asym noise?

Hi Junnan. I really appreciate your work; it has helped my studies a lot.

However, I cannot reproduce the result reported in your paper.
It shows that 40% asym noise on CIFAR-10 should give 93% accuracy, but what I see is in the mid-80s.

Can you share your log file for asym0.4-cifar10?

This is an image from the middle of the process.

Need For Kind Help in Paper's Replication

Hi Li,
I got poor performance on both CIFAR-10 and CIFAR-100 when the noise ratio is 80% or higher, i.e., the 80% and 90% settings in Table 1. Everything goes well with the hyperparameters under low noise. Could you kindly give me some suggestions for this problem?
More precisely, my results are as follows:
c10 0.8 best:75.7 last:73.5
c10 0.9 best:44.0 last:41.2
c100 0.8 best:52.7 last:50.9
c100 0.9 best:22.6 last:21.4

Hyperparameter setting of GMM

Hello, thanks for your excellent work!
Can two input losses be passed to the GMM's fit()?
For example, gmm.fit(input_loss1, input_loss2)?
I am curious and want to try modeling the GMM with two losses.

Training on Webvision 1.0

Hi, thanks for sharing such cool code!

I am trying to evaluate your approach on WebVision 1.0, and two versions of the dataset are available (the original version and the resized version).

Which version did you use in your paper?

Thanks a lot!

About the accuracy of the asym noise in cifar10

Hello, thanks for your nice work!
I ran Train_cifar.py with noise_mode set to asym and r (noise rate) set to 0.4.
The highest accuracy I found is 83.+, which differs from the 92.1/93.4 mentioned in your paper.
Did I make a mistake, or do I need to change some hyperparameters?
Thanks~

....
Epoch:280 Accuracy:81.89
Epoch:281 Accuracy:81.77
Epoch:282 Accuracy:82.54
Epoch:283 Accuracy:82.56
Epoch:284 Accuracy:82.96
Epoch:285 Accuracy:82.65
Epoch:286 Accuracy:82.90
Epoch:287 Accuracy:82.20
Epoch:288 Accuracy:82.63
Epoch:289 Accuracy:82.06
Epoch:290 Accuracy:82.03
Epoch:291 Accuracy:82.68
Epoch:292 Accuracy:82.34
Epoch:293 Accuracy:82.37
Epoch:294 Accuracy:83.15
Epoch:295 Accuracy:82.32
Epoch:296 Accuracy:82.14
Epoch:297 Accuracy:82.18
Epoch:298 Accuracy:82.28
Epoch:299 Accuracy:82.63
Epoch:300 Accuracy:82.62

Can you share cifar noise file

Hello, I have repeated the experiments on CIFAR-10 and CIFAR-100, and I find that accuracy is affected by the noise file when the noise ratio is high (0.8/0.9). I got only 54.6/27.9 on CIFAR-100 with 0.8/0.9 noise ratio, which is much lower than claimed in the paper (60.2/31.5).

So, can you share the noise file?
Thanks:)

Labeled data has a size of 0 after training a few epochs

Hi, thanks for the nice implementation!

I am trying to run it on my own dataset, but the labeled data becomes empty after a few epochs, with the following error:

labeled data has a size of 0
ValueError: num_samples should be a positive integer value, but got num_samples=0

What could I do to solve it?
Thanks for your reply!
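One defensive option, sketched under assumptions: check the sizes of the two partitions right after thresholding the clean probabilities, and skip the semi-supervised step for that epoch when either side is too small to build a dataloader. The helper `split_by_clean_probability` and the `min_size` knob are hypothetical; only the comparison against `p_threshold` mirrors the thresholding quoted elsewhere on this page.

```python
def split_by_clean_probability(clean_probs, p_threshold=0.5, min_size=2):
    """Partition sample indices into labeled and unlabeled sets.

    Returns (labeled, unlabeled, usable); usable is False when either set
    has fewer than min_size samples.
    """
    labeled = [i for i, p in enumerate(clean_probs) if p > p_threshold]
    unlabeled = [i for i, p in enumerate(clean_probs) if p <= p_threshold]
    usable = len(labeled) >= min_size and len(unlabeled) >= min_size
    return labeled, unlabeled, usable
```

What to do when `usable` is False (for example, running another warmup-style epoch on all data, or adjusting the threshold) is a design choice that this sketch leaves open.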

the webvision dataset is Resized version or Full resolution version?

Hi Junnan，

I have a question about the webvision dataset :
The WebVision dataset has two versions, resized and full resolution.
Which version is used in the paper?

Best Regards,
Zhenzhou

Can you share how you made noise to the data?

I am quite new to this field and cannot figure out how you added label noise.

Can you share the implementation?

Your new paper: "Towards Noise-resistant Object Detection with Noisy Annotations"

Dear @LiJunnan1992 ,

I am very interested in your research, especially your new paper. Are you going to release its source code so that we can reproduce the results?

At the cifar10 0.2 sym noise rate setting, can't reproduce the results reported in the paper.

With a CIFAR-10 noise rate of 0.2, I can only get about 91.5% accuracy using the code, and I can't find the reason.

Actual noise rate regarding symmetric noise

In case of symmetric noise, it seems to me that some labels that were intended to be corrupted aren't actually corrupted.

Let's take 50% symmetric noise in CIFAR10(10 classes) for example.
The code intends to apply noise to 25000 out of 50000 instances, but 10% of the 25000 randomly relabeled samples (since there are 10 classes) will be mapped back to their original labels, resulting in only 22500 noisily labeled samples. Because of this, 50% symmetric noise on CIFAR10 actually ends up as a 45% noise rate (49.5% on CIFAR100).

Adding the following lines in dataloader_cifar.py yields an exact 50% noise rate.

DivideMix/dataloader_cifar.py

Line 68 in d9d3058

noiselabel = random.randint(0,9)

while True:
    noiselabel = random.randint(0, 9)
    if train_label[i]!=noiselabel: break
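The arithmetic above can be checked with a short sketch: when a replacement label is drawn uniformly from all C classes, a selected sample keeps its original label with probability 1/C, so the expected fraction of changed labels is r * (1 - 1/C). The helper names below are hypothetical; `flip_excluding_true` is an alternative to the retry loop that samples directly from the other classes.

```python
import random


def effective_noise_rate(r, num_classes):
    """Expected fraction of changed labels when replacements may equal the true label."""
    return r * (1.0 - 1.0 / num_classes)


def flip_excluding_true(label, num_classes, rng):
    """Draw a replacement label uniformly from the classes other than `label`."""
    candidates = [c for c in range(num_classes) if c != label]
    return rng.choice(candidates)


rate_cifar10 = effective_noise_rate(0.5, 10)    # about 0.45
rate_cifar100 = effective_noise_rate(0.5, 100)  # about 0.495
```

Which convention a paper uses matters when comparing reported noise rates across methods, since both are called "symmetric noise" in practice.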

question about penalty

prior = torch.ones(args.num_class)/args.num_class
prior = prior.cuda()
pred_mean = torch.softmax(logits, dim=1).mean(0)
penalty = torch.sum(prior*torch.log(prior/pred_mean))

Entropy is p*log(p), so why not penalty = torch.sum(pred_mean*torch.log(prior/pred_mean))?
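The two expressions are different divergences, which a small numeric sketch makes concrete. The quoted code computes KL(prior || pred_mean), which is non-negative and zero only when the mean prediction is uniform. The proposed expression equals -KL(pred_mean || prior), which is the entropy of pred_mean minus log C; it is never positive, so minimizing it as written would reward non-uniform mean predictions. The plain-Python functions below are illustrative stand-ins for the tensor code and assume a uniform prior.

```python
import math


def penalty_as_quoted(pred_mean):
    """sum(prior * log(prior / pred_mean)) with a uniform prior: KL(prior || pred_mean)."""
    prior = 1.0 / len(pred_mean)
    return sum(prior * math.log(prior / q) for q in pred_mean)


def penalty_as_proposed(pred_mean):
    """sum(pred_mean * log(prior / pred_mean)) with a uniform prior: -KL(pred_mean || prior)."""
    prior = 1.0 / len(pred_mean)
    return sum(q * math.log(prior / q) for q in pred_mean if q > 0)


def entropy(pred_mean):
    """Shannon entropy, -sum(q * log q)."""
    return -sum(q * math.log(q) for q in pred_mean if q > 0)


skewed = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]
```

For the skewed vector the quoted penalty is positive and the proposed one is negative; for the uniform vector both are zero.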

Question about the clothing1M dataset.

I notice that there is a clean training set in Clothing1M (clean_train_key_list.txt). It seems to consist of images different from those in the noisy training set.
Does DivideMix use it? Or is it common practice to ignore it in the field of LNL?

