vat_tf's People

Contributors

qinenergy, takerum

vat_tf's Issues

Why not update statistics in BatchNorm layer?

Hi,

I found that in the build_training_graph function in train_semisup.py, update_batch_stats is set to False. In that case, avg_mean and avg_var stay at 0 and 1, so they are never updated for evaluation.

I am not sure whether this is intentional, or whether there is some trick behind this setting.
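For context, freezing the running statistics during some forward passes is a common pattern in VAT-style training, so that auxiliary (perturbed) passes do not pollute the statistics used at evaluation time. Below is a minimal PyTorch-style sketch of one way to do this; it is only an illustration, not the repository's TensorFlow code, and forward_without_updating_bn_stats is a hypothetical helper name.

import torch.nn as nn

def forward_without_updating_bn_stats(model, x):
    # Temporarily freeze the running mean/var of every BatchNorm layer
    # (analogous in spirit to an update_batch_stats=False flag), then restore.
    saved = []
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            saved.append((m, m.momentum))
            m.momentum = 0.0  # running stats stay untouched; batch stats still normalize
    try:
        return model(x)
    finally:
        for m, momentum in saved:
            m.momentum = momentum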

What are the optimal values of vat.xi and vat.epsilon?

In your implementation vat.xi = 1e-6, so when you compute
vat.xi * d  # d is the random unit vector
the result is close to zero, right? Should vat.xi be greater than 1 instead?

I am also confused about the difference between vat.xi and vat.epsilon; could you clarify this?

Thanks in advance.
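For readers with the same confusion, here is a rough sketch of the roles the two hyperparameters usually play in VAT; it is a PyTorch-style illustration under the common reading of the paper, not the repository's code, and kl_div_with_logits / vat_perturbation are hypothetical names with example default values. xi is only a tiny finite-difference step used while estimating the adversarial direction, so it is deliberately close to zero, while epsilon is the norm of the perturbation that is actually added to the input.

import torch
import torch.nn.functional as F

def kl_div_with_logits(p_logits, q_logits):
    # KL( softmax(p) || softmax(q) ), averaged over the batch
    p = F.softmax(p_logits, dim=1)
    return (p * (F.log_softmax(p_logits, dim=1)
                 - F.log_softmax(q_logits, dim=1))).sum(dim=1).mean()

def vat_perturbation(model, x, xi=1e-6, eps=8.0, n_power=1):
    with torch.no_grad():
        logits = model(x)                       # "virtual labels" from the clean input
    d = torch.randn_like(x)                     # random start direction
    for _ in range(n_power):
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)  # tiny step of size xi
        d.requires_grad_(True)
        dist = kl_div_with_logits(logits, model(x + d))
        d = torch.autograd.grad(dist, d)[0]     # power-iteration style update
    return eps * F.normalize(d.flatten(1), dim=1).view_as(x)  # perturbation of norm eps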

I have a question about your paper

In your paper, equation (12) computes Hd, but how does equation (12) turn into equation (13)? Where does the parameter xi go?
I think equation (13) should be d <- \overline{Hd}, i.e. the unit vector of the result of equation (12).
@takerum
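A possible reading of that step (a sketch in the paper's notation, not an authoritative answer): equation (12) approximates Hd by the finite difference (∇_r D(r, x)|_{r=ξd} − ∇_r D(r, x)|_{r=0}) / ξ, where the second term vanishes because r = 0 minimizes D. Equation (13) then sets d <- \overline{Hd}, and since normalization discards any positive scalar factor, the 1/ξ cancels and only the direction of ∇_r D(r, x)|_{r=ξd} remains, which is exactly the unit vector described above.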

Maybe a wrong loss function

Hello Takeru, thanks for your great work.
I think there may be an error in your code, at line 46 of vat.py:
dist = L.kl_divergence_with_logit(logit_p, logit_m)
I believe a negative sign should be added before the KL divergence, because here we want to maximize the distance to obtain the virtual adversarial direction.
Am I right?
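A small self-contained check of the sign question (a toy sketch, not the repository's code): to find the direction that increases a non-negative divergence, one takes the gradient of the divergence itself and steps along it, so no leading minus sign is needed on the KL term; a minus sign would only be needed if the quantity were handed to a minimizer.

import torch

d = torch.randn(4, requires_grad=True)
divergence = (d ** 2).sum()        # stand-in for the (non-negative) KL divergence
g, = torch.autograd.grad(divergence, d)
d_ascent = g / g.norm()            # stepping along +g increases the divergence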

Unable to replicate results in the paper

Hi there

First of all, thanks for this excellent algorithm and accompanying code. I have found both very useful for my master's thesis.

One issue I've been having though is replicating the semi-supervised CIFAR10 results reported in your paper (10.55% error with data augmentation and entropy minimisation). When I run this code with the suggested semi-supervised/entropy minimisation CIFAR10 parameters, the test error I get at the end is usually more in the range of 14%, at best.

Is this the code that you used to produce the results in your paper? If so, would it be possible to get the exact parameter values used to replicate those results?

Kind regards,
Liam Schoneveld

A concern about VAT and dropout

Hi, I'm back again. A question about virtual adversarial training came to mind. With this method, we first feed the unlabeled data through the forward network to get predictions to use as labels, and at the same time feed the perturbed data through it to get logits, then use the KL divergence between the two as the loss for computing r_vadv. But in your code, dropout is applied in the forward network. As far as I know, TensorFlow only reuses variables, not operations such as dropout, so the logits and the labels obtained in the procedure above use different dropout masks; in other words, the computed KL divergence may include noise coming from dropout. What do you think about this? Would it make r_vadv insufficiently accurate?
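One common workaround for this concern, sketched in PyTorch (an illustration of the idea only, not what this repository does; dropout_disabled is a hypothetical helper): switch the Dropout modules to eval mode for both forward passes used in the KL term, so the divergence measures only the effect of the perturbation and not of differing dropout masks.

from contextlib import contextmanager
import torch.nn as nn

@contextmanager
def dropout_disabled(model):
    # Temporarily put every Dropout module in eval mode, then restore its state.
    dropouts = [m for m in model.modules() if isinstance(m, nn.Dropout)]
    modes = [m.training for m in dropouts]
    for m in dropouts:
        m.eval()
    try:
        yield
    finally:
        for m, was_training in zip(dropouts, modes):
            m.train(was_training)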

is_training = True

Hi,

Thanks for this excellent algorithm and accompanying code.
After reading your code, I found one issue, at line 86 of train_semisup.py. Why is is_training set to True during evaluation? Will that influence the final result? What happens if I change it to False?

Best,
Xiang
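For context on what such a flag usually controls, here is a minimal TF1-style sketch (an assumption about the typical pattern, not the exact graph built in train_semisup.py): is_training commonly gates dropout and the batch-normalization mode, so evaluating with it set to False makes batch norm use its running statistics, which ties this question to the BatchNorm issue above.

import tensorflow as tf  # TF1-style API, matching the era of this repository

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
is_training = tf.placeholder(tf.bool, [])

h = tf.layers.conv2d(x, 64, 3, padding='same')
h = tf.layers.batch_normalization(h, training=is_training)  # batch vs. running statistics
h = tf.nn.relu(h)
h = tf.layers.dropout(h, rate=0.5, training=is_training)    # dropout active only when training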

GPU problem

Hi takerum,
Thanks for your work. In your code, it seems that the CPU is used to handle the dataset and the GPU to build the training graph. However, when I ran the program, it could not locate the CPU and GPU devices. Is there something I missed?

Is there a queue runner to start training?

Hi, I have reviewed your code for VAT. One thing I am confused about is how the training graph is actually run. I saw that the inputs are read from TFRecords, and there are a coordinator and tf.train.start_queue_runners in test.py, but I did not find anything like that in train_semisup.py.
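For reference, a minimal, self-contained sketch of the classic TF1 queue-runner pattern (a generic illustration, not a claim about how train_semisup.py is actually driven): queue runners must be started either explicitly with a Coordinator, as below, or implicitly by a managed session such as tf.train.MonitoredTrainingSession or tf.train.Supervisor, which may explain why no explicit call appears in one of the scripts.

import tensorflow as tf  # TF1-style API

values = tf.train.input_producer(tf.constant([1.0, 2.0, 3.0]), shuffle=False)
value = values.dequeue()

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(5):
        print(sess.run(value))   # cycles through 1.0, 2.0, 3.0, 1.0, 2.0
    coord.request_stop()
    coord.join(threads)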

Unstable training in MNIST

Hello.

I have created an implementation of VAT in pytorch and I am facing the following issue:
While the code works for the toy example (two moons dataset) and produces the correct decision boundary, when running on MNIST with 100 labeled samples, training becomes unstable and test accuracy oscillates constantly as training progresses. This issue is mitigated when I increase the number of labeled samples to 300 or more. In that case, training becomes stable and there is noticeable improvement in comparison to the supervised baseline, as expected.

Do you have any intuition as to why the above happens?

For reference, the network for MNIST consists of fully connected layers with batch normalization and dropout. Removing the batch norm layers and/or the dropout doesn't seem to affect the issue.

PyTorch implementation

Hi,

I am trying to implement your code in PyTorch.

I believe I implemented the VAT loss accurately, but I cannot get the same performance, probably because I used a different ConvNet. When I tried to replicate your ConvNet, namely "conv-large", the network did not work at all. Below I am copying my code for conv-large in PyTorch. I would appreciate any feedback on what might be wrong.

Also, in the paper you refer to "Temporal Ensembling for Semi-Supervised Learning" for the network used in the experiments, but they add Gaussian noise in the first layer, while I could not find such noise in your implementation (see the note after the code below).

import torch.nn as nn
import torch.nn.functional as F


class conv_large(nn.Module):
    def __init__(self):
        super(conv_large, self).__init__()

        self.lr = nn.LeakyReLU(0.1)
        self.mp2_2 = nn.MaxPool2d(2, stride=2, padding=0)
        self.drop = nn.Dropout(p=0.5)

        # Note: these BatchNorm2d modules (and some Conv2d modules below) are
        # reused at several points in forward(), so those layers share
        # parameters and running statistics.
        self.bn128 = nn.BatchNorm2d(128, affine=True)
        self.bn256 = nn.BatchNorm2d(256, affine=True)
        self.bn512 = nn.BatchNorm2d(512, affine=True)

        self.conv3_128_3_1 = nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1)

        self.conv128_128_3_1 = nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)
        self.conv128_256_3_1 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.conv256_256_3_1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

        self.conv256_512_3_1 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=0)
        self.conv512_256_1_1 = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
        self.conv256_128_1_1 = nn.Conv2d(256, 128, kernel_size=1, stride=1, padding=0)

        self.avg = nn.AvgPool2d(6, ceil_mode=True)  # global average pooling
        self.fc = nn.Linear(128, 10)

    def forward(self, x):
        # Block 1: three 3x3 convs at 128 channels, then pool + dropout
        x = self.lr(self.bn128(self.conv3_128_3_1(x)))
        x = self.lr(self.bn128(self.conv128_128_3_1(x)))
        x = self.lr(self.bn128(self.conv128_128_3_1(x)))
        x = self.drop(self.mp2_2(x))

        # Block 2: three 3x3 convs at 256 channels, then pool + dropout
        x = self.lr(self.bn256(self.conv128_256_3_1(x)))
        x = self.lr(self.bn256(self.conv256_256_3_1(x)))
        x = self.lr(self.bn256(self.conv256_256_3_1(x)))
        x = self.drop(self.mp2_2(x))

        # Block 3: 3x3 conv (valid padding), then two 1x1 convs
        x = self.lr(self.bn512(self.conv256_512_3_1(x)))
        x = self.lr(self.bn256(self.conv512_256_1_1(x)))
        x = self.lr(self.bn128(self.conv256_128_1_1(x)))

        # Global average pooling and classifier
        x = self.avg(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x
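On the Gaussian-noise point above: in "Temporal Ensembling for Semi-Supervised Learning" the network adds Gaussian noise at the input of the first layer. A minimal PyTorch sketch of such a layer follows (a hypothetical module for illustration; sigma = 0.15 follows that paper, and this is not a statement about what vat_tf does).

import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    # Additive Gaussian input noise, active only in training mode.
    def __init__(self, sigma=0.15):
        super(GaussianNoise, self).__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training and self.sigma > 0:
            x = x + self.sigma * torch.randn_like(x)
        return x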

VAT for MNIST

Can you please give me some guidance on how to reproduce the results for MNIST?

Cannot reproduce your results

Hi,

Thanks for this excellent algorithm and accompanying code.
After running your code, I only got about 85% evaluation accuracy. After checking the issues of this project, I set is_training=True at line 86 of train_semisup.py and then got 89.00% evaluation accuracy. Could this be an issue in your code?
Best,
Jie
