vat_tf's People

Contributors

qinenergy, takerum

vat_tf's Issues

Why not update statistics in BatchNorm layer?

Hi,

I found that in the build_training_graph function in train_semisup.py, update_batch_stats is set to False. In that case, avg_mean and avg_var stay at 0 and 1, so they are never updated for evaluation.

I am not sure whether this is intentional, or whether there is some trick behind this setting.
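For context, freezing the running statistics during some forward passes is a common pattern in VAT-style training, so that auxiliary (perturbed) passes do not pollute the statistics used at evaluation time. Below is a minimal PyTorch-style sketch of one way to do this; it is only an illustration, not the repository's TensorFlow code, and forward_without_updating_bn_stats is a hypothetical helper name.

import torch.nn as nn

def forward_without_updating_bn_stats(model, x):
    # Temporarily freeze the running mean/var of every BatchNorm layer
    # (analogous in spirit to an update_batch_stats=False flag), then restore.
    saved = []
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            saved.append((m, m.momentum))
            m.momentum = 0.0  # running stats stay untouched; batch stats still normalize
    try:
        return model(x)
    finally:
        for m, momentum in saved:
            m.momentum = momentum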

What are the optimal values of vat.xi and vat.epsilon?

In your implementation vat.xi = 1e-6, so when you compute
vat.xi * d  # d is the random unit vector
the result is close to zero, right? Should vat.xi be greater than 1 instead?

I am also confused about the difference between vat.xi and vat.epsilon; could you clarify this?

Thanks in advance.
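For readers with the same confusion, here is a rough sketch of the roles the two hyperparameters usually play in VAT; it is a PyTorch-style illustration under the common reading of the paper, not the repository's code, and kl_div_with_logits / vat_perturbation are hypothetical names with example default values. xi is only a tiny finite-difference step used while estimating the adversarial direction, so it is deliberately close to zero, while epsilon is the norm of the perturbation that is actually added to the input.

import torch
import torch.nn.functional as F

def kl_div_with_logits(p_logits, q_logits):
    # KL( softmax(p) || softmax(q) ), averaged over the batch
    p = F.softmax(p_logits, dim=1)
    return (p * (F.log_softmax(p_logits, dim=1)
                 - F.log_softmax(q_logits, dim=1))).sum(dim=1).mean()

def vat_perturbation(model, x, xi=1e-6, eps=8.0, n_power=1):
    with torch.no_grad():
        logits = model(x)                       # "virtual labels" from the clean input
    d = torch.randn_like(x)                     # random start direction
    for _ in range(n_power):
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)  # tiny step of size xi
        d.requires_grad_(True)
        dist = kl_div_with_logits(logits, model(x + d))
        d = torch.autograd.grad(dist, d)[0]     # power-iteration style update
    return eps * F.normalize(d.flatten(1), dim=1).view_as(x)  # perturbation of norm eps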

I have a question about your paper

In your paper, equation (12) computes Hd, but how does equation (12) turn into equation (13)? Where does the parameter xi go?
I think equation (13) should be d <- \overline{Hd}, i.e. the unit vector of the result of equation (12).
@takerum
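A possible reading of that step (a sketch in the paper's notation, not an authoritative answer): equation (12) approximates Hd by the finite difference (∇_r D(r, x)|_{r=ξd} − ∇_r D(r, x)|_{r=0}) / ξ, where the second term vanishes because r = 0 minimizes D. Equation (13) then sets d <- \overline{Hd}, and since normalization discards any positive scalar factor, the 1/ξ cancels and only the direction of ∇_r D(r, x)|_{r=ξd} remains, which is exactly the unit vector described above.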

Maybe a wrong loss function

Hello Takeru, thanks for your great work.
I think there may be an error in your code, at line 46 of vat.py:
dist = L.kl_divergence_with_logit(logit_p, logit_m)
I believe a negative sign should be added before the KL divergence, because here we want to maximize the distance to obtain the virtual adversarial direction.
Am I right?
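A small self-contained check of the sign question (a toy sketch, not the repository's code): to find the direction that increases a non-negative divergence, one takes the gradient of the divergence itself and steps along it, so no leading minus sign is needed on the KL term; a minus sign would only be needed if the quantity were handed to a minimizer.

import torch

d = torch.randn(4, requires_grad=True)
divergence = (d ** 2).sum()        # stand-in for the (non-negative) KL divergence
g, = torch.autograd.grad(divergence, d)
d_ascent = g / g.norm()            # stepping along +g increases the divergence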

Unable to replicate results in the paper

Hi there

First of all, thanks for this excellent algorithm and accompanying code. I have found both very useful for my master's thesis.

One issue I've been having though is replicating the semi-supervised CIFAR10 results reported in your paper (10.55% error with data augmentation and entropy minimisation). When I run this code with the suggested semi-supervised/entropy minimisation CIFAR10 parameters, the test error I get at the end is usually more in the range of 14%, at best.

Is this the code that you used to produce the results in your paper? If so, would it be possible to get the exact parameter values used to replicate those results?

Kind regards,
Liam Schoneveld

A concern about VAT and dropout

Hi, I'm back again. A question about virtual adversarial training came to mind. With this method, we first feed the unlabeled data through the forward network to get predictions to use as labels, and at the same time feed the perturbed data through it to get logits, then use the KL divergence between the two as the loss for computing r_vadv. But in your code, dropout is applied in the forward network. As far as I know, TensorFlow only reuses variables, not operations such as dropout, so the logits and the labels obtained in the procedure above use different dropout masks; in other words, the computed KL divergence may include noise coming from dropout. What do you think about this? Would it make r_vadv insufficiently accurate?
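One common workaround for this concern, sketched in PyTorch (an illustration of the idea only, not what this repository does; dropout_disabled is a hypothetical helper): switch the Dropout modules to eval mode for both forward passes used in the KL term, so the divergence measures only the effect of the perturbation and not of differing dropout masks.

from contextlib import contextmanager
import torch.nn as nn

@contextmanager
def dropout_disabled(model):
    # Temporarily put every Dropout module in eval mode, then restore its state.
    dropouts = [m for m in model.modules() if isinstance(m, nn.Dropout)]
    modes = [m.training for m in dropouts]
    for m in dropouts:
        m.eval()
    try:
        yield
    finally:
        for m, was_training in zip(dropouts, modes):
            m.train(was_training)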

is_training = True

Hi,

Thanks for this excellent algorithm and accompanying code.
After reading your code, I found one issue, at line 86 of train_semisup.py. Why is is_training set to True during evaluation? Will that influence the final result? What happens if I change it to False?

Best,
Xiang
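For context on what such a flag usually controls, here is a minimal TF1-style sketch (an assumption about the typical pattern, not the exact graph built in train_semisup.py): is_training commonly gates dropout and the batch-normalization mode, so evaluating with it set to False makes batch norm use its running statistics, which ties this question to the BatchNorm issue above.

import tensorflow as tf  # TF1-style API, matching the era of this repository

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
is_training = tf.placeholder(tf.bool, [])

h = tf.layers.conv2d(x, 64, 3, padding='same')
h = tf.layers.batch_normalization(h, training=is_training)  # batch vs. running statistics
h = tf.nn.relu(h)
h = tf.layers.dropout(h, rate=0.5, training=is_training)    # dropout active only when training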

GPU problem

Hi takerum,
Thanks for your work. In your code, it seems that the CPU is used to handle the dataset and the GPU to build the training graph. However, when I ran the program, it could not locate the CPU and GPU devices. Is there something I missed?

Is there a queue runner to start training?

Hi, I have reviewed your code for VAT. One thing I am confused about is how the training graph is actually run. I saw that the inputs are read from TFRecords, and there are a coordinator and tf.train.start_queue_runners in test.py, but I did not find anything like that in train_semisup.py.
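For reference, a minimal, self-contained sketch of the classic TF1 queue-runner pattern (a generic illustration, not a claim about how train_semisup.py is actually driven): queue runners must be started either explicitly with a Coordinator, as below, or implicitly by a managed session such as tf.train.MonitoredTrainingSession or tf.train.Supervisor, which may explain why no explicit call appears in one of the scripts.

import tensorflow as tf  # TF1-style API

values = tf.train.input_producer(tf.constant([1.0, 2.0, 3.0]), shuffle=False)
value = values.dequeue()

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(5):
        print(sess.run(value))   # cycles through 1.0, 2.0, 3.0, 1.0, 2.0
    coord.request_stop()
    coord.join(threads)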

Unstable training in MNIST

Hello.

I have created an implementation of VAT in pytorch and I am facing the following issue:
While the code works for the toy example (two moons dataset) and produces the correct decision boundary, when running on MNIST with 100 labeled samples, training becomes unstable and test accuracy oscillates constantly as training progresses. This issue is mitigated when I increase the number of labeled samples to 300 or more. In that case, training becomes stable and there is noticeable improvement in comparison to the supervised baseline, as expected.

Do you have any intuition as to why the above happens?

For reference, the network for MNIST consists of fully connected layers with batch normalization and dropout. Removing the batch norm layers and/or the dropout doesn't seem to affect the issue.

PyTorch implementation

Hi,

I am trying to implement your code in PyTorch.

I believe I implemented the VAT loss accurately, but I cannot get the same performance, probably because I used a different ConvNet. When I tried to replicate your ConvNet, namely "conv-large", the network did not work at all. Below I am copying my code for conv-large in PyTorch. I would appreciate any feedback on what might be wrong.

Also, in the paper you refer to "Temporal Ensembling for Semi-Supervised Learning" for the network used in the experiments, but they add Gaussian noise in the first layer, while I could not find such noise in your implementation (see the note after the code below).

import torch.nn as nn
import torch.nn.functional as F


class conv_large(nn.Module):
    def __init__(self):
        super(conv_large, self).__init__()

        self.lr = nn.LeakyReLU(0.1)
        self.mp2_2 = nn.MaxPool2d(2, stride=2, padding=0)
        self.drop = nn.Dropout(p=0.5)

        # Note: these BatchNorm2d modules (and some Conv2d modules below) are
        # reused at several points in forward(), so those layers share
        # parameters and running statistics.
        self.bn128 = nn.BatchNorm2d(128, affine=True)
        self.bn256 = nn.BatchNorm2d(256, affine=True)
        self.bn512 = nn.BatchNorm2d(512, affine=True)

        self.conv3_128_3_1 = nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1)

        self.conv128_128_3_1 = nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)
        self.conv128_256_3_1 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.conv256_256_3_1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

        self.conv256_512_3_1 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=0)
        self.conv512_256_1_1 = nn.Conv2d(512, 256, kernel_size=1, stride=1, padding=0)
        self.conv256_128_1_1 = nn.Conv2d(256, 128, kernel_size=1, stride=1, padding=0)

        self.avg = nn.AvgPool2d(6, ceil_mode=True)  # global average pooling
        self.fc = nn.Linear(128, 10)

    def forward(self, x):
        # Block 1: three 3x3 convs at 128 channels, then pool + dropout
        x = self.lr(self.bn128(self.conv3_128_3_1(x)))
        x = self.lr(self.bn128(self.conv128_128_3_1(x)))
        x = self.lr(self.bn128(self.conv128_128_3_1(x)))
        x = self.drop(self.mp2_2(x))

        # Block 2: three 3x3 convs at 256 channels, then pool + dropout
        x = self.lr(self.bn256(self.conv128_256_3_1(x)))
        x = self.lr(self.bn256(self.conv256_256_3_1(x)))
        x = self.lr(self.bn256(self.conv256_256_3_1(x)))
        x = self.drop(self.mp2_2(x))

        # Block 3: 3x3 conv (valid padding), then two 1x1 convs
        x = self.lr(self.bn512(self.conv256_512_3_1(x)))
        x = self.lr(self.bn256(self.conv512_256_1_1(x)))
        x = self.lr(self.bn128(self.conv256_128_1_1(x)))

        # Global average pooling and classifier
        x = self.avg(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x
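On the Gaussian-noise point above: in "Temporal Ensembling for Semi-Supervised Learning" the network adds Gaussian noise at the input of the first layer. A minimal PyTorch sketch of such a layer follows (a hypothetical module for illustration; sigma = 0.15 follows that paper, and this is not a statement about what vat_tf does).

import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    # Additive Gaussian input noise, active only in training mode.
    def __init__(self, sigma=0.15):
        super(GaussianNoise, self).__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training and self.sigma > 0:
            x = x + self.sigma * torch.randn_like(x)
        return x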

VAT for MNIST

Can you please give me some guidance on how to reproduce the results for MNIST?

Cannot reproduce your results

Hi,

Thanks for this excellent algorithm and accompanying code.
After running your code, I only got about 85% evaluation accuracy. After checking the issues of this project, I set is_training=True at line 86 of train_semisup.py and then got 89.00% evaluation accuracy. Could this be an issue in your code?
Best,
Jie
