huawei-noah / disout
Code for AAAI 2020 paper, Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks (Disout).
License: BSD 3-Clause "New" or "Revised" License
In resnet.py and train.py, weight_behind of Disout is set; however, in the ImageNet training process it is reset to None. Why? According to the paper, the distortion should be guided by the ERC of the next layer, where the next layer's weights are used to approximately compute the gradient of the distortion.
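For reference, this is the wiring I would expect from the paper: each Disout module is pointed at the weights of the layer behind it so the ERC can be estimated from them. The import path, constructor arguments, and hyperparameter values below are my guesses based on the attributes referenced in these issues, not a quote of the released resnet.py.

    import torch.nn as nn
    from disout import Disout  # assumed module path; names guessed from these issues

    conv1 = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
    disout1 = Disout(dist_prob=0.09, block_size=6, alpha=5.0)  # placeholder values
    conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
    # point the Disout that follows conv1 at the weights of the next layer,
    # so the ERC of that layer can guide the distortion
    disout1.weight_behind = conv2.weight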
There is an async parameter in train_imagenet.py, but it is deprecated now (async became a reserved keyword in Python 3.7; PyTorch uses non_blocking instead).
Hi! Thanks for sharing your great work! I have some questions to ask you. Can Disout be used for 3D-CNN? What should I do? Thank you very much!
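Not an official answer, but the block mask looks mechanically straightforward to extend to 3D: the sketch below swaps the 2D max-pool for a 3D one. This is my guess under the DropBlock-style reading of the released 2D code, assuming an odd block_size; the function name is mine.

    import torch
    import torch.nn.functional as F

    def block_mask_3d(x, dist_prob=0.1, block_size=3):
        # x: (N, C, D, H, W); expand random seed voxels into cubic blocks
        gamma = dist_prob / block_size ** 3
        seeds = (torch.rand_like(x) < gamma).float()
        return F.max_pool3d(seeds, kernel_size=block_size,
                            stride=1, padding=block_size // 2)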
So the LinearScheduler for linearly increasing the distortion (similar to DropBlock) cannot work for multi-GPU training, since it uses a plain Python variable i (not a tensor). When we do the following:
def step(self):
    if self.i < len(self.drop_values):
        self.disout.dist_prob = self.drop_values[self.i]
    self.i += 1
The value of i never gets updated. You can try it if you want. My question is: how did you run this code to train ImageNet and get those results?
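For what it's worth, the usual cause is that nn.DataParallel re-replicates the module on every forward pass, so attribute updates made inside a replica are thrown away. A minimal workaround sketch, stepping the scheduler on the original module from the training loop (build_model, loader, and the scheduler attribute are hypothetical placeholders):

    import torch

    net = build_model()                        # placeholder for the actual network
    model = torch.nn.DataParallel(net).cuda()

    for images, targets in loader:
        # step on the original module, outside forward(); the replicas that
        # DataParallel creates this iteration then inherit the new dist_prob
        net.scheduler.step()
        output = model(images.cuda(non_blocking=True))
        # loss / backward / optimizer step as usual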
Dear authors,
Thanks for your great work; it's very useful for training CNNs. Would you mind sharing the training configuration so that I can reproduce the method? Another question: I noticed that DropBlock is trained for 270 epochs, while the default number of training epochs for Disout is 540. Does that make a big difference to the result?
Thanks
Can anyone tell me how to set the hyperparameters of Disout in a recurrent neural network?
Thanks for your great work!
I am very curious about the "Feature Map Distortion" experiments mentioned in the paper (for fully connected layers). The code contains only the block-wise distortion.
Could you provide an example of feature map distortion?
If we scale weight_behind by 10, the network will output the same features because of BN. Will the magnitude of the distortion still stay the same?
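While waiting for an official example of the fully connected case asked above, here is a minimal sketch built around the dist = alpha * 0.01 * var**0.5 * randn line quoted from the released code; the class name, arguments, and the additive combination rule are my assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class DisoutFC(nn.Module):
        # guessed FC variant: instead of zeroing units like dropout, perturb
        # selected units with noise scaled by the feature-map statistics
        def __init__(self, dist_prob=0.1, alpha=1.0):
            super().__init__()
            self.dist_prob = dist_prob
            self.alpha = alpha

        def forward(self, x):                      # x: (batch, features)
            if not self.training or self.dist_prob == 0:
                return x
            mask = (torch.rand_like(x) < self.dist_prob).float()
            var = x.var()                          # scalar variance of the feature map
            dist = self.alpha * 0.01 * var.sqrt() * torch.randn_like(x)
            return x + mask * dist                 # distort selected units, keep the rest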
Hi,
Thanks for your great work!
I am reproducing your first experiment, but even the baseline method I implemented gives results very different from those in the paper. (In your paper, Table 1 says the accuracy of the conventional CNN on CIFAR-10 reaches 81.99%.)
My implementation follows the "Implementation details" in your "Experiments on Fully Connected Layers" section as closely as possible. Maybe I missed some details or tricks (such as padding, momentum, etc.); could you provide more information about this experiment?
Hi, I have some questions about the code.
I rewrote the code in TensorFlow yesterday.
In my opinion, the released code shows that Disout is basically DropBlock with a random value added.
(0) In your code, weight_behind is always set to None.
(1) I am also confused about the ERC after reading the paper. Why use dist = self.alpha * 0.01 * (var ** 0.5) * torch.randn(*x.shape, device=x.device) to get the ERC? Where is this mentioned in your paper?
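To make the "DropBlock plus a random value" reading concrete, here is my sketch of the block-wise case, assuming an odd block_size; the function name and the exact combination rule are my assumptions, not the released implementation.

    import torch
    import torch.nn.functional as F

    def disout_block_sketch(x, dist_prob=0.1, block_size=3, alpha=1.0):
        # x: (N, C, H, W); build a DropBlock-style block mask, then add a
        # random distortion inside the blocks instead of zeroing them
        gamma = dist_prob / block_size ** 2            # seed probability, as in DropBlock
        seeds = (torch.rand_like(x) < gamma).float()
        block_mask = F.max_pool2d(seeds, kernel_size=block_size,
                                  stride=1, padding=block_size // 2)
        dist = alpha * 0.01 * x.var().sqrt() * torch.randn_like(x)
        return x + block_mask * dist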
Dear Huawei-Noah:
Thank you very much for your work. I hope to reproduce the ideas in the paper, but while debugging the code I found that the code for the experiments on fully connected layers is missing. Could you provide the train_cnn.py and train100_cnn.py code? Thank you very much!
https://github.com/huawei-noah/Disout/blob/master/models/resnet.py#L188
The forward pass performs only one step of the distortion update per weight update (see Line 106 in 1c8591f).
But the paper says:
Could anyone explain where this inconsistency comes from?
BTW, there is a typo in the paper: I guess it should be "l+1" rather than "l+l".