Comments (13)
- As I said in my previous comment, you cannot just delete the batchnorm layer. Before doing that you need to modify the weights of the previous conv layer. In our case we did not get rid of these layers while testing.
- We do not include the Unlabelled class in our confusion matrix. Giving it label 1 made writing the code easier, because Cityscapes has Unlabelled as its first class.
- As mentioned in the paper, we use our own weight calculation scheme, which gave us better results than median frequency balancing.
from enet-training.
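Excluding the Unlabelled class from the confusion matrix can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names and the 1-based labelling convention described above; it is not the repo's actual code:

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Full num_classes x num_classes matrix for 1-based labels 1..num_classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # unbuffered scatter-add: one count per (target, pred) pixel pair
    np.add.at(cm, (target - 1, pred - 1), 1)
    return cm

def labelled_confusion(cm):
    """Drop row and column 0, i.e. the Unlabelled class (label 1)."""
    return cm[1:, 1:]
```

With this convention, pixels whose ground truth is Unlabelled simply never contribute to the reported matrix.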
- Yes, batchnorm and spatial dropout are very important for getting good results, because they force your network to learn more general features.
- We did not change our model for inference on test images. Once you have your trained model, you can of course compute accuracies on the test set the same way they were computed on the train/val sets.
- In the case of the decoder, ReLUs gave better results, most probably because PReLUs have extra parameters and our network was not pretrained on any other, bigger dataset.
- --learningRateDecay is 1e-1, and # samples is there by mistake and has no meaning.
- We do not perform any preprocessing.
Thank you for your reply! May I also confirm that for each dataset you trained the model for a total of 300 epochs, and that you performed decay only every 100 epochs?
Yes that is correct.
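The schedule confirmed above (multiply the learning rate by 1e-1 every 100 epochs, over 300 epochs) is a plain step decay. A one-line Python sketch, with a hypothetical function name; the repo's actual option handling may differ:

```python
def learning_rate(base_lr, epoch, decay=1e-1, step=100):
    """Step schedule: scale base_lr by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)
```

So epochs 0-99 train at the base rate, 100-199 at one tenth of it, and 200-299 at one hundredth.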
Thank you for the confirmation. Could I also know whether you kept dropout and batch norm turned on when evaluating the test data? For many models, I think turning them off is the standard thing to do. On my side, however, I seem to see a large difference in performance when I turn off batch norm and dropout.
Also, could I confirm that the dataset you were using is equivalent to the one found here: https://github.com/alexgkendall/SegNet-Tutorial/tree/master/CamVid
Thank you once again.
- If by turning off you mean deleting batchnorm, then no, you cannot do that. You need to adjust the weights of the previous conv layer before getting rid of the batchnorm layer. Once that is done, I don't think there will be any difference in performance.
- Yes, the dataset used here is equivalent to the one you mentioned in your comment.
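For reference, the adjustment mentioned above amounts to folding the batchnorm statistics into the previous conv layer's weights and bias. Below is a minimal NumPy sketch of the general recipe (my own illustration, not the repo's code): since bn(conv(x)) = gamma * (Wx + b - mean) / sqrt(var + eps) + beta, the conv can absorb the per-channel scale and shift.

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding conv's parameters.

    W has shape (out_channels, ...), b has shape (out_channels,);
    gamma/beta are the BN affine parameters, mean/var its running stats.
    Returns (W', b') such that conv(W', b') == bn(conv(W, b)).
    """
    scale = gamma / np.sqrt(var + eps)
    # scale each output channel of the conv weights
    W_folded = W * scale.reshape(-1, *([1] * (W.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded
```

After folding, the batchnorm layer can be deleted with (numerically) identical outputs, which is why simply removing it without this step changes the results.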
I have tested on the test dataset with dropout and batch_norm activated, and the results seem better than with either batch_norm or dropout turned off (or both). Did you have to turn off dropout when evaluating the test dataset? I see that in many models it's common to disable dropout at test time.
Further, regarding the ordering of the classes in CamVid: I noted that the original dataset gives class labels from 0-11, where 11 is the void class. If the dataset you've used is the one found in the SegNet tutorial as well, did you have to relabel all the segmentations to 1-12 (for Lua's 1-based indexing), since you've put class 1 as void? Is there a particular reason why void is the first class instead of the last?
CamVid Labelling: https://github.com/alexgkendall/SegNet-Tutorial/blob/c922cc4a4fcc7ce279dd998fb2d4a8703f34ebd7/Scripts/test_segmentation_camvid.py#L60
Your Labelling: ENet-training/train/data/loadCamVid.lua, line 19 at e4b664d
Could I also confirm whether you performed median frequency balancing to obtain the weighted cross-entropy loss? For reference, these are the class weights used for the CamVid dataset:
Thank you for your help once again.
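For illustration, a remap that moves the SegNet tutorial's void class (label 11, last) to label 1 (first) and shifts the remaining classes to 2-12 for Lua's 1-based indexing could look like the sketch below. This is a hypothetical NumPy example; the repo's loader may map the classes differently:

```python
import numpy as np

def remap_camvid(label_map):
    """Map SegNet-tutorial CamVid labels (0..11, void = 11) to a 1-based
    scheme with void first: 11 -> 1, and 0..10 -> 2..12."""
    lut = np.empty(12, dtype=np.int64)
    lut[11] = 1                   # void becomes the first class
    lut[:11] = np.arange(2, 13)   # remaining classes shift to 2..12
    return lut[label_map]
```

A lookup table like this keeps the remap a single vectorized indexing operation per image.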
@codeAC29 thanks for your excellent response! I have been mulling it over, and I've tried to create a version that deactivates both batch_norm and spatial dropout during evaluation; however, this gives me very poor results. As you mentioned, during testing batch_norm and spatial dropout are kept on. Is it correct to say these two layers are critical for evaluating images?
On the other hand, if batch_norm is critical to the model's performance, would evaluating single images give very poor results? From my results, there is quite a bit of difference in output when evaluating single images vs a batch of images. Would there be a way to effectively evaluate single images with the network? I am currently only performing feature standardization to alleviate the effects.
Your paper has a great amount of content which I'm still learning to appreciate. Would you share how, in particular, p_class is calculated in the weighting formula w_class = 1.0 / ln(c + p_class)? From your code, is it right to assume that p_class is the number of occurrences of a certain pixel label across all images, divided by the total number of pixels in all images? Is there a particular reason why the class weights should be restricted to between 1 and 50? Using median frequency balancing, I see that the weights do not exceed 10.
Also, to verify with you, the spatial dropout you have used is Spatial Dropout in 2D (channel wise dropping) - is this correct?
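On the spatial dropout question above: channel-wise dropout zeroes entire feature maps rather than individual pixels. A few lines of NumPy show the training-mode behaviour (an illustration using the inverted-dropout convention, not Torch's nn.SpatialDropout implementation):

```python
import numpy as np

def spatial_dropout_2d(x, p, rng):
    """Channel-wise (2D) dropout on an array of shape (N, C, H, W).

    Each (sample, channel) feature map is dropped as a whole with
    probability p; surviving maps are scaled by 1/(1 - p) so the
    expected activation is unchanged.
    """
    keep = rng.random(x.shape[:2]) >= p          # one decision per map
    return x * keep[:, :, None, None] / (1.0 - p)
```

Dropping whole maps is what makes the regularization "spatial": neighbouring pixels in a feature map are strongly correlated, so pixel-wise dropout would barely remove any information.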
- As I have said in two of my previous comments: "You cannot just delete batchnorm". Before removing batchnorm, you will have to do something like this:
-- x is the old model and y is the new model
local xsub = x.modules[i].modules
local xsubsub = x.modules[i].modules[1].modules
local output = xsubsub[j].running_mean:nElement()   -- features per sub-model
local eps = xsubsub[j].eps
local momentum = xsubsub[j].momentum
local affine = xsubsub[j].affine
y:add(nn.BatchNormalization(output * #xsub, eps, momentum, affine))
y.modules[#y.modules].train = false

-- concatenate parameters distributed over the different sub-models
for k = 1, #xsub do
   local range = {output * (k - 1) + 1, output * k}
   y.modules[#y.modules].running_mean[{range}]:copy(xsub[k].modules[j].running_mean)
   y.modules[#y.modules].running_var[{range}]:copy(xsub[k].modules[j].running_var)
   if affine then
      y.modules[#y.modules].weight[{range}]:copy(xsub[k].modules[j].weight)
      y.modules[#y.modules].bias[{range}]:copy(xsub[k].modules[j].bias)
   end
end
- Yes, p_class is what you have said. The values of the weights need to be such that, while training, you give equal importance to all the classes. If x_i is the number of pixels occupied by class i, then its weight w_i should be such that x_i * w_i is roughly constant across all the classes. If there is a huge class imbalance, then weights varying between 1 and 50 are also fine, which is what you found in this case.
- Yes, that is correct.
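Putting the weighting scheme together: with p_class the fraction of all pixels belonging to a class, the paper's formula w_class = 1 / ln(c + p_class) with c = 1.02 (the value the ENet paper reports) bounds the weights to roughly the 1-to-50 range. A NumPy sketch with a hypothetical function name:

```python
import numpy as np

def class_weights(pixel_counts, c=1.02):
    """w_class = 1 / ln(c + p_class), where p_class is the class's share
    of all pixels. Rare classes get large weights, frequent ones small,
    so x_i * w_i stays roughly constant across classes."""
    counts = np.asarray(pixel_counts, dtype=np.float64)
    p = counts / counts.sum()
    return 1.0 / np.log(c + p)
```

As p_class approaches 0 the weight approaches 1/ln(1.02), about 50, and as p_class approaches 1 it falls toward 1/ln(2.02), just above 1, which explains the 1-50 range asked about above.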
@codeAC29 Thank you for your excellent response once again. I am currently trying a variant of ENet in TensorFlow, where batch_norm can be turned off by setting is_training=False in the standard batch norm function. Setting implementation aside, theoretically speaking, would you say that spatial dropout and batch norm are crucial for getting good results?
If batch_norm and dropout are both kept on during testing, how did you handle the test images differently from the validation images? Was there any change to the model when performing inference on test images? If there weren't any changes, couldn't the test images be mixed into the train-validation images instead, given there is no difference between evaluating test images and validation images? That is, of course, assuming there are no changes to the model during testing.
Also, what inspired you not to perform any preprocessing on the images? Is there a conceptual reason behind this? It would be interesting to learn why no preprocessing works well for these datasets.
In your paper, you mention that all ReLUs are replaced with PReLUs. However, in the decoder implementation here: https://github.com/e-lab/ENet-training/blob/master/train/models/decoder.lua
it seems that ReLUs are used instead of PReLUs. Should ReLUs be used in the decoder rather than PReLUs?
@codeAC29 can I verify with you that your reported ENet test accuracy was the result of testing on the test and validation datasets combined? It seems to me this is a natural choice, given that there are no architectural changes for either dataset and the only difference comes from the data. In fact, perhaps the test dataset could be distributed across both the training and validation datasets?
@kwotsin No, we performed testing only on the test dataset. Combining test data into training/validation will give you better results, but then the whole point of the test data will be lost. So you should always train and validate your network using the respective data, and then, at the end, when you have your trained network, run it on the test data.