This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent"
In train_distributed.py you are using 'batch_size=1' when calculating the test losses and accuracies.
Why not use a larger batch_size to speed up the calculations?
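To illustrate why this should be safe, here is a minimal sketch (hypothetical, not the repo's actual evaluation loop) showing that a larger batch size yields exactly the same average as 'batch_size=1', provided per-example losses are summed and divided once by the dataset size rather than averaged per batch:

```python
# Toy per-example losses standing in for a test set (made-up values).
per_example_loss = [0.2, 0.9, 0.4, 0.1, 0.7]

def mean_loss(losses, batch_size):
    """Accumulate per-batch sums, then divide once by the dataset size,
    so a short final batch is still weighted correctly."""
    total = 0.0
    for i in range(0, len(losses), batch_size):
        total += sum(losses[i:i + batch_size])
    return total / len(losses)

# Any batch size gives the same mean as batch_size=1.
assert abs(mean_loss(per_example_loss, 3) - mean_loss(per_example_loss, 1)) < 1e-12
```

The only pitfall would be averaging per-batch means directly, since the last (shorter) batch would then be over-weighted.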
I uploaded the paper's code to Google Colab in order to reproduce the Table 2 results for Arch: LeNet, Optimizer: G&C.
I ran the script: !python train_distributed.py -C configs/table2/mnist_guess.yaml
I got results for:
num_samples=2, loss_bin=(0.3, 0.35): after 320,000 tested models
num_samples=4, loss_bin=(0.3, 0.35): after 1,760,000 tested models
But for num_samples=8, loss_bin=(0.3, 0.35), I am not getting a result even after 100,000,000 tested models (the trials continue in an infinite loop).
I was wondering how many tested models you tried before determining that there are no solutions with 100% training accuracy in a certain loss bin? (I saw in your paper that you got this case for the linear models.)
I did not install your environment.yml in Google Colab, but I don't think that is what's causing the problem.
It would be great if you could advise me on what to do in this situation.
Hi,
At line 526 you wrote: perfect_model_weights.append(model_result.get_weights_by_idx(perfect_model_idxs))
Shouldn't it be: perfect_model_weights.append(model_result.get_weights_by_idx(perfect_model_idxs.nonzero().squeeze(1)))
(similar to line 524), in order to take only the models that satisfy the condition ((es_l < train_loss) & (train_loss <= es_u) & (train_acc == 1.0)), instead of taking all models?
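To make the concern concrete, here is a minimal plain-Python illustration (hypothetical, not the repo's tensors, and whether get_weights_by_idx handles a boolean mask correctly depends on its implementation) of the difference between passing a boolean mask's raw values as indices versus its nonzero positions:

```python
# A boolean mask over five candidate models (perfect_model_idxs analogue).
mask = [False, True, False, True, True]
weights = ["w0", "w1", "w2", "w3", "w4"]  # one entry per candidate model

# Treating the mask values themselves as indices (ints 0/1) only ever
# selects the first two entries:
by_raw_values = [weights[int(m)] for m in mask]

# Converting the mask to positions first (the analogue of
# perfect_model_idxs.nonzero().squeeze(1)) selects the intended models:
positions = [i for i, m in enumerate(mask) if m]
by_positions = [weights[i] for i in positions]
```

If get_weights_by_idx performs integer-style indexing internally, only the position-based call returns the models that actually satisfy the condition.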
Hi,
In Section 4.2 ("Results on 2-Class CIFAR-10/MNIST") of the paper you say: "we compare the performance of the models across different train loss levels after the model's weights have been normalized".
The 'LeNetModels' class contains two functions related to normalization: 'normalize' and 'forward_normalize'.
You don't use the 'normalize' function during training with the G&C optimizer, but you do use 'forward_normalize'.
I have some questions regarding this function:
1. Can you explain the idea behind normalizing the forward pass by the 'cum_norm' of the model's weights? I don't see how this is equivalent to normalizing the model's weights.
2. I saw that you are not using 'Softmax' at the end of your model, as is usually done in classification problems. Why is that? Is the 'forward_normalize' function a substitute for the 'Softmax'? I am guessing that applying 'Softmax' at the end of the forward pass together with 'forward_normalize' would result in outputs close to zero.
3. In line 509 of 'train_distributed.py' you pass 'forward_normalize' as the forward-pass function for the train loss and accuracy calculation, but in line 687 you pass 'forward' as the forward-pass function for the test loss and accuracy calculation. Can you explain this difference?
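Regarding the first question, my current guess (an assumption on my part, sketched below with a hand-rolled, bias-free two-layer ReLU net rather than the repo's 'LeNetModels') is that for positively homogeneous activations, dividing the output by the product of the layers' weight norms equals running a plain forward pass with each layer's weights normalized:

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def frob(W):
    return math.sqrt(sum(w * w for row in W for w in row))

def forward(W1, W2, x):  # bias-free two-layer ReLU net
    return matvec(W2, relu(matvec(W1, x)))

W1 = [[1.0, -2.0], [0.5, 3.0]]
W2 = [[2.0, -1.0]]
x = [1.0, 2.0]
n1, n2 = frob(W1), frob(W2)

# (a) normalize the weights, then run a plain forward pass
out_a = forward([[w / n1 for w in r] for r in W1],
                [[w / n2 for w in r] for r in W2], x)

# (b) plain forward pass, then divide by the cumulative norm
#     (what I understand 'forward_normalize' with 'cum_norm' may be doing)
out_b = [y / (n1 * n2) for y in forward(W1, W2, x)]

assert all(abs(a - b) < 1e-9 for a, b in zip(out_a, out_b))
```

Note this rescaling leaves the argmax (and hence accuracy) unchanged; it only matters once a scale-sensitive quantity such as a softmax cross-entropy loss is applied, which may connect to the second question.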
Hi,
In Table 2 of the paper you say: "We also show the estimated standard deviations of the averages computed over 175 random data split and training seeds".
But the 'mnist_guess.yaml' parameters are target_model_count=200 and target_model_count_subrun=10, which means that for each combination of (cur_num_samples, cur_loss_bin) there are 20 records in the database used for the standard deviation calculation (each record contains a test_acc averaged over 10 models' accuracies). So, to my understanding, the standard deviation is calculated over 20 records of 10 random data splits and training seeds each, not over 175 individual seeds.
I would appreciate it if you could clarify this for me, to make sure I understand how you calculated the standard deviations.
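To make the grouping I am describing concrete, here is a sketch with made-up accuracy values (the subrun layout is my reading of the config, not the repo's actual code):

```python
import random
import statistics

random.seed(0)
# Hypothetical per-model test accuracies for one (cur_num_samples, cur_loss_bin)
# cell: target_model_count=200 models, stored in subruns of
# target_model_count_subrun=10.
per_model = [random.gauss(0.78, 0.02) for _ in range(200)]

# 20 database records, each holding the mean accuracy of 10 models.
record_means = [sum(per_model[i:i + 10]) / 10 for i in range(0, 200, 10)]
assert len(record_means) == 20

sd_over_records = statistics.stdev(record_means)  # what the DB layout yields
sd_over_models = statistics.stdev(per_model)      # per-seed spread
assert sd_over_records < sd_over_models           # averaging shrinks the spread
```

The two standard deviations differ, which is why I want to confirm which one Table 2 reports.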
Hi,
I am sorry for reopening this issue, but something in your calculation of the standard deviation of the estimated mean looks odd to me.
From my understanding, for each combination of (num_train_samples, loss_bin) you calculate 's' using the 175 test accuracies found during the run.
Why do you calculate 's_mean' and treat it as the standard deviation of the estimated mean?
Isn't 's' the result we are looking for?
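For concreteness, this is the distinction I am asking about, sketched with made-up numbers (the names 's' and 's_mean' are my reading of the code):

```python
import math
import statistics

# Hypothetical test accuracies for one (num_train_samples, loss_bin) cell.
accs = [0.81, 0.79, 0.83, 0.80, 0.82]

s = statistics.stdev(accs)         # spread of the individual accuracies
s_mean = s / math.sqrt(len(accs))  # standard error of the estimated mean

# 's_mean' quantifies the uncertainty in the reported average, while 's'
# quantifies how much a single run varies around it.
```

My question is which of these two quantities the table is meant to report.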