Hi, Thanks for your awesome work! For Figure 3 in

A confusion when reading your paper about importance-sampling HOT 4 CLOSED

idiap commented on August 16, 2024

A confusion when reading your paper

from importance-sampling.

Comments (4)

angeloskath commented on August 16, 2024

Hi,

Thanks for taking the time and for your kind words! I will address your questions one by one:

Regarding Figure 3, you are correct: the comment should read "almost all methods provide some speedup".
In Figure 6, for reasons of clarity (8 lines in a graph are hard to read), we only compare with SVRG-type methods. Although they are designed to improve upon the uniform baseline, a nice take-away message from the paper is that, when it comes to training large, highly non-convex models, they actually perform worse than plain SGD with momentum.

To explain the results in Figure 6 further: the SVRG-based methods require two gradient evaluations per parameter update, plus a certain number of gradient evaluations every m iterations. The benefits are reaped mostly in the final stages of training, where almost zero variance is needed. But for large models such as the WRN-28-2 used in the paper, the overhead is too high and the constant factors overwhelm the asymptotic improvement. In the literature there is barely any comparison of SVRG methods with momentum SGD, precisely because momentum SGD sets a very competitive baseline.
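To make the overhead argument concrete, here is a rough back-of-the-envelope sketch that counts per-sample gradient evaluations for plain SGD and for an SVRG-style method. It is not taken from the paper or the library. The function names and the example numbers are hypothetical, and it assumes the "certain number of gradient evaluations every m iterations" is one full pass over the dataset, as in textbook SVRG.

```python
def sgd_grad_evals(num_updates, batch_size):
    """Per-sample gradient evaluations for plain (momentum) SGD."""
    # One gradient per sample in the mini-batch, per parameter update.
    return num_updates * batch_size


def svrg_grad_evals(num_updates, batch_size, dataset_size, m):
    """Per-sample gradient evaluations for an SVRG-style method.

    Assumption: every m updates a snapshot is taken and the gradient of
    the whole dataset is computed at that snapshot (textbook SVRG).
    """
    # Two gradients per sample per update: one at the current parameters
    # and one at the snapshot parameters.
    per_update_cost = 2 * batch_size * num_updates
    # Snapshots happen at updates 0, m, 2m, ... (ceiling division).
    num_snapshots = -(-num_updates // m)
    snapshot_cost = num_snapshots * dataset_size
    return per_update_cost + snapshot_cost


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    sgd = sgd_grad_evals(num_updates=1000, batch_size=128)
    svrg = svrg_grad_evals(num_updates=1000, batch_size=128,
                           dataset_size=50000, m=500)
    print("SGD: ", sgd)
    print("SVRG:", svrg)
    print("ratio:", svrg / sgd)
```

Under these assumptions the SVRG-style method pays at least twice the gradient cost of SGD per update, and more once the periodic snapshot passes are included. For a large network each of those evaluations is expensive, which is where the constant factors come from.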

I hope I covered your concerns. Feel free to ask more either here or by email.

Cheers,
Angelos

zhiqiangdon commented on August 16, 2024

Thanks for your patient explanation! @angeloskath

angeloskath commented on August 16, 2024

You're welcome. I am closing the issue; feel free to reopen it if you have further questions, or shoot me an email.

borre47 commented on August 16, 2024

Hi, what is your email, please?
