Comments (4)
Hi,
Thanks for taking the time and for your kind words! I will address your questions one by one:
- Regarding Figure 3, you are correct: the comment should say that almost all methods provide some speedup.
- In Figure 6, for reasons of clarity (8 lines in a graph are hard to read), we only compare with SVRG-type methods. Although they are designed to improve upon the uniform baseline, a nice take-away message from the paper is that when it comes to training large, highly non-convex models they actually perform worse than plain SGD with momentum.
Regarding further explanation of the results in Figure 6: the SVRG-based methods require two gradient evaluations per parameter update, plus a certain number of gradient evaluations every m iterations. The benefits are reaped mostly at the very final stages of training, where almost zero variance is needed. But for large models such as the WRN-28-2 used in the paper, the overhead is too high and the constant factors overwhelm the asymptotic improvement. In the literature there is barely any comparison of SVRG methods with momentum SGD, precisely because momentum SGD sets a very competitive baseline.
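To make the cost accounting above concrete, here is a minimal sketch of textbook SVRG on a toy least-squares problem. It is not the code from the paper or from this repository. The function name `svrg_least_squares`, its parameters, and the choice of a full pass over the data as the periodic gradient computation are assumptions made for illustration. The counter shows where the two gradient evaluations per update and the extra evaluations every m iterations come from.

```python
import numpy as np


def svrg_least_squares(X, y, lr=0.01, epochs=20, m=None, seed=0):
    """Toy SVRG on the loss 0.5 * mean((X @ w - y) ** 2).

    Returns the final weights and the number of per-sample gradient
    evaluations. All names here are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = n if m is None else m  # inner-loop length (assumed default: one pass)
    w = np.zeros(d)
    grad_evals = 0

    def grad_i(weights, i):
        # Gradient of the i-th sample loss 0.5 * (x_i . w - y_i) ** 2
        return (X[i] @ weights - y[i]) * X[i]

    for _ in range(epochs):
        # Snapshot step, done once every m iterations: a full gradient
        # at the snapshot costs n per-sample gradient evaluations.
        w_snap = w.copy()
        mu = X.T @ (X @ w_snap - y) / n
        grad_evals += n

        for _ in range(m):
            i = rng.integers(n)
            # Two per-sample gradients per update: one at the current
            # weights and one at the snapshot (the control variate).
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            grad_evals += 2
            w -= lr * g

    return w, grad_evals
```

With these assumptions the total cost is `epochs * (n + 2 * m)` per-sample gradients, whereas plain SGD with the same number of updates would need `epochs * m`. For a small convex problem that factor is cheap to pay; for a large network each of those evaluations is a forward and backward pass.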
I hope I covered your concerns. Feel free to ask more either here or by email.
Cheers,
Angelos
from importance-sampling.
Thanks for your patient explanation! @angeloskath
You're welcome. I am closing the issue; feel free to reopen it if you have further questions, or shoot me an email.
Hi, what is your email, please?