Comments (5)
Very good observation! The convergence of DSM is plagued by large variance and becomes very slow for small sigma. This is a known issue, but it can be alleviated with control variates (see https://arxiv.org/abs/2101.03288 for an example). In our experiments we apply DSM across multiple noise scales and didn't observe slowed convergence, since many of the sigmas in the schedule are large.
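For concreteness, a minimal sketch of DSM across multiple noise scales could look like this; `scorenet`, its `(x, labels)` signature, and the `sigmas` tensor are placeholder assumptions modeled on the NCSN setup, not the repo's exact API:

```python
import torch

def multiscale_dsm_loss(scorenet, x, sigmas):
    # Pick a random noise scale for every sample in the batch.
    labels = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[labels].view(x.shape[0], *([1] * (x.dim() - 1)))

    noise = torch.randn_like(x) * sigma
    x_noisy = x + noise

    # DSM target: the score of the Gaussian perturbation kernel,
    # grad log q(x_noisy | x) = -(x_noisy - x) / sigma^2.
    target = -noise / sigma ** 2
    scores = scorenet(x_noisy, labels)

    # Weight each scale by sigma^2 so all scales contribute comparably.
    per_sample = 0.5 * ((scores - target) ** 2).sum(dim=tuple(range(1, x.dim())))
    return (per_sample * sigma.view(-1) ** 2).mean()
```

The per-scale `sigma ** 2` weighting keeps the contribution of each scale comparable, which is one reason the high variance at small sigma doesn't dominate training.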
Ah ok, I was already planning for variance reduction methods :) For larger sigmas everything is much smoother, which I observed as well. I wonder whether the runtime advantage of DSM over ISM isn't eaten up again by slower convergence. After all, ISM only needs the trace of the Jacobian, which should be faster to compute than the full Jacobian (if frameworks like PyTorch supported such an operation). I already have a fairly fast version (limited to specific NN architectures) here
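To make the cost concrete, the brute-force trace in PyTorch needs one backward pass per input dimension; `score_fn` below is a placeholder for any R^D -> R^D score model:

```python
import torch

def jacobian_trace_exact(score_fn, x):
    # One backward pass per input dimension: D passes for D-dim data.
    x = x.detach().requires_grad_(True)
    s = score_fn(x)                                   # shape (batch, D)
    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):
        grad_i = torch.autograd.grad(s[:, i].sum(), x,
                                     create_graph=True)[0]
        trace = trace + grad_i[:, i]                  # diagonal entry ds_i/dx_i
    return trace
```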
The trace of the Jacobian is still very expensive to compute. That said, there are methods like sliced score matching that do not add noise and are not affected by the variance issue. I tried them for training score-based models before; they gave decent performance but didn't seem to outperform DSM.
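For reference, a minimal sketch of the basic sliced score matching objective (`score_fn` is again a placeholder) replaces the full trace with random projections, needing only one extra backward pass per projection:

```python
import torch

def sliced_score_matching_loss(score_fn, x, n_projections=1):
    x = x.detach().requires_grad_(True)
    s = score_fn(x)                                   # model score, (batch, D)
    loss = 0.0
    for _ in range(n_projections):
        v = torch.randn_like(x)                       # random slicing direction
        # v^T (ds/dx) v via a single vector-Jacobian product.
        gv = torch.autograd.grad((s * v).sum(), x, create_graph=True)[0]
        loss = loss + ((gv * v).sum(dim=-1)
                       + 0.5 * (s * v).sum(dim=-1) ** 2).mean()
    return loss / n_projections
```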
Yes, very true when the data dimension becomes large. I was thinking about (low-rank) approximations to the Jacobian and came across this paper:
Abdel-Khalik, Hany S., et al. "A Low Rank Approach to Automatic Differentiation." Advances in Automatic Differentiation, Springer, Berlin, Heidelberg, 2008, pp. 55–65.
which is admittedly quite dated. But after skimming it, the idea seems connected to your sliced SM approach: it is as if sliced score matching computes a low-rank approximation of the Jacobian.
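That reading lines up with the Hutchinson trace-estimator identity: for random v with E[v v^T] = I, each slice v^T J v is an unbiased rank-one probe of the trace. A quick numerical check of the identity, with an arbitrary matrix standing in for the Jacobian:

```python
import torch

torch.manual_seed(0)
A = torch.randn(10, 10)          # stand-in for a Jacobian

# Each slice v gives a rank-one probe v^T A v; averaging recovers tr(A).
vs = torch.randn(100_000, 10)
estimate = torch.einsum('ni,ij,nj->n', vs, A, vs).mean()

print(estimate.item(), torch.trace(A).item())  # the two values nearly agree
```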
Ok, thanks for your valuable time and have a nice Saturday.
I've recreated your toy example to compare Langevin and annealed Langevin sampling. In particular, I did not use exact scores but trained a toy model to predict the score. The results are below. The right plot of the first figure shows default Langevin sampling (model trained unconditionally) with the expected issues. The right plot of the next figure shows annealed Langevin sampling as proposed in your paper (model trained conditioned on the noise level). The results are as expected, but I had to change one particular thing to make it work:
- The noise levels range over [2, 0.01] instead of [20, 1] as mentioned in the paper. I tried the original settings, but a sigma of 20 basically gives a flat space, and this led to particles flying off in all directions.
I believe the difference is due to the inexactness of the model's predictions and, of course, to potential hidden errors in my code. Would you agree?
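For completeness, here is a minimal sketch of annealed Langevin dynamics in the spirit of the paper's Algorithm 1; it assumes the noise-conditional `scorenet(x, labels)` interface from above, and the `eps` and `n_steps` values are illustrative, not the settings from my run:

```python
import torch

@torch.no_grad()
def annealed_langevin(scorenet, x, sigmas, n_steps=100, eps=2e-5):
    # sigmas is assumed to be a descending tensor of noise levels.
    for i, sigma in enumerate(sigmas):
        labels = torch.full((x.shape[0],), i, dtype=torch.long,
                            device=x.device)
        alpha = eps * (sigma / sigmas[-1]) ** 2       # annealed step size
        for _ in range(n_steps):
            z = torch.randn_like(x)
            x = x + 0.5 * alpha * scorenet(x, labels) + alpha.sqrt() * z
    return x
```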
Related Issues (10)
- Model in eval forever after the first log on validation
- Custom dataset
- Baseline not converge
- Error loading pre-trained checkpoints
- Annealed GMM Analytical Log Probabilities
- About the condition in conditional score network
- Why need adding noise twice
- Question about adding noise to input
- Is it feasible to directly calculate the DSM loss function without scorenet estimation?