Comments (5)
Very good observation! The convergence of DSM is plagued by large variance and becomes very slow for small sigma. This is a known issue, but it can be alleviated with control variates (see https://arxiv.org/abs/2101.03288 for an example). In our experiments we apply DSM across multiple noise scales and didn't observe slowed convergence, since many of the sigmas in the schedule are large.
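For concreteness, a minimal sketch of DSM across multiple noise scales could look like this; `scorenet`, its `(x, labels)` signature, and the `sigmas` tensor are placeholder assumptions modeled on the NCSN setup, not the repo's exact API:

```python
import torch

def multiscale_dsm_loss(scorenet, x, sigmas):
    # Pick a random noise scale for every sample in the batch.
    labels = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[labels].view(x.shape[0], *([1] * (x.dim() - 1)))

    noise = torch.randn_like(x) * sigma
    x_noisy = x + noise

    # DSM target: the score of the Gaussian perturbation kernel,
    # grad log q(x_noisy | x) = -(x_noisy - x) / sigma^2.
    target = -noise / sigma ** 2
    scores = scorenet(x_noisy, labels)

    # Weight each scale by sigma^2 so all scales contribute comparably.
    per_sample = 0.5 * ((scores - target) ** 2).sum(dim=tuple(range(1, x.dim())))
    return (per_sample * sigma.view(-1) ** 2).mean()
```

The per-scale `sigma ** 2` weighting keeps the contribution of each scale comparable, which is one reason the high variance at small sigma doesn't dominate training.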
Ah ok, I was already planning for variance reduction methods :) For larger sigmas everything is much smoother, which I observed as well. I wonder whether the runtime advantage of DSM over ISM isn't eaten up again by slower convergence. After all, ISM only needs the trace of the Jacobian, which should be faster to compute than the full Jacobian (if frameworks like PyTorch supported such an operation). I already have a fairly fast version (limited to specific NN architectures) here
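To make the cost concrete, the brute-force trace in PyTorch needs one backward pass per input dimension; `score_fn` below is a placeholder for any R^D -> R^D score model:

```python
import torch

def jacobian_trace_exact(score_fn, x):
    # One backward pass per input dimension: D passes for D-dim data.
    x = x.detach().requires_grad_(True)
    s = score_fn(x)                                   # shape (batch, D)
    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):
        grad_i = torch.autograd.grad(s[:, i].sum(), x,
                                     create_graph=True)[0]
        trace = trace + grad_i[:, i]                  # diagonal entry ds_i/dx_i
    return trace
```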
The trace of the Jacobian is still very expensive to compute. That said, there are methods like sliced score matching that do not add noise and are not affected by the variance issue. I tried them for training score-based models before; they gave decent performance but didn't seem to outperform DSM.
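For reference, a minimal sketch of the basic sliced score matching objective (`score_fn` is again a placeholder) replaces the full trace with random projections, needing only one extra backward pass per projection:

```python
import torch

def sliced_score_matching_loss(score_fn, x, n_projections=1):
    x = x.detach().requires_grad_(True)
    s = score_fn(x)                                   # model score, (batch, D)
    loss = 0.0
    for _ in range(n_projections):
        v = torch.randn_like(x)                       # random slicing direction
        # v^T (ds/dx) v via a single vector-Jacobian product.
        gv = torch.autograd.grad((s * v).sum(), x, create_graph=True)[0]
        loss = loss + ((gv * v).sum(dim=-1)
                       + 0.5 * (s * v).sum(dim=-1) ** 2).mean()
    return loss / n_projections
```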
Yes, very true when the data dimension becomes large. I was thinking about (low-rank) approximations to the Jacobian and came across this paper:
Abdel-Khalik, Hany S., et al. "A Low Rank Approach to Automatic Differentiation." Advances in Automatic Differentiation, Springer, Berlin, Heidelberg, 2008, pp. 55–65.
which is admittedly quite dated. But after skimming it, the idea seems connected to your sliced SM approach: it is as if sliced score matching computes a low-rank approximation of the Jacobian.
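That reading lines up with the Hutchinson trace-estimator identity: for random v with E[v v^T] = I, each slice v^T J v is an unbiased rank-one probe of the trace. A quick numerical check of the identity, with an arbitrary matrix standing in for the Jacobian:

```python
import torch

torch.manual_seed(0)
A = torch.randn(10, 10)          # stand-in for a Jacobian

# Each slice v gives a rank-one probe v^T A v; averaging recovers tr(A).
vs = torch.randn(100_000, 10)
estimate = torch.einsum('ni,ij,nj->n', vs, A, vs).mean()

print(estimate.item(), torch.trace(A).item())  # the two values nearly agree
```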
Ok, thanks for your valuable time and have a nice Saturday.
I've recreated your toy example to compare Langevin and annealed Langevin sampling. In particular, I did not use exact scores but trained a toy model to predict the score. The results are below. The right plot of the first figure shows default Langevin sampling (model trained unconditionally) with the expected issues. The right plot of the next figure shows annealed Langevin sampling as proposed in your paper (model trained conditioned on the noise level). The results are as expected, but I had to change one particular thing to make it work:
- The noise levels range over [2, 0.01] instead of [20, 1] as mentioned in the paper. I tried the original settings, but a sigma of 20 basically gives a flat space, and this led to particles flying off in all directions.
I believe the difference is due to the inexactness of the model's predictions and, of course, to potential hidden errors in my code. Would you agree?
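For completeness, here is a minimal sketch of annealed Langevin dynamics in the spirit of the paper's Algorithm 1; it assumes the noise-conditional `scorenet(x, labels)` interface from above, and the `eps` and `n_steps` values are illustrative, not the settings from my run:

```python
import torch

@torch.no_grad()
def annealed_langevin(scorenet, x, sigmas, n_steps=100, eps=2e-5):
    # sigmas is assumed to be a descending tensor of noise levels.
    for i, sigma in enumerate(sigmas):
        labels = torch.full((x.shape[0],), i, dtype=torch.long,
                            device=x.device)
        alpha = eps * (sigma / sigmas[-1]) ** 2       # annealed step size
        for _ in range(n_steps):
            z = torch.randn_like(x)
            x = x + 0.5 * alpha * scorenet(x, labels) + alpha.sqrt() * z
    return x
```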
Related Issues (10)
- Model in eval forever after the first log on validation
- Custom dataset
- Baseline not converge
- Error loading pre-trained checkpoints
- Annealed GMM Analytical Log Probabilities
- About the condition in conditional score network
- Why need adding noise twice
- Question about adding noise to input
- Is it feasible to directly calculate the DSM loss function without scorenet estimation?