hojonathanho / diffusion
Denoising Diffusion Probabilistic Models
Thanks so much for the excellent work and for sharing the code.
May I ask how to compute the rate and distortion in Figure 5 and Table 4 of the paper, especially the rate? Preferably with some code?
Thanks!
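For readers with the same question, here is one plausible reading of the Sec. 4.3 rate/distortion split as a sketch. This is my own helper, not the authors' code: it assumes the per-term values of the variational bound (in bits/dim) have already been computed, treats the reconstruction term L_0 as distortion, and treats everything else as rate; the RMSE conversion matches how Figure 5 reports distortion on the [0, 255] scale.

```python
import numpy as np

def rate_distortion_split(vlb_terms_bpd, x0, x0_pred):
    """Hypothetical helper (my own, not the authors' code).

    vlb_terms_bpd: terms of the variational bound in bits/dim,
                   ordered (L_T, L_{T-1}, ..., L_1, L_0).
    The reconstruction term L_0 is treated as distortion and the
    remaining terms as rate.
    """
    rate = float(np.sum(vlb_terms_bpd[:-1]))   # bits/dim spent describing x_1..x_T
    distortion_bpd = float(vlb_terms_bpd[-1])  # bits/dim spent reconstructing x_0
    # Distortion can also be reported as RMSE on the [0, 255] pixel scale:
    rmse = float(np.sqrt(np.mean((np.asarray(x0, dtype=np.float64)
                                  - np.asarray(x0_pred, dtype=np.float64)) ** 2)))
    return rate, distortion_bpd, rmse
```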
I want to check the functionality of this code, but I'm not able to set it up on my PC. Can somebody help me with this?
Thanks for your great work! When I train DDPM on a dataset like Cityscapes, where the images are famously almost all the same color/style, the colors of the generated samples are nevertheless quite diverse. More interestingly, when I adjust the U-Net to predict x_start, or when I increase the model capacity, the problem goes away. I hope you can give some hints about this phenomenon.
Hi, I am new to deep learning, and I recently found your excellent work. However, you didn't provide your GPU code. Could you share it?
Hi @hojonathanho. First of all, thank you for your code.
I'm training the model on 256x256 images of the celeb_a_hq dataset taken from Kaggle.
The parameters I'm using are:
I train the model for roughly 20,000 steps, and the output is the result of using:
out = unet.model(
    x, t=t, y=y, name='model', ch=128, ch_mult=(1, 1, 2, 2, 4, 4), num_res_blocks=2, attn_resolutions=(16,),
    out_ch=out_ch, num_classes=self.num_classes, dropout=dropout
)
The problem is that the best result I have gotten so far is the following (top: original, bottom: final result):
I always get a "blue-ish" filter on the image. I believe this is caused by the loss function: its job is to predict the noise instead of x_start, and, as you wrote, it "seems to be weighted naturally like SNR"; but by doing so we get changes in the image colors, producing a Gaussian distribution similar to the noise's:
Why is this loss used even if it changes the color spectrum?
Am I missing something, like the correct way to obtain an output?
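One thing worth checking before blaming the loss (a hedged sketch, not the repo's code): since the network predicts the noise ε, viewing a sample requires first recovering x_0 from ε and then undoing the [-1, 1] data scaling; skipping either step shifts the output colors toward the noise distribution.

```python
import numpy as np

def eps_to_x0(x_t, eps_pred, alpha_bar_t):
    # Invert q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I):
    # given the predicted noise, recover the implied clean image.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def to_uint8(x):
    # Training data is scaled to [-1, 1]; map back to [0, 255] for display.
    return np.clip((x + 1.0) * 127.5, 0.0, 255.0).astype(np.uint8)
```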
Hi, thanks for your significant work.
Could you give any suggestions on model training, such as the number of epochs and the number of GPUs?
Before that, I trained DDPM on CIFAR-10 for 800k iterations (batch size 128), but the model didn't converge.
Hello,
I want to know how to compute the NLL when using a diffusion model. Can you help me?
Thanks!
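A minimal sketch of the ingredients (my own, under the usual DDPM formulation, not the authors' code): the NLL is reported as the variational bound L = L_0 + Σ_t L_t + L_T, where each intermediate term is a KL divergence between two diagonal Gaussians, and the per-example total in nats is converted to the bits/dim figure papers quote.

```python
import numpy as np

def normal_kl(mu1, var1, mu2, var2):
    # KL( N(mu1, var1) || N(mu2, var2) ) for scalar/diagonal Gaussians, in nats.
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def nats_to_bits_per_dim(total_nats, num_dims):
    # Convert a per-example total (in nats) to bits per dimension.
    return total_nats / (np.log(2.0) * num_dims)
```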
Hello,
is there any good PyTorch code base available?
Hi,
I want to elaborate on #2:
The sampling algorithm in your code is a bit different from what is shown in the paper.
The paper suggests this sampling step:
The clipping is done here
diffusion/diffusion_tf/diffusion_utils.py
Line 172 in 1e0dceb
Now I checked, and indeed, without the clipping the two equations are the same.
Can you give any interpretation or intuition for the clipping and why it is needed?
It seems to be crucial for training, yet it is not mentioned in the paper.
Thanks
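For reference, my reading of what the clipped step does, as a sketch rather than a quote of the repo; `posterior_mean_fn` is a hypothetical stand-in for the q-posterior mean computation:

```python
import numpy as np

def p_mean_with_clipping(x_t, eps_pred, alpha_bar_t, posterior_mean_fn, clip=True):
    # Recover the model's implied x_0 from the predicted noise...
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    if clip:
        # ...and clamp it to the valid data range before it enters the posterior
        # mean, so the sampler never moves toward an impossible image.
        x0_pred = np.clip(x0_pred, -1.0, 1.0)
    return posterior_mean_fn(x0_pred, x_t)
```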
Hi, thanks for sharing this great work.
I have a question about the sampling implementation. First of all, what is the difference between diffusion_utils.py and diffusion_utils_2.py? I think diffusion_utils_2 is only used for the unconditional CIFAR-10 part, but the difference seems to be larger than that.
Another question is about the difference between the current implementation and Algorithm 2 in the paper. If I understand correctly, predict_start_from_noise predicts p(x_0|x_t), and then q_posterior predicts x_{t-1} using Equation 7 in the paper. This is different from Algorithm 2, where Equation 11 is used. Are the two equivalent? Or which one is better (more stable)?
Thanks!
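On the equivalence question, a quick numerical check (my own sketch): without clipping, predicting x_0 from ε and plugging it into the posterior mean of Eq. (7) reproduces the sampling mean of Eq. (11) exactly, so the two routes differ only when the predicted x_0 is clipped.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 10)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

t = 5  # any interior timestep
x_t = rng.normal(size=4)
eps = rng.normal(size=4)

# Route 1: Eq. (11) directly.
mu_eq11 = (x_t - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])

# Route 2: predict x_0 from eps, then the q-posterior mean of Eq. (7).
x0_pred = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
coef0 = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1 - alpha_bar[t])
coeft = np.sqrt(alphas[t]) * (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t])
mu_eq7 = coef0 * x0_pred + coeft * x_t

assert np.allclose(mu_eq11, mu_eq7)  # identical once x0_pred is left unclipped
```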
First, thank you for sharing this project with us!
Could you please add an explicit LICENSE file to the repo, so that it's clear under what terms the content is provided and under what terms user contributions are licensed?
[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.
Thanks!
Hi, thanks for your code. The paper says that the diffusion process cannot be reversed exactly, so how do you reconstruct an input image as in Fig. 8 of the paper?
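A sketch under my reading of Sec. 4.3 (not the authors' code): reconstructions like those in Fig. 8 stochastically "encode" x_0 into some x_t using the closed-form forward marginal q(x_t | x_0), then run the learned reverse chain back down to t = 0. Only the encoding step is shown here; the reverse chain is the model's usual sampler.

```python
import numpy as np

def q_sample(x0, alpha_bar_t, noise):
    # Closed-form forward marginal:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

# From x_t, reconstruction runs the learned reverse sampler down to t = 0
# (not shown here).
```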
I know that to run run_celebahq.py, I need to write
python3 scripts/run_celebahq.py train --bucket_name_prefix $BUCKET_PREFIX --exp_name $EXPERIMENT_NAME --tpu_name $TPU_NAME
or
python3 scripts/run_celebahq.py evaluation --bucket_name_prefix $BUCKET_PREFIX --tpu_name $EVAL_TPU_NAME --model_dir $MODEL_DIR
But I don't know what to put in the $BUCKET_PREFIX, $EXPERIMENT_NAME, $TPU_NAME, $EVAL_TPU_NAME, and $MODEL_DIR parts. Can you give me some examples?
When the download is almost complete, it fails with a network error and cannot finish.
I tried to reproduce DDPM on CIFAR-10. As mentioned in the paper, my batch size is 128, the optimizer is Adam, the learning rate is 0.0002, and I used the L2 loss. I found that the training loss kept fluctuating between 0.015 and 0.030. What causes this? Should I reduce the learning rate? Can you tell me the loss you saw during training?
def get_timestep_embedding(timesteps, embedding_dim: int):
  """
  From Fairseq.
  Build sinusoidal embeddings.
  This matches the implementation in tensor2tensor, but differs slightly
  from the description in Section 3.5 of "Attention Is All You Need".
  """
  assert len(timesteps.shape) == 1  # and timesteps.dtype == tf.int32
  half_dim = embedding_dim // 2
  emb = math.log(10000) / (half_dim - 1)
I don't understand why (half_dim - 1) is used here. According to the Transformer's positional-encoding formula, it should be emb = math.log(10000) / half_dim; I don't think 1 should be subtracted from half_dim here.
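A possible reason, shown numerically (my own sketch, not the authors' explanation): with (half_dim - 1) in the denominator, the half_dim frequencies form a geometric sequence whose endpoints land exactly on 1 and 1/10000, whereas dividing by half_dim would leave the lowest frequency short of 1/10000. Both conventions appear in the wild; this one matches tensor2tensor.

```python
import math
import numpy as np

half_dim = 64
step = math.log(10000) / (half_dim - 1)
freqs = np.exp(-step * np.arange(half_dim))

# Endpoints land exactly on 1 and 1/10000...
assert math.isclose(freqs[0], 1.0)
assert math.isclose(freqs[-1], 1e-4)

# ...whereas dividing by half_dim leaves the lowest frequency above 1/10000.
alt_lowest = math.exp(-(math.log(10000) / half_dim) * (half_dim - 1))
assert alt_lowest > 1e-4
```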
Hey, I have a quick question! Is it possible to run the evaluation loop during training (e.g., sample images once every 100 iterations), instead of having to execute the evaluation separately?
1) Any idea about the highlighted sentence in Lilian Weng's blog that illustrates the μ̃(x_t, x_0) expression?
2) The highlighted sentence in the DDPM paper does not make sense to me.
3) I also tried checking reference [53], but it seems different in that reference.
4) What is the actual purpose of the "clipping" mentioned in #5?