hojonathanho / diffusion
Denoising Diffusion Probabilistic Models
Thanks so much for the excellent work and for sharing the code.
May I ask how to compute the rate and distortion in Figure 5 and Table 4 of the paper, especially the rate? Preferably with some code?
Thanks!
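For readers with the same question, here is one plausible reading of the Sec. 4.3 rate/distortion split as a sketch. This is my own helper, not the authors' code: it assumes the per-term values of the variational bound (in bits/dim) have already been computed, treats the reconstruction term L_0 as distortion, and treats everything else as rate; the RMSE conversion matches how Figure 5 reports distortion on the [0, 255] scale.

```python
import numpy as np

def rate_distortion_split(vlb_terms_bpd, x0, x0_pred):
    """Hypothetical helper (my own, not the authors' code).

    vlb_terms_bpd: terms of the variational bound in bits/dim,
                   ordered (L_T, L_{T-1}, ..., L_1, L_0).
    The reconstruction term L_0 is treated as distortion and the
    remaining terms as rate.
    """
    rate = float(np.sum(vlb_terms_bpd[:-1]))   # bits/dim spent describing x_1..x_T
    distortion_bpd = float(vlb_terms_bpd[-1])  # bits/dim spent reconstructing x_0
    # Distortion can also be reported as RMSE on the [0, 255] pixel scale:
    rmse = float(np.sqrt(np.mean((np.asarray(x0, dtype=np.float64)
                                  - np.asarray(x0_pred, dtype=np.float64)) ** 2)))
    return rate, distortion_bpd, rmse
```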
I want to check the functionality of this code, but I'm not able to set it up on my PC. Can somebody help me with this?
Thanks for your great work! When I train DDPM on a dataset like Cityscapes, where the images are famously almost all the same color/style, the colors of the generated samples are nevertheless quite diverse. More interestingly, when I adjust the U-Net to predict x_start, or when I increase the model capacity, the problem goes away. I hope you can give some hints about this phenomenon.
Hi, I am new to deep learning, and I recently found your excellent work. However, you didn't provide your GPU code. Could you share it?
Hi @hojonathanho. First of all, thank you for your code.
I'm training the model on 256x256 images of the celeb_a_hq dataset taken from Kaggle.
The parameters I'm using are:
I train the model for roughly 20,000 steps, and the output is the result of using:
out = unet.model(
    x, t=t, y=y, name='model', ch=128, ch_mult=(1, 1, 2, 2, 4, 4), num_res_blocks=2, attn_resolutions=(16,),
    out_ch=out_ch, num_classes=self.num_classes, dropout=dropout
)
The problem is that the best result I have gotten so far is the following (top: original, bottom: final result):
I always get a "blue-ish" filter on the image. I believe this is caused by the loss function: its job is to predict the noise instead of x_start, and, as you wrote, it "seems to be weighted naturally like SNR"; but by doing so we get changes in the image colors, producing a Gaussian distribution similar to the noise's:
Why is this loss used even if it changes the color spectrum?
Am I missing something, like the correct way to obtain an output?
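One thing worth checking before blaming the loss (a hedged sketch, not the repo's code): since the network predicts the noise ε, viewing a sample requires first recovering x_0 from ε and then undoing the [-1, 1] data scaling; skipping either step shifts the output colors toward the noise distribution.

```python
import numpy as np

def eps_to_x0(x_t, eps_pred, alpha_bar_t):
    # Invert q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I):
    # given the predicted noise, recover the implied clean image.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def to_uint8(x):
    # Training data is scaled to [-1, 1]; map back to [0, 255] for display.
    return np.clip((x + 1.0) * 127.5, 0.0, 255.0).astype(np.uint8)
```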
Hi, thanks for your significant work.
Could you give any suggestions on model training, such as the number of epochs and the number of GPUs?
Before that, I trained DDPM on CIFAR-10 for 800k iterations (batch size 128), but the model didn't converge.
Hello,
I want to know how to compute the NLL when using a diffusion model. Can you help me?
Thanks!
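A minimal sketch of the ingredients (my own, under the usual DDPM formulation, not the authors' code): the NLL is reported as the variational bound L = L_0 + Σ_t L_t + L_T, where each intermediate term is a KL divergence between two diagonal Gaussians, and the per-example total in nats is converted to the bits/dim figure papers quote.

```python
import numpy as np

def normal_kl(mu1, var1, mu2, var2):
    # KL( N(mu1, var1) || N(mu2, var2) ) for scalar/diagonal Gaussians, in nats.
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def nats_to_bits_per_dim(total_nats, num_dims):
    # Convert a per-example total (in nats) to bits per dimension.
    return total_nats / (np.log(2.0) * num_dims)
```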
Hello,
is there any good PyTorch code base available?
Hi,
I want to elaborate on #2:
The sampling algorithm in your code is a bit different from what is shown in the paper.
The paper suggests this sampling step:
The clipping is done here
diffusion/diffusion_tf/diffusion_utils.py
Line 172 in 1e0dceb
Now I checked, and indeed, without the clipping the two equations are the same.
Can you give any interpretation or intuition for the clipping and why it is needed?
It seems to be crucial for training, yet it is not mentioned in the paper.
Thanks
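For reference, my reading of what the clipped step does, as a sketch rather than a quote of the repo; `posterior_mean_fn` is a hypothetical stand-in for the q-posterior mean computation:

```python
import numpy as np

def p_mean_with_clipping(x_t, eps_pred, alpha_bar_t, posterior_mean_fn, clip=True):
    # Recover the model's implied x_0 from the predicted noise...
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    if clip:
        # ...and clamp it to the valid data range before it enters the posterior
        # mean, so the sampler never moves toward an impossible image.
        x0_pred = np.clip(x0_pred, -1.0, 1.0)
    return posterior_mean_fn(x0_pred, x_t)
```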
Hi, thanks for sharing this great work.
I have a question about the sampling implementation. First of all, what is the difference between diffusion_utils.py and diffusion_utils_2.py? I think diffusion_utils_2 is only used for the unconditional CIFAR-10 part, but the difference seems to be larger than that.
Another question is about the difference between the current implementation and Algorithm 2 in the paper. If I understand correctly, predict_start_from_noise predicts p(x_0|x_t), and then q_posterior predicts x_{t-1} using Equation 7 in the paper. This is different from Algorithm 2, where Equation 11 is used. Are the two equivalent? Or which one is better (more stable)?
Thanks!
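On the equivalence question, a quick numerical check (my own sketch): without clipping, predicting x_0 from ε and plugging it into the posterior mean of Eq. (7) reproduces the sampling mean of Eq. (11) exactly, so the two routes differ only when the predicted x_0 is clipped.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 10)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

t = 5  # any interior timestep
x_t = rng.normal(size=4)
eps = rng.normal(size=4)

# Route 1: Eq. (11) directly.
mu_eq11 = (x_t - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])

# Route 2: predict x_0 from eps, then the q-posterior mean of Eq. (7).
x0_pred = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
coef0 = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1 - alpha_bar[t])
coeft = np.sqrt(alphas[t]) * (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t])
mu_eq7 = coef0 * x0_pred + coeft * x_t

assert np.allclose(mu_eq11, mu_eq7)  # identical once x0_pred is left unclipped
```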
First, thank you for sharing this project with us!
Could you please add an explicit LICENSE file to the repo, so that it's clear under what terms the content is provided and under what terms user contributions are licensed?
[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.
Thanks!
Hi, thanks for your code. The paper says that the diffusion process cannot be reversed exactly, so how do you reconstruct an input image as in Fig. 8 of the paper?
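A sketch under my reading of Sec. 4.3 (not the authors' code): reconstructions like those in Fig. 8 stochastically "encode" x_0 into some x_t using the closed-form forward marginal q(x_t | x_0), then run the learned reverse chain back down to t = 0. Only the encoding step is shown here; the reverse chain is the model's usual sampler.

```python
import numpy as np

def q_sample(x0, alpha_bar_t, noise):
    # Closed-form forward marginal:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

# From x_t, reconstruction runs the learned reverse sampler down to t = 0
# (not shown here).
```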
I know that to run run_celebahq.py, I need to write
python3 scripts/run_celebahq.py train --bucket_name_prefix $BUCKET_PREFIX --exp_name $EXPERIMENT_NAME --tpu_name $TPU_NAME
or
python3 scripts/run_celebahq.py evaluation --bucket_name_prefix $BUCKET_PREFIX --tpu_name $EVAL_TPU_NAME --model_dir $MODEL_DIR
But I don't know what to put in the $BUCKET_PREFIX, $EXPERIMENT_NAME, $TPU_NAME, $EVAL_TPU_NAME, and $MODEL_DIR parts. Can you give me some examples?
When the download is almost complete, it fails with a network error and cannot finish.
I tried to reproduce DDPM on CIFAR-10. As mentioned in the paper, my batch size is 128, the optimizer is Adam, the learning rate is 0.0002, and I used the L2 loss. I found that the training loss kept fluctuating between 0.015 and 0.030. What causes this? Should I reduce the learning rate? Can you tell me the loss you saw during training?
def get_timestep_embedding(timesteps, embedding_dim: int):
  """
  From Fairseq.
  Build sinusoidal embeddings.
  This matches the implementation in tensor2tensor, but differs slightly
  from the description in Section 3.5 of "Attention Is All You Need".
  """
  assert len(timesteps.shape) == 1  # and timesteps.dtype == tf.int32
  half_dim = embedding_dim // 2
  emb = math.log(10000) / (half_dim - 1)
I don't understand why (half_dim - 1) is used here. According to the Transformer's positional-encoding formula, it should be emb = math.log(10000) / half_dim; I don't think 1 should be subtracted from half_dim here.
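A possible reason, shown numerically (my own sketch, not the authors' explanation): with (half_dim - 1) in the denominator, the half_dim frequencies form a geometric sequence whose endpoints land exactly on 1 and 1/10000, whereas dividing by half_dim would leave the lowest frequency short of 1/10000. Both conventions appear in the wild; this one matches tensor2tensor.

```python
import math
import numpy as np

half_dim = 64
step = math.log(10000) / (half_dim - 1)
freqs = np.exp(-step * np.arange(half_dim))

# Endpoints land exactly on 1 and 1/10000...
assert math.isclose(freqs[0], 1.0)
assert math.isclose(freqs[-1], 1e-4)

# ...whereas dividing by half_dim leaves the lowest frequency above 1/10000.
alt_lowest = math.exp(-(math.log(10000) / half_dim) * (half_dim - 1))
assert alt_lowest > 1e-4
```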
Hey, I have a quick question! Is it possible to run the evaluation loop during training (e.g., sample images once every 100 iterations), instead of having to execute the evaluation separately?
1) Any idea about the highlighted sentence in Lilian Weng's blog that illustrates the μ̃(x_t, x_0) expression?
2) The highlighted sentence in the DDPM paper does not make sense to me.
3) I also tried checking reference [53], but it seems different in that reference.
4) What is the actual purpose of the "clipping" mentioned in #5?