
diffusion's Introduction

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, Pieter Abbeel

Paper: https://arxiv.org/abs/2006.11239

Website: https://hojonathanho.github.io/diffusion

Samples generated by our model

Experiments run on Google Cloud TPU v3-8. Requires TensorFlow 1.15 and Python 3.5, and these dependencies for CPU instances (see requirements.txt):

pip3 install fire
pip3 install scipy
pip3 install pillow
pip3 install tensorflow-probability==0.8
pip3 install tensorflow-gan==0.0.0.dev0
pip3 install tensorflow-datasets==2.1.0

The training and evaluation scripts are in the scripts/ subdirectory. The commands to run training and evaluation are in comments at the top of the scripts. Data is stored in GCS buckets. The scripts are written to assume that the bucket names are of the form gs://mybucketprefix-us-central1; i.e. some prefix followed by the region. The prefix should be passed into the scripts using the --bucket_name_prefix flag.
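For example, if the bucket were gs://mybucketprefix-us-central1, a CelebA-HQ training run might be invoked as follows (my_experiment and my-tpu are hypothetical names standing in for your experiment name and Cloud TPU instance):

python3 scripts/run_celebahq.py train --bucket_name_prefix mybucketprefix --exp_name my_experiment --tpu_name my-tpu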

Models and samples can be found at: https://www.dropbox.com/sh/pm6tn31da21yrx4/AABWKZnBzIROmDjGxpB6vn6Ja

Citation

If you find our work relevant to your research, please cite:

@article{ho2020denoising,
    title={Denoising Diffusion Probabilistic Models},
    author={Jonathan Ho and Ajay Jain and Pieter Abbeel},
    year={2020},
    journal={arXiv preprint arXiv:2006.11239}
}

diffusion's People

Contributors

hojonathanho


diffusion's Issues

Rate-Distortion Computation

Thanks so much for the excellent work and for sharing the code.

May I ask how the rate and distortion in Figure 5 and Table 4 of the paper are computed, especially the rate? Preferably with some code?
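My current guess, as a rough numpy sketch (all names here are mine, not the repo's API): the rate at step t is the cumulative KL cost, in bits/dim, of the reverse steps already "transmitted", and the distortion is the RMSE of the current x_0 estimate. Is this the right reading?

import numpy as np

def gaussian_kl_bits(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) per element, converted to bits.
    kl_nats = 0.5 * (np.log(var_p / var_q)
                     + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl_nats / np.log(2.0)

def rate_distortion_point(x0, x0_pred_t, kl_bits_per_step, t):
    # kl_bits_per_step[s]: mean KL (bits/dim) of reverse step s;
    # at time t, steps T-1 .. t+1 have already been spent.
    rate = kl_bits_per_step[t + 1:].sum()
    distortion = np.sqrt(np.mean((x0 - x0_pred_t) ** 2))
    return rate, distortion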

Thanks!

Sampling algorithm differs from the paper

Hi,
I want to elaborate on #2:
The sampling algorithm in your code is a bit different from what is shown in the paper.

The paper suggests this sampling step:

[screenshot: the Algorithm 2 / Eq. (11) update from the paper]

while you do this:

[screenshot: the implementation's update, which predicts x_0 and clips it first]

The clipping is done here

x_recon = tf.clip_by_value(x_recon, -1., 1.)

Now I checked and indeed, without the clipping, the two equations are the same.
Can you give any interpretation or intuition for the clipping and why it is needed?
It seems to be crucial for training, yet it is not mentioned in the paper.
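For reference, here is the clipped update as I read it from the code, as a rough numpy sketch (function and variable names are mine, not the repo's):

import numpy as np

def p_sample_mean(x_t, eps_pred, t, betas, alphas_cumprod, alphas_cumprod_prev):
    a_t = alphas_cumprod[t]
    # Invert the forward process: estimate x_0 from the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    # The extra step in question: clamp the x_0 estimate to the data range.
    x0_pred = np.clip(x0_pred, -1.0, 1.0)
    # Posterior mean of q(x_{t-1} | x_t, x_0), Eq. (7) in the paper.
    a_prev = alphas_cumprod_prev[t]
    coef1 = betas[t] * np.sqrt(a_prev) / (1.0 - a_t)
    coef2 = (1.0 - a_prev) * np.sqrt(1.0 - betas[t]) / (1.0 - a_t)
    return coef1 * x0_pred + coef2 * x_t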

Thanks

Questions about sampling implementation

Hi, thanks for sharing this great work.

I have a question about the sampling implementation. First of all, what is the difference between diffusion_utils and diffusion_utils_2? I think diffusion_utils_2 is only used for the unconditional CIFAR10 part, but the difference seems to be larger than that.

Another question is about the difference between the current implementation and Algorithm 2 in the paper. If I understand correctly, predict_start_from_noise predicts p(x_0|x_t), and then q_posterior predicts x_{t-1} using Equation 7 in the paper. This is different compared to Algorithm 2, where Equation 11 is used. Are those two equivalent? Or is one of them better (more stable)?
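I did a quick numerical check suggesting that, without the clipping, the two routes give the same mean (a rough sketch with an illustrative schedule, not the repo's code):

import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)           # the paper's linear schedule
alphas = 1.0 - betas
a_bar = np.cumprod(alphas)
a_bar_prev = np.append(1.0, a_bar[:-1])

t = 500
x_t = rng.standard_normal(8)
eps = rng.standard_normal(8)                 # stand-in for the model output

# Route 1 -- Algorithm 2 / Eq. (11): mean directly from the noise prediction.
mean_alg2 = (x_t - betas[t] / np.sqrt(1 - a_bar[t]) * eps) / np.sqrt(alphas[t])

# Route 2 -- implementation: predict x_0, then the Eq. (7) posterior mean.
x0 = (x_t - np.sqrt(1 - a_bar[t]) * eps) / np.sqrt(a_bar[t])   # no clipping
coef1 = betas[t] * np.sqrt(a_bar_prev[t]) / (1 - a_bar[t])
coef2 = (1 - a_bar_prev[t]) * np.sqrt(alphas[t]) / (1 - a_bar[t])
mean_impl = coef1 * x0 + coef2 * x_t

assert np.allclose(mean_alg2, mean_impl)     # identical without the clip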

Thanks!

How to run training or evaluation?

I know that to run run_celebahq.py, I need to write python3 scripts/run_celebahq.py train --bucket_name_prefix $BUCKET_PREFIX --exp_name $EXPERIMENT_NAME --tpu_name $TPU_NAME or python3 scripts/run_celebahq.py evaluation --bucket_name_prefix $BUCKET_PREFIX --tpu_name $EVAL_TPU_NAME --model_dir $MODEL_DIR.

But I don't know what to write in the $BUCKET_PREFIX, $EXPERIMENT_NAME, $TPU_NAME, $EVAL_TPU_NAME, $MODEL_DIR parts. Can you give me some examples?

color convergence on custom data

Thanks for your great work! When I train DDPM on a dataset like Cityscapes, it is well known that the images of this dataset share almost the same color/style; however, the colors among generated samples are quite diverse. Even more interesting, when I adjust the U-Net to predict x_start, or when I increase the model capacity, the problem goes away. I would really appreciate any hints about this phenomenon.

question about time embedding


def get_timestep_embedding(timesteps, embedding_dim: int):
  """
  From Fairseq.
  Build sinusoidal embeddings.
  This matches the implementation in tensor2tensor, but differs slightly
  from the description in Section 3.5 of "Attention Is All You Need".
  """
  assert len(timesteps.shape) == 1  # and timesteps.dtype == tf.int32

  half_dim = embedding_dim // 2
  emb = math.log(10000) / (half_dim - 1)

I don't understand why (half_dim - 1) is used here. According to the Transformer's positional-encoding formula, it should be emb = math.log(10000) / half_dim; I don't see why 1 should be subtracted from half_dim.
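To illustrate, the difference between the two conventions is just the spacing of the geometric frequency ladder: dividing by (half_dim - 1) makes the frequencies hit both endpoints 1 and 1/10000 exactly, while dividing by half_dim stops short of the last one. A quick numpy comparison (illustrative, not from the repo):

import numpy as np

half_dim = 4
# tensor2tensor/Fairseq convention: exponents i/(half_dim - 1), i = 0..half_dim-1
freqs_t2t = np.exp(-np.log(10000) / (half_dim - 1) * np.arange(half_dim))
# "Attention Is All You Need" convention: exponents i/half_dim
freqs_aiayn = np.exp(-np.log(10000) / half_dim * np.arange(half_dim))

print(freqs_t2t)    # [1.0 ... 1e-4]:  the last frequency reaches 1/10000
print(freqs_aiayn)  # [1.0 ... ~1e-3]: the last frequency stops short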

Reconstructed celeb_a_hq images have good edges but wrong color histogram. Why?

Hi @hojonathanho. First of all, thank you for your code.
I'm training the model on 256x256 images of the celeb_a_hq dataset taken from Kaggle.
The parameters I'm using are:

  • ema_decay=1e-10
  • optimizer='adam'
  • dataset_dir='tensorflow_datasets'
  • warmup=5000
  • num_diffusion_timesteps=1000
  • beta_start=0.0001
  • beta_end=0.02
  • grad_clip=1.
  • beta_schedule='linear'
  • randflip=1
  • batch_size=3
  • img_shape=[256,256,3]
  • model_name='unet2d16b2c112244'
  • lr=0.00002
  • loss_type='noisepred'
  • dropout=0.0
  • block_size=1

I train the model for roughly 20,000 steps, and the output is the result of using:

out = unet.model(
    x, t=t, y=y, name='model', ch=128, ch_mult=(1, 1, 2, 2, 4, 4), num_res_blocks=2, attn_resolutions=(16,),
    out_ch=out_ch, num_classes=self.num_classes, dropout=dropout
  )

The problem is that the best result I have gotten so far is the following (top: original, bottom: final result):

[images: original vs. generated result]

I always get a "blue-ish" filter on the image. I believe this is caused by the loss function: its job is to predict the noise instead of x_start, which, as you wrote, weights the objective naturally like an SNR; but in doing so the image colors shift, producing a pixel distribution similar to the noise's Gaussian:

[image: pixel-value histogram of a generated sample]

Why is this loss used even if it changes the color spectrum?
Am I missing something, like the correct way to obtain an output?
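For reference, here is how I understand the implicit weighting (a rough derivation sketch, not code from the repo): since x_0 - x0_hat = sqrt(1 - alpha_bar_t) / sqrt(alpha_bar_t) * (eps_hat - eps), an eps-prediction MSE equals an x_0-prediction MSE scaled by alpha_bar_t / (1 - alpha_bar_t), which is tiny at the most-noised timesteps, exactly where global color is decided:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # the paper's linear schedule
alpha_bar = np.cumprod(1.0 - betas)

# ||eps - eps_hat||^2 == weight[t] * ||x0 - x0_hat||^2
weight = alpha_bar / (1.0 - alpha_bar)
print(weight[0], weight[T // 2], weight[T - 1])
# roughly 1e4 at t=0 but ~4e-5 at t=T-1: x_0 errors at high noise are barely penalized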

Training on CIFAR10

I tried to reproduce DDPM on CIFAR10. As mentioned in the paper, my batch size is 128, the optimizer is Adam, the learning rate is 0.0002, and I used the L2 loss. I found that the training loss keeps fluctuating between 0.015 and 0.030. What causes this? Should I reduce the learning rate? Can you tell me what loss values you saw during training?

How to set up this project

I want to check the functionality of this code but am not able to set it up on my PC. Can somebody help me with this?

Evaluation during training?

Hey, I have a quick question! Is it possible to run the evaluation loop during training (e.g., sample images once every 100 iterations), instead of having to execute the evaluation separately?

Could you share your GPU code?

Hi, I am new to deep learning, and I recently found your excellent work. However, you didn't provide GPU code. Could you share it?

Please add a license to this repo

First, thank you for sharing this project with us!

Could you please add an explicit LICENSE file to the repo so that it's clear
under what terms the content is provided, and under what terms user
contributions are licensed?

Per GitHub docs on licensing:

[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.

Thanks!

Training epochs on different datasets

Hi, thanks for your significant work.

Could you give any suggestions on model training, such as the number of epochs or the number of GPUs?

Before that, I trained DDPM on CIFAR10 for 800k iterations (batch size 128), but the model didn't converge.

nll test

Hello,
I want to know how to compute the NLL when using a diffusion model. Can you help me?
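To be concrete about what I think is needed (a rough sketch, assuming diagonal Gaussians; the names are mine, not the repo's API): the NLL bound in bits/dim should be L_0 (the discretized Gaussian decoder of Eq. (13) in the paper) plus the per-step KL terms plus the prior KL, divided by (number of dimensions × ln 2):

import numpy as np
from scipy.stats import norm

def discretized_gaussian_logpdf(x, mu, sigma):
    # L_0 decoder: log-likelihood of 8-bit pixels scaled to [-1, 1],
    # integrating N(mu, sigma^2) over each pixel's bin of width 2/255;
    # the edge bins extend to +/- infinity (Eq. (13) in the paper).
    upper = np.where(x >= 1.0 - 1e-6, np.inf, x + 1.0 / 255.0)
    lower = np.where(x <= -1.0 + 1e-6, -np.inf, x - 1.0 / 255.0)
    probs = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    return np.log(np.maximum(probs, 1e-12))

def nll_bits_per_dim(decoder_nll_nats, kl_terms_nats, prior_kl_nats, num_dims):
    # Sum the bound's terms (all in nats, summed over dimensions),
    # then convert to bits per dimension.
    total = decoder_nll_nats + np.sum(kl_terms_nats) + prior_kl_nats
    return total / (num_dims * np.log(2.0))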
Thanks!

Trouble with Output

Does this repo output a super-resolution image?
While running the repo I got only a zip file with three files in it. Attaching the screenshot:

[screenshot]
