
diffusion's Introduction

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, Pieter Abbeel

Paper: https://arxiv.org/abs/2006.11239

Website: https://hojonathanho.github.io/diffusion

Samples generated by our model

Experiments run on Google Cloud TPU v3-8. Requires TensorFlow 1.15 and Python 3.5, and these dependencies for CPU instances (see requirements.txt):

pip3 install fire
pip3 install scipy
pip3 install pillow
pip3 install tensorflow-probability==0.8
pip3 install tensorflow-gan==0.0.0.dev0
pip3 install tensorflow-datasets==2.1.0

The training and evaluation scripts are in the scripts/ subdirectory. The commands to run training and evaluation are in comments at the top of the scripts. Data is stored in GCS buckets. The scripts are written to assume that the bucket names are of the form gs://mybucketprefix-us-central1; i.e. some prefix followed by the region. The prefix should be passed into the scripts using the --bucket_name_prefix flag.
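For example, if the bucket were gs://mybucketprefix-us-central1, a CelebA-HQ training run might be invoked as follows (my_experiment and my-tpu are hypothetical names standing in for your experiment name and Cloud TPU instance):

python3 scripts/run_celebahq.py train --bucket_name_prefix mybucketprefix --exp_name my_experiment --tpu_name my-tpu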

Models and samples can be found at: https://www.dropbox.com/sh/pm6tn31da21yrx4/AABWKZnBzIROmDjGxpB6vn6Ja

Citation

If you find our work relevant to your research, please cite:

@article{ho2020denoising,
    title={Denoising Diffusion Probabilistic Models},
    author={Jonathan Ho and Ajay Jain and Pieter Abbeel},
    year={2020},
    journal={arXiv preprint arXiv:2006.11239}
}

diffusion's People

Contributors

hojonathanho


diffusion's Issues

Rate-Distortion Computation

Thanks so much for the excellent work and for sharing the code.

May I ask how the rate and distortion in Figure 5 and Table 4 of the paper are computed, especially the rate? Preferably with some code?
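My current guess, as a rough numpy sketch (all names here are mine, not the repo's API): the rate at step t is the cumulative KL cost, in bits/dim, of the reverse steps already "transmitted", and the distortion is the RMSE of the current x_0 estimate. Is this the right reading?

import numpy as np

def gaussian_kl_bits(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) per element, converted to bits.
    kl_nats = 0.5 * (np.log(var_p / var_q)
                     + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl_nats / np.log(2.0)

def rate_distortion_point(x0, x0_pred_t, kl_bits_per_step, t):
    # kl_bits_per_step[s]: mean KL (bits/dim) of reverse step s;
    # at time t, steps T-1 .. t+1 have already been spent.
    rate = kl_bits_per_step[t + 1:].sum()
    distortion = np.sqrt(np.mean((x0 - x0_pred_t) ** 2))
    return rate, distortion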

Thanks!

Sampling algorithm differs from the paper

Hi,
I want to elaborate on #2:
The sampling algorithm in your code is a bit different from what is shown in the paper.

The paper suggests this sampling step:

[screenshot: the Algorithm 2 / Eq. (11) update from the paper]

while you do this:

[screenshot: the implementation's update, which predicts x_0 and clips it first]

The clipping is done here

x_recon = tf.clip_by_value(x_recon, -1., 1.)

Now I checked and indeed, without the clipping, the two equations are the same.
Can you give any interpretation or intuition for the clipping and why it is needed?
It seems to be crucial for training, yet it is not mentioned in the paper.
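For reference, here is the clipped update as I read it from the code, as a rough numpy sketch (function and variable names are mine, not the repo's):

import numpy as np

def p_sample_mean(x_t, eps_pred, t, betas, alphas_cumprod, alphas_cumprod_prev):
    a_t = alphas_cumprod[t]
    # Invert the forward process: estimate x_0 from the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    # The extra step in question: clamp the x_0 estimate to the data range.
    x0_pred = np.clip(x0_pred, -1.0, 1.0)
    # Posterior mean of q(x_{t-1} | x_t, x_0), Eq. (7) in the paper.
    a_prev = alphas_cumprod_prev[t]
    coef1 = betas[t] * np.sqrt(a_prev) / (1.0 - a_t)
    coef2 = (1.0 - a_prev) * np.sqrt(1.0 - betas[t]) / (1.0 - a_t)
    return coef1 * x0_pred + coef2 * x_t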

Thanks

Questions about sampling implementation

Hi, thanks for sharing this great work.

I have a question about the sampling implementation. First of all, what is the difference between diffusion_utils and diffusion_utils_2? I think diffusion_utils_2 is only used for the unconditional CIFAR10 part, but the difference seems to be larger than that.

Another question is about the difference between the current implementation and Algorithm 2 in the paper. If I understand correctly, predict_start_from_noise predicts p(x_0|x_t), and then q_posterior predicts x_{t-1} using Equation 7 in the paper. This is different compared to Algorithm 2, where Equation 11 is used. Are those two equivalent? Or is one of them better (more stable)?
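I did a quick numerical check suggesting that, without the clipping, the two routes give the same mean (a rough sketch with an illustrative schedule, not the repo's code):

import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)           # the paper's linear schedule
alphas = 1.0 - betas
a_bar = np.cumprod(alphas)
a_bar_prev = np.append(1.0, a_bar[:-1])

t = 500
x_t = rng.standard_normal(8)
eps = rng.standard_normal(8)                 # stand-in for the model output

# Route 1 -- Algorithm 2 / Eq. (11): mean directly from the noise prediction.
mean_alg2 = (x_t - betas[t] / np.sqrt(1 - a_bar[t]) * eps) / np.sqrt(alphas[t])

# Route 2 -- implementation: predict x_0, then the Eq. (7) posterior mean.
x0 = (x_t - np.sqrt(1 - a_bar[t]) * eps) / np.sqrt(a_bar[t])   # no clipping
coef1 = betas[t] * np.sqrt(a_bar_prev[t]) / (1 - a_bar[t])
coef2 = (1 - a_bar_prev[t]) * np.sqrt(alphas[t]) / (1 - a_bar[t])
mean_impl = coef1 * x0 + coef2 * x_t

assert np.allclose(mean_alg2, mean_impl)     # identical without the clip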

Thanks!

How to run training or evaluation?

I know that to run run_celebahq.py, I need to write python3 scripts/run_celebahq.py train --bucket_name_prefix $BUCKET_PREFIX --exp_name $EXPERIMENT_NAME --tpu_name $TPU_NAME or python3 scripts/run_celebahq.py evaluation --bucket_name_prefix $BUCKET_PREFIX --tpu_name $EVAL_TPU_NAME --model_dir $MODEL_DIR.

But I don't know what to write in the $BUCKET_PREFIX, $EXPERIMENT_NAME, $TPU_NAME, $EVAL_TPU_NAME, $MODEL_DIR parts. Can you give me some examples?

color convergence on custom data

Thanks for your great work! When I train DDPM on a dataset like Cityscapes, it is well known that the images of this dataset share almost the same color/style; however, the colors among generated samples are quite diverse. Even more interesting, when I adjust the U-Net to predict x_start, or when I increase the model capacity, the problem goes away. I would really appreciate any hints about this phenomenon.

question about time embedding


def get_timestep_embedding(timesteps, embedding_dim: int):
  """
  From Fairseq.
  Build sinusoidal embeddings.
  This matches the implementation in tensor2tensor, but differs slightly
  from the description in Section 3.5 of "Attention Is All You Need".
  """
  assert len(timesteps.shape) == 1  # and timesteps.dtype == tf.int32

  half_dim = embedding_dim // 2
  emb = math.log(10000) / (half_dim - 1)

I don't understand why (half_dim - 1) is used here. According to the Transformer's positional-encoding formula, it should be emb = math.log(10000) / half_dim; I don't see why 1 should be subtracted from half_dim.
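To illustrate, the difference between the two conventions is just the spacing of the geometric frequency ladder: dividing by (half_dim - 1) makes the frequencies hit both endpoints 1 and 1/10000 exactly, while dividing by half_dim stops short of the last one. A quick numpy comparison (illustrative, not from the repo):

import numpy as np

half_dim = 4
# tensor2tensor/Fairseq convention: exponents i/(half_dim - 1), i = 0..half_dim-1
freqs_t2t = np.exp(-np.log(10000) / (half_dim - 1) * np.arange(half_dim))
# "Attention Is All You Need" convention: exponents i/half_dim
freqs_aiayn = np.exp(-np.log(10000) / half_dim * np.arange(half_dim))

print(freqs_t2t)    # [1.0 ... 1e-4]:  the last frequency reaches 1/10000
print(freqs_aiayn)  # [1.0 ... ~1e-3]: the last frequency stops short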

Reconstructed celeb_a_hq images have good edges but wrong color histogram. Why?

Hi @hojonathanho. First of all, thank you for your code.
I'm training the model on 256x256 images of the celeb_a_hq dataset taken from Kaggle.
The parameters I'm using are:

  • ema_decay=1e-10
  • optimizer='adam'
  • dataset_dir='tensorflow_datasets'
  • warmup=5000
  • num_diffusion_timesteps=1000
  • beta_start=0.0001
  • beta_end=0.02
  • grad_clip=1.
  • beta_schedule='linear'
  • randflip=1
  • batch_size=3
  • img_shape=[256,256,3]
  • model_name='unet2d16b2c112244'
  • lr=0.00002
  • loss_type='noisepred'
  • dropout=0.0
  • block_size=1

I train the model for roughly 20,000 steps, and the output is the result of using:

out = unet.model(
    x, t=t, y=y, name='model', ch=128, ch_mult=(1, 1, 2, 2, 4, 4), num_res_blocks=2, attn_resolutions=(16,),
    out_ch=out_ch, num_classes=self.num_classes, dropout=dropout
  )

The problem is that the best result I have gotten so far is the following (top: original, bottom: final result):

[images: original vs. generated result]

I always get a "blue-ish" filter on the image. I believe this is caused by the loss function: its job is to predict the noise instead of x_start, which, as you wrote, weights the objective naturally like an SNR; but in doing so the image colors shift, producing a pixel distribution similar to the noise's Gaussian:

[image: pixel-value histogram of a generated sample]

Why is this loss used even if it changes the color spectrum?
Am I missing something, like the correct way to obtain an output?
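For reference, here is how I understand the implicit weighting (a rough derivation sketch, not code from the repo): since x_0 - x0_hat = sqrt(1 - alpha_bar_t) / sqrt(alpha_bar_t) * (eps_hat - eps), an eps-prediction MSE equals an x_0-prediction MSE scaled by alpha_bar_t / (1 - alpha_bar_t), which is tiny at the most-noised timesteps, exactly where global color is decided:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # the paper's linear schedule
alpha_bar = np.cumprod(1.0 - betas)

# ||eps - eps_hat||^2 == weight[t] * ||x0 - x0_hat||^2
weight = alpha_bar / (1.0 - alpha_bar)
print(weight[0], weight[T // 2], weight[T - 1])
# roughly 1e4 at t=0 but ~4e-5 at t=T-1: x_0 errors at high noise are barely penalized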

Training on CIFAR10

I tried to reproduce DDPM on CIFAR10. As mentioned in the paper, my batch size is 128, the optimizer is Adam, the learning rate is 0.0002, and I used the L2 loss. I found that the training loss keeps fluctuating between 0.015 and 0.030. What causes this? Should I reduce the learning rate? Can you tell me what loss values you saw during training?

How to set up this project

I want to check the functionality of this code but am not able to set it up on my PC. Can somebody help me with this?

Evaluation during training?

Hey, I have a quick question! Is it possible to run the evaluation loop during training (e.g., sample images once every 100 iterations), instead of having to execute the evaluation separately?

Could you share your GPU code?

Hi, I am new to deep learning, and I recently found your excellent work. However, you didn't provide GPU code. Could you share it?

Please add a license to this repo

First, thank you for sharing this project with us!

Could you please add an explicit LICENSE file to the repo so that it's clear
under what terms the content is provided, and under what terms user
contributions are licensed?

Per GitHub docs on licensing:

[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.

Thanks!

Training epochs on different datasets

Hi, thanks for your significant work.

Could you give any suggestions on model training, such as the number of epochs or the number of GPUs?

Before that, I trained DDPM on CIFAR10 for 800k iterations (batch size 128), but the model didn't converge.

nll test

Hello,
I want to know how to compute the NLL when using a diffusion model. Can you help me?
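To be concrete about what I think is needed (a rough sketch, assuming diagonal Gaussians; the names are mine, not the repo's API): the NLL bound in bits/dim should be L_0 (the discretized Gaussian decoder of Eq. (13) in the paper) plus the per-step KL terms plus the prior KL, divided by (number of dimensions × ln 2):

import numpy as np
from scipy.stats import norm

def discretized_gaussian_logpdf(x, mu, sigma):
    # L_0 decoder: log-likelihood of 8-bit pixels scaled to [-1, 1],
    # integrating N(mu, sigma^2) over each pixel's bin of width 2/255;
    # the edge bins extend to +/- infinity (Eq. (13) in the paper).
    upper = np.where(x >= 1.0 - 1e-6, np.inf, x + 1.0 / 255.0)
    lower = np.where(x <= -1.0 + 1e-6, -np.inf, x - 1.0 / 255.0)
    probs = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    return np.log(np.maximum(probs, 1e-12))

def nll_bits_per_dim(decoder_nll_nats, kl_terms_nats, prior_kl_nats, num_dims):
    # Sum the bound's terms (all in nats, summed over dimensions),
    # then convert to bits per dimension.
    total = decoder_nll_nats + np.sum(kl_terms_nats) + prior_kl_nats
    return total / (num_dims * np.log(2.0))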
Thanks!

Trouble with Output

Does this repo output a super-resolution image?
While running the repo I got only a zip file with three files in it. Attaching the screenshot:

[screenshot]
