forever208 / ddpm-ip Goto Github PK

[ICML 2023] official implementation for "Input Perturbation Reduces Exposure Bias in Diffusion Models"

License: MIT License

Python 100.00%

ddpm-ip's Introduction

DDPM-IP

This is the codebase for the ICML 2023 paper Input Perturbation Reduces Exposure Bias in Diffusion Models.
This repository is heavily based on openai/guided-diffusion, with training modification of input perturbation.

Also, feel free to check out our ICLR 2024 paper Elucidating the Exposure Bias in Diffusion Models which introduces a simple training-free solution to exposure bias. Repository: ADM-ES and EDM-ES

Simple to implement Input Perturbation in diffusion models

Our proposed Input Perturbation is an extremely simple plug-in method for general diffusion models. The implementation of Input Perturbation is just two lines of code.

For instance, based on guided-diffusion, the only code modifications are in the script guided_diffusion/gaussian_diffusion.py, in line 765-766:

new_noise = noise + gamma * th.randn_like(noise)  # gamma=0.1
x_t = self.q_sample(x_start, t, noise=new_noise)

NOTE THAT: change the parameter GPUS_PER_NODE = 4 in the script dist_util.py according to your GPU cluster configuration.

Installation

the installation is the same with guided-diffusion

git clone https://github.com/forever208/DDPM-IP.git
cd DDPM-IP
conda create -n ADM python=3.8
conda activate ADM
pip install -e .
(note that, pytorch 1.10~1.13 is recommended as our experiments in paper were done with pytorch 1.10 and pytorch 2.0 has not been tested by us in this repo)

# install the missing packages
conda install mpi4py
conda install numpy
pip install Pillow
pip install opencv-python

Download ADM-IP models and ADM base models

We have released checkpoints for the main models in the paper.

(The baseline checkpoint of ImageNet-32 and CelebA-64 are missing due to unexpected server file deletion. If you have trained the ADM base models, welcome to share the checkpoints)

Here are the download links for model checkpoints:

CIFAR10 32x32: ADM-IP.pt, ADM-baseline.pt
ImageNet 32x32: ADM-IP.pt
LSUN tower 64x64: ADM-IP.pt, ADM-baseline.pt
CelebA 64x64: ADM-IP.pt
FFHQ 128x128: ADM-IP.pt, ADM-baseline.pt
CIFAR10 32x32: DDIM-IP (NOTE THAT we use DDIM official code to do DDIM-IP training and sampling)

Sampling from pre-trained ADM-IP models

To unconditionally sample from these models, you can use the image_sample.py scripts. Sampling from DDPM-IP has no difference with sampling from openai/guided-diffusion since DDPM-IP does not change the sampling process.

For example, we sample 50k images using 100 steps from CIFAR10 by:

mpirun python scripts/image_sample.py \
--image_size 32 --timestep_respacing 100 \
--model_path PATH_TO_CHECKPOINT \
--num_channels 128 --num_head_channels 32 --num_res_blocks 3 --attention_resolutions 16,8 \
--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.3 \
--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 256 --num_samples 50000

sample 50k images using 100 steps from LSUN_tower by:

mpirun -n 1 python scripts/image_sample.py \
--image_size 64 --timestep_respacing 100 \
--model_path PATH_TO_CHECKPOINT \
--use_fp16 True --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --batch_size 256 --num_samples 50000

sample 50k images using 100 steps from FFHQ128 by:

mpirun -n 1 python scripts/image_sample.py \
--image_size 128 --timestep_respacing 100 \
--model_path PATH_TO_CHECKPOINT \
--use_fp16 True --num_channels 256 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --batch_size 128 --num_samples 50000

Results

This table summarizes our input perturbation results based on ADM baselines. Input perturbation shows tremendous training acceleration and much better FID results.

FID computation details:

All FIDs are computed using 50K generated samples (unconditional sampling).
For CIFAR10 and ImageNet 32x32, we use the whole training data as the reference batch,
For LSUN tower 64x64 and CelebA 64x64, we randomly pick up 50k samples from the training set, forming the reference batch

This table summarizes our input perturbation results based on DDIM baselines.

Prepare datasets

Please refer to README.md for the data preparation.

Training ADM-IP

Training diffusion models are described in this repository.

Training ADM-IP only requires one more argument --input perturbation 0.1 (set --input perturbation 0.0 for the baseline).

NOTE THAT: if you have problems with slurm multi-node training, try the following setting. Let's say training by 16 GPUs on 2 nodes:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:8 # 8 gpus for each node

instead of specifying mpiexec -n 16, you run by mpirun python script/image_train.py. (more discussion can be found here)

We share the complete arguments of training ADM-IP in the four datasets:

CIFAR10

mpiexec -n 2  python scripts/image_train.py --input_pertub 0.15 \
--data_dir PATH_TO_DATASET \
--image_size 32 --use_fp16 True --num_channels 128 --num_head_channels 32 --num_res_blocks 3 \
--attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 64

ImageNet 32x32 (you can also choose dropout=0.1)

mpiexec -n 4  python scripts/image_train.py --input_pertub 0.1 \
--data_dir PATH_TO_DATASET \
--image_size 32 --use_fp16 True --num_channels 128 --num_head_channels 32 --num_res_blocks 3 \
--attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 128

LSUN tower 64x64

mpiexec -n 16  python scripts/image_train.py --input_pertub 0.1 \
--data_dir PATH_TO_DATASET \
--image_size 64 --use_fp16 True --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 16

CelebA 64x64

mpiexec -n 16  python scripts/image_train.py --input_pertub 0.1 \
--data_dir PATH_TO_DATASET \
--image_size 64 --use_fp16 True --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 16

FFHQ 128x128

mpirun -n 16 python scripts/image_train.py --input_pertub 0.1 \
--data_dir PATH_TO_DATASET \
--image_size 128 --use_fp16 True --num_channels 256 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 8

Citation

If you find our work useful, please feel free to cite by

@inproceedings{ning2023input,
  title={Input Perturbation Reduces Exposure Bias in Diffusion Models},
  author={Ning, Mang and Sangineto, Enver and Porrello, Angelo and Calderara, Simone and Cucchiara, Rita},
  booktitle={International Conference on Machine Learning},
  pages={26245--26265},
  year={2023},
  organization={PMLR}
}

@article{ning2023elucidating,
  title={Elucidating the Exposure Bias in Diffusion Models},
  author={Ning, Mang and Li, Mingxiao and Su, Jianlin and Salah, Albert Ali and Ertugrul, Itir Onal},
  journal={arXiv preprint arXiv:2308.15321},
  year={2023}
}

ddpm-ip's People

Contributors

Stargazers

Watchers

Forkers

reihaneh-torkzadehmahani rzyfrank cenkbircanoglu kevinwang676 zqirui psr6275 rachelteamo mathialm piyushi-0

ddpm-ip's Issues

Line 868: noise = th.randn_like(x_start) remains unchanged

Hi, I found that in line 204 and 868 of guided_diffusion/gaussian_diffusion.py, the code noise = th.randn_like(x_start) remains unchanged. Shouldn't we have new_noise = noise + gamma * th.randn_like(noise) in line 868 as well? Or it doesn't matter. Thanks!

About using ddim50 on face dataset

I found an issue when I use ddim50 sampling aftrer training ddpm-ip on my own face dataset. The sampled images have much noise (however using ddpm50 is ok). I tried to use pre-trained celeba ckpt you provide and found the same problem.

mpiexec 4 python scripts/image_sample.py
--image_size 32 --timestep_respacing ddim50 --use_ddim True
--model_path DDPM_IP_celeba64.pt
--num_channels 192 --num_head_channels 64 --num_res_blocks 3 --attention_resolutions 32,16,8
--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.1
--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 256 --num_samples 50000

An error when using a different noise schedule

Hi, I got an error called "ValueError: only one element tensors can be converted to Python scalars" when I tried to use a different noise schedule for $\xi$. I want $\gamma_{0}, \cdots, \gamma_{T}$ to have differents values as $t$ increases, but it seems that I can't just introduce a different parameter new_nosie that is dependent on $t$. The error is shown below. Could you help me resolve this issue? Thank you!

Input perturbation by increasing the noise strength

Hi @forever208 ,
Great work! I like your observation of the inconsistency between training and sampling and propose a simple input perturbation to mitigate/align it. A question about your implementation code.

new_noise = noise + gamma * th.randn_like(noise)  # gamma=0.1

I understand this equation as increasing the noise strength because the new noise is essentially two Gaussian noises added with weight 1 and gamma. Thus the resultant new_noise is a gaussian noise scaled by (1+gemma)

If so, can I understand the input perturbation as interpolating with a larger noise?

Thanks,
Zhangzhi

Do the trick related to noise offset

https://www.crosslabs.org//blog/diffusion-with-offset-noise
@forever208

Noisy Images when training on custom dataset

When I try to train a model on my own dataset, with the command for the FFHQ 128x128 dataset I get really noisy images, even after 90000 steps:

The custom dataset I want to train on is the NCT-CRC-HE-100K dataset. The images in the dataset are 224x224, but for me 128x128 would be enough.

Do you have any ideas why I get those noisy images, instead of better quality?

Here is the command that i use for training:

mpirun -n 3 python scripts/image_train.py --input_pertub 0.1 --data_dir /home/tmp/scholuka/NCT_PNG/ADI/ --image_size 128 --use_fp16 True --num_channels 256 --num_head_channels 64 --num_res_blocks 3 --attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 8 --save_interval 1000

Here is the command that i use for generation:

mpirun -n 3 python scripts/image_sample.py --image_size 128 --timestep_respacing 100 --model_path /vol/tmp/scholuka/diffusion/DDPM-IP/openai-2024-01-01-19-40-18-680385/model099000.pt --use_fp16 True --num_channels 256 --num_head_channels 64 --num_res_blocks 3 --attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.1 --diffusion_steps 100 --noise_schedule cosine --use_scale_shift_norm True --rescale_learned_sigmas True --batch_size 128 --num_samples 50

Edit: The images in question should look like this:

Multi-node training does not work

Thanks for your good work!

I have some questions about the multi-node training.
Specificaly, I try your script (mpiexec -n 16 or mpirun ) in 2 nodes by 16 GPUs for the imagenet, but NCCL error still occurs.

Script:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:8

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

mpirun python scripts/image_train.py --data_dir /input/datasets/imagenet/train.zip --attention_resolutions 32,16,8 --class_cond True
--diffusion_steps 1000 --image_size 128 --learn_sigma True --noise_schedule linear --num_channels 256 --num_heads 4
--num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --lr 1e-4 --batch_size 8 --logger_dir '/input/guide_diffusion/image128con'

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:8

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

mpiexec -n 16 python scripts/image_train.py --data_dir /input/datasets/imagenet/train.zip --attention_resolutions 32,16,8 --class_cond True
--diffusion_steps 1000 --image_size 128 --learn_sigma True --noise_schedule linear --num_channels 256 --num_heads 4
--num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --lr 1e-4 --batch_size 8 --logger_dir '/input/guide_diffusion/image128con'

Error:

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:825, invalid usage, NCCL version 2.7.8

ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

ValueError: unsupported image size: 32

Thanks for sharing such a fantastic job.

When I try to train on CIFAR-10 (or any other dataset with image size = 32), I meet the error.

The full information is like this,

Traceback (most recent call last):
File "scripts/image_train.py", line 83, in
main()
File "scripts/image_train.py", line 27, in main
**args_to_dict(args, model_and_diffusion_defaults().keys())
File "/mnt/backup2/home/zxwang22/code/fedif/DDPM-IP/guided_diffusion/script_util.py", line 115, in create_model_and_diffusion
use_new_attention_order=use_new_attention_order,
File "/mnt/backup2/home/zxwang22/code/fedif/DDPM-IP/guided_diffusion/script_util.py", line 158, in create_model
raise ValueError(f"unsupported image size: {image_size}")
ValueError: unsupported image size: 32

I found that in 'DDPM-IP/guided_diffusion/script_util.py', L158, there is no definition when the input size equals 32.

Can you help me with this issue?

Thanks a lot!

the pretrained CelebA 64x64 model gives me noisy sample

When I extracted image samples using the pretrained CelebA 64x64 model you provided, I obtained noisy images. Is this the correct pretrained weight?

I used arguments below.

python scripts/image_sample.py
--image_size 64
--model_path PATH_TO_CHECKPOINT
--num_channels 192 --num_head_channels 64 --num_res_blocks 3 --attention_resolutions 32,16,8
--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.1
--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 10 --num_samples 10

If the above arguments are not correct, can you provide me the arguments for sampling CelebA 64 x 64 images?

How to train ADM-Baseline.pt on CiFar-10?

Thanks for the open source code, but I'm curious how to train to get ADM-Baseline on the CiFar-10 dataset. Is it just changing the perturbation to 0.0 in the settings below? I made a previous attempt when the FID was 7.14 after training 300K iters, so I would appreciate your advice. Thanks.

mpiexec -n 2  python scripts/image_train.py --input_pertub 0.15 \
--data_dir PATH_TO_DATASET \
--image_size 32 --use_fp16 True --num_channels 128 --num_head_channels 32 --num_res_blocks 3 \
--attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 64

My train setting is

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=2 --master_port=23456 scripts/image_train.py \
--data_dir ./datasets/cifar_train --lr_anneal_steps 300000 \
--image_size 32 --use_fp16 False --num_channels 128 --num_head_channels 32 --num_res_blocks 3 --save_interval 50000 \
--attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True  --schedule_sampler loss-second-moment  --lr 1e-4 --batch_size 64 --log_dir cifar

Since when I set use_fp16 True, The training loss will appear to be NaN.

could mpiexec works on a gpu?

I run the cmd 'mpiexec -n 16 python scripts/image_train.py XXXXXX' while only has one gpu. and got some error. have anybody tried it with one gpu?

FID in Cifar10.

Thank you to the author for your work. Compared with opanai's code, your code adds more details and solves many of my previous problems. Thank you. But when I used your open source ADM_cifar10_baseline.pt model, the final fid calculation result was 3.38, which is still far behind the 2.99 in your paper. Below is the script I generated. Are there any questions?

python3 scripts/image_sample.py \

--image_size 32 --timestep_respacing 100 \

--model_path ./ckpt/ADM_cifar10_baseline.pt \

--num_channels 128 --num_head_channels 32 --num_res_blocks 3 --attention_resolutions 16,8 \

--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.3 \

--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 256 --num_samples 50000

My final result is

Inception Score: 9.63854694366455

FID: 3.386766967519236

Thanks again.

TypeError: forward() missing l required positional argument: 'timesteps'

Hi, when I tried to train a model myself, I got TypeError: forward() missing l required positional argument: 'timesteps'. I tried both multi-node training and single gpu training, but I got the same error in both cases. Could you help me resolve the issue? Thank you!

Here is my code:
mpiexec -n 2 python scripts/image_train.py --input_pertub 0.15 \ --data_dir cifar_train \ --image_size 32 --use_fp16 True --num_channels 128 --num_head_channels 32 --num_res_blocks 3 \ --attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \ --learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \ --rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 64

About the modification based on ADM

I wonder if I only need to change two lines of codes in gaussian_diffusion.py to achieve ddpm-ip in my own project which is also based on ADM.

Thanks.

A way to create cifar_train.npz

Hi, I wrote cifar10_npz.py in order to create the npz file of cifar10 dateset. The code is basically the same as your celeba64_npz.py and I just modified the names in the code and set the image size to 32×32. I wonder if this method can lead to the correct npz file of cifar10 dateset since I'm not sure if the structure of the two datasets (cifar10 and celeba) is the same. Thanks again!

Here is the link to the code.

FID on cifar10

Hello, may I ask how many steps you iterated to get the ADM_IP_015.pt you provided

If this method works when the target is 'START_X' rather than 'EPSILON'?

or any other targets work?

Something Wrong in imagenet32_npz.py

I'm sorry to bother you again, but I'm getting an error when making an npz file of Imagenet32 data!
https://github.com/forever208/DDPM-IP/blob/DDPM-IP/datasets/imagenet32_npz.py#L19

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1024 and the array at index 2 has size 10240

I found the dimension of x[:, 2 * 1024:] to be (128116, 10240), which doesn't seem to match the previous one.

Reference batch for calculating FID of ImageNet

Hi,

Can you share your reference batch for calculating the imagenet-32 FID score? Thanks.

How to train ADM-IP.pt on CIFAR10

Thank you to the author for your work,but I'm curious how to train to get ADM-IP on the CiFar-10 dataset. The FID values I obtained during iterations were significantly higher than the ones reported in your paper.My results are at 70k iterations, the FID was 3.66, at 230k iterations the FID was 10.61, at 460k iterations the FID reached 17.17,Below is the script I generated. Are there any questions?so I would appreciate your advice. Thanks.
mpiexec -n 4 python scripts/image_train.py --input_pertub 0.15 \ --data_dir datasets/cifar_train \ --image_size 32 --use_fp16 True --num_channels 128 --num_head_channels 32 --num_res_blocks 3 \ --attention_resolutions 16,8 --resblock_updown True --use_new_attention_order True \ --learn_sigma True --dropout 0.3 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \ --rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 64

train_data_batch in imagenet32 is not npz file

Hi, the 10 train_data_batch in Imagenet32_train.zip downloaded online is not npz file, so the code (line 2) in datasets/imagenet32_npz.py doesn't work. I tried to modify the code to make it work. Here are two modifications I make: filename = './ImageNet32/train_data_batch_' + str(i) and raw_data = np.load(filename, allow_pickle=True). Please doulbe check my code. Thanks!

A discrepancy between the table in the paper and the figure in the repo

Hi, I found that in your paper (Table 3) the FID scores of CIFAR10 32×32 are all above 3, but in your repo the FID score is less than 2.5. So I wonder what the FID score of CIFAR10 32×32 is exactly.

Cannot find the output of sampling

Hi, I've finished sampling from the pretrained model, but I don't see the output npz file. There is no folder called "/tmp/openai..." in my interface. I wonder how I can find the npz file. Thanks!

This is what I got:

ModuleNotFoundError: No module named 'guided_diffusion'

mpirun python3 scripts/image_sample.py
--image_size 32 --timestep_respacing 100
--model_path PATH_TO_CHECKPOINT
--num_channels 128 --num_head_channels 32 --num_res_blocks 3 --attention_resolutions 16,8
--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.3
--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 256 --num_samples 50000

你好，我想问加上这个input_pertub之后是否会影响加载之前训练过的checkpoint呢？

例如guided-diffusion给出的pt文件。

另外我想请问readme里面没有给出256的训练，请问是在256*256大小训练效果不是很好吗？
谢谢

CelebA dataset

Hi I'm using the following code for celebA data generation, but the result is very poor with fid>10, am I wrong somewhere?

torchrun --nproc_per_node=8 --master_port=33456 scripts/image_sample.py \
--image_size 64 --timestep_respacing 100 \
--model_path ./ckpt/DDPM_IP_celeba64.pt \
--use_fp16 False --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --batch_size 256 --num_samples 50000  --sample_dir ./celeba_sample

somebody update 256 x 256 CelebA pretrained with ip?

A Theoretical Question

In the diffusion model papers, we all assume the real image $\textbf{x}_0 \sim q(\textbf{x}_0)$, but I haven't seen an exact definition of $q(\textbf{x}_0)$. So I wonder what exactly $q(\textbf{x}_0)$ is. Is it the distribution function of $\textbf{x}_0$? If so, how do we calculate the distribution function of a single image? Thank you!

A Error “mpirun: command not found”

I got an error named “mpirun: command not found” when I ran the code below:

And here is my attempt to sample from the pre-trained ADM-IP model (I put DDPM_IP_cifar10.pt under the “model” folder):

Could you tell me which step I did wrong? Thanks for your time!

FID on CelebA 64x64

Would you like to share how to generate .npz file for evaluating performance on CelebA 64x64? When evaluating CelebA 64x64, you claimed in paper that you used full training set but you said you randomly picked up 50k images from training set on github readme, which is confusing. Can you show more details?

Here is how I did.

First of all, I generate .npz file as reference:

import numpy as np
from PIL import Image
import random
import math
import blobfile as bf
from tqdm import tqdm


def _list_image_files_recursively(data_dir):
    results = []
    for entry in sorted(bf.listdir(data_dir)):
        full_path = bf.join(data_dir, entry)
        ext = entry.split(".")[-1]
        if "." in entry and ext.lower() in ["jpg", "jpeg", "png", "gif"]:
            results.append(full_path)
        elif bf.isdir(full_path):
            results.extend(_list_image_files_recursively(full_path))
    return results


def center_crop_arr(pil_image, image_size):
    # We are not on a new enough PIL to support the `reducing_gap`
    # argument, which uses BOX downsampling at powers of two first.
    # Thus, we do it by hand to improve downsample quality.
    while min(*pil_image.size) >= 2 * image_size:
        pil_image = pil_image.resize(
            tuple(x // 2 for x in pil_image.size), resample=Image.BOX
        )

    scale = image_size / min(*pil_image.size)
    pil_image = pil_image.resize(
        tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC
    )

    arr = np.array(pil_image)
    crop_y = (arr.shape[0] - image_size) // 2
    crop_x = (arr.shape[1] - image_size) // 2
    return arr[crop_y : crop_y + image_size, crop_x : crop_x + image_size]

local_images = _list_image_files_recursively("./img_align_celeba")
len(local_images)

num_samples = 50_000
random_indices = random.sample(range(len(local_images)), num_samples)
assert len(set(random_indices)) == num_samples
resolution = 64
arrs = []
for idx in tqdm(random_indices):
    path = local_images[idx]
    with bf.BlobFile(path, "rb") as f:
        pil_image = Image.open(f)
        pil_image.load()
    pil_image = pil_image.convert("RGB")

    arr = center_crop_arr(pil_image, resolution)

    arrs.append(arr)

arrs = np.stack(arrs)
np.savez("celeba64_50k.npz", arrs)

Then I run the evaluation script:

python evaluations/evaluator.py celeba64_50k.npz 1000t_50000x64x64x3.npz

Here 1000t_50000x64x64x3.npz stores samples generated by your CelebA 64x64 ADM-IP.pt using 1000 steps. The result is:

Inception Score: 3.248073101043701
FID: 9.09117351570518
sFID: 37.492244654769365
Precision: 0.45012
Recall: 0.61274

Can you help me figure out where I did wrong? I will be grateful about that.