
diffusion-models-pytorch's Introduction

Diffusion Models

This is an easy-to-understand implementation of diffusion models within 100 lines of code. Different from other implementations, this code doesn't use the lower-bound formulation for sampling and strictly follows Algorithm 1 from the DDPM paper, which makes it extremely short and easy to follow. There are two implementations: conditional and unconditional. Furthermore, the conditional code also implements Classifier-Free Guidance (CFG) and Exponential Moving Average (EMA). Below you can find two explanation videos for the theory behind diffusion models and the implementation.


Train a Diffusion Model on your own data:

Unconditional Training

  1. (optional) Configure hyperparameters in ddpm.py
  2. Set the path to your dataset in ddpm.py (see the sketch below)
  3. python ddpm.py
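
Both steps come down to editing a few values in ddpm.py. Roughly, the settings look like this (a sketch only; the field names besides dataset_path are assumptions, and the values are just illustrative):

    import argparse

    # Typical settings to adjust before training; point dataset_path at your image folder.
    args = argparse.Namespace()
    args.image_size = 64          # images are resized/cropped to this resolution
    args.batch_size = 12
    args.epochs = 500
    args.lr = 3e-4
    args.device = "cuda"
    args.dataset_path = r"/path/to/your/image_folder"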

Conditional Training

  1. (optional) Configure Hyperparameters in ddpm_conditional.py
  2. Set path to dataset in ddpm_conditional.py
  3. python ddpm_conditional.py

Sampling

The following examples show how to sample images using the models trained in the video on the Landscape Dataset. You can download the checkpoints for the models here.

Unconditional Model

    device = "cuda"
    model = UNet().to(device)
    ckpt = torch.load("unconditional_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    x = diffusion.sample(model, n=16)
    plot_images(x)
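
If you also want to write the samples to disk, one option (outside the repo's own helpers, so treat it as a sketch) is torchvision; sample() returns a uint8 tensor in [0, 255], so it is scaled back to floats first:

    from torchvision.utils import save_image

    # x is the (16, 3, 64, 64) uint8 batch returned by diffusion.sample above.
    save_image(x.float() / 255.0, "samples.jpg", nrow=4)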

Conditional Model

This model was trained on CIFAR-10 64x64 with 10 classes: airplane:0, auto:1, bird:2, cat:3, deer:4, dog:5, frog:6, horse:7, ship:8, truck:9.

    import torch
    # Imports assume the repo layout (UNet_conditional in modules.py, Diffusion in ddpm_conditional.py, plot_images in utils.py).
    from modules import UNet_conditional
    from ddpm_conditional import Diffusion
    from utils import plot_images

    n = 10
    device = "cuda"
    model = UNet_conditional(num_classes=10).to(device)
    ckpt = torch.load("conditional_ema_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    y = torch.Tensor([6] * n).long().to(device)   # sample 10 images of class 6 (frog)
    x = diffusion.sample(model, n, y, cfg_scale=3)
    plot_images(x)
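
The cfg_scale argument controls classifier-free guidance at sampling time: larger values weight the class-conditional noise prediction more heavily relative to the unconditional one. The mixing is reportedly done with torch.lerp; a minimal illustration of that mixing (not a copy of the repo's sampler):

    import torch

    # lerp extrapolates past the conditional prediction when the weight is greater than 1.
    uncond_eps = torch.randn(4, 3, 8, 8)   # model(x, t) without a class label
    cond_eps = torch.randn(4, 3, 8, 8)     # model(x, t, y) with a class label
    cfg_scale = 3.0
    guided_eps = torch.lerp(uncond_eps, cond_eps, cfg_scale)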

A more advanced version of this code by @tcapelle can be found here. It introduces better logging, faster and more efficient training, and other nice features, and is accompanied by a nice write-up.

diffusion-models-pytorch's People

Contributors

dome272

diffusion-models-pytorch's Issues

Training generates images with full red output

While training the unchanged model on a different dataset (portraits of faces), I am getting a bunch of all-red outputs.
I also changed the code to train on the same dataset converted to grayscale before training, and I still get monochrome outputs, but this time they are either all white or all black.
Has anyone had the same issue? Is there something I can do to prevent this?

The model learns very badly on CIFAR-10 32x32

Hello! Firstly, thank you for the awesome video and code which explain so well how the diffusion models are implemented in code.

I ran into some issues while reimplementing your code. I'm wondering if you could give me some advice on how to make it work.
I tried to use your code to generate conditional CIFAR-10 (32x32 resolution), but so far the results look kind of bad. In my training, I changed the UNet input and output size to 32, the corresponding resolution in self-attention, and the batch size to 256. The number of down and up blocks, as well as the bottleneck layers, were kept the same as in your original setting. After training for 400 epochs, the generated images were almost pure color.

Then I tried adding a warmup schedule that ramps the learning rate up to 10 times the original lr (1e-4) over the first 1/10 of the epochs, followed by cosine annealing for the remaining epochs, and trained for 1800 epochs. But the final results still look the same as before.
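
For reference, a warmup-then-cosine schedule of the shape described above can be put together with a LambdaLR in plain PyTorch (a generic sketch with illustrative numbers, not the exact schedule used in this experiment):

    import math

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    total_epochs, warmup_epochs, peak_mult = 1800, 180, 10.0   # 1/10 warmup, 10x peak lr
    model = torch.nn.Linear(8, 8)                              # stand-in for the UNet
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                              # linear warmup from 1x to 10x base lr
            return 1.0 + (peak_mult - 1.0) * epoch / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return peak_mult * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay from the peak

    scheduler = LambdaLR(optimizer, lr_lambda)
    for epoch in range(total_epochs):
        # ... one training epoch would go here ...
        scheduler.step()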

Do you have any ideas on what might be wrong with my reimplementation? I would really appreciate any insights.

The model learns very badly

Good model outputs are very often mixed in with random-noise outputs, and getting good results is not consistent even after many hours of training.
Has anyone found a solution, i.e. what needs to be changed in the model?

Non-square images

What would I need to change if I don't want square images, but for example 128x64 pixels?

Getting shape mismatch error when using a different input

I am getting this error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x128 and 256x128)

when using the code below, which is basically a random array of size 32x32x32 with 128 timesteps:

    import numpy as np
    import torch
    from ddpm import Diffusion   # imports assume the repo layout
    from modules import UNet

    size = 32
    device_val = "cuda"
    # random (1, 32, 32, 32) input and a batch of 128 sampled timesteps
    x_input = torch.tensor(np.random.rand(1, size, size, size)).to(device_val).type(torch.cuda.FloatTensor)
    diffusion = Diffusion(img_size=size, device=device_val)
    t = diffusion.sample_timesteps(128).to(device_val)
    model = UNet().to(device_val)
    xmodel = model(x_input, t)

In addition to that, could you please explain a bit more how the time embedding works, especially in terms of its dimensions?

images 16:9

I want to generate panoramic images, but only square images are implemented in the code. How can I solve this?
Thanks in advance!

I don't understand the function def noise_images(self, x, t)

Assume that we have img_{0}, img_{1}, ..., img_{T}, which are obtained by adding noise iteratively. I understand that img_{t} is given by the formula "sqrt_alpha_hat * img_{0} + sqrt_one_minus_alpha_hat * Ɛ".

However, I don't understand the function "def noise_images(self, x, t)" in [ddpm.py].

It returns Ɛ, where Ɛ = torch.randn_like(x). So this is just a noise signal drawn directly from the normal distribution. I suppose this random noise is not related to the input image? That is because randn_like() returns a tensor with the same size as the input x, filled with random numbers from a normal distribution with mean 0 and variance 1.
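
For reference, the function being discussed returns both the noised image and the noise that was mixed into it; paraphrased roughly from ddpm.py (not a verbatim copy):

    import torch

    def noise_images(self, x, t):
        # x_t = sqrt(alpha_hat_t) * x_0 + sqrt(1 - alpha_hat_t) * eps, with eps ~ N(0, I)
        sqrt_alpha_hat = torch.sqrt(self.alpha_hat[t])[:, None, None, None]
        sqrt_one_minus_alpha_hat = torch.sqrt(1 - self.alpha_hat[t])[:, None, None, None]
        eps = torch.randn_like(x)   # fresh Gaussian noise, independent of the image content
        return sqrt_alpha_hat * x + sqrt_one_minus_alpha_hat * eps, eps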

In training, the predicted noise is compared to this Ɛ (line 80 in [ddpm.py]).

Why are we predicting this random noise? Shouldn't we predict the noise added at time t, i.e. "img_{t} - img_{t-1}"?

Hi, I have added many new features to your code

I want to express my gratitude for providing such a detailed explanation of DDPM. I have made numerous improvements and added functionalities to your code, including the following changes:

  1. Added support for custom image sizes.
  2. Included code segments for DDIM.
  3. Enhanced the generator and trainer methods for improved usability.
  4. Implemented cosine and warmup cosine learning rate schedules.
  5. Added support for distributed training, allowing for larger models to be trained across multiple GPUs.
  6. Introduced custom seeds for improved reproducibility.
  7. Enabled half-precision training to reduce GPU memory usage.
  8. Provided the ability to choose different optimizers.
  9. Added the option to select different activation functions for training.
  10. Implemented a training checkpoint recovery feature, allowing training to resume from an interruption point.
  11. Added support for server deployment, enabling access through an interface.
  12. Refactored the code to make it more modular.

I will continue to build upon your work by introducing more optimizations and new features. Thank you for your contributions to the community.

Project Repository: https://github.com/chairc/Industrial-Defect-Diffusion-Model

the shared 3 checkpoints

Did you train the shared 3 checkpoints from scratch? There are only 4000+ images in the landscape dataset. However, you got very good results on it. So, did you use any pre-trained weights to get those 3 shared checkpoints?

model generating bad random images

I trained my diffusion model in TensorFlow based on this implementation, and after training for 450 epochs (on the landscape dataset) my loss was around 0.015 (MSE). I generated a few samples (1000 time steps) and they were very bad or random.

I just want to know: is this a training issue, i.e. does my model need more training to further reduce the loss (currently 0.015), or is the problem in the sampling technique? Can anyone help me please?

img2img

Does anyone have experience with img2img using diffusion models?

Conditional Unet using dataset with one scalar label

Hello,

I am PhD candidate at SNU.

Your repo helped me really well in understanding DDPM.

And I want to change this code to train on images labeled by a single scalar value.

Is there any repository or comment of yours that I could reference?

Thank you very much.

Segmentation fault (core dumped)

Hi!

Thank you for such great videos and the code!

I happen to get a "segmentation fault (core dumped)" error:
"The segfault happens while pytorch was trying to raise a Type Error when constructing a Tensor."
Do you know where in the code it might happen? It's weird that this error did not show up earlier, though.

dataset

First, thank you for the useful video and code.
Can you share the dataset that you use in the code? I want to apply the code to CIFAR-10 but failed, so I would like to have a look at the dataset structure used in the code:

    args.dataset_path = r"C:\Users\dome\datasets\landscape_img_folder"

i.e. the complete contents of "landscape_img_folder".
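
If the data loader follows the standard torchvision ImageFolder convention (an assumption here, since the loader lives in utils.py), the dataset folder needs at least one class subdirectory containing the images:

    import torchvision
    from torchvision import transforms

    # Assumed layout: landscape_img_folder/landscape/0001.jpg, 0002.jpg, ...
    # (ImageFolder requires at least one class subfolder, even for unconditional training.)
    dataset = torchvision.datasets.ImageFolder(
        r"C:\Users\dome\datasets\landscape_img_folder",
        transform=transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()]),
    )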

Question regarding DoubleConv

Any reason why there is no GELU in the else clause (when self.residual is False)? I'm a bit puzzled.

    class DoubleConv(nn.Module):

        def forward(self, x):
            if self.residual:
                return F.gelu(x + self.double_conv(x))
            else:
                return self.double_conv(x)
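
For context, double_conv itself typically stacks two convolution + GroupNorm pairs with a GELU between them; a hypothetical reconstruction of the block (the exact layers may differ from modules.py) makes the question concrete: the else branch returns the second normalization's output without any activation.

    import torch.nn as nn
    import torch.nn.functional as F

    class DoubleConv(nn.Module):
        # Hypothetical __init__, reconstructed for illustration; not copied from modules.py.
        def __init__(self, in_channels, out_channels, mid_channels=None, residual=False):
            super().__init__()
            self.residual = residual
            mid_channels = mid_channels or out_channels
            self.double_conv = nn.Sequential(
                nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
                nn.GroupNorm(1, mid_channels),
                nn.GELU(),
                nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1, bias=False),
                nn.GroupNorm(1, out_channels),
            )

        def forward(self, x):
            if self.residual:
                return F.gelu(x + self.double_conv(x))   # activation applied to the residual sum
            else:
                return self.double_conv(x)               # no activation here, hence the question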

requirements.txt

Hi There,

Would it be possible for you to share a requirements.txt or environment.yaml file here in the repo?

Thanks

Confusion about the 'sample' function

    def sample(self, model, n):
        logging.info(f"Sampling {n} new images....")
        model.eval()
        with torch.no_grad():
            # start from pure Gaussian noise and iteratively denoise it with the trained model
            x = torch.randn((n, 3, self.img_size, self.img_size)).to(self.device)
            for i in tqdm(reversed(range(1, self.noise_steps)), position=0):
                t = (torch.ones(n) * i).long().to(self.device)
                predicted_noise = model(x, t)
                alpha = self.alpha[t][:, None, None, None]
                alpha_hat = self.alpha_hat[t][:, None, None, None]
                beta = self.beta[t][:, None, None, None]
                if i > 1:
                    noise = torch.randn_like(x)
                else:
                    noise = torch.zeros_like(x)  # no noise is added at the final step
                x = 1 / torch.sqrt(alpha) * (x - ((1 - alpha) / (torch.sqrt(1 - alpha_hat))) * predicted_noise) + torch.sqrt(beta) * noise
        model.train()
        x = (x.clamp(-1, 1) + 1) / 2  # map from [-1, 1] back to [0, 1]
        x = (x * 255).type(torch.uint8)
        return x

Can anyone explain this function? In the line 'x = torch.randn((n, 3, self.img_size, self.img_size)).to(self.device)', you create a random image, and then from that image you predict the noise (i.e. predicted_noise = model(x, t)). Are you trying to create an image from a random tensor??

Increase image_size for training

Thanks for your great work!!

The code can be trained with image_size = 64, and the sampling results are OK.
However, when I try to use a larger size for training (e.g. image_size = 128), the code cannot be trained.

Could you share how to resolve this problem? I appreciate your help very much~

Some details

I've noticed that other DDPM implementations have another parameter called 'alpha_cumprod_prev' or something similar, which prepends a 1. to the front of alpha_cumprod. Why don't you use this, or have you handled it in a simpler way?
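
For context, the parameter in question is usually defined like this in other DDPM codebases (an illustrative sketch, not code from this repo):

    import torch

    # Other implementations typically keep the cumulative product shifted right by one step,
    # with 1.0 prepended, to compute the posterior variance
    # beta_tilde_t = beta_t * (1 - alpha_hat_{t-1}) / (1 - alpha_hat_t).
    beta = torch.linspace(1e-4, 0.02, 1000)
    alpha_hat = torch.cumprod(1.0 - beta, dim=0)
    alpha_hat_prev = torch.cat([torch.ones(1), alpha_hat[:-1]])
    posterior_variance = beta * (1.0 - alpha_hat_prev) / (1.0 - alpha_hat)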

DDIM sampling

Hi,

Thank you for the great videos on diffusion models.

Have you tried sampling using DDIM with one of your trained models? If so, were the results good and as expected?

Problem in ddpm -> sampled_images

Hey there,
very nice code and a good YouTube tutorial!
I had a little trouble using the self-attention layers with other image sizes, but that isn't the problem right now; I just commented them out and the first epoch works. But after that, in ddpm.py -> sampled_images = diffusion.sample(model, n=images.shape[0]), I get a dimension error depending on the batch size... I'm confused, maybe someone has an idea; I think I must have missed changing a size somewhere.
Error (Batch size 4):

    Traceback (most recent call last):
      File "***/Sync_v01/ddpm.py", line 120, in <module>
        launch()
      File "***/Sync_v01/ddpm.py", line 117, in launch
        train(args)
      File "***/Sync_v01/ddpm.py", line 99, in train
        sampled_images = diffusion.sample(model, n=images.shape[0])
      File "***/Sync_v01/ddpm.py", line 53, in sample
        predicted_noise = model(x, t)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 164, in forward
        x1 = self.inc(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 75, in forward
        return self.double_conv(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
        input = module(input)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 4, 3, 64] to have 3 channels, but got 4 channels instead

Error (Batch size 3):

    Traceback (most recent call last):
      File "***/Sync_v01/ddpm.py", line 120, in <module>
        launch()
      File "***/Sync_v01/ddpm.py", line 117, in launch
        train(args)
      File "***/Sync_v01/ddpm.py", line 99, in train
        sampled_images = diffusion.sample(model, n=images.shape[0])
      File "***/Sync_v01/ddpm.py", line 53, in sample
        predicted_noise = model(x, t)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 164, in forward
        x1 = self.inc(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 75, in forward
        return self.double_conv(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
        input = module(input)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 273, in forward
        return F.group_norm(
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/functional.py", line 2530, in group_norm
        return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
    RuntimeError: Expected weight to be a vector of size equal to the number of channels in input, but got weight of shape [64] and input of shape [64, 3, 64]

Thanks everyone :)

Time embedding

Hi! Great code, thanks for sharing!

I noticed something a little bit weird in this code. Is there any reason why you chose to use SiLU() right after the sinusoidal embedding?
It seems unnatural, as it might change the desired properties of the embedding.

Maybe you meant to use a learnable projection of the embedding, e.g. by adding this to the U-Net:

    self.time_embed = nn.Sequential(
        nn.Linear(time_dim, time_dim),
        nn.SiLU(),
        nn.Linear(time_dim, time_dim),
    )

And also changing the forward pass by adding:

    def forward(self, x, t):
        t = t.unsqueeze(-1).type(torch.float)
        t = self.pos_encoding(t, self.time_dim)
        t = self.time_embed(t)

Under these conditions, the SiLU() activations for the per-block projections make sense, being just the activation of the learned embedding.
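
For readers following along, the sinusoidal embedding under discussion is the standard transformer-style encoding of the scalar timestep; a generic sketch (illustrative, not necessarily the exact code in modules.py):

    import torch

    def pos_encoding(t, channels):
        # t: (batch, 1) float tensor of timesteps; returns a (batch, channels) embedding
        # built from sines and cosines at geometrically spaced frequencies.
        inv_freq = 1.0 / (10000 ** (torch.arange(0, channels, 2).float() / channels))
        pos_enc_a = torch.sin(t.repeat(1, channels // 2) * inv_freq)
        pos_enc_b = torch.cos(t.repeat(1, channels // 2) * inv_freq)
        return torch.cat([pos_enc_a, pos_enc_b], dim=-1)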

RuntimeError running the model

First of all, love your video.

Trying to run the unconditional model gives me this funny error:

File "f:\Coding Projects\ai\diffusinon-test\modules.py", line 99, in forward
return x + emb
RuntimeError: The size of tensor a (192) must match the size of tensor b (12) at non-singleton dimension 0

Do you know how to fix it?

Training Time

Hi. I'm a newbie studying diffusion models.
First, thank you for the interesting video and code.

I have some questions about the code and implementation.

  1. Is conditional image generation possible for 32x32 CIFAR-10 images? If so, is it enough to adjust the UNet structure and various parameters? After training and inference, the images are not well generated. If there is a reason, I would like to know what it is.

  2. Can you tell me how long the training takes when you train on 64x64?

Thank you.

Is the CFG implementation right?

In the paper "Classifier-Free Diffusion Guidance", ε_t = (1 + w) * ε_θ(z_t, c) - w * ε_θ(z_t).
But the code's ε_t = lerp(ε_θ(z_t), ε_θ(z_t, c), w) = (1 - w) * ε_θ(z_t) + w * ε_θ(z_t, c)?
Am I right? Are those two the same?
Just raising this as an issue.
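
For what it's worth, the two expressions agree up to how the guidance weight is parameterized: lerp(u, c, s) = u + s * (c - u) = (1 - s) * u + s * c, which equals the paper's (1 + w) * c - w * u when s = 1 + w. A quick numerical check (illustrative, not code from the repo):

    import torch

    # lerp(u, c, s) = (1 - s) * u + s * c, which equals (1 + w) * c - w * u for s = 1 + w.
    u, c = torch.randn(4), torch.randn(4)   # unconditional / conditional noise predictions
    w = 2.0
    s = 1.0 + w
    assert torch.allclose(torch.lerp(u, c, s), (1.0 + w) * c - w * u)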
