
diffusion-models-pytorch's Introduction

Diffusion Models

This is an easy-to-understand implementation of diffusion models within 100 lines of code. Different from other implementations, this code doesn't use the lower-bound formulation for sampling and strictly follows Algorithm 1 from the DDPM paper, which makes it extremely short and easy to follow. There are two implementations: conditional and unconditional. Furthermore, the conditional code also implements Classifier-Free Guidance (CFG) and Exponential Moving Average (EMA). Below you can find two explanation videos for the theory behind diffusion models and the implementation.


Train a Diffusion Model on your own data:

Unconditional Training

  1. (optional) Configure hyperparameters in ddpm.py
  2. Set the path to your dataset in ddpm.py (see the sketch below)
  3. python ddpm.py
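
Both steps come down to editing a few values in ddpm.py. Roughly, the settings look like this (a sketch only; the field names besides dataset_path are assumptions, and the values are just illustrative):

    import argparse

    # Typical settings to adjust before training; point dataset_path at your image folder.
    args = argparse.Namespace()
    args.image_size = 64          # images are resized/cropped to this resolution
    args.batch_size = 12
    args.epochs = 500
    args.lr = 3e-4
    args.device = "cuda"
    args.dataset_path = r"/path/to/your/image_folder"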

Conditional Training

  1. (optional) Configure Hyperparameters in ddpm_conditional.py
  2. Set path to dataset in ddpm_conditional.py
  3. python ddpm_conditional.py

Sampling

The following examples show how to sample images using the models trained in the video on the Landscape Dataset. You can download the checkpoints for the models here.

Unconditional Model

    device = "cuda"
    model = UNet().to(device)
    ckpt = torch.load("unconditional_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    x = diffusion.sample(model, n=16)
    plot_images(x)
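
If you also want to write the samples to disk, one option (outside the repo's own helpers, so treat it as a sketch) is torchvision; sample() returns a uint8 tensor in [0, 255], so it is scaled back to floats first:

    from torchvision.utils import save_image

    # x is the (16, 3, 64, 64) uint8 batch returned by diffusion.sample above.
    save_image(x.float() / 255.0, "samples.jpg", nrow=4)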

Conditional Model

This model was trained on CIFAR-10 64x64 with 10 classes: airplane:0, auto:1, bird:2, cat:3, deer:4, dog:5, frog:6, horse:7, ship:8, truck:9.

    import torch
    # Imports assume the repo layout (UNet_conditional in modules.py, Diffusion in ddpm_conditional.py, plot_images in utils.py).
    from modules import UNet_conditional
    from ddpm_conditional import Diffusion
    from utils import plot_images

    n = 10
    device = "cuda"
    model = UNet_conditional(num_classes=10).to(device)
    ckpt = torch.load("conditional_ema_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    y = torch.Tensor([6] * n).long().to(device)   # sample 10 images of class 6 (frog)
    x = diffusion.sample(model, n, y, cfg_scale=3)
    plot_images(x)
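
The cfg_scale argument controls classifier-free guidance at sampling time: larger values weight the class-conditional noise prediction more heavily relative to the unconditional one. The mixing is reportedly done with torch.lerp; a minimal illustration of that mixing (not a copy of the repo's sampler):

    import torch

    # lerp extrapolates past the conditional prediction when the weight is greater than 1.
    uncond_eps = torch.randn(4, 3, 8, 8)   # model(x, t) without a class label
    cond_eps = torch.randn(4, 3, 8, 8)     # model(x, t, y) with a class label
    cfg_scale = 3.0
    guided_eps = torch.lerp(uncond_eps, cond_eps, cfg_scale)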

A more advanced version of this code by @tcapelle can be found here. It introduces better logging, faster and more efficient training, and other nice features, and is accompanied by a nice write-up.

diffusion-models-pytorch's People

Contributors

dome272

diffusion-models-pytorch's Issues

Training generates images with full red output

While training the unchanged model on a different dataset (portraits of faces), I am getting a bunch of all-red outputs.
I also changed the code to train on the same dataset converted to grayscale before training, and I still get monochrome outputs, but this time they are either all white or all black.
Has anyone had the same issue? Is there something I can do to prevent this?

The model learns very badly on CIFAR-10 32x32

Hello! Firstly, thank you for the awesome video and code which explain so well how the diffusion models are implemented in code.

I ran into some issues while reimplementing your code. I'm wondering if you could give me some advice on how to make it work.
I tried to use your code to generate conditional CIFAR-10 (32x32 resolution), but so far the results look kind of bad. In my training, I changed the UNet input and output size to 32, the corresponding resolution in self-attention, and the batch size to 256. The number of down and up blocks, as well as the bottleneck layers, were kept the same as in your original setting. After training for 400 epochs, the generated images were almost pure color.

Then I tried adding a warmup schedule that ramps the learning rate up to 10 times the original lr (1e-4) over the first 1/10 of the epochs, followed by cosine annealing for the remaining epochs, and trained for 1800 epochs. But the final results still look the same as before.
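
For reference, a warmup-then-cosine schedule of the shape described above can be put together with a LambdaLR in plain PyTorch (a generic sketch with illustrative numbers, not the exact schedule used in this experiment):

    import math

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    total_epochs, warmup_epochs, peak_mult = 1800, 180, 10.0   # 1/10 warmup, 10x peak lr
    model = torch.nn.Linear(8, 8)                              # stand-in for the UNet
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                              # linear warmup from 1x to 10x base lr
            return 1.0 + (peak_mult - 1.0) * epoch / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return peak_mult * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay from the peak

    scheduler = LambdaLR(optimizer, lr_lambda)
    for epoch in range(total_epochs):
        # ... one training epoch would go here ...
        scheduler.step()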

Do you have any ideas on what might be wrong with my reimplementation? I would really appreciate any insights.

The model learns very badly

Good model outputs are very often mixed in with random-noise outputs, and getting good results is not consistent even after many hours of training.
Has anyone found a solution, i.e. what needs to be changed in the model?

Non-square images

What would I need to change if I don't want square images, but for example 128x64 pixels?

Getting shape mismatch error when using a different input

I am getting this error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x128 and 256x128)

when using the code below, which is basically a random array of size 32x32x32 with 128 timesteps:

    import numpy as np
    import torch
    from ddpm import Diffusion   # imports assume the repo layout
    from modules import UNet

    size = 32
    device_val = "cuda"
    # random (1, 32, 32, 32) input and a batch of 128 sampled timesteps
    x_input = torch.tensor(np.random.rand(1, size, size, size)).to(device_val).type(torch.cuda.FloatTensor)
    diffusion = Diffusion(img_size=size, device=device_val)
    t = diffusion.sample_timesteps(128).to(device_val)
    model = UNet().to(device_val)
    xmodel = model(x_input, t)

In addition to that, could you please explain a bit more how the time embedding works, especially in terms of its dimensions?

images 16:9

I want to generate panoramic images, but only square images are implemented in the code. How can I solve this?
Thanks in advance!

I don't understand the function def noise_images(self, x, t)

Assume that we have img_{0}, img_{1}, ..., img_{T}, which are obtained by adding noise iteratively. I understand that img_{t} is given by the formula "sqrt_alpha_hat * img_{0} + sqrt_one_minus_alpha_hat * Ɛ".

However, I don't understand the function "def noise_images(self, x, t)" in [ddpm.py].

It returns Ɛ, where Ɛ = torch.randn_like(x). So this is just a noise signal drawn directly from the normal distribution. I suppose this random noise is not related to the input image? That is because randn_like() returns a tensor with the same size as the input x, filled with random numbers from a normal distribution with mean 0 and variance 1.
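
For reference, the function being discussed returns both the noised image and the noise that was mixed into it; paraphrased roughly from ddpm.py (not a verbatim copy):

    import torch

    def noise_images(self, x, t):
        # x_t = sqrt(alpha_hat_t) * x_0 + sqrt(1 - alpha_hat_t) * eps, with eps ~ N(0, I)
        sqrt_alpha_hat = torch.sqrt(self.alpha_hat[t])[:, None, None, None]
        sqrt_one_minus_alpha_hat = torch.sqrt(1 - self.alpha_hat[t])[:, None, None, None]
        eps = torch.randn_like(x)   # fresh Gaussian noise, independent of the image content
        return sqrt_alpha_hat * x + sqrt_one_minus_alpha_hat * eps, eps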

In training, the predicted noise is compared to this Ɛ (line 80 in [ddpm.py]).

Why are we predicting this random noise? Shouldn't we predict the noise added at time t, i.e. "img_{t} - img_{t-1}"?

Hi, I have added many new features to your code

I want to express my gratitude for providing such a detailed explanation of DDPM. I have made numerous improvements and added functionalities to your code, including the following changes:

  1. Added support for custom image sizes.
  2. Included code segments for DDIM.
  3. Enhanced the generator and trainer methods for improved usability.
  4. Implemented cosine and warmup cosine learning rate schedules.
  5. Added support for distributed training, allowing for larger models to be trained across multiple GPUs.
  6. Introduced custom seeds for improved reproducibility.
  7. Enabled half-precision training to reduce GPU memory usage.
  8. Provided the ability to choose different optimizers.
  9. Added the option to select different activation functions for training.
  10. Implemented a training checkpoint recovery feature, allowing training to resume from an interruption point.
  11. Added support for server deployment, enabling access through an interface.
  12. Refactored the code to make it more modular.

I will continue to build upon your work by introducing more optimizations and new features. Thank you for your contributions to the community.

Project Repository: https://github.com/chairc/Industrial-Defect-Diffusion-Model

the shared 3 checkpoints

Did you train the shared 3 checkpoints from scratch? There are only 4000+ images in the landscape dataset. However, you got very good results on it. So, did you use any pre-trained weights to get those 3 shared checkpoints?

model generating bad random images

I trained my diffusion model in TensorFlow based on this implementation, and after training for 450 epochs (on the landscape dataset) my loss was around 0.015 (MSE). I generated a few samples (1000 time steps) and they were very bad or random.

I just want to know: is this a training issue, i.e. does my model need more training to further reduce the loss (currently 0.015), or is the problem in the sampling technique? Can anyone help me please?

img2img

Does anyone have experience with img2img using diffusion models?

Conditional Unet using dataset with one scalar label

Hello,

I am PhD candidate at SNU.

Your repo helped me really well in understanding DDPM.

And I want to change this code to train on images labeled by a single scalar value.

Is there any repository or comment of yours that I could reference?

Thank you very much.

Segmentation fault (core dumped)

Hi!

Thank you for such great videos and the code!

I happen to get a "segmentation fault (core dumped)" error:
"The segfault happens while pytorch was trying to raise a Type Error when constructing a Tensor."
Do you know where in the code it might happen? It's weird that this error did not show up earlier, though.

dataset

First, thank you for the useful video and code.
Can you share the dataset that you use in the code? I want to apply the code to CIFAR-10 but failed, so I would like to have a look at the dataset structure used in the code:

    args.dataset_path = r"C:\Users\dome\datasets\landscape_img_folder"

i.e. the complete contents of "landscape_img_folder".
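
If the data loader follows the standard torchvision ImageFolder convention (an assumption here, since the loader lives in utils.py), the dataset folder needs at least one class subdirectory containing the images:

    import torchvision
    from torchvision import transforms

    # Assumed layout: landscape_img_folder/landscape/0001.jpg, 0002.jpg, ...
    # (ImageFolder requires at least one class subfolder, even for unconditional training.)
    dataset = torchvision.datasets.ImageFolder(
        r"C:\Users\dome\datasets\landscape_img_folder",
        transform=transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()]),
    )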

Question regarding DoubleConv

Any reason why there is no GELU in the else clause (when self.residual is False)? I'm a bit puzzled.

    class DoubleConv(nn.Module):

        def forward(self, x):
            if self.residual:
                return F.gelu(x + self.double_conv(x))
            else:
                return self.double_conv(x)
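
For context, double_conv itself typically stacks two convolution + GroupNorm pairs with a GELU between them; a hypothetical reconstruction of the block (the exact layers may differ from modules.py) makes the question concrete: the else branch returns the second normalization's output without any activation.

    import torch.nn as nn
    import torch.nn.functional as F

    class DoubleConv(nn.Module):
        # Hypothetical __init__, reconstructed for illustration; not copied from modules.py.
        def __init__(self, in_channels, out_channels, mid_channels=None, residual=False):
            super().__init__()
            self.residual = residual
            mid_channels = mid_channels or out_channels
            self.double_conv = nn.Sequential(
                nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
                nn.GroupNorm(1, mid_channels),
                nn.GELU(),
                nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1, bias=False),
                nn.GroupNorm(1, out_channels),
            )

        def forward(self, x):
            if self.residual:
                return F.gelu(x + self.double_conv(x))   # activation applied to the residual sum
            else:
                return self.double_conv(x)               # no activation here, hence the question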

requirements.txt

Hi There,

Would it be possible for you to share a requirements.txt or environment.yaml file here in the repo?

Thanks

Confusion about the 'sample' function

    def sample(self, model, n):
        logging.info(f"Sampling {n} new images....")
        model.eval()
        with torch.no_grad():
            # start from pure Gaussian noise and iteratively denoise it with the trained model
            x = torch.randn((n, 3, self.img_size, self.img_size)).to(self.device)
            for i in tqdm(reversed(range(1, self.noise_steps)), position=0):
                t = (torch.ones(n) * i).long().to(self.device)
                predicted_noise = model(x, t)
                alpha = self.alpha[t][:, None, None, None]
                alpha_hat = self.alpha_hat[t][:, None, None, None]
                beta = self.beta[t][:, None, None, None]
                if i > 1:
                    noise = torch.randn_like(x)
                else:
                    noise = torch.zeros_like(x)  # no noise is added at the final step
                x = 1 / torch.sqrt(alpha) * (x - ((1 - alpha) / (torch.sqrt(1 - alpha_hat))) * predicted_noise) + torch.sqrt(beta) * noise
        model.train()
        x = (x.clamp(-1, 1) + 1) / 2  # map from [-1, 1] back to [0, 1]
        x = (x * 255).type(torch.uint8)
        return x

Can anyone explain this function? In the line 'x = torch.randn((n, 3, self.img_size, self.img_size)).to(self.device)', you create a random image, and then from that image you predict the noise (i.e. predicted_noise = model(x, t)). Are you trying to create an image from a random tensor??

Increase image_size for training

Thanks for your great work!!

The code can be trained with image_size = 64, and the sampling results are OK.
However, when I try to use a larger size for training (e.g. image_size = 128), the code cannot be trained.

Could you share how to resolve this problem? I appreciate your help very much~

Some details

I've noticed that other DDPM implementations have another parameter called 'alpha_cumprod_prev' or something similar, which prepends a 1. to the front of alpha_cumprod. Why don't you use this, or have you handled it in a simpler way?
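
For context, the parameter in question is usually defined like this in other DDPM codebases (an illustrative sketch, not code from this repo):

    import torch

    # Other implementations typically keep the cumulative product shifted right by one step,
    # with 1.0 prepended, to compute the posterior variance
    # beta_tilde_t = beta_t * (1 - alpha_hat_{t-1}) / (1 - alpha_hat_t).
    beta = torch.linspace(1e-4, 0.02, 1000)
    alpha_hat = torch.cumprod(1.0 - beta, dim=0)
    alpha_hat_prev = torch.cat([torch.ones(1), alpha_hat[:-1]])
    posterior_variance = beta * (1.0 - alpha_hat_prev) / (1.0 - alpha_hat)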

DDIM sampling

Hi,

Thank you for the great videos on diffusion models.

Have you tried sampling using DDIM with one of your trained models? If so, were the results good and as expected?

Problem in ddpm -> sampled_images

Hey there,
very nice code and a good YouTube tutorial!
I had a little trouble using the self-attention layers with other image sizes, but that isn't the problem right now; I just commented them out and the first epoch works. But after that, in ddpm.py -> sampled_images = diffusion.sample(model, n=images.shape[0]), I get a dimension error depending on the batch size... I'm confused, maybe someone has an idea; I think I must have missed changing a size somewhere.
Error (Batch size 4):

    Traceback (most recent call last):
      File "***/Sync_v01/ddpm.py", line 120, in <module>
        launch()
      File "***/Sync_v01/ddpm.py", line 117, in launch
        train(args)
      File "***/Sync_v01/ddpm.py", line 99, in train
        sampled_images = diffusion.sample(model, n=images.shape[0])
      File "***/Sync_v01/ddpm.py", line 53, in sample
        predicted_noise = model(x, t)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 164, in forward
        x1 = self.inc(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 75, in forward
        return self.double_conv(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
        input = module(input)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 4, 3, 64] to have 3 channels, but got 4 channels instead

Error (Batch size 3):

    Traceback (most recent call last):
      File "***/Sync_v01/ddpm.py", line 120, in <module>
        launch()
      File "***/Sync_v01/ddpm.py", line 117, in launch
        train(args)
      File "***/Sync_v01/ddpm.py", line 99, in train
        sampled_images = diffusion.sample(model, n=images.shape[0])
      File "***/Sync_v01/ddpm.py", line 53, in sample
        predicted_noise = model(x, t)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 164, in forward
        x1 = self.inc(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/Sync_v01/modules.py", line 75, in forward
        return self.double_conv(x)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
        input = module(input)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 273, in forward
        return F.group_norm(
      File "***/anaconda3v2/lib/python3.9/site-packages/torch/nn/functional.py", line 2530, in group_norm
        return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
    RuntimeError: Expected weight to be a vector of size equal to the number of channels in input, but got weight of shape [64] and input of shape [64, 3, 64]

Thanks everyone :)

Time embedding

Hi! Great code, thanks for sharing!

I noticed something a little bit weird in this code. Is there any reason why you chose to use SiLU() right after the sinusoidal embedding?
It seems unnatural, as it might change the desired properties of the embedding.

Maybe you meant to use a learnable projection of the embedding, e.g. by adding this to the U-Net:

    self.time_embed = nn.Sequential(
        nn.Linear(time_dim, time_dim),
        nn.SiLU(),
        nn.Linear(time_dim, time_dim),
    )

And also changing the forward pass by adding:

    def forward(self, x, t):
        t = t.unsqueeze(-1).type(torch.float)
        t = self.pos_encoding(t, self.time_dim)
        t = self.time_embed(t)

Under these conditions, the SiLU() activations for the per-block projections make sense, being just the activation of the learned embedding.
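
For readers following along, the sinusoidal embedding under discussion is the standard transformer-style encoding of the scalar timestep; a generic sketch (illustrative, not necessarily the exact code in modules.py):

    import torch

    def pos_encoding(t, channels):
        # t: (batch, 1) float tensor of timesteps; returns a (batch, channels) embedding
        # built from sines and cosines at geometrically spaced frequencies.
        inv_freq = 1.0 / (10000 ** (torch.arange(0, channels, 2).float() / channels))
        pos_enc_a = torch.sin(t.repeat(1, channels // 2) * inv_freq)
        pos_enc_b = torch.cos(t.repeat(1, channels // 2) * inv_freq)
        return torch.cat([pos_enc_a, pos_enc_b], dim=-1)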

RuntimeError running the model

First of all, love your video.

Trying to run the unconditional model gives me this funny error:

File "f:\Coding Projects\ai\diffusinon-test\modules.py", line 99, in forward
return x + emb
RuntimeError: The size of tensor a (192) must match the size of tensor b (12) at non-singleton dimension 0

Do you know how to fix it?

Training Time

Hi. I'm a newbie studying diffusion models.
First, thank you for the interesting video and code.

I have some questions about the code and implementation.

  1. Is conditional image generation possible for 32x32 CIFAR-10 images? If so, is it enough to adjust the UNet structure and various parameters? After training and inference, the images are not well generated. If there is a reason, I would like to know what it is.

  2. Can you tell me how long the training takes when you train on 64x64?

Thank you.

Is the CFG implementation right?

In the paper "Classifier-Free Diffusion Guidance", ε_t = (1 + w) * ε_θ(z_t, c) - w * ε_θ(z_t).
But the code's ε_t = lerp(ε_θ(z_t), ε_θ(z_t, c), w) = (1 - w) * ε_θ(z_t) + w * ε_θ(z_t, c)?
Am I right? Are those two the same?
Just raising this as an issue.
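
For what it's worth, the two expressions agree up to how the guidance weight is parameterized: lerp(u, c, s) = u + s * (c - u) = (1 - s) * u + s * c, which equals the paper's (1 + w) * c - w * u when s = 1 + w. A quick numerical check (illustrative, not code from the repo):

    import torch

    # lerp(u, c, s) = (1 - s) * u + s * c, which equals (1 + w) * c - w * u for s = 1 + w.
    u, c = torch.randn(4), torch.randn(4)   # unconditional / conditional noise predictions
    w = 2.0
    s = 1.0 + w
    assert torch.allclose(torch.lerp(u, c, s), (1.0 + w) * c - w * u)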
