yang-song / score_sde_pytorch

PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)

Home Page: https://arxiv.org/abs/2011.13456

License: Apache License 2.0

Python 6.07% C++ 0.04% Cuda 0.35% Jupyter Notebook 93.54%

pytorch stochastic-differential-equations inverse-problems generative-models score-matching score-based-generative-modeling controllable-generation iclr-2021 diffusion-models

score_sde_pytorch's Issues

Question for inpaint

Hi Yang Song,

In row 30 of the "Inpaint" section of Score SDE demo PyTorch.ipynb,

should the call be changed to x = pc_inpainter(score_model, scaler(img*mask), mask) so that it doesn't infer img from img?

Please correct me if I misunderstand this part.

Can you release a Dockerfile?

Setting up the environment is always a struggle. Could you release a Dockerfile so that we can set it up easily?

Error when running DDPM with celeb_a dataset.

Dear authors,
I'm struggling to run the DDPM model on the CelebA dataset.
TensorFlow, tensorflow_datasets, and PyTorch versions:

>>> tf.__version__
'2.9.1'
>>> tfds.__version__
'4.6.0'
>>> torch.__version__
'1.11.0+cu102'

This is my celeba.py config file:

# coding=utf-8
# Copyright 2020 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python3
"""Config file for reproducing the results of DDPM on bedrooms."""

from configs.default_lsun_configs import get_default_configs


def get_config():
  config = get_default_configs()

  # training
  training = config.training
  training.sde = 'vpsde'
  training.continuous = False
  training.reduce_mean = True

  # sampling
  sampling = config.sampling
  sampling.method = 'pc'
  sampling.predictor = 'ancestral_sampling'
  sampling.corrector = 'none'

  # data
  data = config.data
  data.dataset = 'CELEBA'
  data.centered = True
  # data.tfrecords_path = '/atlas/u/yangsong/celeba_hq/-r10.tfrecords'
  data.image_size = 64
  data.batch_size = 64

  # model
  model = config.model
  model.name = 'ddpm'
  model.scale_by_sigma = False
  model.num_scales = 1000
  model.ema_rate = 0.9999
  model.normalization = 'GroupNorm'
  model.nonlinearity = 'swish'
  model.nf = 128
  model.ch_mult = (1, 2, 2, 2)
  model.num_res_blocks = 2
  model.attn_resolutions = (16,)
  model.resamp_with_conv = True
  model.conditional = True

  # optim
  optim = config.optim
  optim.lr = 2e-5

  return config

The error message is:

TypeError: in user code:

    File "/home/mvargasvieyra/others_code/score_sde_pytorch/datasets.py", line 169, in preprocess_fn  *
        img = resize_op(d['image'])
    File "/home/mvargasvieyra/others_code/score_sde_pytorch/datasets.py", line 121, in resize_op  *
        img = resize_small(img, config.data.image_size)
    File "/home/mvargasvieyra/others_code/score_sde_pytorch/datasets.py", line 60, in resize_small  *
        h = tf.round(h * ratio, tf.int32)
    File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
        return next(self.gen)

    TypeError: expected string or bytes-like object

By inspecting the dataset with ipdb I verified the ds object is not empty. Any hints would be greatly appreciated.
Thanks!
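
Judging only from the traceback, a likely cause is the call `tf.round(h * ratio, tf.int32)`: `tf.round` accepts `(x, name=None)`, so `tf.int32` is taken as the op name rather than a dtype, which would explain the "expected string or bytes-like object" message. A minimal sketch of the intended computation in plain Python follows; the function name `resized_shape` is hypothetical, and the TensorFlow equivalent in the comment is an assumption about the fix, not the authors' code.

```python
def resized_shape(h, w, smaller_side):
    """Scale (h, w) so the shorter side equals smaller_side, keeping aspect ratio."""
    ratio = smaller_side / min(h, w)
    # Round first, then convert to an integer type as a separate step.
    # Assumed TensorFlow equivalent: tf.cast(tf.round(h * ratio), tf.int32)
    new_h = int(round(h * ratio))
    new_w = int(round(w * ratio))
    return new_h, new_w


# Example: a 218x178 image with a target shorter side of 64 pixels.
print(resized_shape(218, 178, 64))
```

If this is indeed the cause, separating the rounding from the integer cast in `resize_small` should let the dataset pipeline build.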

CUDA_HOME environment variable is not set. Please set it to your CUDA install root

error when using main.py script to do evaluation

KeyError: 'ncsnpp' of get_models() in models.utils

Hello author,
When I tried to run the checkpoint, an error occurred:

KeyError Traceback (most recent call last)
Cell In[12], line 34
30 inverse_scaler = datasets.get_data_inverse_scaler(config)
32 # print(config)
---> 34 score_model = mutils.create_model(config)
36 optimizer = get_optimizer(config, score_model.parameters())
37 ema = ExponentialMovingAverage(score_model.parameters(),
38 decay=config.model.ema_rate)

File /score_sde_pytorch-main/models/utils.py:94, in create_model(config)
92 model_name = config.model.name
93 print(model_name) # ncsnpp
---> 94 score_model = get_model(model_name)(config)
95 score_model = score_model.to(config.device)
96 score_model = torch.nn.DataParallel(score_model)

File /score_sde_pytorch-main/models/utils.py:48, in get_model(name)
46 def get_model(name):
47 print(_MODELS)
---> 48 return _MODELS[name]
52 def get_sigmas(config):

KeyError: 'ncsnpp'

Any suggestions would be appreciated.

How long does the C++ compilation take?

Hi, thanks for your work.
How long does the C++ code take to compile? Is anything printed while it compiles?

Why must t be multiplied by 999 when calculating the score?

Hi,

I have some questions about your code.

score_sde_pytorch/models/utils.py

Lines 146 to 159 in 1618dde

    
    if continuous or isinstance(sde, sde_lib.subVPSDE):
      # For VP-trained models, t=0 corresponds to the lowest noise level
      # The maximum value of time embedding is assumed to 999 for
      # continuously-trained models.
      labels = t * 999
      score = model_fn(x, labels)
      std = sde.marginal_prob(torch.zeros_like(x), t)[1]
    else:
      # For VP-trained models, t=0 corresponds to the lowest noise level
      labels = t * (sde.N - 1)
      score = model_fn(x, labels)
      std = sde.sqrt_1m_alphas_cumprod.to(labels.device)[labels.long()]

    score = -score / std[:, None, None, None]

When computing the score matching loss, it seems that you post-process the output of the model (a score model). But it doesn't make sense to me to multiply t by 999 and rescale with score = -score / std when the same model output is also used in the reverse (sampling) process.

score_sde_pytorch/sde_lib.py

Lines 95 to 97 in 1618dde

    
    drift, diffusion = sde_fn(x, t)
    score = score_fn(x, t)
    drift = drift - diffusion[:, None, None, None] ** 2 * score * (0.5 if self.probability_flow else 1.)

Am I missing something?
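
On the factor 999: going by the code comments quoted above, the time embedding expects values up to 999, which is the last index of a 1000-step discrete schedule. A small sketch of the two label mappings follows, assuming `N = 1000` noise scales and continuous time `t` in `[0, 1]`; the function names are hypothetical.

```python
def continuous_label(t):
    """Label for continuously trained models: t in [0, 1] scaled to [0, 999]."""
    return t * 999


def discrete_label(t, num_scales=1000):
    """Label for discretely trained models: index into the noise schedule."""
    return int(t * (num_scales - 1))


for t in (0.0, 0.5, 1.0):
    print(t, continuous_label(t), discrete_label(t))
```

With `N = 1000`, both branches feed the network labels on the same 0 to 999 scale, so a continuously trained model sees time values comparable to DDPM step indices.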

PC sampler mismatched?

Hello, thanks for your interesting work!

I have a question about your implementation of PC sampler:

  def pc_sampler(model):
        with torch.no_grad():
            # Initial sample
            x = sde.prior_sampling(shape).to(device)
            timesteps = torch.linspace(sde.T, eps, sde.N, device=device)
      
            for i in range(sde.N):
                t = timesteps[i]
                vec_t = torch.ones(shape[0], device=t.device) * t
                x, x_mean = corrector_update_fn(x, vec_t, model=model)
                x, x_mean = predictor_update_fn(x, vec_t, model=model)
      
            return inverse_scaler(x_mean if denoise else x), sde.N * (n_steps + 1)

Why do you start with the corrector instead of the predictor, as in Algorithm 1 of your original paper? Is there a reason?
Thank you very much!

[New Dataset Training]

Hi Yang,

Thank you so much for your brilliant work! I have one question:
When I use a new dataset (such as a directory of images), the first step
is to convert it to TFRecord, and then I normalize it to [-1, 1].

My example code is below. Could you help me check whether it is right? Thank you!

# 1) image preprocess. 
def preprocess_hand_image(image):
  image = tf.image.decode_jpeg(image, channels=3)
  image = tf.image.resize(image, [128, 128])
  image /= 255.0  # normalize to [0,1] range
  img = image * 2. - 1.  # normalize to [-1, 1] range

  return dict(image=img, label=None)

# 2) load a directory of images without label 
all_image_paths = [str(item) for item in glob.glob("/data-nas1/sam/2021AW/score_hand/score_train/*")]
image_ds = tf.data.Dataset.from_tensor_slices(all_image_paths).map(tf.io.read_file)
tfrec = tf.data.experimental.TFRecordWriter('/data-nas1/sam/2021AW/score_hand/hands_0625.tfrec')
dataset_builder = tf.data.TFRecordDataset('/data-nas1/sam/2021AW/score_hand/hands_0625.tfrec')
train_split_name = eval_split_name = 'train'

# 3) output 
ds = dataset_builder.with_options(dataset_options)
ds = ds.repeat(count=num_epochs)
ds = ds.shuffle(shuffle_buffer_size)
ds = ds.map(preprocess_hand_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds = ds.batch(batch_size, drop_remainder=True)
return ds.prefetch(prefetch_size)
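
One thing worth checking, under the assumption that the training pipeline applies its own scaler to map `[0, 1]` images to `[-1, 1]` when `data.centered = True`: normalizing to `[-1, 1]` in the preprocessing step as well would apply the shift twice. The sketch below illustrates the effect with plain floats; `make_scaler` is a hypothetical stand-in for that scaler.

```python
def make_scaler(centered):
    """Return a function mapping [0, 1] pixel values to the model's input range."""
    if centered:
        return lambda x: x * 2.0 - 1.0  # [0, 1] -> [-1, 1]
    return lambda x: x


scaler = make_scaler(centered=True)

for pixel in (0.0, 0.5, 1.0):
    already_normalized = pixel * 2.0 - 1.0  # what preprocess_hand_image returns
    print(pixel, scaler(pixel), scaler(already_normalized))
```

If that assumption holds, returning images in `[0, 1]` from the preprocessing function would keep the model input in `[-1, 1]`. Separately, the snippet creates a `TFRecordWriter` but never calls its `write` method, so the record file may be empty.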

Rescaling of output

Thanks for the great repo!

I noticed that you scale your images to be between [0,1] or [-1,1]. I'd like to run the model with coordinate data, which doesn't lend itself well to scaling. I'm wondering if this is expected to significantly impact performance?

Accelerate the Training Process

Hi Yang Song, thanks for your nice work.

I tried to reproduce the experiment "configs/subvp/cifar10_ncsnpp_continuous.py" on a single V100 with a batch size of 128. However, training is slow: so far, 100K iterations have taken around 23 hours.

Can an experiment with a larger batch size on multiple GPUs reach the same performance?
At your convenience, would you share the config for the multi-GPU CIFAR-10 experiment?

Thank you sincerely for your help.

Sample epsilon iteratively

score_sde_pytorch/likelihood.py

Line 85 in 1618dde

epsilon = torch.randn_like(data)

To keep the estimator unbiased, does epsilon need to be resampled in each iteration of the ODE solver?
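
For context, the estimator in question is the Hutchinson trace estimator, tr(A) = E[eps^T A eps] for noise eps with zero mean and identity covariance. The sketch below checks that identity exactly on a small matrix by enumerating every Rademacher (+1/-1) probe vector. It uses Rademacher noise rather than the Gaussian `torch.randn_like` draw quoted above purely so the expectation can be computed exhaustively, and the matrix is made up for illustration. It does not settle whether resampling inside the ODE solver is needed.

```python
from itertools import product


def quadratic_form(matrix, eps):
    """Compute eps^T A eps for a square matrix given as nested lists."""
    n = len(matrix)
    return sum(eps[i] * matrix[i][j] * eps[j] for i in range(n) for j in range(n))


def exact_hutchinson_mean(matrix):
    """Average eps^T A eps over all 2^n Rademacher vectors."""
    n = len(matrix)
    probes = list(product((-1.0, 1.0), repeat=n))
    return sum(quadratic_form(matrix, eps) for eps in probes) / len(probes)


# Illustrative, deliberately non-symmetric matrix.
A = [[2.0, -1.0, 0.5],
     [3.0, 4.0, -2.0],
     [1.0, 0.0, -1.0]]

trace = sum(A[i][i] for i in range(len(A)))
print(trace, exact_hutchinson_mean(A))
```

The off-diagonal terms cancel across probes, which is why a single probe is unbiased for the trace even though any individual draw can be far from it.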

Problem with dependencies. Anyone who could run the code, please let us know about the exact dependencies. For God's sake.

Unable to recreate working python environment to run this codebase using requirements.txt

Hello Authors,

For some reason I am unable to create a working python environment using the provided requirements.txt. Could you please test the environment requirements you provide?

For me, the main issue seems to be with tensorflow_probability, a dependency of tensorflow_gan==2.0.0, which defaults to 0.19.0 and seems to be incompatible with tensorflow==2.4.0. But even after downgrading tensorflow_probability, I don't get a stable environment that's able to run main.py.

Any insights would be appreciated.

Reproducing NCSN++ cont. (deep, VE)

Hi,

I am currently trying to reproduce the results of NCSN++ cont. (deep, VE) on CIFAR-10 using this code. However, both runs resulted in an FID of around 2.60, where it is supposed to be around 2.20. May I ask what could be the problem here? Also, I noticed the seed in the config files but couldn't find where it is used elsewhere. Is the seed used in this repo?

Regards,
Paul

Sampling samples for cifar10

Dear Author, I am using Python 3.8 with the versions from requirements.txt to sample some images. However, the program reports that it needs a higher version of TensorFlow Probability. How should I solve this? Also, how can I change the number of sampled images and the format they are stored in, such as npz?

thanks

ODE sampling get unrealistic images

Thanks for your great work which inspires so many!!

As mentioned in #13, I also noticed that pure ODE sampling results are not satisfying for 256*256 images (and maybe also for bigger sizes).

Most of the time these generated images are blurry or over-smooth, sometimes even very noisy.

To reproduce, one can simply run the demo notebook with pretrained checkpoints provided by the authors.

Reconstruct for colorization

I’m very interested in your work!

I’m considering running colorization on my own dataset, but I’m not sure how to adapt your code.

Would it work to just change the code in your tutorial’s Jupyter notebook?

Training process for multi-GPUs

Hi, I am trying to run training/evaluation with 4 A100s.
However, after some experiments I noticed that the training speed was the same as when training with a single GPU.
Am I missing something?

Setup Problems: Recent Version of TF Probability and Dlerror

Hi,

Thanks for your excellent work and released code repo.

Following https://github.com/yang-song/score_sde_pytorch#dependencies, I encountered this problem:

2021-06-23 20:51:42.320605: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

ImportError: This version of TensorFlow Probability requires TensorFlow version >= 2.5; Detected an installation of version 2.4.0. Please upgrade TensorFlow to proceed.

Could you guys help?

Env info:

Ubuntu 18.04.4 LTS
CUDA Driver version: 440.64.00
CUDA version: 10.2

Python 3.7 vs 3.8

Not an issue (I suppose it could be an issue with the documentation), but putting this here in case it helps anyone else:

Running this with Python 3.7 resulted in a segmentation fault during the JIT compilation of the PyTorch C++ extensions.

Using Python 3.8 works without issue (using all packages in requirements.txt).

Hope this helps someone else.

ConditionalResidualBlock not working

score_sde_pytorch/models/layers.py

Lines 406 to 416 in cb1f359

    
    if resample == 'down':
      if dilation > 1:
        self.conv1 = ncsn_conv3x3(input_dim, input_dim, dilation=dilation)
        self.normalize2 = normalization(input_dim, num_classes)
        self.conv2 = ncsn_conv3x3(input_dim, output_dim, dilation=dilation)
        conv_shortcut = partial(ncsn_conv3x3, dilation=dilation)
      else:
        self.conv1 = ncsn_conv3x3(input_dim, input_dim)
        self.normalize2 = normalization(input_dim, num_classes)
        self.conv2 = ConvMeanPool(input_dim, output_dim, 3, adjust_padding=adjust_padding)
        conv_shortcut = partial(ConvMeanPool, kernel_size=1, adjust_padding=adjust_padding)

The above code would create mismatched shortcut and output shapes when dilation is larger than 1.

CUDA out of memory batch size=2, (V100 32G)

Dear @yang-song ,

Thanks for the great work.

I always run into an OOM error, even after reducing the batch size to 2. This is the command that I run:

python main.py --config configs/vp/cifar10_ddpmpp.py --mode train --workdir ./workdir

and this is the error output:

I0317 15:24:53.050663 47465039521600 run_lib.py:126] Starting training loop at step 0.
terminate called after throwing an instance of 'c10::CUDAOutOfMemoryError'
  what():  CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 31.75 GiB total capacity; 1.00 GiB already allocated; 4.00 MiB free; 1.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at /tmp/coulombc/pytorch_build_2021-11-09_14-57-01/avx2/python3.8/pytorch/c10/cuda/CUDACachingAllocator.cpp:513 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x2b2c1d81f905 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x295bf (0x2b2c1d7c15bf in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x2a2c5 (0x2b2c1d7c22c5 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x2a7d2 (0x2b2c1d7c27d2 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #4: THCStorage_resizeBytes(THCState*, c10::StorageImpl*, long) + 0x84 (0x2b2c047bb894 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0x1c9d961 (0x2b2c03178961 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::empty_strided_cuda(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x66 (0x2b2c04506346 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x3176efa (0x2b2c04651efa in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x3176f70 (0x2b2c04651f70 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0x1e56e88 (0x2b2bf8d78e88 in /home/jma/codes/score/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)

Fatal Python error: Aborted

How should I train the model on single GPU (NVIDIA V100 32G)?

Best regards,
Jun
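
The numbers in the error message suggest that most of the GPU memory is held outside this PyTorch allocator (for example by another process, or by another framework in the same process), since PyTorch reports reserving only about 1 GiB of a 31.75 GiB card. A quick arithmetic sketch using the figures from the log follows; the interpretation is an assumption, not a confirmed diagnosis.

```python
MIB_PER_GIB = 1024

total_mib = 31.75 * MIB_PER_GIB               # "31.75 GiB total capacity"
reserved_by_pytorch_mib = 1.06 * MIB_PER_GIB  # "1.06 GiB reserved in total by PyTorch"
free_mib = 4.00                               # "4.00 MiB free"

# Memory that is neither free nor reserved by this PyTorch process.
unaccounted_mib = total_mib - reserved_by_pytorch_mib - free_mib
print(f"{unaccounted_mib / MIB_PER_GIB:.2f} GiB held elsewhere")
```

If the gap is this large, checking `nvidia-smi` for other processes on GPU 0 would be a reasonable first step before changing the batch size further.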

Question about the scaling operation of score function of VP and VE

Hello, thanks for your amazing work!
I am wondering why the neural network output is scaled by the standard deviation and sign-flipped in the VP score function, but NOT in the VE score function.
Thanks a lot!

Implementation of VP:

def score_fn(x, t):
      # Scale neural network output by standard deviation and flip sign
      if continuous or isinstance(sde, sde_lib.subVPSDE):
        # For VP-trained models, t=0 corresponds to the lowest noise level
        # The maximum value of time embedding is assumed to 999 for
        # continuously-trained models.
        labels = t * 999
        score = model_fn(x, labels)
        std = sde.marginal_prob(torch.zeros_like(x), t)[1]
      else:
        # For VP-trained models, t=0 corresponds to the lowest noise level
        labels = t * (sde.N - 1)
        score = model_fn(x, labels)
        std = sde.sqrt_1m_alphas_cumprod.to(labels.device)[labels.long()]

      ########################################
      score = -score / std[:, None, None, None]
      ###########################################
      return score

Implementation of VE:

def score_fn(x, t):
      if continuous:
        labels = sde.marginal_prob(torch.zeros_like(x), t)[1]
      else:
        # For VE-trained models, t=0 corresponds to the highest noise level
        labels = sde.T - t
        labels *= sde.N - 1
        labels = torch.round(labels).long()

      score = model_fn(x, labels)
      return score
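
One way to read the difference, assuming the VP/DDPM-style network is trained to predict the noise eps added to the data while the VE/NCSN-style network is trained to output the score directly: for a Gaussian perturbation kernel with standard deviation std, the score of the noisy sample equals -eps / std, which is the rescaling and sign flip in the VP branch. The sketch below verifies that identity numerically in one dimension; all values are illustrative.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


mean, std, eps = 0.3, 0.7, -1.2   # illustrative values
x = mean + std * eps              # noisy sample x = mean + std * eps

# Score via central finite differences of the log density.
h = 1e-5
numeric_score = (log_gaussian(x + h, mean, std) - log_gaussian(x - h, mean, std)) / (2 * h)

# Score recovered from the noise, as in `score = -score / std`.
score_from_noise = -eps / std

print(numeric_score, score_from_noise)
```

Under that assumption, the VE branch needs no such conversion because its network output is already interpreted as the score.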

subVPSDE sample

Hi,

When I use Score_SDE_demo_PyTorch.ipynb and set the score-based model to subVPSDE, I get
"AttributeError: 'subVPSDE' object has no attribute 'alphas'"
in the subsections "PC sampling", "PC inpainting", and "PC colorizer".

ImportError: cannot import name 'ParamSpec' from 'typing_extensions'

I used Python 3.8 and requirements.txt but encountered an import error. I have tried many versions of Python, but they all report different errors.
ImportError: cannot import name 'ParamSpec' from 'typing_extensions' (gxz_sde/lib/python3.8/site-packages/typing_extensions.py)

An error in upfirdn2d.py

Hi all,
I downloaded the op files to a local package (named sdemodels_op_diy), and an error occurred when I imported sdemodels_op_diy. The root cause seems to be that upfirdn2d.py uses the cpp_extension function: I get the same error when running upfirdn2d.py in PyCharm. I have not been able to fix it.

import sdemodels_op_diy
Traceback (most recent call last):
  File "/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hwtan/projects/test.py", line 1, in <module>
    import sdemodels_op_diy
  File "/home/hwtan/projects/sdemodels_op_diy/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/hwtan/projects/sdemodels_op_diy/fused_act.py", line 11, in <module>
    fused = load(
  File "/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/TH -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/THC -isystem /home/hwtan/anaconda3/envs/pytorch/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp -o fused_bias_act.o
In file included from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/c10/core/DeviceType.h:8,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/c10/core/Device.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:11,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/extension.h:4,
from /home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:1:
/home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp: In function ‘at::Tensor fused_bias_act(const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float)’:
/home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:7:41: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
7 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
| ^
/home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:13:5: note: in expansion of macro ‘CHECK_CUDA’
13 | CHECK_CUDA(input);
| ^~~~~~~~~~
In file included from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/extension.h:4,
from /home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:1:
/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:222:30: note: declared here
222 | DeprecatedTypeProperties & type() const {
| ^~~~
In file included from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/c10/core/DeviceType.h:8,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/c10/core/Device.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:11,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/extension.h:4,
from /home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:1:
/home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:7:41: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
7 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
| ^
/home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:14:5: note: in expansion of macro ‘CHECK_CUDA’
14 | CHECK_CUDA(bias);
| ^~~~~~~~~~
In file included from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/extension.h:4,
from /home/hwtan/projects/sdemodels_op_diy/fused_bias_act.cpp:1:
/home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:222:30: note: declared here
222 | DeprecatedTypeProperties & type() const {
| ^~~~
[2/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/TH -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/THC -isystem /home/hwtan/anaconda3/envs/pytorch/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c /home/hwtan/projects/sdemodels_op_diy/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/TH -isystem /home/hwtan/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/include/THC -isystem /home/hwtan/anaconda3/envs/pytorch/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS_ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c /home/hwtan/projects/sdemodels_op_diy/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
ninja: build stopped: subcommand failed.

Best,
hongwei

Initialisation of pc_inpainting

My question concerns this line:

# Initial sample
x = data * mask + sde.prior_sampling(data.shape).to(data.device) * (1. - mask)

Let's assume data is normalised to have approx std=1. In this case, we're initialising x as a tensor that has some parts with std=1 and some parts with std=prior_std, which is certainly out of distribution for the score network. Wouldn't it make more sense to initialise it similarly to the body of inpaint_update_fn?

vec_t = torch.ones(data.shape[0], device=data.device) * timesteps[0]
masked_data_mean, std = sde.marginal_prob(data, vec_t)
masked_data = masked_data_mean + torch.randn_like(data) * std[:, None, None, None]
x = masked_data * mask + sde.prior_sampling(data.shape).to(data.device) * (1. - mask)

I have tried the modification and visually I can't tell if one is significantly better than the other, but I imagine a more thorough benchmarking could reveal differences in FID.

Original algorithm:

My modification:

Latent Code Manipulation

Hi, can someone tell me where the code for "manipulation of latent representation" is? For example, where did you implement the interpolation and the temperature change?
Thank you

Question about conditional generation

Thank you for your paper and code! I am trying to perform conditional generation using score-based diffusion model. Is it possible to apply FiLM (Feature-wise Linear Modulation) conditioning [Film: Visual reasoning with a general conditioning layer] on score-based diffusion model? And how could I edit the normal DDPM models into score-based generative models? Thank you!

Run the code in single GPU

Dear Song,

Hi, thanks for your great work.
I am trying to reproduce your work in order to extend it a bit, but I have run into a problem with my setup.
I have to use a single GPU because of limited resources. To do this, I add
os.environ["CUDA_VISIBLE_DEVICES"]='1'
in main.py or set the device with
torch.device('cuda:1')
in run_lib.train.

However, it always assigns gpu:0, so
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
occurs.

Is there any solution for this?

The GPU I am using is an NVIDIA RTX 3090 with 24GB.
Thank you.
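
One possible cause, offered as a sketch rather than a confirmed diagnosis: `CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialized, and once it is set the visible GPUs are renumbered from 0. After exposing physical GPU 1, the process should therefore address it as `cuda:0`; asking for `cuda:1` refers to a device that is no longer visible. The helper name `restrict_to_gpu` is hypothetical and not part of the repository.

```python
import os


def restrict_to_gpu(physical_index: int) -> str:
    """Expose a single physical GPU to this process.

    Call this before CUDA is initialized (safest: before `import torch`).
    Returns the device string to use afterwards.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    # Visible devices are renumbered from 0, so the only visible GPU is cuda:0.
    return "cuda:0"


device_name = restrict_to_gpu(1)
print(device_name)  # cuda:0
# Later, for example in run_lib.train: device = torch.device(device_name)
```

Setting the variable in the shell when launching the script (`CUDA_VISIBLE_DEVICES=1 python main.py ...`) has the same effect and avoids any import-order issue.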

Error in the denoising score matching loss?

I may be wrong, but it appears to me that there is an error in the denoising score matching loss:

score_sde_pytorch/losses.py

Line 95 in 1618dde

losses = torch.square(score + z / std[:, None, None, None])

In particular, std, the denominator for z, should be squared, shouldn't it? This is what is prescribed in Eq. 5 in the earlier work [1]. Does this seem to be the case?

[1] Y. Song, S. Ermon, Generative Modeling by Estimating Gradients of the Data Distribution, 2019.
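
A quick way to check this, assuming the perturbed sample in the loss is built as x̃ = mean + std · z with z drawn from a standard normal: the score of the Gaussian perturbation kernel is −(x̃ − mean)/std², and substituting x̃ − mean = std · z gives −z/std. Eq. 5 of [1] is written in terms of x̃ − x, which is why σ² appears there; written in terms of z, one power of std cancels. The numbers below are arbitrary and only illustrate the identity.

```python
import math

std = 0.7                 # arbitrary noise level
mean = [0.3, -1.2, 2.0]   # arbitrary clean values
z = [0.5, -0.25, 1.5]     # arbitrary stand-ins for standard-normal draws

# Perturbed sample: x_tilde = mean + std * z
x_tilde = [m + std * zi for m, zi in zip(mean, z)]

# Score of N(x_tilde; mean, std^2 I), written with sigma squared as in Eq. 5 of [1]
target_sigma_squared = [-(xt - m) / std**2 for xt, m in zip(x_tilde, mean)]

# The same target written in terms of z, as in the quoted line of losses.py
target_z_over_std = [-zi / std for zi in z]

for a, b in zip(target_sigma_squared, target_z_over_std):
    assert math.isclose(a, b, rel_tol=1e-9)
```

Under that assumption, `score + z / std` is the difference between the model output and the target score, so the single power of std is consistent with Eq. 5.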

Some tips on why the model ain't working

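Some suggestions for when the model will not run

Problem description

If the program shows no response at all after the code starts running, the following fix from another project may help.
Original article: blog.csdn.net
Today, while running an experiment, I ran into the problem described in the title. The relevant code snippet is: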

### chamfer_3D.py

chamfer_found = importlib.find_loader("chamfer_3D") is not None
if not chamfer_found:
    ## Cool trick from https://github.com/chrdiller
    print("Jitting Chamfer 3D")

    from torch.utils.cpp_extension import load
    chamfer_3D = load(name="chamfer_3D",
          sources=[
              "/".join(os.path.abspath(__file__).split('/')[:-1] + ["chamfer_cuda.cpp"]),
              "/".join(os.path.abspath(__file__).split('/')[:-1] + ["chamfer3D.cu"]),
              ])
    print("Loaded JIT 3D CUDA chamfer distance")

else:
    import chamfer_3D
    print("Loaded compiled 3D CUDA chamfer distance")

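This code imports the chamfer_3D package directly if it is found in the Python environment; otherwise it calls torch.utils.cpp_extension.load to load the external C++ library manually.

When I ran this code, the chamfer_3D package was not installed, so the program called load and then hung with no output for a long time. The terminal looked like this: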

> (atlasnet) user@ubuntu: ~/chamfer3D$  python chamfer_3D.py
Jitting Chamfer 3D

Forcing the program to stop with Ctrl+C produced the following output:

> (atlasnet) user@ubuntu: ~/chamfer3D$  python chamfer_3D.py
Jitting Chamfer 3D
^CTraceback (most recent call last):
  File "dist_chamfer_3D.py", line 15, in <module>
    "/".join(os.path.abspath(__file__).split('/')[:-1] + ["chamfer3D.cu"]),
  File "/home/zhangwenyuan/anaconda3/envs/atlasnet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 974, in load
    keep_intermediates=keep_intermediates)
  File "/home/zhangwenyuan/anaconda3/envs/atlasnet/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1183, in _jit_compile
    baton.wait()
  File "/home/zhangwenyuan/anaconda3/envs/atlasnet/lib/python3.6/site-packages/torch/utils/file_baton.py", line 49, in wait
    time.sleep(self.wait_seconds)
KeyboardInterrupt

Problem analysis

The cause of this problem is a mutex lock. The code that triggers it is:

### torch/utils/cpp_extension.py

if version != old_version:
   baton = FileBaton(os.path.join(build_directory, 'lock'))
   if baton.try_acquire():
       try:
           with GeneratedFileCleaner(keep_intermediates=keep_intermediates) as clean_ctx:
               if IS_HIP_EXTENSION and (with_cuda or with_cudnn):
                   hipify_python.hipify(
                       project_directory=build_directory,
                       output_directory=build_directory,
                       includes=os.path.join(build_directory, '*'),
                       extra_files=[os.path.abspath(s) for s in sources],
                       show_detailed=verbose,
                       is_pytorch_extension=True,
                       clean_ctx=clean_ctx
                   )
               _write_ninja_file_and_build_library(
                   name=name,
                   sources=sources,
                   extra_cflags=extra_cflags or [],
                   extra_cuda_cflags=extra_cuda_cflags or [],
                   extra_ldflags=extra_ldflags or [],
                   extra_include_paths=extra_include_paths or [],
                   build_directory=build_directory,
                   verbose=verbose,
                   with_cuda=with_cuda)
       finally:
           baton.release()
   else:
       baton.wait()

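From this code you can roughly see that, when loading an external library, PyTorch's cpp_extension places a "read lock" on the library by creating a file named "lock". If the program detects a "lock" file, it assumes that another process is using the same files and that there is a read/write conflict, so baton.try_acquire() returns False and the program enters wait() until the lock is released.

Because of the lock, no other process can read the file at the same time. If an earlier run of the program was killed right after it acquired the lock, before it had a chance to release it, the lock stays in place and every later run is unable to read the library. I believe the hang at "Jitting" that I hit was caused by exactly this.

Solution

First, find out where the lock is.

Open the library file torch/utils/cpp_extension.py and set a breakpoint at line 1156, which is this statement: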

baton = FileBaton(os.path.join(build_directory, 'lock'))

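When execution reaches this point, inspect the value of build_directory; the lock file should be in that directory. Go into the folder and delete the lock file, and the program will no longer hang when you run it again.

On Windows with PyCharm, setting breakpoints and inspecting variables is straightforward. Here is how to debug a Python program with pdb on Linux: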

(atlasnet) zhangwenyuan@ubuntu:~/atlas/AtlasNet/auxiliary/ChamferDistancePytorch/chamfer3D$ 
cd ~/atlas/AtlasNet
(atlasnet) zhangwenyuan@ubuntu:~/atlas/AtlasNet$ python -m pdb train.py --shapenet13
> /home/zhangwenyuan/atlas/AtlasNet/train.py(1)<module>()
-> import sys
(Pdb) b /home/zhangwenyuan/anaconda3/envs/atlasnet/lib/python3.6/site-packages/torch/utils/cpp_extension.py:1156
Breakpoint 1 at /home/zhangwenyuan/anaconda3/envs/atlasnet/lib/python3.6/site-packages/torch/utils/cpp_extension.py:1156
(Pdb) c
Jitting Chamfer 3D
> /home/zhangwenyuan/anaconda3/envs/atlasnet/lib/python3.6/site-packages/torch/utils/cpp_extension.py(1156)_jit_compile()
-> baton = FileBaton(os.path.join(build_directory, 'lock'))
(Pdb) p build_directory
'/home/zhangwenyuan/.cache/torch_extensions/chamfer_3D'

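So the lock file is in the "/home/zhangwenyuan/.cache/torch_extensions/chamfer_3D" directory. After going into that directory and deleting the lock file, I ran the program again and the problem did not come back.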

==/home/geyulong/.cache/torch_extensions/py38_cu121/fused/lock==
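
The manual search for the lock file can also be scripted. This is a minimal sketch, assuming the build cache is in `~/.cache/torch_extensions` (as in the paths above) unless the `TORCH_EXTENSIONS_DIR` environment variable points elsewhere; the two function names are hypothetical. Only delete locks when no extension build is actually running.

```python
import os
from pathlib import Path


def find_extension_locks(cache_root=None):
    """Return every file named 'lock' under the torch extensions build cache."""
    if cache_root is None:
        # Assumption: default Linux cache location unless TORCH_EXTENSIONS_DIR is set.
        cache_root = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            os.path.join(Path.home(), ".cache", "torch_extensions"),
        )
    root = Path(cache_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("lock") if p.is_file())


def remove_extension_locks(cache_root=None):
    """Delete stale lock files and return the paths that were removed."""
    removed = []
    for lock in find_extension_locks(cache_root):
        lock.unlink()
        removed.append(lock)
    return removed
```

Calling `find_extension_locks()` first shows which files would be removed before anything is deleted.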

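TensorFlow_hub raises urllib.error.URLError when running:

Download the model files yourself, store them locally, and change the path in the code to point to them. That resolves the problem.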

Training VP-SDE

First of all, great thanks for such brilliant work!

I have one trivial question.

If I want to train a score network with VP-SDE on the FFHQ dataset, which arguments do I need to change in "config/ve/ffhq_ncsnpp_continuous.py"?

Best regards,
Joseph

.

The small experiment in Figure 2

Hello, I want to reproduce the small experiment in Figure 2 of the paper, which converts one-dimensional non-normal data into a standard distribution. Could you give me some tips?

Checkpoint

There is a "for" loop over checkpoints in your code, but I have not found any consecutively numbered checkpoints. Can I replace them with a single checkpoint, provided this does not have much influence on the outputs?

Code in run_lib:
"for ckpt in range(begin_ckpt, config.eval.end_ckpt + 1)"
(There are no such consecutively numbered checkpoints at the provided URL.)

Example error:
Waiting for check_point 9 (I cannot download check_point 9 anywhere).
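
As a sketch of one workaround: the quoted loop visits every id from `begin_ckpt` to `config.eval.end_ckpt`, so giving both the id of the single checkpoint that is available makes it run exactly once. The helper below lists which ids are actually present in a directory. The `checkpoint_<id>.pth` file-name pattern and both function names are assumptions, so adjust them to match your files.

```python
import re
from pathlib import Path


def available_checkpoint_ids(checkpoint_dir):
    """Return the sorted integer ids of files named checkpoint_<id>.pth."""
    pattern = re.compile(r"^checkpoint_(\d+)\.pth$")  # assumed naming scheme
    ids = []
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def ckpt_range(begin_ckpt, end_ckpt):
    """Ids visited by `for ckpt in range(begin_ckpt, end_ckpt + 1)`."""
    return list(range(begin_ckpt, end_ckpt + 1))


print(ckpt_range(9, 9))  # [9]
```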

Are there any existing tricks to accelerate the training process?

I am running the experiments on 8 A100 GPUs. However, I observe that GPU utilization is quite low (<20%), so I am curious whether there are any tricks to further accelerate training.

How to train a model with 16GB GPU

Hey,

thanks for your PyTorch implementation. I am trying to train a model with my custom dataset. I managed to set the dataset (tfrecords) up but I run out of memory on training loop step 0.

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 15.90 GiB total capacity; 14.61 GiB already allocated; 53.75 MiB free; 14.84 GiB reserved in total by PyTorch)

Sadly, I do not have more GPU RAM options. My config is the following:

from configs.default_lsun_configs import get_default_configs


def get_config():
  config = get_default_configs()
  # training
  training = config.training
  training.sde = 'vesde'
  training.continuous = True

  # sampling
  sampling = config.sampling
  sampling.method = 'pc'
  sampling.predictor = 'reverse_diffusion'
  sampling.corrector = 'langevin'

  # data
  data = config.data
  data.dataset = 'CUSTOM'
  data.image_size = 128
  data.tfrecords_path = '/content/drive/MyDrive/Training/tf_dataset'


  # model
  model = config.model
  model.name = 'ncsnpp'
  model.sigma_max = 217
  model.scale_by_sigma = True
  model.ema_rate = 0.999
  model.normalization = 'GroupNorm'
  model.nonlinearity = 'swish'
  model.nf = 128
  model.ch_mult = (1, 1, 2, 2, 2, 2, 2)
  model.num_res_blocks = 2
  model.attn_resolutions = (16,)
  model.resamp_with_conv = True
  model.conditional = True
  model.fir = True
  model.fir_kernel = [1, 3, 3, 1]
  model.skip_rescale = True
  model.resblock_type = 'biggan'
  model.progressive = 'output_skip'
  model.progressive_input = 'input_skip'
  model.progressive_combine = 'sum'
  model.attention_type = 'ddpm'
  model.init_scale = 0.
  model.fourier_scale = 16
  model.conv_size = 3

  return config

Are there any options to improve memory efficiency? I would like to stay at a 128x128 resolution (if it is possible).

Thanks!

How to calculate the score of a new unseen datapoint by a score based diffusion model?

I have a pretrained score-based diffusion model trained on 64x64 images. Now I want to calculate the score of a new image (of the same dimensions) with this pretrained neural network.

The score network takes two inputs:

x_t: sample at timestep t
t: timestep

How should I calculate the score of a new image with this pretrained neural network?

Question about Eq. (4) and VP SDE implementation

Hi,

In the paper, the sampling equation of VP SDE (=: DDPM) differs from the DDPM's form. Specifically, the multiplicand of the score function is defined as $\beta_i$ while DDPM uses $(1-\alpha_t)/(\sqrt{1-\bar{\alpha_t}})$. It seems that the code in the repo originally intended to use $\bar{\alpha_t}$, since it is initialized (but not used), as can be seen in the snippet below. Can you explain the reason for this difference?

score_sde_pytorch/sde_lib.py

Line 127 in cb1f359

self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
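
One way to reconcile the two forms, sketched under the assumptions that $\alpha_t = 1 - \beta_t$ as in DDPM and that the noise prediction $\epsilon$ and the score $s$ are related by $s = -\epsilon / \sqrt{1-\bar{\alpha}_t}$: the DDPM coefficient multiplies $\epsilon$, and rewriting $\epsilon$ in terms of the score leaves exactly $\beta_t$ multiplying the score. The values below are arbitrary and only check that the two update means agree.

```python
import math

beta_t = 0.02          # arbitrary noise-schedule value
alpha_t = 1.0 - beta_t
alpha_bar_t = 0.4      # arbitrary cumulative product of alphas up to step t
x_t = 0.8              # arbitrary current sample (a single coordinate)
eps = -0.6             # arbitrary noise prediction for that coordinate

# Assumed relation between the noise prediction and the score
score = -eps / math.sqrt(1.0 - alpha_bar_t)

# DDPM update mean, written with the noise prediction
mean_ddpm = (x_t - (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_t)

# The same mean written with the score, where beta_t multiplies the score
mean_score = (x_t + beta_t * score) / math.sqrt(1.0 - beta_t)

assert math.isclose(mean_ddpm, mean_score, rel_tol=1e-9)
```

Under these assumptions the $\sqrt{1-\bar{\alpha}_t}$ factor is absorbed into the score itself, which would explain why `alphas_cumprod` is not needed in the score-based update.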

cpu code availability

Hello,
Does this repo work on CPU, in particular the code in the OP folder?
It seems that it only works with CUDA.
Are there any settings that enable running on CPU?

Best regards

Issues on evaluation

Thank you for open-sourcing your SDE code. However, when attempting to use the FID computation code in the provided Colab file, I encountered an error message stating that it cannot import the 'celeba_ncsnpp_continuous' name from the 'configs.ve' module. Could you please guide me on where to download this configuration file and the corresponding model checkpoints? Thanks once again.

How to do temperature rescaling

Hi!
Thanks for this marvellous work.
I have a question about how to do temperature scaling, as in Fig. 6 of the paper. I use the pretrained ODE model to generate 256x256 Celeba images, but the results are too smooth and blurry. How can I get more realistic images like those in Fig. 6? Specifically, how do I "reduce the norm of embedding" to do temperature scaling?

Many thanks.

Segmentation fault

Thank you for your work.

I use Python 3.8 with PyTorch 1.7.1 or 1.7.0, and it always ends in a segmentation fault.

How to understand `snr` in `LangevinCorrector`?

The code is here:

class LangevinCorrector(Corrector):
  def __init__(self, sde, score_fn, snr, n_steps):
    super().__init__(sde, score_fn, snr, n_steps)
    if not isinstance(sde, sde_lib.VPSDE) \
        and not isinstance(sde, sde_lib.VESDE) \
        and not isinstance(sde, sde_lib.subVPSDE):
      raise NotImplementedError(f"SDE class {sde.__class__.__name__} not yet supported.")

  def update_fn(self, x, t):
    sde = self.sde
    score_fn = self.score_fn
    n_steps = self.n_steps
    target_snr = self.snr
    if isinstance(sde, sde_lib.VPSDE) or isinstance(sde, sde_lib.subVPSDE):
      timestep = (t * (sde.N - 1) / sde.T).long()
      alpha = sde.alphas.to(t.device)[timestep]
    else:
      alpha = torch.ones_like(t)

    for i in range(n_steps):
      grad = score_fn(x, t)
      noise = torch.randn_like(x)
      grad_norm = torch.norm(grad.reshape(grad.shape[0], -1), dim=-1).mean()
      noise_norm = torch.norm(noise.reshape(noise.shape[0], -1), dim=-1).mean()
      step_size = (target_snr * noise_norm / grad_norm) ** 2 * 2 * alpha
      x_mean = x + step_size[:, None, None, None] * grad
      x = x_mean + torch.sqrt(step_size * 2)[:, None, None, None] * noise

    return x, x_mean

Could you please give me some explanation or references? Thx a lot! @yang-song @patrickvonplaten
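
A possible reading, based only on the step size the code computes, step_size = 2 · alpha · (snr · ‖noise‖ / ‖grad‖)²: each Langevin step adds a "signal" term step_size · grad and a "noise" term sqrt(2 · step_size) · noise, and with this step size the ratio of their norms is snr · sqrt(alpha). For the VE SDE, where alpha is 1, the ratio is exactly `snr`. The check below uses arbitrary vectors in plain Python instead of tensors.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


target_snr = 0.16                 # arbitrary value for illustration
alpha = 1.0                       # the code uses ones for VESDE
grad = [0.3, -2.0, 1.1, 0.7]      # arbitrary stand-in for score_fn(x, t)
noise = [1.2, 0.4, -0.9, 0.1]     # arbitrary stand-in for torch.randn_like(x)

# Same formula as in update_fn
step_size = (target_snr * l2_norm(noise) / l2_norm(grad)) ** 2 * 2 * alpha

signal_norm = step_size * l2_norm(grad)                 # norm of step_size * grad
noise_norm = math.sqrt(2 * step_size) * l2_norm(noise)  # norm of sqrt(2 * step_size) * noise

assert math.isclose(signal_norm / noise_norm, target_snr * math.sqrt(alpha), rel_tol=1e-9)
```

In other words, `snr` sets how large the deterministic score step is relative to the injected noise, independently of the current magnitude of the score.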

Question about reporting likelihoods in bits per dim

Hi all,

Thank you for providing this code, it's very educational :)

I am interested in reporting likelihoods but I don't fully understand how bits per dim are calculated. In the code this is

bpd = -(prior_logp + delta_logp) / np.log(2)
N = np.prod(shape[1:])
bpd = bpd / N
# A hack to convert log-likelihoods to bits/dim
offset = 7. - inverse_scaler(-1.)
bpd = bpd + offset

Would you be able to elaborate on how this hack works and if it applies to other image dimensions?

Best,
Matt
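
A sketch of one interpretation, assuming 8-bit images (256 levels per dimension) and that `inverse_scaler` maps model-space values back to [0, 1]: the model's density is defined on rescaled data, and the change of variables back to the 256-level scale adds log2(256 / width) bits per dimension, where width is the length of the model-space interval. For data in [-1, 1] this is log2(128) = 7, and for data in [0, 1] it is log2(256) = 8. The expression `7. - inverse_scaler(-1.)` yields the same two values. The two inverse scalers below are assumptions about what the repository uses.

```python
import math


def inverse_scaler_centered(x):
    """Assumed inverse scaler when the data were scaled to [-1, 1]."""
    return (x + 1.0) / 2.0


def inverse_scaler_uncentered(x):
    """Assumed inverse scaler when the data were kept in [0, 1]."""
    return x


def offset_from_code(inverse_scaler):
    # Same expression as in the quoted snippet
    return 7.0 - inverse_scaler(-1.0)


def offset_from_change_of_variables(low, high, levels=256):
    # One unit of model space spans levels / (high - low) integer pixel levels.
    return math.log2(levels / (high - low))


print(offset_from_code(inverse_scaler_centered))    # 7.0
print(offset_from_code(inverse_scaler_uncentered))  # 8.0
```

Because the correction is applied per dimension, under this reading it depends on the pixel bit depth and the data scaling but not on the image size.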

Round operation for discrete models

Hello,

Firstly, congratulations on the amazing work. The ICLR award was well deserved!

I don't want to be pedantic but I realized that the get_score_fn for discrete models doesn't have a torch.round() operation even though the t at training time is an int. Therefore, the sampling is being done with slightly different values than the training (e.g. 500.1 instead of 500). I'm not sure if this really affects performance, it's just an observation.

I would add labels = torch.round(labels) after line 155 of the models/utils.py file.

Many thanks,
Pedro

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

RuntimeError Traceback (most recent call last)
Cell In[29], line 1
----> 1 x, n = sampling_fn(score_model)
2 show_samples(x)

File /workspace/pytorchcode/score_sde_pytorch-main/sampling.py:407, in get_pc_sampler.<locals>.pc_sampler(model)
405 vec_t = torch.ones(shape[0], device=t.device) * t
406 x, x_mean = corrector_update_fn(x, vec_t, model=model)
--> 407 x, x_mean = predictor_update_fn(x, vec_t, model=model)
409 return inverse_scaler(x_mean if denoise else x), sde.N * (n_steps + 1)

File /workspace/pytorchcode/score_sde_pytorch-main/sampling.py:341, in shared_predictor_update_fn(x, t, sde, model, predictor, probability_flow, continuous)
339 else:
340 predictor_obj = predictor(sde, score_fn, probability_flow)
--> 341 return predictor_obj.update_fn(x, t)

File /workspace/pytorchcode/score_sde_pytorch-main/sampling.py:196, in ReverseDiffusionPredictor.update_fn(self, x, t)
195 def update_fn(self, x, t):
--> 196 f, G = self.rsde.discretize(x, t)
197 z = torch.randn_like(x)
198 x_mean = x - f

File /workspace/pytorchcode/score_sde_pytorch-main/sde_lib.py:104, in SDE.reverse.<locals>.RSDE.discretize(self, x, t)
102 def discretize(self, x, t):
103 """Create discretized iteration rules for the reverse diffusion sampler."""
--> 104 f, G = discretize_fn(x, t)
105 rev_f = f - G[:, None, None, None] ** 2 * score_fn(x, t) * (0.5 if self.probability_flow else 1.)
106 rev_G = torch.zeros_like(G) if self.probability_flow else G

File /workspace/pytorchcode/score_sde_pytorch-main/sde_lib.py:251, in VESDE.discretize(self, x, t)
248 timestep = (t * (self.N - 1) / self.T).long()
249 sigma = self.discrete_sigmas.to(t.device)[timestep]
250 adjacent_sigma = torch.where(timestep == 0, torch.zeros_like(t),
--> 251 self.discrete_sigmas[timestep - 1].to(t.device))
252 f = torch.zeros_like(x)
253 G = torch.sqrt(sigma ** 2 - adjacent_sigma ** 2)
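
A likely fix, sketched under the assumption that `self.discrete_sigmas` lives on the CPU while `timestep` lives on the GPU: line 249 in the traceback moves the tensor to `t.device` before indexing, whereas line 251 indexes first and moves afterwards, which is where the error is raised. Moving before indexing keeps the indices and the indexed tensor on the same device. The tensors below are small stand-ins and the function name is hypothetical; without a GPU everything stays on the CPU and the snippet still runs.

```python
import torch


def adjacent_sigmas(discrete_sigmas, timestep, t):
    """Sigma of the previous step, with zeros where timestep == 0."""
    # Move to t's device first, then index, so both tensors share a device.
    previous = discrete_sigmas.to(t.device)[timestep - 1]
    return torch.where(timestep == 0, torch.zeros_like(t), previous)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
discrete_sigmas = torch.tensor([0.1, 0.5, 2.0, 10.0, 50.0])  # stand-in, kept on the CPU
t = torch.tensor([0.0, 0.5, 1.0], device=device)
timestep = (t * (len(discrete_sigmas) - 1)).long()           # indices 0, 2, 4
result = adjacent_sigmas(discrete_sigmas, timestep, t)
print(result)
```

In the repository this would correspond to writing `self.discrete_sigmas.to(t.device)[timestep - 1]` on line 251 of sde_lib.py.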