
pytorch-generative's Introduction

pytorch-generative

pytorch-generative is a Python library which makes generative modeling in PyTorch easier by providing:

  • high quality reference implementations of SOTA generative models
  • useful abstractions of common building blocks found in the literature
  • utilities for training, debugging, and working with Google Colab
  • integration with TensorBoard for easy metrics visualization

To get started, see the sections below.

Installation

To install pytorch-generative, clone the repository and install the requirements:

git clone https://www.github.com/EugenHotaj/pytorch-generative
cd pytorch-generative
pip install -r requirements.txt

After installation, run the tests to sanity check that everything works:

python -m unittest discover
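If you only want to run a subset of the tests, unittest discovery accepts a filename pattern; the pattern below is illustrative and should be matched to the repository's actual test file names:

python -m unittest discover -p "*_test.py" -v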

Reproducing Results

All our models implement a reproduce function containing all the hyperparameters necessary to reproduce the results listed in the supported algorithms section. This makes it easy to reproduce any result with our training script, for example:

python train.py --model image_gpt --logdir /tmp/run --use-cuda
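Assuming train.py exposes a standard argparse interface (which the flags above suggest, but which we have not confirmed), you can list the available model names and options with:

python train.py --help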

Training metrics will periodically be logged to TensorBoard for easy visualization. To view these metrics, launch a local TensorBoard server:

tensorboard --logdir /tmp/run

To run the model on a different dataset, with different hyperparameters, etc., simply modify its reproduce function and rerun the commands above.
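A model's reproduce function can also be called directly from Python. The sketch below is hypothetical: the module path and argument names are assumptions, so check the function's actual signature first:

from pytorch_generative.models import image_gpt

# Hypothetical call: n_epochs, batch_size, and log_dir are assumed
# argument names, not confirmed ones.
image_gpt.reproduce(n_epochs=10, batch_size=64, log_dir='/tmp/run')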

Google Colab

To use pytorch-generative in Google Colab, clone the repository and move the pytorch_generative package into the top-level working directory:

!git clone https://www.github.com/EugenHotaj/pytorch-generative
!mv pytorch-generative/pytorch_generative .

You can then import pytorch-generative like any other library:

from pytorch_generative import nn as pg_nn
from pytorch_generative import models
...
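Alternatively, you can leave the clone in place and put it on the import path instead of moving it. A minimal sketch, assuming Colab's default /content working directory:

import sys

sys.path.append('/content/pytorch-generative')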

Example - ImageGPT

Supported models are implemented as PyTorch Modules and are easy to use:

from pytorch_generative import models

... # Data loading code.

model = models.ImageGPT(in_channels=1, out_channels=1, in_size=28)
model(batch)
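For example, here is a minimal (untrained) forward pass on a dummy batch shaped like binarized MNIST; the batch size of 8 is arbitrary:

import torch

from pytorch_generative import models

# A dummy batch of 8 single-channel 28x28 images.
batch = torch.zeros(8, 1, 28, 28)
model = models.ImageGPT(in_channels=1, out_channels=1, in_size=28)
out = model(batch)  # Expected shape: (8, 1, 28, 28).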

Alternatively, the lower-level building blocks in pytorch_generative.nn can be used to write models from scratch. Below, we show how to implement a convolutional ImageGPT model:

import torch
from torch import nn

from pytorch_generative import nn as pg_nn


class TransformerBlock(nn.Module):
  """An ImageGPT Transformer block."""

  def __init__(self, 
               n_channels, 
               n_attention_heads):
    """Initializes a new TransformerBlock instance.
    
    Args:
      n_channels: The number of input and output channels.
      n_attention_heads: The number of attention heads to use.
    """
    super().__init__()
    self._ln1 = pg_nn.NCHWLayerNorm(n_channels)
    self._ln2 = pg_nn.NCHWLayerNorm(n_channels)
    self._attn = pg_nn.CausalAttention(
        in_channels=n_channels,
        embed_channels=n_channels,
        out_channels=n_channels,
        n_heads=n_attention_heads,
        mask_center=False)
    self._out = nn.Sequential(
        nn.Conv2d(
            in_channels=n_channels, 
            out_channels=4*n_channels, 
            kernel_size=1),
        nn.GELU(),
        nn.Conv2d(
            in_channels=4*n_channels, 
            out_channels=n_channels, 
            kernel_size=1))

  def forward(self, x):
    x = x + self._attn(self._ln1(x))
    return x + self._out(self._ln2(x))


class ImageGPT(nn.Module):
  """The ImageGPT Model."""
  
  def __init__(self,       
               in_channels,
               out_channels,
               in_size,
               n_transformer_blocks=8,
               n_attention_heads=4,
               n_embedding_channels=16):
    """Initializes a new ImageGPT instance.
    
    Args:
      in_channels: The number of input channels.
      out_channels: The number of output channels.
      in_size: Size of the input images. Used to create positional encodings.
      n_transformer_blocks: Number of TransformerBlocks to use.
      n_attention_heads: Number of attention heads to use.
      n_embedding_channels: Number of attention embedding channels to use.
    """
    super().__init__()
    self._pos = nn.Parameter(torch.zeros(1, in_channels, in_size, in_size))
    self._input = pg_nn.CausalConv2d(
        mask_center=True,
        in_channels=in_channels,
        out_channels=n_embedding_channels,
        kernel_size=3,
        padding=1)
    self._transformer = nn.Sequential(
        *[TransformerBlock(n_channels=n_embedding_channels,
                           n_attention_heads=n_attention_heads)
          for _ in range(n_transformer_blocks)])
    self._ln = pg_nn.NCHWLayerNorm(n_embedding_channels)
    self._out = nn.Conv2d(in_channels=n_embedding_channels,
                          out_channels=out_channels,
                          kernel_size=1)

  def forward(self, x):
    x = self._input(x + self._pos)
    x = self._transformer(x)
    x = self._ln(x)
    return self._out(x)
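Since the causal convolution pads its input and the attention and 1x1 convolutions preserve spatial dimensions (as the listing implies), the model above should map an (N, 1, 28, 28) batch to an output of the same shape. A shapes-only smoke test, continuing from the listing:

model = ImageGPT(in_channels=1, out_channels=1, in_size=28)
x = torch.zeros(8, 1, 28, 28)
assert model(x).shape == (8, 1, 28, 28)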

Supported Algorithms

pytorch-generative supports the following algorithms.

We train likelihood-based models on dynamically binarized MNIST and report the negative log likelihood in the tables below (lower is better).

Autoregressive Models

Algorithm        Binarized MNIST (nats)   Links
PixelSNAIL       78.61                    Code, Paper
ImageGPT         79.17                    Code, Paper
Gated PixelCNN   81.50                    Code, Paper
PixelCNN         81.45                    Code, Paper
MADE             84.87                    Code, Paper
NADE             85.65                    Code, Paper
FVSBN            96.58                    Code, Paper

Variational Autoencoders

NOTE: The results below are the (variational) upper bound on the negative log likelihood (or, equivalently, the negative of the lower bound on the log likelihood).

Algorithm   Binarized MNIST (nats)   Links
VD-VAE      <= 80.72                 Code, Paper
VAE         <= 86.77                 Code, Paper
BetaVAE     N/A                      Code, Paper
VQ-VAE      N/A                      Code, Paper
VQ-VAE-2    N/A                      Code, Paper

Normalizing Flows

NOTE: Bits per dimension (bits/dim) can be calculated as (nll / 784 + log(256)) / log(2), where 784 is the MNIST dimensionality, log(256) accounts for dequantizing the 8-bit pixel values, and log(2) converts from natural log to base 2.

Algorithm   MNIST (bits/dim)   Links
NICE        4.34               Code, Paper
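The conversion above is mechanical, so a small helper makes it concrete (the function name is ours, not part of the library):

import math

def nats_to_bits_per_dim(nll_nats, n_dims=784):
  # Implements (nll / n_dims + log(256)) / log(2) from the note above.
  return (nll_nats / n_dims + math.log(256)) / math.log(2)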

Miscellaneous

Algorithm                                  Links
Mixture Models                             Code, Wiki
Kernel Density Estimators                  Code, Wiki
Neural Style Transfer                      Code, Blog, Paper
Compositional Pattern Producing Networks   Code, Wiki

pytorch-generative's People

Contributors: eugenhotaj, pratikshapi

pytorch-generative's Issues

train.py file is gone along with others

Hi,

Some of the files seem to be missing, and the code does not function without them.

Can you please upload them back so we can continue using this wonderful framework?

Thanks,
EB

How to view progress during training?

I ran a PixelCNN in Colab last week, but I didn't know when it would finish (4 hours) because there was no in-training output. If you're concerned about flooding a cell with output, you could use tqdm over the epochs so the user can still see progress while only a few lines are printed.

How to get the labels of generated images

Hi Eugen,
I have successfully generated images with the NADE model, but I don't know how to generate the label of the image. I don't see relevant functions in the code. Can I get the label via some function?

Applying Style Transfer to a GIF

Forgive me if you've already made this possible.
I was looking at a small GIF and thought to myself that it would be cool to stylize each picture in the GIF.

I realized I could do this myself by saving the pictures individually and then applying the process one at a time. I remember you saying something about how input images can be processed similarly to how you handled the multiple style images.

Forgive me if the feature is already there and I just don't know how to set it up properly. If it's not there (or if it is), can you assist with either an explanation, or save this as a pending feature?

It would be pretty cool to redo GIFs using this.

[Q] Need an example to show how to generate an image.

Hi Eugen,

I've followed the readme and successfully trained the ImageGPT model via reproduce, but I have no idea how to generate an image with the trained model. I intend to train the model for 1, 2, 3, ... epochs and generate images at each stage to see what happens.

I checked the following code, but I also have no idea what sample_fn I should assign (it is None by default). Would you please add a simple example to the readme?

if self.sample_epochs and self._epoch % self.sample_epochs == 0:
    self.model.eval()
    with torch.no_grad():
        tensor = self.sample_fn(self.model)
        self._summary_writer.add_images("sample", tensor, self._step)

Style transfer notebook removed / not functional

Style transfer notebook won't open in colab

Notebook not found
There was an error loading this notebook. Ensure that the file is accessible and try again.
Ensure that you have permission to view this notebook in GitHub and authorize Colaboratory to use the GitHub API.

Also, manually loading the notebook locally or on Colab fails in the second cell:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-15-7857ada21f3b> in <module>
      2 
      3 import kornia
----> 4 from nn_hallucinations import colab_utils
      5 import torch
      6 from torch import nn

ImportError: cannot import name 'colab_utils' from 'nn_hallucinations' (unknown location)

Create Lightweight Tests for Important Notebooks

Notebooks are pretty prone to getting out of date and breaking (e.g. #13). We should create some lightweight tests for notebooks which have no canonical implementations in the library (currently style_transfer.py, and cppn.py).

How does generation work for autoregressive models?

Thank you for sharing good codes!
and sorry for asking you about the original paper, but I cannot find a better place to ask. 😭

I am trying to understand PixelCNN, and I don't understand how the model generates images from a zero-tensor input.
Most of the code feeds a zero tensor into the model for generation.

I think that if a zero input goes into the model, the model computes only the bias values in its layers (0 tensor + bias). The results would then be identical, because the same bias values are computed on every attempt to generate an image. So the model should generate the same image on every try.

But the model appears to generate a different image on every try. How does the model generate an image from a zero-tensor input? Are there noise values or random sampling functions involved?

VectorQuantizedVAE2

In the reproduce function of VQ-VAE-2, you use VectorQuantizedVAE rather than VectorQuantizedVAE2.

Tune tqdm logging output

To figure out:

  • Should we log the average train loss (or maybe a moving average)?
  • What output should be logged (e.g. epoch, percent done, train_loss, eval_loss, batches/s, anything else we need)?

how to use multiple GPUs

Hi Eugen,
I want to use multiple GPUs to train the NADE model at the same time. I'm trying to pass device_id=[0,1,2,3] or device_id=(0,1,2,3) to trainer.Trainer(), but an error message appears, and I don't know how to write it correctly.

Loading 3 Channels Images to PixelSnail

Hi!

I am trying to load the ImageNet dataset into the PixelSNAIL model.

After changing these lines in the pixel_snail.py script:

class PixelSNAIL(base.AutoregressiveModel):
  """The PixelSNAIL model.

  Unlike [1], we implement skip connections from each block to the output.
  We find that this makes training a lot more stable and allows for much
  faster convergence.
  """

  def __init__(self,
               in_channels=3,  # <------ I changed this.
               out_dim=1,
               probs_fn=torch.sigmoid,
               sample_fn=lambda x: distributions.Bernoulli(probs=x).sample(),
               n_channels=64,
               n_pixel_snail_blocks=8,
               n_residual_blocks=2,
               attention_key_channels=4,
               attention_value_channels=32,
               head_channels=1):
    """Initializes a new PixelSNAIL instance."""

I still get this error :

RuntimeError: Given groups=1, weight of size [64, 1, 3, 3], expected input[128, 3, 64, 64] to have 1 channels, but got 3 channels instead

Am I missing something? Should I do anything else when trying to load 3-channel images?

Thanks,
Eyal

Replicating NLL Results for PIXEL CNN

Could you please let me know how I can replicate the NLL results from the paper? For how many epochs should I train?
Could you please give me a script to generate an image using the trained model?

Training ImageGPT on 64 * 64 size images

Hey,

I wrapped your implementation of ImageGPT for a project I am doing in Colab.
I have single-channel images of size 64, with the intention of scaling up to 128.
Running your code crashes (out of memory) even if I set batch_size to 2.
It runs only with batch_size=1, and it doesn't use much memory (around 3 GB) while being extremely slow: 60 hours for ~200K pictures (20 s per picture).
Is this normal, or is there something wrong?

Sample Images During Training to Check Generation Quality

Currently, the only way to measure model quality during training is by monitoring the evaluation loss. However, a low loss does not always correspond to high quality generation (e.g. see https://arxiv.org/abs/1511.01844). Furthermore, for autoregressive models, if the model is not correctly masked, it will get very low loss by peeking into the future, but the generation quality will be extremely bad.

In #16, one suggestion was to periodically sample from the model as it trains (e.g. every 10 epochs) and display these samples using TensorBoard. We should be able to easily support this in the Trainer.

Sampling on 3 channels looks corrupted

Hi

I trained PixelSNAIL on CIFAR10 and tried to sample from the model with this function that I wrote (based on the code from this project's notebook):

        print("Sample")
        
        if self._epoch % 10 == 0 :
            print("Epoch Number: " + str(self._epoch))
            print("sampling")
            curr_path = 'sample_from_epoch_' + str(self._epoch) + '.png'
            print(curr_path)
            sampleTensor=self._model.sample((10, 3, 32, 32))
            sampleTensor=sampleTensor.cpu()
            cu.imsave(sampleTensor, figsize=(50, 5),filename = curr_path)
            
      self._summary_writer.close()

Where cu.imsave is :

def imsave(batch_or_tensor, title=None, figsize=None, filename="sample.png"):
  """Renders tensors as an image using Matplotlib.

  Args:
    batch_or_tensor: A batch or single tensor to render as images. If the
      batch size > 1, the tensors are flattened into a horizontal strip
      before being rendered.
    title: The title for the rendered image. Passed to Matplotlib.
    figsize: The size (in inches) for the image. Passed to Matplotlib.
    filename: The path where the rendered image is saved.
  """
  batch = batch_or_tensor
  for _ in range(4 - batch.ndim):
    batch = batch.unsqueeze(0)
  n, c, h, w = batch.shape
  tensor = batch.permute(1, 2, 0, 3).reshape(c, h, -1)
  image = _IMAGE_UNLOADER(tensor)

  plt.figure(figsize=figsize)
  plt.title(title)
  plt.axis('off')
  plt.imsave(filename,image)

For some reason, the output after 90 epochs looks bad:

[image: sample_from_epoch_90.png]

Am I missing anything?

How to run the style_transfer notebook without square images

In the style_transfer notebook, there is a size argument in the colab_utils.load_image function that resizes the images into squares (i.e., makes the length and width the same). This leads to outputs with square dimensions, which I'd rather avoid, but when I remove this argument and execute the run_style_transfer function, I get the following error:

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:32: UserWarning: Using a target size (torch.Size([1, 256, 256, 161])) that is different to the input size (torch.Size([1, 256, 311, 256])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-64-2ce60ed3a286> in <module>()
     23     content_weight=1.,
     24     style_weight=1000.,
---> 25     log_steps=250)

7 frames
/usr/local/lib/python3.7/dist-packages/torch/functional.py in broadcast_tensors(*tensors)
     72     if has_torch_function(tensors):
     73         return handle_torch_function(broadcast_tensors, tensors, *tensors)
---> 74     return _VF.broadcast_tensors(tensors)  # type: ignore
     75 
     76 

RuntimeError: The size of tensor a (256) must match the size of tensor b (161) at non-singleton dimension 3

I'm wondering if there is a way to run the style transfer (which is of excellent quality) without resizing the content and style images into square dimensions.

Sampling via py scripts

Hi!

Is there a way to use the sample function that appears in the notebook from the .py scripts, or to add a sampling option to the TensorBoard logs?

I tried to convert the imshow to imsave and run it after training finished, but the output image is corrupted for some reason.

BTW - this project is great! thank you for this.

Regards.

Get Rid of Most Colab Notebooks

Most Colab notebooks provide the exact same implementation of the models as the library code does. They also quickly fall out of date and break.

It's probably a good idea to have just one canonical tutorial notebook which can be used to train all our models.

However, some of the notebooks (such as Neural Style Transfer and CPPN) need to be kept around.

Update trainer to use max_steps instead of epochs

Epochs depend on the dataset size. This complicates decisions like how long to train, how often to evaluate, and how often to log. Currently we evaluate/log at the end of each epoch, which is not ideal for large datasets. While logging/evaluation is not strictly tied to using epochs, and we could support both, things might get pretty messy.

Trainer is (Probably) Slow Due to Excessive Logging

Right now, the Trainer object creates logs after every step (although logs are only flushed every 100 steps). This (probably) slows down training. We should first measure whether this is the case (e.g. turn off all logging and see how fast the model trains), then tune the logging rate accordingly.

PixelSnail model fails on GPU

Hi!

In trainer.py, the default setting is device=torch.device('cpu').

When I change it to 'cuda', I get this error:

RuntimeError: All input tensors must be on the same device. Received cpu and cuda:0

Do you know what should I do in order to fix this?

Thanks!

Update KDE models to handle multi-dimensional inputs

Hi,
When I use KDE I get a density whose integral is not 1.

import torch
from pytorch_generative.models import KernelDensityEstimator

# Random 2D data
data = torch.normal(torch.zeros((100, 2)), torch.ones((100, 2)))
estimator = KernelDensityEstimator(data)

# Build 
# [[-8.0, -8.0], [-8.0, -7.9], [-8.0, -7.8], ..., [-8.0,  7.9],
#  [-7.9, -8.0], [-7.9, -7.9], [-7.9, -7.8], ..., [-7.9,  7.9],
#   ...
#  [ 7.9, -8.0], [ 7.9, -7.9], [ 7.9, -7.8], ..., [ 7.9,  7.9]]
dx = 0.1
X = torch.arange(-8, 8, dx)
Y = torch.arange(-8, 8, dx)
xx, yy = torch.meshgrid(X, Y)
a = torch.stack((xx, yy), axis=2).view(-1, 2)

probs = estimator(a)
print(torch.sum(probs*dx**2)) # -> tensor(0.1251)

Am I missing something? When I do the same thing with scikit-learn, I don't have this issue:

import numpy as np
from sklearn.neighbors import KernelDensity

# Random 2D data
data = np.random.normal(np.zeros((100, 2)), np.ones((100, 2)))
kde = KernelDensity(bandwidth=1.0, kernel="gaussian")
kde.fit(data)

# Build 
# [[-8.0, -8.0], [-8.0, -7.9], [-8.0, -7.8], ..., [-8.0,  7.9],
#  [-7.9, -8.0], [-7.9, -7.9], [-7.9, -7.8], ..., [-7.9,  7.9],
#   ...
#  [ 7.9, -8.0], [ 7.9, -7.9], [ 7.9, -7.8], ..., [ 7.9,  7.9]]
dx = 0.1
X = np.arange(-8, 8, dx)
Y = np.arange(-8, 8, dx)
xx, yy = np.meshgrid(X, Y)
a = np.stack((xx, yy), axis=2).reshape(-1, 2)

probs = np.exp(kde.score_samples(a))
print(np.sum(probs*dx**2)) # -> 0.9999999970275351

Add PixelCNN++ improvements to GatedPixelCNN

Some things we could add:

  • [easy] Use down/right shifted 3x3 conv instead of a right shifted 1x3 conv in the horizontal stack.
  • [medium] Use a U-Net architecture with skip connections (this has shown improvement in some early experiments)
  • [medium] [optional] Use gated ResNet layers in both the vertical and horizontal stacks instead of the ad-hoc GatedPixelCNNLayer.
  • [hard] [optional] Implement the discretized logistic mixture for non-binary outputs
