dome272 / paella

Official Implementation of Paella https://arxiv.org/abs/2211.07292v2

License: MIT License

Python 1.27% Shell 0.01% Jupyter Notebook 98.72%

diffusion-models generative-model

paella's Issues

Image variation finetuned model

Hi,
The Colab script says a fine-tuned model is needed for running image variation. Can you upload it?

I'm trying it this way:

input_image = Image.open(str(input_image))
num_outputs = int(num_outputs)
latent_shape = (32, 32)
with torch.inference_mode():
    with torch.autocast(device_type="cuda"):
        input_image = clip_preprocess(input_image).to(device).unsqueeze(0)
        clip_embeddings = clip_model.encode_image(input_image).float()
        clip_embeddings = torch.repeat_interleave(clip_embeddings, num_outputs, dim=0)
        sampled = sample(model, clip_embeddings, T=12, size=latent_shape, starting_t=0, temp_range=[1.0, 1.0],
                                 typical_filtering=True, typical_mass=0.2, typical_min_tokens=1,
                                 classifier_free_scale=5, renoise_steps=11)
sampled = decode(sampled[-1])

and I'm getting very bad results, like these (input on the left, three outputs on the right).

Invalid link in readme

Hyperparameters link goes to a nonexistent line.

Could you provide the evaluation code?

questions on parameters

Some questions and observations after trying this out. Nice notebook! This is not an issue,
but there is no Discussions tab in your GitHub repo.

Is there a way to specify additional hyperparameters such as seed, image size, iterations...?
When you set the batch size to 1, the resulting image is displayed huge (not true to the actual resolution).
It would be great if each output (under /content/output in Colab) could be a single image. I understand that I can set the batch size to 1, but I would like to enter a number and have that many images saved under /content/output. Having the images glued together horizontally is less appealing (to me).

Is this a bug, or did I miss something?

Hi, when I check the vqgan.py file, line 110 calls self.encode(x, quantize) with a quantize argument, but the encode function definition (line 91) does not take this parameter. Thanks.
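
A toy reproduction of the mismatch described above may help pin it down. It assumes nothing about the real vqgan.py beyond the two call sites quoted in the issue: the class names, the rounding "quantizer", and the `quantize=False` default are all hypothetical stand-ins, and the fix shown is only one possible option.

```python
class BrokenToyVQ:
    """Mimics the reported bug: forward passes an argument encode does not accept."""

    def encode(self, x):
        return [round(v) for v in x]

    def forward(self, x, quantize=False):
        # Raises TypeError: encode() receives one positional argument too many.
        return self.encode(x, quantize)


class FixedToyVQ:
    """One possible fix: let encode accept the flag that forward passes."""

    def encode(self, x, quantize=False):
        # Rounding stands in for a codebook lookup; it is not the real quantizer.
        return [round(v) for v in x] if quantize else list(x)

    def forward(self, x, quantize=False):
        return self.encode(x, quantize)


def call_raises_type_error(model, x):
    """Return True if calling forward with the quantize flag raises TypeError."""
    try:
        model.forward(x, True)
    except TypeError:
        return True
    return False
```

The alternative fix would be to drop the extra argument at the call site; which one is intended depends on whether the caller is meant to control quantization.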

Easy to finetune for Img2Img?

Hi,

I noticed that the paper doesn't mention img2img the way Stable Diffusion does. Do you think it would be easy to fine-tune for that?

Best,
Benno

Model license

Since the model isn't included in this repository, and thus isn't explicitly licensed under MIT, how are you choosing to license the model?

Dataset for finetuning

Hello,

Is it possible to fine-tune it on a dataset of around 2000 image-text pairs?

Also, what should the dataset include? Does the model need to know basic definitions of objects if I am trying to introduce a new object? For instance, if the new object is a rugby ball, does the dataset need individual images of rugby balls for the model to respond to a prompt like "a player kicking a red rugby ball"? Or will it work if the dataset only has images of players holding the ball?

The point I am trying to make is: does the model need one object per image, or can it learn easily from images with multiple objects? In the latter case, how does it learn which token maps to which object?

Regards,
Hisan

Will higher-resolution model weights be released?

Such as 512x512 or 768x768.

Removing dependency on ruDALLE?

Hi, great work on Paella! I highly admire all effort that aims to optimize the process, even at the cost of some quality. I dream of 60 fps diffusion!

I am trying to integrate Paella into my AI art rendering platform as an alternative to speed up testing of other features. However, I already support Stable Diffusion, which requires a recent version of the transformers package. ruDALLE depends on a very old version of transformers (4.10 instead of 4.19), which makes it impossible to use both together.

To me it feels a bit much to import ruDALLE only for its VAE? Let me know what you think!
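
A minimal sketch of how such a version conflict could be detected before importing anything heavy. The version numbers 4.10 and 4.19 come from the issue above; the helper names are hypothetical, and the parser is deliberately simple (it ignores pre-release suffixes rather than ordering them).

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.19.2' into (4, 19, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, not lexically."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `meets_minimum(installed_version("transformers") or "0", "4.19")` would report whether the current environment satisfies the newer requirement.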

Add Paellaaa to community section

Hello friends, hlky and I have been working on a fork of Paella with some alterations. These are: dual CLIP and T5 embeddings and cross attention. Feel free to link to our repository in your community "ongoing research" section.

We are scaling up training and hope to release weights and a demo once completed. Thanks!

Paellaaa

Typo in Paper

In section 4.4.4 you refer to Figure 5, when I think you meant to refer to Figure 12.

Recommended dataset size for finetuning?

How many samples in my dataset would I need to fine-tune Paella?

Can Paella perform well on class-conditional image generation when trained on ImageNet?

Paella has only been trained on a large dataset for text2image. I wonder whether you have trained a class-conditional generation model on ImageNet and, if so, how it performs. Thanks!

Gradient explosion

I am training Paella on MSCOCO with the model downsized slightly, to 247M parameters. The training loss suddenly increases and then becomes NaN. How can I solve this problem?
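
A common mitigation for a loss that spikes and then turns NaN is to clip gradients by their global norm and to skip any step whose gradients are not finite. The sketch below shows the arithmetic in plain Python on a flat list of gradient values; it is a general technique, not a diagnosis of this particular run, and the function names are illustrative. In PyTorch the clipping step is available as `torch.nn.utils.clip_grad_norm_`.

```python
import math


def global_norm(grads):
    """L2 norm over all gradient values taken together."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm):
    """Scale grads so their L2 norm is at most max_norm.

    Returns None when the norm is NaN or infinite, signalling that the
    caller should skip this optimizer step instead of applying it.
    """
    norm = global_norm(grads)
    if not math.isfinite(norm):
        return None
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Lowering the learning rate or lengthening warmup are other things commonly tried alongside clipping.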

Sorry, another colab typo

Your line loading the open_clip models has a couple typos in it and it doesn't recognize the models as is.

Will a video be coming soon?

Hello, I really like your video and your code; it is very clear. Will there be a video about this repo soon?

Cool model! Integration into Diffusers?

Hey @dome272,

Congrats on releasing this model! Are you interested in adding it to diffusers: https://github.com/huggingface/diffusers by any chance? Would love to help :-)

paella_sampling.ipynb gdown downloads don't work

Getting an error when trying to download models using gdown

Possible to use with byt5-small?

Hi, the XL model is way too large for me! I tried byt5-small, but it crashes:

Traceback (most recent call last):
  File "/home/nuck/discore/src_core/lib/corelib.py", line 79, in invoke_safe
    func(*kargs, **kwargs)
  File "/home/nuck/discore/src_core/classes/printlib.py", line 193, in wrapper
    return func(*args, **kwargs)
  File "/home/nuck/discore/src_core/rendering/renderer.py", line 106, in emit
    cb(v, name)
  File "/home/nuck/discore/sessions/pinchsky-dance/script.py", line 101, in callback
    paella.txt2img(seed=v.seed, steps=v.steps, chg=clamp(v.chg, 0.1, 1), w=v.w, h=v.h, cfg=v.cfg, p=v.prompt, negprompt=v.negprompt, sampler=v.sampler)
  File "/home/nuck/discore/src_plugins/paella/PaellaPlugin.py", line 47, in txt2img
    ret = colab.txt2img(prompt, steps=steps, size=size, cfg=cfg, seed=seed)
  File "/home/nuck/discore/src_plugins/paella/colab.py", line 219, in txt2img
    sampled_tokens, intermediate = sample(model,
  File "/home/nuck/discore/src_plugins/paella/colab.py", line 66, in sample
    logits = model(sampled, t, **model_inputs, attn_weights=attn_weights)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nuck/discore/plug-repos/paella/Paella/utils/modules.py", line 273, in forward
    c_embed = self.gen_c_embeddings(byt5, clip, clip_image)
  File "/home/nuck/discore/plug-repos/paella/Paella/utils/modules.py", line 224, in gen_c_embeddings
    seq = self.byt5_mapper(byt5)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1472 and 2560x1024)
mat1 and mat2 shapes cannot be multiplied (1x1472 and 2560x1024)
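
Reading the error message: the text embedding arriving at the mapper has 1472 features, while the mapper's weight expects 2560 input features and produces 1024. The sketch below only restates that shape rule; the helper names are hypothetical and the two shapes are copied from the error above.

```python
def can_matmul(mat1_shape, mat2_shape):
    """True when the inner dimensions agree, i.e. (n, k) @ (k, m)."""
    return mat1_shape[-1] == mat2_shape[0]


def explain_mismatch(mat1_shape, mat2_shape):
    """Describe a matmul shape mismatch in terms of feature counts."""
    if can_matmul(mat1_shape, mat2_shape):
        return "ok"
    return (f"input has {mat1_shape[-1]} features, "
            f"but the layer expects {mat2_shape[0]}")


# Shapes copied from the RuntimeError in the traceback.
message = explain_mismatch((1, 1472), (2560, 1024))
```

Since the pretrained mapper was presumably trained against 2560-dimensional embeddings, swapping in a smaller text encoder would likely need a new mapper layer and retraining rather than a configuration change.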

KeyError 'state_dict' while loading VQGAN model

File "/home/me/Paella/src/utils.py", line 77, in load_conditional_models
vqgan.load_state_dict(torch.load(vqgan_path, map_location=device)['state_dict'])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'state_dict'
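
One plausible cause, stated as an assumption, is that the downloaded checkpoint file is already a bare state dict rather than a dict wrapping one under 'state_dict'. A tolerant loader could accept both layouts; `extract_state_dict` is a hypothetical helper, not part of the repository.

```python
def extract_state_dict(checkpoint, key="state_dict"):
    """Return checkpoint[key] if the wrapper key exists, else the checkpoint itself."""
    if isinstance(checkpoint, dict) and key in checkpoint:
        return checkpoint[key]
    return checkpoint


# Plain dicts stand in for what torch.load would return in each layout.
wrapped = {"state_dict": {"encoder.weight": [1.0]}, "epoch": 3}
bare = {"encoder.weight": [1.0]}
```

With this helper, the failing line would become `vqgan.load_state_dict(extract_state_dict(torch.load(vqgan_path, map_location=device)))`. Printing the checkpoint's top-level keys first is a quick way to confirm which layout the file actually has.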

How much RAM is needed on CPU?

I'm trying to run paella_sampling.ipynb locally with 8 GB of RAM, but in the fifth cell it fills both my RAM and my 16 GB of swap.

I run this on a Linux laptop in a venv. Is the notebook code unoptimized or outdated?

venv command-line steps:

source myvenv/bin/activate
pip install ipykernel
python -m ipykernel install --user --name=myvenv
pip install Jupyter
jupyter notebook
(open and run paella_sampling.ipynb through launched web browser)

I would very much like to test this project with a local install on CPU, but I don't know how to use the other files provided.

Reproducibility of results

Hey, first of all great work!

However, I struggle to reproduce your qualitative visual results from the first page.

I am using classifier free guidance with strength 5. What can I do to further improve the quality?

Best!

Tensor size error

Hi, this is amazing... but in the Colab, as soon as multi-conditioning starts, it throws this error:

/content/Paella/modules.py in forward(self, x, c, r)
145 r_embed = self.gen_r_embedding(r)
146 x = self.embedding(x).permute(0, 3, 1, 2)
--> 147 s = torch.cat([c, r_embed], dim=-1)[:, :, None, None]
148 level_outputs = self.down_encode(x, s)
149 x = self._up_decode(level_outputs, s)

RuntimeError: Tensors must have same number of dimensions: got 4 and 2

Distributed code cannot run

When I run the distributed code, it fails to start. I tried both a temporary srun debug node and an sbatch script; both result in the following error.