
paella's People

Contributors

backnotprop, dome272, maubreville, thuanz123

paella's Issues

Image variation finetuned model

Hi,
The colab script says a finetuned model is needed for running image variation. Can you upload it?

I'm trying this way

import torch
from PIL import Image

# clip_model, clip_preprocess, model, device, sample and decode are assumed
# to be the objects already defined in the colab notebook.
input_image = Image.open(str(input_image))
num_outputs = int(num_outputs)
latent_shape = (32, 32)
with torch.inference_mode():
    with torch.autocast(device_type="cuda"):
        input_image = clip_preprocess(input_image).to(device).unsqueeze(0)
        clip_embeddings = clip_model.encode_image(input_image).float()
        # One copy of the image embedding per requested output.
        clip_embeddings = torch.repeat_interleave(clip_embeddings, num_outputs, dim=0)
        sampled = sample(model, clip_embeddings, T=12, size=latent_shape, starting_t=0,
                         temp_range=[1.0, 1.0], typical_filtering=True, typical_mass=0.2,
                         typical_min_tokens=1, classifier_free_scale=5, renoise_steps=11)
sampled = decode(sampled[-1])

and getting very bad results, like these (input on the left and 3 outputs on the right):

Screenshot from 2022-12-09 18-09-51

Questions on parameters

Some questions and observations after trying this out. Nice notebook! This is not an issue,
but there is no Discussions tab in your GitHub repo.

  1. Is there a way to specify additional hyperparameters such as seed, image size, iterations...?
  2. When you set the batch size to 1, the resulting image displayed is huge (not accurate to the actual resolution)
  3. It would be great if the output (under /content/output in colab) could be saved as individual images. I understand that I can set the batch size to 1, but I would like to enter a number and have that many images saved under /content/output. Having images glued together horizontally is less appealing (for me).
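
On item 3, one possible workaround is to cut the horizontally concatenated output back into separate images before saving. The sketch below only illustrates the slicing, treating an image as a plain list of pixel rows. `split_strip` is a hypothetical helper and not part of the notebook, and the same column arithmetic would apply to a tensor or a PIL image.

```python
def split_strip(rows, count):
    """Split an image strip (a list of pixel rows) into `count` equal-width tiles."""
    width = len(rows[0])
    if width % count != 0:
        raise ValueError("strip width is not divisible by the number of images")
    tile_w = width // count
    # Tile i keeps columns [i * tile_w, (i + 1) * tile_w) of every row.
    return [[row[i * tile_w:(i + 1) * tile_w] for row in rows] for i in range(count)]


# Toy 2x6 "image" holding three 2x2 tiles side by side.
strip = [[0, 1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10, 11]]
tiles = split_strip(strip, 3)
```

Each entry of `tiles` could then be written to its own numbered file.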

Is this a bug, or did I miss something?

Hi, when I check the vqgan.py file, line 110 calls self.encode(x, quantize) with a quantize argument, but the encode function definition (line 91) has no such parameter. Thanks.
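
If the call and the definition really disagree as described, Python would raise a TypeError as soon as that call runs. A minimal sketch of one way to make the two consistent is shown below. `ToyVQ` and its rounding "quantizer" are hypothetical stand-ins for illustration, not the code in vqgan.py.

```python
class ToyVQ:
    """Toy stand-in for a VQ model; rounding plays the role of quantization."""

    def encode(self, x, quantize=False):
        # Accepting `quantize` here is what lets forward() pass it through.
        latents = [v * 0.5 for v in x]
        if quantize:
            latents = [float(round(v)) for v in latents]
        return latents

    def forward(self, x, quantize=False):
        return self.encode(x, quantize)


model = ToyVQ()
continuous = model.forward([1.2, 3.4])
quantized = model.forward([1.2, 3.4], quantize=True)
```

The alternative fix would be to drop the argument at the call site, which is only correct if quantization is never needed on that path.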

Easy to finetune for Img2Img?

Hi,

I noticed that in the paper you don't mention img2img as Stable Diffusion does. Do you think it is easy to finetune for that?

Best,
Benno

Model license

Since the model isn't included in this repository, and thus isn't explicitly licensed under MIT, how are you choosing to license the model?

Dataset for finetuning

Hello,

Is it possible to finetune it on a dataset of around 2000 image-text pairs?

Also, what should the dataset include? Does the model need to know basic definitions of objects if I am trying to introduce a new object? For instance, if it is a new object like a rugby ball, does the dataset need to have individual images of rugby balls to be able to respond to a prompt like "a player kicking a red rugby ball"? Or will it work if the dataset only has images of players holding the ball?

The point I am trying to make is: does the model need one object per image, or can it learn easily from images with multiple objects? In the latter case, how does it learn which token maps to which object?

Regards,
Hisan

Removing dependency on ruDALLE?

Hi, great work on Paella! I highly admire all effort that aims to optimize the process, even at the cost of some quality. I dream of 60 fps diffusion!

I am trying to integrate Paella into my AI art rendering platform as a faster alternative for testing other features. However, I already support Stable Diffusion, which requires a recent version of the transformers package. ruDALLE depends on a very old version of transformers (4.10 instead of 4.19), which makes it impossible to use both together.

To me, importing ruDALLE only for its VAE feels like a bit much. Let me know what you think!
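
To make the conflict concrete, the two version numbers quoted above can be compared numerically. This is only a sketch: `parse_version` and `satisfies_minimum` are illustrative helpers, treating 4.19 as a hard minimum is an assumption, and real dependency resolvers apply richer rules.

```python
def parse_version(text):
    """Turn '4.19.2' into (4, 19, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Versions quoted in this issue; the "minimum" role of 4.19 is assumed.
rudalle_pin = "4.10"
assumed_minimum = "4.19"
compatible = satisfies_minimum(rudalle_pin, assumed_minimum)
```

Here `compatible` is False, which is why a single environment cannot satisfy both packages.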

Add Paellaaa to community section

Hello friends, hlky and I have been working on a fork of Paella with some alterations. These are: dual CLIP and T5 embeddings and cross attention. Feel free to link to our repository in your community "ongoing research" section.

We are scaling up training and hope to release weights and a demo once completed. Thanks!

Paellaaa

Typo in Paper

In section 4.4.4 you refer to Figure 5, when I think you meant to refer to Figure 12.

Gradient explosion

I am training Paella on MSCOCO with the model downsized slightly, to 247M parameters. The training loss suddenly increases and then becomes NaN. I wonder how to solve this problem.
image
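
Two common mitigations for a loss that spikes and then turns into NaN are clipping gradients by their global norm and skipping any step whose gradients are not finite. The sketch below shows both on plain Python lists. It is a generic illustration and not Paella's training code, and `max_norm=1.0` is an arbitrary choice. In PyTorch the clipping step corresponds to torch.nn.utils.clip_grad_norm_.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Return rescaled gradients, or None if the step should be skipped."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        return None  # NaN or inf gradient: skip this update
    # Scale down only when the norm exceeds the threshold.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)   # norm 5 -> about 1
skipped = clip_by_global_norm([float("nan"), 1.0])        # non-finite -> None
```

Lowering the learning rate or lengthening warmup are other options worth trying alongside this.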

Sorry, another colab typo

Your line loading the open_clip models has a couple typos in it and it doesn't recognize the models as is.

Will a video be coming soon?

Hello, I really like your videos and code; they are very clear. Will a video about this repo be coming soon?

Possibility to use with byt5-small?

Hi, the XL model is way too large for me! I tried byt5-small, but it crashes:

Traceback (most recent call last):
  File "/home/nuck/discore/src_core/lib/corelib.py", line 79, in invoke_safe
    func(*kargs, **kwargs)
  File "/home/nuck/discore/src_core/classes/printlib.py", line 193, in wrapper
    return func(*args, **kwargs)
  File "/home/nuck/discore/src_core/rendering/renderer.py", line 106, in emit
    cb(v, name)
  File "/home/nuck/discore/sessions/pinchsky-dance/script.py", line 101, in callback
    paella.txt2img(seed=v.seed, steps=v.steps, chg=clamp(v.chg, 0.1, 1), w=v.w, h=v.h, cfg=v.cfg, p=v.prompt, negprompt=v.negprompt, sampler=v.sampler)
  File "/home/nuck/discore/src_plugins/paella/PaellaPlugin.py", line 47, in txt2img
    ret = colab.txt2img(prompt, steps=steps, size=size, cfg=cfg, seed=seed)
  File "/home/nuck/discore/src_plugins/paella/colab.py", line 219, in txt2img
    sampled_tokens, intermediate = sample(model,
  File "/home/nuck/discore/src_plugins/paella/colab.py", line 66, in sample
    logits = model(sampled, t, **model_inputs, attn_weights=attn_weights)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nuck/discore/plug-repos/paella/Paella/utils/modules.py", line 273, in forward
    c_embed = self.gen_c_embeddings(byt5, clip, clip_image)
  File "/home/nuck/discore/plug-repos/paella/Paella/utils/modules.py", line 224, in gen_c_embeddings
    seq = self.byt5_mapper(byt5)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1472 and 2560x1024)
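
The error says the tensor reaching byt5_mapper has 1472 features while the layer's weight expects 2560. To my knowledge these match the hidden sizes of byt5-small and byt5-xl respectively, so the mapper was presumably sized for the XL encoder; treat that reading as an assumption. The rule being violated can be sketched without any framework (`matmul_shape` is an illustrative helper):

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix product, or raise if invalid."""
    (rows, inner_a), (inner_b, cols) = a_shape, b_shape
    if inner_a != inner_b:
        raise ValueError(
            f"shapes cannot be multiplied ({rows}x{inner_a} and {inner_b}x{cols})"
        )
    return (rows, cols)


ok = matmul_shape((1, 2560), (2560, 1024))   # features match the layer
try:
    matmul_shape((1, 1472), (2560, 1024))    # shapes from the traceback
    failed = False
except ValueError:
    failed = True
```

Swapping in a smaller encoder would therefore also need a mapper with 1472 input features, which likely means retraining at least that layer.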

Key error "state_dict" while loading vqgan model

File "/home/me/Paella/src/utils.py", line 77, in load_conditional_models
vqgan.load_state_dict(torch.load(vqgan_path, map_location=device)['state_dict'])
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'state_dict'
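
A KeyError here usually means the checkpoint file already is the state dict rather than a dictionary wrapping one. One tolerant loading pattern is sketched below, with plain dictionaries standing in for what torch.load returns. `extract_state_dict` is a hypothetical helper, not part of the repository.

```python
def extract_state_dict(checkpoint):
    """Return the weights whether or not they are nested under 'state_dict'."""
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint


# Stand-ins for the two checkpoint layouts.
wrapped = {"state_dict": {"encoder.weight": [1.0, 2.0]}, "epoch": 3}
flat = {"encoder.weight": [1.0, 2.0]}

from_wrapped = extract_state_dict(wrapped)
from_flat = extract_state_dict(flat)
```

The returned mapping would then be passed to vqgan.load_state_dict in place of the hard-coded ['state_dict'] lookup.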

How much RAM is needed on CPU?

I'm trying to run paella_sampling.ipynb locally with 8 GB of RAM, but the 5th block appears to fill both my RAM and my 16 GB of swap.

I run this on a Linux laptop in a venv. Is the notebook code unoptimized or outdated?

venv command-line steps:

  1. source myvenv/bin/activate
  2. pip install ipykernel
  3. python -m ipykernel install --user --name=myvenv
  4. pip install Jupyter
  5. jupyter notebook
  6. (open and run paella_sampling.ipynb through launched web browser)

I would very much like to test this project with a local install on CPU, but I don't know how to use the other files provided.
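
A rough lower bound on the RAM needed is the size of the weights alone: parameter count times bytes per parameter. The sketch below uses a hypothetical 1-billion-parameter model purely for illustration; it is not a measurement of Paella. Activations and any additional models loaded by the notebook add to this figure.

```python
def weights_gib(num_params, bytes_per_param=4):
    """Memory for the weights alone, in GiB (float32 uses 4 bytes each)."""
    return num_params * bytes_per_param / 2**30


# Hypothetical parameter count, for illustration only.
fp32 = weights_gib(1_000_000_000)
fp16 = weights_gib(1_000_000_000, bytes_per_param=2)
```

Summing this estimate over every model the notebook loads gives a first idea of whether 8 GB can be enough.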

Reproducibility of results

Hey, first of all great work!

However, I struggle to reproduce your qualitative visual results from the first page:

image
My results:

image

I am using classifier free guidance with strength 5. What can I do to further improve the quality?

Best!
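
For reference, classifier-free guidance is commonly written as uncond + scale * (cond - uncond), so a strength of 5 amplifies the difference between the conditional and unconditional predictions fivefold. The sketch below shows that formula on plain lists of logits. Whether the repository's sampler uses exactly this form is an assumption, and `guide` is an illustrative name.

```python
def guide(cond_logits, uncond_logits, scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


guided = guide([1.0, 2.0], [0.0, 1.0], scale=5)
unchanged = guide([1.0, 2.0], [0.0, 1.0], scale=1)  # scale 1 returns the conditional logits
```

Besides the guidance scale, the number of sampling steps and the temperature and filtering settings are the other sampler parameters worth varying.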

Tensor size error

Hi, this is amazing, but in the colab, as soon as multi-conditioning starts, it throws this error:

/content/Paella/modules.py in forward(self, x, c, r)
145 r_embed = self.gen_r_embedding(r)
146 x = self.embedding(x).permute(0, 3, 1, 2)
--> 147 s = torch.cat([c, r_embed], dim=-1)[:, :, None, None]
148 level_outputs = self.down_encode(x, s)
149 x = self._up_decode(level_outputs, s)

RuntimeError: Tensors must have same number of dimensions: got 4 and 2
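
The message means that one of the two tensors passed to torch.cat has four dimensions while the other has two; concatenation requires the same number of dimensions on every input. The sketch below mimics that rule on shape tuples. The shapes are hypothetical and `cat_shape` is an illustrative helper, not a torch function.

```python
def cat_shape(shapes, dim=-1):
    """Result shape of concatenating tensors with the given shapes along `dim`."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError(
                f"Tensors must have same number of dimensions: got {len(first)} and {len(shape)}"
            )
    axis = dim % len(first)
    for shape in shapes[1:]:
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != axis and a != b:
                raise ValueError("sizes must match except along the concatenation axis")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


# Hypothetical shapes: two 2-D inputs concatenate fine.
ok = cat_shape([(4, 1024), (4, 64)])
try:
    cat_shape([(4, 1024, 1, 1), (4, 64)])  # a 4-D input next to a 2-D one
    error = None
except ValueError as exc:
    error = str(exc)
```

If c already carries trailing singleton dimensions at that point, flattening it to 2-D before the concatenation is one possible fix, though that depends on how the conditioning is built upstream.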

Distributed code cannot run

When I run the distributed code, it fails to start. I tried both a temporary srun debug node and an sbatch script; both result in the following error.
image
