dome272 / paella
Official Implementation of Paella https://arxiv.org/abs/2211.07292v2
License: MIT License
Hi
The Colab script says a fine-tuned model is needed for running image variation. Can you upload it?
I'm trying this way
input_image = Image.open(str(input_image))
num_outputs = int(num_outputs)
latent_shape = (32, 32)
with torch.inference_mode():
    with torch.autocast(device_type="cuda"):
        input_image = clip_preprocess(input_image).to(device).unsqueeze(0)
        clip_embeddings = clip_model.encode_image(input_image).float()
        clip_embeddings = torch.repeat_interleave(clip_embeddings, num_outputs, dim=0)
        sampled = sample(model, clip_embeddings, T=12, size=latent_shape, starting_t=0, temp_range=[1.0, 1.0],
                         typical_filtering=True, typical_mass=0.2, typical_min_tokens=1,
                         classifier_free_scale=5, renoise_steps=11)
    sampled = decode(sampled[-1])
and getting very bad results.
Like these ones (Input on left and 3 outputs on the right)
Hyperparameters link goes to a nonexistent line.
Could you provide the evaluation code?
Some questions and observations after trying this out. Nice notebook! This is not an issue, but no Discussions tab is present in your GitHub repo.
Hi, in vqgan.py, line 110 calls self.encode(x, quantize) with a quantize argument, but the encode function definition (line 91) has no such parameter. Thanks.
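A minimal sketch of the mismatch described above, using hypothetical stand-in classes rather than the repository's actual VQGAN code: passing `quantize` to an `encode` that only accepts `x` raises a `TypeError`, and one possible fix is to give `encode` an optional `quantize` parameter. The class names and the placeholder arithmetic are assumptions for illustration only.

```python
class BrokenVQ:
    """Hypothetical stand-in: encode() does not accept `quantize`."""

    def encode(self, x):
        return [v * 2 for v in x]

    def forward(self, x, quantize=False):
        # Mirrors the reported call: an extra positional argument is passed.
        return self.encode(x, quantize)


class FixedVQ:
    """Same stand-in, with `quantize` added to encode()'s signature."""

    def encode(self, x, quantize=False):
        encoded = [v * 2 for v in x]               # placeholder for the real encoder
        if quantize:
            encoded = [round(v) for v in encoded]  # placeholder for a codebook lookup
        return encoded

    def forward(self, x, quantize=False):
        return self.encode(x, quantize)


def call_raises_type_error(model, x):
    """Return True if model.forward(x, True) fails with a TypeError."""
    try:
        model.forward(x, True)
    except TypeError:
        return True
    return False
```

With these stand-ins, `call_raises_type_error(BrokenVQ(), [0.2])` reports the failure, while the same call on `FixedVQ()` goes through.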
Hi,
I noticed that in the paper you don't mention img2img as Stable Diffusion does. Do you think it is easy to finetune for that?
Best,
Benno
Since the model isn't included in this repository, and thus isn't explicitly licensed under MIT, how are you choosing to license the model?
Hello,
Is it possible to finetune it on a dataset of around 2000 image-text pairs?
Also, what should the dataset include? Does the model need to know basic definitions of objects if I am trying to introduce a new object? For instance, if the new object is a rugby ball, does the dataset need individual images of rugby balls for the model to respond to a prompt like "a player kicking a red rugby ball"? Or will it work if the dataset only has images of players holding the ball?
In other words, does the model need one object per image, or can it learn easily from images with multiple objects? In the latter case, how does it learn which token maps to which object?
Regards,
Hisan
Such as 512x512,768x768.
Hi, great work on Paella! I highly admire all effort that aims to optimize the process, even at the cost of some quality. I dream of 60 fps diffusion!
I am trying to integrate Paella into my AI art rendering platform as an alternative to speed up testing of other features, however I already have Stable Diffusion supported which requires a recent version of the transformers package. ruDALLE depends on very old versions of transformers (4.10 instead of 4.19) and makes it impossible to use both together.
To me it feels a bit much to import ruDALLE only for its VAE? Let me know what you think!
Hello friends, hlky and I have been working on a fork of Paella with some alterations. These are: dual CLIP and T5 embeddings and cross attention. Feel free to link to our repository in your community "ongoing research" section.
We are scaling up training and hope to release weights and a demo once completed. Thanks!
In section 4.4.4 you refer to Figure 5, when I think you meant to refer to Figure 12.
How many samples in my dataset would I need to fine-tune Paella?
Paella is only trained on a large dataset for text2image. I wonder whether you have trained a class-conditioned generation model on ImageNet, and if so, how does it perform? Thanks!
Your line loading the open_clip models has a couple typos in it and it doesn't recognize the models as is.
Hello, I really like your video and the code; it is very clear. Will there be a video about this repo soon?
Hey @dome272,
Congrats on releasing this model! Are you interested in adding it to diffusers: https://github.com/huggingface/diffusers by any chance? Would love to help :-)
Hi, the XL model is way too large for me! I tried byt5-small, but it crashes:
Traceback (most recent call last):
  File "/home/nuck/discore/src_core/lib/corelib.py", line 79, in invoke_safe
    func(*kargs, **kwargs)
  File "/home/nuck/discore/src_core/classes/printlib.py", line 193, in wrapper
    return func(*args, **kwargs)
  File "/home/nuck/discore/src_core/rendering/renderer.py", line 106, in emit
    cb(v, name)
  File "/home/nuck/discore/sessions/pinchsky-dance/script.py", line 101, in callback
    paella.txt2img(seed=v.seed, steps=v.steps, chg=clamp(v.chg, 0.1, 1), w=v.w, h=v.h, cfg=v.cfg, p=v.prompt, negprompt=v.negprompt, sampler=v.sampler)
  File "/home/nuck/discore/src_plugins/paella/PaellaPlugin.py", line 47, in txt2img
    ret = colab.txt2img(prompt, steps=steps, size=size, cfg=cfg, seed=seed)
  File "/home/nuck/discore/src_plugins/paella/colab.py", line 219, in txt2img
    sampled_tokens, intermediate = sample(model,
  File "/home/nuck/discore/src_plugins/paella/colab.py", line 66, in sample
    logits = model(sampled, t, **model_inputs, attn_weights=attn_weights)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nuck/discore/plug-repos/paella/Paella/utils/modules.py", line 273, in forward
    c_embed = self.gen_c_embeddings(byt5, clip, clip_image)
  File "/home/nuck/discore/plug-repos/paella/Paella/utils/modules.py", line 224, in gen_c_embeddings
    seq = self.byt5_mapper(byt5)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nuck/discore/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1472 and 2560x1024)
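The shapes in the error suggest that the text embedding has 1472 features while the `byt5_mapper` linear layer expects 2560 and outputs 1024, which would be consistent with the checkpoint having been trained against a wider ByT5 encoder than byt5-small. A minimal sketch of a pre-flight check, with hypothetical function names and plain tuples standing in for tensor shapes:

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Shape produced by a linear layer, or ValueError on a feature mismatch.

    Mirrors the rule behind the reported error: the last dimension of the
    input must equal the layer's in_features.
    """
    if input_shape[-1] != in_features:
        raise ValueError(
            f"encoder produces {input_shape[-1]} features, "
            f"but the mapper expects {in_features}"
        )
    return tuple(input_shape[:-1]) + (out_features,)


def check_encoder_matches_mapper(encoder_hidden_size, mapper_in_features):
    """Return True when the text encoder can feed the mapper directly."""
    return encoder_hidden_size == mapper_in_features
```

With the numbers from the traceback, `check_encoder_matches_mapper(1472, 2560)` is false. Swapping in a smaller encoder would presumably need a mapper trained for that width; the existing weights cannot simply be reused.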
  File "/home/me/Paella/src/utils.py", line 77, in load_conditional_models
    vqgan.load_state_dict(torch.load(vqgan_path, map_location=device)['state_dict'])
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'state_dict'
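One possible cause is that the checkpoint file stores the weights directly rather than wrapped under a `state_dict` key. This is an assumption, since the file's layout isn't shown here. A minimal sketch of a loader helper that tolerates both layouts (`extract_state_dict` is a hypothetical name, not part of the repository):

```python
def extract_state_dict(checkpoint, key="state_dict"):
    """Return the weights mapping whether or not it is wrapped under `key`."""
    if isinstance(checkpoint, dict) and key in checkpoint:
        return checkpoint[key]
    return checkpoint


# Possible use at the failing line (untested against the actual checkpoint):
# vqgan.load_state_dict(
#     extract_state_dict(torch.load(vqgan_path, map_location=device))
# )
```

If the keys still don't match the model after this, the file is probably a different checkpoint than the code expects, and the helper won't fix that.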
I'm trying to run paella_sampling.ipynb locally with 8 GB of RAM, but the 5th block fills both my RAM and my 16 GB of swap.
I'm running this on a Linux laptop in a venv. Is the notebook code unoptimized or outdated?
venv commandline: steps
I would very much like to test this project for local install and use on cpu, but I don't know how to use the other files provided.
Hi, this is amazing, but in the Colab, as soon as multi-conditioning starts, it throws this error:
/content/Paella/modules.py in forward(self, x, c, r)
145 r_embed = self.gen_r_embedding(r)
146 x = self.embedding(x).permute(0, 3, 1, 2)
--> 147 s = torch.cat([c, r_embed], dim=-1)[:, :, None, None]
148 level_outputs = self.down_encode(x, s)
149 x = self._up_decode(level_outputs, s)
RuntimeError: Tensors must have same number of dimensions: got 4 and 2
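The message says the two tensors passed to `torch.cat` have different ranks (4 and 2), presumably `c` and `r_embed` in that order. A minimal sketch of the rule, using plain shape tuples and hypothetical helper names; it assumes, without confirmation from the traceback, that `c` arrives already shaped like `(batch, channels, 1, 1)`, in which case dropping the trailing singleton dimensions before concatenating would make the ranks match:

```python
def cat_shape(shapes, dim=-1):
    """Result shape of concatenating tensors with these shapes along `dim`.

    Follows the usual concatenation rule: same number of dimensions, and
    equal sizes everywhere except along `dim`.
    """
    ranks = [len(s) for s in shapes]
    if len(set(ranks)) != 1:
        raise ValueError(f"tensors must have same number of dimensions: got {ranks}")
    rank = ranks[0]
    axis = dim % rank
    first = shapes[0]
    for s in shapes[1:]:
        for i in range(rank):
            if i != axis and s[i] != first[i]:
                raise ValueError(f"sizes differ at dimension {i}: {first} vs {s}")
    out = list(first)
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)


def drop_trailing_singletons(shape):
    """Drop trailing size-1 dims down to rank 2, e.g. (B, C, 1, 1) -> (B, C)."""
    out = list(shape)
    while len(out) > 2 and out[-1] == 1:
        out.pop()
    return tuple(out)
```

Printing `c.shape` and `r_embed.shape` just before line 147 would confirm whether this assumption holds for the multi-conditioning path.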