stablediffusion's Issues

.half call to fix?

I'm sorry that I cannot provide my Python traceback (I already reverted my code, sadly).

While trying to update my previous code, which was based on SD v1, I ran into this error message:
Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same conv2d

Since I hadn't modified the core part of the model, I assumed the error came from a compatibility issue with my custom code.

However, I couldn't find where the HalfTensor came from. A while after reverting my code, I found this line in the SD v2 code:
return checkpoint(self._forward, (x,), self.parameters(), True) # TODO: check checkpoint usage, is True # TODO: fix the .half call!!!

I am wondering where the ".half call" to fix is; knowing that might help my second attempt at the patch. 👍
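For anyone hitting the same mismatch: this is not an official answer, but the error usually just means the model weights and the input tensor disagree on dtype. A minimal sketch of the cause and the two possible fixes (hypothetical toy module, not the repo's code):

    import torch

    # Toy reproduction of "Input type (torch.cuda.HalfTensor) and weight type
    # (torch.cuda.FloatTensor) should be the same":
    conv = torch.nn.Conv2d(4, 4, 3, padding=1).cuda()      # weights are float32
    x = torch.randn(1, 4, 64, 64, device="cuda").half()    # input is float16
    # conv(x)  # raises the mismatch error above

    # Fix: make the dtypes agree, in either direction.
    out_fp16 = conv.half()(x)            # cast the module to half...
    out_fp32 = conv.float()(x.float())   # ...or keep everything in float32

So the ".half call" to look for is whichever place in the pipeline converts either the model or the inputs to half without converting the other side.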

Fine tuning the model

Hello! First, thanks a lot for all your work!
Quick question: I tried to fine-tune v2.0 of the model on new images using the same scripts I was using for v1.4/v1.5 (DreamBooth and textual inversion), but the results are very bad (almost only noise).
1. Is this expected? What is different about the model architecture/training that makes those training scripts not work well with v2.0?
2. What should I look into to adapt the fine-tuning scripts to work with v2.0?

Thanks a lot for your answers!

When will the Image Upscaling with Stable Diffusion function be available?

Hi Stability-AI team 😀

I have a question and would appreciate an answer.

I want to use the Image Upscaling with Stable Diffusion function. However, when I checked the link below, it still didn't seem to be available:
https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler

It says "Use it with diffusers (coming soon)".

When will it be available? And if there are any other demo pages where the upscaling function can be used, please let me know.

Thank you.

Linux Conda Xformers Install Issue

Everything installs fine except xformers. When I run pip install -e ., this error occurs:

The detected CUDA version (11.4) mismatches the version that was used to compile
    PyTorch (10.2). Please make sure to use the same CUDA versions.
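Not an official fix, but before building xformers it is worth checking which CUDA toolkit your PyTorch build was compiled against, since the two versions need to match (the snippet below is just a diagnostic, not part of the repo):

    import torch

    # CUDA toolkit version PyTorch was compiled with, e.g. "10.2" or "11.4".
    # xformers is built against the system CUDA (11.4 here), so either install
    # a PyTorch build that matches the system CUDA or point CUDA_HOME at a
    # toolkit that matches PyTorch's version.
    print(torch.__version__, torch.version.cuda)
    print(torch.cuda.is_available())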

What is the minimum RAM required for running V2?

Hey,

Thank you for releasing this great model and making this publicly available.

We are able to run the 1.5 model on an EC2 G5XL instance with 16 GB of RAM. However, when trying to deploy V2 on the same instance, the process is killed. I tried deploying on an instance with more RAM and it deployed fine.
So my questions are: has the minimum memory requirement increased in V2? What is the minimum RAM required? Is there a recommended EC2 instance for running the model?

Streamlit SD-Upscale x4, CUDA out of memory. Tried to allocate 400.00 GiB

Normally a CUDA OOM is expected with smaller GPUs, but... 400 GiB? No GPU with that much memory exists, so this is obviously a bug.
512x512 input. It goes through every DDIM step before blowing up.
Using a conda environment created from the environment YAML. Running on a 4090 machine.
Full log:
Traceback (most recent call last):
  File "c:\users\------\miniconda3\envs\ldm\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 556, in _run_script
    exec(code, module.__dict__)
  File "Z:\SD\SD_2.0\stablediffusion\scripts\streamlit\superresolution.py", line 170, in <module>
    run()
  File "Z:\SD\SD_2.0\stablediffusion\scripts\streamlit\superresolution.py", line 152, in run
    result = paint(
  File "Z:\SD\SD_2.0\stablediffusion\scripts\streamlit\superresolution.py", line 109, in paint
    x_samples_ddim = model.decode_first_stage(samples)
  File "c:\users\------\miniconda3\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "z:\sd\sd_2.0\stablediffusion\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "z:\sd\sd_2.0\stablediffusion\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "c:\users\------\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "z:\sd\sd_2.0\stablediffusion\ldm\modules\diffusionmodules\model.py", line 631, in forward
    h = self.mid.attn_1(h)
  File "c:\users\------\miniconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "z:\sd\sd_2.0\stablediffusion\ldm\modules\diffusionmodules\model.py", line 191, in forward
    w_ = torch.bmm(q,k)     # b,hw,hw    w[b,i,j]=sum_c q[b,i,c]k[b,c,j]
RuntimeError: CUDA out of memory. Tried to allocate 400.00 GiB (GPU 0; 23.99 GiB total capacity; 6.47 GiB already allocated; 0 bytes free; 17.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
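For context, a back-of-the-envelope sketch of why a single allocation in the hundreds of GiB can show up here (my own estimate, not from the repo): the 'vanilla' attention block in the autoencoder materializes a full hw x hw attention matrix, so its memory grows with the fourth power of the feature-map side length, and the x4 upscaler decodes a much larger latent than regular txt2img:

    # Rough size of the hw x hw attention matrix built by torch.bmm(q, k)
    # in the VAE decoder's vanilla attention (assumes fp32, batch 1, one head).
    def attn_matrix_gib(side):
        hw = side * side                 # number of spatial tokens
        return hw * hw * 4 / 1024**3     # fp32 bytes -> GiB

    for side in (64, 128, 256, 512):
        print(f"{side}x{side}: ~{attn_matrix_gib(side):.1f} GiB")
    # 64x64: ~0.1 GiB, 128x128: ~1.0 GiB, 256x256: ~16.0 GiB, 512x512: ~256.0 GiB

The memory-efficient ('vanilla-xformers') attention path avoids materializing this matrix, which is presumably why the same decode works when xformers is available.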

The zero-shot FID for Stable Diffusion

In the NVIDIA paper "eDiff-I", the zero-shot FID for Stable Diffusion on the COCO 2014 validation set is reported as 8.59.

But in the paper "Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces", the reimplemented result is reported as 25.40.

Can you tell me the official result for the zero-shot COCO2014 validation set?

no such file or directory: /usr/src/app/.cache/huggingface/hub/models--laion--CLIP-ViT-H-14-laion2B-s32B-b79K/refs/main

I downloaded open_clip_pytorch_model.bin from Hugging Face and saved it in ./laion/CLIP-ViT-H-14-laion2B-s32B-b79K, but got the error: no such file or directory: /usr/src/app/.cache/huggingface/hub/models--laion--CLIP-ViT-H-14-laion2B-s32B-b79K/refs/main.

I found that in open_clip's factory.py, the download is triggered at line 156 without checking whether the model already exists locally. Can anyone help?
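Not an official answer, but one workaround that I believe works (verify against your open_clip version, since the local-path behaviour is an assumption on my part): open_clip.create_model_and_transforms accepts a local checkpoint path for pretrained, so the embedder can be pointed at the file you already downloaded instead of the Hugging Face cache:

    import torch
    import open_clip

    # Sketch: load ViT-H-14 from a local checkpoint instead of the HF cache.
    # Adjust the path to wherever open_clip_pytorch_model.bin actually lives.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-H-14",
        pretrained="./laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin",
        device=torch.device("cpu"),
    )

In the stablediffusion repo the corresponding call lives in ldm/modules/encoders/modules.py (FrozenOpenCLIPEmbedder), so the version string there could presumably be swapped for a local path in the same way.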

Installation seems unclear

What is a .ckpt file? There doesn't seem to be one in the repo, but it's used in the examples. Are we supposed to download these, or are they under a different name?

Could the full command line

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8 --ckpt <path/to/model.ckpt>

be simplified so that model.ckpt is already in the repo or downloaded automatically?
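For reference, a sketch of how the checkpoints can be fetched today; the repo and file names below are my reading of the Hugging Face model cards, so double-check them there:

    from huggingface_hub import hf_hub_download

    # Download the v2 checkpoints referenced in the README examples.
    # Repo/file names are assumptions based on the stabilityai model cards.
    ckpt_768 = hf_hub_download(
        repo_id="stabilityai/stable-diffusion-2",
        filename="768-v-ema.ckpt",
    )
    ckpt_512 = hf_hub_download(
        repo_id="stabilityai/stable-diffusion-2-base",
        filename="512-base-ema.ckpt",
    )
    print(ckpt_768, ckpt_512)  # pass one of these paths via --ckpt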

Xformer performance decrease

After compiling xformers I saw an improvement in memory usage, but performance actually decreased from ~15 it/s to ~12 it/s (512x512). Specs: RTX 3090 Ti plus an Intel Xeon Gold 5220R. Just informing :D

PLMS sampling is broken

Using the 768 v-diffusion model with the prompt "fruit basket".

With DDIM sampling: [image]

With PLMS sampling: [image]

Still using invisible watermark package that rarely works?

I'm surprised version 2 is still using the https://github.com/ShieldMnt/invisible-watermark package, which rarely works. The encoding scheme is simply not robust. That repo has years-old issues from people complaining that it doesn't work, and they remain unaddressed and unresolved.

Either excise the package completely, since it is almost completely useless, or write your own watermarking code. Maybe something more robust like Hamming codes + pixel rounding? (Or you could at least fix the above package so that it solves recursively and is forced to decode properly into the watermark.)

How can I reproduce your depth2img samples?

Hello,

I'd like to reproduce your sample results for old_man.png with the depth2img checkpoint.

I tried several runs, but I couldn't generate good images.

I think starting from the settings used for your samples would help me produce high-quality results.

May I ask which parameters were used for 'assets/stable-samples/depth2img'?

Should call out the change in UNet model's attention heads

It is well known that in SD2 the text encoder changed, and downstream developers know to take notice and swap the text encoder. But it is less well known that the UNet model has changed as well. In particular, this line caused most of the trouble and explains why a lot of people have problems running the base model with their old code:

https://github.com/Stability-AI/stablediffusion/blob/main/configs/stable-diffusion/v2-inference.yaml#L32

Since most implementations (of the SDv1 models) implement multi-head attention as one matrix multiplication across all heads, the weights are unchanged and scripts can take the SDv2 weights as is.

However, because the config now fixes the number of head channels rather than the number of heads, the model will generate garbage if people who ported Stable Diffusion to other platforms don't change their corresponding network configuration as well.

I saw a few mentions on HN of people who cannot get the 512 base model working, so I want to call this out here.
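To make the difference concrete, here is a small sketch of how the head count is derived under the two configs (my reading of v1-inference.yaml vs v2-inference.yaml; treat the exact numbers as an assumption and check the yaml files):

    # SDv1 fixes the number of heads; SDv2 fixes the size of each head,
    # so in SDv2 the head count varies with the channel width of each UNet level.
    NUM_HEADS_V1 = 8            # num_heads in the v1 configs
    NUM_HEAD_CHANNELS_V2 = 64   # num_head_channels in v2-inference.yaml

    for channels in (320, 640, 1280):   # typical UNet channel widths per level
        heads_v2 = channels // NUM_HEAD_CHANNELS_V2
        print(f"{channels} ch: v1 -> {NUM_HEADS_V1} heads of {channels // NUM_HEADS_V1} ch, "
              f"v2 -> {heads_v2} heads of {NUM_HEAD_CHANNELS_V2} ch")

The weight tensors have the same shapes either way; what changes is how they are split into heads, which is why ports that hard-code the v1 head count produce garbage with the v2 checkpoints.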

which torchtext version?

With torchtext 0.14.0 I got

Traceback (most recent call last):
  File "app.py", line 12, in <module>
    from pytorch_lightning import seed_everything
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
    from pytorch_lightning.utilities import rank_zero_deprecation
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 31, in <module>
    from torchtext.legacy.data import Batch
ModuleNotFoundError: No module named 'torchtext.legacy'

requirements issue

I created a new conda env for this and installed the basic requirements, but I am stuck on the pip install -e . step, receiving the following error:

does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

Text 2 Mask integration?

I've seen txt2mask capability integrated elsewhere into the web application for SD 1.
I was wondering if we could get something similar integrated into SD2, but without having to use the web application, so that we can use the low-level scripts. If people want to build on top of those, that would be great, but in the interest of compatibility, a txt2mask script should be written with native SD in mind.

My use case is that I have a Discord bot that takes my prompts etc. I would find it useful to provide a mask prompt, which could then be used as part of the inpainting. Having the txt2mask functionality in a web app is not useful to me, since I am still executing the Python scripts that come with SD via a pipeline.

Tokenizer in OpenCLIP seems to be appending 0s rather than eot tokens to the specified length

Previously, stablediffusion used CLIP's tokenizer, which pads with eot tokens up to the specified length (77). It seems that the newer code (at least the path used by txt2img.py) uses SimpleTokenizer and pads with 0 up to the specified length: https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/tokenizer.py#L183

I'm not sure what the implication for the training process would be. I also checked the vocab: 0 means ! rather than any special token such as <start_of_text> or <end_of_text>.
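A quick way to see the two padding behaviours side by side (a sketch, assuming both transformers and open_clip are installed; the exact ids are whatever your installed versions produce):

    import open_clip
    from transformers import CLIPTokenizer

    text = "a photo of a cat"

    # SD v1 path (FrozenCLIPEmbedder): the HF CLIPTokenizer pads with the
    # eot/pad token id up to max_length.
    hf_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    hf_ids = hf_tok(text, truncation=True, max_length=77,
                    padding="max_length", return_tensors="pt")["input_ids"]

    # SD v2 path (FrozenOpenCLIPEmbedder): open_clip.tokenize pads with 0.
    oc_ids = open_clip.tokenize([text])

    print(hf_ids[0, -5:])  # trailing positions hold the eot token id
    print(oc_ids[0, -5:])  # trailing positions hold 0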

Watermark bias

I don't know if this is a good place to report such problems, but it seems that the network is overtrained on images containing watermarks. I'm posting an example where it imprinted a clearly recognizable dreamstime.com watermark.
[example image]

ModuleNotFoundError: No module named 'imwatermark'

Traceback (most recent call last):
  File "scripts/txt2img.py", line 14, in <module>
    from imwatermark import WatermarkEncoder
ModuleNotFoundError: No module named 'imwatermark'

Unintuitively, this is not solved by installing imWatermark, but rather by:

%pip install invisible-watermark

Community Integration: Making AIGC cheaper, faster, and more efficient

Thank you for your rapid and outstanding contribution to Stable Diffusion 2.0!
AIGC has recently become one of the hottest topics in AI. Unfortunately, large hardware requirements and training costs are still a severe impediment to the rapid growth of the AIGC industry: the Stable Diffusion v1 model requires 150,000 A100 GPU hours for a single training run.

We are happy to share a solution that makes training AIGC models such as Stable Diffusion up to 7 times cheaper!

Colossal-AI released a complete open-source Stable Diffusion pretraining and fine-tuning solution, with the pretraining cost reduced by 6.5 times and the hardware cost of fine-tuning by 7 times. An RTX 2070/3050 PC is enough to run the fine-tuning workflow, making AIGC models such as Stable Diffusion available to a wider community.

Open-source code: https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion

More details can be found on the blog. We are also very happy to provide these improvements for Stable Diffusion 2.0, and we believe the democratization of AIGC models would be very helpful for Stable Diffusion 2.0 users. We would appreciate the chance to build this integration with you to benefit both of our user bases, and we are willing to provide any help needed in this cooperation for free.

Thank you very much.

Best regards,
Yongbin Li, HPC-AI Tech

finger problem

Maybe the finger problem could be solved by adding the reverse use of pose estimation and object detection models.

[Bug] The "decode_first_stage" function in DDPM does not respect the "force_not_quantize" parameter

The decode_first_stage function in the ldm/models/diffusion/ddpm.py file looks like this.

    def decode_first_stage(self, z, predict_cids=False, force_not_quantize=False):
        if predict_cids:
            if z.dim() == 4:
                z = torch.argmax(z.exp(), dim=1).long()
            z = self.first_stage_model.quantize.get_codebook_entry(z, shape=None)
            z = rearrange(z, 'b h w c -> b c h w').contiguous()

        z = 1. / self.scale_factor * z
        return self.first_stage_model.decode(z)

The predict_cids and force_not_quantize parameters are accepted but never used.

The last line in the old repo looks like this, which makes more sense:
return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)

So the question is: will applying this change break anything?
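For reference, a sketch of what the patched method would look like if the old behaviour were restored; it simply combines the two snippets quoted above, and whether the first-stage model's decode in this branch still accepts force_not_quantize is an assumption worth checking first:

    # In ldm/models/diffusion/ddpm.py (torch and einops.rearrange are already imported there)
    def decode_first_stage(self, z, predict_cids=False, force_not_quantize=False):
        if predict_cids:
            if z.dim() == 4:
                z = torch.argmax(z.exp(), dim=1).long()
            z = self.first_stage_model.quantize.get_codebook_entry(z, shape=None)
            z = rearrange(z, 'b h w c -> b c h w').contiguous()

        z = 1. / self.scale_factor * z
        # Forward the flags to the first-stage decoder, as in the old repo.
        return self.first_stage_model.decode(
            z, force_not_quantize=predict_cids or force_not_quantize)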

depth2img mode with mask?

Is it possible to use depth2img with an image mask (i.e., in-painting)?

I'm trying to work my way through the scripts themselves and am still trying to grok what exactly is going on in text2img, img2img, and depth2img. What is the best resource for understanding the architecture of StableDiffusion, particularly as-implemented?

Wrong argument in txt2img.py

When running txt2img.py with the "--repeat" argument, data = [p for p in data for i in range(opt.repeat)] should be data = [p for p in data for i in range(batch_size)], since we want the prompt to be repeated exactly batch_size times before being wrapped up by the chunk() function. Otherwise, a shape issue comes up later.
I have already submitted PR 55 as a potential fix.

x4-upscaler encoder diverges for any input, because of half precision?

Hi,
I'm playing around with the x4-upscaler model and I'm currently trying to pass an image into its encoder, but it keeps diverging for no apparent reason, resulting in NaN values. I did some debugging, and these are the variables once I reach line 534 in "./ldm/modules/diffusionmodules/model.py":
[debugger screenshot]
hs is a list of the consecutive outputs of the res/downscale blocks, and we can clearly see them diverging to NaN values.
x is a tensor filled with zeros in this example, but I've tried with actual images and get roughly the same divergence.

My first intuition was that my image preprocessing wasn't right, but after double-checking, I am doing exactly what the inpainting script does, e.g. scaling to [-1, 1] and casting to torch.float32 on the 'cuda' device...

My second intuition was that maybe the encoder weights weren't provided and were therefore random, but checking the model's checkpoint, they are there.

My last intuition concerns the fact that the tensors in hs are torch.float16, which is highly surprising to me given that mixed precision completely breaks the decoder and is therefore not enabled.
Any clue as to how and why they would be half precision, whether this could be the source of my problem, or about my problem in general?

Thanks in advance for any time you put into answering!

Process is killed when I run txt2img.py

Hi, thanks for your great work!
When I run txt2img.py, it breaks and gets killed. Do you know what the problem is?

DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Killed

ModuleNotFoundError: No module named 'torchtext.legacy'

I walked through the README and got this. I didn't use conda to install PyTorch, though; I might try that instead.

!python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt models/ldm/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 
Traceback (most recent call last):
  File "scripts/txt2img.py", line 11, in <module>
    from pytorch_lightning import seed_everything
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
    from pytorch_lightning.utilities import rank_zero_deprecation
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 31, in <module>
    from torchtext.legacy.data import Batch
ModuleNotFoundError: No module named 'torchtext.legacy'

https://colab.research.google.com/drive/10jKS9pAB2bdN3SHekZzoKzm4jo2F4W1Q?usp=sharing

[Feature Request]: img2txt - Image to text ?

With current technology, would it be possible to ask the AI to generate text from an image? The goal is to find out how the technology would describe the image: a tool for the AI to describe the image for us.

Inpainting Masking

Hi, I have a question that may seem a bit obvious, but I would like some clarification.

In the ddim_sampling here: https://github.com/Stability-AI/stablediffusion/blob/main/ldm/models/diffusion/ddim.py#L157

you have

img = img_orig * mask + (1. - mask) * img

Why isn't it

img = img_orig * (1 - mask) + mask * img

instead?

By multiplying the original image by the mask (img_orig * mask), doesn't that mean the new iterate stays the same as the original in those regions, since the masked portion is replaced by the original image? I thought the latter would make more sense, so that the mask multiplies the new iterate of the image, which gets updated at each step.
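For what it's worth, my reading (an assumption, not an authoritative answer) is that in this sampler mask == 1 marks latent pixels to keep from the known image and mask == 0 marks the region being inpainted, which makes the quoted line consistent. A tiny sketch of that convention:

    import torch

    # Assumed convention: mask == 1 -> keep the known (original) latent,
    #                     mask == 0 -> region being inpainted (current sample).
    img_orig = torch.full((1, 1, 4, 4), 7.0)   # "known" latent
    img      = torch.zeros((1, 1, 4, 4))       # current denoising iterate
    mask     = torch.zeros((1, 1, 4, 4))
    mask[..., :2, :] = 1.0                     # top half known, bottom half inpainted

    blended = img_orig * mask + (1. - mask) * img
    print(blended[0, 0])   # top rows stay 7 (kept), bottom rows stay 0 (free to change)

Under the opposite convention (mask == 1 marking the hole), the line you propose would indeed be the right one, so the answer comes down to how the mask is defined upstream.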

Windows xformers install breaking

Hi, I am having issues installing xformers; it breaks at pip install -e .:

Obtaining file:///C:/Users/heart/Desktop/DEV/xformers
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  Γ— python setup.py egg_info did not run successfully.
  β”‚ exit code: 1
  ╰─> [18 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\heart\Desktop\DEV\xformers\setup.py", line 270, in <module>
          ext_modules=get_extensions(),
        File "C:\Users\heart\Desktop\DEV\xformers\setup.py", line 210, in get_extensions
          cuda_version = get_cuda_version(CUDA_HOME)
        File "C:\Users\heart\Desktop\DEV\xformers\setup.py", line 67, in get_cuda_version
          raw_output = subprocess.check_output([nvcc_bin, "-V"], universal_newlines=True)
        File "C:\Users\heart\anaconda3\envs\stability\lib\subprocess.py", line 421, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "C:\Users\heart\anaconda3\envs\stability\lib\subprocess.py", line 503, in run
          with Popen(*popenargs, **kwargs) as process:
        File "C:\Users\heart\anaconda3\envs\stability\lib\subprocess.py", line 971, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "C:\Users\heart\anaconda3\envs\stability\lib\subprocess.py", line 1440, in _execute_child
          hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
      FileNotFoundError: [WinError 2] The system cannot find the file specified
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

DPM Solver doesn't support FP16 mode

I'm running on an 8Gb card (1070) so have limited VRAM.

One of the changes I make is to use FP16 (Half) to use less VRAM, so for example, in txt2img.py, I have modified the code like this:

seed_everything(opt.seed)

torch.set_default_tensor_type(torch.HalfTensor)

config = OmegaConf.load(f"{opt.config}")
model = load_model_from_config(config, f"{opt.ckpt}")
model = model.half()

the set_default... and model.half() lines are the additions.

This has generally worked okay, except when trying to use --dpm.

With --dpm, there is an error that comes from dpm_solver.py and its use of torch.linspace.

torch.linspace isn't supported for FP16 when running on the CPU (as that part of the DPM solver does).

As a workaround I have modified those lines to specify a dtype of float, like this:

self.t_array = torch.linspace(0., 1., self.total_N + 1,dtype=torch.float)[1:].reshape((1, -1))

This seems to be working okay, but I don't know if this is the best fix.
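Not the repo's approach, just an alternative sketch: instead of changing the global default tensor type (which is what makes the CPU-side torch.linspace calls fail), inference can be wrapped in torch.autocast so CUDA matmuls/convs run in fp16 while CPU ops stay in float32. Note that unlike model.half() this keeps the weights in fp32, so it saves activation memory but not weight memory:

    import torch

    # run_sampler is a placeholder for whatever sampling call the script makes.
    def sample_with_autocast(run_sampler, *args, **kwargs):
        with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
            return run_sampler(*args, **kwargs)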

It would be good to get an official fix in the repo for this, and also a command line option to officially support using FP16.

Thanks

CUDA out of memory? (3080ti 12GB)

I just installed Stable Diffusion 2.0 on my Linux box and it's sort of working.

I keep getting "CUDA out of memory" errors.
When using the txt2img example I had to decrease the resolution to 384x384 to avoid a crash.

With the x4 upscaler web interface I always end with a crash like:
CUDA out of memory. Tried to allocate 2.81 GiB (GPU 0; 11.77 GiB total capacity; 7.84 GiB already allocated...

I tried setting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64 (and other values), which allowed me to increase the txt2img resolution to 512x512, but the x4 upscaler still crashes.

Is this normal behavior or do I simply need more VRAM?

My machine is 5900X, 32GB RAM, 3080ti 12GB, Pop!_OS 22.04 LTS.

Any plans to publish the training code?

Hi Stability-AI team,
Thank you for your outstanding work!

I looked at the code base and only found the inference/testing code. Are there any plans to publish the training code?

Thanks again!

ViT-H-14 Model Config missing

Model config for ViT-H-14 not found. Probably easily solvable. First issue, though 😁

LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
ERROR:root:Model config for ViT-H-14 not found; available models ['RN50', 'RN50-quickgelu', 'RN50x4', 'RN50x16', 'RN101', 'RN101-quickgelu', 'timm-efficientnetv2_rw_s', 'timm-resnet50d', 'timm-resnetaa50d', 'timm-resnetblur50', 'timm-swin_base_patch4_window7_224', 'timm-vit_base_patch16_224', 'timm-vit_base_patch32_224', 'timm-vit_small_patch16_224', 'ViT-B-16', 'ViT-B-32', 'ViT-B-32-quickgelu', 'ViT-L-14'].
Traceback (most recent call last):
  File "txt2img.py", line 290, in <module>
    main(opt)
  File "txt2img.py", line 191, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "txt2img.py", line 35, in load_model_from_config
    model = instantiate_from_config(config.model)
  File "/app/stablediffusion/ldm/util.py", line 79, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/app/stablediffusion/ldm/models/diffusion/ddpm.py", line 563, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "/app/stablediffusion/ldm/models/diffusion/ddpm.py", line 630, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "/app/stablediffusion/ldm/util.py", line 79, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/app/stablediffusion/ldm/modules/encoders/modules.py", line 147, in __init__
    model, _, _ = open_clip.create_model_and_transforms(arch, device=torch.device('cpu'), pretrained=version)
  File "/opt/conda/lib/python3.8/site-packages/open_clip/factory.py", line 133, in create_model_and_transforms
    model = create_model(
  File "/opt/conda/lib/python3.8/site-packages/open_clip/factory.py", line 83, in create_model
    raise RuntimeError(f'Model config for {model_name} not found.')
RuntimeError: Model config for ViT-H-14 not found.
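Not an official answer, but this error usually means the installed open_clip_torch predates the ViT-H-14 model config; upgrading the package is my assumed fix, so verify against the version pinned in this repo's requirements. A quick check:

    import open_clip

    # If 'ViT-H-14' is missing here, the installed open_clip_torch is too old
    # for the SD 2.x OpenCLIP text encoder.
    print(getattr(open_clip, "__version__", "unknown"))
    print("ViT-H-14" in open_clip.list_models())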
