nicklucche / stable-diffusion-nvidia-docker
GPU-ready Dockerfile to run the Stability.AI Stable Diffusion v2 model with a simple web interface. Includes multi-GPU support.
License: MIT License
Great, thanks! At this point I think we can close this issue and open another one for the pooled VRAM.
Originally posted by @NickLucche in #5 (comment)
I would like to be able to pool resources (VRAM) from the multiple cards I have installed into one pool. For example,
I have 4x NVIDIA P100 cards installed. I want to combine them all (16 GB VRAM each) into a 64 GB pool, so that complicated or high-resolution images don't run up against the 16 GB single-card limit.
This would also be useful for people with multiple 4 GB VRAM consumer/hobbyist cards to reach workable amounts of VRAM without buying enterprise GPUs.
Hi, thank you so much for this, but I'm super confused about the Models part and the checkpoints, and how do I add a LoRA?
Can you give more examples of the commands? I already succeeded in running it.
And how do I add a negative prompt? It would be a huge help if you showed me.
Thanks again
There's no button like that anywhere...
Then, after I've done everything etc., the web UI page does not load ("no reply").
I created it with this command, with the token as requested:
"docker run --name stable-diffusion --gpus all -it -e TOKEN=<YOUR_TOKEN> -p 7860:7860 nicklucche/stable-diffusion"
Please help.
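One generic Docker troubleshooting step worth trying here (not specific to this repo): on first start the container downloads the model weights, which can take several minutes, so the page may refuse connections until initialization finishes. The container logs show whether it is still downloading or has crashed:

```bash
# Follow the container's stdout/stderr to see download progress or errors.
docker logs -f stable-diffusion
```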
Hi, I want to use this Docker image to train models. What should I do?
Thank you so much for your work containerizing this.
I must ask, is multi-GPU support planned soon? I have 4 to 8 cards that I would like to use at once.
Thanks! :)
Sorry to ask about such a taboo topic... In other words, how do I use this with local models?
Interpolating between two input prompts (like it's done here https://github.com/nateraw/stable-diffusion-videos) looks very cool and not hard to do.
In the easiest implementation, runtime will simply go up by a factor of N (the number of frames to generate) compared to single-image generation.
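A minimal sketch of the idea, not the linked project's actual code: it assumes a diffusers version whose pipeline accepts prompt_embeds, and uses plain linear interpolation of the CLIP text embeddings where the linked project uses slerp:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

def embed(prompt: str) -> torch.Tensor:
    # Encode a prompt into CLIP text embeddings using the pipeline's own encoder.
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        return_tensors="pt",
    ).input_ids.to("cuda")
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

e1 = embed("a photo of a forest in winter")
e2 = embed("a photo of a forest in summer")

N = 8  # number of frames; total runtime ~ N single-image generations
for i in range(N):
    t = i / (N - 1)
    emb = (1 - t) * e1 + t * e2  # linear interpolation between the two prompts
    # Re-seed each frame so the initial latents are identical across frames
    # and only the prompt embedding changes.
    g = torch.Generator("cuda").manual_seed(42)
    pipe(prompt_embeds=emb, generator=g).images[0].save(f"frame_{i:03d}.png")
```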
Re-structure the "Samples" section of the README to showcase some of the things you can do with Stable Diffusion, like fixing a seed and gradually increasing the guidance scale to get results that are progressively "closer" to the prompted input. I think some tips like that could be useful.
This is likely caused by a CUDA version mismatch (e.g. nvidia-smi reports driver CUDA 11.7 while the container uses 11.3).
Hotfix:
# get inside container
docker exec -it stable-diffusion bash
# upgrade pytorch-cuda package
conda install pytorch torchvision cudatoolkit=11.6 -c pytorch -c conda-forge
# exit container
ctrl + D
# restart container
docker restart stable-diffusion
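After the restart, a quick sanity check (a generic PyTorch one-liner, not part of the repo) confirms the upgraded runtime can see the GPU:

```bash
# Should print "True" once the CUDA toolkit inside the container matches the driver.
docker exec -it stable-diffusion python -c "import torch; print(torch.cuda.is_available())"
```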
Good morning.
I followed your tutorial and installed this on my personal GPU server, but it always fails. I checked all the steps, including the Hugging Face token, and couldn't find the problem.
initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
So I think it's a Docker problem.
Is there any way to use this without a hugging face token, or to change the SD model to something besides 1.5? Or is it part of the docker script?
Could you possibly edit the docker script so that it no longer requires a HF download?
Hi Nick,
Really appreciate this project. Someone from NVIDIA recommended it to me.
I am wondering, does this support AUTOMATIC1111? It looks like that one is really popular and has rich functionality.
Hey, I'm trying to install it but I ran into this issue:
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error: Repository Not Found for url: https://huggingface.co/api/models/CompVis/stable-diffusion-v1-4/revision/fp16. If the repo is private, make sure you are authenticated. (R
I have no clue what to do. I already created a new token with Read permissions and I applied it like this:
docker run --name stable-diffusion --gpus all -it -e TOKEN= -p 7860:7860 nicklucche/stable-diffusion
Any help for the coding-illiterate?
Hi Nick,
Thanks for the work on this project. It's fantastic!
Would you be open to PRs? If so, would you mind pushing the app directory so we can modify and build the container locally?
I am using a Tesla K80 for this container and get the following error when trying to generate images:
'false INTERNAL ASSERT FAILED at "../c10/cuda/CUDAGraphsC10Utils.h":73, please report a bug to PyTorch. Unknown CUDA graph CaptureStatus5399'
I plan to get another 3090, but if it is possible to "pool" CPU and GPU performance and allocate 24 GB of system RAM as a shared pool, I'd love to try that! It might be dreadfully slow given the gap between CPU and GPU performance, but it's worth an experiment!
The CPU in question is a 5900X.
Output on a 6 GB VRAM card is just an all-black image at 512x512 and 384x384.
Might be due to the default args we're using when creating the sampler.
Hi, I recently got two Tesla P40 GPUs which I was hoping to use with this. From my understanding, the Tesla P40s need the vGPU license in order to pass through via WSL. I am using my tesla cards locally for other applications as well and basically use this as a graphics/machine learning server running windows 11 so I don't really want to install Linux on the PC itself.
Do you see any easy way to run this without Docker? Hopefully I'm wrong about the licensing. I tried to export the container and run the scripts locally, but I honestly don't know what I'm doing with that and didn't make much progress.
diffusers supports using local pretrained weights, instead of requiring an API key:
search this page ( https://github.com/huggingface/diffusers#text-to-image-generation-with-stable-diffusion ) for the following phrase:
Assuming the folder is stored locally under ./stable-diffusion-v1-4, you can also run stable diffusion without requiring an authentication token:
Would it be possible to allow local weight file usage, instead of requiring an API token?
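A minimal sketch of what that would look like, following the usage shown in the linked diffusers README (the exact return type varies slightly across diffusers versions) and assuming the weights were downloaded beforehand to ./stable-diffusion-v1-4:

```python
from diffusers import StableDiffusionPipeline

# Load from a local folder: no authentication token is needed,
# since nothing is fetched from the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```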
I've tried this web UI and it seems more fully featured: https://github.com/hlky/stable-diffusion-webui
Specifically the face fix and upscale models I was able to add and enable.
Is there a way I could use that web UI with your docker container release?
I successfully created the container with the default docker run... command, but when I tried setting it up again with the FP16=0 part I got the error "docker: invalid reference format: repository name must be lowercase.".
Perhaps I am misunderstanding how to use the variable, as I am just getting started with containers in general...
Here is my command: docker run --name stablediffusion --gpus all -it -e FP16=0 TOKEN=mytokenhere -p 7860:7860 nicklucche/stable-diffusion
I also tried adding the variable with quotes but it didn't work. Any suggestions? (I have a 12 GB GPU.)
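For what it's worth, this error comes from standard docker CLI parsing rather than from this image: each environment variable needs its own -e flag, so the bare TOKEN=mytokenhere is parsed as the image name and rejected as an invalid (non-lowercase) reference. A corrected version of the command above:

```bash
# Each variable gets its own -e flag; otherwise docker treats the
# second assignment as the image reference and fails.
docker run --name stablediffusion --gpus all -it \
  -e FP16=0 -e TOKEN=mytokenhere \
  -p 7860:7860 nicklucche/stable-diffusion
```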
I executed your steps to get this installed and up and running and I was able to get it working with a single 3090 (single GPU operation is the default).
However, when I try to add in my second 3090 with this command:
sudo docker run --name stable-diffusion --pull=always --gpus all -it -p 7860:7860 -e DEVICES=all nicklucche/stable-diffusion
This is the error that I get:
latest: Pulling from nicklucche/stable-diffusion
Digest: sha256:a7bbc5df2f879279513cfa26b51e0c42c1d8298944dc474e2500535ec23b5be4
Status: Image is up to date for nicklucche/stable-diffusion:latest
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
  File "server.py", line 22, in <module>
    pipeline = init_pipeline()
  File "/app/main.py", line 56, in init_pipeline
    n_procs, devices, model_parallel_assignment=model_ass, **kwargs
  File "/app/parallel.py", line 168, in from_pretrained
    with open("./clip_config.pickle", "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './clip_config.pickle'
And I cloned your repo into my home directory (which automatically creates the stable-diffusion-nvidia-docker sub-directory).
Your help and guidance in terms of how I can get multiple GPUs up and running is greatly appreciated.
Thank you.
Hardware:
Intel 6700K
Asus Z170-E motherboard
64 GB DDR4-2400 unbuffered, non-ECC RAM
2x Gigabyte 3090
OS:
Windows 10 22H2
Ubuntu 22.04 LTS running via WSL2 (it is confirmed via PowerShell that it is running WSL2).
I was able to install the "vanilla" AUTOMATIC1111 (as a Docker container), along with the NVIDIA Container Toolkit, etc.
All of that is up and running.
Thank you.
Could you add an option to change the sampler? Thank you.
I spun up a g4dn.12xlarge instance with 4 T4s and tried a command, but ended up with an AssertionError :/
[ec2-user@ip ~]$ docker run --name stable-diffusion --gpus all -it -e DEVICES=0,1,2,3 -e MODEL_PARALLEL=1 -e TOKEN=token -p 7860:7860 nicklucche/stable-diffusion:multi-gpu
Loading model..
Looking for a valid assignment in which to split model parts to device(s): [0, 1, 2, 3]
Free GPU memory (per device): [8665, 8665, 8665, 8665]
Search has found that 17 model(s) can be split over 4 device(s)!
Assignments: [{0: 0, 1: 0, 2: 0, 3: 0}, ... repeated 17 times ...]
Model parallel worker component assignment: {0: 0, 1: 0, 2: 0, 3: 0}
Creating and moving model parts to respective devices..
(the two lines above repeat once per worker, 17 times in total)
Downloading: 100%| ... (19 download progress bars omitted, files from 209 B to 1.72 GB) ...
Traceback (most recent call last):
  File "server.py", line 9, in <module>
    from main import inference, MP as model_parallel
  File "/app/main.py", line 55, in <module>
    n_procs, devices, model_parallel_assignment=model_ass, **kwargs
  File "/app/parallel.py", line 149, in from_pretrained
    assert d
AssertionError
It's loading a lot of models, 17 in fact. Might that be the culprit?
Anyways, if I can participate in testing or help in any way, I'm here to do so :)
I'm also wondering why it says only 8665 MB of free memory, when nvidia-smi told me I had 15360 MiB free per GPU just before that.
Originally posted by @huotarih in #8 (comment)
I'm running this container with docker-compose. Where are the models downloaded in the container? I would like to persist them in a volume.
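Not an authoritative answer, but a sketch of how persistence could work: assuming the app runs as root and diffusers uses the default Hugging Face cache location inside the container (/root/.cache/huggingface — an assumption worth verifying with docker exec), a named volume would keep the weights across container recreations:

```yaml
# docker-compose.yml sketch; the cache path is an assumption, not confirmed
# against this image.
services:
  stable-diffusion:
    image: nicklucche/stable-diffusion
    environment:
      - TOKEN=<YOUR_TOKEN>
    ports:
      - "7860:7860"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - hf-cache:/root/.cache/huggingface  # persist downloaded model weights

volumes:
  hf-cache:
```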