
docker-diffusers-api's Introduction

docker-diffusers-api ("banana-sd-base")

Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers. Used by kiri.art, perfect for local, server & serverless.


Copyright (c) Gadi Cohen, 2022. MIT Licensed. Please give credit and link back to this repo if you use it in a public project.

Features

  • Models: stable-diffusion, waifu-diffusion, and easy to add others (e.g. jp-sd)
  • Pipelines: txt2img, img2img and inpainting in a single container (all diffusers official and community pipelines are wrapped, but untested)
  • All model inputs supported, including setting nsfw filter per request
  • Permute base config to multiple forks based on yaml config with vars
  • Optionally send signed event logs / performance data to a REST endpoint / webhook.
  • Can automatically download a checkpoint file and convert to diffusers.
  • S3 support, dreambooth training.

Note: This image was created for kiri.art. Everything is open source but there may be certain request / response assumptions. If anything is unclear, please open an issue.

Important Notices

Official help is available in our dedicated forum: https://forums.kiri.art/c/docker-diffusers-api/16.

This README refers to the in-development dev branch and may reference features and fixes not yet in the published releases.

v1 has not been officially released yet, but has been running well in production on kiri.art for almost a month. We'd be grateful for any feedback from early adopters to help make this official. For more details, see Upgrading from v0 to v1. Previous releases are available on the dev-v0-final and main-v0-final branches.

Currently only NVIDIA / CUDA devices are supported. Tracking Apple / M1 support in issue #20.

Installation & Setup:

Setup varies depending on your use case.

  1. To run locally or on a server, with runtime downloads:

    docker run --gpus all -p 8000:8000 -e HF_AUTH_TOKEN=$HF_AUTH_TOKEN gadicc/diffusers-api

    See the guides for various cloud providers.

  2. To run serverless, include the model at build time:

    1. docker-diffusers-api-build-download (banana, others)
    2. docker-diffusers-api-runpod, see the guide
  3. Building from source.

    1. Fork / clone this repo.
    2. docker build -t gadicc/diffusers-api .
    3. See CONTRIBUTING.md for more helpful hints.

Other configurations are possible, but these are the most common cases.

Everything is set via docker build-args or environment variables.

Usage:

See also Testing below.

The container expects an HTTP POST request to /, with a JSON body resembling the following:

{
  "modelInputs": {
    "prompt": "Super dog",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "width": 512,
    "height": 512,
    "seed": 3239022079
  },
  "callInputs": {
    // You can leave these out to use the default
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": true,
  },
}
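For example, assuming the container is listening on port 8000 (as in the docker run command above), a minimal Python sketch for sending this exact request might look like the following. Only the request shape is documented above; how the image is keyed in the response is an assumption, so inspect the returned JSON for your container version.

import requests

payload = {
    "modelInputs": {
        "prompt": "Super dog",
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
        "width": 512,
        "height": 512,
        "seed": 3239022079,
    },
    "callInputs": {
        "MODEL_ID": "runwayml/stable-diffusion-v1-5",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "LMSDiscreteScheduler",
        "safety_checker": True,
    },
}

# POST to the container's root endpoint; the whole request is a single JSON body.
result = requests.post("http://localhost:8000/", json=payload).json()

# Timing data comes back under "$timings" (see "Event logs" below); generated
# images are returned as base64-encoded strings. The exact image key name is
# not documented here, so inspect the response to find it.
print(result.get("$timings"))
print(list(result.keys()))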

It's important to remember that docker-diffusers-api is primarily a wrapper around HuggingFace's diffusers library. Basic familiarity with diffusers is indispensable for a good experience with docker-diffusers-api. Explaining some of the options above:

  • modelInputs - for the most part - are passed directly to the selected diffusers pipeline unchanged. So, for the default StableDiffusionPipeline, you can see all options in the relevant pipeline docs for its __call__ method. The main exceptions are:

    • Only valid JSON values can be given (strings, numbers, etc)
    • seed, a number, is transformed into a generator.
    • images are converted to / from base64-encoded strings (see the img2img sketch after this list).
  • callInputs affect which model, pipeline, scheduler and other lower level options are used to construct the final pipeline. Notably:

    • SCHEDULER: any scheduler included in diffusers should work out of the box, provided it can be loaded with its default config and without requiring any other explicit arguments at init time. In any event, the following schedulers are the most common and best tested: DPMSolverMultistepScheduler (fast! only needs 20 steps!), LMSDiscreteScheduler, DDIMScheduler, PNDMScheduler, EulerAncestralDiscreteScheduler, EulerDiscreteScheduler.

    • PIPELINE: the most common are StableDiffusionPipeline, StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipeline, and the community lpw_stable_diffusion pipeline, which allows for long prompts (more than 77 tokens) and prompt weights (things like ((big eyes)), (red hair:1.2), etc) and accepts a custom_pipeline_method callInput with the values text2img ("text", not "txt"), img2img and inpaint. See these links for all the possible modelInputs that can be passed to each pipeline's __call__ method.

    • MODEL_URL (optional) can be used to retrieve the model from locations other than HuggingFace, e.g. an HTTP server, S3-compatible storage, etc. For more info, see the storage docs and this post on how to use and store optimized models from your own cloud.
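As noted above, image inputs for img2img and inpainting travel as base64-encoded strings inside modelInputs. A minimal sketch, assuming the container is on port 8000; the image parameter name must match whatever the selected pipeline's __call__ expects ("image" in recent diffusers releases, "init_image" in older ones), so check the pipeline docs:

import base64
import requests

# Base64-encode a local image, as the container expects for image inputs.
with open("input.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "modelInputs": {
        "prompt": "A dog wearing a top hat",
        # Parameter name depends on the pipeline / diffusers version
        # ("image" vs "init_image") -- an assumption, check the pipeline docs.
        "image": image_b64,
        "strength": 0.75,
    },
    "callInputs": {
        "MODEL_ID": "runwayml/stable-diffusion-v1-5",
        "PIPELINE": "StableDiffusionImg2ImgPipeline",
        "SCHEDULER": "DPMSolverMultistepScheduler",
    },
}

result = requests.post("http://localhost:8000/", json=payload).json()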

Examples and testing

There are also very basic examples in test.py, which you can view and run with python test.py if the container is already running on port 8000. You can also specify a specific test, change some options, and run against a deployed banana image:

$ python test.py
Usage: python3 test.py [--banana] [--xmfe=1/0] [--scheduler=SomeScheduler] [all / test1] [test2] [etc]

# Run against http://localhost:8000/ (Nvidia Quadro RTX 5000)
$ python test.py txt2img
Running test: txt2img
Request took 5.9s (init: 3.2s, inference: 5.9s)
Saved /home/dragon/www/banana/banana-sd-base/tests/output/txt2img.png

# Run against deployed banana image (Nvidia A100)
$ export BANANA_API_KEY=XXX
$ BANANA_MODEL_KEY=XXX python3 test.py --banana txt2img
Running test: txt2img
Request took 19.4s (init: 2.5s, inference: 3.5s)
Saved /home/dragon/www/banana/banana-sd-base/tests/output/txt2img.png

# Note that 2nd runs are much faster (ignore init, that isn't run again)
Request took 3.0s (init: 2.4s, inference: 2.1s)

The best example of course is https://kiri.art/ and its source code.

Adding other Models

You have two options.

  1. For a diffusers model, simply set MODEL_ID build-var / call-arg to the name of the model hosted on HuggingFace, and it will be downloaded automatically at build time.

  2. For a non-diffusers model, simply set the CHECKPOINT_URL build-var / call-arg to the URL of a .ckpt file, which will be downloaded and converted to the diffusers format automatically at build time. CHECKPOINT_CONFIG_URL can also be set.
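Since MODEL_ID is also accepted as a callInput, a container running with runtime downloads (option 1 under Installation) can switch models per request without a rebuild. A hedged sketch; the model id below is just an example of a diffusers-format model hosted on HuggingFace:

import requests

payload = {
    "modelInputs": {"prompt": "A portrait in watercolor"},
    "callInputs": {
        # Any diffusers-format model hosted on HuggingFace; with runtime
        # downloads it is fetched on first use rather than baked into the image.
        "MODEL_ID": "hakurei/waifu-diffusion",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "DPMSolverMultistepScheduler",
    },
}

result = requests.post("http://localhost:8000/", json=payload).json()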

Troubleshooting

  • 403 Client Error: Forbidden for url

    Make sure you've accepted the license on the model card of the HuggingFace model specified in MODEL_ID, and that you correctly passed HF_AUTH_TOKEN to the container.

Event logs / web hooks / performance data

Set SEND_URL (and optionally SIGN_KEY) environment variable(s) to send event and timing data on init, inference and other start and end events. This can either be used to log performance data, or for webhooks on event start / finish.

The timing data is now returned in the response payload too, like this: { $timings: { init: timeInMs, inference: timeInMs } }, along with any other events (such as training, upload, etc).

You can go to https://webhook.site/ and use the provided "unique URL" as your SEND_URL to see how it works, if you don't have your own REST endpoint (yet).

If SIGN_KEY is used, you can verify the signature like this (TypeScript):

import crypto from "crypto";
// Types from Next.js; adjust for your own framework.
import type { NextApiRequest, NextApiResponse } from "next";

async function handler(req: NextApiRequest, res: NextApiResponse) {
  const data = req.body;

  // The container's signature is an MD5 hex digest of the JSON body
  // (minus the "sig" field) concatenated with SIGN_KEY.
  const containerSig = data.sig as string;
  delete data.sig;

  const ourSig = crypto
    .createHash("md5")
    .update(JSON.stringify(data) + process.env.SIGN_KEY)
    .digest("hex");

  const signatureIsValid = containerSig === ourSig;
  if (!signatureIsValid) return res.status(401).end();
  // ... handle the event ...
  res.status(200).end();
}
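The same check in Python, as a sketch only: the digest is an MD5 of the serialized JSON body (minus sig) concatenated with SIGN_KEY, so the serialization must match what the sender produced (the TypeScript example above sidesteps this by re-stringifying the parsed body):

import hashlib
import json
import os

def verify_signature(data: dict) -> bool:
    # Pop the container's signature, then hash the remaining payload + SIGN_KEY.
    data = dict(data)  # avoid mutating the caller's dict
    container_sig = data.pop("sig", "")
    # NOTE: json.dumps must reproduce the sender's serialization exactly
    # (compact separators, same key order) for the digests to match.
    serialized = json.dumps(data, separators=(",", ":"))
    our_sig = hashlib.md5(
        (serialized + os.environ["SIGN_KEY"]).encode("utf-8")
    ).hexdigest()
    return container_sig == our_sig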

If you send a callInput called startRequestId, it will get sent back as part of the send payload in most cases.

You can also set callInputs SEND_URL and SIGN_KEY to set or override these values on a per-request basis.

Acknowledgements

docker-diffusers-api's People

Contributors

aroop, devanmolsharma, gadicc, msuess, semantic-release-bot


docker-diffusers-api's Issues

Improve README

Notes sent by DM and my interim comments:

i. If you encounter a 403 Client Error: Forbidden for url, make sure you have accepted the model license (https://huggingface.co/CompVis/stable-diffusion-v1-4) and specified the correct HF_AUTH_TOKEN in your env.

(Building the docker image took more than 70 minutes on my 16G RAM laptop; maybe you could mention it's a great checkpoint to go for a coffee or a walk πŸ˜‰) => downloading all models was a bit painful.

Noted :)

I saw in the readme, "easy to add other models", but can't find any documentation; can you point me in the right direction please?

  1. For a diffusers model, it's as simple as setting the MODEL_ID var (I'll come back to vars in a sec) and rebuilding. It will download everything you need at build time.

  2. For a non-diffusers model, it's as simple as setting CHECKPOINT_URL (and an arbitrary MODEL_ID to your liking; it's used internally but doesn't actually make a difference beyond which directory name is used).

At least, it should be! If not, let me know πŸ™‚

The '2. Variables' section is really hard to understand.

  1. If you only intend to deploy one repo, ignore all the permutations stuff, and just set the vars in the Dockerfile. (Very soon banana is launching custom build vars, and you won't need to edit the dockerfile at all).

  2. If you plan to deploy multiple models, basically what the permute script does is look at the .yaml file, which specifies all the different permutations you want, and create a bunch of repos for you (under the permutations dir), automatically substituting all the vars from the yaml file into those sub-repo Dockerfiles.

Hope that made sense!

inpaint error

I'm trying to run this project locally with Docker, but every time I use the inpaint function, the model input parameters sent from the frontend to the backend are wrong on the dev branch. I want to know how your backend inpaint model is used. Also, on the main branch I hit a tensor.cat error with mask_image and image, and an inconsistent image input channels error.

Banana.dev - does not appear to have a file named model_index.json.

Hey,

I thought I would try the Banana extension repo today. I have tried changing the ENV variable model_id to a few different models, but I keep getting this error:

'message': '', 'created': 1676709740, 'apiVersion': 'January 11, 2023', 'modelOutputs': [{'$error': {'code': 'APP_INFERENCE_ERROR', 'name': 'OSError', 'message': 'stabilityai/stable-diffusion-2-1-base does not appear to have a file named model_index.json.', 'stack': 'Traceback (most recent call last):\n File "/api/diffusers/src/diffusers/configuration_utils.py", line 326, in load_config\n config_file = hf_hub_download(\n File "/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn\n return fn(*args, **kwargs)\n File "/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1205, in hf_hub_download\n raise LocalEntryNotFoundError(\nhuggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set \'local_files_only\' to False.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/api/server.py", line 39, in inference\n output = user_src.inference(model_inputs)\n File "/api/app.py", line 227, in inference\n pipeline = getPipelineForModel(pipeline_name, model, normalized_model_id)\n File "/api/getPipeline.py", line 83, in getPipelineForModel\n pipeline = DiffusionPipeline.from_pretrained(\n File "/api/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 462, in from_pretrained\n config_dict = cls.load_config(\n File "/api/diffusers/src/diffusers/configuration_utils.py", line 354, in load_config\n raise EnvironmentError(\nOSError: stabilityai/stable-diffusion-2-1-base does not appear to have a file named model_index.json.\n'}}]}

Make sure xformers is installed correctly and a GPU is available

Thanks for this repo.

Model is from hugging face rdmodel:

I'm getting the following error while running:
docker build -t banana-sd --build-arg HF_AUTH_TOKEN=${MY_HF_WRITE_TOKEN} .

(Running on Mac M1)

 => ERROR [output 26/33] RUN python3 download.py                        1731.0s
------
 > [output 26/33] RUN python3 download.py:
#33 25.25 Downloading model: rdcoder/rd-model-1
#33 25.25 Initializing DPMSolverMultistepScheduler for rdcoder/rd-model-1...
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 289/289 [00:00<00:00, 82.2kB/s]
#33 27.70 Initialized DPMSolverMultistepScheduler for rdcoder/rd-model-1 in 2612ms
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 543/543 [00:00<00:00, 47.8kB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 342/342 [00:00<00:00, 30.0kB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.72k/4.72k [00:00<00:00, 427kB/s]]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.22G/1.22G [06:18<00:00, 3.22MB/s]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 612/612 [00:00<00:00, 118kB/s]1s/it]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 492M/492M [02:50<00:00, 2.89MB/s]t] 
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 525k/525k [00:01<00:00, 328kB/s]it]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 472/472 [00:00<00:00, 57.4kB/s]/it]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 792/792 [00:00<00:00, 55.2kB/s]/it]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.06M/1.06M [00:02<00:00, 431kB/s]t]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 876/876 [00:00<00:00, 102kB/s]9s/it]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3.44G/3.44G [16:51<00:00, 3.40MB/s]]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 552/552 [00:00<00:00, 38.4kB/s]2s/it]
Downloading: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 335M/335M [01:35<00:00, 3.51MB/s]/it]
Fetching 15 files: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 15/15 [28:05<00:00, 112.37s/it]
#33 1714.5 /api/diffusers/src/diffusers/models/attention.py:433: UserWarning: Could not enable memory efficient attention. Make sure xformers is installed correctly and a GPU is available: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU 
#33 1714.5   warnings.warn(
#33 1729.9 /tmp/tmpt4f85uyv: line 3:    58 Killed                  /bin/bash -c 'python3 download.py'
#33 1730.1 ERROR conda.cli.main_run:execute(47): `conda run /bin/bash -c python3 download.py` failed. (See above for error)
------
executor failed running [/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c python3 download.py]: exit code: 137

Important Dockerfile lines:

ARG MODEL_ID="rdcoder/rd-model-1"
ARG PRECISION=""
ARG PIPELINE="StableDiffusionPipeline"
ARG USE_DREAMBOOTH=0

permutations.yaml :

list:

  - name: rd-model-name
    HF_AUTH_TOKEN: $HF_AUTH_TOKEN
    MODEL_ID: rdcoder/rd-model-1
    PIPELINE: StableDiffusionPipeline

Cuda out of memory error.

I'm able to POST to the API in Docker on my local machine. I get a 200 success after the inpainting function is finished. Then on my frontend, when I get the data back, it's returning this:

{"$error":{"code":"PIPELINE_ERROR","name":"RuntimeError","message":"CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 8.00 GiB total capacity; 7.16 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation

I have a:
Aleen Laptop
RTX 3070
64 mb Ram

Thanks! -foo

Blurred & noisy images when used with stable-diffusion-2 and stable-diffusion-2-1

When I try to build the container from the Dockerfile, passing either stabilityai/stable-diffusion-2 or stabilityai/stable-diffusion-2-1, I'm getting very noisy and blurry images with the parameters adapted from the example:

{
  "modelInputs": {
    "prompt": "Super dog",
    "num_inference_steps": 10,
    "guidance_scale": 7.5,
    "width": 1024,
    "height": 1024,
    "seed": 3239022079
  },
  "callInputs": {
    "MODEL_ID": "stabilityai/stable-diffusion-2-1",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": true
  }
}

The output looks like this:

[output image]

Am I missing something important here, or is this a bug?

Apple M1 / M2 / MPS support

Hi!

I just downloaded the project and tried to build and deploy the Docker image on my M1.
I always get the same error:
[+] Building 163.5s (15/44)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 6.90kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime 1.5s
[+] Building 163.6s (15/44)
=> => transferring context: 64.22kB 0.0s
=> CACHED [base 1/5] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75 0.0s
=> [base 2/5] RUN if [ -n "" ] ; then echo quit | openssl s_client -proxy $(echo | cut -b 8-) -servername google.com -connect google.com:443 -showcerts | sed 'H;1h; 0.3s
=> [base 3/5] RUN apt-get update 14.3s
=> [base 4/5] RUN apt-get install -yqq git 27.6s
[+] Building 1320.8s (18/44)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 6.90kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime 1.5s
=> [internal] load build context 0.0s
=> => transferring context: 64.22kB 0.0s
=> CACHED [base 1/5] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75 0.0s
=> [base 2/5] RUN if [ -n "" ] ; then echo quit | openssl s_client -proxy $(echo | cut -b 8-) -servername google.com -connect google.com:443 -showcerts | sed 'H;1h; 0.3s
=> [base 3/5] RUN apt-get update 14.3s
=> [base 4/5] RUN apt-get install -yqq git 27.6s
=> [base 5/5] RUN apt-get install -yqq zstd 8.3s
=> [output 1/32] RUN mkdir /api 0.5s
=> [patchmatch 1/3] WORKDIR /tmp 0.0s
=> [patchmatch 2/3] COPY scripts/patchmatch-setup.sh . 0.0s
=> [patchmatch 3/3] RUN sh patchmatch-setup.sh 0.4s
=> [output 2/32] WORKDIR /api 0.0s
=> [output 3/32] RUN conda update -n base -c defaults conda 101.1s
=> [output 4/32] RUN conda create -n xformers python=3.10 33.9s
=> [output 5/32] RUN python --version 6.3s
=> ERROR [output 6/32] RUN conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1 1126.9s

[output 6/32] RUN conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1:
#14 9.049 Collecting package metadata (current_repodata.json): ...working... done
#14 85.41 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#14 85.44 Collecting package metadata (repodata.json): ...working... done
#14 489.9 Solving environment: ...working... done
#14 619.6
#14 619.6 ## Package Plan ##
#14 619.6
#14 619.6 environment location: /opt/conda/envs/xformers
#14 619.6
#14 619.6 added / updated specs:
#14 619.6 - cudatoolkit=11.6
#14 619.6 - pytorch=1.12.1
#14 619.6
#14 619.6
#14 619.6 The following packages will be downloaded:
#14 619.6
#14 619.6 package | build
#14 619.6 ---------------------------|-----------------
#14 619.6 blas-1.0 | mkl 6 KB
#14 619.6 ca-certificates-2022.12.7 | ha878542_0 143 KB conda-forge
#14 619.6 certifi-2022.12.7 | pyhd8ed1ab_0 147 KB conda-forge
#14 619.6 cudatoolkit-11.6.0 | hecad31d_10 821.2 MB conda-forge
#14 619.6 intel-openmp-2022.1.0 | h9e868ea_3769 4.5 MB
#14 619.6 mkl-2022.1.0 | hc2b9512_224 129.7 MB
#14 619.6 pytorch-1.12.1 |py3.10_cuda11.6_cudnn8.3.2_0 1.20 GB pytorch
#14 619.6 pytorch-mutex-1.0 | cuda 3 KB pytorch
#14 619.6 typing_extensions-4.4.0 | pyha770c72_0 29 KB conda-forge
#14 619.6 ------------------------------------------------------------
#14 619.6 Total: 2.13 GB
#14 619.6
#14 619.6 The following NEW packages will be INSTALLED:
#14 619.6
#14 619.6 blas pkgs/main/linux-64::blas-1.0-mkl
#14 619.6 cudatoolkit conda-forge/linux-64::cudatoolkit-11.6.0-hecad31d_10
#14 619.6 intel-openmp pkgs/main/linux-64::intel-openmp-2022.1.0-h9e868ea_3769
#14 619.6 mkl pkgs/main/linux-64::mkl-2022.1.0-hc2b9512_224
#14 619.6 pytorch pytorch/linux-64::pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0
#14 619.6 pytorch-mutex pytorch/noarch::pytorch-mutex-1.0-cuda
#14 619.6 typing_extensions conda-forge/noarch::typing_extensions-4.4.0-pyha770c72_0
#14 619.6
#14 619.6 The following packages will be UPDATED:
#14 619.6
#14 619.6 ca-certificates pkgs/main::ca-certificates-2022.10.11~ --> conda-forge::ca-certificates-2022.12.7-ha878542_0
#14 619.6 certifi pkgs/main/linux-64::certifi-2022.9.24~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
#14 619.6
#14 619.6
#14 619.6 Proceed ([y]/n)?
#14 619.6
#14 619.6 Downloading and Extracting Packages

#14 1110.5 CondaError: Downloaded bytes did not match Content-Length
#14 1110.5 url: https://conda.anaconda.org/pytorch/linux-64/pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0.tar.bz2
#14 1110.5 target_path: /opt/conda/pkgs/pytorch-1.12.1-py3.10_cuda11.6_cudnn8.3.2_0.tar.bz2
#14 1110.5 Content-Length: 1284916176
#14 1110.5 downloaded bytes: 1100035059
#14 1110.5
#14 1110.5
#14 1110.5
#14 1126.1 ERROR conda.cli.main_run:execute(47): conda run /bin/bash -c conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1 failed. (See above for error)

executor failed running [/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1]: exit code: 1

I understand that it's a download problem, but I'm not good enough at Docker to be able to fix it.

Any suggestions?

Img2Img and inpainting no longer working with 1.5

They are returning brand new images. Same code works with 1.4

Edit:
For clarity, inpainting works with runwayml/stable-diffusion-inpainting, but doing either img2img or inpainting with runwayml/stable-diffusion-v1-5 returns brand new images.

Any simple way to monetize a setup like this?

I'm not a developer, but I know WordPress well, and I've worked with Python enough to tweak things.

I have an idea for a site, but I have no idea how to set up a payment system connected to credits or whatnot. Do you know if there is something like that open source?

Thanks!

Is CPU only supported?

Hi,

I'm trying to run the project on my server that only has a CPU. Is that possible, and if so, which parameters do I need to apply?

I'm already running the container without the "--gpus all" parameter

I believe I'm running version 1.6.0

Here is the error I'm getting:

{
    "$error": {
        "code": "APP_INFERENCE_ERROR",
        "name": "ValueError",
        "message": "torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU ",
        "stack": "Traceback (most recent call last):\n  File \"/api/server.py\", line 53, in inference\n    output = await user_src.inference(all_inputs, streaming_response)\n  File \"/api/app.py\", line 442, in inference\n    pipeline.enable_xformers_memory_efficient_attention()  # default on\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1453, in enable_xformers_memory_efficient_attention\n    self.set_use_memory_efficient_attention_xformers(True, attention_op)\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1479, in set_use_memory_efficient_attention_xformers\n    fn_recursive_set_mem_eff(module)\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 1469, in fn_recursive_set_mem_eff\n    module.set_use_memory_efficient_attention_xformers(valid, attention_op)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 227, in set_use_memory_efficient_attention_xformers\n    fn_recursive_set_mem_eff(module)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n    fn_recursive_set_mem_eff(child)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n    fn_recursive_set_mem_eff(child)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 223, in fn_recursive_set_mem_eff\n    fn_recursive_set_mem_eff(child)\n  File \"/api/diffusers/src/diffusers/models/modeling_utils.py\", line 220, in fn_recursive_set_mem_eff\n    module.set_use_memory_efficient_attention_xformers(valid, attention_op)\n  File \"/api/diffusers/src/diffusers/models/attention_processor.py\", line 200, in set_use_memory_efficient_attention_xformers\n    raise ValueError(\nValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU \n"
    }
}

Need help using custom ckpt file from S3

Hello!

I'm trying to use a custom ckpt to deploy to banana. My file is in S3, and I tried setting the CHECKPOINT_URL ARG in the Dockerfile with no luck (it looks like the default stability weights got loaded instead of my ckpt from the S3 bucket).

I tried setting MODEL_URL to the S3 location as well, and I'm not seeing much luck either (it reports a tar error for the ckpt file from S3).

Am I approaching this the wrong way? I dug around and saw there's code to convert a ckpt to the diffusers format (and use it while building on banana). Would appreciate some guidance, thank you! πŸ™πŸ»

Clearer error when container's data isn't in `{ modelInputs, callInputs }` format.

Not at all obvious from this discord message:

Starting worker [6] { "prompt": "a sunset" } [2022-11-04 14:02:16 +0000] [6] [ERROR] Exception occurred while handling uri: 'http://0.0.0.0:8000/' Traceback (most recent call last): File "handle_request", line 81, in handle_request FutureStatic, File "server.py", line 36, in inference output = user_src.inference(model_inputs) File "/api/app.py", line 144, in inference startRequestId = call_inputs.get("startRequestId", None) AttributeError: 'NoneType' object has no attribute 'get' [2022-11-04 14:02:16 +0000] - (sanic.access)[INFO][127.0.0.1:53964]: POST http://0.0.0.0:8000/ 500 139

Fetching files on requests cause timeout

Hi,

I started an instance locally and ran a generation request from the documentation. However, I receive a timeout that seems to be caused by the request to the Hugging Face CDN timing out.

Pulling 15 files takes time; is it possible to warm up the server and download all of this beforehand? Or keep the downloads instead of dropping them?

Thanks!

Docker command

docker run -it --rm \
  --gpus all  \
  -p 3000:8000 \
  -e HF_AUTH_TOKEN="XXX" \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e AWS_DEFAULT_REGION="$AWS_DEFAULT_REGION" \
  -e AWS_S3_ENDPOINT_URL="$AWS_S3_ENDPOINT_URL" \
  -e AWS_S3_DEFAULT_BUCKET="$AWS_S3_DEFAULT_BUCKET" \
  -v ~/root-cache:/root/.cache \
  "$@" gadicc/diffusers-api:latest

Request

{
  "modelInputs": {
    "prompt": "Super dog",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "width": 512,
    "height": 512,
    "seed": 3239022079
  },
  "callInputs": {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": true
  }
}

Server log


{
  "modelInputs": {
    "prompt": "Super dog",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "width": 512,
    "height": 512,
    "seed": 3239022079
  },
  "callInputs": {
    "MODEL_ID": "runwayml/stable-diffusion-v1-5",
    "PIPELINE": "StableDiffusionPipeline",
    "SCHEDULER": "LMSDiscreteScheduler",
    "safety_checker": true
  }
}
download_model {'model_url': None, 'model_id': 'runwayml/stable-diffusion-v1-5', 'model_revision': None, 'hf_model_id': None}
loadModel {'model_id': 'runwayml/stable-diffusion-v1-5', 'load': False, 'precision': None, 'revision': None}
Downloading model: runwayml/stable-diffusion-v1-5
Fetching 15 files:   0%|                                                                                                                                                                       | 0/15 [00:00<?, ?it/s]
Downloading (…)"model.safetensors";:   0%|                                                                                                                                                | 0.00/1.22G [00:00<?, ?B/s]
Downloading (…)_model.safetensors";:   0%|                                                                                                                                                 | 0.00/335M [00:00<?, ?B/s]
Downloading (…)_model.safetensors";:   0%|                                                                                                                                                | 0.00/3.44G [00:00<?, ?B/s]
Downloading (…)"model.safetensors";:   0%|                                                                                                                                                 | 0.00/492M [00:00<?, ?B/s]



Downloading (…)"model.safetensors";:   2%|β–ˆβ–ˆβ–Ž                                                                                                                                   | 21.0M/1.22G [02:04<1:56:43, 171kB/s]
Downloading (…)_model.safetensors";:   3%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž                                                                                                                                    | 10.5M/335M [01:49<47:32, 114kB/s]
Downloading (…)_model.safetensors";:   0%|▍                                                                                                                                     | 10.5M/3.44G [01:38<7:39:24, 124kB/s]

Fetching 15 files:  20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                                                                                               | 3/15 [02:59<11:58, 59.84s/it]
Downloading (…)"model.safetensors";:   2%|β–ˆβ–ˆβ–Ž                                                                                                                                   | 21.0M/1.22G [02:58<2:49:42, 117kB/s]
Downloading (…)"model.safetensors";:   2%|β–ˆβ–ˆβ–Š                                                                                                                                   | 10.5M/492M [02:57<2:16:15, 58.9kB/s]
Downloading (…)_model.safetensors";:   0%|▍                                                                                                                                   | 10.5M/3.44G [02:58<16:10:00, 58.9kB/s]
Downloading (…)_model.safetensors";:   6%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                                                                                | 21.0M/335M [02:58<44:24, 118kB/s]
[2023-01-31 16:17:45 +0000] - (sanic.access)[INFO][172.17.0.1:59646]: POST http://localhost:8000/  200 5470

Postman side

{
    "$error": {
        "code": "APP_INFERENCE_ERROR",
        "name": "ConnectionError",
        "message": "HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.",
        "stack": "Traceback (most recent call last):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 444, in _error_catcher\n    yield\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 567, in read\n    data = self._fp_read(amt) if not fp_closed else b\"\"\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 533, in _fp_read\n    return self._fp.read(amt) if amt is not None else self._fp.read()\n  File \"/opt/conda/envs/xformers/lib/python3.9/http/client.py\", line 463, in read\n    n = self.readinto(b)\n  File \"/opt/conda/envs/xformers/lib/python3.9/http/client.py\", line 507, in readinto\n    n = self.fp.readinto(b)\n  File \"/opt/conda/envs/xformers/lib/python3.9/socket.py\", line 704, in readinto\n    return self._sock.recv_into(b)\n  File \"/opt/conda/envs/xformers/lib/python3.9/ssl.py\", line 1242, in recv_into\n    return self.read(nbytes, buffer)\n  File \"/opt/conda/envs/xformers/lib/python3.9/ssl.py\", line 1100, in read\n    return self._sslobj.read(len, buffer)\nsocket.timeout: The read operation timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/requests/models.py\", line 816, in generate\n    yield from self.raw.stream(chunk_size, decode_content=True)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 628, in stream\n    data = self.read(amt=amt, decode_content=decode_content)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 593, in read\n    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)\n  File \"/opt/conda/envs/xformers/lib/python3.9/contextlib.py\", line 137, in __exit__\n    self.gen.throw(typ, value, traceback)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/urllib3/response.py\", line 449, in _error_catcher\n    raise ReadTimeoutError(self._pool, None, \"Read timed out.\")\nurllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/api/server.py\", line 39, in inference\n    output = user_src.inference(model_inputs)\n  File \"/api/app.py\", line 178, in inference\n    download_model(\n  File \"/api/download.py\", line 148, in download_model\n    loadModel(\n  File \"/api/loadModel.py\", line 59, in loadModel\n    model = pipeline.from_pretrained(\n  File \"/api/diffusers/src/diffusers/pipelines/pipeline_utils.py\", line 524, in from_pretrained\n    cached_folder = snapshot_download(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn\n    return fn(*args, **kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py\", line 215, in snapshot_download\n    thread_map(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/tqdm/contrib/concurrent.py\", line 94, in thread_map\n    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/tqdm/contrib/concurrent.py\", line 76, in _executor_map\n    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))\n  File 
\"/opt/conda/envs/xformers/lib/python3.9/site-packages/tqdm/std.py\", line 1195, in __iter__\n    for obj in iterable:\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/_base.py\", line 609, in result_iterator\n    yield fs.pop().result()\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/_base.py\", line 446, in result\n    return self.__get_result()\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/_base.py\", line 391, in __get_result\n    raise self._exception\n  File \"/opt/conda/envs/xformers/lib/python3.9/concurrent/futures/thread.py\", line 58, in run\n    result = self.fn(*self.args, **self.kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py\", line 194, in _inner_hf_hub_download\n    return hf_hub_download(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn\n    return fn(*args, **kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 1282, in hf_hub_download\n    http_get(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 530, in http_get\n    for chunk in r.iter_content(chunk_size=10 * 1024 * 1024):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/requests/models.py\", line 822, in generate\n    raise ConnectionError(e)\nrequests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.\n"
    }
}

Add VAE to txt-to-img Inference

Hey hey!

So I am using some models that either have a VAE baked in or require a separate VAE to be defined during inference, like this:

model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)

When I either manually added the VAE or used a model with a VAE baked in as the MODEL_ID, I received the following error, for example with the model dreamlike-art/dreamlike-photoreal-2.0:

'name': 'RuntimeError', 'message': 'Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same', 'stack': 'Traceback (most recent call last):\n  File "/api/app.py", line 382, in inference\n    images = pipeline(**model_inputs).images\n  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context\n    return func(*args, **kwargs)\n  File "/api/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 606, in __call__\n    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prompt_embeds).sample\n  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl\n    return forward_call(*input, **kwargs)\n  File "/api/diffusers/src/diffusers/models/unet_2d_condition.py", line 475, in forward\n    sample = self.conv_in(sample)\n  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl\n    return forward_call(*input, **kwargs)\n  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward\n    return self._conv_forward(input, self.weight, self.bias)\n  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward\n    return F.conv2d(input, weight, bias, self.stride,\nRuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Line 382, in the inference function, looks like this:

images = pipeline(**model_inputs).images

Perhaps we need to add a .half() to the input somewhere; not sure where, though.
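For reference, a guess at a workaround based on the error above would be to load the separately-specified VAE in the same dtype as the rest of the (fp16) pipeline, something like:

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Load the VAE with the same dtype as the pipeline so input and weight
# dtypes match (a guess at the fix, not confirmed).
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", vae=vae, torch_dtype=torch.float16
)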

Any help would be greatly appreciated!

It's the last hurdle I am facing to be generating images.

IDEA:
It would be awesome if we could define an optional VAE when making an API call, like this:

model_inputs["callInputs"] = {
                "MODEL_ID": "runwayml/stable-diffusion-v1-5",
                "PIPELINE": "StableDiffusionPipeline",
                "SCHEDULER": self.scheduler,
                "VAE": "stabilityai/sd-vae-ft-mse"
            }

ldm upsampling

This is on my list, but so far I only know of one other person interested in it, so thumbs up if you want it too :)

Patchmatch for outpainting

Follow on from #1 (merged to patchmatch branch).

Hey @msuess, thanks again for your awesome work here.
I've finally had a chance to look at this a bit more, sorry it took me a while.

I'm less familiar with PatchMatch, but from my understanding - and please correct me if I'm wrong - there's nothing diffusers- or even GPU-specific in the code here, which makes me think it would be better to put it in its own container, and even host it somewhere with serverless CPU for faster times and cheaper bills. Thoughts?

Also, 2 days ago, a Stable Diffusion model finetuned for inpainting was released, and apparently it works really well for outpainting too. I'd love your feedback if you have a chance, to let us know how it compares to patchmatch and whether we should proceed with both (see https://github.com/kiri-art/docker-diffusers-api/blob/main/CHANGELOG.md for more info).

The automated release is failing 🚨

🚨 The automated release from the split branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this πŸ’ͺ.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the split branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Cannot push to the Git repository.

semantic-release cannot push the version tag to the branch split on the remote Git repository with URL https://[secure]@github.com/kiri-art/docker-diffusers-api.git.

This can be caused by:


Good luck with your project ✨

Your semantic-release bot πŸ“¦πŸš€

Ability to load local models

Hi, thanks for making this project available!

I was wondering if it is possible to point directly to local models, instead of downloading from a URL or the HF Hub?

startRequestId used in csend should be optional

As reported in https://discord.com/channels/771185033779609630/775513653461516299/1037716160168329287

 Exception occurred while handling uri: 'http://0.0.0.0:8000/' Traceback (most recent call last):
 File "handle_request", line 81, in handle_request FutureStatic,
 File "server.py", line 36, in inference output = user_src.inference(model_inputs)
 File "/api/app.py", line 140, in inference startRequestId = call_inputs.get("startRequestId", None) AttributeError: 'NoneType' object has no attribute 'get'
 [2022-11-03 13:11:11 +0000] - (sanic.access)[INFO][127.0.0.1:55872]: POST http://0.0.0.0:8000/ 500 139

Will fix this! In the meantime:

{
  // CallInputs section (MODEL_ID, etc)
  startRequestId: "ANYTHING"
}

It used to be optional, but a different feature obviously broke this.

Currently RUNTIME_DOWNOADS requires a MODEL_URL callInput

Probably a noob question, but after sending a request to a txt2img model from a local UI to remote banana, the below error is returned. Any suggestions?

{
  "callID": "e850c01c-4835-4330-86a5-b2045c225951",
  "finished": false,
  "modelOutputs": [
    {
      "$error": {
        "code": "NO_MODEL_URL",
        "message": "Currently RUNTIME_DOWNOADS requires a MODEL_URL callInput"
      }
    }
  ],
  "message": ""
}
