
keras-sd-serving's Introduction

Various ways of serving Stable Diffusion

This repository shows various ways to deploy Stable Diffusion. Currently, we focus on the Stable Diffusion implementation from keras-cv, and the target platforms/frameworks include TF Serving, Hugging Face Endpoint, and FastAPI.

As of the 0.4.0 release of keras-cv, StableDiffusionV2 is included, and this repository supports both version 1 and version 2 of Stable Diffusion.

1. All in One Endpoint

This method shows how to deploy Stable Diffusion as a whole in a single endpoint. Stable Diffusion consists of three models (text encoder, diffusion model, decoder) and some glue code that handles the inputs and outputs of each model. In this scenario, everything is packaged into a single endpoint.

  • Hugging Face 🤗 Endpoint: To deploy on Hugging Face Endpoint, we need to create a custom handler. Hugging Face Endpoint lets us easily deploy any machine learning model with pre/post-processing logic in a custom handler. [Colab | Standalone Codebase]

  • FastAPI Endpoint: [Colab | Standalone]

    • Docker Image: gcr.io/gcp-ml-172005/sd-fastapi-allinone:latest
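As a rough illustration of the custom handler idea, the sketch below wraps the whole pipeline in one class. It assumes the `EndpointHandler` convention used by Hugging Face custom handlers and the `keras_cv.models.StableDiffusion` class; the request fields (`inputs`, `batch_size`), the response layout, and the optional `model` argument are assumptions made for this example and are not taken from the repository's actual handler.

```python
import base64
from typing import Any, Dict, Optional


class EndpointHandler:
    """Custom handler sketch: one class that wraps the whole pipeline."""

    def __init__(self, path: str = "", model: Optional[Any] = None):
        # `model` is a hypothetical injection point so the handler can be
        # exercised without downloading weights.
        if model is None:
            import keras_cv  # imported lazily; only needed on the server

            model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
        self.model = model

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Assumed request shape: {"inputs": "<prompt>", "batch_size": <int>}.
        prompt = data["inputs"]
        batch_size = int(data.get("batch_size", 1))
        # text_to_image runs text encoding, diffusion, and decoding in one call.
        images = self.model.text_to_image(prompt, batch_size=batch_size)
        # Raw bytes are not JSON-serializable, so send them as base64 text.
        return {
            "images": base64.b64encode(images.tobytes()).decode("ascii"),
            "batch_size": batch_size,
        }
```

A client would decode the `images` field with `base64.b64decode` and reshape the bytes using the image size it requested.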

2. Three Endpoints

This method shows how to deploy Stable Diffusion in three separate endpoints. As preliminary work, this notebook was written to demonstrate how to split the three parts of Stable Diffusion into three separate modules. In this example, you will see how to interact with three different endpoints to generate images from a given text prompt.

  • Hugging Face Endpoint: [Colab | Text Encoder | Diffusion Model | Decoder]

  • FastAPI Endpoint: [Central | Text Encoder | Diffusion Model | Decoder]

    • Docker Image(text-encoder): gcr.io/gcp-ml-172005/sd-fastapi-text-encoder:latest
    • Docker Image(diffusion-model): gcr.io/gcp-ml-172005/sd-fastapi-diffusion-model:latest
    • Docker Image(decoder): gcr.io/gcp-ml-172005/sd-fastapi-decoder:latest
  • TF Serving Endpoint: [Colab | Dockerfiles + k8s Resources]

    • SavedModel: [Colab | Text Encoder | Diffusion Model | Decoder]
      • Wraps the encoder, diffusion model, and decoder, along with some glue code, in separate SavedModels. With them, we can not only deploy each model on the cloud with TF Serving but also embed them in web and mobile applications with TFJS and TFLite. We will explore the embedded use cases in a later phase of this project.
    • Docker Images
      • text-encoder: gcr.io/gcp-ml-172005/tfs-sd-text-encoder:latest
      • text-encoder w/ base64: gcr.io/gcp-ml-172005/tfs-sd-text-encoder-base64:latest
      • text-encoder-v2: gcr.io/gcp-ml-172005/tfs-sd-text-encoder-v2:latest
      • text-encoder-v2 w/ base64: gcr.io/gcp-ml-172005/tfs-sd-text-encoder-v2-base64:latest
      • diffusion-model: gcr.io/gcp-ml-172005/tfs-sd-diffusion-model:latest
      • diffusion-model w/ base64: gcr.io/gcp-ml-172005/tfs-sd-diffusion-model-base64:latest
      • diffusion-model-v2: gcr.io/gcp-ml-172005/tfs-sd-diffusion-model-v2:latest
      • diffusion-model-v2 w/ base64: gcr.io/gcp-ml-172005/tfs-sd-diffusion-model-v2-base64:latest
      • decoder: gcr.io/gcp-ml-172005/tfs-sd-decoder:latest
      • decoder w/ base64: gcr.io/gcp-ml-172005/tfs-sd-decoder-base64:latest
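The three-endpoint flow can be sketched as a small client that chains the calls: the prompt goes to the text encoder, its output goes to the diffusion model, and the resulting latent goes to the decoder. The endpoint URLs, the `inputs`/`batch_size` payload fields, and the function names below are hypothetical; each real endpoint defines its own request format.

```python
import json
from typing import Any, Callable, Dict
from urllib import request


def post_json(url: str, payload: Dict[str, Any]) -> Any:
    """POST a JSON payload and return the parsed JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate_image(
    prompt: str,
    endpoints: Dict[str, str],
    post: Callable[[str, Dict[str, Any]], Any] = post_json,
    batch_size: int = 1,
) -> Any:
    """Call the three endpoints in order, feeding each output to the next."""
    # Stage 1: prompt -> text embeddings.
    context = post(endpoints["text_encoder"], {"inputs": prompt, "batch_size": batch_size})
    # Stage 2: text embeddings -> latent.
    latent = post(endpoints["diffusion_model"], {"inputs": context, "batch_size": batch_size})
    # Stage 3: latent -> images.
    return post(endpoints["decoder"], {"inputs": latent, "batch_size": batch_size})
```

Passing the transport as the `post` argument keeps the orchestration independent of the serving platform, so the same chain could sit in front of Hugging Face, FastAPI, or TF Serving endpoints once the payloads are adapted.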

NOTE: Passing intermediate values between models over the network can be costly, and some platforms limit the payload size. For instance, Vertex AI limits the request size to 1.5MB. For this reason, we provide separate TF Serving Docker images that handle inputs and produce outputs in base64 format.
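To see why base64 helps with payload size, the sketch below packs float32 values into base64 text and compares the length with the same values written as a JSON list of numbers. The 77 x 768 element count is an assumed stand-in for one text-encoder output, and the helper names are made up for this example; this is not the encoding code used inside the Docker images.

```python
import array
import base64
import json


def encode_floats(values: array.array) -> str:
    """Pack float32 values into base64 text (native byte order)."""
    return base64.b64encode(values.tobytes()).decode("ascii")


def decode_floats(text: str) -> array.array:
    """Inverse of encode_floats."""
    values = array.array("f")
    values.frombytes(base64.b64decode(text))
    return values


# Stand-in for one text-encoder output: 77 tokens x 768 features of float32.
# The shape is an assumption used only to get a realistic element count.
context = array.array("f", (i / 7.0 for i in range(77 * 768)))

as_json_list = json.dumps(context.tolist())
as_base64 = encode_floats(context)

# Compare the two payload sizes in characters.
print(len(as_json_list), len(as_base64))

# The round trip is lossless because the raw float32 bytes are preserved.
assert decode_floats(as_base64) == context
```

Base64 costs four characters for every three bytes, so a float32 value takes a little over five characters, while its decimal text form usually takes far more.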

3. One Endpoint with Two local APIs (w/ 🤗 Endpoint)

With Stable Diffusion split into parts, we can place each part in any environment. This is especially powerful if we want to deploy specialized diffusion models such as inpainting or fine-tuned diffusion models. In this case, we only need to replace the currently deployed diffusion model, or deploy a new diffusion model alongside it, while keeping the other two (text encoder and decoder) as is.

Also, it is worth noting that we could run the text encoder and decoder locally (Python clients or web/mobile with TF Serving) while hosting the diffusion model in the cloud. In this repository, we currently show an example using Hugging Face 🤗 Endpoint. However, you could easily expand the possibilities.

NOTE: Along with this project, we have developed another project to fine-tune Keras-based Stable Diffusion: Fine-tuning Stable Diffusion using Keras. We currently provide a model fine-tuned on a Pokemon dataset.
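The idea of swapping only the diffusion part can be sketched with three interchangeable stages. The `Pipeline` class and the placeholder lambdas are hypothetical; in practice the text encoder and decoder stages would call local models and the diffusion stage would call the remote endpoint.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable

Stage = Callable[[Any], Any]


@dataclass(frozen=True)
class Pipeline:
    """Three independent stages; each may run locally or behind an endpoint."""

    text_encoder: Stage
    diffusion_model: Stage
    decoder: Stage

    def __call__(self, prompt: Any) -> Any:
        return self.decoder(self.diffusion_model(self.text_encoder(prompt)))


# Placeholder stages that only tag their input, so the data flow is visible.
base = Pipeline(
    text_encoder=lambda prompt: f"enc({prompt})",
    diffusion_model=lambda context: f"diffusion({context})",
    decoder=lambda latent: f"dec({latent})",
)

# Swap in a different diffusion stage (for example a remote inpainting or
# fine-tuned model) while reusing the same text encoder and decoder.
finetuned = replace(base, diffusion_model=lambda context: f"finetuned({context})")

print(base("a cat"))       # dec(diffusion(enc(a cat)))
print(finetuned("a cat"))  # dec(finetuned(enc(a cat)))
```

Because the stages only agree on what they pass to each other, a new diffusion model can be deployed next to the old one and selected per request without touching the other two parts.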

  • Original txt2img generation: [Colab]

  • Original inpainting: [Colab]

4. On-Device Deployment (w/ TFLite) - WIP

We have managed to convert SavedModels into TFLite models, and we are hosting them as below (thanks to @farmaker47):

These TFLite models have the same signatures as the SavedModels, and all the pre/post-processing operations are included inside. All of them are converted with float16 quantization. You can find more about how to convert SavedModels to TFLite models in this repository.
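For orientation, a float16 conversion with the standard TensorFlow Lite converter typically looks like the sketch below. The function name and paths are hypothetical, and the actual conversion scripts in the linked repository may set additional converter options that these particular models need.

```python
def convert_to_float16_tflite(saved_model_dir: str, output_path: str) -> int:
    """Convert a SavedModel to a float16-quantized TFLite file.

    Returns the size of the converted model in bytes.
    """
    import tensorflow as tf  # imported lazily so this module loads without TF

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Enable post-training quantization and restrict weights to float16.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    tflite_model = converter.convert()

    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)
```

Float16 quantization stores weights in half precision, which roughly halves the file size compared with float32 weights.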

TODO

  • Implement SimpleTokenizer in Java and JavaScript
  • Run TFLite models on Android and in web browsers

Timing Tests


Sequential

The figure below shows how long each scenario took from text encoding through diffusion to decoding. It assumes each request (batch_size=4) is handled sequentially, with a single server running on Hugging Face Endpoint for each endpoint. The all-in-one endpoint deployed Stable Diffusion on an A10-equipped server, while the separate endpoints deployed the text encoder on 2 vCPU + 4GB RAM, the diffusion model on an A10-equipped server, and the decoder on a T4-equipped server. Finally, the one-endpoint, two-local setup deployed only the diffusion model on an A10-equipped server while keeping the other two in a Colab environment (w/ T4). See this notebook for how these were measured.
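A sequential timing run of this kind can be sketched as follows: each stage is timed in order and its output is fed to the next stage. This is a minimal illustration with a hypothetical `time_stages` helper, not the measurement code from the notebook.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple


def time_stages(
    stages: Sequence[Tuple[str, Callable[[Any], Any]]],
    first_input: Any,
) -> Tuple[Any, Dict[str, float]]:
    """Run the stages one after another and record seconds spent in each."""
    timings: Dict[str, float] = {}
    value = first_input
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    # End-to-end latency for one sequential request.
    timings["total"] = sum(timings.values())
    return value, timings
```

For the all-in-one scenario the list holds a single stage, while the separated scenarios hold three, so network transfer between endpoints is included in each stage's time.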

🚨 XLA support

In this notebook, we show how we can XLA-compile the SavedModels to achieve a speed-up of about 52% over the non-XLA variant.

Acknowledgements

Thanks to the ML Developer Programs' team at Google for providing GCP credits.

keras-sd-serving's People

Contributors

deep-diver, sayakpaul


keras-sd-serving's Issues

Do you have a plan to run TFLite models in Python or Java?

