Sep 23, 2022 • Written By Bozhao Yu
For the most recent update, check out this blog post: Creating Stable Diffusion 2.0 Services With BentoML And Diffusers.
With limited local compute resources, the Stable Diffusion model takes a long time to generate quality images. Running the model online using a cloud service gives us access to practically unlimited compute resources and enables us to get quality results much faster. Hosting the model as a microservice also allows other creative applications to leverage the power of the model more easily, without having to deal with the complexity of running ML models online.
One way to host the Stable Diffusion model online is to use BentoML and AWS EC2. BentoML is an open-source platform for building, deploying, and operating machine learning services at scale. In this article, we will create a production-ready Stable Diffusion service with BentoML and deploy it to AWS EC2. Here's a sneak peek of what you will get by following the steps in this article:
• A RESTful OpenAPI service with both /txt2img (text-to-image) and /img2img (image + text-to-image) endpoints behind a Swagger user interface.
• Example images generated from text prompts using the /txt2img endpoint.
• Example images generated from image and text prompts using the /img2img endpoint.
• Python 3.9 or above
• AWS CLI
The code and samples in this article can be found at https://github.com/bentoml/stable-diffusion-bentoml.
Clone the repository and install the dependencies.
Choose and download the Stable Diffusion model. Single precision is best for CPUs or GPUs with more than 10GB of VRAM. Half precision is best for GPUs with less than 10GB of VRAM.
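The precision choice boils down to the dtype the weights are loaded in. A minimal sketch, assuming the model is loaded through Hugging Face diffusers (`model_dir` is a placeholder for wherever you downloaded the weights; the repo's actual loading code may differ):

```python
import torch

def pick_dtype(half_precision: bool) -> torch.dtype:
    # fp16 roughly halves VRAM usage; fp32 is required on most CPUs
    return torch.float16 if half_precision else torch.float32

def load_pipeline(model_dir: str, half_precision: bool = False):
    # diffusers is imported lazily so pick_dtype works without it installed
    from diffusers import StableDiffusionPipeline
    return StableDiffusionPipeline.from_pretrained(
        model_dir, torch_dtype=pick_dtype(half_precision)
    )
```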
To serve the model behind a RESTful API, we will create a BentoML service. The following example uses the single-precision model for prediction, with the service.py module tying the service together with the business logic. We can expose functions as APIs by decorating them with @svc.api, and we can specify the input and output types of the APIs in the decorator's arguments. For example, the txt2img endpoint accepts a JSON input and returns an Image output, whereas the img2img endpoint accepts an Image and a JSON as input and returns an Image as output.
The core inference logic is defined in a StableDiffusionRunnable. The runnable is responsible for calling the txt2img_pipe and img2img_pipe methods on the model and passing in the necessary arguments. A custom Runner is instantiated from the StableDiffusionRunnable to execute the model inference logic in the APIs.
Next, run the following command to bring up a BentoML service for testing. Running the Stable Diffusion model inference locally with CPUs is quite slow. Each request will take roughly 5 minutes to complete. In the next section, we will explore how to accelerate the inference speed by running the service on a machine with GPUs.
Curl the text-to-image /txt2img endpoint.
Curl the image-to-image /img2img endpoint.
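Assuming the service is listening on localhost:3000 (BentoML's default port) and that the JSON fields match the parameters the service expects, the two calls can be sketched as:

```shell
# text-to-image: JSON in, image out
curl -X POST http://localhost:3000/txt2img \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "a cabin on a snowy mountain, oil painting"}' \
    --output txt2img_sample.jpg

# image-to-image: multipart with an image part and a JSON part
curl -X POST http://localhost:3000/img2img \
    -F 'img=@input.jpg' \
    -F 'data={"prompt": "turn it into a watercolor"};type=application/json' \
    --output img2img_sample.jpg
```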
Required files and dependencies are defined in the bentofile.yaml file.
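A bentofile.yaml for a service like this can look roughly as follows (the package list and include patterns are illustrative; the file in the repo is authoritative):

```yaml
service: "service:svc"
include:
  - "service.py"
  - "models/"
python:
  packages:
    - torch
    - transformers
    - diffusers
    - ftfy
```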
Build a Bento with the command below. A Bento is the distribution format for a BentoML service. It is a self-contained archive that contains all the files and configurations required to run the service.
🎉 The Stable Diffusion Bento has been built. If for any reason you were unable to build the Bento successfully, worry not: you can download our pre-built Bentos with the commands below.
We will be using bentoctl to deploy the bento to EC2.
bentoctl helps deploy your Bentos to any cloud platform through Terraform. Install the AWS EC2 operator to generate and apply the Terraform files.
The deployment has already been configured in the deployment_config.yaml file; feel free to adapt it to your specifications. By default, it deploys the Bento on a g4dn.xlarge instance with the Deep Learning AMI GPU PyTorch 1.12.0 (Ubuntu 20.04) AMI.
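For orientation, a deployment_config.yaml for the aws-ec2 operator follows roughly this shape (field names reflect the operator's schema as of late 2022; values in angle brackets are placeholders you must fill in, and the repo's file is authoritative):

```yaml
api_version: v1
name: stable-diffusion
operator:
  name: aws-ec2
template: terraform
spec:
  region: <your-aws-region>
  instance_type: g4dn.xlarge
  ami_id: <AMI id of "Deep Learning AMI GPU PyTorch 1.12.0 (Ubuntu 20.04)" in your region>
  enable_gpus: true
```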
Generate the Terraform files.
Build the Docker image and push it to AWS ECR. The image upload may take a long time depending on your bandwidth.
Apply the Terraform files to deploy the bento to AWS EC2. You can navigate to the EC2 console and open the public IP address from the browser to access the Swagger UI.
Finally, delete the deployment if the Stable Diffusion BentoML service is no longer needed.
In this article, we built a production-ready service for Stable Diffusion using BentoML and deployed it to AWS EC2. Deploying the service on AWS EC2 allowed us to run the Stable Diffusion model on more powerful hardware, generate images with low latency, and scale beyond a single machine. If you enjoyed reading the article, please show your support by giving the BentoML project a ⭐ on GitHub and joining the Slack community to meet more like-minded people.