August 23, 2023 • Written By Sherlock Xu
Note: This blog post is not applicable any more. To deploy diffusion models, refer to the BentoML documentation.
Today, we are thrilled to unveil the latest addition to the BentoML ecosystem: OneDiffusion, an open-source, all-in-one platform designed to streamline the deployment of diffusion models. It supports both pretrained and fine-tuned diffusion models with LoRA adapters, allowing you to run a variety of image generation tasks with ease and flexibility. Because it integrates seamlessly with the BentoML framework, you can use OneDiffusion to deploy diffusion models to the cloud or on-premises, and build powerful and scalable AI applications.
As advancements in AI surge forward, diffusion models are carving a niche for themselves, with Stable Diffusion (SD) standing at the forefront of their breakthroughs. Stable Diffusion models excel at generating detailed visuals based on text cues and are able to perform tasks such as inpainting and outpainting. Stable Diffusion XL 1.0, the recent pinnacle of Stability AI’s text-to-image suite, can create vivid images from shorter prompts and even embed textual content within these visuals.
However, diffusion models aren’t without challenges. Their intricate architecture and heavy computational demands make production serving and deployment a daunting task. Traditional deployment methodologies are often unable to cater to the unique requirements of these models, leading to inefficiencies and performance bottlenecks.
At BentoML, we work to empower every organization to compete and succeed with AI applications. We believe that democratizing the serving and deployment of diffusion models represents an important step towards this mission. Following our previous endeavor with OpenLLM, an open-source solution for running inference with any open-source LLMs, we embarked on the journey to create OneDiffusion.
OneDiffusion isn’t just another deployment tool; it’s a tailor-made solution for diffusion models. By offering features specifically designed to address the deployment complexities, OneDiffusion makes deploying diffusion models more straightforward than ever.
OneDiffusion is designed for AI application developers who require a robust and flexible platform for deploying diffusion models in production. Key features include:
To use OneDiffusion, make sure you have Python 3.8 (or later) and pip installed, and then install OneDiffusion by using pip:
pip install onediffusion
Once it is installed, you can start a Stable Diffusion server by running the following command. By default, OneDiffusion uses stabilityai/stable-diffusion-2 and downloads the model automatically to the BentoML Model Store if it has not been registered before.
onediffusion start stable-diffusion
This starts a server at http://0.0.0.0:3000/. You can interact with it by visiting the Swagger UI or by sending a request via curl.
curl -X 'POST' \
  'http://0.0.0.0:3000/text2img' \
  -H 'accept: image/jpeg' \
  -H 'Content-Type: application/json' \
  --output output.jpg \
  -d '{
    "prompt": "a bento box",
    "negative_prompt": null,
    "height": 768,
    "width": 768,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "eta": 0
  }'
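The same request can also be sent from Python. The following is a minimal sketch that uses only the standard library and mirrors the endpoint, headers, and fields of the curl example. The helper names build_text2img_request and generate_image are illustrative and not part of OneDiffusion, and the sketch assumes the server started above is reachable at the same address.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port if yours differ.
TEXT2IMG_URL = "http://0.0.0.0:3000/text2img"


def build_text2img_request(prompt, url=TEXT2IMG_URL, **overrides):
    """Build a POST request mirroring the curl example (hypothetical helper)."""
    payload = {
        "prompt": prompt,
        "negative_prompt": None,
        "height": 768,
        "width": 768,
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
        "eta": 0,
    }
    # Keyword arguments replace the default values, e.g. height=512.
    payload.update(overrides)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "image/jpeg", "Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt, output_path="output.jpg", **overrides):
    """Send the request and write the returned image bytes to disk."""
    request = build_text2img_request(prompt, **overrides)
    with urllib.request.urlopen(request) as response:
        image_bytes = response.read()
    with open(output_path, "wb") as image_file:
        image_file.write(image_bytes)
    return output_path
```

With the server running, calling generate_image("a bento box") should save the generated image as output.jpg, just like the --output option in the curl command.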
To use a specific model version, add the --model-id option as below:
onediffusion start stable-diffusion --model-id runwayml/stable-diffusion-v1-5
To specify another pipeline, use the --pipeline option as below. The img2img pipeline allows you to modify images based on a given prompt and image.
onediffusion start stable-diffusion --pipeline "img2img"
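If you launch the server from a script rather than a terminal, the options above can be assembled programmatically. This is a rough sketch, not an official API: build_start_command and start_server are hypothetical helpers, they use only the --model-id and --pipeline options shown in this post, and passing both options in a single invocation is an assumption.

```python
import shlex
import subprocess


def build_start_command(model="stable-diffusion", model_id=None, pipeline=None):
    """Assemble an `onediffusion start` command line (hypothetical helper)."""
    command = ["onediffusion", "start", model]
    if model_id is not None:
        command += ["--model-id", model_id]
    if pipeline is not None:
        # Combining --pipeline with --model-id is assumed to be supported.
        command += ["--pipeline", pipeline]
    return command


def start_server(**options):
    """Launch the server as a child process; requires onediffusion on PATH."""
    command = build_start_command(**options)
    print("Running:", shlex.join(command))
    return subprocess.Popen(command)
```

For example, build_start_command(pipeline="img2img") produces the same arguments as the command shown above.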
OneDiffusion also supports running Stable Diffusion XL v1.0. To start an XL server, simply run:
onediffusion start stable-diffusion-xl
Similarly, visit http://0.0.0.0:3000/ or send a request via curl to interact with the XL server. Example prompt:
{ "prompt": "the scene is a picturesque environment with beautiful flowers and trees. In the center, there is a small cat. The cat is shown with its chin being scratched. It is crouched down peacefully. The cat's eyes are filled with excitement and satisfaction as it uses its small paws to hold onto the food, emitting a content purring sound.", "negative_prompt": null, "height": 1024, "width": 1024, "num_inference_steps": 50, "guidance_scale": 7.5, "eta": 0 }
Example output:
Low-Rank Adaptation (LoRA) is a training method to fine-tune models without the need to retrain all parameters. You can add LoRA weights to your diffusion models for specific data needs.
Add the --lora-weights option as below:
onediffusion start stable-diffusion-xl --lora-weights "/path/to/lora-weights.safetensors"
Alternatively, dynamically load LoRA weights by adding the lora_weights field:
{ "prompt": "the scene is a picturesque environment with beautiful flowers and trees. In the center, there is a small cat. The cat is shown with its chin being scratched. It is crouched down peacefully. The cat's eyes are filled with excitement and satisfaction as it uses its small paws to hold onto the food, emitting a content purring sound.", "negative_prompt": null, "height": 1024, "width": 1024, "num_inference_steps": 50, "guidance_scale": 7.5, "eta": 0, "lora_weights": "/path/to/lora-weights.safetensors" }
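To compare styles, the same prompt can be paired with a different LoRA file on each request. The sketch below only builds the JSON bodies; the with_lora helper, the shortened prompt, and the two file paths are placeholders for illustration, and sending the bodies to the server is left out.

```python
import json

# Request body without LoRA weights, using the fields from the example above.
BASE_PAYLOAD = {
    "prompt": "a small cat in a garden with beautiful flowers and trees",
    "negative_prompt": None,
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "eta": 0,
}

# Placeholder paths; replace them with your own LoRA weight files.
LORA_STYLES = {
    "oil_painting": "/path/to/oil-painting-lora.safetensors",
    "pixel": "/path/to/pixel-lora.safetensors",
}


def with_lora(payload, lora_path):
    """Return a copy of the payload with the lora_weights field set."""
    updated = dict(payload)
    updated["lora_weights"] = lora_path
    return updated


# One request body per style; only the lora_weights field differs.
payloads = {style: with_lora(BASE_PAYLOAD, path) for style, path in LORA_STYLES.items()}

for style, payload in payloads.items():
    print(style, json.dumps(payload))
```

Because with_lora copies the dictionary, the base payload stays unchanged and can be reused for any number of styles.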
By specifying the path of LoRA weights at runtime, you can influence model outputs dynamically. Even with identical prompts, the application of different LoRA weights can yield vastly different results. Example output (oil painting vs. pixel):
You can create a BentoML Runner for a diffusion model by using bentoml.diffusers_simple.create_runner, which automatically downloads the specified model if it does not exist locally.
import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")
You can then wrap the Runner into a BentoML Service. See the BentoML documentation for more details.
You can build a Bento for an existing diffusion model by running onediffusion build. To specify the model to be packaged into the Bento, use --model-id. Otherwise, OneDiffusion packages the default model into the Bento.
# Build a Bento with a Stable Diffusion model
onediffusion build stable-diffusion

# Build a Bento with a Stable Diffusion XL model
onediffusion build stable-diffusion-xl
Once your Bento is ready, you can push it to BentoCloud.
The recent wave of AI has propelled diffusion models to great heights. As these models become indispensable in AI applications, the challenges in deploying them also become pronounced. We recognize that many are daunted by the intricacies of rolling out diffusion models in real-world scenarios. With the open sourcing of OneDiffusion, we look to alleviate these concerns, making the deployment process smoother and more intuitive. However, open source is merely the beginning. Our work extends beyond that, and we look forward to working with the community to improve the project in the following ways:
We invite contributions of all kinds to OneDiffusion! Check out the following resources to start your OneDiffusion journey and stay tuned for more announcements about OneDiffusion and BentoML.
BentoML is the platform for AI developers to build, ship, and scale AI applications. Headquartered in San Francisco, BentoML’s open source products are enabling thousands of organizations’ mission-critical AI applications around the globe. Our serverless cloud platform brings developer velocity and cost-efficiency to enterprise AI use cases. BentoML is on a mission to empower every organization to compete and succeed with AI. Visit our website to learn more.