Introducing OneDiffusion: Run Any Diffusion Model Anywhere

August 23, 2023 • Written By Sherlock Xu

Today, we are thrilled to unveil the latest member of the BentoML ecosystem — OneDiffusion, an open-source, all-in-one platform designed to streamline the deployment of diffusion models. It supports both pretrained and fine-tuned diffusion models with LoRA adapters, allowing you to run a variety of image generation tasks with ease and flexibility. Seamlessly integrated with the BentoML framework, OneDiffusion lets you deploy diffusion models to the cloud or on-premises and build powerful, scalable AI applications.


As advancements in AI surge forward, diffusion models are carving a niche for themselves, with Stable Diffusion (SD) standing at the forefront of their breakthroughs. Stable Diffusion models excel at generating detailed visuals based on text cues and are able to perform tasks such as inpainting and outpainting. Stable Diffusion XL 1.0, the recent pinnacle of Stability AI’s text-to-image suite, can create vivid images from shorter prompts and even embed textual content within these visuals.

However, diffusion models aren’t without challenges. Their intricate architecture and heavy computational demands make production serving and deployment a daunting task. Traditional deployment methodologies are often unable to cater to the unique requirements of these models, leading to inefficiencies and performance bottlenecks.

At BentoML, we work to empower every organization to compete and succeed with AI applications. We believe that democratizing the serving and deployment of diffusion models represents an important step towards this mission. Following our previous endeavor with OpenLLM, an open-source solution for running inference with any open-source LLM, we embarked on the journey to create OneDiffusion.

OneDiffusion isn’t just another deployment tool; it’s a tailor-made solution for diffusion models. By offering features specifically designed to address the deployment complexities, OneDiffusion makes deploying diffusion models more straightforward than ever.

Key features

OneDiffusion is designed for AI application developers who require a robust and flexible platform for deploying diffusion models in production. Key features include:

  • 🌐 Broad compatibility: Support both pretrained and LoRA-adapted diffusion models, providing flexibility in choosing and deploying the appropriate model for various image generation tasks. It currently supports Stable Diffusion (v1.4, v1.5 and v2.0) and Stable Diffusion XL (v1.0) models. Support for more models (for example, ControlNet) is on the way.
  • 💪 Optimized performance and scalability: Automatically select the best optimizations, such as half-precision weights or xFormers, to achieve the best inference speed out of the box.
  • ⌛️ Dynamic LoRA adapter loading: Dynamically load and unload LoRA adapters on every request, providing greater adaptability and ensuring the models remain responsive to changing inputs and conditions.
  • 🍱 First-class support for BentoML: You can build Bentos directly with OneDiffusion or create BentoML Services with Runners and then package them into Bentos. After you push Bentos to BentoCloud, you can easily scale the API Servers and Runners of your diffusion models without compromising IO performance.

Get started

To use OneDiffusion, make sure you have Python 3.8 (or later) and pip installed, and then install OneDiffusion by using pip:

pip install onediffusion

Once it is installed, you can start a Stable Diffusion server by running the following command. By default, OneDiffusion uses stabilityai/stable-diffusion-2, downloading the model automatically to the BentoML Model Store if it has not been registered before.

onediffusion start stable-diffusion

This starts a server locally. You can interact with it by visiting the Swagger UI or by sending a request via curl.

curl -X 'POST' \
  '<>' \
  -H 'accept: image/jpeg' \
  -H 'Content-Type: application/json' \
  --output output.jpg \
  -d '{
    "prompt": "a bento box",
    "negative_prompt": null,
    "height": 768,
    "width": 768,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "eta": 0
  }'
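If you prefer Python to curl, the same request can be sketched with the standard library. This is illustrative only: the endpoint URL is a placeholder you should replace with the address printed when the server starts, and the payload mirrors the defaults shown in the curl example above.

```python
import json
import urllib.request

# Request payload mirroring the curl example above.
payload = {
    "prompt": "a bento box",
    "negative_prompt": None,
    "height": 768,
    "width": 768,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "eta": 0,
}

def save_image(server_url: str, out_path: str = "output.jpg") -> None:
    """POST the payload and write the returned JPEG to disk.

    `server_url` is a placeholder for the text-to-image endpoint
    printed by `onediffusion start`.
    """
    req = urllib.request.Request(
        server_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "accept": "image/jpeg"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```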

To use a specific model version, add the --model-id option as below:

onediffusion start stable-diffusion --model-id runwayml/stable-diffusion-v1-5

To specify another pipeline, use the --pipeline option as below. The img2img pipeline allows you to modify images based on a given prompt and image.

onediffusion start stable-diffusion --pipeline "img2img"

Start a Stable Diffusion XL server

OneDiffusion also supports running Stable Diffusion XL v1.0. To start an XL server, simply run:

onediffusion start stable-diffusion-xl

Similarly, visit the Swagger UI or send a request via curl to interact with the XL server. Example prompt:

{
  "prompt": "the scene is a picturesque environment with beautiful flowers and trees. In the center, there is a small cat. The cat is shown with its chin being scratched. It is crouched down peacefully. The cat's eyes are filled with excitement and satisfaction as it uses its small paws to hold onto the food, emitting a content purring sound.",
  "negative_prompt": null,
  "height": 1024,
  "width": 1024,
  "num_inference_steps": 50,
  "guidance_scale": 7.5,
  "eta": 0
}

Example output:


Add LoRA weights

Low-Rank Adaptation (LoRA) is a training method to fine-tune models without the need to retrain all parameters. You can add LoRA weights to your diffusion models for specific data needs.
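The low-rank idea can be made concrete with a toy sketch. In practice LoRA updates are applied to specific layers (such as attention weights) inside the diffusion model, but conceptually the fine-tuned weight is the frozen base weight plus a low-rank product B @ A, so only the small matrices A and B need to be trained and shipped. A pure-Python illustration with made-up numbers:

```python
# Toy illustration of LoRA: W_adapted = W + B @ A, where B is d x r and
# A is r x d with rank r much smaller than d. Only B and A are trained;
# the base weight W stays frozen.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d, r = 4, 1                                   # full dimension 4, rank-1 update
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base (identity)
B = [[0.5], [0.0], [0.0], [0.0]]              # d x r, trained
A = [[0.0, 1.0, 0.0, 0.0]]                    # r x d, trained

delta = matmul(B, A)                          # low-rank update, d x d
W_adapted = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]
```

Because the update touches only 2 * d * r parameters instead of d * d, LoRA adapter files are small, which is what makes swapping them at request time practical.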

Add the --lora-weights option as below:

onediffusion start stable-diffusion-xl --lora-weights "/path/to/lora-weights.safetensors"

Alternatively, dynamically load LoRA weights by adding the lora_weights field:

{
  "prompt": "the scene is a picturesque environment with beautiful flowers and trees. In the center, there is a small cat. The cat is shown with its chin being scratched. It is crouched down peacefully. The cat's eyes are filled with excitement and satisfaction as it uses its small paws to hold onto the food, emitting a content purring sound.",
  "negative_prompt": null,
  "height": 1024,
  "width": 1024,
  "num_inference_steps": 50,
  "guidance_scale": 7.5,
  "eta": 0,
  "lora_weights": "/path/to/lora-weights.safetensors"
}

By specifying the path of LoRA weights at runtime, you can influence model outputs dynamically. Even with identical prompts, the application of different LoRA weights can yield vastly different results. Example output (oil painting vs. pixel):
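As a small sketch of this pattern, a client could keep one base payload and derive per-request variants that differ only in the LoRA weights applied. The helper below and the weight file paths are hypothetical, purely to illustrate the idea:

```python
# Hypothetical helper: derive request payloads that differ only in the
# LoRA weights applied, keeping the prompt and other fields fixed.
def with_lora(base_payload: dict, lora_path: str) -> dict:
    payload = dict(base_payload)          # shallow copy; base stays untouched
    payload["lora_weights"] = lora_path   # loaded per request by the server
    return payload

base = {"prompt": "a bento box", "height": 1024, "width": 1024}
oil = with_lora(base, "/path/to/oil-painting.safetensors")     # placeholder path
pixel = with_lora(base, "/path/to/pixel-art.safetensors")      # placeholder path
```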

Create a BentoML Runner

You can create a BentoML Runner for a diffusion model by using the create_runner function under bentoml.diffusers_simple, which automatically downloads the specified model if it does not exist locally.

import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")

You can then wrap the Runner into a BentoML Service. See the BentoML documentation for more details.

Build a Bento

You can build a Bento for an existing diffusion model by running onediffusion build. To specify the model to be packaged into the Bento, use --model-id. Otherwise, OneDiffusion packages the default model into the Bento.

# Build a Bento with a Stable Diffusion model
onediffusion build stable-diffusion

# Build a Bento with a Stable Diffusion XL model
onediffusion build stable-diffusion-xl

Once your Bento is ready, you can push it to BentoCloud.

What’s next?

The recent wave of AI has propelled diffusion models to great heights. As these models become indispensable in AI applications, the challenges in deploying them become more pronounced. We recognize that many are daunted by the intricacies of rolling out diffusion models in real-world scenarios. By open sourcing OneDiffusion, we aim to alleviate these concerns and make the deployment process smoother and more intuitive. However, open source is merely the beginning. Our work extends beyond that, and we look forward to working with the community to improve the project in the following ways:

  • Support more models, such as ControlNet and DeepFloyd IF
  • Support more pipelines, such as inpainting
  • Add a Python API client to interact with diffusion models
  • Implement advanced optimization like AITemplate
  • Offer a unified fine-tuning training API

We invite contributions of all kinds to OneDiffusion! Check out the OneDiffusion GitHub repository to start your OneDiffusion journey, and stay tuned for more announcements about OneDiffusion and BentoML.

About BentoML

BentoML is the platform for AI developers to build, ship, and scale AI applications. Headquartered in San Francisco, BentoML’s open source products are enabling thousands of organizations’ mission-critical AI applications around the globe. Our serverless cloud platform brings developer velocity and cost-efficiency to enterprise AI use cases. BentoML is on a mission to empower every organization to compete and succeed with AI. Visit our website to learn more.