June 21, 2023 • Written By Sherlock Xu
Note: The content in this blog post may no longer be applicable. To learn more about the latest features of OpenLLM and its usage, see the OpenLLM README.
We are thrilled to announce the open-source release of OpenLLM under the Apache 2.0 license! OpenLLM is an open platform designed to streamline the deployment and operation of large language models (LLMs) in production. With OpenLLM, you can run inference with any open-source LLM, deploy models to the cloud or on-premises, and build powerful AI applications. It supports a wide range of open-source LLMs and offers flexible APIs with first-class support for BentoML and LangChain.
LLMs are deep learning models that have been trained on extensive text data, enabling them to understand and generate new content. While LLMs like GPT-4 from OpenAI and PaLM 2 from Google have shown promising results, organizations may be hesitant to adopt the technology due to several limitations.
In the wake of the ChatGPT frenzy, open-source LLMs such as Dolly and Flan-T5 have emerged, providing more flexibility as organizations can deploy them locally and run smaller models that are fine-tuned for their specific use cases.
At BentoML, our goal is to bridge the gap between training ML models and deploying them in production. We believe this process should be facilitated by tools that prioritize ease of use, flexibility, openness, and transparency. That is why we are open-sourcing OpenLLM: to empower software engineers to better fine-tune, serve, and deploy their models to production. Beyond exposing an LLM as an API endpoint, OpenLLM enables you to build applications on top of it. It integrates seamlessly with BentoML and LangChain, allowing you to compose or chain LLM inference with other AI models, such as Stable Diffusion, Whisper, or any custom model, and to build LangChain applications backed by OpenLLM and BentoML.
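As a minimal sketch, connecting LangChain to a running OpenLLM server can look like the following. This assumes LangChain's OpenLLM wrapper with a server_url parameter and a server already started on the default port 3000; the exact interface may differ across versions, so check the LangChain and OpenLLM documentation for your release.

from langchain.llms import OpenLLM

# Assumes an OpenLLM server is already running locally, e.g. via `openllm start dolly-v2`.
llm = OpenLLM(server_url='http://localhost:3000')
print(llm('What are large language models?'))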
OpenLLM offers a rich set of features, including built-in support for a wide range of open-source LLMs, flexible APIs for serving and querying models, first-class BentoML and LangChain integrations, and streamlined workflows for fine-tuning, serving, and deploying models to production.
To use OpenLLM, you need Python 3.8 or later and pip installed on your machine. We highly recommend using a virtual environment to prevent package conflicts.
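For example, you can create and activate a virtual environment with Python's built-in venv module (the environment name here is arbitrary):

python -m venv openllm-env
source openllm-env/bin/activate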
1. Install OpenLLM with pip.
pip install openllm
2. Once the installation is complete, you can view the supported open-source LLMs with the following command.
openllm models -o porcelain
By default, OpenLLM doesn't include the dependencies to run all models, so you may need to install model-specific dependencies separately; see Supported Models for details. The example output is as follows.
flan-t5
dolly-v2
chatglm
starcoder
falcon
stablelm
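For instance, model-specific dependencies are typically published as pip extras. The command below assumes chatglm has a matching extra; see Supported Models for the exact extra names supported by your version.

pip install "openllm[chatglm]"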
3. You can easily start a model as a REST server with OpenLLM. The following command uses dolly-v2 as an example.
openllm start dolly-v2
To serve a specific variant of the model, provide the --model-id option as follows:
openllm start dolly-v2 --model-id databricks/dolly-v2-7b
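Once the server is running, you can also call its REST API directly. The curl sketch below assumes a /v1/generate route and a simple JSON payload; the exact route and schema may vary between OpenLLM versions, so consult the interactive API documentation served by the server.

curl -X POST http://localhost:3000/v1/generate \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "What are large language models?"}'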
4. OpenLLM provides a built-in Python client. You can interact with the model by creating a client and sending a query to the endpoint at http://localhost:3000, as follows. The server at this address also provides a web UI for interaction and experimentation.
import openllm

client = openllm.client.HTTPClient('http://localhost:3000')
client.query('What are large language models?')
5. Alternatively, use the openllm query command to query the model from a separate terminal:
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
Expected output:
Processing query: What are large language models?
Responses: Large language models (LLMs) are artificial intelligence (AI) systems that can parse natural language and provide responses similar to human responses. These systems can be trained with vast amounts of data in order to produce human-like responses to natural language.
6. After you fine-tune the model, you can build a bento based on it. A bento is a deployable artifact containing all the application information, including the model, code, and dependencies.
openllm build dolly-v2
7. You can then containerize your model and deploy it to BentoCloud or your own Kubernetes cluster. For more information, see the BentoML documentation.
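As a rough sketch, containerization might look like the following. The bento tag here is a placeholder; use the actual tag printed by the openllm build step.

# Replace the placeholder tag with the one printed by `openllm build`
bentoml containerize dolly-v2-service:latest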
We seek to empower every organization to compete and succeed with AI applications, and the release of OpenLLM marks an important milestone in that endeavor. As we work to expand the BentoML ecosystem, we will continue improving OpenLLM's quantization, performance, and fine-tuning capabilities, and we welcome contributions of all kinds to the project. Check out the following resources to start your OpenLLM journey, and stay tuned for more announcements about OpenLLM and BentoML.
BentoML is the platform for AI developers to build, ship, and scale AI applications. Headquartered in San Francisco, BentoML’s open source products are enabling thousands of organizations’ mission-critical AI applications around the globe. Our serverless cloud platform brings developer velocity and cost-efficiency to enterprise AI use cases. BentoML is on a mission to empower every organization to compete and succeed with AI. Visit https://www.bentoml.com to learn more.