OpenLLM

This page demonstrates how to use OpenLLM with LangChain.

OpenLLM is an open platform for operating large language models (LLMs) in production. It enables developers to easily run inference with any open-source LLMs, deploy to the cloud or on-premises, and build powerful AI apps.

Installation and Setup

Install the OpenLLM package via PyPI:

pip install openllm
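
To confirm the installation, you can import the package in Python. This is only a minimal sanity check; the __version__ attribute is an assumption here (most releases expose it), so adjust if your version differs:

import openllm

# A successful import confirms the package is installed; printing the version is optional.
print(openllm.__version__)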

[!NOTE] OpenLLM requires a GPU to run locally. If you already have an OpenLLM server running elsewhere, you may want to install openllm-client instead:

pip install openllm-client

LLM

OpenLLM supports a wide range of open-source LLMs, as well as serving users' own fine-tuned LLMs.

Wrappers

There is an OpenLLM wrapper that supports loading an LLM in-process:

from langchain_community.llms import OpenLLM
API Reference: OpenLLM

To connect to a remote OpenLLM server, use the OpenLLMAPI wrapper instead:

from langchain_community.llms import OpenLLMAPI
API Reference: OpenLLMAPI

Wrapper for OpenLLM Server

This wrapper supports connecting to an OpenLLM server, which can run either locally or in the cloud.

To try it out locally, start an OpenLLM server:

openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
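
By default the server listens on port 3000, which matches the server_url used in the examples below. As a quick sanity check you can query the server from Python; the /v1/models route is an assumption here (recent OpenLLM releases expose an OpenAI-compatible API, but check the server's startup logs for the exact routes):

import json
import urllib.request

# Query the (assumed) OpenAI-compatible model listing endpoint on the local server.
with urllib.request.urlopen("http://localhost:3000/v1/models") as resp:
    print(json.loads(resp.read()))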

Wrapper usage:

from langchain_community.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://localhost:3000')

llm.invoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")

# in async context
await llm.ainvoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")

# streaming
for it in llm.stream("What is the difference between a duck and a goose? And why there are so many Goose in Canada?"):
print(it, flush=True, end='')

# asynchronous streaming
async for it in llm.astream("What is the difference between a duck and a goose? And why there are so many Goose in Canada?"):
print(it, flush=True, end='')
API Reference:OpenLLMAPI
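
Because OpenLLMAPI is a standard LangChain LLM, it also composes with the rest of the framework. Below is a minimal sketch using LCEL's pipe syntax, assuming the server started above is still running; the prompt wording is only illustrative:

from langchain_core.prompts import PromptTemplate
from langchain_community.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://localhost:3000')

# Compose a prompt template with the remote LLM into a single runnable chain.
prompt = PromptTemplate.from_template("Explain {topic} in one short paragraph.")
chain = prompt | llm

print(chain.invoke({"topic": "the difference between ducks and geese"}))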

Wrapper for Local Inference

You can also use the OpenLLM wrapper to load an LLM in the current Python process and run inference locally.

from langchain_community.llms import OpenLLM

llm = OpenLLM(model_id='microsoft/Phi-3-mini-4k-instruct', trust_remote_code=True)

llm.invoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")
API Reference:OpenLLM

[!NOTE] Currently, local inference supports only synchronous batch or one-shot generation.
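
Batch generation works through the generate method that all LangChain LLMs expose. A minimal sketch, reusing the locally loaded model from the example above; the prompts are only illustrative:

from langchain_community.llms import OpenLLM

llm = OpenLLM(model_id='microsoft/Phi-3-mini-4k-instruct', trust_remote_code=True)

# generate() takes a list of prompts and returns an LLMResult with one
# list of generations per input prompt.
result = llm.generate([
    "Name three facts about ducks.",
    "Name three facts about geese.",
])
for generations in result.generations:
    print(generations[0].text)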

Usage

For a more detailed walkthrough of the OpenLLM wrapper, see the example notebook.

