# OpenLLM
This page demonstrates how to use OpenLLM with LangChain.
OpenLLM is an open platform for operating large language models (LLMs) in production. It enables developers to easily run inference with any open-source LLM, deploy to the cloud or on-premises, and build powerful AI apps.
## Installation and Setup
Install the OpenLLM package via PyPI:
```bash
pip install openllm
```
> [!NOTE]
> OpenLLM requires a GPU to run locally. If you already have an OpenLLM server running elsewhere, you may want to install `openllm-client` instead:

```bash
pip install openllm-client
```
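If you are unsure which of the two packages is present in your environment, a quick sanity check (just a convenience, not part of the official setup) is to ask pip directly:

```bash
# prints name, version, and install location for whichever package is installed
pip show openllm openllm-client
```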
## LLM
OpenLLM supports a wide range of open-source LLMs as well as serving users' own fine-tuned LLMs.
### Wrappers
There is an `OpenLLM` wrapper which supports loading an LLM in-process:
```python
from langchain_community.llms import OpenLLM
```
For a remote OpenLLM server, you might be interested in using `OpenLLMAPI` instead:

```python
from langchain_community.llms import OpenLLMAPI
```
### Wrapper for OpenLLM server
This wrapper supports connecting to an OpenLLM server. The OpenLLM server can run either locally or in the cloud.
To try it out locally, start an OpenLLM server:
```bash
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
```
Wrapper usage:
```python
from langchain_community.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://localhost:3000')

llm.invoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")

# in an async context
await llm.ainvoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")

# streaming
for it in llm.stream("What is the difference between a duck and a goose? And why there are so many Goose in Canada?"):
    print(it, flush=True, end='')

# asynchronous streaming
async for it in llm.astream("What is the difference between a duck and a goose? And why there are so many Goose in Canada?"):
    print(it, flush=True, end='')
```
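Because the remote wrapper is a standard LangChain LLM, it composes with the rest of the framework. Below is a minimal sketch (the prompt text and variable names are illustrative, not part of the OpenLLM documentation) that pipes a `PromptTemplate` into the server-backed LLM:

```python
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import OpenLLMAPI

# assumes an OpenLLM server is reachable at this URL, as in the example above
llm = OpenLLMAPI(server_url='http://localhost:3000')

# illustrative prompt; swap in whatever template your application needs
prompt = PromptTemplate.from_template(
    "Explain the difference between {animal_a} and {animal_b} in one sentence."
)

chain = prompt | llm
print(chain.invoke({"animal_a": "a duck", "animal_b": "a goose"}))
```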
### Wrapper for Local Inference
You can also use the OpenLLM wrapper to load an LLM into the current Python process and run inference locally.
```python
from langchain_community.llms import OpenLLM

llm = OpenLLM(model_id='microsoft/Phi-3-mini-4k-instruct', trust_remote_code=True)

llm.invoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")
```
> [!NOTE]
> Currently, local inference supports only batch or one-shot generation (synchronous).
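As an illustration of the batch mode mentioned in the note, the local wrapper can answer several prompts in one synchronous call through LangChain's standard `batch` interface. This is a minimal sketch reusing the `llm` defined above; the prompts are illustrative:

```python
# several prompts answered in one synchronous call,
# reusing the OpenLLM instance created in the previous snippet
results = llm.batch([
    "What is the difference between a duck and a goose?",
    "Why are there so many geese in Canada?",
])
for text in results:
    print(text)
```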
## Usage
For a more detailed walkthrough of the OpenLLM wrapper, see the example notebook.