# OpenLLM
This page demonstrates how to use OpenLLM with LangChain.
OpenLLM is an open platform for operating large language models (LLMs) in production. It enables developers to easily run inference with any open-source LLM, deploy to the cloud or on-premises, and build powerful AI apps.
## Installation and Setup
Install the OpenLLM package via PyPI:
```bash
pip install openllm
```
> [!NOTE]
> OpenLLM requires a GPU to run locally. If you already have an OpenLLM server running elsewhere, you may want to install `openllm-client` instead:

```bash
pip install openllm-client
```
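If you are unsure which of the two packages is present in your environment, a quick sanity check (just a convenience, not part of the official setup) is to ask pip directly:

```bash
# prints name, version, and install location for whichever package is installed
pip show openllm openllm-client
```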
## LLM
OpenLLM supports a wide range of open-source LLMs as well as serving users' own fine-tuned LLMs.
### Wrappers
There is an `OpenLLM` wrapper which supports loading an LLM in-process:
```python
from langchain_community.llms import OpenLLM
```
For a remote OpenLLM server, you might be interested in using `OpenLLMAPI` instead:

```python
from langchain_community.llms import OpenLLMAPI
```
### Wrapper for OpenLLM server
This wrapper supports connecting to an OpenLLM server. The OpenLLM server can run either locally or in the cloud.
To try it out locally, start an OpenLLM server:
```bash
openllm start microsoft/Phi-3-mini-4k-instruct --trust-remote-code
```
Wrapper usage:
```python
from langchain_community.llms import OpenLLMAPI

llm = OpenLLMAPI(server_url='http://localhost:3000')

llm.invoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")

# in an async context
await llm.ainvoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")

# streaming
for it in llm.stream("What is the difference between a duck and a goose? And why there are so many Goose in Canada?"):
    print(it, flush=True, end='')

# asynchronous streaming
async for it in llm.astream("What is the difference between a duck and a goose? And why there are so many Goose in Canada?"):
    print(it, flush=True, end='')
```
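Because the remote wrapper is a standard LangChain LLM, it composes with the rest of the framework. Below is a minimal sketch (the prompt text and variable names are illustrative, not part of the OpenLLM documentation) that pipes a `PromptTemplate` into the server-backed LLM:

```python
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import OpenLLMAPI

# assumes an OpenLLM server is reachable at this URL, as in the example above
llm = OpenLLMAPI(server_url='http://localhost:3000')

# illustrative prompt; swap in whatever template your application needs
prompt = PromptTemplate.from_template(
    "Explain the difference between {animal_a} and {animal_b} in one sentence."
)

chain = prompt | llm
print(chain.invoke({"animal_a": "a duck", "animal_b": "a goose"}))
```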
### Wrapper for Local Inference
You can also use the OpenLLM wrapper to load an LLM into the current Python process and run inference locally.
```python
from langchain_community.llms import OpenLLM

llm = OpenLLM(model_id='microsoft/Phi-3-mini-4k-instruct', trust_remote_code=True)

llm.invoke("What is the difference between a duck and a goose? And why there are so many Goose in Canada?")
```
> [!NOTE]
> Currently, local inference supports only batch or one-shot generation (synchronous).
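As an illustration of the batch mode mentioned in the note, the local wrapper can answer several prompts in one synchronous call through LangChain's standard `batch` interface. This is a minimal sketch reusing the `llm` defined above; the prompts are illustrative:

```python
# several prompts answered in one synchronous call,
# reusing the OpenLLM instance created in the previous snippet
results = llm.batch([
    "What is the difference between a duck and a goose?",
    "Why are there so many geese in Canada?",
])
for text in results:
    print(text)
```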
## Usage
For a more detailed walkthrough of the OpenLLM wrapper, see the example notebook.