How to stream chat model responses
All chat models implement the Runnable interface, which comes with default implementations of standard runnable methods (i.e. invoke, batch, stream, streamEvents).
The default streaming implementation provides an AsyncGenerator that yields a single value: the final output from the underlying chat model provider. This default does not support token-by-token streaming, but it ensures that any chat model can be swapped in for any other, since they all support the same standard interface.
The ability to stream output token-by-token depends on whether the provider has implemented proper streaming support.
See which integrations support token-by-token streaming here.
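The following simplified TypeScript sketch illustrates the difference between the default and a native streaming implementation. The classes BaseChatModelSketch, NonStreamingModel, and StreamingModel are hypothetical stand-ins invented for illustration, not LangChain's actual classes, and words stand in for tokens. A model that only implements invoke() inherits a stream() that yields its whole output once, while a model that overrides stream() yields one piece at a time; both are consumed by the same loop.

```typescript
// Hypothetical, simplified chat model interface. These names are
// illustrative only and are not LangChain's real classes.
interface ChatModelLike {
  invoke(input: string): Promise<string>;
  stream(input: string): AsyncGenerator<string>;
}

abstract class BaseChatModelSketch implements ChatModelLike {
  abstract invoke(input: string): Promise<string>;

  // Default streaming: wait for the complete response, then yield it once.
  async *stream(input: string): AsyncGenerator<string> {
    yield await this.invoke(input);
  }
}

// A provider without native streaming: only invoke() is implemented,
// so stream() falls back to the default above.
class NonStreamingModel extends BaseChatModelSketch {
  override async invoke(input: string): Promise<string> {
    return `Echo: ${input}`;
  }
}

// A provider with native streaming: stream() is overridden to emit one
// word at a time, standing in for token-by-token output.
class StreamingModel extends BaseChatModelSketch {
  override async invoke(input: string): Promise<string> {
    let full = "";
    for await (const piece of this.stream(input)) full += piece;
    return full;
  }

  override async *stream(input: string): AsyncGenerator<string> {
    const words = `Echo: ${input}`.split(" ");
    for (const [i, word] of words.entries()) {
      yield i === 0 ? word : ` ${word}`;
    }
  }
}

// Calling code depends only on the shared interface, so either model works.
async function collect(model: ChatModelLike, input: string): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of model.stream(input)) chunks.push(chunk);
  return chunks;
}

async function main(): Promise<void> {
  console.log(await collect(new NonStreamingModel(), "hi there")); // [ 'Echo: hi there' ]
  console.log(await collect(new StreamingModel(), "hi there")); // [ 'Echo:', ' hi', ' there' ]
}

main();
```

Because the fallback lives in the base class, a provider gains token-by-token streaming simply by overriding stream(), and no calling code has to change.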
Streaming
Below, we use a --- to help visualize the delimiter between tokens.
Pick your chat model:
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
- Groq
- VertexAI
OpenAI
Install dependencies (use npm, yarn, or pnpm):
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0
});
Anthropic
Install dependencies (use npm, yarn, or pnpm):
npm i @langchain/anthropic
yarn add @langchain/anthropic
pnpm add @langchain/anthropic
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const model = new ChatAnthropic({
model: "claude-3-5-sonnet-20240620",
temperature: 0
});
FireworksAI
Install dependencies (use npm, yarn, or pnpm):
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const model = new ChatFireworks({
model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
temperature: 0
});
MistralAI
Install dependencies (use npm, yarn, or pnpm):
npm i @langchain/mistralai
yarn add @langchain/mistralai
pnpm add @langchain/mistralai
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const model = new ChatMistralAI({
model: "mistral-large-latest",
temperature: 0
});
Groq
Install dependencies (use npm, yarn, or pnpm):
npm i @langchain/groq
yarn add @langchain/groq
pnpm add @langchain/groq
Add environment variables
GROQ_API_KEY=your-api-key
Instantiate the model
import { ChatGroq } from "@langchain/groq";
const model = new ChatGroq({
model: "mixtral-8x7b-32768",
temperature: 0
});
VertexAI
Install dependencies (use npm, yarn, or pnpm):
npm i @langchain/google-vertexai
yarn add @langchain/google-vertexai
pnpm add @langchain/google-vertexai
Add environment variables
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model
import { ChatVertexAI } from "@langchain/google-vertexai";
const model = new ChatVertexAI({
model: "gemini-1.5-flash",
temperature: 0
});
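// Stream the response; each chunk is an AIMessageChunk holding a piece of the output.
const stream = await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
);
for await (const chunk of stream) {
  // Print each chunk's text followed by a "---" delimiter line.
  console.log(`${chunk.content}\n---`);
}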
---
Sw
---
imming
---
in
---
a
---
world
---
of
---
silver
---
beams
---
,
---
Gold
---
fish
---
on
---
the
---
moon
---
,
---
living
---
their
---
dreams
---
.
---
---
---
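In practice you often want to display each chunk as it arrives and also keep the complete response. The sketch below assumes a hypothetical ChunkLike type whose content is a plain string (with some providers, a chunk's content can instead be an array of content blocks) and replays tokens from a hypothetical fakeStream() generator so it runs without an API key. Against a real model you would pass the stream returned by model.stream(...), adapting the types if needed.

```typescript
// Hypothetical stand-in for a streamed chat model chunk. Only `content` is
// modeled, and it is assumed to be a plain string.
interface ChunkLike {
  content: string;
}

// Hypothetical fake stream that replays pre-split tokens so the example
// runs offline; a real app would use the stream from model.stream(...).
async function* fakeStream(tokens: string[]): AsyncGenerator<ChunkLike> {
  for (const token of tokens) {
    yield { content: token };
  }
}

// Consume a stream chunk by chunk: hand each piece to `onToken` as soon as it
// arrives (e.g. to update a UI) while building up the complete response.
async function accumulate(
  stream: AsyncIterable<ChunkLike>,
  onToken: (token: string) => void = () => {}
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    onToken(chunk.content);
    full += chunk.content;
  }
  return full;
}

async function main(): Promise<void> {
  const seen: string[] = [];
  const text = await accumulate(
    fakeStream(["", "Gold", "fish", " on", " the", " moon"]),
    (token) => seen.push(token)
  );
  console.log(text); // Goldfish on the moon
  console.log(seen.length); // 6
}

main();
```

If you also need metadata such as tool call chunks, LangChain's message chunks can be merged with their concat() method rather than concatenating only the text.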
Stream events
Chat models also support the standard streamEvents() method.
This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model and parser).
// Log the first four events, then stop iterating.
let idx = 0;
const eventStream = model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  {
    version: "v2", // selects version 2 of the event schema
  }
);
for await (const event of eventStream) {
  idx += 1;
  if (idx === 5) {
    console.log("...Truncated");
    break;
  }
  console.log(event);
}
{
event: 'on_chat_model_start',
data: { input: 'Write me a 1 verse song about goldfish on the moon' },
name: 'ChatOpenAI',
tags: [],
run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
metadata: {
ls_provider: 'openai',
ls_model_name: 'gpt-3.5-turbo',
ls_model_type: 'chat',
ls_temperature: 1,
ls_max_tokens: undefined,
ls_stop: undefined
}
}
{
event: 'on_chat_model_stream',
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: [Object],
lc_namespace: [Array],
content: '',
name: undefined,
additional_kwargs: {},
response_metadata: [Object],
id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
usage_metadata: undefined
}
},
run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
name: 'ChatOpenAI',
tags: [],
metadata: {
ls_provider: 'openai',
ls_model_name: 'gpt-3.5-turbo',
ls_model_type: 'chat',
ls_temperature: 1,
ls_max_tokens: undefined,
ls_stop: undefined
}
}
{
event: 'on_chat_model_stream',
run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
name: 'ChatOpenAI',
tags: [],
metadata: {
ls_provider: 'openai',
ls_model_name: 'gpt-3.5-turbo',
ls_model_type: 'chat',
ls_temperature: 1,
ls_max_tokens: undefined,
ls_stop: undefined
},
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: [Object],
lc_namespace: [Array],
content: '',
name: undefined,
additional_kwargs: {},
response_metadata: [Object],
id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
usage_metadata: undefined
}
}
}
{
event: 'on_chat_model_stream',
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: [Object],
lc_namespace: [Array],
content: 'Sw',
name: undefined,
additional_kwargs: {},
response_metadata: [Object],
id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
usage_metadata: undefined
}
},
run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
name: 'ChatOpenAI',
tags: [],
metadata: {
ls_provider: 'openai',
ls_model_name: 'gpt-3.5-turbo',
ls_model_type: 'chat',
ls_temperature: 1,
ls_max_tokens: undefined,
ls_stop: undefined
}
}
...Truncated
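The raw events above are verbose. A common pattern is to keep only the on_chat_model_stream events and read the text from data.chunk.content. The sketch below models just the fields visible in the output above with a hypothetical StreamEventLike type and replays a few events from a hypothetical fakeEvents() generator so it runs offline. With a real model you would iterate the result of model.streamEvents(..., { version: "v2" }) instead, adapting the types as needed; in a multi-step chain you might additionally filter on name or tags to select one component.

```typescript
// Hypothetical stand-in for the event objects shown above; only the fields
// this filter reads are modeled, and chunk content is assumed to be a string.
interface StreamEventLike {
  event: string;
  name: string;
  run_id: string;
  data: { input?: unknown; chunk?: { content: string } };
}

// Keep only token chunks from chat model runs and yield their text,
// skipping start/end events and empty chunks.
async function* chatModelTokens(
  events: AsyncIterable<StreamEventLike>
): AsyncGenerator<string> {
  for await (const ev of events) {
    if (ev.event !== "on_chat_model_stream") continue;
    const text = ev.data.chunk?.content ?? "";
    if (text !== "") yield text;
  }
}

// Hypothetical replay of a few events so the example runs without an API key.
async function* fakeEvents(): AsyncGenerator<StreamEventLike> {
  const base = { name: "ChatOpenAI", run_id: "run-1" };
  yield { ...base, event: "on_chat_model_start", data: { input: "Write me a song" } };
  yield { ...base, event: "on_chat_model_stream", data: { chunk: { content: "" } } };
  yield { ...base, event: "on_chat_model_stream", data: { chunk: { content: "Sw" } } };
  yield { ...base, event: "on_chat_model_stream", data: { chunk: { content: "imming" } } };
  yield { ...base, event: "on_chat_model_end", data: {} };
}

async function main(): Promise<void> {
  const tokens: string[] = [];
  for await (const token of chatModelTokens(fakeEvents())) tokens.push(token);
  console.log(tokens); // [ 'Sw', 'imming' ]
}

main();
```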
Next steps
You've now seen a few ways you can stream chat model responses.
Next, check out this guide for more on streaming with other LangChain modules.