How to stream chat model responses

All chat models implement the Runnable interface, which comes with default implementations of the standard runnable methods (i.e., invoke, batch, stream, streamEvents).

The default streaming implementation provides an AsyncGenerator that yields a single value: the final output from the underlying chat model provider.
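
To make this fallback concrete, here is a minimal, self-contained sketch of how a default stream could wrap a non-streaming call. It is not LangChain's actual source: `fakeInvoke`, `defaultStream`, and `collectChunks` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical stand-in for a provider call that returns the full response at once.
async function fakeInvoke(prompt: string): Promise<string> {
  return `Echo: ${prompt}`;
}

// Sketch of a default streaming fallback: an async generator that awaits
// the complete output and yields it as a single chunk.
async function* defaultStream(
  invoke: (prompt: string) => Promise<string>,
  prompt: string
): AsyncGenerator<string> {
  yield await invoke(prompt);
}

// Consumers use the same for-await loop whether a stream yields one chunk or many.
async function collectChunks(stream: AsyncIterable<string>): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return chunks;
}

collectChunks(defaultStream(fakeInvoke, "hello")).then((chunks) => {
  console.log(chunks.length); // prints 1: the fallback produces exactly one chunk
});
```

Because the consumer only relies on async iteration, the same loop keeps working if the underlying model later gains true token-by-token streaming.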

tip

The default implementation does not support token-by-token streaming, but it ensures that the model can be swapped in for any other model, since it supports the same standard interface.

The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.

See which integrations support token-by-token streaming here.

Streaming

Below, we print a --- line after each chunk to help visualize the boundaries between tokens.

Pick your chat model:

Install dependencies

tip

See this section for general instructions on installing integration packages.


npm i @langchain/openai

yarn add @langchain/openai 

pnpm add @langchain/openai 

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.


npm i @langchain/anthropic

yarn add @langchain/anthropic 

pnpm add @langchain/anthropic 

Add environment variables

ANTHROPIC_API_KEY=your-api-key

Instantiate the model

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.


npm i @langchain/community

yarn add @langchain/community 

pnpm add @langchain/community 

Add environment variables

FIREWORKS_API_KEY=your-api-key

Instantiate the model

import { ChatFireworks } from "@langchain/community/chat_models/fireworks";

const model = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.


npm i @langchain/mistralai

yarn add @langchain/mistralai 

pnpm add @langchain/mistralai 

Add environment variables

MISTRAL_API_KEY=your-api-key

Instantiate the model

import { ChatMistralAI } from "@langchain/mistralai";

const model = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.


npm i @langchain/groq

yarn add @langchain/groq 

pnpm add @langchain/groq 

Add environment variables

GROQ_API_KEY=your-api-key

Instantiate the model

import { ChatGroq } from "@langchain/groq";

const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.


npm i @langchain/google-vertexai

yarn add @langchain/google-vertexai 

pnpm add @langchain/google-vertexai 

Add environment variables

GOOGLE_APPLICATION_CREDENTIALS=credentials.json

Instantiate the model

import { ChatVertexAI } from "@langchain/google-vertexai";

const model = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});

const stream = await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
);

for await (const chunk of stream) {
  console.log(`${chunk.content}\n---`);
}

---
Sw
---
imming
---
 in
---
 a
---
 world
---
 of
---
 silver
---
 beams
---
,

---
Gold
---
fish
---
 on
---
 the
---
 moon
---
,
---
 living
---
 their
---
 dreams
---
.
---

---

---
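
If you need the full response as well as the individual chunks, you can accumulate chunk contents as they arrive. The following is a minimal, self-contained sketch: `fakeModelStream` is a hypothetical stand-in for the stream returned by model.stream(), `collectText` is an illustrative helper, and the sketch assumes each chunk's content is a plain string (real chunks may carry other content types).

```typescript
// Minimal shape of a streamed chunk for this sketch; real chunks carry more fields.
interface TextChunk {
  content: string;
}

// Hypothetical stand-in for the stream returned by `await model.stream(...)`.
async function* fakeModelStream(): AsyncGenerator<TextChunk> {
  for (const content of ["", "Sw", "imming", " in", " a", " world"]) {
    yield { content };
  }
}

// Concatenate chunk contents in arrival order to rebuild the full text.
async function collectText(stream: AsyncIterable<TextChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.content;
  }
  return text;
}

collectText(fakeModelStream()).then((text) => {
  console.log(text); // prints "Swimming in a world"
});
```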

Stream events

Chat models also support the standard streamEvents() method.

This method is useful if you’re streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model and parser).

let idx = 0;

const stream = model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  {
    version: "v2",
  }
);

for await (const event of stream) {
  idx += 1;
  if (idx === 5) {
    console.log("...Truncated");
    break;
  }
  console.log(event);
}

{
  event: 'on_chat_model_start',
  data: { input: 'Write me a 1 verse song about goldfish on the moon' },
  name: 'ChatOpenAI',
  tags: [],
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  }
}
{
  event: 'on_chat_model_stream',
  data: {
    chunk: AIMessageChunk {
      lc_serializable: true,
      lc_kwargs: [Object],
      lc_namespace: [Array],
      content: '',
      name: undefined,
      additional_kwargs: {},
      response_metadata: [Object],
      id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
      tool_calls: [],
      invalid_tool_calls: [],
      tool_call_chunks: [],
      usage_metadata: undefined
    }
  },
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  name: 'ChatOpenAI',
  tags: [],
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  }
}
{
  event: 'on_chat_model_stream',
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  name: 'ChatOpenAI',
  tags: [],
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  },
  data: {
    chunk: AIMessageChunk {
      lc_serializable: true,
      lc_kwargs: [Object],
      lc_namespace: [Array],
      content: '',
      name: undefined,
      additional_kwargs: {},
      response_metadata: [Object],
      id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
      tool_calls: [],
      invalid_tool_calls: [],
      tool_call_chunks: [],
      usage_metadata: undefined
    }
  }
}
{
  event: 'on_chat_model_stream',
  data: {
    chunk: AIMessageChunk {
      lc_serializable: true,
      lc_kwargs: [Object],
      lc_namespace: [Array],
      content: 'Sw',
      name: undefined,
      additional_kwargs: {},
      response_metadata: [Object],
      id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
      tool_calls: [],
      invalid_tool_calls: [],
      tool_call_chunks: [],
      usage_metadata: undefined
    }
  },
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  name: 'ChatOpenAI',
  tags: [],
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  }
}
...Truncated
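
In a multi-step application the event stream mixes events from every step, so it is common to filter by event type. The sketch below is self-contained and makes some assumptions: `StreamEventLike` is a simplified event shape modeled on the fields printed above, `fakeEvents` is a hypothetical stand-in for the stream returned by model.streamEvents(), and chunk content is assumed to be a plain string.

```typescript
// Simplified event shape for this sketch, modeled on the output above.
interface StreamEventLike {
  event: string;
  name: string;
  data: { input?: string; chunk?: { content: string } };
}

// Hypothetical stand-in for `model.streamEvents(input, { version: "v2" })`.
async function* fakeEvents(): AsyncGenerator<StreamEventLike> {
  yield { event: "on_chat_model_start", name: "ChatOpenAI", data: { input: "a prompt" } };
  yield { event: "on_chat_model_stream", name: "ChatOpenAI", data: { chunk: { content: "" } } };
  yield { event: "on_chat_model_stream", name: "ChatOpenAI", data: { chunk: { content: "Sw" } } };
  yield { event: "on_chat_model_stream", name: "ChatOpenAI", data: { chunk: { content: "imming" } } };
  yield { event: "on_chat_model_end", name: "ChatOpenAI", data: {} };
}

// Yield only the text carried by chat model stream events, skipping all other events.
async function* chatModelText(
  events: AsyncIterable<StreamEventLike>
): AsyncGenerator<string> {
  for await (const { event, data } of events) {
    if (event === "on_chat_model_stream" && data.chunk) {
      yield data.chunk.content;
    }
  }
}

async function demo(): Promise<void> {
  let text = "";
  for await (const token of chatModelText(fakeEvents())) {
    text += token;
  }
  console.log(text); // prints "Swimming"
}

demo();
```

Filtering on the event name rather than on position in the stream keeps the consumer robust when other steps (for example a prompt or parser) emit their own events.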

Next steps

You’ve now seen a few ways you can stream chat model responses.

Next, check out this guide for more on streaming with other LangChain modules.
