Streaming

CanalAPI can stream chat completion responses as Server-Sent Events (SSE), using the same protocol as OpenAI streaming.

Enabling streaming

Set stream: true in the request body:

curl "$CANALAPI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $CANALAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "stream": true,
    "messages": [{"role": "user", "content": "Stream a haiku."}]
  }'

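The response has content type text/event-stream. Each event is a single line of the form data: {json}, followed by a blank line. Each JSON payload is a chat.completion.chunk object whose choices[].delta carries the next piece of the message. The stream ends with data: [DONE].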

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
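
If you read the raw event stream yourself instead of using an SDK, the framing above can be decoded in a few lines. The following is a minimal sketch, not an official client: the helper names iter_sse_chunks and collect_text are hypothetical, and it assumes every event fits on one data: line, as in the sample above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta.content pieces carried by the chunks."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# The sample events shown above, one list item per line of the stream.
sample = [
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # prints "Hello world"
```

The first chunk carries only the role, so it contributes no text. In real use, lines would come from the HTTP response body read line by line.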

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.CANALAPI_API_KEY,
  baseURL: process.env.CANALAPI_BASE_URL,
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  stream: true,
  messages: [{ role: 'user', content: 'Stream a haiku.' }],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
process.stdout.write('\n');

Python

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["CANALAPI_API_KEY"],
    base_url=os.environ["CANALAPI_BASE_URL"],
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    stream=True,
    messages=[{"role": "user", "content": "Stream a haiku."}],
)
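for chunk in stream:
    # Some chunks may carry no choices; skip them instead of indexing blindly.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="", flush=True)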
print()

Tips

Disable proxy buffering. Reverse proxies that buffer responses defeat streaming. Configure them to flush immediately for text/event-stream.
Handle client disconnects. If the client disconnects mid-stream, your server should propagate the cancellation upstream so unused tokens are not billed.
Apply backpressure. When piping the stream to a slow consumer, pause reads from the upstream until the consumer catches up, so buffered chunks do not grow without bound.
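
The last two tips can be combined in one relay loop. This is a rough sketch under stated assumptions rather than a prescribed implementation: relay is a hypothetical helper, write is assumed to block until the client has accepted the data and to raise ConnectionError when the client is gone, and the upstream stream is assumed to expose a close() method that cancels the request.

```python
from typing import Callable, Iterable


def relay(upstream: Iterable[str], write: Callable[[str], None]) -> int:
    """Forward pieces one at a time; return how many were delivered."""
    sent = 0
    try:
        for piece in upstream:
            # The next piece is not read until write() returns, so a slow
            # consumer slows the reads and nothing accumulates in memory.
            write(piece)
            sent += 1
    except ConnectionError:
        pass  # client disconnected mid-stream (assumed signal)
    finally:
        # Propagate the cancellation upstream if the stream supports it.
        close = getattr(upstream, "close", None)
        if callable(close):
            close()
    return sent
```

Because the loop pulls one piece per write, no intermediate queue is needed. If you do add a queue between the reader and the writer, give it a fixed maximum size so the same property holds.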
