Streaming
Stream model responses token by token using Server-Sent Events (SSE). Compatible with the OpenAI streaming format.
Enable streaming
Add "stream": true to your request body:
cURL

curl https://api.computeshare.servequake.com/v1/chat/completions \
  -H "Authorization: Bearer ngk_your_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs"}],
    "stream": true
  }'
Python

from openai import OpenAI

client = OpenAI(
    api_key="ngk_your_key",
    base_url="https://api.computeshare.servequake.com/v1"
)

stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku about GPUs"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # newline at end
Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ngk_your_key",
  baseURL: "https://api.computeshare.servequake.com/v1",
});

const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Write a haiku about GPUs" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
SSE format
Each streamed chunk is a data: line with a JSON object:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1744502400,"model":"auto","choices":[{"index":0,"delta":{"content":"Silicon"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1744502400,"model":"auto","choices":[{"index":0,"delta":{"content":" dreams"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1744502400,"model":"auto","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
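If you're not using an SDK, you can consume the SSE stream directly. A minimal sketch in Python, assuming the third-party requests library (the endpoint, headers, and body are the same as in the examples above):

import json
import requests

resp = requests.post(
    "https://api.computeshare.servequake.com/v1/chat/completions",
    headers={"Authorization": "Bearer ngk_your_key"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Write a haiku about GPUs"}],
        "stream": True,
    },
    stream=True,  # read the response incrementally instead of buffering it
)

for line in resp.iter_lines():
    if not line:
        continue  # SSE events are separated by blank lines
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        continue
    payload = text[len("data: "):]
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
print()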
Handling the stream
- Each `data:` line contains a partial response chunk
- The `delta.content` field contains the new text for that chunk
- When `finish_reason: "stop"` appears, the response is complete
- The stream ends with `data: [DONE]`
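Putting those rules together, here is a minimal sketch that builds the full response text from the stream object created in the Python example above, capturing finish_reason along the way:

parts = []
finish_reason = None

for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:      # new text for this chunk
        parts.append(choice.delta.content)
    if choice.finish_reason:      # "stop" once the response is complete
        finish_reason = choice.finish_reason

full_text = "".join(parts)
print(full_text)
print("finish_reason:", finish_reason)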
Streaming vs non-streaming
| | Streaming | Non-streaming |
|---|---|---|
| First token | ~100–500ms | Full generation time |
| User experience | Feels instant | Wait then all at once |
| Failover | None; a drop ends the stream | Up to 3 retry attempts |
| Best for | Chat UIs, long responses | Batch processing, APIs |
⚠️ Streaming requests use a single endpoint with no failover. If the hoster disconnects mid-stream, the connection closes. For reliability-critical use cases, use non-streaming with automatic failover.
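If you want streaming latency but can tolerate a non-streaming retry, one option is to catch a mid-stream failure and reissue the request without streaming so the platform's automatic failover applies. A sketch of that pattern with the Python SDK; the wrapper function and its error handling are illustrative, not part of the API, and any partial text from the failed stream is discarded:

def complete_with_fallback(client, messages, model="auto"):
    """Stream if possible; retry non-streaming if the stream drops."""
    try:
        stream = client.chat.completions.create(
            model=model, messages=messages, stream=True
        )
        parts = []
        for chunk in stream:
            if chunk.choices[0].delta.content:
                parts.append(chunk.choices[0].delta.content)
        return "".join(parts)
    except Exception:
        # A hoster disconnect mid-stream surfaces as an error here.
        # Non-streaming requests get up to 3 automatic retry attempts.
        resp = client.chat.completions.create(model=model, messages=messages)
        return resp.choices[0].message.content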