Getting Started
Models
NeuralGate routes requests across a network of hosters serving open-weight models. Use "auto" for smart routing or specify a model directly.
List available models
```bash
curl https://api.computeshare.servequake.com/v1/models \
  -H "Authorization: Bearer ngk_your_key"
```
Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "auto",
      "object": "model",
      "owned_by": "neuralgate"
    },
    {
      "id": "gemma-4-31B-it-Q8_0.gguf",
      "object": "model",
      "owned_by": "hoster:Deric H100 Cluster"
    }
  ]
}
```
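As a quick sketch, the response above can be parsed with nothing but the standard library; the field names (`data`, `id`) come straight from the sample response:

```python
import json

# Sample response body from GET /v1/models, as shown above.
response_body = """
{
  "object": "list",
  "data": [
    {"id": "auto", "object": "model", "owned_by": "neuralgate"},
    {"id": "gemma-4-31B-it-Q8_0.gguf", "object": "model",
     "owned_by": "hoster:Deric H100 Cluster"}
  ]
}
"""

# Extract just the model IDs for display or selection.
models = json.loads(response_body)["data"]
model_ids = [m["id"] for m in models]
print(model_ids)  # ['auto', 'gemma-4-31B-it-Q8_0.gguf']
```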
The "auto" model
Setting "model": "auto" lets NeuralGate pick the best available model based on your query complexity:
- Simple queries — served by local models (fastest, cheapest)
- Complex queries — escalated to larger models or cloud fallback
- If privacy_mode is true — always served locally, no cloud fallback
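A minimal sketch of a request body that uses auto routing with local-only serving. The exact placement of `privacy_mode` as a top-level field is an assumption based on how it is described above, not a confirmed schema:

```python
import json

# Hypothetical chat request payload: "auto" lets NeuralGate route by
# query complexity; privacy_mode forces local serving, no cloud fallback.
payload = {
    "model": "auto",
    "privacy_mode": True,
    "messages": [{"role": "user", "content": "Summarize this paragraph."}],
}
body = json.dumps(payload)
```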
💡 Start with `model: "auto"`. Switch to a specific model ID once you know which one fits your use case.

Model IDs
Model IDs are the exact filenames or identifiers hosters register. They look like:
| Model ID | Type | Notes |
|---|---|---|
| gemma-4-31B-it-Q8_0.gguf | Local hoster | Google Gemma 4 31B, 8-bit quantized |
| Qwen_Qwen3.5-9B-Q4_K_M.gguf | Local hoster | Qwen 3.5 9B quantized |
| auto | Smart routing | NeuralGate picks best available |
Fallback chains
If your primary model is unavailable, specify fallbacks:
```json
{
  "model": "my-primary-model",
  "fallback_models": ["gemma-4-31B-it-Q8_0.gguf", "auto"],
  "messages": [...]
}
```
NeuralGate tries each model in order until one succeeds.
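NeuralGate runs this loop server-side; as an illustration only, the logic is equivalent to this client-side sketch (`complete` and its failure behavior are hypothetical stand-ins, not part of the API):

```python
def try_fallback_chain(models, complete):
    """Try each model ID in order; return the first successful result.

    `complete` is a hypothetical callable that raises RuntimeError when a
    model is unavailable, mirroring how NeuralGate walks the chain of
    `model` followed by `fallback_models`.
    """
    last_error = None
    for model_id in models:
        try:
            return complete(model_id)
        except RuntimeError as err:
            last_error = err  # model unavailable; try the next one
    raise RuntimeError(f"all models in the chain failed: {last_error}")


# Example: the primary is down, so the first fallback serves the request.
def fake_complete(model_id):
    if model_id == "my-primary-model":
        raise RuntimeError("unavailable")
    return f"answer from {model_id}"


result = try_fallback_chain(
    ["my-primary-model", "gemma-4-31B-it-Q8_0.gguf", "auto"], fake_complete
)
print(result)  # answer from gemma-4-31B-it-Q8_0.gguf
```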
Context window filtering
NeuralGate automatically skips hosters whose context window is smaller than your input. For a 50,000-token prompt, only hosters with context_window ≥ 50,000 will be selected.
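The filtering rule amounts to a simple predicate over each hoster's advertised context window. The hoster fields below are illustrative, not the actual registry schema:

```python
def eligible_hosters(hosters, prompt_tokens):
    """Keep only hosters whose context window fits the prompt."""
    return [h for h in hosters if h["context_window"] >= prompt_tokens]


# Hypothetical registry entries: only the first fits a 50,000-token prompt.
hosters = [
    {"name": "Deric H100 Cluster", "context_window": 131072},
    {"name": "small-box", "context_window": 8192},
]
selected = [h["name"] for h in eligible_hosters(hosters, 50_000)]
print(selected)  # ['Deric H100 Cluster']
```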