Getting Started
Models
NeuralGate routes requests across a network of hosters serving open-weight models. Use "auto" for smart routing or specify a model directly.
List available models
```bash
curl https://api.computeshare.servequake.com/v1/models \
  -H "Authorization: Bearer ngk_your_key"
```
Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "auto",
      "object": "model",
      "owned_by": "neuralgate"
    },
    {
      "id": "gemma-4-31B-it-Q8_0.gguf",
      "object": "model",
      "owned_by": "hoster:Deric H100 Cluster"
    }
  ]
}
```
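As a quick sketch, the response above can be parsed with nothing but the standard library; the field names (`data`, `id`) come straight from the sample response:

```python
import json

# Sample response body from GET /v1/models, as shown above.
response_body = """
{
  "object": "list",
  "data": [
    {"id": "auto", "object": "model", "owned_by": "neuralgate"},
    {"id": "gemma-4-31B-it-Q8_0.gguf", "object": "model",
     "owned_by": "hoster:Deric H100 Cluster"}
  ]
}
"""

# Extract just the model IDs for display or selection.
models = json.loads(response_body)["data"]
model_ids = [m["id"] for m in models]
print(model_ids)  # ['auto', 'gemma-4-31B-it-Q8_0.gguf']
```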
The "auto" model
Setting "model": "auto" lets NeuralGate pick the best available model based on your query complexity:
- Simple queries — served by local models (fastest, cheapest)
- Complex queries — escalated to larger models or cloud fallback
- If privacy_mode is true — always served locally, no cloud fallback
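A minimal sketch of a request body that uses auto routing with local-only serving. The exact placement of `privacy_mode` as a top-level field is an assumption based on how it is described above, not a confirmed schema:

```python
import json

# Hypothetical chat request payload: "auto" lets NeuralGate route by
# query complexity; privacy_mode forces local serving, no cloud fallback.
payload = {
    "model": "auto",
    "privacy_mode": True,
    "messages": [{"role": "user", "content": "Summarize this paragraph."}],
}
body = json.dumps(payload)
```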
💡 Start with `model: "auto"`. Switch to a specific model ID once you know which one fits your use case.

Model IDs
Model IDs are the exact filenames or identifiers hosters register. They look like:
| Model ID | Type | Notes |
|---|---|---|
| gemma-4-31B-it-Q8_0.gguf | Local hoster | Google Gemma 4 31B, 8-bit quantized |
| Qwen_Qwen3.5-9B-Q4_K_M.gguf | Local hoster | Qwen 3.5 9B quantized |
| auto | Smart routing | NeuralGate picks best available |
Fallback chains
If your primary model is unavailable, specify fallbacks:
```json
{
  "model": "my-primary-model",
  "fallback_models": ["gemma-4-31B-it-Q8_0.gguf", "auto"],
  "messages": [...]
}
```
NeuralGate tries each model in order until one succeeds.
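NeuralGate runs this loop server-side; as an illustration only, the logic is equivalent to this client-side sketch (`complete` and its failure behavior are hypothetical stand-ins, not part of the API):

```python
def try_fallback_chain(models, complete):
    """Try each model ID in order; return the first successful result.

    `complete` is a hypothetical callable that raises RuntimeError when a
    model is unavailable, mirroring how NeuralGate walks the chain of
    `model` followed by `fallback_models`.
    """
    last_error = None
    for model_id in models:
        try:
            return complete(model_id)
        except RuntimeError as err:
            last_error = err  # model unavailable; try the next one
    raise RuntimeError(f"all models in the chain failed: {last_error}")


# Example: the primary is down, so the first fallback serves the request.
def fake_complete(model_id):
    if model_id == "my-primary-model":
        raise RuntimeError("unavailable")
    return f"answer from {model_id}"


result = try_fallback_chain(
    ["my-primary-model", "gemma-4-31B-it-Q8_0.gguf", "auto"], fake_complete
)
print(result)  # answer from gemma-4-31B-it-Q8_0.gguf
```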
Context window filtering
NeuralGate automatically skips hosters whose context window is smaller than your input. For a 50,000-token prompt, only hosters with context_window ≥ 50,000 will be selected.
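The filtering rule amounts to a simple predicate over each hoster's advertised context window. The hoster fields below are illustrative, not the actual registry schema:

```python
def eligible_hosters(hosters, prompt_tokens):
    """Keep only hosters whose context window fits the prompt."""
    return [h for h in hosters if h["context_window"] >= prompt_tokens]


# Hypothetical registry entries: only the first fits a 50,000-token prompt.
hosters = [
    {"name": "Deric H100 Cluster", "context_window": 131072},
    {"name": "small-box", "context_window": 8192},
]
selected = [h["name"] for h in eligible_hosters(hosters, 50_000)]
print(selected)  # ['Deric H100 Cluster']
```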