Routing Controls
How Routing Works
NeuralGate automatically selects the best hoster for every request. You can customize this behavior with routing parameters.
Default routing (auto)
When you use "model": "auto", NeuralGate applies a two-tier decision tree:
```
Request arrives
      ↓
[Tier 1] Keyword/length check
      → Complex query ("analyze", "summarize") or prompt > 5000 chars
      → Escalates to cloud immediately
      ↓ (simple query)
[Tier 2] Local model with self-awareness
      → Model tags response <CONFIDENT> or <UNCERTAIN>
      → <UNCERTAIN> → escalates to cloud
      ↓ (confident)
Local response served ✓
```
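The two tiers above can be sketched as a pair of pure functions. This is an illustrative reconstruction, not the gateway's actual code: the keyword list, the 5000-character threshold, and the confidence tags come from the description above, while the function names and return values are assumptions.

```python
# Illustrative sketch of the auto-routing tiers; keyword list and threshold
# follow the description above, everything else is assumed for illustration.
COMPLEX_KEYWORDS = ("analyze", "summarize")
LENGTH_THRESHOLD = 5000  # chars; longer prompts escalate to cloud

def tier1_route(prompt: str) -> str:
    """Tier 1: keyword/length check. Returns 'cloud' or 'local'."""
    lowered = prompt.lower()
    if len(prompt) > LENGTH_THRESHOLD or any(k in lowered for k in COMPLEX_KEYWORDS):
        return "cloud"
    return "local"

def tier2_route(tagged_response: str) -> str:
    """Tier 2: the local model tags its own answer; <UNCERTAIN> escalates."""
    return "cloud" if "<UNCERTAIN>" in tagged_response else "local"
```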
Routing parameters
Add these to any /v1/chat/completions or /v1/chat request body:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | "auto" | Model ID or "auto" for smart routing |
| tier | string | null | One of "cheapest", "balanced", or "fastest" |
| max_latency_ms | integer | null | Skip hosters slower than this threshold |
| privacy_mode | boolean | false | Never fall back to cloud providers |
| trusted_only | boolean | false | Only use hosters with trust score > 0.8 |
| fallback_models | array | null | Ordered list of fallback model IDs |
| allow_hosters | array | null | Only route to these hoster IDs |
| deny_hosters | array | null | Never route to these hoster IDs |
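The routing parameters are plain JSON fields that sit alongside model and messages. As a quick illustration, the sketch below assembles such a body in Python; the parameter values chosen here are arbitrary examples, not recommendations.

```python
import json

# Hypothetical request body combining several routing parameters from the
# table above; the values are arbitrary examples for illustration.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
    "tier": "balanced",
    "max_latency_ms": 2000,
    "trusted_only": True,
}
body = json.dumps(payload)  # send as the POST body of /v1/chat/completions
```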
Service tiers
```
# Cheapest available hoster
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "cheapest"}

# Fastest (lowest latency) hoster
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "fastest"}

# Balanced (trust-score weighted, default)
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "balanced"}
```
Privacy mode
Prevent any request from being sent to cloud providers (Anthropic, OpenAI). The request will be served locally or fail:
```json
{
  "model": "auto",
  "messages": [...],
  "privacy_mode": true
}
```
💡 In privacy mode, complex queries that would normally escalate to cloud are instead served by the local model, even if it's less confident.
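In other words, privacy_mode overrides the escalation decision rather than rejecting the request. A one-line sketch of that semantics (the names here are illustrative, not the gateway's internals):

```python
# Illustrative: a would-be cloud escalation is served locally instead
# whenever privacy_mode is set.
def final_target(escalate_to_cloud: bool, privacy_mode: bool) -> str:
    if escalate_to_cloud and not privacy_mode:
        return "cloud"
    return "local"
```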
Fallback chains
If your primary model is unavailable, NeuralGate automatically tries the next entry in order:
```json
{
  "model": "my-primary-model",
  "fallback_models": ["gemma-4-31B-it-Q8_0.gguf", "auto"],
  "messages": [...]
}
```
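The chain amounts to an ordered scan over the primary model and its fallbacks. In the sketch below, `availability` stands in for the gateway's live model registry and is an assumption of this sketch, as is the rule that "auto" always resolves:

```python
# Hypothetical fallback resolution: try the primary model, then each
# fallback_models entry in order. "auto" always resolves (smart routing).
def resolve_model(primary: str, fallbacks: list, availability: dict) -> str:
    for model in [primary, *fallbacks]:
        if model == "auto" or availability.get(model, False):
            return model
    raise RuntimeError("no model in the chain is available")
```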
Provider filtering
```
# Only use specific hosters
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "allow_hosters": ["hoster-uuid-1", "hoster-uuid-2"],
  "messages": [...]
}
```

```
# Exclude specific hosters
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "deny_hosters": ["untrusted-hoster-uuid"],
  "messages": [...]
}
```
Latency filtering
```
# Only use hosters with avg latency under 2 seconds
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "max_latency_ms": 2000,
  "messages": [...]
}
```