Routing Controls

How Routing Works

NeuralGate automatically selects the best hoster for every request. You can customize this behavior with routing parameters.

Default routing (auto)

When you use "model": "auto", NeuralGate applies a 3-tier decision tree:

Request arrives
  ↓
[Tier 1] Keyword/length check
  → Complex query ("analyze", "summarize") or prompt > 5000 chars
  → Escalates to cloud immediately

  ↓ (simple query)
[Tier 2] Local model with self-awareness
  → Model tags response <CONFIDENT> or <UNCERTAIN>
  → <UNCERTAIN> → escalates to cloud

  ↓ (confident)
Local response served ✓

Routing parameters

Add these to any /v1/chat/completions or /v1/chat request body:

ParameterTypeDefaultDescription
modelstring"auto"Model ID or "auto" for smart routing
tierstringnull"cheapest" | "balanced" | "fastest"
max_latency_msintegernullSkip hosters slower than this threshold
privacy_modebooleanfalseNever fallback to cloud providers
trusted_onlybooleanfalseOnly use hosters with trust score > 0.8
fallback_modelsarraynullOrdered list of fallback model IDs
allow_hostersarraynullOnly route to these hoster IDs
deny_hostersarraynullNever route to these hoster IDs

Service tiers

# Cheapest available hoster
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "cheapest"}

# Fastest (lowest latency) hoster  
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "fastest"}

# Balanced (trust-score weighted, default)
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "balanced"}

Privacy mode

Prevent any request from being sent to cloud providers (Anthropic, OpenAI). The request will be served locally or fail:

{
  "model": "auto",
  "messages": [...],
  "privacy_mode": true
}
💡 In privacy mode, complex queries that would normally escalate to cloud are instead served by the local model, even if it's less confident.

Fallback chains

If your primary model is unavailable, automatically try the next:

{
  "model": "my-primary-model",
  "fallback_models": ["gemma-4-31B-it-Q8_0.gguf", "auto"],
  "messages": [...]
}

Provider filtering

# Only use specific hosters
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "allow_hosters": ["hoster-uuid-1", "hoster-uuid-2"],
  "messages": [...]
}

# Exclude specific hosters
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "deny_hosters": ["untrusted-hoster-uuid"],
  "messages": [...]
}

Latency filtering

# Only use hosters with avg latency under 2 seconds
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "max_latency_ms": 2000,
  "messages": [...]
}