Routing Controls
How Routing Works
NeuralGate automatically selects the best hoster for every request. You can customize this behavior with routing parameters.
Default routing (auto)
When you use "model": "auto", NeuralGate applies a two-tier decision tree:
```
Request arrives
      ↓
[Tier 1] Keyword/length check
      → Complex query ("analyze", "summarize") or prompt > 5000 chars
      → Escalates to cloud immediately
      ↓ (simple query)
[Tier 2] Local model with self-awareness
      → Model tags response <CONFIDENT> or <UNCERTAIN>
      → <UNCERTAIN> → escalates to cloud
      ↓ (confident)
Local response served ✓
```
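The two tiers above can be sketched as a pair of pure functions. This is an illustrative reconstruction, not the gateway's actual code: the keyword list, the 5000-character threshold, and the confidence tags come from the description above, while the function names and return values are assumptions.

```python
# Illustrative sketch of the auto-routing tiers; keyword list and threshold
# follow the description above, everything else is assumed for illustration.
COMPLEX_KEYWORDS = ("analyze", "summarize")
LENGTH_THRESHOLD = 5000  # chars; longer prompts escalate to cloud

def tier1_route(prompt: str) -> str:
    """Tier 1: keyword/length check. Returns 'cloud' or 'local'."""
    lowered = prompt.lower()
    if len(prompt) > LENGTH_THRESHOLD or any(k in lowered for k in COMPLEX_KEYWORDS):
        return "cloud"
    return "local"

def tier2_route(tagged_response: str) -> str:
    """Tier 2: the local model tags its own answer; <UNCERTAIN> escalates."""
    return "cloud" if "<UNCERTAIN>" in tagged_response else "local"
```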
Routing parameters
Add these to any /v1/chat/completions or /v1/chat request body:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | "auto" | Model ID or "auto" for smart routing |
| tier | string | null | One of "cheapest", "balanced", or "fastest" |
| max_latency_ms | integer | null | Skip hosters slower than this threshold |
| privacy_mode | boolean | false | Never fall back to cloud providers |
| trusted_only | boolean | false | Only use hosters with trust score > 0.8 |
| fallback_models | array | null | Ordered list of fallback model IDs |
| allow_hosters | array | null | Only route to these hoster IDs |
| deny_hosters | array | null | Never route to these hoster IDs |
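The routing parameters are plain JSON fields that sit alongside model and messages. As a quick illustration, the sketch below assembles such a body in Python; the parameter values chosen here are arbitrary examples, not recommendations.

```python
import json

# Hypothetical request body combining several routing parameters from the
# table above; the values are arbitrary examples for illustration.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
    "tier": "balanced",
    "max_latency_ms": 2000,
    "trusted_only": True,
}
body = json.dumps(payload)  # send as the POST body of /v1/chat/completions
```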
Service tiers
```
# Cheapest available hoster
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "cheapest"}

# Fastest (lowest latency) hoster
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "fastest"}

# Balanced (trust-score weighted, default)
{"model": "gemma-4-31B-it-Q8_0.gguf", "tier": "balanced"}
```
Privacy mode
Prevent any request from being sent to cloud providers (Anthropic, OpenAI). The request will be served locally or fail:
```json
{
  "model": "auto",
  "messages": [...],
  "privacy_mode": true
}
```
💡 In privacy mode, complex queries that would normally escalate to cloud are instead served by the local model, even if it's less confident.
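In other words, privacy_mode overrides the escalation decision rather than rejecting the request. A one-line sketch of that semantics (the names here are illustrative, not the gateway's internals):

```python
# Illustrative: a would-be cloud escalation is served locally instead
# whenever privacy_mode is set.
def final_target(escalate_to_cloud: bool, privacy_mode: bool) -> str:
    if escalate_to_cloud and not privacy_mode:
        return "cloud"
    return "local"
```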
Fallback chains
If your primary model is unavailable, NeuralGate automatically tries the next entry in order:
```json
{
  "model": "my-primary-model",
  "fallback_models": ["gemma-4-31B-it-Q8_0.gguf", "auto"],
  "messages": [...]
}
```
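The chain amounts to an ordered scan over the primary model and its fallbacks. In the sketch below, `availability` stands in for the gateway's live model registry and is an assumption of this sketch, as is the rule that "auto" always resolves:

```python
# Hypothetical fallback resolution: try the primary model, then each
# fallback_models entry in order. "auto" always resolves (smart routing).
def resolve_model(primary: str, fallbacks: list, availability: dict) -> str:
    for model in [primary, *fallbacks]:
        if model == "auto" or availability.get(model, False):
            return model
    raise RuntimeError("no model in the chain is available")
```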
Provider filtering
```
# Only use specific hosters
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "allow_hosters": ["hoster-uuid-1", "hoster-uuid-2"],
  "messages": [...]
}
```

```
# Exclude specific hosters
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "deny_hosters": ["untrusted-hoster-uuid"],
  "messages": [...]
}
```
Latency filtering
```
# Only use hosters with avg latency under 2 seconds
{
  "model": "gemma-4-31B-it-Q8_0.gguf",
  "max_latency_ms": 2000,
  "messages": [...]
}
```