Hoster Setup Guide
Everything you need to configure your GPU server and register with NeuralGate. Takes about 15 minutes.
Quick Setup Tool (recommended)
Download and run the NeuralGate hoster client. It checks your setup, generates credentials, and gives you the exact registration command.
⬇ Download neuralgate-hoster.py

python3 neuralgate-hoster.py
Run an OpenAI-compatible LLM server
NeuralGate works with any server that exposes /health and /v1/chat/completions. Choose your preferred backend:
# Download and build the llama.cpp server
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j$(nproc) llama-server

# Run with a model (adjust path and model name)
./llama-server \
  --model /path/to/your/model.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  --ctx-size 4096 \
  --n-gpu-layers 99
# Install vLLM
pip install vllm

# Run with a Hugging Face model
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3-8B-Instruct \
  --port 8080 \
  --host 0.0.0.0
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model and run
ollama pull llama3
ollama serve  # Runs on port 11434 by default

# Note: Ollama uses /api/chat, not /v1/chat/completions.
# Use the litellm bridge for OpenAI compatibility:
pip install litellm
litellm --model ollama/llama3 --port 8080
Docker Compose setup
The recommended setup for GPU servers. One command starts llama.cpp with CUDA support, auto-restarts on crash, and persists your model files.
1. Create your project directory:
mkdir neuralgate-node && cd neuralgate-node
mkdir models # your GGUF models go here
2. Create docker-compose.yml:
version: "3.8"
services:
  llama-server:
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    container_name: neuralgate-node
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    environment:
      - LLAMA_ARG_MODEL=/models/your-model.gguf
      - LLAMA_ARG_CTX_SIZE=8192
      - LLAMA_ARG_N_GPU_LAYERS=99
      - LLAMA_ARG_PORT=8080
      - LLAMA_ARG_HOST=0.0.0.0
      - LLAMA_ARG_API_KEY=your-bearer-token-here
      - LLAMA_ARG_CONT_BATCHING=1
      - LLAMA_ARG_FLASH_ATTN=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
3. Place your GGUF model in ./models/, then start:
# Start the server
docker compose up -d
# Check logs
docker compose logs -f
# Check health
curl http://localhost:8080/health
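Large models can take a minute or two to load, during which /health will not yet respond. If you want to wait programmatically instead of retrying curl by hand, here is a small polling helper, sketched in Python (the attempt count and delay are illustrative, not NeuralGate requirements):

```python
import time
import urllib.request

def wait_for_health(url: str, attempts: int = 30, delay: float = 5.0) -> bool:
    """Poll a /health endpoint until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet (connection refused, timeout, 5xx, ...)
        time.sleep(delay)
    return False
```

Calling `wait_for_health("http://localhost:8080/health")` returns True once the model has finished loading, or False after all attempts fail.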
No NVIDIA GPU? Remove the deploy.resources section and change to LLAMA_ARG_N_GPU_LAYERS=0 for CPU-only inference. CPU-only compose file:
version: "3.8"
services:
  llama-server:
    image: ghcr.io/ggerganov/llama.cpp:server
    container_name: neuralgate-node
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    environment:
      - LLAMA_ARG_MODEL=/models/your-model.gguf
      - LLAMA_ARG_CTX_SIZE=4096
      - LLAMA_ARG_N_GPU_LAYERS=0
      - LLAMA_ARG_THREADS=4
      - LLAMA_ARG_PORT=8080
      - LLAMA_ARG_HOST=0.0.0.0
      - LLAMA_ARG_API_KEY=your-bearer-token-here
Useful commands:
# Stop the server
docker compose down
# Restart after config change
docker compose restart
# Update to latest llama.cpp
docker compose pull && docker compose up -d
# View resource usage
docker stats neuralgate-node
Secure your server with a bearer token
Protect your endpoint so only NeuralGate can call it. Generate a random token and configure your server to require it.
# Generate a secure random token
python3 -c "import secrets; print('Bearer', secrets.token_urlsafe(32))"
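If you prefer a reusable script over the one-liner, the same idea as a small function (make_bearer_token is an illustrative name, not part of any NeuralGate tooling):

```python
import secrets

def make_bearer_token(nbytes: int = 32) -> str:
    """Return a URL-safe random token with nbytes of entropy,
    suitable for llama.cpp's --api-key flag."""
    return secrets.token_urlsafe(nbytes)
```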
For llama.cpp, add the token to your launch command:
./llama-server --model /path/to/model.gguf --port 8080 --host 0.0.0.0 \
  --api-key YOUR_GENERATED_TOKEN
Expose your server publicly
NeuralGate needs a public URL to route traffic to your server. If your server already has a public IP, skip this step. Otherwise use a tunnel:
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o cloudflared
chmod +x cloudflared

# Create a quick tunnel (no account needed)
./cloudflared tunnel --url http://localhost:8080
cloudflared prints a public URL like https://random-name.trycloudflare.com. Use that as your endpoint URL.

# Install ngrok from https://ngrok.com/download
# Then:
ngrok http 8080
# Check your public IP
curl ifconfig.me

# Make sure port 8080 is open in your firewall
# For Ubuntu/ufw:
sudo ufw allow 8080/tcp
Your endpoint URL is http://YOUR_IP:8080.

Verify your setup
Before registering, confirm your server passes both checks NeuralGate will run:
# Check 1: Health endpoint
curl https://your-endpoint.com/health
# Expected: {"status":"ok"} or any 200 response
# Check 2: Inference
curl -X POST https://your-endpoint.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"your-model","messages":[{"role":"user","content":"Say NEURALGATE_OK"}],"max_tokens":10}'
# Expected: response containing NEURALGATE_OK
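The same two checks can be scripted end to end. A standard-library sketch (check_hoster is a hypothetical helper; the endpoint, token, and model name are placeholders you fill in):

```python
import json
import urllib.request

def check_hoster(endpoint: str, token: str, model: str,
                 timeout: float = 30.0) -> dict:
    """Run the two checks NeuralGate performs: GET /health must return
    HTTP 200, and POST /v1/chat/completions must echo the marker string."""
    results = {"health": False, "inference": False}
    base = endpoint.rstrip("/")
    # Check 1: health endpoint
    try:
        with urllib.request.urlopen(base + "/health", timeout=timeout) as resp:
            results["health"] = resp.status == 200
    except OSError:
        pass
    # Check 2: inference round trip
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Say NEURALGATE_OK"}],
        "max_tokens": 10,
    }).encode()
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            reply = json.load(resp)
            text = reply["choices"][0]["message"]["content"]
            results["inference"] = "NEURALGATE_OK" in text
    except OSError:
        pass
    return results
```

Both values must come back True before you register.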
Register on NeuralGate
Once your server is running and public, register it. NeuralGate will verify it automatically and start routing traffic.
curl -X POST https://api.computeshare.servequake.com/hosters/register \
-H "Content-Type: application/json" \
-d '{
"name": "Your Name",
"email": "you@example.com",
"endpoint_url": "https://your-endpoint.com",
"api_key": "YOUR_BEARER_TOKEN",
"payout_address": "your-paypal@email.com",
"models": [{
"model_id": "your-model-name",
"model_alias": "My Model",
"price_per_input_token": 100,
"price_per_output_token": 300,
"context_window": 4096,
"max_tokens": 2048
}]
}'
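The same registration call from Python, split so the payload can be built and inspected before sending. A standard-library sketch (the function names and structure are illustrative; the URL and fields match the curl example above):

```python
import json
import urllib.request

API_URL = "https://api.computeshare.servequake.com/hosters/register"

def build_payload(name, email, endpoint_url, api_key, payout_address, models):
    """Assemble the registration body shown in the curl example above."""
    return {
        "name": name,
        "email": email,
        "endpoint_url": endpoint_url,
        "api_key": api_key,
        "payout_address": payout_address,
        "models": models,
    }

def register(payload: dict) -> dict:
    """POST the payload to the registration endpoint and return the
    decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```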
Setting your price
Prices are in microdollars per 1,000 tokens, so a price of 100 works out to $0.10 per million tokens. NeuralGate adds a 20% margin on top: customers pay your price + 20%.
Model size   Suggested input   Suggested output   Typical monthly earnings
────────────────────────────────────────────────────────────────────────────
7B model     100  ($0.10/1M)   300  ($0.30/1M)    $20–80
13B model    200  ($0.20/1M)   600  ($0.60/1M)    $40–150
70B model    500  ($0.50/1M)   1500 ($1.50/1M)    $100–400
397B model   2000 ($2.00/1M)   6000 ($6.00/1M)    $500+
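The margin arithmetic is simple to check: the customer-facing price is your listed price times 1.2. A one-function sketch (rounding to the nearest integer is an assumption, not documented NeuralGate behavior):

```python
def customer_price(hoster_price: int, margin: float = 0.20) -> int:
    """Price the customer pays: the hoster's listed price plus
    NeuralGate's margin (20% per the pricing rules above)."""
    return round(hoster_price * (1 + margin))
```

For the 7B suggestion above, `customer_price(100)` returns 120 for input and `customer_price(300)` returns 360 for output.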
Restrict access to NeuralGate only (recommended)
Prevent customers from bypassing NeuralGate and calling your server directly. Only allow traffic from the NeuralGate gateway IP.
Option A: iptables (Linux)
# Allow only the NeuralGate gateway to reach your server on port 8080.
# iptables resolves a hostname only once, when the rule is inserted,
# so pin the gateway's current IP explicitly:
GATEWAY_IP=$(getent hosts api.computeshare.servequake.com | awk 'NR==1{print $1}')
sudo iptables -A INPUT -p tcp --dport 8080 -s "$GATEWAY_IP" -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8080 -j DROP

# Save rules permanently
sudo iptables-save | sudo tee /etc/iptables/rules.v4
Option B: Docker Compose (bind to localhost, use nginx)
# In docker-compose.yml — only bind to localhost:
ports:
  - "127.0.0.1:8080:8080"   # NOT 0.0.0.0:8080

# Then put nginx in front and allow only NeuralGate's gateway IP.
# Note: nginx's allow directive takes an IP address or CIDR block, not a
# hostname, so resolve api.computeshare.servequake.com first. nginx must
# also listen on a different public port, since 8080 is already taken by
# the localhost bind; register that port in your endpoint URL.
# /etc/nginx/sites-available/llama
server {
    listen 8081;
    allow NEURALGATE_GATEWAY_IP;   # replace with the resolved gateway IP
    deny all;
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
Option C: Cloudflare Tunnel
If you use cloudflared, the tunnel URL is only known to NeuralGate (since you register it with us). Customers can't guess your tunnel URL. This is inherently more secure than a public IP.
Option D: Bearer token only

Your server already requires a bearer token (--api-key). Even without IP restriction, an unknown bearer token prevents unauthorized access.

Ready to start earning?
Your GPUs are sitting idle. Put them to work.
Register as a Hoster →
Credits & Earnings Guide →