Hoster Setup Guide
Everything you need to configure your GPU server and register with NeuralGate. Takes about 15 minutes.
Quick Setup Tool (recommended)
Download and run the NeuralGate hoster client. It checks your setup, generates credentials, and gives you the exact registration command.
⬇ Download neuralgate-hoster.py

python3 neuralgate-hoster.py
Run an OpenAI-compatible LLM server
NeuralGate works with any server that exposes /health and /v1/chat/completions. Choose your preferred backend:
# Download and build the llama.cpp server
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j$(nproc) llama-server

# Run with a model (adjust path and model name)
./llama-server \
  --model /path/to/your/model.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  --ctx-size 4096 \
  --n-gpu-layers 99
# Install vLLM
pip install vllm

# Run with a Hugging Face model
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3-8B-Instruct \
  --port 8080 \
  --host 0.0.0.0
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model and run
ollama pull llama3
ollama serve  # Runs on port 11434 by default

# Note: Ollama uses /api/chat, not /v1/chat/completions.
# Use the litellm bridge for OpenAI compatibility:
pip install litellm
litellm --model ollama/llama3 --port 8080
Docker Compose setup
The recommended setup for GPU servers. One command starts llama.cpp with CUDA support, auto-restarts on crash, and persists your model files.
1. Create your project directory:
mkdir neuralgate-node && cd neuralgate-node
mkdir models # your GGUF models go here
2. Create docker-compose.yml:
version: "3.8"
services:
  llama-server:
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    container_name: neuralgate-node
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    environment:
      - LLAMA_ARG_MODEL=/models/your-model.gguf
      - LLAMA_ARG_CTX_SIZE=8192
      - LLAMA_ARG_N_GPU_LAYERS=99
      - LLAMA_ARG_PORT=8080
      - LLAMA_ARG_HOST=0.0.0.0
      - LLAMA_ARG_API_KEY=your-bearer-token-here
      - LLAMA_ARG_CONT_BATCHING=1
      - LLAMA_ARG_FLASH_ATTN=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
3. Place your GGUF model in ./models/, then start:
# Start the server
docker compose up -d
# Check logs
docker compose logs -f
# Check health
curl http://localhost:8080/health
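Large models can take a minute or two to load, during which /health will not yet respond. If you want to wait programmatically instead of retrying curl by hand, here is a small polling helper, sketched in Python (the attempt count and delay are illustrative, not NeuralGate requirements):

```python
import time
import urllib.request

def wait_for_health(url: str, attempts: int = 30, delay: float = 5.0) -> bool:
    """Poll a /health endpoint until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet (connection refused, timeout, 5xx, ...)
        time.sleep(delay)
    return False
```

Calling `wait_for_health("http://localhost:8080/health")` returns True once the model has finished loading, or False after all attempts fail.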
No NVIDIA GPU? Remove the deploy.resources section and change to LLAMA_ARG_N_GPU_LAYERS=0 for CPU-only inference. CPU-only compose file:
version: "3.8"
services:
  llama-server:
    image: ghcr.io/ggerganov/llama.cpp:server
    container_name: neuralgate-node
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    environment:
      - LLAMA_ARG_MODEL=/models/your-model.gguf
      - LLAMA_ARG_CTX_SIZE=4096
      - LLAMA_ARG_N_GPU_LAYERS=0
      - LLAMA_ARG_THREADS=4
      - LLAMA_ARG_PORT=8080
      - LLAMA_ARG_HOST=0.0.0.0
      - LLAMA_ARG_API_KEY=your-bearer-token-here
Useful commands:
# Stop the server
docker compose down
# Restart after config change
docker compose restart
# Update to latest llama.cpp
docker compose pull && docker compose up -d
# View resource usage
docker stats neuralgate-node
Secure your server with a bearer token
Protect your endpoint so only NeuralGate can call it. Generate a random token and configure your server to require it.
# Generate a secure random token
python3 -c "import secrets; print('Bearer', secrets.token_urlsafe(32))"
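If you prefer a reusable script over the one-liner, the same idea as a small function (make_bearer_token is an illustrative name, not part of any NeuralGate tooling):

```python
import secrets

def make_bearer_token(nbytes: int = 32) -> str:
    """Return a URL-safe random token with nbytes of entropy,
    suitable for llama.cpp's --api-key flag."""
    return secrets.token_urlsafe(nbytes)
```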
For llama.cpp, add the token to your launch command:
./llama-server --model /path/to/model.gguf --port 8080 --host 0.0.0.0 \
  --api-key YOUR_GENERATED_TOKEN
Expose your server publicly
NeuralGate needs a public URL to route traffic to your server. If your server already has a public IP, skip this step. Otherwise use a tunnel:
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o cloudflared
chmod +x cloudflared

# Create a quick tunnel (no account needed)
./cloudflared tunnel --url http://localhost:8080
cloudflared prints a public URL like https://random-name.trycloudflare.com. Use that as your endpoint URL.

# Install ngrok from https://ngrok.com/download
# Then:
ngrok http 8080
# Check your public IP
curl ifconfig.me

# Make sure port 8080 is open in your firewall
# For Ubuntu/ufw:
sudo ufw allow 8080/tcp
Your endpoint URL is http://YOUR_IP:8080.

Verify your setup
Before registering, confirm your server passes both checks NeuralGate will run:
# Check 1: Health endpoint
curl https://your-endpoint.com/health
# Expected: {"status":"ok"} or any 200 response
# Check 2: Inference
curl -X POST https://your-endpoint.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"your-model","messages":[{"role":"user","content":"Say NEURALGATE_OK"}],"max_tokens":10}'
# Expected: response containing NEURALGATE_OK
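The same two checks can be scripted end to end. A standard-library sketch (check_hoster is a hypothetical helper; the endpoint, token, and model name are placeholders you fill in):

```python
import json
import urllib.request

def check_hoster(endpoint: str, token: str, model: str,
                 timeout: float = 30.0) -> dict:
    """Run the two checks NeuralGate performs: GET /health must return
    HTTP 200, and POST /v1/chat/completions must echo the marker string."""
    results = {"health": False, "inference": False}
    base = endpoint.rstrip("/")
    # Check 1: health endpoint
    try:
        with urllib.request.urlopen(base + "/health", timeout=timeout) as resp:
            results["health"] = resp.status == 200
    except OSError:
        pass
    # Check 2: inference round trip
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Say NEURALGATE_OK"}],
        "max_tokens": 10,
    }).encode()
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            reply = json.load(resp)
            text = reply["choices"][0]["message"]["content"]
            results["inference"] = "NEURALGATE_OK" in text
    except OSError:
        pass
    return results
```

Both values must come back True before you register.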
Register on NeuralGate
Once your server is running and public, register it. NeuralGate will verify it automatically and start routing traffic.
curl -X POST https://api.computeshare.servequake.com/hosters/register \
-H "Content-Type: application/json" \
-d '{
"name": "Your Name",
"email": "you@example.com",
"endpoint_url": "https://your-endpoint.com",
"api_key": "YOUR_BEARER_TOKEN",
"payout_address": "your-paypal@email.com",
"models": [{
"model_id": "your-model-name",
"model_alias": "My Model",
"price_per_input_token": 100,
"price_per_output_token": 300,
"context_window": 4096,
"max_tokens": 2048
}]
}'
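The same registration call from Python, split so the payload can be built and inspected before sending. A standard-library sketch (the function names and structure are illustrative; the URL and fields match the curl example above):

```python
import json
import urllib.request

API_URL = "https://api.computeshare.servequake.com/hosters/register"

def build_payload(name, email, endpoint_url, api_key, payout_address, models):
    """Assemble the registration body shown in the curl example above."""
    return {
        "name": name,
        "email": email,
        "endpoint_url": endpoint_url,
        "api_key": api_key,
        "payout_address": payout_address,
        "models": models,
    }

def register(payload: dict) -> dict:
    """POST the payload to the registration endpoint and return the
    decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```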
Setting your price
Prices are in microdollars per 1,000 tokens, so a price of 100 works out to $0.10 per million tokens. NeuralGate adds a 20% margin on top: customers pay your price + 20%.
Model size   Suggested input   Suggested output   Typical monthly earnings
────────────────────────────────────────────────────────────────────────────
7B model     100  ($0.10/1M)   300  ($0.30/1M)    $20–80
13B model    200  ($0.20/1M)   600  ($0.60/1M)    $40–150
70B model    500  ($0.50/1M)   1500 ($1.50/1M)    $100–400
397B model   2000 ($2.00/1M)   6000 ($6.00/1M)    $500+
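The margin arithmetic is simple to check: the customer-facing price is your listed price times 1.2. A one-function sketch (rounding to the nearest integer is an assumption, not documented NeuralGate behavior):

```python
def customer_price(hoster_price: int, margin: float = 0.20) -> int:
    """Price the customer pays: the hoster's listed price plus
    NeuralGate's margin (20% per the pricing rules above)."""
    return round(hoster_price * (1 + margin))
```

For the 7B suggestion above, `customer_price(100)` returns 120 for input and `customer_price(300)` returns 360 for output.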
Restrict access to NeuralGate only (recommended)
Prevent customers from bypassing NeuralGate and calling your server directly. Only allow traffic from the NeuralGate gateway IP.
Option A: iptables (Linux)
# Allow only the NeuralGate gateway to reach your server on port 8080.
# iptables resolves a hostname only once, when the rule is inserted,
# so pin the gateway's current IP explicitly:
GATEWAY_IP=$(getent hosts api.computeshare.servequake.com | awk 'NR==1{print $1}')
sudo iptables -A INPUT -p tcp --dport 8080 -s "$GATEWAY_IP" -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8080 -j DROP

# Save rules permanently
sudo iptables-save | sudo tee /etc/iptables/rules.v4
Option B: Docker Compose (bind to localhost, use nginx)
# In docker-compose.yml — only bind to localhost:
ports:
  - "127.0.0.1:8080:8080"   # NOT 0.0.0.0:8080

# Then put nginx in front and allow only NeuralGate's gateway IP.
# Note: nginx's allow directive takes an IP address or CIDR block, not a
# hostname, so resolve api.computeshare.servequake.com first. nginx must
# also listen on a different public port, since 8080 is already taken by
# the localhost bind; register that port in your endpoint URL.
# /etc/nginx/sites-available/llama
server {
    listen 8081;
    allow NEURALGATE_GATEWAY_IP;   # replace with the resolved gateway IP
    deny all;
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
Option C: Cloudflare Tunnel
If you use cloudflared, the tunnel URL is only known to NeuralGate (since you register it with us). Customers can't guess your tunnel URL. This is inherently more secure than a public IP.
Option D: Bearer token only

Your server already requires a bearer token (--api-key). Even without IP restriction, an unknown bearer token prevents unauthorized access.

Ready to start earning?
Your GPUs are sitting idle. Put them to work.
Register as a Hoster →
Credits & Earnings Guide →