Operating GroundShade

Daily operator guide. Configuration, defense levels, fast-lane allowlists, the admin dashboard, the metrics reference, and the tuning table.

This page covers running GroundShade in production. The wire formats and full design live in SPEC.md.

TL;DR

# Plain run with built-in defaults. Forwards to http://127.0.0.1:3000.
groundshade

# With a config file:
groundshade --config /etc/groundshade/config.yaml

The proxy listens on 0.0.0.0:8080. The admin port and dashboard live on 127.0.0.1:9090.

Configuration

YAML, loaded in priority order:

  1. --config <path> flag
  2. GROUNDSHADE_CONFIG env var
  3. /etc/groundshade/config.yaml
  4. ./groundshade.yaml
  5. Built-in defaults (zero-config)
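The lookup order above can be sketched in a few lines of Python. This is an illustration only, not the proxy's code: `resolve_config_path` and `FALLBACK_PATHS` are hypothetical names, and the sketch assumes the two fixed locations count only when the file exists.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Fixed fallback locations, in the order the list above gives them.
FALLBACK_PATHS = ("/etc/groundshade/config.yaml", "./groundshade.yaml")


def resolve_config_path(
    flag_path: Optional[str],
    env: Mapping[str, str] = os.environ,
    exists: Callable[[str], bool] = lambda p: Path(p).is_file(),
) -> Optional[str]:
    """Return the config path to load, or None for built-in defaults."""
    # 1. An explicit --config flag wins outright.
    if flag_path:
        return flag_path
    # 2. Then the GROUNDSHADE_CONFIG environment variable.
    env_path = env.get("GROUNDSHADE_CONFIG")
    if env_path:
        return env_path
    # 3 and 4. Assumption: a fixed location counts only if the file exists.
    for candidate in FALLBACK_PATHS:
        if exists(candidate):
            return candidate
    # 5. Nothing resolved: zero-config mode.
    return None
```

A `None` result corresponds to zero-config mode, where the built-in defaults apply.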

examples/config/full.yaml documents every option at its default. examples/config/minimal.yaml is the smallest useful starting point.

Environment overrides

These env vars override the loaded YAML at startup and on every SIGHUP reload:

| Variable | Purpose |
| --- | --- |
| GROUNDSHADE_CONFIG | Config file path |
| GROUNDSHADE_LISTEN_HTTP | Inbound HTTP listener |
| GROUNDSHADE_LISTEN_ADMIN | Admin listener |
| GROUNDSHADE_ADMIN_TOKEN | Bearer token gating /admin/* and /metrics |
| GROUNDSHADE_UPSTREAM_URL | Default upstream URL |
| GROUNDSHADE_TRUSTED_PROXIES | Comma-separated CIDRs |
| GROUNDSHADE_TRUST_STATE_DIR | Trust-key state directory |
| GROUNDSHADE_TRUST_SECRET | HMAC signing key (hex) |
| GROUNDSHADE_LOG_FORMAT | json or text |
| GROUNDSHADE_LOG_IP_HASH | true hashes client IPs in logs |

The Docker image sets GROUNDSHADE_LISTEN_ADMIN=0.0.0.0:9090 and GROUNDSHADE_TRUST_STATE_DIR=/var/lib/groundshade. Outside Docker, the state directory defaults to $XDG_STATE_HOME/groundshade or $HOME/.local/state/groundshade.
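To predict where trust.key will land on a given host, the state-directory fallback can be sketched as follows. `default_state_dir` is a hypothetical helper, and the sketch assumes GROUNDSHADE_TRUST_STATE_DIR takes precedence over both XDG locations.

```python
from pathlib import Path
from typing import Mapping


def default_state_dir(env: Mapping[str, str]) -> Path:
    """Pick the trust-key state directory from an environment mapping."""
    # Assumption: the explicit override beats both XDG fallbacks.
    override = env.get("GROUNDSHADE_TRUST_STATE_DIR")
    if override:
        return Path(override)
    # $XDG_STATE_HOME/groundshade when the variable is set.
    xdg = env.get("XDG_STATE_HOME")
    if xdg:
        return Path(xdg) / "groundshade"
    # Otherwise $HOME/.local/state/groundshade.
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".local" / "state" / "groundshade"
```

Calling `default_state_dir(os.environ)` under the same user and environment as the service shows which directory to back up or mount.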

Hot reload

SIGHUP re-reads the config file and reapplies environment overrides. The listeners drain gracefully and the proxy restarts on the same PID with the same trust signing key, so outstanding gs_trust cookies stay valid. Defense state, sliding windows, connection counts, and metric counters all reset.

A validation failure logs a warning and keeps the old config running. Bad reloads never take the proxy down.

To catch a bad config before you reload, validate it without starting the server:

groundshade --check-config --config /etc/groundshade/config.yaml
# exit 0 + "config OK" if valid, exit 1 + the error if not

It applies the same env overrides as a real start, so it checks the effective config. Run it as a container pre-flight (docker exec groundshade groundshade --check-config) before the SIGHUP.

The inbound port is unbound briefly during reload (under 100 ms in normal drains, up to 30 s if old connections take their time). For true zero-downtime, run two replicas behind a load balancer.

In zero-config mode (no config file resolves), SIGHUP is logged and ignored.

What it does

Calm state: GroundShade behaves as a normal reverse proxy. No challenge code runs.

Defense state: when a route’s origin pain crosses thresholds (p95 latency or 5xx rate over a 30 s window with at least 50 samples), the route lifts:

| Level | Who sees a challenge (without a trust token) |
| --- | --- |
| L1 | UAs matching l1_ua_patterns; forged-browser clients (UA claims browser, JA4 disagrees); optionally write methods if l1_suspicion_methods is set. v0.7.1’s always_challenge_forged_browser flag promotes the forged-browser arm to fire at every level, including Open. |
| L2 | L1 scope plus thin clients (no Referer, no Accept-Language) |
| L3 | Everyone except the fast lane |

Defaults for l1_ua_patterns: headless, bot, crawl, spider, python, curl, go-http, libwww. l1_suspicion_methods is empty by default; opt in only for write-only APIs that never see a real browser POST.
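The level table can be read as a single scope predicate. The sketch below is a simplified model, not the proxy's implementation. `ClientRequest` and `in_challenge_scope` are hypothetical names, levels use the 0 to 3 encoding of the groundshade_route_level gauge, UA patterns are assumed to match as case-insensitive substrings, and a "thin" client is assumed to lack both headers. The behavioural signals are left out.

```python
from dataclasses import dataclass
from typing import Sequence

DEFAULT_L1_UA_PATTERNS = (
    "headless", "bot", "crawl", "spider", "python", "curl", "go-http", "libwww",
)


@dataclass
class ClientRequest:
    user_agent: str = ""
    method: str = "GET"
    forged_browser: bool = False       # UA claims browser, JA4 disagrees
    has_referer: bool = True
    has_accept_language: bool = True
    has_trust_token: bool = False
    in_fast_lane: bool = False


def in_challenge_scope(
    req: ClientRequest,
    level: int,                         # 0 Open, 1 L1, 2 L2, 3 L3
    ua_patterns: Sequence[str] = DEFAULT_L1_UA_PATTERNS,
    suspicion_methods: Sequence[str] = (),
    always_challenge_forged_browser: bool = False,
) -> bool:
    # The fast lane and a valid trust token always pass.
    if req.in_fast_lane or req.has_trust_token:
        return False
    # The v0.7.1 flag fires the forged-browser arm at every level.
    if always_challenge_forged_browser and req.forged_browser:
        return True
    if level <= 0:
        return False
    if level >= 3:
        return True
    # L1 arms. Assumption: case-insensitive substring match on the UA.
    ua = req.user_agent.lower()
    if (
        any(p in ua for p in ua_patterns)
        or req.forged_browser
        or req.method.upper() in suspicion_methods
    ):
        return True
    # L2 adds thin clients. Assumption: both headers are missing.
    if level >= 2:
        return not req.has_referer and not req.has_accept_language
    return False
```

For example, `in_challenge_scope(ClientRequest(user_agent="curl/8.5.0"), level=1)` is true, while a full browser request at the same level passes.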

Independent of level, two behavioural signals can short-circuit:

  • Rate signal (hard threshold). The 60 s sliding window per (route, IP /24) or (route, JA4) crossed defense.rate_signals.hard_threshold. Default: 1,000 requests.
  • Trustless persistence. The /24 was challenged defense.trustless_persistence.threshold times without ever solving. Default: 20. Sticks until a single solve clears it.

The rate signal’s soft threshold acts as an extra L1 arm on the firing request only. No permanent state change.
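A sliding-window counter with a soft and a hard threshold can be sketched as below. This is an illustrative model only. `RateSignal` is a hypothetical class, and treating "crossed" as strictly greater than the threshold is an assumption. The 200 and 1,000 defaults are the documented soft and hard thresholds.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable


class RateSignal:
    """Sliding-window request counter keyed by (route, prefix) or (route, JA4)."""

    def __init__(self, window_secs: float = 60.0,
                 soft_threshold: int = 200, hard_threshold: int = 1000) -> None:
        self.window_secs = window_secs
        self.soft_threshold = soft_threshold
        self.hard_threshold = hard_threshold
        self._hits: Dict[Hashable, Deque[float]] = defaultdict(deque)

    def observe(self, key: Hashable, now: float) -> str:
        """Record one request and return "ok", "soft", or "hard"."""
        hits = self._hits[key]
        hits.append(now)
        # Drop timestamps that have slid out of the window.
        cutoff = now - self.window_secs
        while hits and hits[0] <= cutoff:
            hits.popleft()
        count = len(hits)
        # Assumption: a threshold is "crossed" when the count exceeds it.
        if count > self.hard_threshold:
            return "hard"   # challenge regardless of level
        if count > self.soft_threshold:
            return "soft"   # extra L1 arm on this request only
        return "ok"
```

A real tracker would also cap the number of keys per route (the tuning table lists defense.rate_signals.max_keys_per_route). The sketch omits eviction for brevity.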

Per-route policy knobs (v0.7.1)

Two config fields let you pin route policy without operator action:

  • defense.escalation.min_level (default open): the lowest level the route is allowed to fall to on cooldown. Setting l1 keeps a sensitive route in active defense even when the detector reports calm. The detector can still escalate above the floor; it just won’t step below it. Useful for admin panels, payment endpoints, and known scrape targets. Unlike shields_up, which pins the route at the top level, min_level sets only a floor and the detector still drives the levels above it.
  • defense.scope.always_challenge_forged_browser (default false): when true, requests classified as ForgedBrowser (browser-shaped UA paired with a script-tool JA4) are placed in challenge scope at every level, including Open. Pairs with the v0.6 rate signal: rate catches volume; this flag catches polite low-rate forgers that fly under the hard threshold. No-op under CF orange-cloud, where every request arrives with CF’s JA4 and the classifier returns Browser for all.

Both are per-route overrides via the routes list, so the same proxy can have a relaxed default route and a strict /admin/* route:

routes:
  - match:
      path: "/admin/*"
    defense:
      escalation:
        min_level: l1
      scope:
        always_challenge_forged_browser: true
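The effect of min_level is a floor under whatever the detector reports. A minimal sketch, assuming the levels are spelled open, l1, l2, l3 in config (only open and l1 appear on this page) and using a hypothetical `effective_level` helper:

```python
# Ordered from calm to strict. shields_up is excluded because an
# operator pins it explicitly rather than the detector reaching it.
LEVELS = ("open", "l1", "l2", "l3")


def effective_level(detector_level: str, min_level: str = "open") -> str:
    """Return the stricter of the detector's level and the route floor."""
    floor = LEVELS.index(min_level)
    wanted = LEVELS.index(detector_level)
    return LEVELS[max(floor, wanted)]
```

With the route above, `effective_level("open", "l1")` gives `"l1"`, and `effective_level("l3", "l1")` stays at `"l3"`.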

Fast lane

The fast lane always bypasses challenges. First match wins, in order:

  1. Client IP matches fastlane.allow_ips (CIDRs, not spoofable when trusted_proxies is set correctly).
  2. Path matches a configured feed glob (/feed, /rss, *.atom, etc.).
  3. User-Agent substring matches fastlane.allow_user_agents (spoofable; pair with allow_ips if the threat model demands).
  4. Authorization: ApiKey id:secret matches a configured key.
  5. UA claims a known crawler (Google, Bing, DuckDuckGo, Apple, optionally Yandex) and the IP passes reverse-DNS plus forward-DNS verification.

fastlane.crawlers.yandex is false by default. The others are on.
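The first-match-wins order can be sketched as one function that returns the matching reason. This is a model for reasoning about allowlists, not the proxy's code. `fast_lane_reason` is a hypothetical name, the reason strings are borrowed from the groundshade_fastlane_total labels, UA matching is assumed case-sensitive, and API-key and crawler verification are passed in as callbacks.

```python
import ipaddress
from fnmatch import fnmatchcase
from typing import Callable, Optional, Sequence


def fast_lane_reason(
    client_ip: str,
    path: str,
    user_agent: str,
    authorization: Optional[str],
    *,
    allow_ips: Sequence[str] = (),
    feed_globs: Sequence[str] = (),
    allow_user_agents: Sequence[str] = (),
    api_key_ok: Callable[[str], bool] = lambda header: False,
    crawler_verified: Callable[[str, str], bool] = lambda ua, ip: False,
) -> Optional[str]:
    """Return the first fast-lane reason that matches, or None."""
    ip = ipaddress.ip_address(client_ip)
    # 1. Client IP inside an allowlisted CIDR.
    if any(ip in ipaddress.ip_network(cidr) for cidr in allow_ips):
        return "ip_allowlist"
    # 2. Path matches a feed glob.
    if any(fnmatchcase(path, glob) for glob in feed_globs):
        return "feed"
    # 3. UA substring (spoofable). Assumption: case-sensitive match.
    if any(needle in user_agent for needle in allow_user_agents):
        return "ua_allowlist"
    # 4. Authorization: ApiKey id:secret, checked by the callback.
    if authorization and authorization.startswith("ApiKey ") and api_key_ok(authorization):
        return "apikey"
    # 5. Known crawler whose IP passed reverse plus forward DNS.
    if crawler_verified(user_agent, client_ip):
        return "crawler"
    return None
```

Because the IP check runs first, a monitor that matches both allow_ips and allow_user_agents is counted under ip_allowlist.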

No-JS passage challenge (opt-in, v0.7.2)

The default HTML interstitial needs JavaScript to solve the proof-of-work. Clients with JS off (Tor Browser on Safer/Safest, NoScript users, text browsers, some accessibility setups) hit the interstitial and have no way through. The passage challenge gives them a path that does not need JavaScript.

It is a friction layer, not a bot detector. It raises cost on no-JS clients (a one-click form, a server-enforced wait, single-use tokens, ip-prefix + JA4 binding, a shorter trust TTL) but a patient client can still pass. The JS proof-of-work stays the stronger proof; do not treat the passage as a replacement.

Enable it per route with challenge.mode:

  • js (default): JS PoW only; no-JS clients dead-end. No change.
  • auto: JS clients solve the PoW as before; no-JS clients get the passage form in the <noscript> branch of the same page.
  • nojs: the passage form is the whole page, no PoW. For routes you know are no-JS (for example an onion address).

routes:
  - match:
      path: "/onion/**"
    challenge:
      # A per-route challenge block REPLACES the global one wholesale, so
      # copy any pow/interstitial settings you rely on into it too.
      mode: auto
      nojs:
        delay_secs: 5
        trust_ttl_secs: 600

The flow: a challenged no-JS client clicks Continue (a form with an invisible honeypot), waits out delay_secs while a CSS-only progress bar fills and the page auto-reloads, then is issued a gs_trust cookie and sent back to where it was (query string preserved). The wait is enforced server-side; there is no JS to gate the button. Tuning lives under challenge.nojs: delay_secs, redeem_window_secs, trust_ttl_secs, max_issued_per_prefix, issue_window_secs. Because the passage is a weaker proof, nojs.trust_ttl_secs must be <= trust.token_ttl_secs (validated at startup).
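The server-side wait and the TTL validation can be modelled as below. Treat this as a sketch under stated assumptions. `passage_state` and `validate_nojs_ttl` are hypothetical helpers, and the reading of redeem_window_secs as "seconds after maturity during which the token can still be redeemed" is a guess, not something this page defines.

```python
def validate_nojs_ttl(nojs_trust_ttl_secs: int, trust_token_ttl_secs: int) -> None:
    """Mirror the startup rule: the passage TTL may not exceed the token TTL."""
    if nojs_trust_ttl_secs > trust_token_ttl_secs:
        raise ValueError(
            "challenge.nojs.trust_ttl_secs must be <= trust.token_ttl_secs"
        )


def passage_state(issued_at: float, now: float,
                  delay_secs: float, redeem_window_secs: float) -> str:
    """Classify a passage token as "wait", "redeem", or "expired"."""
    age = now - issued_at
    # Before maturity the page reloads and the client keeps waiting.
    if age < delay_secs:
        return "wait"
    # Assumption: the redeem window opens when the delay has elapsed.
    if age <= delay_secs + redeem_window_secs:
        return "redeem"
    return "expired"
```

Under this reading, each "wait" result would correspond to one passage_wait_total increment and "expired" to a challenge_failed_total failure with reason expired.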

Watch challenges_issued_total{kind="nojs"}, challenges_solved_total{kind="nojs"}, challenge_failed_total{kind="nojs",reason=...}, and passage_wait_total to see passage traffic and failures.

A note for Tor: behind one exit node many clients share an IP prefix and Tor Browser’s uniform JA4, so the binding is weak there; the single-use token and per-prefix issuance cap carry the anti-abuse weight.

Operator workflows

Engage shields under attack

curl -X POST http://127.0.0.1:9090/admin/shields \
  -H "Authorization: Bearer $GROUNDSHADE_ADMIN_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"level":"up"}'

Add "route": "*|/api/*" to target one route. Disengage with {"level":"down"}.
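For automation that cannot shell out to curl, the same call can be built with Python's standard library. `shields_request` is a hypothetical helper. It only constructs the request, and the response body is not documented on this page, so the sketch does not parse one.

```python
import json
import urllib.request
from typing import Optional


def shields_request(
    level: str,                      # "up" or "down"
    token: str,                      # the admin bearer token
    route: Optional[str] = None,     # e.g. "*|/api/*" to target one route
    base_url: str = "http://127.0.0.1:9090",
) -> urllib.request.Request:
    """Build the POST /admin/shields request without sending it."""
    body = {"level": level}
    if route is not None:
        body["route"] = route
    return urllib.request.Request(
        f"{base_url}/admin/shields",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Send it with `urllib.request.urlopen(shields_request("up", token))` from a host that can reach the admin listener.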

Mint an API key for a partner

secret=$(openssl rand -hex 32)
echo "secret (give to partner): $secret"
cat <<EOF
fastlane:
  api_keys:
    - id: partner-acme
      secret_hash: "$(printf '%s' "$secret" | sha256sum | awk '{print $1}')"
      label: "Acme webhook receiver"
EOF

SIGHUP to reload. The partner sends Authorization: ApiKey partner-acme:<secret> on every request.

Shortcut: make api-key.
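The shell snippet above hashes the hex secret as text. The Python sketch below mints a key the same way and shows one plausible way to check an incoming header against the stored hash. `mint_api_key` and `verify_api_key` are hypothetical helpers, and the verification logic is an assumption about how such a check could work, not the proxy's code.

```python
import hashlib
import hmac
import secrets
from typing import Mapping, Tuple


def mint_api_key() -> Tuple[str, str]:
    """Return (secret, secret_hash), matching the openssl + sha256sum recipe."""
    secret = secrets.token_hex(32)   # 64 hex characters
    secret_hash = hashlib.sha256(secret.encode("ascii")).hexdigest()
    return secret, secret_hash


def verify_api_key(header: str, hashes_by_id: Mapping[str, str]) -> bool:
    """Check an 'ApiKey id:secret' header against stored SHA-256 hashes."""
    scheme, _, credentials = header.partition(" ")
    if scheme != "ApiKey":
        return False
    key_id, sep, secret = credentials.partition(":")
    if not sep:
        return False
    expected = hashes_by_id.get(key_id)
    if expected is None:
        return False
    actual = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected)
```

Only the hash goes into the config file. The secret itself is shown once and handed to the partner.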

Let a non-browser client through

Two paths:

  1. Issue an API key (above). Use this for known clients.

  2. Have the client solve the JSON challenge. Reference solvers ship at examples/solvers/:

    token=$(./examples/solvers/solve.py https://your.site/api/endpoint)
    curl -H "Authorization: ChallengeSolution $token" https://your.site/api/endpoint

Allowlist a monitor

For monitors that can’t send a custom auth header (UptimeRobot free tier, Uptime-Kuma, PageSpeed), use the fast-lane allowlists. Two independent layers; either match bypasses:

fastlane:
  allow_ips:
    - "63.143.42.240/28"      # UptimeRobot range 1
    - "69.162.124.224/28"     # UptimeRobot range 2
  allow_user_agents:
    - "UptimeRobot"
    - "Uptime-Kuma"
    - "PageSpeed"

Security note. UA substring match is forgeable. Treat it as ergonomics, not security. Pair allow_user_agents with allow_ips when the threat model demands it; an attacker then has to defeat both.

The dashboard’s fast-lane section shows hit counts per reason. If a monitor’s counter never moves, either the IP range is wrong or your fronting proxy is stripping the real client IP before it reaches GroundShade.

Read the dashboard

Open http://127.0.0.1:9090/ in a browser. With an admin token configured, you land on /admin/login; submitting the token sets an HttpOnly gs_admin cookie and redirects to the dashboard.

The page polls /admin/status, /admin/routes, and /metrics once per second and renders:

  • A hero showing the worst-route level and a one-line summary of connections, drop rate, and uptime.
  • Per-route cards with a 32 s sparkline of request rate, the shields toggle, and per-route signal tracked-key counts.
  • A traffic proportion bar (forwarded vs challenged) and three challenge funnels with drop/pass rates: browser (JS PoW), API (JSON), and the opt-in no-JS passage (issued, solved, wait, fail).
  • A FAST LANE row with per-reason counters.
  • A CLIENTS row classifying traffic by UA + JA4 family (browser, script, forged, bot, unknown).
  • A SIGNALS row with JA4 state, rate soft/hard hits, trustless hits, and tracked-key counts.
  • A CONNECTIONS row with active connections, refusals, and the self-throttle flag.

Point Prometheus at http://127.0.0.1:9090/metrics with the same Authorization: Bearer <admin-token> header.

Admin endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /admin/status | Version + counts |
| GET | /admin/routes | Per-route defense snapshot |
| POST | /admin/shields | Engage or disengage shields |
| GET | /admin/login | Login form |
| POST | /admin/login | Submit token, get cookie |
| POST | /admin/logout | Drop the cookie |
| GET | /metrics | Prometheus exposition |

Metrics reference

| Metric | Labels | Type | Meaning |
| --- | --- | --- | --- |
| groundshade_requests_total | route, decision, level | counter | Requests processed by decision (forward, challenge_html, challenge_json, reject) and route level |
| groundshade_route_level | route | gauge | Effective level: 0 Open, 1 L1, 2 L2, 3 L3, 4 ShieldsUp |
| groundshade_challenges_issued_total | route, kind | counter | Challenges minted; kind is html, json, or nojs (passage) |
| groundshade_challenges_solved_total | route, kind | counter | Successful solves; same kind set |
| groundshade_challenge_failed_total | route, kind, reason | counter | No-JS passage redemptions that failed (reason: bad_signature, expired, replay, binding, bad_path, honeypot, cap, unauthorized) |
| groundshade_passage_wait_total | route | counter | No-JS passage reloads served before maturity (the wait state) |
| groundshade_tokens_issued_total | route | counter | Trust tokens minted |
| groundshade_fastlane_total | route, reason | counter | Fast-lane hits by reason (apikey, crawler, feed, ip_allowlist, ua_allowlist) |
| groundshade_connections_active | (none) | gauge | Inbound TCP connections held |
| groundshade_connections_rejected_total | reason | counter | Connections refused at the accept layer |
| groundshade_self_throttle | (none) | gauge | 1 while the proxy is in self-throttle |
| groundshade_client_family_total | family | counter | Coarse classification: browser, script, forged_browser, bot, unknown |
| groundshade_signals_immediate_total | route, reason | counter | Challenged on sight by a signal (rate_hard, trustless) |
| groundshade_signals_soft_noisy_total | route | counter | Requests where the rate soft threshold fired (L1 scope arm) |
| groundshade_signals_rate_tracked_keys | route, key | gauge | Distinct keys tracked by the rate signal (key is ip_prefix or ja4) |
| groundshade_signals_trustless_tracked_keys | route | gauge | Distinct prefixes tracked by trustless persistence |
| groundshade_ja4_detected | (none) | gauge | 1 once at least one request has carried JA4 |

Behavioural signals

Both signals run after the fast lane on every non-bypass request.

Rate signal. Per route, a 60 s sliding window keyed in parallel by client prefix (IPv4 /24 or IPv6 /56) and JA4. Crossing soft_threshold (200 req/min) widens the L1 scope check by one arm on that request. Crossing hard_threshold (1,000 req/min) issues a challenge regardless of level.

Trustless persistence. Per route, a per-prefix counter of challenges issued without ever solving. Past 20 (default), the prefix is challenged on sight. One solve clears it. The “ever earned trust” bit is sticky for the entry’s lifetime.
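The trustless counter and its sticky trust bit can be modelled in a few lines. This is a sketch under assumptions. `TrustlessTracker` is a hypothetical class, and it treats reaching the threshold as enough to trip the signal, which is one reading of "past 20".

```python
from typing import Dict, List


class TrustlessTracker:
    """Per-prefix count of challenges issued with no solve so far."""

    def __init__(self, threshold: int = 20) -> None:
        self.threshold = threshold
        # prefix -> [unsolved_challenges, ever_earned_trust]
        self._entries: Dict[str, List] = {}

    def record_challenge(self, prefix: str) -> None:
        entry = self._entries.setdefault(prefix, [0, False])
        # Once trust has been earned, the entry no longer accumulates.
        if not entry[1]:
            entry[0] += 1

    def record_solve(self, prefix: str) -> None:
        # One solve clears the count and sets the sticky trust bit.
        self._entries[prefix] = [0, True]

    def challenge_on_sight(self, prefix: str) -> bool:
        count, ever_trusted = self._entries.get(prefix, (0, False))
        # Assumption: the signal trips once the count reaches the threshold.
        return (not ever_trusted) and count >= self.threshold
```

The sticky bit explains why a prefix that solved once does not trip the signal again until its entry ages out of the tracker.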

Note on prefixes: trust-token IP binding uses v4 /24 and v6 /48 by default. The rate and trustless signals use v4 /24 and v6 /56. The v6 numbers differ on purpose; signals use RIPE’s recommended customer-prefix size.
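To see which key a given client address maps to, the prefix truncation can be reproduced with Python's ipaddress module. `prefix_key` is a hypothetical helper. The defaults follow the signal sizes above, and passing `v6_bits=48` gives the trust-token binding size instead.

```python
import ipaddress


def prefix_key(ip: str, v4_bits: int = 24, v6_bits: int = 56) -> str:
    """Truncate an address to its tracking prefix, e.g. 203.0.113.0/24."""
    addr = ipaddress.ip_address(ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False masks off the host bits instead of raising.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))
```

Two IPv6 clients can therefore share a /48 trust binding while being counted as separate /56 prefixes by the signals.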

SEO safety invariant

Both signals are consulted only after the fast lane. Verified crawlers, operator allowlists, feed paths, and API keys never reach the signal evaluator, so a misconfigured threshold cannot accidentally challenge a search-engine crawler. Tests in crates/groundshade-proxy/tests/e2e_signals_seo.rs lock the invariant.

JA4 availability

The proxy auto-detects whether your fronting proxy is forwarding X-JA4. After 100 requests (or 60 s), if no JA4 has arrived, it logs a single WARN and the per-JA4 arm goes silent. The per-IP/24 arm and trustless persistence keep working. The dashboard’s signals row shows the current state.
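The detection rule (warn once after 100 requests or 60 s without JA4) can be sketched as a small state machine. `Ja4Detector` is a hypothetical class. The sketch assumes the clock starts at the first request and that a JA4 arriving after the warning still flips the state to detected.

```python
from typing import Callable, Optional


class Ja4Detector:
    """Track whether any request has carried X-JA4; warn once if none has."""

    def __init__(self, max_requests: int = 100, max_secs: float = 60.0,
                 warn: Callable[[str], None] = print) -> None:
        self.max_requests = max_requests
        self.max_secs = max_secs
        self.warn = warn
        self.detected = False      # mirrors the groundshade_ja4_detected gauge
        self.warned = False
        self._seen = 0
        self._started: Optional[float] = None

    def observe(self, has_ja4: bool, now: float) -> None:
        # Assumption: the 60 s clock starts at the first observed request.
        if self._started is None:
            self._started = now
        self._seen += 1
        if has_ja4:
            self.detected = True
            return
        if self.detected or self.warned:
            return
        too_many = self._seen >= self.max_requests
        too_long = now - self._started >= self.max_secs
        if too_many or too_long:
            self.warned = True     # log a single WARN, never repeat it
            self.warn("WARN: no X-JA4 seen; per-JA4 rate arm is silent")
```

If the warning appears in production, check that the fronting proxy computes JA4 and forwards it in the X-JA4 header.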

Tuning

| Symptom | Knob | Direction |
| --- | --- | --- |
| Solves too slow on phones | challenge.pow.leading_zero_bits | Lower (16–17) |
| Bots find solving cheap | challenge.pow.leading_zero_bits | Raise (19–20) |
| False positives at L1 | defense.scope.l1_ua_patterns | Trim |
| Headless setups pass too easily | challenge.probe.min_score | Raise from 0 to 5 or 10 |
| Origin still hot under heavy traffic | defense.trigger.p95_latency_ms / err5xx_rate | Lower |
| Proxy running out of FDs | selfdef.max_connections_total | Raise (after kernel ulimit) |
| connections_rejected_total{reason="per_ip"} climbing behind a fronting proxy | listen.trusted_proxies | Add the proxy’s pinned /32 |
| Sensitive route (admin, payment, scrape target) should stay in defense | defense.escalation.min_level | Set to l1 (or higher) on that route |
| Forged-browser traffic walks through at Open level | defense.scope.always_challenge_forged_browser | Set true per route (no-op behind CF orange-cloud) |
| Real users tripping rate signal on rich pages | defense.rate_signals.soft_threshold | Raise |
| Polite scrapers slipping past | defense.rate_signals.hard_threshold | Lower (500–800) |
| Trustless flagging real users with stale cookies | defense.trustless_persistence.threshold | Raise (30–50) |
| Memory budget tight | defense.rate_signals.max_keys_per_route | Lower (20,000) |

Logs

JSON to stdout. Default fields: method, host, path, status, duration, JA4, UA, and client_ip_hash. Client IPs are hashed with a daily-rotated salt. To log raw IPs, set observe.log_ip_hash: false and accept the GDPR responsibility.

Persistent state

Only one file:

  • state_dir/trust.key: 32 bytes, mode 0600. The HMAC signing key for trust tokens. Persists across restarts so outstanding cookies survive a redeploy.

No DB, no Redis, no on-disk logs unless you redirect stdout.

To rotate the key and invalidate every outstanding cookie, delete the file and restart. Or run shields-up with a fresh GROUNDSHADE_TRUST_SECRET.