# Operating GroundShade

> Daily operator guide. Configuration, defense levels, fast-lane allowlists, the admin dashboard, the metrics reference, and the tuning table.

This page covers running GroundShade in production. The wire formats
and full design live in
[SPEC.md](https://codeberg.org/groundshade/groundshade/src/branch/main/SPEC.md).

## TL;DR

```sh
# Plain run with built-in defaults. Forwards to http://127.0.0.1:3000.
groundshade

# With a config file:
groundshade --config /etc/groundshade/config.yaml
```

The proxy listens on `0.0.0.0:8080`. The admin port and dashboard
live on `127.0.0.1:9090`.

## Configuration

YAML, loaded in priority order:

1. `--config <path>` flag
2. `GROUNDSHADE_CONFIG` env var
3. `/etc/groundshade/config.yaml`
4. `./groundshade.yaml`
5. Built-in defaults (zero-config)
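
As a rough illustration of that order, here is a minimal Python sketch. It is not GroundShade's loader: `resolve_config_path` and its `exists` parameter are hypothetical names, and it assumes the flag and the env var are taken as given while the two fixed paths only count if the file exists.

```python
import os
from typing import Callable, Mapping, Optional

# Fixed fallback locations, in the order listed above.
FALLBACK_PATHS = ("/etc/groundshade/config.yaml", "./groundshade.yaml")


def resolve_config_path(
    cli_path: Optional[str],
    env: Mapping[str, str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the config path that wins, or None for built-in defaults."""
    if cli_path:                            # 1. --config <path>
        return cli_path
    env_path = env.get("GROUNDSHADE_CONFIG")
    if env_path:                            # 2. GROUNDSHADE_CONFIG
        return env_path
    for candidate in FALLBACK_PATHS:        # 3. and 4. fixed locations
        if exists(candidate):               # assumption: skipped when absent
            return candidate
    return None                             # 5. zero-config
```

A `None` result corresponds to zero-config mode.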

[`examples/config/full.yaml`](https://codeberg.org/groundshade/groundshade/src/branch/main/examples/config/full.yaml)
documents every option at its default.
[`examples/config/minimal.yaml`](https://codeberg.org/groundshade/groundshade/src/branch/main/examples/config/minimal.yaml)
is the smallest useful starting point.

### Environment overrides

These env vars override the loaded YAML at startup and on every
`SIGHUP` reload:

| Variable | Purpose |
|---|---|
| `GROUNDSHADE_CONFIG` | Config file path |
| `GROUNDSHADE_LISTEN_HTTP` | Inbound HTTP listener |
| `GROUNDSHADE_LISTEN_ADMIN` | Admin listener |
| `GROUNDSHADE_ADMIN_TOKEN` | Bearer token gating `/admin/*` and `/metrics` |
| `GROUNDSHADE_UPSTREAM_URL` | Default upstream URL |
| `GROUNDSHADE_TRUSTED_PROXIES` | Comma-separated CIDRs |
| `GROUNDSHADE_TRUST_STATE_DIR` | Trust-key state directory |
| `GROUNDSHADE_TRUST_SECRET` | HMAC signing key (hex) |
| `GROUNDSHADE_LOG_FORMAT` | `json` or `text` |
| `GROUNDSHADE_LOG_IP_HASH` | `true` hashes client IPs in logs |

The Docker image sets `GROUNDSHADE_LISTEN_ADMIN=0.0.0.0:9090` and
`GROUNDSHADE_TRUST_STATE_DIR=/var/lib/groundshade`. Outside Docker,
the state directory defaults to `$XDG_STATE_HOME/groundshade` or
`$HOME/.local/state/groundshade`.

### Hot reload

`SIGHUP` re-reads the config file and reapplies environment
overrides. The listeners drain gracefully and the proxy restarts on
the same PID with the same trust signing key, so outstanding
`gs_trust` cookies stay valid. Defense state, sliding windows,
connection counts, and metric counters all reset.

A validation failure logs a warning and keeps the old config
running. Bad reloads never take the proxy down.

To catch a bad config before you reload, validate it without starting
the server:

```sh
groundshade --check-config --config /etc/groundshade/config.yaml
# exit 0 + "config OK" if valid, exit 1 + the error if not
```

It applies the same env overrides as a real start, so it checks the
effective config. Run it as a container pre-flight
(`docker exec groundshade groundshade --check-config`) before the SIGHUP.

The inbound port is unbound briefly during reload (under 100 ms in
normal drains, up to 30 s if old connections take their time). For
true zero-downtime, run two replicas behind a load balancer.

In zero-config mode (no config file resolves), SIGHUP is logged and
ignored.

## What it does

Calm state: GroundShade behaves as a normal reverse proxy. No
challenge code runs.

Defense state: when a route's origin pain crosses its thresholds (p95
latency or 5xx rate over a 30 s window with at least 50 samples),
the route escalates through these levels:

| Level | Who sees a challenge (without a trust token) |
|---|---|
| L1 | UAs matching `l1_ua_patterns`; forged-browser clients (UA claims browser, JA4 disagrees); optionally write methods if `l1_suspicion_methods` is set. v0.7.1's `always_challenge_forged_browser` flag promotes the forged-browser arm to fire at every level, including Open. |
| L2 | L1 scope plus thin clients (no `Referer`, no `Accept-Language`) |
| L3 | Everyone except the fast lane |

Defaults for `l1_ua_patterns`: `headless`, `bot`, `crawl`, `spider`,
`python`, `curl`, `go-http`, `libwww`. `l1_suspicion_methods` is
empty by default; opt in only for write-only APIs that never see a
real browser POST.
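
To make the level table concrete, here is a minimal Python sketch of the scope decision. It is an approximation, not the proxy's code: `in_challenge_scope` is a hypothetical name, headers are assumed to be a dict with lowercase keys, UA matching is assumed to be a case-insensitive substring test, and "thin client" is read as both `Referer` and `Accept-Language` missing. Trust tokens, the fast lane, and the behavioural signals are left out.

```python
DEFAULT_L1_UA_PATTERNS = (
    "headless", "bot", "crawl", "spider",
    "python", "curl", "go-http", "libwww",
)


def in_challenge_scope(level, method, headers, forged_browser=False,
                       ua_patterns=DEFAULT_L1_UA_PATTERNS,
                       suspicion_methods=(),
                       always_challenge_forged=False):
    """Return True if a request without a trust token would be challenged.

    level: 0 = Open, 1 = L1, 2 = L2, 3 = L3.
    """
    # With always_challenge_forged_browser, the forged arm fires at any level.
    if forged_browser and always_challenge_forged:
        return True
    if level <= 0:
        return False
    if level >= 3:
        return True  # everyone except the fast lane

    ua = headers.get("user-agent", "").lower()
    l1_hit = (
        any(pattern in ua for pattern in ua_patterns)
        or forged_browser
        or method in suspicion_methods
    )
    if l1_hit:
        return True
    if level == 2:
        # Assumption: "thin" means both headers are absent.
        return "referer" not in headers and "accept-language" not in headers
    return False
```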

Independent of level, two behavioural signals can trigger a challenge
on their own:

- **Rate signal (hard threshold).** The 60 s sliding window per
  `(route, IP prefix)` or `(route, JA4)` crossed
  `defense.rate_signals.hard_threshold`. Default: 1,000 requests.
  The prefix is IPv4 `/24` or IPv6 `/56`.
- **Trustless persistence.** The client prefix was challenged
  `defense.trustless_persistence.threshold` times without ever
  solving. Default: 20. Sticks until a single solve clears it.

The rate signal's soft threshold acts as an extra L1 arm on the
firing request only. No permanent state change.

### Per-route policy knobs (v0.7.1)

Two config fields let you pin route policy without operator action:

- **`defense.escalation.min_level`** (default `open`): the level the
  route is allowed to *fall to* on cooldown. Setting `l1` keeps a
  sensitive route in active defense even when the detector reports
  calm. The detector can still escalate above; it just won't step
  below. Useful for admin panels, payment endpoints, and known scrape
  targets. The difference from `shields_up` is that the detector still
  drives the upper levels; `shields_up` pins the route at the top level
  regardless of what the detector reports.
- **`defense.scope.always_challenge_forged_browser`** (default
  `false`): when `true`, requests classified as ForgedBrowser
  (browser-shaped UA paired with a script-tool JA4) are placed in
  challenge scope at every level, including Open. Pairs with the v0.6
  rate signal: rate catches volume; this flag catches polite low-rate
  forgers that fly under the hard threshold. No-op under CF
  orange-cloud, where every request arrives with CF's JA4 and the
  classifier returns Browser for all.

Both are per-route overrides via the `routes` list, so the same proxy
can have a relaxed default route and a strict `/admin/*` route:

```yaml
routes:
  - match:
      path: "/admin/*"
    defense:
      escalation:
        min_level: l1
      scope:
        always_challenge_forged_browser: true
```
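
One way to picture `min_level` is as a floor under whatever the detector reports. The Python sketch below assumes that reading. `effective_level` is a hypothetical helper, and the `l2`/`l3` spellings are guesses extrapolated from `open`, `l1`, and `shields_up`.

```python
# Ordered from least to most strict.
LEVELS = ("open", "l1", "l2", "l3", "shields_up")


def effective_level(detector_level, min_level="open", shields_up=False):
    """Return the level a route runs at, given the detector's view."""
    if shields_up:
        # Operator override: pinned at the top regardless of the detector.
        return "shields_up"
    # The detector may escalate above min_level but never drops below it.
    return max(detector_level, min_level, key=LEVELS.index)
```

With `min_level: l1`, a calm detector (`open`) still yields `l1`, while a detector at `l3` stays at `l3`.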

### Fast lane

The fast lane always bypasses challenges. First match wins, in
order:

1. Client IP matches `fastlane.allow_ips` (CIDRs, not spoofable when
   `trusted_proxies` is set correctly).
2. Path matches a configured feed glob (`/feed`, `/rss`, `*.atom`,
   etc.).
3. `User-Agent` substring matches `fastlane.allow_user_agents`
   (spoofable; pair with `allow_ips` if the threat model demands).
4. `Authorization: ApiKey id:secret` matches a configured key.
5. UA claims a known crawler (Google, Bing, DuckDuckGo, Apple,
   optionally Yandex) **and** the IP passes reverse-DNS plus
   forward-DNS verification.

`fastlane.crawlers.yandex` is `false` by default. The others are on.
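
The first-match-wins order can be sketched in Python as below. This is illustrative only: `fastlane_reason` and the `feed_globs` key are hypothetical names, UA matching is assumed to be case-sensitive, and API-key and crawler verification are passed in as precomputed booleans. The returned strings reuse the `reason` values from the metrics reference.

```python
import fnmatch
import ipaddress


def fastlane_reason(client_ip, path, user_agent, cfg,
                    api_key_valid=False, crawler_verified=False):
    """Return the first fast-lane reason that matches, or None."""
    ip = ipaddress.ip_address(client_ip)

    # 1. IP allowlist (CIDRs).
    if any(ip in ipaddress.ip_network(cidr)
           for cidr in cfg.get("allow_ips", ())):
        return "ip_allowlist"
    # 2. Feed path globs.
    if any(fnmatch.fnmatchcase(path, glob)
           for glob in cfg.get("feed_globs", ())):
        return "feed"
    # 3. User-Agent substrings (spoofable).
    if any(needle in user_agent
           for needle in cfg.get("allow_user_agents", ())):
        return "ua_allowlist"
    # 4. API key, verified elsewhere.
    if api_key_valid:
        return "apikey"
    # 5. Known crawler that passed reverse-DNS plus forward-DNS checks.
    if crawler_verified:
        return "crawler"
    return None
```

Because the IP check runs first, a monitor that matches both `allow_ips` and `allow_user_agents` is counted under `ip_allowlist`.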

### No-JS passage challenge (opt-in, v0.7.2)

The default HTML interstitial needs JavaScript to solve the
proof-of-work. Clients with JS off (Tor Browser on Safer/Safest,
NoScript users, text browsers, some accessibility setups) hit the
interstitial and have no way through. The passage challenge gives them
a path that does not need JavaScript.

It is a **friction layer, not a bot detector**. It raises cost on no-JS
clients (a one-click form, a server-enforced wait, single-use tokens,
ip-prefix + JA4 binding, a shorter trust TTL) but a patient client can
still pass. The JS proof-of-work stays the stronger proof; do not treat
the passage as a replacement.

Enable it per route with `challenge.mode`:

- `js` (default): JS PoW only; no-JS clients dead-end. This is the
  pre-v0.7.2 behavior.
- `auto`: JS clients solve the PoW as before; no-JS clients get the
  passage form in the `<noscript>` branch of the same page.
- `nojs`: the passage form is the whole page, no PoW. For routes you
  know are no-JS (for example an onion address).

```yaml
routes:
  - match:
      path: "/onion/**"
    challenge:
      # A per-route challenge block REPLACES the global one wholesale, so
      # copy any pow/interstitial settings you rely on into it too.
      mode: auto
      nojs:
        delay_secs: 5
        trust_ttl_secs: 600
```

The flow: a challenged no-JS client clicks Continue (a form with an
invisible honeypot), waits out `delay_secs` while a CSS-only progress
bar fills and the page auto-reloads, then is issued a `gs_trust` cookie
and sent back to where it was (query string preserved). The wait is
enforced server-side; there is no JS to gate the button. Tuning lives
under `challenge.nojs`: `delay_secs`, `redeem_window_secs`,
`trust_ttl_secs`, `max_issued_per_prefix`, `issue_window_secs`. Because
the passage is a weaker proof, `nojs.trust_ttl_secs` must be `<=
trust.token_ttl_secs` (validated at startup).
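
The timing rules can be sketched in a few lines of Python. This is a simplified model, not the proxy's implementation: both function names are hypothetical, the redeem window is assumed to start once the wait has elapsed, and signature, binding, honeypot, and per-prefix cap checks are omitted. The `"wait"`, `"expired"`, and `"replay"` outcomes loosely mirror the wait state and failure reasons in the metrics reference.

```python
def validate_nojs_ttl(nojs_trust_ttl_secs, token_ttl_secs):
    """Startup check: the passage must not grant longer trust than the PoW."""
    if nojs_trust_ttl_secs > token_ttl_secs:
        raise ValueError(
            "challenge.nojs.trust_ttl_secs must be <= trust.token_ttl_secs"
        )


def redeem_passage(token_id, issued_at, now, used_tokens,
                   delay_secs, redeem_window_secs):
    """Classify one redemption attempt; times are seconds on one clock."""
    if token_id in used_tokens:
        return "replay"                  # single-use: a second try fails
    matures_at = issued_at + delay_secs
    if now < matures_at:
        return "wait"                    # server-enforced delay not over yet
    if now > matures_at + redeem_window_secs:
        return "expired"                 # assumption: window starts at maturity
    used_tokens.add(token_id)
    return "ok"                          # caller would now issue gs_trust
```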

Watch `groundshade_challenges_issued_total{kind="nojs"}`,
`groundshade_challenges_solved_total{kind="nojs"}`,
`groundshade_challenge_failed_total{kind="nojs",reason=...}`, and
`groundshade_passage_wait_total` to see passage traffic and failures.

A note for Tor: behind one exit node many clients share an IP prefix
and Tor Browser's uniform JA4, so the binding is weak there; the
single-use token and per-prefix issuance cap carry the anti-abuse
weight.

## Operator workflows

### Engage shields under attack

```sh
curl -X POST http://127.0.0.1:9090/admin/shields \
  -H "Authorization: Bearer $GROUNDSHADE_ADMIN_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"level":"up"}'
```

Add `"route": "*|/api/*"` to target one route. Disengage with
`{"level":"down"}`.

### Mint an API key for a partner

```sh
secret=$(openssl rand -hex 32)
echo "secret (give to partner): $secret"
cat <<EOF
fastlane:
  api_keys:
    - id: partner-acme
      secret_hash: "$(printf '%s' "$secret" | sha256sum | awk '{print $1}')"
      label: "Acme webhook receiver"
EOF
```

`SIGHUP` to reload. The partner sends
`Authorization: ApiKey partner-acme:<secret>` on every request.

Shortcut: `make api-key`.
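
For reference, the shell snippet above stores the hex SHA-256 of the secret. A Python sketch of the matching check could look like this. `check_api_key` and the `keys` mapping (id to `secret_hash`) are hypothetical names; the proxy's actual parsing rules may differ.

```python
import hashlib
import hmac


def secret_hash(secret: str) -> str:
    """Hex SHA-256 of the secret, the same value `sha256sum` prints."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def check_api_key(authorization: str, keys: dict) -> bool:
    """Validate an `Authorization: ApiKey id:secret` header value."""
    scheme, _, credentials = authorization.partition(" ")
    if scheme != "ApiKey":
        return False
    key_id, sep, secret = credentials.partition(":")
    if not sep:
        return False
    expected = keys.get(key_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(secret_hash(secret), expected)
```

Only the hash lives in the config, so a leaked config file does not reveal the partner's secret.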

### Let a non-browser client through

Two paths:

1. Issue an API key (above). Use this for known clients.
2. Have the client solve the JSON challenge. Reference solvers ship
   at
   [`examples/solvers/`](https://codeberg.org/groundshade/groundshade/src/branch/main/examples/solvers):

   ```sh
   token=$(./examples/solvers/solve.py https://your.site/api/endpoint)
   curl -H "Authorization: ChallengeSolution $token" https://your.site/api/endpoint
   ```

### Allowlist a monitor

For monitors that can't send a custom auth header (UptimeRobot free
tier, Uptime-Kuma, PageSpeed), use the fast-lane allowlists. Two
independent layers; either match bypasses:

```yaml
fastlane:
  allow_ips:
    - "63.143.42.240/28"      # UptimeRobot range 1
    - "69.162.124.224/28"     # UptimeRobot range 2
  allow_user_agents:
    - "UptimeRobot"
    - "Uptime-Kuma"
    - "PageSpeed"
```

> **Security note.** UA substring match is forgeable. Treat it as
> ergonomics, not security. Pair `allow_user_agents` with
> `allow_ips` when the threat model demands it; an attacker then
> has to defeat both.

The dashboard's fast-lane section shows hit counts per reason. If a
monitor's counter never moves, either the IP range is wrong or your
fronting proxy is stripping the real client IP before it reaches
GroundShade.

### Read the dashboard

Open `http://127.0.0.1:9090/` in a browser. With an admin token
configured, you land on `/admin/login`; submitting the token sets an
`HttpOnly` `gs_admin` cookie and redirects to the dashboard.

The page polls `/admin/status`, `/admin/routes`, and `/metrics` once
per second and renders:

- A hero showing the worst-route level and a one-line summary of
  connections, drop rate, and uptime.
- Per-route cards with a 32 s sparkline of request rate, the
  shields toggle, and per-route signal tracked-key counts.
- A traffic proportion bar (forwarded vs challenged) and three
  challenge funnels with drop/pass rates: browser (JS PoW), API
  (JSON), and the opt-in no-JS passage (issued, solved, wait, fail).
- A FAST LANE row with per-reason counters.
- A CLIENTS row classifying traffic by UA + JA4 family (browser,
  script, forged, bot, unknown).
- A SIGNALS row with JA4 state, rate soft/hard hits, trustless
  hits, and tracked-key counts.
- A CONNECTIONS row with active connections, refusals, and the
  self-throttle flag.

Point Prometheus at `http://127.0.0.1:9090/metrics` with the same
`Authorization: Bearer <admin-token>` header.

### Admin endpoints

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/admin/status` | Version + counts |
| `GET` | `/admin/routes` | Per-route defense snapshot |
| `POST` | `/admin/shields` | Engage or disengage shields |
| `GET` | `/admin/login` | Login form |
| `POST` | `/admin/login` | Submit token, get cookie |
| `POST` | `/admin/logout` | Drop the cookie |
| `GET` | `/metrics` | Prometheus exposition |

### Metrics reference

| Metric | Labels | Type | Meaning |
|---|---|---|---|
| `groundshade_requests_total` | `route`, `decision`, `level` | counter | Requests processed by decision (`forward`, `challenge_html`, `challenge_json`, `reject`) and route level |
| `groundshade_route_level` | `route` | gauge | Effective level: `0` Open, `1` L1, `2` L2, `3` L3, `4` ShieldsUp |
| `groundshade_challenges_issued_total` | `route`, `kind` | counter | Challenges minted; `kind` is `html`, `json`, or `nojs` (passage) |
| `groundshade_challenges_solved_total` | `route`, `kind` | counter | Successful solves; same `kind` set |
| `groundshade_challenge_failed_total` | `route`, `kind`, `reason` | counter | No-JS passage redemptions that failed (`reason`: `bad_signature`, `expired`, `replay`, `binding`, `bad_path`, `honeypot`, `cap`, `unauthorized`) |
| `groundshade_passage_wait_total` | `route` | counter | No-JS passage reloads served before maturity (the wait state) |
| `groundshade_tokens_issued_total` | `route` | counter | Trust tokens minted |
| `groundshade_fastlane_total` | `route`, `reason` | counter | Fast-lane hits by reason (`apikey`, `crawler`, `feed`, `ip_allowlist`, `ua_allowlist`) |
| `groundshade_connections_active` | (none) | gauge | Inbound TCP connections held |
| `groundshade_connections_rejected_total` | `reason` | counter | Connections refused at the accept layer |
| `groundshade_self_throttle` | (none) | gauge | `1` while the proxy is in self-throttle |
| `groundshade_client_family_total` | `family` | counter | Coarse classification: `browser`, `script`, `forged_browser`, `bot`, `unknown` |
| `groundshade_signals_immediate_total` | `route`, `reason` | counter | Challenged on sight by a signal (`rate_hard`, `trustless`) |
| `groundshade_signals_soft_noisy_total` | `route` | counter | Requests where the rate soft threshold fired (L1 scope arm) |
| `groundshade_signals_rate_tracked_keys` | `route`, `key` | gauge | Distinct keys tracked by the rate signal (`key` is `ip_prefix` or `ja4`) |
| `groundshade_signals_trustless_tracked_keys` | `route` | gauge | Distinct prefixes tracked by trustless persistence |
| `groundshade_ja4_detected` | (none) | gauge | `1` once at least one request has carried JA4 |
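
As an example of using these counters, the Python sketch below computes a solve rate per challenge `kind` from the text returned by `/metrics`. It is a deliberately small parser for the plain `name{labels} value` lines only, not a full Prometheus client. `sum_by_kind` and `solve_rate` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches `metric_name{label="v",...} value`; comment lines do not match.
SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_kind(text, metric):
    """Sum a counter across routes, grouped by its `kind` label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if match and match.group(1) == metric:
            labels = dict(LABEL.findall(match.group(2)))
            totals[labels.get("kind", "")] += float(match.group(3))
    return dict(totals)


def solve_rate(text, kind):
    """Solved / issued for one kind, or None if nothing was issued."""
    issued = sum_by_kind(text, "groundshade_challenges_issued_total").get(kind, 0.0)
    solved = sum_by_kind(text, "groundshade_challenges_solved_total").get(kind, 0.0)
    return solved / issued if issued else None
```

Feed it the body of an authenticated `GET /metrics` response. Counters reset on reload, so compare rates within one process lifetime.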

## Behavioural signals

Both signals run after the fast lane on every non-bypass request.

**Rate signal.** Per route, a 60 s sliding window keyed in parallel
by client prefix (IPv4 `/24` or IPv6 `/56`) and JA4. Crossing
`soft_threshold` (200 req/min) widens the L1 scope check by one arm
on that request. Crossing `hard_threshold` (1,000 req/min) issues a
challenge regardless of level.
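
A minimal Python sketch of such a sliding window follows. It illustrates the technique rather than the proxy's data structure: `RateSignal` is a hypothetical class, "crossing" is assumed to mean strictly more than the threshold, and the caller supplies the clock.

```python
from collections import defaultdict, deque


class RateSignal:
    """Sliding-window request counter with soft and hard thresholds."""

    def __init__(self, soft_threshold=200, hard_threshold=1000,
                 window_secs=60.0):
        self.soft_threshold = soft_threshold
        self.hard_threshold = hard_threshold
        self.window_secs = window_secs
        self._hits = defaultdict(deque)  # key -> timestamps inside the window

    def observe(self, key, now):
        """Record one request for `key` and return 'ok', 'soft', or 'hard'."""
        hits = self._hits[key]
        hits.append(now)
        # Evict timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_secs:
            hits.popleft()
        count = len(hits)
        if count > self.hard_threshold:
            return "hard"   # challenge regardless of level
        if count > self.soft_threshold:
            return "soft"   # extra L1 arm for this request only
        return "ok"
```

To mirror the parallel keying, a caller would invoke `observe` twice per request, once with `(route, prefix)` and once with `(route, ja4)`, and act on the stricter result.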

**Trustless persistence.** Per route, a per-prefix counter of
challenges issued without ever solving. Past 20 (default), the
prefix is challenged on sight. One solve clears it. The "ever
earned trust" bit is sticky for the entry's lifetime.
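
The counter logic can be sketched in Python as follows. `TrustlessPersistence` is a hypothetical class. The exact boundary (`>=` versus `>`) and the reading of the sticky bit as "a prefix that has solved once is not counted again" are assumptions.

```python
class TrustlessPersistence:
    """Flags prefixes that keep getting challenged without ever solving."""

    def __init__(self, threshold=20):
        self.threshold = threshold
        self._unsolved = {}    # prefix -> challenges issued with no solve
        self._earned = set()   # prefixes that have solved at least once

    def on_challenge_issued(self, prefix):
        if prefix not in self._earned:
            self._unsolved[prefix] = self._unsolved.get(prefix, 0) + 1

    def on_solve(self, prefix):
        # One solve clears the counter and marks the prefix as trusted.
        self._earned.add(prefix)
        self._unsolved.pop(prefix, None)

    def challenge_on_sight(self, prefix):
        return self._unsolved.get(prefix, 0) >= self.threshold
```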

> Note on prefixes: trust-token IP binding uses v4 `/24` and v6 `/48`
> by default. The rate and trustless signals use v4 `/24` and v6
> `/56`. The v6 numbers differ on purpose; signals use RIPE's
> recommended customer-prefix size.
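
To see what those prefix sizes mean for a given address, here is a small Python sketch using the standard `ipaddress` module. `client_prefix` is a hypothetical helper; the defaults follow the signal sizes, and passing `v6_bits=48` gives the trust-token binding size.

```python
import ipaddress


def client_prefix(ip, v4_bits=24, v6_bits=56):
    """Collapse an address to the network prefix used as a tracking key."""
    addr = ipaddress.ip_address(ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False masks the host bits instead of rejecting them.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))


print(client_prefix("203.0.113.77"))                       # 203.0.113.0/24
print(client_prefix("2001:db8:1234:5678::1"))              # 2001:db8:1234:5600::/56
print(client_prefix("2001:db8:1234:5678::1", v6_bits=48))  # 2001:db8:1234::/48
```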

### SEO safety invariant

Both signals are consulted **after** the fast lane. Verified crawlers,
operator allowlists, feed paths, and API keys never reach the signal
evaluator. A misconfigured threshold cannot accidentally challenge a
search-engine crawler. Tests in
[`crates/groundshade-proxy/tests/e2e_signals_seo.rs`](https://codeberg.org/groundshade/groundshade/src/branch/main/crates/groundshade-proxy/tests/e2e_signals_seo.rs)
lock the invariant.

### JA4 availability

The proxy auto-detects whether your fronting proxy is forwarding
`X-JA4`. After 100 requests (or 60 s), if no JA4 has arrived, it
logs a single WARN and the per-JA4 arm goes silent. The per-IP/24
arm and trustless persistence keep working. The dashboard's signals
row shows the current state.
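
The detection rule can be modelled with a small state machine, sketched in Python below. `Ja4Detector` is a hypothetical class, and what happens when JA4 shows up after the warning is an assumption (here it simply flips to detected).

```python
class Ja4Detector:
    """Decides when to log the single 'no JA4 seen' warning."""

    def __init__(self, started_at, max_requests=100, max_secs=60.0):
        self.started_at = started_at
        self.max_requests = max_requests
        self.max_secs = max_secs
        self.seen = 0
        self.detected = False
        self.warned = False

    def observe(self, ja4, now):
        """Record one request; return True exactly once, when to warn."""
        self.seen += 1
        if ja4:
            self.detected = True
        if self.detected or self.warned:
            return False
        out_of_budget = (
            self.seen >= self.max_requests
            or now - self.started_at >= self.max_secs
        )
        if out_of_budget:
            self.warned = True
            return True
        return False
```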

## Tuning

| Symptom | Knob | Direction |
|---|---|---|
| Solves too slow on phones | `challenge.pow.leading_zero_bits` | Lower (16–17) |
| Bots find solving cheap | `challenge.pow.leading_zero_bits` | Raise (19–20) |
| False positives at L1 | `defense.scope.l1_ua_patterns` | Trim |
| Headless setups pass too easily | `challenge.probe.min_score` | Raise from `0` to `5` or `10` |
| Origin still hot under heavy traffic | `defense.trigger.p95_latency_ms` / `err5xx_rate` | Lower |
| Proxy running out of FDs | `selfdef.max_connections_total` | Raise (after kernel ulimit) |
| `groundshade_connections_rejected_total{reason="per_ip"}` climbing behind a fronting proxy | `listen.trusted_proxies` | Add the proxy's pinned `/32` |
| Sensitive route (admin, payment, scrape target) should stay in defense | `defense.escalation.min_level` | Set to `l1` (or higher) on that route |
| Forged-browser traffic walks through at Open level | `defense.scope.always_challenge_forged_browser` | Set `true` per route (no-op behind CF orange-cloud) |
| Real users tripping rate signal on rich pages | `defense.rate_signals.soft_threshold` | Raise |
| Polite scrapers slipping past | `defense.rate_signals.hard_threshold` | Lower (500–800) |
| Trustless flagging real users with stale cookies | `defense.trustless_persistence.threshold` | Raise (30–50) |
| Memory budget tight | `defense.rate_signals.max_keys_per_route` | Lower (20,000) |

## Logs

JSON to stdout. Default fields: method, host, path, status,
duration, JA4, UA, and `client_ip_hash`. Client IPs are hashed with
a daily-rotated salt. To log raw IPs, set `observe.log_ip_hash:
false` and accept the GDPR responsibility.

## Persistent state

Only one file:

- `state_dir/trust.key`: 32 bytes, mode `0600`. The HMAC signing
  key for trust tokens. Persists across restarts so outstanding
  cookies survive a redeploy.

No DB, no Redis, no on-disk logs unless you redirect stdout.

To rotate the key and invalidate every outstanding cookie, delete
the file and restart. Or restart with a fresh
`GROUNDSHADE_TRUST_SECRET`.
