FAQ

Common GroundShade questions. PoW choice, JA4 setup behind Caddy, dashboard exposure, Cloudflare compatibility, healthcheck cascades, and self-protection.

Why SHA-256 PoW instead of Argon2id?

v1 ships SHA-256 because the browser computes it natively via crypto.subtle.digest('SHA-256', ...). No WASM artifact, no extra crate on the client, no compilation pipeline. The wire format already carries an algorithm field, so a memory-hard option (Argon2id) can land in a later version behind it.

PoW is the last layer. JA4 fingerprinting, the browser-integrity probe, and the fast lane all run before it. Even a GPU-friendly PoW is meaningful work for a scraper that already paid for the rest of the stack.

Why doesn’t GroundShade terminate TLS?

Terminating TLS makes the install more complex (ACME, key management, cipher tuning) for little gain. The only thing it buys is access to the raw ClientHello for JA4. GroundShade already gets that from the X-JA4 header your fronting proxy forwards.

See examples/deploy/ for one-line terminator configs (Caddy, nginx, HAProxy).

My Googlebot requests are getting challenged

GroundShade fast-lanes a Googlebot UA only when the request’s IP passes reverse-DNS plus forward-DNS verification (the procedure Google documents). If reverse DNS isn’t set up correctly, or you’re testing from a non-Google IP, the bot UA alone won’t bypass.

Verification results cache for 24 h on success and 10 min on failure. The first verified request from a given Googlebot IP costs one DNS round-trip; subsequent requests are free.
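The reverse-plus-forward check and the two cache lifetimes can be sketched as follows. This is a minimal illustration, not GroundShade's implementation: the function and class names are hypothetical, the hostname suffixes are an assumption to be checked against Google's current documentation, and the cache here is unbounded for brevity.

```python
import socket
import time

POSITIVE_TTL = 24 * 60 * 60  # 24 h on success, as stated above
NEGATIVE_TTL = 10 * 60       # 10 min on failure, as stated above

# Assumption: suffixes taken from Google's published verification procedure.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    """PTR lookup through the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """Set of addresses (A and AAAA) the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def verify_crawler_ip(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """True only if the PTR name is a Google host AND resolves back to ip."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False


class VerdictCache:
    """Remembers verdicts so that only the first request pays for DNS."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # ip -> (verdict, expires_at)

    def check(self, ip, verify=verify_crawler_ip):
        now = self._clock()
        hit = self._entries.get(ip)
        if hit is not None and hit[1] > now:
            return hit[0]
        verdict = verify(ip)
        ttl = POSITIVE_TTL if verdict else NEGATIVE_TTL
        self._entries[ip] = (verdict, now + ttl)
        return verdict
```

The forward lookup is what defeats a spoofed PTR record: anyone who controls reverse DNS for their own IP can claim a googlebot.com name, but they cannot make that name resolve back to their address.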

Real users on slow phones see the challenge for too long

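Lower challenge.pow.leading_zero_bits from the default 18 toward 15 or 16. Each bit doubles the expected solve time, so every bit you remove halves it: the expected work is about 2^18 hashes at the default and about 2^16 at 16 bits, a quarter of the default.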
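To see why each bit matters, here is a minimal leading-zero-bits solver. It is a sketch only: the way the challenge and nonce are concatenated (challenge bytes followed by an 8-byte big-endian nonce) is an assumption for illustration and is not GroundShade's wire format.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros ahead of the first set bit
        break
    return bits


def expected_hashes(difficulty: int) -> int:
    """Each hash succeeds with probability 2**-difficulty."""
    return 2 ** difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force the first nonce whose digest has enough leading zeros.

    Assumption: preimage layout is challenge || nonce (8 bytes, big-endian).
    """
    nonce = 0
    while True:
        preimage = challenge + nonce.to_bytes(8, "big")
        digest = hashlib.sha256(preimage).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1
```

Timing solve() at 15, 16, and 18 bits on the slowest device you care about is a quick way to pick a value; the browser's crypto.subtle path has its own per-call overhead, so treat numbers from this sketch as relative rather than absolute.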

My CI job that scrapes our own API is getting challenged

Two paths:

  1. Issue an API key for the CI client. Best for known clients. See Operating.
  2. Set prefer_json_challenge: true on the relevant route and point the client at examples/solvers/solve.py to solve the JSON challenge once per session.

Does GroundShade share my traffic with anyone?

No. v1 has zero phone-home. The only outbound calls are:

  • Reverse-DNS plus forward-DNS lookups for verified-crawler verification, via your system resolver.
  • Webhook deliveries to URLs you configure.

Nothing else leaves the proxy.

Is the admin dashboard safe to expose publicly?

The admin port supports bearer-token auth via listen.admin_token (or GROUNDSHADE_ADMIN_TOKEN). With a token set, every /admin/* and /metrics request needs Authorization: Bearer <token> or the gs_admin cookie that /admin/login sets.

Startup refuses to bind admin on a non-loopback address without a token configured.

The safe default is to keep admin on loopback (127.0.0.1:9090) and reach it through an SSH tunnel or another authenticated channel. The token is the safety net for deployments that publish admin to a wider network (a Docker bridge network, an internal VPN). Per-user accounts and RBAC are out of scope for v1.
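The two rules above (refuse a non-loopback bind without a token, and require the bearer token on each request) can be sketched like this. The function names are hypothetical, the gs_admin cookie path is left out, and this is not GroundShade's code.

```python
import hmac
import ipaddress
from typing import Optional


def check_admin_bind(host: str, token: Optional[str]) -> None:
    """Raise at startup if admin would be exposed without a token."""
    if token:
        return  # a token is configured, so any bind address is allowed
    try:
        loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; only the conventional name counts as loopback.
        loopback = host == "localhost"
    if not loopback:
        raise ValueError(
            f"refusing to bind admin on {host} without listen.admin_token"
        )


def is_authorized(authorization_header: Optional[str], token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value."""
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    # Constant-time comparison so response timing does not leak the token.
    return hmac.compare_digest(presented.encode(), token.encode())
```

Failing at startup rather than at request time means a misconfigured deployment never listens at all, which is easier to notice than an open dashboard.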

My X-JA4 header is the literal string {tls_client_ja4}

Stock Caddy has no JA4 placeholder. The closest built-in is {tls_client_fingerprint}, which is a TLS cert-based fingerprint used for mTLS, unrelated to JA4. If your Caddyfile contains request_header X-JA4 {tls_client_ja4} and Caddy was built without a JA4 plugin, the placeholder passes through as text. If GroundShade bound trust tokens to that value, every token would carry the same literal string, so the JA4 leg of the binding would be identical for every client and would do nothing to stop a stolen token from being replayed.

GroundShade refuses to bind any X-JA4 value containing { or } and logs a one-shot WARN the first time it sees one. The JA4 leg of the binding drops on those requests; the trust token still binds to (route, IP-prefix), which is weaker but at least correct.

To get JA4 working, build Caddy with the caddy-ja4 plugin via xcaddy. The recipe lives at examples/deploy/caddy-ja4.Dockerfile:

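# Stage 1: compile a Caddy binary that includes the JA4 plugin.
FROM caddy:2-builder AS builder
RUN xcaddy build --with github.com/matt-/caddy-ja4

# Stage 2: replace the stock binary in the runtime image with the custom build.
FROM caddy:2-alpine
COPY --from=builder /usr/bin/caddy /usr/bin/caddy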

And the Caddyfile changes to:

{
    order ja4 first
    servers {
        listener_wrappers {
            ja4
            tls
        }
    }
}

example.com {
    ja4 { var_name ja4 }
    request_header -X-JA4
    request_header X-JA4 {http.vars.ja4}
    reverse_proxy groundshade:8080 { ... }
}

examples/deploy/caddy-ja4.Caddyfile is the full version. Two non-obvious points:

  • order ja4 first is required in the global block. The plugin’s own README uses order ja4 before respond, which fires too late if {http.vars.ja4} is consumed by request_header (which runs before respond). first makes the directive run before any handler that reads the variable.
  • After any change to the listener_wrappers global block, do a full Caddy restart. caddy reload only refreshes HTTP handlers; listener wrappers attach at socket setup. A reload appears to succeed but silently leaves the wrapper inactive, {http.vars.ja4} expands to empty, and JA4 binding stays off.

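With this Caddyfile, a Caddy binary built without the plugin fails loudly at config-load time, because ja4 is an unrecognized directive, instead of silently passing a placeholder through as text.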

Can I run GroundShade behind Cloudflare?

Yes. Orange-cloud (CF terminates TLS) works after v0.7.0 with two unavoidable limits: JA4-keyed signals go silent (CF terminates the handshake) and the verified-crawler rDNS fast lane goes dead (rDNS resolves CF’s edge, never googlebot.com). Per-IP rate, trustless persistence, trust tokens, and operator allowlists keep working.

Grey-cloud (DNS-only) preserves everything and is the recommended mode when possible. Full setup: Behind Cloudflare.

Behind Caddy, the upstream gets marked dead under load and every request returns 503

Caddy’s active healthcheck probes /health:

reverse_proxy groundshade:8080 {
    health_uri /health
    health_interval 30s
    health_status 200
}

GroundShade is transparent, so /health proxies straight through to your real upstream. If that upstream rate-limits per-IP (and from its viewpoint all traffic comes from Caddy’s container IP), a heavy burst rate-limits the healthcheck too. Caddy gets back 429, expects exactly 200, marks the upstream dead, and every subsequent request returns 503 no upstreams available until a later probe gets a 200, which with health_interval 30s can be up to 30 seconds away.

It’s the same cascade you’d hit with any transparent proxy in front of a rate-limited app, or with no proxy at all.

The fix lives in the fronting proxy. Switch to passive detection:

reverse_proxy groundshade:8080 {
    fail_duration 10s
    max_fails 3
    unhealthy_status 5xx
}

Caddy now marks the upstream down only after three responses within the 10 s fail_duration window look like a real server problem (a 5xx). 429 doesn’t count, so the rate-limit bucket can’t trigger the cascade. Recovery is automatic: recorded failures age out of the window once real traffic starts succeeding again.
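The difference between the two modes is easier to see in a toy model. The class below is not Caddy's implementation; it only mimics the documented meaning of max_fails, fail_duration, and unhealthy_status 5xx, and its names are hypothetical.

```python
from collections import deque


class PassiveHealth:
    """Toy model: count only 5xx responses, remembered for fail_duration."""

    def __init__(self, max_fails: int = 3, fail_duration: float = 10.0):
        self.max_fails = max_fails
        self.fail_duration = fail_duration
        self._fails = deque()  # timestamps of counted failures, oldest first

    def record(self, status: int, now: float) -> None:
        # unhealthy_status 5xx: a 429 from the rate limiter is ignored.
        if 500 <= status <= 599:
            self._fails.append(now)

    def is_down(self, now: float) -> bool:
        # Forget failures older than fail_duration, then compare to max_fails.
        while self._fails and now - self._fails[0] >= self.fail_duration:
            self._fails.popleft()
        return len(self._fails) >= self.max_fails
```

Feeding this model a burst of 429s leaves is_down() false, while three 503s inside ten seconds flip it to true until they age out.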

Does the proxy survive its own attack?

Yes, by design. Every data structure is bounded:

  • Detector ring buffer: 10,000 samples per route.
  • Connection limiter per-IP LRU: 16,384 prefixes.
  • Rate signal LRUs (IP prefix and JA4): 50,000 keys each per route by default (defense.rate_signals.max_keys_per_route).
  • Trustless persistence map: 100,000 prefixes per route by default (defense.trustless_persistence.max_keys).
  • Crawler-verifier LRU: 24 h positive, 10 min negative.
  • Trust-token replay LRU: bounded.
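The common pattern behind the limits listed above is a map that evicts its least recently used key once it is full, so an attacker who cycles through fresh keys cannot grow memory. A minimal sketch, with a hypothetical class name and a per-key counter standing in for whatever state the real structures keep:

```python
from collections import OrderedDict


class BoundedCounter:
    """Per-key counter that never holds more than max_keys entries."""

    def __init__(self, max_keys: int):
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        self._max_keys = max_keys
        self._counts = OrderedDict()  # oldest-used key first

    def hit(self, key: str) -> int:
        """Increment key, mark it most recently used, evict if over capacity."""
        count = self._counts.pop(key, 0) + 1
        self._counts[key] = count  # re-insert at the most-recent end
        if len(self._counts) > self._max_keys:
            self._counts.popitem(last=False)  # drop the least recently used
        return count

    def __len__(self) -> int:
        return len(self._counts)
```

The trade-off is that a flood of new keys can evict state for legitimate clients, which is why the limits are configurable per route.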

A CI load gate asserts survival under sustained attack (100k req/s from 10k synthetic attackers on 2 cores) and any PR that regresses it doesn’t merge.

selfdef::pressure::spawn_monitor polls /proc/self/statm once per second on Linux. When RSS crosses selfdef.rss_high_watermark_mb (default 512 MB), the connection limiter’s backpressure flag flips, new connections are refused at the accept layer, and the groundshade_self_throttle gauge goes to 1. New connections resume when RSS drops back below. On non-Linux the monitor is a no-op.
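The monitor's core logic is small enough to sketch. The following is an illustration of the mechanism described above, not the selfdef::pressure code: the names are hypothetical, and treating "crosses" as greater-than-or-equal is an assumption.

```python
import os
from typing import Optional


def rss_mb_from_statm(statm_text: str, page_size: int) -> float:
    """/proc/self/statm lists sizes in pages; the second field is resident."""
    resident_pages = int(statm_text.split()[1])
    return resident_pages * page_size / (1024 * 1024)


def read_rss_mb() -> Optional[float]:
    """Current RSS in MB, or None where /proc is unavailable (non-Linux)."""
    try:
        with open("/proc/self/statm") as f:
            text = f.read()
    except OSError:
        return None
    return rss_mb_from_statm(text, os.sysconf("SC_PAGE_SIZE"))


class Backpressure:
    """Flag that the accept loop would consult before taking a connection."""

    def __init__(self, high_watermark_mb: float = 512):
        self.high_watermark_mb = high_watermark_mb
        self.throttled = False

    def update(self, rss_mb: Optional[float]) -> bool:
        # No reading means no-op, matching the non-Linux behaviour above.
        if rss_mb is not None:
            self.throttled = rss_mb >= self.high_watermark_mb
        return self.throttled
```

A polling loop would call update(read_rss_mb()) once per second and export the flag as a 0/1 gauge.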