architecture

The mental model. Workspace shape, request lifecycle, in-memory state surfaces, concurrency model, and explicit non-goals.

This page is the mental model. Byte-level wire formats and the full design rationale live in SPEC.md.

Workspace

Three Cargo crates:

groundshade-core/      transport-agnostic brain (no hyper, no sockets)
  config              YAML schema, defaults, validation
  routing             host + path-glob matcher; compiles EffectivePolicy
  defense             origin-pain detector + 3-level state machine
  trust               HMAC-signed bearer tokens with binding + budget
  challenge           SHA-256 PoW, signed challenge tokens, interstitial HTML
  fastlane            verified crawlers (rDNS), API keys, feed paths
  fingerprint         JA4 parsing, UA classifier, client-family bucketing
  signals             rate signal + trustless persistence
  selfdef             ConnectionLimiter (per-IP + global caps + backpressure)
  observe             Prometheus metrics, IP hasher, webhook dispatcher

groundshade-proxy/     hyper-based binary
  proxy.rs            top-level lifecycle (start, shutdown, signals)
  server.rs           inbound + admin accept loops, graceful shutdown
  service.rs          per-request dispatch
  upstream.rs         hyper client, URI rewrite
  ws.rs               WebSocket / Upgrade pass-through
  hop.rs              hop-by-hop header stripping (RFC 9110 §7.6.1)
  body.rs             ProxyBody type alias

groundshade-dashboard/ inlined HTML + JS dashboard
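As an illustration of what a module like hop.rs is responsible for, here is a minimal sketch of hop-by-hop header stripping in the spirit of RFC 9110 §7.6.1. It is not the project's code: the function name strip_hop_by_hop and the Vec<(String, String)> header representation are assumptions chosen to keep the example free of external crates, and the sketch ignores the Upgrade pass-through case that ws.rs handles.

```rust
/// Header names an intermediary removes by default (RFC 9110 §7.6.1),
/// in addition to any field named inside the Connection header itself.
const HOP_BY_HOP: [&str; 6] = [
    "connection",
    "proxy-connection",
    "keep-alive",
    "te",
    "transfer-encoding",
    "upgrade",
];

/// Hypothetical helper: removes hop-by-hop headers in place.
/// Headers are modelled as (name, value) pairs for illustration only.
pub fn strip_hop_by_hop(headers: &mut Vec<(String, String)>) {
    // Collect the field names listed in every Connection header value.
    let mut named: Vec<String> = Vec::new();
    for (name, value) in headers.iter() {
        if name.eq_ignore_ascii_case("connection") {
            for token in value.split(',') {
                let token = token.trim();
                if !token.is_empty() {
                    named.push(token.to_ascii_lowercase());
                }
            }
        }
    }
    // Drop the fixed set plus everything the Connection header named.
    headers.retain(|(name, _)| {
        let lower = name.to_ascii_lowercase();
        !HOP_BY_HOP.contains(&lower.as_str()) && !named.contains(&lower)
    });
}
```

Given Host, Connection: keep-alive, X-Internal, Keep-Alive, x-internal and Accept, only Host and Accept survive.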

Why the split

groundshade-core is transport-agnostic on purpose. Decisions take a RequestFacts value (a borrowed view of the request) and return a verdict. The v2 plugin work (Caddy module, nginx, HAProxy SPOA, Envoy filter) wraps the same core. No rewrites.

groundshade-proxy owns everything that touches hyper or a socket: accept loops, process lifecycle, graceful shutdown.
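The core/proxy boundary can be pictured with a small sketch. Only the name RequestFacts comes from this page; the fields, the Verdict variants, the decide function, and the toy rule inside it are all hypothetical and exist only to show the shape of a transport-agnostic decision: borrowed facts in, verdict out, no hyper or socket types anywhere.

```rust
use std::net::IpAddr;

/// Borrowed view of one request. Field names are illustrative assumptions.
#[derive(Debug, Clone, Copy)]
pub struct RequestFacts<'a> {
    pub client_ip: IpAddr,
    pub host: &'a str,
    pub path: &'a str,
    pub user_agent: Option<&'a str>,
}

/// Hypothetical verdict type; the real variants are not listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Forward,
    Challenge,
}

/// Pure decision function: it sees only borrowed facts plus a flag saying
/// whether the route is currently defending. The rule is a toy stand-in.
pub fn decide(facts: &RequestFacts, defending: bool) -> Verdict {
    // Toy rule: feed-like paths are always forwarded.
    if facts.path.starts_with("/feed") {
        return Verdict::Forward;
    }
    if defending {
        Verdict::Challenge
    } else {
        Verdict::Forward
    }
}
```

Because decide borrows everything and owns nothing, any front end that can fill in a RequestFacts value could call it, whether that is a hyper service or a plugin shim.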

Request lifecycle

                              ┌─────────────────────────────┐
                              │  groundshade-proxy::server  │
                              │    accept loop              │
                              └──────────────┬──────────────┘
                                             │  ConnectionLimiter::try_admit
                                             │  (per-IP /24 + global cap)
                                             ▼
                              ┌─────────────────────────────┐
                              │   hyper auto::Builder       │
                              │   (h1/h2 negotiation +      │
                              │    header_read_timeout)     │
                              └──────────────┬──────────────┘
                                             │  service_fn → handle()
                                             ▼
            ┌────────────────────────────────────────────────────────┐
            │  groundshade-proxy::service::dispatch                  │
            │                                                        │
            │  1. /.well-known/groundshade/* ────▶ challenge/solve   │
            │                                                        │
            │  2. extract host + path                                │
            │  3. Router::resolve(host, path) → EffectivePolicy      │
            │  4. compiled upstream lookup                           │
            │  5. policy.bypass? → forward (no defense, no sample)   │
            │                                                        │
            │  6. FastLane::evaluate_sync                            │
            │     (ip allowlist → feed glob → ua allowlist → apikey) │
            │     FastLane::evaluate_crawler (rDNS-verified bots)    │
            │     ─▶ on match: forward, no sample, no signals        │
            │                                                        │
            │  7. ja4_availability.observe(ja4.is_some())            │
            │  8. signals_evaluate(prefix, ja4, now_secs)            │
            │      → SignalsVerdict { immediate, soft_noisy, rate,   │
            │                          trustless }                   │
            │                                                        │
            │  9. trust cookie / Authorization: ChallengeSolution    │
            │     valid? → signals_note_trust_earned, forward,       │
            │              maybe renew                               │
            │                                                        │
            │ 10. signals_verdict.immediate is Some?                 │
            │     (rate_hard or trustless)                           │
            │     → render interstitial OR JSON 401,                 │
            │       signals_record_challenge                         │
            │                                                        │
            │ 11. level == Open? → forward + sample                  │
            │                                                        │
            │ 12. ChallengeSubject in scope for level                │
            │     OR signals_verdict.soft_noisy?                     │
            │     no  → forward + sample                             │
            │     yes → render interstitial OR JSON 401,             │
            │           signals_record_challenge                     │
            │                                                        │
            └────────────────────────────────────────────────────────┘

The challenge endpoints live under /.well-known/groundshade/:

  • GET /.well-known/groundshade/challenge issues a SHA-256 PoW offer.
  • POST /.well-known/groundshade/solve redeems a solution for a gs_trust cookie.
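The ordering of steps 5 through 12 in the diagram is the part that matters most, so here is a sketch of it as a pure function. The Inputs struct, the Outcome enum, and the dispatch function are hypothetical: each boolean stands in for the result of the real check named in the comment, and step 1 (the /.well-known endpoints) and the route lookup in steps 2 to 4 are left out.

```rust
/// One flag per check in the lifecycle diagram (hypothetical flattening).
#[derive(Debug, Clone, Copy, Default)]
pub struct Inputs {
    pub bypass: bool,           // step 5: policy.bypass
    pub fast_lane: bool,        // step 6: any fast-lane match
    pub trust_valid: bool,      // step 9: valid trust cookie or solution
    pub immediate: bool,        // step 10: signals_verdict.immediate is Some
    pub level_open: bool,       // step 11: defense level == Open
    pub subject_in_scope: bool, // step 12: ChallengeSubject in scope
    pub soft_noisy: bool,       // step 12: signals_verdict.soft_noisy
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    ForwardUnsampled, // steps 5 and 6: forward, no sample
    ForwardTrusted,   // step 9: forward, maybe renew
    ForwardSampled,   // steps 11 and 12 (no): forward + sample
    Challenge,        // steps 10 and 12 (yes): interstitial or JSON 401
}

/// Earlier steps win: the first matching check decides the outcome.
pub fn dispatch(i: &Inputs) -> Outcome {
    if i.bypass || i.fast_lane {
        return Outcome::ForwardUnsampled;
    }
    if i.trust_valid {
        return Outcome::ForwardTrusted;
    }
    if i.immediate {
        return Outcome::Challenge;
    }
    if i.level_open {
        return Outcome::ForwardSampled;
    }
    if i.subject_in_scope || i.soft_noisy {
        Outcome::Challenge
    } else {
        Outcome::ForwardSampled
    }
}
```

Two consequences of the ordering are visible here: a valid trust token outranks an immediate signal, and an immediate signal challenges even when the route's level is Open.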

State surfaces

In memory:

  • Router. Small, immutable, built at startup. Cloned via Arc.
  • DefenseRegistry. One entry per known route_id. Each holds a detector ring buffer (capped at 10,000 samples), a state machine, and a RouteSignals block.
  • RouteSignals (per route). A RateState with two LruCaches (ClientPrefix → SlidingWindow and JA4 → SlidingWindow, default 50,000 keys each) plus a TrustlessState (ClientPrefix → ClientHistory, default 100,000 keys).
  • Ja4Availability (global). Atomic counters for requests seen and requests with JA4, plus a one-shot warning flag.
  • TrustIssuer. Stateless except for the signing key. Verification is pure compute.
  • ChallengeIssuer. Stateless except for an LRU replay cache of solved (token, nonce) pairs.
  • FastLane. Feed matcher (immutable), API-key table (immutable), IP allowlist (immutable, parsed once), UA allowlist (immutable), crawler verifier with a bounded LRU cache (results kept 24 h when positive, 10 min when negative).
  • ConnectionLimiter. Atomic global counter, bounded per-IP LRU (cap 16,384 prefixes), and an Arc<IpAllowlist> for the trusted-peer cap bypass.
  • Metrics. Prometheus registry with a fixed handful of families.
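To make the per-key SlidingWindow entries in RouteSignals concrete, here is one possible shape for such a counter. This is an assumption, not the project's implementation: it stores one timestamp per hit in a VecDeque, which is simple but costs more memory than a bucketed counter would, and it expects timestamps to be non-decreasing. Only the type name SlidingWindow and the now_secs convention come from this page.

```rust
use std::collections::VecDeque;

/// Counts hits inside the last `window_secs` seconds (illustrative sketch).
pub struct SlidingWindow {
    window_secs: u64,
    hits: VecDeque<u64>, // hit timestamps in seconds, oldest first
}

impl SlidingWindow {
    pub fn new(window_secs: u64) -> Self {
        SlidingWindow {
            window_secs,
            hits: VecDeque::new(),
        }
    }

    /// Records a hit at `now_secs` and returns how many hits, including
    /// this one, fall inside the window ending at `now_secs`.
    pub fn record(&mut self, now_secs: u64) -> usize {
        self.evict(now_secs);
        self.hits.push_back(now_secs);
        self.hits.len()
    }

    /// Drops hits whose age is at least `window_secs`.
    fn evict(&mut self, now_secs: u64) {
        while let Some(&oldest) = self.hits.front() {
            if now_secs.saturating_sub(oldest) >= self.window_secs {
                self.hits.pop_front();
            } else {
                break;
            }
        }
    }
}
```

In a layout like the one described above, one such value would sit behind each ClientPrefix or JA4 key in a bounded LRU map, so an evicted key simply starts counting from zero again.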

On disk:

  • state_dir/trust.key. 32 bytes, mode 0600. Created on first start.

That’s the entirety of GroundShade’s persistent state in v1.
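A load-or-create routine for a key file like trust.key might look like the sketch below. It is Unix-only and makes several assumptions that this page does not confirm: the function name load_or_create_key, reading randomness from /dev/urandom (chosen only to avoid external crates), and rejecting a file of the wrong length. The 32-byte size and the 0600 mode are taken from the description above.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Hypothetical helper: returns the existing 32-byte key, or creates one.
pub fn load_or_create_key(path: &Path) -> io::Result<[u8; 32]> {
    match fs::read(path) {
        Ok(bytes) => {
            if bytes.len() != 32 {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "key file must be exactly 32 bytes",
                ));
            }
            let mut key = [0u8; 32];
            key.copy_from_slice(&bytes);
            Ok(key)
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            // Assumption: the OS random device is an acceptable key source.
            let mut key = [0u8; 32];
            fs::File::open("/dev/urandom")?.read_exact(&mut key)?;
            // create_new fails if another process created the file first;
            // mode(0o600) restricts the file to its owner from the start.
            let mut file = OpenOptions::new()
                .write(true)
                .create_new(true)
                .mode(0o600)
                .open(path)?;
            file.write_all(&key)?;
            file.sync_all()?;
            Ok(key)
        }
        Err(e) => Err(e),
    }
}
```

Calling it twice with the same path returns the same key, because the second call takes the read branch.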

Concurrency

  • Inbound. One tokio task per TCP connection (spawned by the accept loop). Hyper serves multiple requests per connection: sequentially over h1 keep-alive, concurrently over h2 multiplexing.
  • Outbound. One shared hyper_util::client::legacy::Client pool per ProxyState. Upgrades (WebSocket, generic Upgrade) use their own short-lived hyper::client::conn::http1 connections; the pool can’t safely reuse them after a protocol switch.
  • Background. One defense-tick task runs once per second across all routes. One webhook-dispatcher task drains the event queue. One RSS-watermark monitor (RSS here is resident set size, not feeds) toggles backpressure on Linux.

Every shared data structure with locking is documented at the lock site. Most are parking_lot::Mutex (faster than std, doesn’t poison).

Non-goals

  • No global mutex. The router map is read-only after startup. The defense registry is per-route mutexed. Counters are atomics.
  • No dynamic dispatch in the hot path. The decision tree is static enums and if-else chains.
  • No unsafe. #![forbid(unsafe_code)] in groundshade-core and the proxy binary.
  • No third-party network calls at startup or on the hot path beyond the system DNS resolver (for crawler verification) and configured webhook endpoints.
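The "counters are atomics" point can be illustrated with a global connection cap that never takes a lock. This is a sketch under assumptions, not the ConnectionLimiter itself: the GlobalCap and Permit types are hypothetical, the per-IP /24 tracking and the trusted-peer bypass are left out, and only the idea of an atomic global counter behind a try_admit call comes from this page.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Lock-free cap on concurrently admitted connections (illustrative).
pub struct GlobalCap {
    active: AtomicUsize,
    max: usize,
}

/// RAII permit: dropping it releases the slot.
pub struct Permit {
    cap: Arc<GlobalCap>,
}

impl GlobalCap {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(GlobalCap {
            active: AtomicUsize::new(0),
            max,
        })
    }

    /// Tries to take a slot; returns None when the cap is reached.
    pub fn try_admit(cap: &Arc<Self>) -> Option<Permit> {
        let mut current = cap.active.load(Ordering::Relaxed);
        loop {
            if current >= cap.max {
                return None;
            }
            // Compare-and-swap so two racing callers cannot both take
            // the last slot; on failure retry with the observed value.
            match cap.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(Permit { cap: Arc::clone(cap) }),
                Err(observed) => current = observed,
            }
        }
    }

    pub fn active(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.cap.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

An accept loop would hold the Permit for the lifetime of the connection task, so the slot is released on every exit path, including panics that unwind.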