diff --git a/docs/STAGING_FLEETOPS_ARCHITECTURE.md b/docs/STAGING_FLEETOPS_ARCHITECTURE.md
new file mode 100644
index 0000000..1db1450
--- /dev/null
+++ b/docs/STAGING_FLEETOPS_ARCHITECTURE.md
@@ -0,0 +1,224 @@
+# Staging Environment & FleetOps Split — Architecture
+
+**Status:** approved 2026-06-10 · **Owner:** kianiadee · **Audience:** both developers (mixed
+technical/ops background — readable without prior context).
+
+This document describes how we (a) introduce a **staging environment** under the
+`fivetitude.com` umbrella so the production FleetNow map is never edited directly, and (b)
+**split the product** into two surfaces: **FleetNow** (live tracking) and **FleetOps** (fleet
+operations — fuel, analytics, KPIs).
+
+> **No secrets here.** All connection values come from `.env` at runtime — see
+> [`CONNECTIONS.md`](CONNECTIONS.md).
+
+---
+
+## 1. Why this change
+
+FleetNow (`fleetnow.rahamafresh.com`) is now the client's **production** map, so we can no
+longer make feature changes or run tests directly against it. Separately, the client asked us
+to separate **fleet tracking** from **fleet operations** (fuel management, analytics). That
+gives us two needs:
+
+1. A **staging environment** that mirrors production for safe development and testing.
+2. A **new FleetOps surface** (`fleetops.rahamafresh.com`) distinct from the tracking map.
+
+### Decisions on record
+
+| Decision | Choice |
+|---|---|
+| Staging umbrella domain | **`fivetitude.com`** — DNS is a **wildcard** (`*.fivetitude.com` → the VPS), so staging subdomains need **no per-host DNS records**, only Traefik/Coolify host rules |
+| FleetOps surface | **New custom SPA** (FleetNow-style), consuming an extended `dashboard_api` — *not* Grafana |
+| Staging data backing | **Full stack reading the shared production `reporting.*` read-layer** (read-only, no DB duplication) |
+| Deploy mechanism | **Forgejo → Coolify webhook deploys** across all Coolify apps (replaces polling/manual) |
+| FleetOps web server | **Caddy** (greenfield) for the cleaner Caddyfile + native `{env.*}` API-base injection. Chosen for config ergonomics, **not** TLS — Traefik already terminates TLS. Existing nginx SPAs stay as-is (mixed fleet until FleetNow's next touch) |
+
+---
+
+## 2. Target topology
+
+| Environment | FleetNow (tracking) | FleetOps (operations) | Read-API |
+|---|---|---|---|
+| **Production** (`rahamafresh.com`) | `fleetnow.rahamafresh.com` — *frozen* | `fleetops.rahamafresh.com` — **new** | `fleetapi.rahamafresh.com` |
+| **Staging** (`fivetitude.com`) | `fleetnow.fivetitude.com` | `fleetops.fivetitude.com` | `fleetapi.fivetitude.com` |
+
+- Every product surface (FleetNow/FleetOps × prod/staging) is a **Coolify app** (Dockerfile →
+  static web server), one app per cell, each bound to its own git branch. **FleetOps uses
+  Caddy** (clean Caddyfile, native `{env.*}` for the per-env API base); the existing FleetNow
+  and the two legacy SPAs remain on **nginx**. Both are plain `:80` file servers — **Traefik
+  terminates TLS**, so Caddy's auto-HTTPS is intentionally unused.
+- The read-API (`dashboard_api`) is a **standalone Traefik-labelled bridge container** — *not*
+  Coolify-managed. It is deployed by a host script and gains a **second staging instance**.
+- **Staging reads the same production TimescaleDB** over the internal Docker network, but as a
+  **read-only role** with the materialized-view refresher **disabled** (see §6).
+
+```
+                       ┌───────────────────────────── VPS (31.97.44.246) ─────────────────────────────┐
+  PRODUCTION           │                                                                              │
+  fleetnow.raha… ──────┼─► Coolify app (FleetNow:main)  ─┐                                            │
+  fleetops.raha… ──────┼─► Coolify app (FleetOps:main)  ─┼─► fleetapi.rahamafresh.com (bridge:8890)   │
+                       │                                 │                │ app role (rw) + refresher │
+                       │                                 │                ▼                           │
+  STAGING              │                                 │   ┌──────────────────────────┐             │
+  fleetnow.fivet… ─────┼─► Coolify app (FleetNow:staging)┼──►│ tracksolid_db            │             │
+  fleetops.fivet… ─────┼─► Coolify app (FleetOps:staging)┼─┐ │ reporting.* / v_trips MV │             │
+                       │                                 │ └►│ tracksolid.v_*           │             │
+                       │   fleetapi.fivetitude.com ──────┘   └──────────────────────────┘             │
+                       │   (bridge:8891, read-only role, refresher OFF)                               │
+                       └──────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 3. The two read-API instances
+
+The API code is `dashboard_api_rev.py` **in this repo**. Production is deployed by
+`~/deploy_dashboard_api.sh` (bind-mounts `~/dashboard_api/dashboard_api_rev.py`, **port 8890**,
+Traefik host `fleetapi.rahamafresh.com`). Staging mirrors it:
+
+| | Production | Staging |
+|---|---|---|
+| Host rule | `fleetapi.rahamafresh.com` | `fleetapi.fivetitude.com` |
+| Port | 8890 | **8891** |
+| Code mount | `~/dashboard_api/` | `~/dashboard_api_staging/` (WIP checkout) |
+| Deploy script | `~/deploy_dashboard_api.sh` | **`deploy_dashboard_api_staging.sh`** (checked into this repo) |
+| DB role | app role (read/write) | **read-only** (`dashboard_ro` / `grafana_ro`) |
+| `v_trips` refresher | **owns it** | **disabled** |
+| CORS origins | `fleetnow.rahamafresh.com`, `fleetintelligence.…`, `liveposition.…`, **+ `fleetops.rahamafresh.com`** | `fleetnow.fivetitude.com`, `fleetops.fivetitude.com` |
+
+> **CORS must be set unconditionally** in the deploy script (strip any inherited value) — this
+> is the [FIX-D03](../CLAUDE.md) lesson. Env/CORS changes require a container **recreate**, not
+> a restart.
+
+### Analytics endpoints (FleetOps)
+
+FleetOps consumes new **read-only** routes added to `dashboard_api_rev.py`, reusing the
+existing psycopg2 pool (`ts_shared_rev.py`), the Content-Type body-parse pattern (FIX-D01), and
+the JSONB/GeoJSON return style of the existing `/webhook/*` routes:
+
+| Route | Backed by |
+|---|---|
+| `GET /analytics/fleet-summary` | `reporting.v_daily_summary` / `v_weekly_summary` / `v_monthly_summary` + `v_daily_cost_centre` |
+| `GET /analytics/utilisation` | derived from the `reporting` summaries (idle_pct, km/day) |
+| `GET /analytics/driver-behaviour` | `tracksolid.v_driver_aggregates_daily` |
+| `GET /analytics/fuel` | `reporting.v_trips.fuel_consumed_l` + `devices.fuel_100km` — **data-gated** (returns "needs data" flags until populated) |
+| `GET /analytics/filters` | `reporting.v_filter_*` (mirrors `GET /webhook/fleet-dashboard`) |
+
+Any aggregation that isn't a thin wrapper becomes a **new numbered migration**
+(`migrations/15_*.sql`) — never edit an applied migration.
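
The "data-gated" behaviour listed for `GET /analytics/fuel` could be sketched roughly as follows. This is a minimal illustration, not the route's actual implementation: the function name `fuel_summary` and the response keys (`needs_data`, `missing`, `fuel_consumed_l`, `rated_l_per_100km`) are hypothetical, and only the two source columns come from the table above.

```python
from typing import Optional, Sequence


def fuel_summary(trips: Sequence[dict], fuel_100km: Optional[float]) -> dict:
    """Build a data-gated payload: totals when data exists, flags otherwise.

    `trips` stands in for rows read from reporting.v_trips and `fuel_100km`
    for devices.fuel_100km; the response shape is an assumption.
    """
    # Keep only trips that actually carry a fuel reading.
    consumed = [
        t["fuel_consumed_l"] for t in trips if t.get("fuel_consumed_l") is not None
    ]

    # Record which inputs are still unpopulated so the SPA can show a placeholder.
    missing = []
    if not consumed:
        missing.append("reporting.v_trips.fuel_consumed_l")
    if fuel_100km is None:
        missing.append("devices.fuel_100km")

    return {
        "needs_data": bool(missing),
        "missing": missing,
        "fuel_consumed_l": round(sum(consumed), 2) if consumed else None,
        "rated_l_per_100km": fuel_100km,
    }
```

Returning explicit flags rather than zeros lets the FleetOps panels distinguish "no fuel used" from "no fuel data yet", which matters while the fuel columns are still unpopulated.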
+
+> **Reuse the existing reporting layer.** The analytics building blocks are `reporting.*`
+> (migrations 11/14) and the surviving `tracksolid.v_*` views (migration 07). The `ops.*` and
+> `dwh_gold.*` schemas were **purged 2026-06-05** (migrations 12/13) — do **not** reference
+> `ops.*`, `dwh_gold.*`, `v_utilisation_daily`, or `v_sla_inflight`.
+
+---
+
+## 4. Deploy & promotion (Forgejo → Coolify webhooks)
+
+All Coolify apps move from polling/manual to **webhook-driven** deploys. For each app, take
+Coolify's per-app **deploy webhook URL** (+ token) and register it as a **push webhook in the
+matching Forgejo repo**, scoped to the bound branch.
+
+**Promotion model** (both FleetNow and FleetOps):
+
+```
+feature branch ──merge──► staging ──(Forgejo webhook)──► Coolify deploys *.fivetitude.com
+                                                             │ validate
+                          main ◄──merge──────────────────────┘
+                            │
+                            └──(Forgejo webhook)──► Coolify deploys *.rahamafresh.com (prod)
+```
+
+Production is touched **only** by a merge to `main`. That branch discipline is what satisfies
+"no direct changes to production FleetNow."
+
+> **Exception:** the `dashboard_api` bridge is **not** Coolify-managed and does **not** deploy
+> via Forgejo webhook — it is deployed by its host script (`deploy_dashboard_api*.sh`). The API
+> code's source of truth is this repo; the staging instance bind-mounts a WIP checkout so new
+> endpoints are validated on `fleetapi.fivetitude.com` before the file is promoted to
+> `~/dashboard_api/` on prod.
+
+---
+
+## 5. FleetOps SPA (new repo)
+
+- **Remote:** `https://repo.rahamafresh.com/kianiadee/fleetops.git`
+- **Local working copy:** `~/Downloads/projects/15_fleetops` (scaffolded from empty)
+- **Shape:** FleetNow-style deploy flow, but **Dockerfile → Caddy** via Coolify; branded for
+  operations/analytics. The Caddyfile is a ~5-line SPA server (`try_files {path} /index.html`,
+  `encode zstd gzip`) on `:80` behind Traefik.
+- **API base URL is build/runtime configurable** via Caddy's native `{env.API_BASE}`
+  substitution (set per Coolify app): staging → `fleetapi.fivetitude.com`, prod →
+  `fleetapi.rahamafresh.com`.
+- **FleetNow** gets the same treatment in *its own* repo: a `staging` branch and a
+  parameterized API base URL (assumed currently hardcoded to `fleetapi.rahamafresh.com`).
+
+---
+
+## 6. Safety — staging on the shared production read-layer
+
+Staging hits the **production database**, so isolation is enforced at the **DB-role level**,
+not by a separate DB:
+
+- The staging `dashboard_api` connects as a **read-only role** — reuse `grafana_ro`, or add a
+  dedicated `dashboard_ro` with `GRANT SELECT` on `reporting.*` and the `tracksolid.v_*`
+  views. Accidental writes from staging are then impossible.
+- The **`reporting.v_trips` materialized-view refresher is disabled on staging** — production
+  owns it. The refresher needs write perms and is already pg-advisory-lock guarded (key
+  `920_145`, FIX-D02); a read-only staging role would only log errors, so disable it explicitly
+  (refresh interval `0` / env guard).
+- New `/analytics/*` queries stay backed by the **indexed `reporting.*` views / matview**, not
+  raw hypertable scans, so staging traffic doesn't load the prod DB.
+
+---
+
+## 7. Phased rollout
+
+Ordered by dependency and risk — prove the foundation and the deploy pipeline first; touch the
+client's production domains **last**.
+
+| Phase | Scope | Exit criterion |
+|---|---|---|
+| **0 — Foundation** | This document; migrate all Coolify apps to Forgejo webhook deploys; provision the read-only DB role | Every existing Coolify app redeploys via webhook; read-only role can `SELECT` `reporting.*` + `tracksolid.v_*` and nothing else |
+| **1 — Staging backbone** | Staging `dashboard_api` bridge (`deploy_dashboard_api_staging.sh`, 8891, `fleetapi.fivetitude.com`, read-only, refresher off, staging CORS) | `curl https://fleetapi.fivetitude.com/health` ok; verifiably read-only; no staging rows in `reporting.refresh_log` |
+| **2 — FleetNow staging** | FleetNow repo: `staging` branch + parameterized API base + `fleetnow.fivetitude.com` Coolify app | Renders against staging API; `staging` push deploys staging only, `main` merge deploys prod only; prod FleetNow untouched |
+| **3 — FleetOps backend** | `/analytics/*` endpoints in `dashboard_api_rev.py`; `migrations/15_*` if needed; tested on the staging API | Every route returns correct shape on `fleetapi.fivetitude.com`; fuel route returns "needs data" flags |
+| **4 — FleetOps SPA** | Scaffold `15_fleetops` (git init + remote + SPA/Dockerfile); `fleetops.fivetitude.com` Coolify app | Renders fuel/analytics/utilisation/driver panels from staging endpoints; CORS clean |
+| **5 — Production cutover** | Promote API to prod + prod CORS add; `fleetops.rahamafresh.com` Coolify app; prod DNS record; update `CLAUDE.md` / `CONNECTIONS.md` / `PLATFORM_OVERVIEW.html` | FleetOps live on prod; prod FleetNow/API otherwise unchanged; docs current |
+
+---
+
+## 8. Verification checklist
+
+1. **Staging API up:** `curl -f https://fleetapi.fivetitude.com/health` → `{status: ok}`;
+   resolve the container via `docker ps --filter name=dashboard_api_staging`.
+2. **Read-only enforced:** a write attempt from the staging role fails; **no**
+   `reporting.refresh_log` rows carry a staging source.
+3. **Analytics:** hit each `/analytics/*` on staging, diff the JSON against the underlying view
+   output via `docker exec $DB psql`; fuel returns "needs data" flags.
+4. **CORS:** browser-load `fleetops.fivetitude.com` and `fleetnow.fivetitude.com`; XHRs to
+   `fleetapi.fivetitude.com` succeed; prod `fleetops.rahamafresh.com` reaches the prod API.
+5. **Webhook promotion:** push to `staging` → Forgejo webhook fires → **only** the
+   `*.fivetitude.com` app redeploys (check Coolify deploy log + Forgejo webhook delivery);
+   merge to `main` → only the `*.rahamafresh.com` app redeploys.
+6. **Prod FleetNow untouched:** prod `fleetnow`/`fleetapi` containers not recreated except the
+   intentional prod-CORS add.
+
+---
+
+## 9. Risks & open items
+
+- **FleetNow API-base** parameterization assumes it's currently hardcoded — confirm in that repo.
+- **Shared-DB load:** staging traffic is light, but watch the prod DB if staging analytics
+  queries get heavy; the read-only role + indexed views are the guardrails.
+- **Fuel analytics are data-blocked:** `devices.fuel_100km` is NULL fleet-wide and the
+  `/pushoil` + `/pushobd` webhooks aren't registered, so FleetOps fuel views ship as scaffold
+  until those Open Items (CLAUDE.md §10) are closed.
+- **Naming trap:** `stage.rahamafresh.com` is the *production* host alias (a legacy name). Keep
+  all real staging under `*.fivetitude.com` to avoid confusion.
+
+---
+
+*Related: [`CONNECTIONS.md`](CONNECTIONS.md) · [`PLATFORM_OVERVIEW.html`](PLATFORM_OVERVIEW.html) ·
+[`DWH_PIPELINE.md`](DWH_PIPELINE.md) · root `CLAUDE.md` (§3 map dashboards, §7 fix history).*
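
The webhook-promotion check (§8 item 5) and the naming trap (§9) both depend on mapping a host to its environment by domain, not by subdomain label. A small sketch of that rule, which could back a verification script, is shown here. The helper names `environment_for` and `branch_for` are hypothetical; the domain suffixes and branch names are the ones stated in the document.

```python
# Domain suffixes and branch names come from the document; helper names are hypothetical.
ENV_BY_SUFFIX = {
    ".fivetitude.com": "staging",
    ".rahamafresh.com": "production",
}
BRANCH_BY_ENV = {"staging": "staging", "production": "main"}


def environment_for(host: str) -> str:
    """Classify a hostname by its domain suffix, never by its subdomain label."""
    normalized = host.strip().lower().rstrip(".")
    for suffix, env in ENV_BY_SUFFIX.items():
        if normalized.endswith(suffix):
            return env
    raise ValueError(f"unrecognised host: {host!r}")


def branch_for(host: str) -> str:
    """Return the git branch whose push should redeploy the app serving this host."""
    return BRANCH_BY_ENV[environment_for(host)]
```

With this rule, `stage.rahamafresh.com` classifies as production despite its name, and only hosts under `fivetitude.com` are expected to redeploy on a push to `staging`.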