42 changed files with 17 additions and 656 deletions
--- a/docs/manuals/tracksolid_DB_manual.md
+++ b/docs/manuals/tracksolid_DB_manual.md
--- a/docs/reference/01_BusinessAnalytics.md
+++ b/docs/reference/01_BusinessAnalytics.md
--- a/docs/manuals/tracksolid_docker_commands.md
+++ b/docs/manuals/tracksolid_docker_commands.md
--- a/docs/reports/260410_baseline_report.md
+++ b/docs/reports/260410_baseline_report.md
--- a/docs/manuals/OPERATIONS_MANUAL.md
+++ b/docs/manuals/OPERATIONS_MANUAL.md
--- a/migrations/02_tracksolid_full_schema_rev.sql
+++ b/migrations/02_tracksolid_full_schema_rev.sql
--- a/migrations/03_webhook_schema_migration.sql
+++ b/migrations/03_webhook_schema_migration.sql
--- a/migrations/04_bug_fix_migration.sql
+++ b/migrations/04_bug_fix_migration.sql
--- a/migrations/05_enhancement_migration.sql
+++ b/migrations/05_enhancement_migration.sql
--- a/migrations/06_business_analytics_migration.sql
+++ b/migrations/06_business_analytics_migration.sql
--- a/migrations/07_analytics_views.sql
+++ b/migrations/07_analytics_views.sql
--- a/migrations/08_analytics_config.sql
+++ b/migrations/08_analytics_config.sql
--- a/migrations/09_trips_enrichment.sql
+++ b/migrations/09_trips_enrichment.sql
--- a/migrations/10_driver_clock_views.sql
+++ b/migrations/10_driver_clock_views.sql
--- a/migrations/10_pgbouncer_auth.sql
+++ b/migrations/10_pgbouncer_auth.sql
--- a/data/20260414_FS__Logistics
+++ b/data/20260414_FS__Logistics
--- a/data/20260427_FSG_Vehicles_mitieng.csv
+++ b/data/20260427_FSG_Vehicles_mitieng.csv
--- a/docs/reports/260412_baseline_report.md
+++ b/docs/reports/260412_baseline_report.md
--- a/docs/reports/260427_audit_output.txt
+++ b/docs/reports/260427_audit_output.txt
--- a/docs/reports/260427_device_reconciliation.md
+++ b/docs/reports/260427_device_reconciliation.md
--- a/docs/reference/260507_pgbouncer_deployment.md
+++ b/docs/reference/260507_pgbouncer_deployment.md
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -19,7 +19,7 @@ docker exec $DB psql -U postgres -d tracksolid_db -c "SELECT COUNT(*) FROM track
 **Run a migration file:**
 ```bash
-docker exec -i $DB psql -U postgres -d tracksolid_db < migrations/07_your_migration.sql
+docker exec -i $DB psql -U postgres -d tracksolid_db < 07_your_migration.sql
 ```
 ---
@ -91,19 +91,17 @@ dwh/                        # DWH migrations for tracksolid_dwh@31.97.44.246:588
                            #   261002_bronze_constraints_audit.sql — ON CONFLICT key assertion
                            #   261003_dwh_roles.sql — role contract assertion
                            #   261004_dwh_observability_views.sql — freshness/failure views
-migrations/                 # Numbered SQL migrations 02–10, applied in order by run_migrations.py
+02_tracksolid_full_schema_rev.sql   # Full schema bootstrap
-                            #   02 full schema · 03 webhook · 04 distance fix · 05 enhancements
+03..06_*.sql                # Incremental migrations (06 adds assigned_city, dispatch_log, ops.*)
-                            #   06 ops/analytics · 07 views · 08 config · 09 trips enrichment
+07_analytics_views.sql      # Analytics views migration (applied 2026-04-21)
                            #   10_driver_clock_views.sql · 10_pgbouncer_auth.sql
 Dockerfile                  # Custom image for ingest/webhook containers
 pyproject.toml              # Python project + uv dependency spec
 OPERATIONS_MANUAL.md        # Day-to-day ops runbook
 backup/                     # pg_dump sidecar scripts and config
-data/                       # Source CSVs (FS Logistics 144-device list, FSG vehicles)
+01_BusinessAnalytics.md     # SQL analytics library — read before writing queries
-legacy/                     # Superseded pre-_rev scripts + old pipeline notes (NOT deployed)
+20260414_FS__Logistics - final_fixed.csv  # 144-device driver/vehicle source data
-docs/manuals/               # OPERATIONS_MANUAL, grafana + DWH manuals, docker commands, DB manual
+tracksolidApiDocumentation.md       # API endpoint reference
-docs/reference/             # 01_BusinessAnalytics.md (SQL library — read before writing queries),
+260412_baseline_report.md   # Fleet state snapshot (Apr 2026)
                            #   tracksolidApiDocumentation.md, 260507_pgbouncer_deployment.md
 docs/reports/               # Baseline reports, audit output, improvement reviews
 ```
 ---
@ -173,7 +171,7 @@ dwh_control.v_watermark_lag      -- Grafana: extract vs. load lag per table
 ## 6. API Critical Facts
-**Always read `docs/reference/tracksolidApiDocumentation.md` before adding a new endpoint call.**
+**Always read `tracksolidApiDocumentation.md` before adding a new endpoint call.**
 | Fact | Detail |
 |---|---|
@ -211,7 +209,7 @@ dwh_control.v_watermark_lag      -- Grafana: extract vs. load lag per table
 1. **No prod push without explicit user confirmation.** Always state what you are about to push and wait.
 2. **Never rewrite a migration that is already applied.** Check `tracksolid.schema_migrations` first. Add a new numbered migration file for any schema change.
-3. **Read before writing.** Before suggesting any code change, read the relevant source file. Before writing a query, check `docs/reference/01_BusinessAnalytics.md` for an existing pattern.
+3. **Read before writing.** Before suggesting any code change, read the relevant source file. Before writing a query, check `01_BusinessAnalytics.md` for an existing pattern.
 4. **Reuse shared utilities.** All DB access via `get_conn()`, all API calls via `api_post()`, all cleaning via `clean()` / `clean_num()` / `clean_int()` / `clean_ts()` in `ts_shared_rev.py`. Do not reinvent these.
 5. **Resolve container names dynamically.** Never hardcode the Coolify suffix. Use `docker ps --filter name=<service>`.
 6. **SSH only when asked.** Default workflow is local code → commit → push. SSH into the instance only when explicitly asked to test or run something live.
@ -237,7 +235,7 @@ dwh_control.v_watermark_lag      -- Grafana: extract vs. load lag per table
 | Cities active | Nairobi (primary), Mombasa (deploying), Kampala (4 devices in CSV) |
 | Service flags | KDK 829A GP (239,264 km), Belta KCU-647D (235,000 km) |
-Latest full snapshot: `docs/reports/260412_baseline_report.md`
+Latest full snapshot: `260412_baseline_report.md`
 ---
--- a/docs/manuals/DWH_Execution_Manual.md
+++ b/docs/manuals/DWH_Execution_Manual.md
--- a/docs/manuals/connecting_python_tracksolid.md
+++ b/docs/manuals/connecting_python_tracksolid.md
--- a/dashboard_api_rev.py
+++ b/dashboard_api_rev.py
@ -1,238 +0,0 @@
 """
 dashboard_api_rev.py — Fireside Communications · Map Dashboard Read API
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Stable replacement for the n8n webhooks that fed the Live Position and Fleet
 Trips map dashboards. n8n was acting only as a thin HTTP→SQL proxy; this
 service does the same job directly against the proven reporting.* functions,
 removing n8n's credential-management / reload / version-drift fragility from
 the live-data path.
 It REUSES the existing stack: ts_shared_rev's psycopg2 pool and DATABASE_URL,
 the same Docker image, the same Coolify deploy. The reporting.* functions
 (already verified to return correct GeoJSON) are the single source of truth.
 Endpoints mirror the original n8n webhook paths so the only frontend change
 is the base URL (the `N8N_BASE` constant in each dashboard SPA):
  GET  /webhook/live-positions?cost_centre=&acc_status=
       → { summary, geojson }   (reporting.fn_live_positions)
  GET  /webhook/vehicle-track?vehicle_number=&hours=
       → GeoJSON Feature        (reporting.fn_vehicle_track)
  GET  /webhook/fleet-dashboard
       → { drivers, cost_centres, cities, vehicles }  (filter options)
  POST /webhook/fleet-dashboard   body: {period, vehicle_numbers, driver,
                                         cost_centre, assigned_city,
                                         start_date, end_date}
       → trips payload           (reporting.fn_trips_for_map)
  GET  /health
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 """
 from __future__ import annotations
 import os
 from contextlib import asynccontextmanager
 from datetime import date, datetime, timedelta, timezone
 import psycopg2.extras
 from fastapi import FastAPI, Request
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import JSONResponse
 from ts_shared_rev import close_pool, get_conn, get_logger
 log = get_logger("dashboard_api")
 # Comma-separated list of allowed browser origins (the dashboard domains).
 _ALLOWED_ORIGINS = [
    o.strip()
    for o in os.getenv(
        "DASHBOARD_CORS_ORIGINS",
        "https://liveposition.rahamafresh.com,https://fleetintelligence.rahamafresh.com",
    ).split(",")
    if o.strip()
 ]
@asynccontextmanager
 async def lifespan(app: FastAPI):
    log.info("Dashboard API starting (v1.0). Origins=%s", _ALLOWED_ORIGINS)
    yield
    close_pool()
 app = FastAPI(title="Fireside Map Dashboard API", lifespan=lifespan)
 app.add_middleware(
    CORSMiddleware,
    allow_origins=_ALLOWED_ORIGINS,
    allow_methods=["GET", "POST", "OPTIONS"],
    allow_headers=["*"],
 )
 _EMPTY_GEOJSON = {"type": "FeatureCollection", "features": []}
 def _clean(v):
    """Treat missing / blank / sentinel values as None (= SQL wildcard)."""
    if v is None:
        return None
    s = str(v).strip()
    return s if s and s.lower() not in ("null", "undefined") else None
 # ── Health ────────────────────────────────────────────────────────────────────
@app.get("/health")
 def health():
    return {"status": "ok"}
 # ── Live positions (#004) ───────────────────────────────────────────────────
@app.get("/webhook/live-positions")
 def live_positions(cost_centre: str | None = None, acc_status: str | None = None):
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT reporting.fn_live_positions(%s, %s)",
                    (_clean(cost_centre), _clean(acc_status)),
                )
                payload = cur.fetchone()[0] or {}
        return JSONResponse(
            {
                "summary": payload.get("summary") or {},
                "geojson": payload.get("geojson") or _EMPTY_GEOJSON,
            }
        )
    except Exception:
        log.exception("live-positions failed")
        return JSONResponse(
            {
                "error": {
                    "type": "unknown",
                    "message": "Live-position feed is unavailable. Try again in a few seconds.",
                }
            }
        )
@app.get("/webhook/vehicle-track")
 def vehicle_track(vehicle_number: str | None = None, hours: int = 1):
    veh = _clean(vehicle_number)
    if not veh:
        return JSONResponse({"error": "vehicle_number is required"})
    hours = max(1, min(24, hours or 1))
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT reporting.fn_vehicle_track(%s, %s::int)", (veh, hours)
                )
                feature = cur.fetchone()[0]
        return JSONResponse(
            feature
            or {"type": "Feature", "geometry": {"type": "LineString", "coordinates": []}, "properties": {}}
        )
    except Exception:
        log.exception("vehicle-track failed for %s", veh)
        return JSONResponse({"error": "vehicle-track unavailable"})
 # ── Fleet trips (#002) ───────────────────────────────────────────────────────
 _FILTER_OPTIONS_SQL = """
 SELECT
  (SELECT array_agg(driver        ORDER BY driver)        FROM reporting.v_filter_drivers)       AS drivers,
  (SELECT array_agg(cost_centre   ORDER BY cost_centre)   FROM reporting.v_filter_cost_centres)  AS cost_centres,
  (SELECT array_agg(assigned_city ORDER BY assigned_city) FROM reporting.v_filter_cities)        AS cities,
  (SELECT jsonb_agg(jsonb_build_object(
            'vehicle_number', vehicle_number, 'drivers', drivers,
            'cost_centre', cost_centre, 'assigned_city', assigned_city)
          ORDER BY vehicle_number) FROM reporting.v_filter_vehicles)                             AS vehicles
 """
@app.get("/webhook/fleet-dashboard")
 def fleet_filter_options():
    try:
        with get_conn() as conn:
            with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
                cur.execute(_FILTER_OPTIONS_SQL)
                row = cur.fetchone() or {}
        return JSONResponse(
            {
                "drivers": row.get("drivers") or [],
                "cost_centres": row.get("cost_centres") or [],
                "cities": row.get("cities") or [],
                "vehicles": row.get("vehicles") or [],
            }
        )
    except Exception:
        log.exception("fleet-dashboard filter options failed")
        return JSONResponse({"drivers": [], "cost_centres": [], "cities": [], "vehicles": []})
 def _preset_to_range(period: str | None, start_date, end_date):
    """Mirror of the n8n preset_to_range node."""
    today = datetime.now(timezone.utc).date()
    p = (period or "").strip()
    if p == "today":
        return today, today
    if p == "30d":
        return today - timedelta(days=29), today
    if p == "custom":
        def _d(v, default):
            v = _clean(v)
            if not v:
                return default
            try:
                return date.fromisoformat(v)
            except ValueError:
                return default
        return _d(start_date, today), _d(end_date, today)
    # default + '7d'
    return today - timedelta(days=6), today
@app.post("/webhook/fleet-dashboard")
 async def fleet_trips(request: Request):
    try:
        body = await request.json()
    except Exception:
        body = {}
    if not isinstance(body, dict):
        body = {}
    body = body.get("body", body) if isinstance(body.get("body"), dict) else body
    start, end = _preset_to_range(
        body.get("period"), body.get("start_date"), body.get("end_date")
    )
    veh = _clean(body.get("vehicle_numbers"))  # comma-separated string or None
    try:
        with get_conn() as conn:
            with conn.cursor() as cur:
                cur.execute(
                    """
                    SELECT reporting.fn_trips_for_map(
                        CASE WHEN %(veh)s IS NULL THEN NULL
                             ELSE string_to_array(%(veh)s, ',') END,
                        %(driver)s, %(cc)s, %(city)s, %(start)s::date, %(end)s::date
                    )
                    """,
                    {
                        "veh": veh,
                        "driver": _clean(body.get("driver")),
                        "cc": _clean(body.get("cost_centre")),
                        "city": _clean(body.get("assigned_city")),
                        "start": start,
                        "end": end,
                    },
                )
                payload = cur.fetchone()[0]
        return JSONResponse(payload if payload is not None else {})
    except Exception:
        log.exception("fleet-dashboard trips failed")
        return JSONResponse(
            {"error": {"type": "unknown", "message": "Fleet feed is unavailable. Try again in a few seconds."}}
        )
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@ -59,31 +59,6 @@ services:
      timeout: 5s
      retries: 3
  dashboard_api:
    # Stable read-API for the Live Position + Fleet Trips map dashboards.
    # Replaces the n8n webhooks (n8n was only a thin HTTP->SQL proxy).
    # Calls reporting.fn_live_positions / fn_vehicle_track / fn_trips_for_map.
    build:
      context: .
      dockerfile: Dockerfile
    command: sh -c "uvicorn dashboard_api_rev:app --host 0.0.0.0 --port 8890 --workers 2"
    restart: always
    depends_on:
      timescale_db:
        condition: service_healthy
    env_file: .env
    environment:
      # Browser origins allowed to call this API (the dashboard domains).
      - DASHBOARD_CORS_ORIGINS=${DASHBOARD_CORS_ORIGINS:-https://liveposition.rahamafresh.com,https://fleetintelligence.rahamafresh.com}
    # No host port binding — set a domain (e.g. fleetapi.rahamafresh.com) in the
    # Coolify UI pointing to this service on port 8890. The dashboards then point
    # their N8N_BASE at that domain; paths (/webhook/...) are unchanged.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8890/health"]
      interval: 30s
      timeout: 5s
      retries: 3
  grafana:
    build:
      context: ./grafana
@ -106,7 +81,7 @@ services:
  pgbouncer:
    # Connection pooler in front of timescale_db.
-    # Runbook: docs/reference/260507_pgbouncer_deployment.md
+    # Runbook: 260507_pgbouncer_deployment.md
    # Internal Docker network only — no host port. SCRAM passthrough via
    # auth_query against the public.user_lookup() function (migration 10).
    image: edoburu/pgbouncer
--- a/docs/reports/260601_improvement_claude_48.html
+++ b/docs/reports/260601_improvement_claude_48.html
@ -1,374 +0,0 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>Tracksolid Stack — Engineering Review &amp; Improvement Plan (2026-06-01)</title>
 <style>
  :root {
    --bg: #0f1115;
    --panel: #171a21;
    --panel-2: #1d212b;
    --ink: #e6e9ef;
    --ink-dim: #aab2c0;
    --line: #2a2f3a;
    --accent: #5b9dff;
    --hi: #ff5d5d;
    --med: #ffb454;
    --lo: #5fd0a0;
    --good: #5fd0a0;
    --mono: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace;
    --sans: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
  }
  * { box-sizing: border-box; }
  html { scroll-behavior: smooth; }
  body {
    margin: 0; background: var(--bg); color: var(--ink);
    font-family: var(--sans); line-height: 1.6; font-size: 16px;
  }
  .wrap { max-width: 980px; margin: 0 auto; padding: 48px 28px 96px; }
  header.doc {
    border-bottom: 1px solid var(--line); padding-bottom: 28px; margin-bottom: 36px;
  }
  .kicker { color: var(--accent); font-family: var(--mono); font-size: 13px; letter-spacing: .12em; text-transform: uppercase; }
  h1 { font-size: 30px; line-height: 1.25; margin: 10px 0 14px; }
  .meta { color: var(--ink-dim); font-size: 14px; }
  .meta b { color: var(--ink); font-weight: 600; }
  h2 { font-size: 22px; margin: 44px 0 8px; padding-top: 10px; border-top: 1px solid var(--line); }
  h2 .num { color: var(--accent); font-family: var(--mono); margin-right: 10px; }
  h3 { font-size: 17px; margin: 26px 0 6px; }
  p { margin: 10px 0; }
  a { color: var(--accent); }
  code { font-family: var(--mono); font-size: .87em; background: var(--panel-2); padding: 1px 6px; border-radius: 4px; color: #d7e3ff; }
  pre { background: #0b0d12; border: 1px solid var(--line); border-radius: 8px; padding: 14px 16px; overflow-x: auto; font-family: var(--mono); font-size: 13px; color: #cdd6e4; }
  ul, ol { margin: 10px 0 10px 4px; padding-left: 22px; }
  li { margin: 5px 0; }
  .lead { color: var(--ink-dim); font-size: 16px; }
  .callout { border-left: 3px solid var(--accent); background: var(--panel); border-radius: 0 8px 8px 0; padding: 14px 18px; margin: 18px 0; }
  .callout.warn { border-left-color: var(--hi); }
  .callout.warn b { color: var(--hi); }
  .finding { background: var(--panel); border: 1px solid var(--line); border-radius: 10px; padding: 4px 22px 18px; margin: 22px 0; }
  .badge { display: inline-block; font-family: var(--mono); font-size: 11px; font-weight: 700; letter-spacing: .06em; padding: 3px 9px; border-radius: 999px; text-transform: uppercase; vertical-align: middle; margin-left: 8px; }
  .b-hi { background: rgba(255,93,93,.15); color: var(--hi); border: 1px solid rgba(255,93,93,.35); }
  .b-med { background: rgba(255,180,84,.13); color: var(--med); border: 1px solid rgba(255,180,84,.32); }
  .b-lo { background: rgba(95,208,160,.12); color: var(--lo); border: 1px solid rgba(95,208,160,.3); }
  .b-sec { background: rgba(91,157,255,.13); color: var(--accent); border: 1px solid rgba(91,157,255,.32); }
  .ref { font-family: var(--mono); font-size: 12.5px; color: var(--ink-dim); }
  table { width: 100%; border-collapse: collapse; margin: 18px 0; font-size: 14.5px; }
  th, td { text-align: left; padding: 10px 12px; border-bottom: 1px solid var(--line); vertical-align: top; }
  th { color: var(--ink-dim); font-weight: 600; font-size: 12.5px; text-transform: uppercase; letter-spacing: .05em; }
  td.up-h { color: var(--hi); font-weight: 600; }
  td.up-m { color: var(--med); font-weight: 600; }
  td.up-l { color: var(--lo); font-weight: 600; }
  .pill { font-family: var(--mono); font-size: 11px; padding: 2px 7px; border-radius: 4px; background: var(--panel-2); color: var(--ink-dim); }
  .good-box { border: 1px solid rgba(95,208,160,.3); background: rgba(95,208,160,.05); border-radius: 10px; padding: 6px 22px 16px; margin: 22px 0; }
  .good-box h2 { border-top: none; color: var(--good); }
  footer { margin-top: 60px; padding-top: 22px; border-top: 1px solid var(--line); color: var(--ink-dim); font-size: 13px; }
  .toc { background: var(--panel); border: 1px solid var(--line); border-radius: 10px; padding: 16px 22px; margin: 8px 0 0; }
  .toc ol { margin: 6px 0; }
  .toc a { text-decoration: none; }
  .toc a:hover { text-decoration: underline; }
 </style>
 </head>
 <body>
 <div class="wrap">
  <header class="doc">
    <div class="kicker">Engineering Review · Fireside Communications · Tracksolid Fleet Stack</div>
    <h1>Database &amp; Microservice Assessment — Opportunities &amp; Refactoring</h1>
    <p class="meta">
      <b>Date:</b> 2026-06-01 &nbsp;·&nbsp;
      <b>Reviewer:</b> Claude (Opus 4.8) &nbsp;·&nbsp;
      <b>Scope:</b> TimescaleDB/PostGIS schema + migrations, and the three ingestion microservices
      (<code>ingest_movement_rev.py</code>, <code>ingest_events_rev.py</code>, <code>webhook_receiver_rev.py</code> + shared <code>ts_shared_rev.py</code>)
    </p>
    <p class="meta">Findings are ordered by <b>greatest performance upside first</b>, as requested.</p>
  </header>
  <div class="callout warn">
    <p><b>Access caveat — read this first.</b> The remote instance was <b>unreachable from the review environment</b>:
    every probed port (22, 5433, 5432, 443) timed out, so the IP is not whitelisted (or the host was down).
    I could <b>not</b> run <code>EXPLAIN</code>, read live row/chunk counts, confirm which indexes actually exist,
    or inspect the running images. Everything below is a <b>static review</b> of the source and migration files.
    Items needing live confirmation are tagged <span class="pill">verify live</span>.</p>
    <p style="margin-bottom:0"><b>Immediate security note:</b> <code>.env</code> is <b>committed to git</b> (it is listed in
    <code>.gitignore</code>, but was tracked before that rule existed, so the rule is a no-op). The live Tracksolid app
    secret, the Postgres superuser password, and the Grafana admin password are all in the repo history on Forgejo.
    Treat all three as compromised and rotate them.</p>
  </div>
  <div class="toc">
    <strong>Findings</strong>
    <ol>
      <li><a href="#f1">Single-threaded scheduler holds a DB transaction open across throttled geocoding</a> <span class="badge b-hi">High</span></li>
      <li><a href="#f2">dwh_gold daily-metrics ETL is non-functional</a> <span class="badge b-hi">High</span></li>
      <li><a href="#f3">v_driver_aggregates_daily will not scale; safeguard not applied</a> <span class="badge b-hi">High</span></li>
      <li><a href="#f4">pgbouncer deployed but bypassed by the application</a> <span class="badge b-med">Medium</span></li>
      <li><a href="#f5">Migrations race across three containers with no lock</a> <span class="badge b-med">Medium</span></li>
      <li><a href="#f6">Orphaned migration: 10_driver_clock_views.sql never applied</a> <span class="badge b-med">Medium</span></li>
      <li><a href="#f7">Security gaps (webhook auth, committed secrets)</a> <span class="badge b-sec">Security</span></li>
      <li><a href="#f8">Smaller DB-design notes</a> <span class="badge b-lo">Low</span></li>
      <li><a href="#good">What's genuinely good</a></li>
      <li><a href="#plan">Suggested order of attack</a></li>
    </ol>
  </div>
  <!-- ====================== FINDING 1 ====================== -->
  <h2 id="f1"><span class="num">1</span>Single-threaded scheduler holds a DB transaction open across throttled geocoding<span class="badge b-hi">Highest upside</span></h2>
  <div class="finding">
    <p><code>ingest_movement_rev.py</code> runs every job on one <code>schedule</code> thread
      (<span class="ref">main(), lines 674–695</span>). Within that, <code>poll_trips()</code> opens a transaction
      (<span class="ref">with get_conn(), line 343</span>) and then, <b>inside that open transaction</b>, calls
      <code>reverse_geocode()</code> twice per trip (<span class="ref">lines 392–393</span>).
      <code>reverse_geocode</code> enforces a global <b>1 request/second</b> Nominatim throttle
      (<span class="ref">ts_shared_rev.py:463, _geocode_throttle</span>).</p>
    <h3>Consequences</h3>
    <ul>
      <li>A batch of N new trips can hold a single pooled connection open for <b>N×~2 seconds</b> of network I/O — a
        long-running transaction that pins a snapshot (bad for autovacuum's cleanup horizon) and ties up a connection.</li>
      <li>Because the scheduler is one thread, while <code>poll_trips</code> is geocoding, the <b>60-second live-position
        sweep cannot run</b>. The "live" freshness SLA silently degrades to minutes whenever trips/parking work through a
        backlog. <code>poll_track_list</code> (30 min) and <code>poll_stale_locations</code> (10 min) share the same
        thread and also block each other.</li>
      <li>Every 15 min, <code>poll_trips</code> re-runs the 8-subquery enrichment block (<code>_ENRICH_QUERY</code>,
        <span class="ref">lines 295–321</span>) for the <b>entire last hour</b> of trips, even though the
        <code>ON CONFLICT</code> mostly <code>COALESCE</code>s the result away.</li>
    </ul>
    <h3>Recommendation</h3>
    <ul>
      <li>Move geocoding <b>out of the DB transaction</b>: collect trip rows, commit, then geocode + <code>UPDATE</code>
        in a second pass (or delegate geocoding to a queue / n8n).</li>
      <li>Gate enrichment on <code>WHERE start_address IS NULL</code> so already-enriched trips don't re-pay the cost.</li>
      <li>Run the 60s live sweep on its own thread/process so slow reporting jobs cannot starve it.
        <code>schedule</code> + <code>time.sleep(1)</code> on one thread is the wrong concurrency model when one job is
        latency-critical and others do long network I/O.</li>
    </ul>
  </div>
  <!-- ====================== FINDING 2 ====================== -->
  <h2 id="f2"><span class="num">2</span>The <code>dwh_gold</code> daily-metrics ETL is non-functional<span class="badge b-hi">High</span></h2>
  <div class="finding">
    <p><code>dwh_gold.refresh_daily_metrics()</code> (<span class="ref">migration 05, lines 212–264</span>) selects
      <code>t.imei AS vehicle_key</code> and inserts into <code>fact_daily_fleet_metrics.vehicle_key</code>, which is
      <code>INTEGER REFERENCES dwh_gold.dim_vehicles(vehicle_key)</code> (<span class="ref">schema lines 237–243</span>).
      But <code>imei</code> is a 12–15-digit <b>TEXT</b> string:</p>
    <ul>
      <li>15-digit IMEIs overflow <code>int4</code> → <em>"integer out of range"</em>.</li>
      <li>Shorter ones violate the FK because <b>nothing ever populates <code>dim_vehicles</code></b> — no code path
        inserts into it.</li>
    </ul>
    <p>So the function cannot succeed as written, and <code>v_utilisation_daily</code> (which joins
      <code>fact → dim_vehicles → devices</code>, <span class="ref">migration 07, lines 268–286</span>) will always be
      empty. CLAUDE.md lists "schedule the nightly ETL" as a LOW open item — but scheduling it today would error on every
      run.</p>
    <p style="margin-bottom:0"><b>Recommendation:</b> redesign the gold layer around <code>imei</code> (drop the surrogate
      key, or populate <code>dim_vehicles</code> from <code>devices</code> first and look up the key), and fix the column
      type. This is a real bug hiding behind "not scheduled yet."</p>
  </div>
  <!-- ====================== FINDING 3 ====================== -->
  <h2 id="f3"><span class="num">3</span><code>v_driver_aggregates_daily</code> will not scale, and the safeguard wasn't applied<span class="badge b-hi">High</span></h2>
  <div class="finding">
    <p>Migration 07 (<span class="ref">lines 159–223</span>) builds this view with two 31-day scans of
      <code>position_history</code> plus a <code>LAG()</code> window over <code>source='track_list'</code> rows. There is
      <b>no index on <code>position_history.source</code></b>, and the only index on the hypertable is the
      <code>(imei, gps_time)</code> primary key.</p>
    <p>The view's own header comment says <em>"convert to a continuous aggregate once the hypertable exceeds ~100k rows."</em>
      At 156 devices writing a row/minute from the poll sweep plus track_list waypoints, you cross 100k in <b>days</b>, not
      months. <span class="pill">verify live</span> current row + chunk count.</p>
    <p style="margin-bottom:0"><b>Recommendation:</b> build the speeding/harsh aggregates as a TimescaleDB continuous
      aggregate (the pattern already exists in <code>v_mileage_daily_cagg</code>), or at minimum add a partial index
      supporting the <code>source='track_list'</code> + time filter. As-is, the daily driver dashboard does a growing full
      hypertable scan on every load.</p>
  </div>
  <!-- ====================== FINDING 4 ====================== -->
  <h2 id="f4"><span class="num">4</span>pgbouncer is deployed but the application bypasses it entirely<span class="badge b-med">Medium</span></h2>
  <div class="finding">
    <p><code>docker-compose.yaml</code> adds a pgbouncer sidecar (<span class="ref">lines 82–116</span>) "to cap
      tracksolid_db connections," but <code>.env</code> sets
      <code>DATABASE_URL=...@timescale_db:5432/...</code> — the Python pools connect <b>straight to Postgres</b>, not to
      pgbouncer's 6432.</p>
    <p>So the connection cap does nothing for the three services. The real ceiling today is the sum of per-process pools:</p>
    <pre>webhook   : uvicorn --workers 2  →  2 procs × ThreadedConnectionPool(max=12) = 24
 ingest_movement                                                          = 12
 ingest_events                                                            = 12
                                                              total ≈ 48 direct conns</pre>
    <p>At 80–156 devices this is not a live performance problem — it is wasted/contradictory infrastructure and an
      intent-vs-reality gap. You also maintain a SCRAM-passthrough <code>user_lookup()</code> SECURITY DEFINER function
      (<span class="ref">migration 10</span>) with no consumer.</p>
    <p style="margin-bottom:0"><b>Recommendation:</b> either point <code>DATABASE_URL</code> at <code>pgbouncer:6432</code>
      (transaction-pool mode disallows session features, but the code uses none beyond <code>client_encoding</code>), or
      remove the sidecar.</p>
  </div>
  <!-- ====================== FINDING 5 ====================== -->
  <h2 id="f5"><span class="num">5</span>Migrations race across three containers with no lock<span class="badge b-med">Medium · reliability</span></h2>
  <div class="finding">
    <p>All three services run <code>python run_migrations.py</code> on startup (<span class="ref">compose lines 26, 37,
      48</span>) and start in parallel once the DB is healthy. <code>run_migrations.py</code> does check-then-act
      (<code>already_applied()</code> → <code>run_file()</code>, <span class="ref">lines 231–242</span>) with <b>no
      advisory lock</b>. On a fresh database, three containers can pass <code>already_applied()==False</code>
      simultaneously and run the same file.</p>
    <ul>
      <li>Migration 02's <code>CREATE TRIGGER</code> loop (<span class="ref">lines 255–267</span>) has no
        <code>IF NOT EXISTS</code> — concurrent runs throw, and <code>run_file()</code> treats any <code>ERROR:</code> as
        fatal → <code>sys.exit(1)</code> → a service refuses to start.</li>
      <li><code>run_file()</code> greps stderr for <code>ERROR:</code> without <code>-v ON_ERROR_STOP=1</code>, and files
        02/03 have no <code>BEGIN/COMMIT</code>, so a mid-file failure can leave partial schema that later gets mis-seeded
        as "applied."</li>
    </ul>
    <p style="margin-bottom:0"><b>Recommendation:</b> wrap the run in <code>pg_advisory_lock(&lt;const&gt;)</code> /
      unlock, and run psql with <code>ON_ERROR_STOP=1</code>. Low effort, removes a class of cold-start flakiness.</p>
  </div>
  <!-- ====================== FINDING 6 ====================== -->
  <h2 id="f6"><span class="num">6</span>Orphaned migration: <code>10_driver_clock_views.sql</code> is never applied<span class="badge b-med">Medium</span></h2>
  <div class="finding">
    <p>The runner's <code>MIGRATIONS</code> list (<span class="ref">run_migrations.py:27–37</span>) includes
      <code>10_pgbouncer_auth.sql</code> but <b>not</b> <code>10_driver_clock_views.sql</code>. Two files share the
      <code>10_</code> prefix and the list is hand-maintained, so <code>v_driver_clock_daily/_today</code> (which the n8n
      tardiness workflow depends on, per the file header) exist only if someone applied them by hand — they are not
      reproducible from a clean deploy.</p>
    <p style="margin-bottom:0"><b>Recommendation:</b> rename to <code>11_</code> and add to the list. Better: switch the
      runner from a hardcoded list to globbing <code>NN_*.sql</code> sorted, so this cannot recur.</p>
  </div>
  <!-- ====================== FINDING 7 ====================== -->
  <h2 id="f7"><span class="num">7</span>Security gaps worth fixing now<span class="badge b-sec">Security</span></h2>
  <div class="finding">
    <ul>
      <li><b>Webhook auth is effectively off.</b> <code>_validate_token</code>
        (<span class="ref">webhook_receiver_rev.py:84–87</span>) skips validation entirely when
        <code>JIMI_WEBHOOK_TOKEN</code> is empty, and it is <b>not set in <code>.env</code></b>. The push endpoints are
        exposed via Traefik, so anyone who learns the URL can inject arbitrary telemetry/alarms (each <code>/pushgps</code>
        accepts up to 5000 rows, no rate limit). Set the token and make an unset token <b>fail closed</b> in production.</li>
      <li><b>Committed secrets</b> (see top banner). Rotate the Tracksolid app secret, Postgres password, and Grafana admin
        password; <code>git rm --cached .env</code> and scrub history.</li>
      <li><code>dwh/260423_dwh_ddl_v1.sql</code> plaintext passwords are an existing known item in CLAUDE.md — same class of
        problem.</li>
    </ul>
  </div>
  <!-- ====================== FINDING 8 ====================== -->
  <h2 id="f8"><span class="num">8</span>Smaller DB-design notes<span class="badge b-lo">Low — queue these</span></h2>
  <div class="finding">
    <ul>
      <li><b><code>v_mileage_daily_cagg</code> is built on a column that's mostly NULL.</b> It computes
        <code>MAX(current_mileage) - MIN(current_mileage)</code> (<span class="ref">schema lines 293–301</span>), but
        <code>current_mileage</code> is only populated by the poll sweep — <code>track_list</code> and <code>/pushgps</code>
        inserts leave it NULL, and odometer resets/device swaps produce negative or huge deltas. The aggregate's
        <code>dist_km</code> is unreliable. Prefer deriving daily distance from <code>trips.distance_km</code>.</li>
      <li><b><code>ingestion_log</code> has no retention and no index.</b> <code>v_ingestion_health</code> does
        <code>DISTINCT ON (endpoint) … ORDER BY endpoint, run_at DESC</code> over the whole table, which grows ~875
        rows/day forever. Add <code>(endpoint, run_at DESC)</code> plus a retention/partition policy.</li>
      <li><b>Alarm dedup is leaky on the poll path.</b> <code>alarms_dedup UNIQUE (imei, alarm_type, alarm_time)</code>
        (<span class="ref">schema line 199</span>) — the poll path inserts <code>alertTypeId</code> as
        <code>alarm_type</code> with no NOT-NULL guard, and <code>NULL</code> defeats the unique constraint
        (<code>NULL ≠ NULL</code>), so a null-type alarm can duplicate. The webhook path guards this; the poll path
        doesn't.</li>
      <li><b><code>live_positions</code>/staleness queries are seq scans</b> (no index on <code>gps_time</code>) — totally
        fine at ~156 rows today; just don't carry that pattern into anything that scans <code>position_history</code>.</li>
      <li><b>Dead/ambiguous code in <code>_parse_request</code></b> (<span class="ref">webhook lines 90–143</span>): the
        JSON-array branch <code>_parse_data_list</code> is never reached (it always falls through to
        <code>request.form()</code>); harmless but misleading given the docstring claims it handles both.</li>
    </ul>
  </div>
  <!-- ====================== GOOD ====================== -->
  <div class="good-box">
    <h2 id="good" style="border-top:none;"><span class="num" style="color:var(--good)">✓</span>What's genuinely good</h2>
    <p>So this is balanced — the bones are solid:</p>
    <ul>
      <li>Per-row <code>SAVEPOINT</code> isolation so one bad item can't abort a batch.</li>
      <li>Time-guarded upserts via the shared <code>upsert_live_position</code> helper.</li>
      <li>Batched <code>execute_values</code> on the high-volume push / track-list paths.</li>
      <li>Hypertables with compression + retention policies.</li>
      <li>Parameterized SQL throughout — no injection surface.</li>
      <li>Clean signal handling and pool teardown.</li>
      <li>Idempotent migrations with a tracking table and <code>COMMENT ON VIEW</code> provenance.</li>
      <li><code>sync_devices</code> N+1 already parallelized with a bounded thread pool.</li>
    </ul>
    <p style="margin-bottom:0">The issues above are mostly about <b>coupling</b>, <b>one broken ETL</b>, and
      <b>scale-ahead-of-indexing</b> — not a bad foundation.</p>
  </div>
  <!-- ====================== PLAN ====================== -->
  <h2 id="plan"><span class="num">»</span>Suggested order of attack (effort vs. upside)</h2>
  <table>
    <thead>
      <tr><th style="width:38px">#</th><th>Action</th><th style="width:130px">Upside</th><th style="width:80px">Effort</th></tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td>
        <td>Pull geocoding out of the trips transaction + gate on <code>start_address IS NULL</code>; isolate the 60s sweep on its own thread</td>
        <td class="up-h">High — restores live freshness, frees connections</td>
        <td>M</td>
      </tr>
      <tr>
        <td>2</td>
        <td>Fix or redesign <code>refresh_daily_metrics</code> / <code>dim_vehicles</code> (imei vs int key)</td>
        <td class="up-h">High — unblocks all utilisation reporting</td>
        <td>M</td>
      </tr>
      <tr>
        <td>3</td>
        <td>Convert <code>v_driver_aggregates_daily</code> to a continuous aggregate (or add <code>source</code>+time index)</td>
        <td class="up-h">High and growing</td>
        <td>M</td>
      </tr>
      <tr>
        <td>4</td>
        <td>Set <code>JIMI_WEBHOOK_TOKEN</code>; rotate + untrack <code>.env</code></td>
        <td class="up-h">High (security)</td>
        <td>S</td>
      </tr>
      <tr>
        <td>5</td>
        <td>Advisory-lock the migration runner + <code>ON_ERROR_STOP=1</code>; add <code>10_driver_clock_views</code> / switch to glob</td>
        <td class="up-m">Medium (reliability)</td>
        <td>S</td>
      </tr>
      <tr>
        <td>6</td>
        <td>Decide pgbouncer in-or-out; point <code>DATABASE_URL</code> accordingly</td>
        <td class="up-m">Medium (clarity)</td>
        <td>S</td>
      </tr>
      <tr>
        <td>7</td>
        <td><code>ingestion_log</code> index + retention; fix poll-path alarm null dedup; fix cagg distance source</td>
        <td class="up-l">Low–medium</td>
        <td>S</td>
      </tr>
    </tbody>
  </table>
  <div class="callout">
    <p style="margin:0"><b>Next step for live confirmation:</b> if I can get onto the box (whitelist the review IP for
      5433, or an SSH tunnel), I'll confirm the <span class="pill">verify live</span> items — actual
      <code>position_history</code> row/chunk counts, which indexes really exist, whether <code>refresh_daily_metrics</code>
      has ever succeeded, and <code>EXPLAIN ANALYZE</code> on the heavier views — and tighten the priority order with real
      numbers.</p>
  </div>
  <footer>
    Generated 2026-06-01 by Claude (Opus 4.8) for Fireside Communications · Tracksolid Fleet Intelligence.
    Static review only — no live database access was available at review time. File references use
    <span class="ref">file:line</span> against the repository state on branch
    <code>quality-program-2026-04-12</code>.
  </footer>
 </div>
 </body>
 </html>
--- a/documents.txt
+++ b/documents.txt
--- a/data/fireside_logistics_cleaned_v2.csv
+++ b/data/fireside_logistics_cleaned_v2.csv
--- a/docs/manuals/grafanaDeployment.md
+++ b/docs/manuals/grafanaDeployment.md
--- a/docs/manuals/grafanaOperationalManual.md
+++ b/docs/manuals/grafanaOperationalManual.md
--- a/import_drivers_csv.py
+++ b/import_drivers_csv.py
@ -49,7 +49,7 @@ from ts_shared_rev import clean, clean_num, clean_ts, get_conn, get_logger
 log = get_logger("csv_import")
-DEFAULT_CSV_PATH = Path(__file__).parent / "data" / "20260427_FSG_Vehicles_mitieng.csv"
+DEFAULT_CSV_PATH = Path(__file__).parent / "20260427_FSG_Vehicles_mitieng.csv"
 # Columns fetched from DB for diff comparison.
 DB_COLS = [
--- a/docs/reports/new_feature.txt
+++ b/docs/reports/new_feature.txt
--- a/push_webhook.md
+++ b/push_webhook.md
--- a/run_migrations.py
+++ b/run_migrations.py
@ -221,7 +221,7 @@ def main():
    applied = skipped = 0
    for sql_file in MIGRATIONS:
-        path = os.path.join("/app", "migrations", sql_file)
+        path = os.path.join("/app", sql_file)
        if not os.path.exists(path):
            print(f"  SKIP  {sql_file} (file not found in /app)")
--- a/run_migrations.sh
+++ b/run_migrations.sh
@ -55,10 +55,10 @@ run_sql -c "
 " > /dev/null
 # ── Find and apply pending migrations ────────────────────────────────────────
-MIGRATION_FILES=$(find "$SCRIPT_DIR/migrations" -maxdepth 1 -name '[0-9][0-9]_*.sql' | sort)
+MIGRATION_FILES=$(find "$SCRIPT_DIR" -maxdepth 1 -name '[0-9][0-9]_*.sql' | sort)
 if [[ -z "$MIGRATION_FILES" ]]; then
-  echo "No migration files found in $SCRIPT_DIR/migrations"
+  echo "No migration files found in $SCRIPT_DIR"
  exit 0
 fi
--- a/docs/reference/tracksolidApiDocumentation.md
+++ b/docs/reference/tracksolidApiDocumentation.md
--- a/legacy/tracksolid_analytics_pipeline.txt
+++ b/legacy/tracksolid_analytics_pipeline.txt
--- a/legacy/tracksolid_extract.py
+++ b/legacy/tracksolid_extract.py
--- a/legacy/tracksolid_ingestion_pipeline.txt
+++ b/legacy/tracksolid_ingestion_pipeline.txt
--- a/legacy/tracksolid_update_v2.py
+++ b/legacy/tracksolid_update_v2.py
--- a/legacy/tracksolid_vehicle_update.py
+++ b/legacy/tracksolid_vehicle_update.py