chore(repo): reorganize tree into migrations/ data/ legacy/ docs/

Group root-level files (accreted from incremental changes) by purpose
without moving any deployment entrypoint or breaking imports:

- migrations/ : numbered SQL 02-10
- data/ : source CSVs
- legacy/ : superseded pre-_rev scripts + old pipeline notes (not deployed)
- docs/{manuals,reference,reports}/ : loose manuals, references, reports
- strip stray ** / *** prefixes from 5 doc filenames
- delete empty documents.txt / push_webhook.md

Reference updates so nothing breaks:

- run_migrations.py -> /app/migrations/<file>
- run_migrations.sh -> $SCRIPT_DIR/migrations
- import_drivers_csv.py -> data/<csv>
- docker-compose.yaml -> runbook path comment
- CLAUDE.md -> codebase map + inline doc references

Deployed Python (3 services + ts_shared_rev + run_migrations) and the
documented ops one-shots stay at root, preserving the flat-import layout
and all documented commands.

Verified: py_compile clean across all modules, every MIGRATIONS entry
resolves under migrations/, CI-referenced paths (tests/, mypy targets,
db_audit) and the grafana build context intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 02:27:30 +03:00 · e5b0e192d8
commit e5b0e192d8
parent 2309464ab8
41 changed files with 393 additions and 17 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -19,7 +19,7 @@ docker exec $DB psql -U postgres -d tracksolid_db -c "SELECT COUNT(*) FROM track

 **Run a migration file:**
 ```bash
-docker exec -i $DB psql -U postgres -d tracksolid_db < 07_your_migration.sql
+docker exec -i $DB psql -U postgres -d tracksolid_db < migrations/07_your_migration.sql
 ```

 ---
@@ -91,17 +91,19 @@ dwh/                        # DWH migrations for tracksolid_dwh@31.97.44.246:588
                            #   261002_bronze_constraints_audit.sql — ON CONFLICT key assertion
                            #   261003_dwh_roles.sql — role contract assertion
                            #   261004_dwh_observability_views.sql — freshness/failure views
-02_tracksolid_full_schema_rev.sql   # Full schema bootstrap
-03..06_*.sql                # Incremental migrations (06 adds assigned_city, dispatch_log, ops.*)
-07_analytics_views.sql      # Analytics views migration (applied 2026-04-21)
+migrations/                 # Numbered SQL migrations 02–10, applied in order by run_migrations.py
+                            #   02 full schema · 03 webhook · 04 distance fix · 05 enhancements
+                            #   06 ops/analytics · 07 views · 08 config · 09 trips enrichment
+                            #   10_driver_clock_views.sql · 10_pgbouncer_auth.sql
 Dockerfile                  # Custom image for ingest/webhook containers
 pyproject.toml              # Python project + uv dependency spec
-OPERATIONS_MANUAL.md        # Day-to-day ops runbook
 backup/                     # pg_dump sidecar scripts and config
-01_BusinessAnalytics.md     # SQL analytics library — read before writing queries
-20260414_FS__Logistics - final_fixed.csv  # 144-device driver/vehicle source data
-tracksolidApiDocumentation.md       # API endpoint reference
-260412_baseline_report.md   # Fleet state snapshot (Apr 2026)
+data/                       # Source CSVs (FS Logistics 144-device list, FSG vehicles)
+legacy/                     # Superseded pre-_rev scripts + old pipeline notes (NOT deployed)
+docs/manuals/               # OPERATIONS_MANUAL, grafana + DWH manuals, docker commands, DB manual
+docs/reference/             # 01_BusinessAnalytics.md (SQL library — read before writing queries),
+                            #   tracksolidApiDocumentation.md, 260507_pgbouncer_deployment.md
+docs/reports/               # Baseline reports, audit output, improvement reviews
 ```

 ---
@@ -171,7 +173,7 @@ dwh_control.v_watermark_lag      -- Grafana: extract vs. load lag per table

 ## 6. API Critical Facts

-**Always read `tracksolidApiDocumentation.md` before adding a new endpoint call.**
+**Always read `docs/reference/tracksolidApiDocumentation.md` before adding a new endpoint call.**

 | Fact | Detail |
 |---|---|
@@ -209,7 +211,7 @@ dwh_control.v_watermark_lag      -- Grafana: extract vs. load lag per table

 1. **No prod push without explicit user confirmation.** Always state what you are about to push and wait.
 2. **Never rewrite a migration that is already applied.** Check `tracksolid.schema_migrations` first. Add a new numbered migration file for any schema change.
-3. **Read before writing.** Before suggesting any code change, read the relevant source file. Before writing a query, check `01_BusinessAnalytics.md` for an existing pattern.
+3. **Read before writing.** Before suggesting any code change, read the relevant source file. Before writing a query, check `docs/reference/01_BusinessAnalytics.md` for an existing pattern.
 4. **Reuse shared utilities.** All DB access via `get_conn()`, all API calls via `api_post()`, all cleaning via `clean()` / `clean_num()` / `clean_int()` / `clean_ts()` in `ts_shared_rev.py`. Do not reinvent these.
 5. **Resolve container names dynamically.** Never hardcode the Coolify suffix. Use `docker ps --filter name=<service>`.
 6. **SSH only when asked.** Default workflow is local code → commit → push. SSH into the instance only when explicitly asked to test or run something live.
@@ -235,7 +237,7 @@ dwh_control.v_watermark_lag      -- Grafana: extract vs. load lag per table
 | Cities active | Nairobi (primary), Mombasa (deploying), Kampala (4 devices in CSV) |
 | Service flags | KDK 829A GP (239,264 km), Belta KCU-647D (235,000 km) |

-Latest full snapshot: `260412_baseline_report.md`
+Latest full snapshot: `docs/reports/260412_baseline_report.md`

 ---

--- a/data/20260414_FS__Logistics
+++ b/data/20260414_FS__Logistics
--- a/data/20260427_FSG_Vehicles_mitieng.csv
+++ b/data/20260427_FSG_Vehicles_mitieng.csv
--- a/data/fireside_logistics_cleaned_v2.csv
+++ b/data/fireside_logistics_cleaned_v2.csv
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -81,7 +81,7 @@ services:

  pgbouncer:
    # Connection pooler in front of timescale_db.
-    # Runbook: 260507_pgbouncer_deployment.md
+    # Runbook: docs/reference/260507_pgbouncer_deployment.md
    # Internal Docker network only — no host port. SCRAM passthrough via
    # auth_query against the public.user_lookup() function (migration 10).
    image: edoburu/pgbouncer
--- a/docs/manuals/DWH_Execution_Manual.md
+++ b/docs/manuals/DWH_Execution_Manual.md
--- a/docs/manuals/OPERATIONS_MANUAL.md
+++ b/docs/manuals/OPERATIONS_MANUAL.md
--- a/docs/manuals/connecting_python_tracksolid.md
+++ b/docs/manuals/connecting_python_tracksolid.md
--- a/docs/manuals/grafanaDeployment.md
+++ b/docs/manuals/grafanaDeployment.md
--- a/docs/manuals/grafanaOperationalManual.md
+++ b/docs/manuals/grafanaOperationalManual.md
--- a/docs/manuals/tracksolid_DB_manual.md
+++ b/docs/manuals/tracksolid_DB_manual.md
--- a/docs/manuals/tracksolid_docker_commands.md
+++ b/docs/manuals/tracksolid_docker_commands.md
--- a/docs/reference/01_BusinessAnalytics.md
+++ b/docs/reference/01_BusinessAnalytics.md
--- a/docs/reference/260507_pgbouncer_deployment.md
+++ b/docs/reference/260507_pgbouncer_deployment.md
--- a/docs/reference/tracksolidApiDocumentation.md
+++ b/docs/reference/tracksolidApiDocumentation.md
--- a/docs/reports/260410_baseline_report.md
+++ b/docs/reports/260410_baseline_report.md
--- a/docs/reports/260412_baseline_report.md
+++ b/docs/reports/260412_baseline_report.md
--- a/docs/reports/260427_audit_output.txt
+++ b/docs/reports/260427_audit_output.txt
--- a/docs/reports/260427_device_reconciliation.md
+++ b/docs/reports/260427_device_reconciliation.md
--- a/docs/reports/260601_improvement_claude_48.html
+++ b/docs/reports/260601_improvement_claude_48.html
@@ -0,0 +1,374 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Tracksolid Stack — Engineering Review &amp; Improvement Plan (2026-06-01)</title>
+<style>
+  :root {
+    --bg: #0f1115;
+    --panel: #171a21;
+    --panel-2: #1d212b;
+    --ink: #e6e9ef;
+    --ink-dim: #aab2c0;
+    --line: #2a2f3a;
+    --accent: #5b9dff;
+    --hi: #ff5d5d;
+    --med: #ffb454;
+    --lo: #5fd0a0;
+    --good: #5fd0a0;
+    --mono: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace;
+    --sans: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
+  }
+  * { box-sizing: border-box; }
+  html { scroll-behavior: smooth; }
+  body {
+    margin: 0; background: var(--bg); color: var(--ink);
+    font-family: var(--sans); line-height: 1.6; font-size: 16px;
+  }
+  .wrap { max-width: 980px; margin: 0 auto; padding: 48px 28px 96px; }
+  header.doc {
+    border-bottom: 1px solid var(--line); padding-bottom: 28px; margin-bottom: 36px;
+  }
+  .kicker { color: var(--accent); font-family: var(--mono); font-size: 13px; letter-spacing: .12em; text-transform: uppercase; }
+  h1 { font-size: 30px; line-height: 1.25; margin: 10px 0 14px; }
+  .meta { color: var(--ink-dim); font-size: 14px; }
+  .meta b { color: var(--ink); font-weight: 600; }
+  h2 { font-size: 22px; margin: 44px 0 8px; padding-top: 10px; border-top: 1px solid var(--line); }
+  h2 .num { color: var(--accent); font-family: var(--mono); margin-right: 10px; }
+  h3 { font-size: 17px; margin: 26px 0 6px; }
+  p { margin: 10px 0; }
+  a { color: var(--accent); }
+  code { font-family: var(--mono); font-size: .87em; background: var(--panel-2); padding: 1px 6px; border-radius: 4px; color: #d7e3ff; }
+  pre { background: #0b0d12; border: 1px solid var(--line); border-radius: 8px; padding: 14px 16px; overflow-x: auto; font-family: var(--mono); font-size: 13px; color: #cdd6e4; }
+  ul, ol { margin: 10px 0 10px 4px; padding-left: 22px; }
+  li { margin: 5px 0; }
+  .lead { color: var(--ink-dim); font-size: 16px; }
+
+  .callout { border-left: 3px solid var(--accent); background: var(--panel); border-radius: 0 8px 8px 0; padding: 14px 18px; margin: 18px 0; }
+  .callout.warn { border-left-color: var(--hi); }
+  .callout.warn b { color: var(--hi); }
+
+  .finding { background: var(--panel); border: 1px solid var(--line); border-radius: 10px; padding: 4px 22px 18px; margin: 22px 0; }
+  .badge { display: inline-block; font-family: var(--mono); font-size: 11px; font-weight: 700; letter-spacing: .06em; padding: 3px 9px; border-radius: 999px; text-transform: uppercase; vertical-align: middle; margin-left: 8px; }
+  .b-hi { background: rgba(255,93,93,.15); color: var(--hi); border: 1px solid rgba(255,93,93,.35); }
+  .b-med { background: rgba(255,180,84,.13); color: var(--med); border: 1px solid rgba(255,180,84,.32); }
+  .b-lo { background: rgba(95,208,160,.12); color: var(--lo); border: 1px solid rgba(95,208,160,.3); }
+  .b-sec { background: rgba(91,157,255,.13); color: var(--accent); border: 1px solid rgba(91,157,255,.32); }
+
+  .ref { font-family: var(--mono); font-size: 12.5px; color: var(--ink-dim); }
+
+  table { width: 100%; border-collapse: collapse; margin: 18px 0; font-size: 14.5px; }
+  th, td { text-align: left; padding: 10px 12px; border-bottom: 1px solid var(--line); vertical-align: top; }
+  th { color: var(--ink-dim); font-weight: 600; font-size: 12.5px; text-transform: uppercase; letter-spacing: .05em; }
+  td.up-h { color: var(--hi); font-weight: 600; }
+  td.up-m { color: var(--med); font-weight: 600; }
+  td.up-l { color: var(--lo); font-weight: 600; }
+
+  .pill { font-family: var(--mono); font-size: 11px; padding: 2px 7px; border-radius: 4px; background: var(--panel-2); color: var(--ink-dim); }
+
+  .good-box { border: 1px solid rgba(95,208,160,.3); background: rgba(95,208,160,.05); border-radius: 10px; padding: 6px 22px 16px; margin: 22px 0; }
+  .good-box h2 { border-top: none; color: var(--good); }
+
+  footer { margin-top: 60px; padding-top: 22px; border-top: 1px solid var(--line); color: var(--ink-dim); font-size: 13px; }
+  .toc { background: var(--panel); border: 1px solid var(--line); border-radius: 10px; padding: 16px 22px; margin: 8px 0 0; }
+  .toc ol { margin: 6px 0; }
+  .toc a { text-decoration: none; }
+  .toc a:hover { text-decoration: underline; }
+</style>
+</head>
+<body>
+<div class="wrap">
+
+  <header class="doc">
+    <div class="kicker">Engineering Review · Fireside Communications · Tracksolid Fleet Stack</div>
+    <h1>Database &amp; Microservice Assessment — Opportunities &amp; Refactoring</h1>
+    <p class="meta">
+      <b>Date:</b> 2026-06-01 &nbsp;·&nbsp;
+      <b>Reviewer:</b> Claude (Opus 4.8) &nbsp;·&nbsp;
+      <b>Scope:</b> TimescaleDB/PostGIS schema + migrations, and the three ingestion microservices
+      (<code>ingest_movement_rev.py</code>, <code>ingest_events_rev.py</code>, <code>webhook_receiver_rev.py</code> + shared <code>ts_shared_rev.py</code>)
+    </p>
+    <p class="meta">Findings are ordered by <b>greatest performance upside first</b>, as requested.</p>
+  </header>
+
+  <div class="callout warn">
+    <p><b>Access caveat — read this first.</b> The remote instance was <b>unreachable from the review environment</b>:
+    every probed port (22, 5433, 5432, 443) timed out, so the IP is not whitelisted (or the host was down).
+    I could <b>not</b> run <code>EXPLAIN</code>, read live row/chunk counts, confirm which indexes actually exist,
+    or inspect the running images. Everything below is a <b>static review</b> of the source and migration files.
+    Items needing live confirmation are tagged <span class="pill">verify live</span>.</p>
+    <p style="margin-bottom:0"><b>Immediate security note:</b> <code>.env</code> is <b>committed to git</b> (it is listed in
+    <code>.gitignore</code>, but was tracked before that rule existed, so the rule is a no-op). The live Tracksolid app
+    secret, the Postgres superuser password, and the Grafana admin password are all in the repo history on Forgejo.
+    Treat all three as compromised and rotate them.</p>
+  </div>
+
+  <div class="toc">
+    <strong>Findings</strong>
+    <ol>
+      <li><a href="#f1">Single-threaded scheduler holds a DB transaction open across throttled geocoding</a> <span class="badge b-hi">High</span></li>
+      <li><a href="#f2">dwh_gold daily-metrics ETL is non-functional</a> <span class="badge b-hi">High</span></li>
+      <li><a href="#f3">v_driver_aggregates_daily will not scale; safeguard not applied</a> <span class="badge b-hi">High</span></li>
+      <li><a href="#f4">pgbouncer deployed but bypassed by the application</a> <span class="badge b-med">Medium</span></li>
+      <li><a href="#f5">Migrations race across three containers with no lock</a> <span class="badge b-med">Medium</span></li>
+      <li><a href="#f6">Orphaned migration: 10_driver_clock_views.sql never applied</a> <span class="badge b-med">Medium</span></li>
+      <li><a href="#f7">Security gaps (webhook auth, committed secrets)</a> <span class="badge b-sec">Security</span></li>
+      <li><a href="#f8">Smaller DB-design notes</a> <span class="badge b-lo">Low</span></li>
+      <li><a href="#good">What's genuinely good</a></li>
+      <li><a href="#plan">Suggested order of attack</a></li>
+    </ol>
+  </div>
+
+  <!-- ====================== FINDING 1 ====================== -->
+  <h2 id="f1"><span class="num">1</span>Single-threaded scheduler holds a DB transaction open across throttled geocoding<span class="badge b-hi">Highest upside</span></h2>
+  <div class="finding">
+    <p><code>ingest_movement_rev.py</code> runs every job on one <code>schedule</code> thread
+      (<span class="ref">main(), lines 674–695</span>). Within that, <code>poll_trips()</code> opens a transaction
+      (<span class="ref">with get_conn(), line 343</span>) and then, <b>inside that open transaction</b>, calls
+      <code>reverse_geocode()</code> twice per trip (<span class="ref">lines 392–393</span>).
+      <code>reverse_geocode</code> enforces a global <b>1 request/second</b> Nominatim throttle
+      (<span class="ref">ts_shared_rev.py:463, _geocode_throttle</span>).</p>
+
+    <h3>Consequences</h3>
+    <ul>
+      <li>A batch of N new trips can hold a single pooled connection open for <b>N×~2 seconds</b> of network I/O — a
+        long-running transaction that pins a snapshot (bad for autovacuum's cleanup horizon) and ties up a connection.</li>
+      <li>Because the scheduler is one thread, while <code>poll_trips</code> is geocoding, the <b>60-second live-position
+        sweep cannot run</b>. The "live" freshness SLA silently degrades to minutes whenever trips/parking work through a
+        backlog. <code>poll_track_list</code> (30 min) and <code>poll_stale_locations</code> (10 min) share the same
+        thread and also block each other.</li>
+      <li>Every 15 min, <code>poll_trips</code> re-runs the 8-subquery enrichment block (<code>_ENRICH_QUERY</code>,
+        <span class="ref">lines 295–321</span>) for the <b>entire last hour</b> of trips, even though the
+        <code>ON CONFLICT</code> mostly <code>COALESCE</code>s the result away.</li>
+    </ul>
+
+    <h3>Recommendation</h3>
+    <ul>
+      <li>Move geocoding <b>out of the DB transaction</b>: collect trip rows, commit, then geocode + <code>UPDATE</code>
+        in a second pass (or delegate geocoding to a queue / n8n).</li>
+      <li>Gate enrichment on <code>WHERE start_address IS NULL</code> so already-enriched trips don't re-pay the cost.</li>
+      <li>Run the 60s live sweep on its own thread/process so slow reporting jobs cannot starve it.
+        <code>schedule</code> + <code>time.sleep(1)</code> on one thread is the wrong concurrency model when one job is
+        latency-critical and others do long network I/O.</li>
+    </ul>
+  </div>
+
+  <!-- ====================== FINDING 2 ====================== -->
+  <h2 id="f2"><span class="num">2</span>The <code>dwh_gold</code> daily-metrics ETL is non-functional<span class="badge b-hi">High</span></h2>
+  <div class="finding">
+    <p><code>dwh_gold.refresh_daily_metrics()</code> (<span class="ref">migration 05, lines 212–264</span>) selects
+      <code>t.imei AS vehicle_key</code> and inserts into <code>fact_daily_fleet_metrics.vehicle_key</code>, which is
+      <code>INTEGER REFERENCES dwh_gold.dim_vehicles(vehicle_key)</code> (<span class="ref">schema lines 237–243</span>).
+      But <code>imei</code> is a 12–15-digit <b>TEXT</b> string:</p>
+    <ul>
+      <li>A 12–15-digit value is far above the <code>int4</code> maximum of 2,147,483,647 → <em>"integer out of range"</em>.</li>
+      <li>Even a value that fit would violate the FK, because <b>nothing ever populates <code>dim_vehicles</code></b>: no code path
+        inserts into it.</li>
+    </ul>
+    <p>So the function cannot succeed as written, and <code>v_utilisation_daily</code> (which joins
+      <code>fact → dim_vehicles → devices</code>, <span class="ref">migration 07, lines 268–286</span>) will always be
+      empty. CLAUDE.md lists "schedule the nightly ETL" as a LOW open item — but scheduling it today would error on every
+      run.</p>
+    <p style="margin-bottom:0"><b>Recommendation:</b> redesign the gold layer around <code>imei</code> (drop the surrogate
+      key, or populate <code>dim_vehicles</code> from <code>devices</code> first and look up the key), and fix the column
+      type. This is a real bug hiding behind "not scheduled yet."</p>
+  </div>
+
+  <!-- ====================== FINDING 3 ====================== -->
+  <h2 id="f3"><span class="num">3</span><code>v_driver_aggregates_daily</code> will not scale, and the safeguard wasn't applied<span class="badge b-hi">High</span></h2>
+  <div class="finding">
+    <p>Migration 07 (<span class="ref">lines 159–223</span>) builds this view with two 31-day scans of
+      <code>position_history</code> plus a <code>LAG()</code> window over <code>source='track_list'</code> rows. There is
+      <b>no index on <code>position_history.source</code></b>, and the only index on the hypertable is the
+      <code>(imei, gps_time)</code> primary key.</p>
+    <p>The view's own header comment says <em>"convert to a continuous aggregate once the hypertable exceeds ~100k rows."</em>
+      At 156 devices writing a row/minute from the poll sweep (156 × 1,440 ≈ 225k rows/day) plus track_list waypoints, you cross 100k in <b>about a day</b>, not
+      months. <span class="pill">verify live</span> current row + chunk count.</p>
+    <p style="margin-bottom:0"><b>Recommendation:</b> build the speeding/harsh aggregates as a TimescaleDB continuous
+      aggregate (the pattern already exists in <code>v_mileage_daily_cagg</code>), or at minimum add a partial index
+      supporting the <code>source='track_list'</code> + time filter. As-is, the daily driver dashboard does a growing full
+      hypertable scan on every load.</p>
+  </div>
+
+  <!-- ====================== FINDING 4 ====================== -->
+  <h2 id="f4"><span class="num">4</span>pgbouncer is deployed but the application bypasses it entirely<span class="badge b-med">Medium</span></h2>
+  <div class="finding">
+    <p><code>docker-compose.yaml</code> adds a pgbouncer sidecar (<span class="ref">lines 82–116</span>) "to cap
+      tracksolid_db connections," but <code>.env</code> sets
+      <code>DATABASE_URL=...@timescale_db:5432/...</code> — the Python pools connect <b>straight to Postgres</b>, not to
+      pgbouncer's 6432.</p>
+    <p>So the connection cap does nothing for the three services. The real ceiling today is the sum of per-process pools:</p>
+    <pre>webhook   : uvicorn --workers 2  →  2 procs × ThreadedConnectionPool(max=12) = 24
+ingest_movement                                                          = 12
+ingest_events                                                            = 12
+                                                              total ≈ 48 direct conns</pre>
+    <p>At 80–156 devices this is not a live performance problem — it is wasted/contradictory infrastructure and an
+      intent-vs-reality gap. You also maintain a SCRAM-passthrough <code>user_lookup()</code> SECURITY DEFINER function
+      (<span class="ref">migration 10</span>) with no consumer.</p>
+    <p style="margin-bottom:0"><b>Recommendation:</b> either point <code>DATABASE_URL</code> at <code>pgbouncer:6432</code>
+      (transaction-pool mode disallows session features, but the code uses none beyond <code>client_encoding</code>), or
+      remove the sidecar.</p>
+  </div>
+
+  <!-- ====================== FINDING 5 ====================== -->
+  <h2 id="f5"><span class="num">5</span>Migrations race across three containers with no lock<span class="badge b-med">Medium · reliability</span></h2>
+  <div class="finding">
+    <p>All three services run <code>python run_migrations.py</code> on startup (<span class="ref">compose lines 26, 37,
+      48</span>) and start in parallel once the DB is healthy. <code>run_migrations.py</code> does check-then-act
+      (<code>already_applied()</code> → <code>run_file()</code>, <span class="ref">lines 231–242</span>) with <b>no
+      advisory lock</b>. On a fresh database, three containers can pass <code>already_applied()==False</code>
+      simultaneously and run the same file.</p>
+    <ul>
+      <li>Migration 02's <code>CREATE TRIGGER</code> loop (<span class="ref">lines 255–267</span>) has no
+        <code>IF NOT EXISTS</code> — concurrent runs throw, and <code>run_file()</code> treats any <code>ERROR:</code> as
+        fatal → <code>sys.exit(1)</code> → a service refuses to start.</li>
+      <li><code>run_file()</code> greps stderr for <code>ERROR:</code> without <code>-v ON_ERROR_STOP=1</code>, and files
+        02/03 have no <code>BEGIN/COMMIT</code>, so a mid-file failure can leave partial schema that later gets mis-seeded
+        as "applied."</li>
+    </ul>
+    <p style="margin-bottom:0"><b>Recommendation:</b> wrap the run in <code>pg_advisory_lock(&lt;const&gt;)</code> /
+      unlock, and run psql with <code>ON_ERROR_STOP=1</code>. Low effort, removes a class of cold-start flakiness.</p>
+  </div>
+
+  <!-- ====================== FINDING 6 ====================== -->
+  <h2 id="f6"><span class="num">6</span>Orphaned migration: <code>10_driver_clock_views.sql</code> is never applied<span class="badge b-med">Medium</span></h2>
+  <div class="finding">
+    <p>The runner's <code>MIGRATIONS</code> list (<span class="ref">run_migrations.py:27–37</span>) includes
+      <code>10_pgbouncer_auth.sql</code> but <b>not</b> <code>10_driver_clock_views.sql</code>. Two files share the
+      <code>10_</code> prefix and the list is hand-maintained, so <code>v_driver_clock_daily/_today</code> (which the n8n
+      tardiness workflow depends on, per the file header) exist only if someone applied them by hand — they are not
+      reproducible from a clean deploy.</p>
+    <p style="margin-bottom:0"><b>Recommendation:</b> rename to <code>11_</code> and add to the list. Better: switch the
+      runner from a hardcoded list to globbing <code>NN_*.sql</code> sorted, so this cannot recur.</p>
+  </div>
+
+  <!-- ====================== FINDING 7 ====================== -->
+  <h2 id="f7"><span class="num">7</span>Security gaps worth fixing now<span class="badge b-sec">Security</span></h2>
+  <div class="finding">
+    <ul>
+      <li><b>Webhook auth is effectively off.</b> <code>_validate_token</code>
+        (<span class="ref">webhook_receiver_rev.py:84–87</span>) skips validation entirely when
+        <code>JIMI_WEBHOOK_TOKEN</code> is empty, and it is <b>not set in <code>.env</code></b>. The push endpoints are
+        exposed via Traefik, so anyone who learns the URL can inject arbitrary telemetry/alarms (each <code>/pushgps</code>
+        accepts up to 5000 rows, no rate limit). Set the token and make an unset token <b>fail closed</b> in production.</li>
+      <li><b>Committed secrets</b> (see top banner). Rotate the Tracksolid app secret, Postgres password, and Grafana admin
+        password; <code>git rm --cached .env</code> and scrub history.</li>
+      <li><code>dwh/260423_dwh_ddl_v1.sql</code> plaintext passwords are an existing known item in CLAUDE.md — same class of
+        problem.</li>
+    </ul>
+  </div>
+
+  <!-- ====================== FINDING 8 ====================== -->
+  <h2 id="f8"><span class="num">8</span>Smaller DB-design notes<span class="badge b-lo">Low — queue these</span></h2>
+  <div class="finding">
+    <ul>
+      <li><b><code>v_mileage_daily_cagg</code> is built on a column that's mostly NULL.</b> It computes
+        <code>MAX(current_mileage) - MIN(current_mileage)</code> (<span class="ref">schema lines 293–301</span>), but
+        <code>current_mileage</code> is only populated by the poll sweep — <code>track_list</code> and <code>/pushgps</code>
+        inserts leave it NULL, and odometer resets/device swaps produce negative or huge deltas. The aggregate's
+        <code>dist_km</code> is unreliable. Prefer deriving daily distance from <code>trips.distance_km</code>.</li>
+      <li><b><code>ingestion_log</code> has no retention and no index.</b> <code>v_ingestion_health</code> does
+        <code>DISTINCT ON (endpoint) … ORDER BY endpoint, run_at DESC</code> over the whole table, which grows ~875
+        rows/day forever. Add <code>(endpoint, run_at DESC)</code> plus a retention/partition policy.</li>
+      <li><b>Alarm dedup is leaky on the poll path.</b> <code>alarms_dedup UNIQUE (imei, alarm_type, alarm_time)</code>
+        (<span class="ref">schema line 199</span>): the poll path inserts <code>alertTypeId</code> as
+        <code>alarm_type</code> with no NOT-NULL guard, and a unique constraint treats <code>NULL</code>s as distinct
+        (unless declared <code>NULLS NOT DISTINCT</code>, PostgreSQL 15+), so a null-type alarm can duplicate. The webhook path guards this; the poll path
+        doesn't.</li>
+      <li><b><code>live_positions</code>/staleness queries are seq scans</b> (no index on <code>gps_time</code>) — totally
+        fine at ~156 rows today; just don't carry that pattern into anything that scans <code>position_history</code>.</li>
+      <li><b>Dead/ambiguous code in <code>_parse_request</code></b> (<span class="ref">webhook lines 90–143</span>): the
+        JSON-array branch <code>_parse_data_list</code> is never reached (it always falls through to
+        <code>request.form()</code>); harmless but misleading given the docstring claims it handles both.</li>
+    </ul>
+  </div>
+
+  <!-- ====================== GOOD ====================== -->
+  <div class="good-box">
+    <h2 id="good" style="border-top:none;"><span class="num" style="color:var(--good)">✓</span>What's genuinely good</h2>
+    <p>For balance, the bones are solid:</p>
+    <ul>
+      <li>Per-row <code>SAVEPOINT</code> isolation so one bad item can't abort a batch.</li>
+      <li>Time-guarded upserts via the shared <code>upsert_live_position</code> helper.</li>
+      <li>Batched <code>execute_values</code> on the high-volume push / track-list paths.</li>
+      <li>Hypertables with compression + retention policies.</li>
+      <li>Parameterized SQL throughout — no injection surface.</li>
+      <li>Clean signal handling and pool teardown.</li>
+      <li>Idempotent migrations with a tracking table and <code>COMMENT ON VIEW</code> provenance.</li>
+      <li><code>sync_devices</code> N+1 already parallelized with a bounded thread pool.</li>
+    </ul>
+    <p style="margin-bottom:0">The issues above are mostly about <b>coupling</b>, <b>one broken ETL</b>, and
+      <b>scale-ahead-of-indexing</b> — not a bad foundation.</p>
+  </div>
+
+  <!-- ====================== PLAN ====================== -->
+  <h2 id="plan"><span class="num">»</span>Suggested order of attack (effort vs. upside)</h2>
+  <table>
+    <thead>
+      <tr><th style="width:38px">#</th><th>Action</th><th style="width:130px">Upside</th><th style="width:80px">Effort</th></tr>
+    </thead>
+    <tbody>
+      <tr>
+        <td>1</td>
+        <td>Pull geocoding out of the trips transaction + gate on <code>start_address IS NULL</code>; isolate the 60s sweep on its own thread</td>
+        <td class="up-h">High — restores live freshness, frees connections</td>
+        <td>M</td>
+      </tr>
+      <tr>
+        <td>2</td>
+        <td>Fix or redesign <code>refresh_daily_metrics</code> / <code>dim_vehicles</code> (imei vs int key)</td>
+        <td class="up-h">High — unblocks all utilisation reporting</td>
+        <td>M</td>
+      </tr>
+      <tr>
+        <td>3</td>
+        <td>Convert <code>v_driver_aggregates_daily</code> to a continuous aggregate (or add <code>source</code>+time index)</td>
+        <td class="up-h">High and growing</td>
+        <td>M</td>
+      </tr>
+      <tr>
+        <td>4</td>
+        <td>Set <code>JIMI_WEBHOOK_TOKEN</code>; rotate + untrack <code>.env</code></td>
+        <td class="up-h">High (security)</td>
+        <td>S</td>
+      </tr>
+      <tr>
+        <td>5</td>
+        <td>Advisory-lock the migration runner + <code>ON_ERROR_STOP=1</code>; add <code>10_driver_clock_views</code> / switch to glob</td>
+        <td class="up-m">Medium (reliability)</td>
+        <td>S</td>
+      </tr>
+      <tr>
+        <td>6</td>
+        <td>Decide pgbouncer in-or-out; point <code>DATABASE_URL</code> accordingly</td>
+        <td class="up-m">Medium (clarity)</td>
+        <td>S</td>
+      </tr>
+      <tr>
+        <td>7</td>
+        <td><code>ingestion_log</code> index + retention; fix poll-path alarm null dedup; fix cagg distance source</td>
+        <td class="up-l">Low–medium</td>
+        <td>S</td>
+      </tr>
+    </tbody>
+  </table>
+
+  <div class="callout">
+    <p style="margin:0"><b>Next step for live confirmation:</b> if I can get onto the box (whitelist the review IP for
+      5433, or an SSH tunnel), I'll confirm the <span class="pill">verify live</span> items — actual
+      <code>position_history</code> row/chunk counts, which indexes really exist, whether <code>refresh_daily_metrics</code>
+      has ever succeeded, and <code>EXPLAIN ANALYZE</code> on the heavier views — and tighten the priority order with real
+      numbers.</p>
+  </div>
+
+  <footer>
+    Generated 2026-06-01 by Claude (Opus 4.8) for Fireside Communications · Tracksolid Fleet Intelligence.
+    Static review only — no live database access was available at review time. File references use
+    <span class="ref">file:line</span> against the repository state on branch
+    <code>quality-program-2026-04-12</code>.
+  </footer>
+
+</div>
+</body>
+</html>
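Finding 1 in the report above recommends moving geocoding out of the trips transaction. A minimal sketch of that two-pass shape follows, assuming the transaction opener, the insert step, the geocoder and the address writer can all be passed in as callables. Every name here (`two_pass_enrich`, `open_tx`, `insert_trips`, `save_address`) is hypothetical and is not taken from `ingest_movement_rev.py`.

```python
from typing import Callable, ContextManager, List, Optional, Tuple

Coord = Tuple[float, float]


def two_pass_enrich(
    open_tx: Callable[[], ContextManager],
    insert_trips: Callable[[object, list], List[Tuple[int, Coord]]],
    geocode: Callable[[Coord], Optional[str]],
    save_address: Callable[[object, int, str], None],
    trips: list,
) -> int:
    """Insert trips, geocode with no transaction open, then write addresses.

    All four callables are assumptions made for illustration only.
    Returns the number of trips that received an address.
    """
    # Pass 1: a short transaction that only writes rows and reports
    # which (trip_id, coordinate) pairs still need an address.
    with open_tx() as conn:
        pending = insert_trips(conn, trips)

    # Slow, throttled network I/O runs here, outside any transaction,
    # so no snapshot is pinned and no pooled connection is held.
    resolved = [(trip_id, geocode(coord)) for trip_id, coord in pending]

    # Pass 2: a second short transaction that only writes addresses.
    updated = 0
    with open_tx() as conn:
        for trip_id, address in resolved:
            if address is not None:
                save_address(conn, trip_id, address)
                updated += 1
    return updated
```

In the real service `open_tx` would presumably map to `get_conn` and `geocode` to `reverse_geocode`, but their exact signatures are not shown in this commit, so treat that mapping as an assumption. The sketch does not address the separate recommendation to run the 60s sweep on its own thread.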
--- a/docs/reports/new_feature.txt
+++ b/docs/reports/new_feature.txt
--- a/documents.txt
+++ b/documents.txt
--- a/import_drivers_csv.py
+++ b/import_drivers_csv.py
@@ -49,7 +49,7 @@ from ts_shared_rev import clean, clean_num, clean_ts, get_conn, get_logger

 log = get_logger("csv_import")

-DEFAULT_CSV_PATH = Path(__file__).parent / "20260427_FSG_Vehicles_mitieng.csv"
+DEFAULT_CSV_PATH = Path(__file__).parent / "data" / "20260427_FSG_Vehicles_mitieng.csv"

 # Columns fetched from DB for diff comparison.
 DB_COLS = [
--- a/legacy/tracksolid_analytics_pipeline.txt
+++ b/legacy/tracksolid_analytics_pipeline.txt
--- a/legacy/tracksolid_extract.py
+++ b/legacy/tracksolid_extract.py
--- a/legacy/tracksolid_ingestion_pipeline.txt
+++ b/legacy/tracksolid_ingestion_pipeline.txt
--- a/legacy/tracksolid_update_v2.py
+++ b/legacy/tracksolid_update_v2.py
--- a/legacy/tracksolid_vehicle_update.py
+++ b/legacy/tracksolid_vehicle_update.py
--- a/migrations/02_tracksolid_full_schema_rev.sql
+++ b/migrations/02_tracksolid_full_schema_rev.sql
--- a/migrations/03_webhook_schema_migration.sql
+++ b/migrations/03_webhook_schema_migration.sql
--- a/migrations/04_bug_fix_migration.sql
+++ b/migrations/04_bug_fix_migration.sql
--- a/migrations/05_enhancement_migration.sql
+++ b/migrations/05_enhancement_migration.sql
--- a/migrations/06_business_analytics_migration.sql
+++ b/migrations/06_business_analytics_migration.sql
--- a/migrations/07_analytics_views.sql
+++ b/migrations/07_analytics_views.sql
--- a/migrations/08_analytics_config.sql
+++ b/migrations/08_analytics_config.sql
--- a/migrations/09_trips_enrichment.sql
+++ b/migrations/09_trips_enrichment.sql
--- a/migrations/10_driver_clock_views.sql
+++ b/migrations/10_driver_clock_views.sql
--- a/migrations/10_pgbouncer_auth.sql
+++ b/migrations/10_pgbouncer_auth.sql
--- a/push_webhook.md
+++ b/push_webhook.md
--- a/run_migrations.py
+++ b/run_migrations.py
@@ -221,7 +221,7 @@ def main():

    applied = skipped = 0
    for sql_file in MIGRATIONS:
-        path = os.path.join("/app", sql_file)
+        path = os.path.join("/app", "migrations", sql_file)

        if not os.path.exists(path):
            print(f"  SKIP  {sql_file} (file not found in /app)")
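The commit message states that every `MIGRATIONS` entry resolves under `migrations/`. One way such a check could be scripted is sketched below. The helper name `missing_migrations` and its `root` parameter are hypothetical; only the `/app/migrations/<file>` layout comes from the hunk above. Note that the unchanged SKIP message in the context line still says `/app` rather than `/app/migrations`.

```python
import os
from typing import Iterable, List


def missing_migrations(names: Iterable[str], root: str = "/app") -> List[str]:
    """Return the entries that do not exist as files under <root>/migrations.

    Mirrors the path built in the hunk above:
    os.path.join("/app", "migrations", sql_file).
    """
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(root, "migrations", name))
    ]
```

An empty result means every listed file resolves. A non-empty result names the files the runner would skip.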
--- a/run_migrations.sh
+++ b/run_migrations.sh
@@ -55,10 +55,10 @@ run_sql -c "
 " > /dev/null

 # ── Find and apply pending migrations ────────────────────────────────────────
-MIGRATION_FILES=$(find "$SCRIPT_DIR" -maxdepth 1 -name '[0-9][0-9]_*.sql' | sort)
+MIGRATION_FILES=$(find "$SCRIPT_DIR/migrations" -maxdepth 1 -name '[0-9][0-9]_*.sql' | sort)

 if [[ -z "$MIGRATION_FILES" ]]; then
-  echo "No migration files found in $SCRIPT_DIR"
+  echo "No migration files found in $SCRIPT_DIR/migrations"
  exit 0
 fi
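The shell runner above discovers files by pattern, whereas the review in this commit (Finding 6) describes the Python runner's `MIGRATIONS` list as hand-maintained and missing `10_driver_clock_views.sql`. A rough Python equivalent of the `find ... -name '[0-9][0-9]_*.sql' | sort` step is sketched below. The function name `discover_migrations` is hypothetical, and the sort is plain code-point order, which may differ from the shell's locale-dependent `sort`.

```python
import re
from pathlib import Path
from typing import List

# Same shape as the shell pattern '[0-9][0-9]_*.sql':
# two ASCII digits, an underscore, anything, then ".sql".
_NUMBERED_SQL = re.compile(r"[0-9]{2}_.*\.sql")


def discover_migrations(directory: Path) -> List[str]:
    """Return numbered .sql file names in `directory`, sorted by name.

    Only direct children are considered, like `find -maxdepth 1`.
    Unlike `find -name`, directories with a matching name are ignored.
    """
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and _NUMBERED_SQL.fullmatch(entry.name)
    )
```

With discovery by pattern, two files sharing the `10_` prefix are both picked up and ordered by the rest of the name, so `10_driver_clock_views.sql` sorts before `10_pgbouncer_auth.sql`.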