The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none, never consulting tickets.geo_locations, so every 20-min delta ingest re-upserted changed rows and downgraded previously-resolved 'location' geoms back to the cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k CRQ) rows kept the precise geocode the LocationIQ budget paid for. - migration 18: trigger now resolves feed -> geo_locations (precise) -> cluster -> none, mirroring resolve_ticket_geoms() precedence; ends with one resolve pass to repair the backlog. Dry-run against the live DB (rolled back) repaired 7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193. - pipeline.ingest(): re-resolve after every applied run that ingested files, so geoms self-heal even before migration 18 lands. - run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls when no new clusters) so new clusters map without a manual command (FT-BUG-02). - Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user (FT-SEC-02). - 20260618_bug.txt removed (stale review of a since-rewritten file). Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702 change). Audit + plan + work log in docs/260702_*. Local only; not applied to prod. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
83 lines
4.6 KiB
Markdown
83 lines
4.6 KiB
Markdown
# fleettickets — Platform Audit Report (2026-07-02)
|
||
|
||
Part of the 2026-07-02 cross-repo audit (see the tracksolid repo's
|
||
`docs/reports/260702_platform_audit_report.md` for the DB/host-wide findings).
|
||
Scope here: this repo's code, its migrations, the live `tickets` schema contents,
|
||
and the deployed Coolify container. Companion docs: `260702_fix_plan.md`,
|
||
`260702_work_done.md`.
|
||
|
||
---
|
||
|
||
## Critical
|
||
|
||
### FT-BUG-01 — Precise location geocodes are never applied at ingest time (and get clobbered)
|
||
Verified live: `tickets.geo_locations` holds **808 geocoded locations**, yet only
|
||
**51 of 114k geocoded INC tickets** have `geo_source='location'` — and **0 of 42k CRQ**.
|
||
Two compounding defects:
|
||
|
||
1. The `tg_ticket_geom` trigger (migration 01) resolves feed coords → **cluster
|
||
centroid** → none. It **never consults `tickets.geo_locations`**, so a freshly
|
||
upserted ticket can only ever get cluster precision.
|
||
2. `tickets.resolve_ticket_geoms()` (which does honour geo_locations) only runs at
|
||
the end of the manual `--geocode-*` commands. Every 20-minute delta ingest
|
||
re-upserts changed rows, the trigger recomputes geom, and any previously
|
||
resolved `'location'` geom is silently downgraded back to `'cluster'`.
|
||
|
||
Net effect: the LocationIQ budget spent geocoding 808 locations produces almost no
|
||
map precision. The "cluster-gazetteer geocoding is the critical path" goal is
|
||
effectively defeated by the write path.
|
||
|
||
### FT-SEC-01 — Live container connects as the postgres superuser over the PUBLIC port
|
||
The deployed container's `DATABASE_URL` is
|
||
`postgres://postgres:…@twala.rahamafresh.com:5433/tracksolid_db` — the superuser,
|
||
over the internet-exposed host port, without TLS. The repo's own `.env.example`
|
||
prescribes `tracksolid_owner@timescale_db:5432` (internal Docker network). This
|
||
must be fixed in the Coolify app env **before** the tracksolid stack's new
|
||
loopback-only port binding deploys, or ingestion breaks. (Operational — see plan
|
||
Phase B.)
|
||
|
||
## High
|
||
|
||
### FT-SEC-02 — Unpinned image builds, root runtime
|
||
`Dockerfile` does `pip install .` from `pyproject.toml` version ranges; `uv.lock`
|
||
exists but is **excluded by `.dockerignore`**, so builds are unpinned and
|
||
unreproducible. The container also runs as root (no `USER`).
|
||
|
||
### FT-BUG-02 — New clusters/locations only geocode when someone remembers to run the command
|
||
`--geocode-clusters` / `--geocode-locations` are manual. Live data shows the lag:
|
||
156 INC + 71 CRQ tickets sit at `geo_source='none'`, and location coverage is 808
|
||
keys vs thousands of distinct location_names. Both functions are already
|
||
incremental (NOT EXISTS guards), so chaining a cluster-geocode pass into the
|
||
scheduled ingest is nearly free (0 API calls on a quiet run).
|
||
|
||
## Medium / notes
|
||
|
||
- **FT-PERF-01** — `tickets.inc` = 766 MB / `tickets.crq` = 486 MB, dominated by the
|
||
`raw` jsonb (avg 754 B/row) and **134 MB of never-scanned geo indexes**
|
||
(`ix_inc_geog` 83 MB, `ix_inc_geom` 51 MB, `ix_crq_geom` 49 MB — 0 scans; the map
|
||
path reads `fn_tickets_for_map`, which planner-serves off the raw/actionable
|
||
indexes). A separate spawned task is already preparing the index-drop migration —
|
||
not duplicated here. **Numbering note:** that task landed `17_drop_unused_geo_indexes.sql`; this
|
||
audit's trigger fix is therefore numbered `18_trigger_location_geom.sql`.
|
||
- **FT-OPS-01** — `20260618_bug.txt` (untracked, repo root) reviews a file
|
||
(`import_tickets.py`) that no longer exists; its live findings (ETag skip,
|
||
meta-outside-txn) were fixed in the pipeline rewrite. Stale — remove.
|
||
- **FT-OPS-02** — migration numbering gap (no `11_*.sql`) — harmless (lexical
|
||
ordering), just don't reuse 11.
|
||
- **FT-OPS-03** — `run_migrations.py` is manual (not run on container start, unlike
|
||
the tracksolid stack). Acceptable for a batch worker, but document it in deploys.
|
||
- **FT-NOTE-01** — straggler archiving: when nothing is pending, `ingest()` archives
|
||
*every* file still in `changes/` (including pre-watermark leftovers) — files are
|
||
moved to `processed/`, not destroyed, so this is acceptable; noted for awareness.
|
||
- **FT-NOTE-02** — the geocode loops open a fresh psycopg2 connection per written
|
||
row (`shared.get_conn` has no pool). Fine at batch cadence; revisit only if
|
||
geocode volumes grow 10×.
|
||
|
||
## In good shape
|
||
- The CDC drain design is solid: per-file watermark + archive in one txn each,
|
||
crash-safe resume, `--reseed` for bucket cutovers, raw-first schema resilient to
|
||
source drift.
|
||
- `capture_history()` gives durable closure/backlog history despite the
|
||
current-state upsert model.
|
||
- Sentinel/alarm-row filtering, place extraction, and the viewbox+distance-guarded
|
||
two-pass geocoder are careful, well-commented work.
|