fleettickets/docs/260702_audit_report.md
david kiania bb38d354e5 fix(geocode): precise location geoms survive delta re-upserts (FT-BUG-01)
The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none,
never consulting tickets.geo_locations, so every 20-min delta ingest re-upserted
changed rows and downgraded previously-resolved 'location' geoms back to the
cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k CRQ) rows kept
the precise geocode the LocationIQ budget paid for.

- migration 18: trigger now resolves feed -> geo_locations (precise) -> cluster
  -> none, mirroring resolve_ticket_geoms() precedence; ends with one resolve
  pass to repair the backlog. Dry-run against the live DB (rolled back) repaired
  7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193.
- pipeline.ingest(): re-resolve after every applied run that ingested files, so
  geoms self-heal even before migration 18 lands.
- run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls when
  no new clusters) so new clusters map without a manual command (FT-BUG-02).
- Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user (FT-SEC-02).
- 20260618_bug.txt removed (stale review of a since-rewritten file).

Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702
change). Audit + plan + work log in docs/260702_*. Local only; not applied to prod.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 09:47:15 +03:00

83 lines
4.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# fleettickets — Platform Audit Report (2026-07-02)
Part of the 2026-07-02 cross-repo audit (see the tracksolid repo's
`docs/reports/260702_platform_audit_report.md` for the DB/host-wide findings).
Scope here: this repo's code, its migrations, the live `tickets` schema contents,
and the deployed Coolify container. Companion docs: `260702_fix_plan.md`,
`260702_work_done.md`.
---
## Critical
### FT-BUG-01 — Precise location geocodes are never applied at ingest time (and get clobbered)
Verified live: `tickets.geo_locations` holds **808 geocoded locations**, yet only
**51 of 114k geocoded INC tickets** have `geo_source='location'` — and **0 of 42k CRQ**.
Two compounding defects:
1. The `tg_ticket_geom` trigger (migration 01) resolves feed coords → **cluster
centroid** → none. It **never consults `tickets.geo_locations`**, so a freshly
upserted ticket can only ever get cluster precision.
2. `tickets.resolve_ticket_geoms()` (which does honour geo_locations) only runs at
the end of the manual `--geocode-*` commands. Every 20-minute delta ingest
re-upserts changed rows, the trigger recomputes geom, and any previously
resolved `'location'` geom is silently downgraded back to `'cluster'`.
Net effect: the LocationIQ budget spent geocoding 808 locations produces almost no
map precision. The "cluster-gazetteer geocoding is the critical path" goal is
effectively defeated by the write path.
### FT-SEC-01 — Live container connects as the postgres superuser over the PUBLIC port
The deployed container's `DATABASE_URL` is
`postgres://postgres:…@twala.rahamafresh.com:5433/tracksolid_db` — the superuser,
over the internet-exposed host port, without TLS. The repo's own `.env.example`
prescribes `tracksolid_owner@timescale_db:5432` (internal Docker network). This
must be fixed in the Coolify app env **before** the tracksolid stack's new
loopback-only port binding deploys, or ingestion breaks. (Operational — see plan
Phase B.)
## High
### FT-SEC-02 — Unpinned image builds, root runtime
`Dockerfile` does `pip install .` from `pyproject.toml` version ranges; `uv.lock`
exists but is **excluded by `.dockerignore`**, so builds are unpinned and
unreproducible. The container also runs as root (no `USER`).
### FT-BUG-02 — New clusters/locations only geocode when someone remembers to run the command
`--geocode-clusters` / `--geocode-locations` are manual. Live data shows the lag:
156 INC + 71 CRQ tickets sit at `geo_source='none'`, and location coverage is 808
keys vs thousands of distinct location_names. Both functions are already
incremental (NOT EXISTS guards), so chaining a cluster-geocode pass into the
scheduled ingest is nearly free (0 API calls on a quiet run).
## Medium / notes
- **FT-PERF-01** — `tickets.inc` = 766 MB / `tickets.crq` = 486 MB, dominated by the
`raw` jsonb (avg 754 B/row) and **134 MB of never-scanned geo indexes**
(`ix_inc_geog` 83 MB, `ix_inc_geom` 51 MB, `ix_crq_geom` 49 MB — 0 scans; the map
path reads `fn_tickets_for_map`, which planner-serves off the raw/actionable
indexes). A separate spawned task is already preparing the index-drop migration —
not duplicated here. **Numbering note:** that task landed `17_drop_unused_geo_indexes.sql`; this
audit's trigger fix is therefore numbered `18_trigger_location_geom.sql`.
- **FT-OPS-01** — `20260618_bug.txt` (untracked, repo root) reviews a file
(`import_tickets.py`) that no longer exists; its live findings (ETag skip,
meta-outside-txn) were fixed in the pipeline rewrite. Stale — remove.
- **FT-OPS-02** — migration numbering gap (no `11_*.sql`) — harmless (lexical
ordering), just don't reuse 11.
- **FT-OPS-03** — `run_migrations.py` is manual (not run on container start, unlike
the tracksolid stack). Acceptable for a batch worker, but document it in deploys.
- **FT-NOTE-01** — straggler archiving: when nothing is pending, `ingest()` archives
*every* file still in `changes/` (including pre-watermark leftovers) — files are
moved to `processed/`, not destroyed, so this is acceptable; noted for awareness.
- **FT-NOTE-02** — the geocode loops open a fresh psycopg2 connection per written
row (`shared.get_conn` has no pool). Fine at batch cadence; revisit only if
geocode volumes grow 10×.
## In good shape
- The CDC drain design is solid: per-file watermark + archive in one txn each,
crash-safe resume, `--reseed` for bucket cutovers, raw-first schema resilient to
source drift.
- `capture_history()` gives durable closure/backlog history despite the
current-state upsert model.
- Sentinel/alarm-row filtering, place extraction, and the viewbox+distance-guarded
two-pass geocoder are careful, well-commented work.