fleettickets/docs/260702_audit_report.md
david kiania bb38d354e5 fix(geocode): precise location geoms survive delta re-upserts (FT-BUG-01)
The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none,
never consulting tickets.geo_locations, so every 20-min delta ingest re-upserted
changed rows and downgraded previously-resolved 'location' geoms back to the
cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k CRQ) rows kept
the precise geocode the LocationIQ budget paid for.

- migration 18: trigger now resolves feed -> geo_locations (precise) -> cluster
  -> none, mirroring resolve_ticket_geoms() precedence; ends with one resolve
  pass to repair the backlog. Dry-run against the live DB (rolled back) repaired
  7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193.
- pipeline.ingest(): re-resolve after every applied run that ingested files, so
  geoms self-heal even before migration 18 lands.
- run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls when
  no new clusters) so new clusters map without a manual command (FT-BUG-02).
- Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user (FT-SEC-02).
- 20260618_bug.txt removed (stale review of a since-rewritten file).

Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702
change). Audit + plan + work log in docs/260702_*. Local only; not applied to prod.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 09:47:15 +03:00

4.6 KiB
Raw Blame History

fleettickets — Platform Audit Report (2026-07-02)

Part of the 2026-07-02 cross-repo audit (see the tracksolid repo's docs/reports/260702_platform_audit_report.md for the DB/host-wide findings). Scope here: this repo's code, its migrations, the live tickets schema contents, and the deployed Coolify container. Companion docs: 260702_fix_plan.md, 260702_work_done.md.


Critical

FT-BUG-01 — Precise location geocodes are never applied at ingest time (and get clobbered)

Verified live: tickets.geo_locations holds 808 geocoded locations, yet only 51 of 114k geocoded INC tickets have geo_source='location' — and 0 of 42k CRQ. Two compounding defects:

  1. The tg_ticket_geom trigger (migration 01) resolves feed coords → cluster centroid → none. It never consults tickets.geo_locations, so a freshly upserted ticket can only ever get cluster precision.
  2. tickets.resolve_ticket_geoms() (which does honour geo_locations) only runs at the end of the manual --geocode-* commands. Every 20-minute delta ingest re-upserts changed rows, the trigger recomputes geom, and any previously resolved 'location' geom is silently downgraded back to 'cluster'.

Net effect: the LocationIQ budget spent geocoding 808 locations produces almost no map precision. The "cluster-gazetteer geocoding is the critical path" goal is effectively defeated by the write path.

FT-SEC-01 — Live container connects as the postgres superuser over the PUBLIC port

The deployed container's DATABASE_URL is postgres://postgres:…@twala.rahamafresh.com:5433/tracksolid_db — the superuser, over the internet-exposed host port, without TLS. The repo's own .env.example prescribes tracksolid_owner@timescale_db:5432 (internal Docker network). This must be fixed in the Coolify app env before the tracksolid stack's new loopback-only port binding deploys, or ingestion breaks. (Operational — see plan Phase B.)

High

FT-SEC-02 — Unpinned image builds, root runtime

Dockerfile does pip install . from pyproject.toml version ranges; uv.lock exists but is excluded by .dockerignore, so builds are unpinned and unreproducible. The container also runs as root (no USER).

FT-BUG-02 — New clusters/locations only geocode when someone remembers to run the command

--geocode-clusters / --geocode-locations are manual. Live data shows the lag: 156 INC + 71 CRQ tickets sit at geo_source='none', and location coverage is 808 keys vs thousands of distinct location_names. Both functions are already incremental (NOT EXISTS guards), so chaining a cluster-geocode pass into the scheduled ingest is nearly free (0 API calls on a quiet run).

Medium / notes

  • FT-PERF-01tickets.inc = 766 MB / tickets.crq = 486 MB, dominated by the raw jsonb (avg 754 B/row) and 134 MB of never-scanned geo indexes (ix_inc_geog 83 MB, ix_inc_geom 51 MB, ix_crq_geom 49 MB — 0 scans; the map path reads fn_tickets_for_map, which planner-serves off the raw/actionable indexes). A separate spawned task is already preparing the index-drop migration — not duplicated here. Numbering note: that task landed 17_drop_unused_geo_indexes.sql; this audit's trigger fix is therefore numbered 18_trigger_location_geom.sql.
  • FT-OPS-0120260618_bug.txt (untracked, repo root) reviews a file (import_tickets.py) that no longer exists; its live findings (ETag skip, meta-outside-txn) were fixed in the pipeline rewrite. Stale — remove.
  • FT-OPS-02 — migration numbering gap (no 11_*.sql) — harmless (lexical ordering), just don't reuse 11.
  • FT-OPS-03run_migrations.py is manual (not run on container start, unlike the tracksolid stack). Acceptable for a batch worker, but document it in deploys.
  • FT-NOTE-01 — straggler archiving: when nothing is pending, ingest() archives every file still in changes/ (including pre-watermark leftovers) — files are moved to processed/, not destroyed, so this is acceptable; noted for awareness.
  • FT-NOTE-02 — the geocode loops open a fresh psycopg2 connection per written row (shared.get_conn has no pool). Fine at batch cadence; revisit only if geocode volumes grow 10×.

In good shape

  • The CDC drain design is solid: per-file watermark + archive in one txn each, crash-safe resume, --reseed for bucket cutovers, raw-first schema resilient to source drift.
  • capture_history() gives durable closure/backlog history despite the current-state upsert model.
  • Sentinel/alarm-row filtering, place extraction, and the viewbox+distance-guarded two-pass geocoder are careful, well-commented work.