fleettickets/migrations/18_trigger_location_geom.sql
david kiania bb38d354e5 fix(geocode): precise location geoms survive delta re-upserts (FT-BUG-01)
The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none,
never consulting tickets.geo_locations, so every 20-min delta ingest re-upserted
changed rows and downgraded previously-resolved 'location' geoms back to the
cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k CRQ) rows kept
the precise geocode the LocationIQ budget paid for.

- migration 18: trigger now resolves feed -> geo_locations (precise) -> cluster
  -> none, mirroring resolve_ticket_geoms() precedence; ends with one resolve
  pass to repair the backlog. Dry-run against the live DB (rolled back) repaired
  7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193.
- pipeline.ingest(): re-resolve after every applied run that ingested files, so
  geoms self-heal even before migration 18 lands.
- run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls when
  no new clusters) so new clusters map without a manual command (FT-BUG-02).
- Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user (FT-SEC-02).
- 20260618_bug.txt removed (stale review of a since-rewritten file).

Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702
change). Audit + plan + work log in docs/260702_*. Local only; not applied to prod.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 09:47:15 +03:00

55 lines
3 KiB
PL/PgSQL

-- 18_trigger_location_geom.sql — fleettickets · precise geoms survive re-upserts.
-- (migration 17 is 17_drop_unused_geo_indexes.sql — a parallel 260702 change.)
-- ─────────────────────────────────────────────────────────────────────────────
-- FT-BUG-01 (260702 audit): tg_ticket_geom resolved feed coords → cluster
-- centroid → none, NEVER consulting tickets.geo_locations. So the precise
-- location geocodes (808 keys, paid LocationIQ budget) were only applied by the
-- manual --geocode-* commands' resolve pass — and the very next 20-minute delta
-- ingest re-upserted changed rows and the trigger downgraded them back to the
-- cluster centroid. Live effect: 51 of 114k INC rows (and 0 CRQ) carried
-- geo_source='location'.
--
-- Fix: the trigger now resolves feed → LOCATION (geo_locations by
-- norm_cluster(location_name)) → cluster → none, mirroring
-- resolve_ticket_geoms()'s precedence exactly. Then one resolve pass repairs the
-- whole backlog immediately.
-- Idempotent — CREATE OR REPLACE + a re-runnable resolve.
-- ─────────────────────────────────────────────────────────────────────────────
CREATE OR REPLACE FUNCTION tickets.tg_ticket_geom()
RETURNS trigger LANGUAGE plpgsql AS $fn$
DECLARE
v_lat double precision := NULLIF(NEW.raw->>'latitude','')::double precision;
v_lng double precision := NULLIF(NEW.raw->>'longitude','')::double precision;
g geometry(Point, 4326);
BEGIN
IF TG_OP = 'UPDATE' AND NEW.raw IS NOT DISTINCT FROM OLD.raw THEN
RETURN NEW; -- geom/geo_source-only update — keep caller's value
END IF;
IF v_lat IS NOT NULL AND v_lng IS NOT NULL
AND v_lat BETWEEN -90 AND 90 AND v_lng BETWEEN -180 AND 180
AND NOT (v_lat = 0 AND v_lng = 0) THEN
NEW.geom := ST_SetSRID(ST_MakePoint(v_lng, v_lat), 4326);
NEW.geo_source := 'feed';
RETURN NEW;
END IF;
-- [mig-18] precise location geocode BEFORE the coarse cluster centroid
SELECT gl.geom INTO g FROM tickets.geo_locations gl
WHERE gl.query_key = tickets.norm_cluster(NEW.raw->>'location_name')
AND gl.geom IS NOT NULL LIMIT 1;
IF g IS NOT NULL THEN
NEW.geom := g; NEW.geo_source := 'location';
RETURN NEW;
END IF;
SELECT gc.geom INTO g FROM tickets.geo_clusters gc
WHERE gc.cluster_key = tickets.norm_cluster(NEW.raw->>'cluster')
AND gc.geom IS NOT NULL LIMIT 1;
IF g IS NOT NULL THEN NEW.geom := g; NEW.geo_source := 'cluster';
ELSE NEW.geom := NULL; NEW.geo_source := 'none'; END IF;
RETURN NEW;
END $fn$;
-- Repair the backlog now: re-resolve every non-feed row against the gazetteer
-- (resolve_ticket_geoms already prefers geo_locations over geo_clusters and only
-- touches rows whose geom/geo_source would actually change).
SELECT tickets.resolve_ticket_geoms();