fleettickets/run_ingest.sh
david kiania bb38d354e5 fix(geocode): precise location geoms survive delta re-upserts (FT-BUG-01)
The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none,
never consulting tickets.geo_locations, so every 20-min delta ingest re-upserted
changed rows and downgraded previously-resolved 'location' geoms back to the
cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k CRQ) rows kept
the precise geocode the LocationIQ budget paid for.

- migration 18: trigger now resolves feed -> geo_locations (precise) -> cluster
  -> none, mirroring resolve_ticket_geoms() precedence; ends with one resolve
  pass to repair the backlog. Dry-run against the live DB (rolled back) repaired
  7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193.
- pipeline.ingest(): re-resolve after every applied run that ingested files, so
  geoms self-heal even before migration 18 lands.
- run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls when
  no new clusters) so new clusters map without a manual command (FT-BUG-02).
- Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user (FT-SEC-02).
- 20260618_bug.txt removed (stale review of a since-rewritten file).

Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702
change). Audit + plan + work log in docs/260702_*. Local only; not applied to prod.
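
The precedence change can be illustrated with a minimal shell sketch. This is not the trigger's SQL: `pick_geom_source` and its three arguments are hypothetical names introduced here, and an empty string stands in for "no geometry available from that source".

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the feed -> geo_locations -> cluster -> none
# precedence. Arguments are placeholder strings, not real geometries.
pick_geom_source() {
    local feed="$1" location="$2" cluster="$3"
    if [ -n "$feed" ]; then
        echo "feed"          # coords shipped in the export win
    elif [ -n "$location" ]; then
        echo "location"      # precise geocode from tickets.geo_locations
    elif [ -n "$cluster" ]; then
        echo "cluster"       # fall back to the cluster centroid
    else
        echo "none"
    fi
}

# A row with no feed coords, a precise geocode, and a cluster centroid
# keeps the precise geocode instead of falling through to the centroid.
pick_geom_source "" "precise-geom" "centroid-geom"
```

Under the old order (feed, then cluster, then none) the same row would have resolved to the centroid, which is the downgrade described above.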

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 09:47:15 +03:00

39 lines
1.7 KiB
Bash
Executable file

#!/usr/bin/env bash
# run_ingest.sh — fleettickets · INC + CRQ ingest wrapper for cron (plain host/VM).
#
# Loads env from the local .env (DATABASE_URL + RUSTFS_* + GEOCODER_*) and drains
# both ticket change streams with --apply (watermark skip-if-unchanged + per-file
# archive are built in, so a run with no new files is a cheap no-op).
#
# Install on the instance (every 20 min, 06:00-20:40 EAT):
# */20 6-20 * * * /opt/fleettickets/run_ingest.sh >> /var/log/fleettickets.log 2>&1
# Ensure the crontab runs in the Africa/Nairobi timezone (CRON_TZ=Africa/Nairobi or
# the host/container TZ), since the export filenames and the schedule are EAT.
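#
# A minimal crontab sketch, assuming a cron implementation that honours CRON_TZ
# (cronie does; check crontab(5) on the host, as not every cron supports it):
#   CRON_TZ=Africa/Nairobi
#   */20 6-20 * * * /opt/fleettickets/run_ingest.sh >> /var/log/fleettickets.log 2>&1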
#
# On Coolify the two ingests run as separate Scheduled Tasks instead (see Dockerfile
# + docs/deployment-and-operations.md); this wrapper is the plain-host fallback.
set -euo pipefail
cd "$(dirname "$0")"
# Load .env if present (KEY=VALUE lines); never commit the real .env.
if [ -f .env ]; then
    set -a
    # shellcheck disable=SC1091
    . ./.env
    set +a
fi
# Prefer the project venv if it exists, else the python on PATH (e.g. in-container).
PY="python"
[ -x ".venv/bin/python" ] && PY=".venv/bin/python"
# Run from the repo root (cwd above) so `-m inc.import_inc` / `-m crq.import_crq`
# resolve the packages alongside pipeline.py + shared.py.
"$PY" -m inc.import_inc --from-bucket --apply
"$PY" -m crq.import_crq --from-bucket --apply
# Incremental cluster geocode (FT-BUG-02): NOT-EXISTS-guarded, so a run with no
# new clusters makes zero geocoder calls. Location-level geocoding stays a manual
# command (budget control): python -m inc.import_inc --geocode-locations --apply
"$PY" -m inc.import_inc --geocode-clusters --apply
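
The `.env` loading step in the wrapper relies on `set -a`, which marks every variable assigned while it is active for export, so the Python child processes can read them. A minimal, self-contained sketch of that behaviour follows. It uses a throwaway file and a hypothetical `DEMO_DATABASE_URL` variable rather than the real `.env`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway KEY=VALUE file standing in for .env (variable name is hypothetical).
tmp_env="$(mktemp)"
trap 'rm -f "$tmp_env"' EXIT
printf 'DEMO_DATABASE_URL=postgres://example/db\n' > "$tmp_env"

set -a            # auto-export every variable assigned from here on
# shellcheck disable=SC1090
. "$tmp_env"
set +a            # stop auto-exporting

# A child process sees the variable only because it was exported.
child_value="$(bash -c 'printf "%s" "${DEMO_DATABASE_URL:-unset}"')"
echo "$child_value"
```

Without the `set -a` / `set +a` pair, sourcing the file would set the variable only in the current shell, and the child would print `unset`.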