The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none,
never consulting tickets.geo_locations, so every 20-min delta ingest
re-upserted changed rows and downgraded previously-resolved 'location' geoms
back to the cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k
CRQ) rows kept the precise geocode the LocationIQ budget paid for.

- migration 18: trigger now resolves feed -> geo_locations (precise) ->
  cluster -> none, mirroring resolve_ticket_geoms() precedence; ends with one
  resolve pass to repair the backlog. Dry-run against the live DB (rolled
  back) repaired 7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193.
- pipeline.ingest(): re-resolve after every applied run that ingested files,
  so geoms self-heal even before migration 18 lands.
- run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls
  when no new clusters) so new clusters map without a manual command
  (FT-BUG-02).
- Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user
  (FT-SEC-02).
- 20260618_bug.txt removed (stale review of a since-rewritten file).

Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702
change). Audit + plan + work log in docs/260702_*. Local only; not applied to
prod.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
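The precedence that migration 18 describes (feed -> geo_locations -> cluster -> none) amounts to a first-non-empty fallback. The following is only an illustrative shell sketch of that ordering: the real logic lives in the SQL trigger, the function name `resolve_geom_source` and its three arguments are hypothetical, and the coordinates are made-up sample values.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: the actual precedence is implemented in SQL.
# Each argument is a "lat,lon" string, or empty when that source has no value.
resolve_geom_source() {
    local feed="${1:-}" location="${2:-}" cluster="${3:-}"
    if [ -n "$feed" ]; then
        echo "feed $feed"            # coordinates supplied by the feed win
    elif [ -n "$location" ]; then
        echo "location $location"    # precise geocode from geo_locations
    elif [ -n "$cluster" ]; then
        echo "cluster $cluster"      # coarse fallback: cluster centroid
    else
        echo "none"                  # nothing resolvable
    fi
}

# No feed coords, but a precise geocode exists: it must beat the centroid.
resolve_geom_source "" "-1.2921,36.8219" "-1.30,36.80"
# prints: location -1.2921,36.8219
```

The bug described above corresponds to skipping the second branch, which is why rows with a precise geocode fell through to the cluster centroid on every re-upsert.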
39 lines
1.7 KiB
Bash
Executable file
#!/usr/bin/env bash
# run_ingest.sh — fleettickets · INC + CRQ ingest wrapper for cron (plain host/VM).
#
# Loads env from the local .env (DATABASE_URL + RUSTFS_* + GEOCODER_*) and drains
# both ticket change streams with --apply (watermark skip-if-unchanged + per-file
# archive are built in, so a run with no new files is a cheap no-op).
#
# Install on the instance (every 20 min, 06:00–20:40 EAT):
# */20 6-20 * * * /opt/fleettickets/run_ingest.sh >> /var/log/fleettickets.log 2>&1
# Ensure the crontab runs in the Africa/Nairobi timezone (CRON_TZ=Africa/Nairobi or
# the host/container TZ), since the export filenames and the schedule are EAT.
#
# On Coolify the two ingests run as separate Scheduled Tasks instead (see Dockerfile
# + docs/deployment-and-operations.md); this wrapper is the plain-host fallback.
set -euo pipefail

cd "$(dirname "$0")"

# Load .env if present (KEY=VALUE lines); never commit the real .env.
if [ -f .env ]; then
  set -a
  # shellcheck disable=SC1091
  . ./.env
  set +a
fi

# Prefer the project venv if it exists, else the python on PATH (e.g. in-container).
PY="python"
[ -x ".venv/bin/python" ] && PY=".venv/bin/python"

# Run from the repo root (cwd above) so `-m inc.import_inc` / `-m crq.import_crq`
# resolve the packages alongside pipeline.py + shared.py.
"$PY" -m inc.import_inc --from-bucket --apply
"$PY" -m crq.import_crq --from-bucket --apply

# Incremental cluster geocode (FT-BUG-02): NOT-EXISTS-guarded, so a run with no
# new clusters makes zero geocoder calls. Location-level geocoding stays a manual
# command (budget control): python -m inc.import_inc --geocode-locations --apply
"$PY" -m inc.import_inc --geocode-clusters --apply