fleettickets/run_ingest.sh

#!/usr/bin/env bash
# run_ingest.sh — fleettickets · INC + CRQ ingest wrapper for cron (plain host/VM).
#
# Loads env from the local .env (DATABASE_URL + RUSTFS_* + GEOCODER_*), drains both
# ticket change streams with --apply (watermark skip-if-unchanged + per-file archive
# are built in, so a run with no new files is a cheap no-op), then runs the
# incremental cluster geocode (zero geocoder calls when there are no new clusters).
#
# Install on the instance (every 20 min, 06:00–20:40 EAT):
#   */20 6-20 * * *  /opt/fleettickets/run_ingest.sh >> /var/log/fleettickets.log 2>&1
# Ensure the crontab runs in the Africa/Nairobi timezone, since the export filenames
# and the schedule are EAT: set CRON_TZ=Africa/Nairobi where the cron implementation
# supports it (cronie does), otherwise set the host/container TZ.
#
# On Coolify the two ingests run as separate Scheduled Tasks instead (see Dockerfile
# + docs/deployment-and-operations.md); this wrapper is the plain-host fallback.
set -euo pipefail

cd "$(dirname "$0")"

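# Load .env if present. The file is sourced as shell, so keep it to plain KEY=VALUE
# lines and quote values that contain spaces or shell metacharacters.
# Never commit the real .env.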
if [ -f .env ]; then
  set -a
  # shellcheck disable=SC1091
  . ./.env
  set +a
fi
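
# Optional sketch, not part of the original wrapper: fail fast when a variable the
# ingest needs is still missing after .env has been loaded. The helper name
# require_env is hypothetical. DATABASE_URL is the only name the header spells out
# in full; the RUSTFS_* and GEOCODER_* names are not listed there, so add them
# yourself once you know which ones your deployment uses.

```shell
# Hypothetical helper: returns non-zero if any named variable is unset or empty.
require_env() {
  local name
  local missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what `set -a` produces
    # for everything assigned while .env is being sourced.
    if [ -z "$(printenv "$name")" ]; then
      echo "run_ingest.sh: required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

# Usage sketch: `require_env DATABASE_URL` placed right after the .env block. Under
# `set -e` the non-zero return stops the wrapper before any ingest starts, and every
# missing name is reported in one pass rather than one per run.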

# Prefer the project venv if it exists, else the python on PATH (e.g. in-container).
PY="python"
[ -x ".venv/bin/python" ] && PY=".venv/bin/python"

# Run from the repo root (cwd above) so `-m inc.import_inc` / `-m crq.import_crq`
# resolve the packages alongside pipeline.py + shared.py.
"$PY" -m inc.import_inc --from-bucket --apply
"$PY" -m crq.import_crq --from-bucket --apply

# Incremental cluster geocode (FT-BUG-02): NOT-EXISTS-guarded, so a run with no
# new clusters makes zero geocoder calls. Location-level geocoding stays a manual
# command (budget control): python -m inc.import_inc --geocode-locations --apply
"$PY" -m inc.import_inc --geocode-clusters --apply
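
# Optional sketch, not part of the original wrapper: with a 20-minute schedule, a
# slow run could still be active when cron starts the next one. One way to skip the
# overlapping run is a lock directory, since mkdir is atomic. The helper names and
# the lock path below are hypothetical, and the guard would have to sit near the top
# of the script (before the first ingest), not here at the end.

```shell
# Hypothetical helpers; the lock path is whatever the caller passes in.
acquire_lock() {
  # mkdir either creates the directory or fails (it already exists, or it cannot
  # be created at all), so at most one caller can succeed.
  if mkdir "$1" 2>/dev/null; then
    return 0
  fi
  echo "run_ingest.sh: could not take lock $1 (another run active?), skipping" >&2
  return 1
}

release_lock() {
  # Best effort: a lock that is already gone is not an error.
  rmdir "$1" 2>/dev/null || true
}
```

# Usage sketch:
#   acquire_lock /tmp/fleettickets.lock || exit 0
#   trap 'release_lock /tmp/fleettickets.lock' EXIT
# A run killed with SIGKILL never reaches the trap, so the directory stays behind
# and has to be removed by hand before the next run can proceed.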

chore: add hourly INC ingest cron wrapper + schedule docs

run_ingest.sh loads .env and runs `import_tickets.py --from-bucket --apply`.
Documented crontab: `15 7-19 * * *` in Africa/Nairobi (ingest at :15, 07:00–19:00).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 16:40:50 +00:00

feat(crq): add CRQ ingestion via shared engine + thin inc/crq entrypoints

Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed
from import_tickets.py) parameterized by a Dataset config, with thin per-type
entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical
32-column source schema and CDC change stream, so the engine is fully shared.

- pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC
  keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations
  now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget).
- crq/import_crq.py: drains automations/crq/changes/ from isptickets into
  tickets.crq (data layer + map; SLA/dashboard/history deferred).
- migrations/13_crq_columns.sql: CRQ mirror of 03 — typed STORED generated
  columns + indexes on tickets.crq (reuses tickets.eat_ts()).
- Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject
  packages inc/crq. Docs (README, implementation, deployment-and-operations,
  n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook.

tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into
reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the
existing Tickets map once seeded. Verified locally: ruff-clean new files, engine
lists/parses both streams against live S3 (crq=52 files, inc unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-25 20:16:38 +00:00

fix(geocode): precise location geoms survive delta re-upserts (FT-BUG-01)

The tg_ticket_geom trigger resolved feed coords -> cluster centroid -> none,
never consulting tickets.geo_locations, so every 20-min delta ingest re-upserted
changed rows and downgraded previously-resolved 'location' geoms back to the
cluster centroid. Live effect: only 51 of 114k INC (and 0 of 42k CRQ) rows kept
the precise geocode the LocationIQ budget paid for.

- migration 18: trigger now resolves feed -> geo_locations (precise) -> cluster
  -> none, mirroring resolve_ticket_geoms() precedence; ends with one resolve
  pass to repair the backlog. Dry-run against the live DB (rolled back) repaired
  7,481 rows: INC location 51 -> 5,339, CRQ 0 -> 2,193.
- pipeline.ingest(): re-resolve after every applied run that ingested files, so
  geoms self-heal even before migration 18 lands.
- run_ingest.sh: chain an incremental --geocode-clusters pass (0 API calls when
  no new clusters) so new clusters map without a manual command (FT-BUG-02).
- Dockerfile/.dockerignore: pinned installs from uv.lock, non-root user (FT-SEC-02).
- 20260618_bug.txt removed (stale review of a since-rewritten file).

Numbered 18 to coexist with 17_drop_unused_geo_indexes.sql (parallel 260702
change). Audit + plan + work log in docs/260702_*. Local only; not applied to prod.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-07-02 06:47:15 +00:00