Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed
from import_tickets.py) parameterized by a Dataset config, with thin per-type
entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical
32-column source schema and CDC change stream, so the engine is fully shared.
- pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC
keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations
now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget).
- crq/import_crq.py: drains automations/crq/changes/ from isptickets into
tickets.crq (data layer + map; SLA/dashboard/history deferred).
- migrations/13_crq_columns.sql: CRQ mirror of 03 — typed STORED generated
columns + indexes on tickets.crq (reuses tickets.eat_ts()).
- Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject
packages inc/crq. Docs (README, implementation, deployment-and-operations,
n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook.
tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into
reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the
existing Tickets map once seeded. Verified locally: ruff-clean new files, engine
lists/parses both streams against live S3 (crq=52 files, inc unaffected).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Provider moved the INC CDC feed to a new bucket (tickets -> isptickets, new
per-bucket creds; same s3.rahamafresh.com endpoint, identical 32-col schema).
This is config + a one-time reseed, not a rewrite — the loader already drains
automations/inc/changes/ oldest->newest with a source_max_key watermark.
- default _BUCKET -> isptickets (TICKETS_BUCKET still overrides)
- add --reseed: ignore the stored watermark and drain every changes/ file once
(the old-bucket watermark may post-date the new bucket's first file). Crash-safe
via the existing per-file watermark-advance + archive loop.
- refresh stale "newest-file / full-snapshot-per-hour" docstring/comments to the
CDC reality; .env.example + README updated (new bucket + reseed runbook).
Verified live dry-run: 41/41 files drained (watermark None), alarm/sentinel
filter active, exit 0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rework import_tickets.py from the retired JSON `latest.json` model to the new
hourly full-snapshot CSV export. Strictly INC (CRQ out of scope).
- Ingest the newest automations/inc/<EAT-timestamp>.csv; skip-if-unchanged by
comparing S3 ETag to tickets.import_meta.metadata.source_etag.
- Upsert on ticket_id (PK; no dups, never delete -> closure history accrues).
No truncate. On success, move processed files to automations/inc/processed/.
- Clean at ingest: drop is_alarm=true + the "EXPORT STOPPED..." sentinel; drop
week_*, source_s3_*/source_snapshot_id, department/source_type; lowercase
region, uppercase raw_status; keep service_type + bucket.
- Force path-style S3 addressing; --inc-csv for local dev; --from-bucket for cron.
- Add migrations/02 (import_meta + freshness); refresh README/.env.example/docs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>