fleettickets/crq/import_crq.py

"""
crq/import_crq.py — Fireside Communications · CRQ (new-installation) ingestion.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Thin entrypoint over the shared engine (`pipeline.py`) for the CRQ dataset:
    tickets.crq   — new-installation requests  (FleetOps "Tickets" CRQ tab)

CRQ mirrors INC at the data layer — IDENTICAL 32-column CSV schema and the same
incremental CDC change stream automations/crq/changes/<EAT-ts>.csv in the
`isptickets` bucket. This loader upserts on ticket_id, advances the per-dataset
watermark (tickets.import_meta dataset='crq'), and archives each consumed file to
automations/crq/processed/. CRQ flows onto the existing Tickets map via
reporting.fn_tickets_for_map (which already unions tickets.crq).

Scope (current): data layer + map only. CRQ has NO post-apply history capture yet
(installation-lifecycle SLA/backlog semantics differ from incidents — a future
migration). Geocoding is CROSS-DATASET and run from the INC entrypoint
(python -m inc.import_inc --geocode-clusters / --geocode-locations) against the
shared gazetteer, which covers both inc and crq.

Usage (needs DATABASE_URL + RUSTFS_* env; see .env.example):
    python -m crq.import_crq --from-bucket --apply
    python -m crq.import_crq --from-bucket --reseed --apply   # one-time bucket cutover
    python -m crq.import_crq --crq-csv 2026-06-24T12-55-44.csv --apply

Pre-requisite: migrations applied (run_migrations.py) — tickets.crq + its typed
columns (15_crq_table.sql) + geo_clusters/geo_locations + fn_tickets_for_map.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
"""

from __future__ import annotations

import argparse

import pipeline

# CRQ has no post-apply hook yet (history capture is INC-only — see module docstring).
DATASET = pipeline.make_dataset("crq", post_apply=None)


def main() -> None:
    ap = argparse.ArgumentParser(
        description="Ingest CRQ (installation) tickets from CSV (raw-first)")
    ap.add_argument("--apply", action="store_true", help="Write to DB (default: dry-run)")
    ap.add_argument("--from-bucket", action="store_true",
                    help="Drain the incremental CRQ change stream (automations/crq/changes/) "
                         "from the isptickets S3 bucket: every not-yet-processed file "
                         "oldest→newest, upsert on ticket_id, advance the watermark, archive")
    ap.add_argument("--reseed", action="store_true",
                    help="Ignore the stored watermark and drain every file in changes/ once "
                         "(one-time bucket cutover / reseed). Use with --from-bucket --apply")
    ap.add_argument("--crq-csv", dest="local_csv", default=None,
                    help="Local CRQ tickets CSV file (dev)")
    args = ap.parse_args()

    if not (args.from_bucket or args.local_csv):
        ap.error("provide --from-bucket or --crq-csv")
    pipeline.ingest(DATASET, args)


if __name__ == "__main__":
    main()
feat(crq): add CRQ ingestion via shared engine + thin inc/crq entrypoints Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed from import_tickets.py) parameterized by a Dataset config, with thin per-type entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical 32-column source schema and CDC change stream, so the engine is fully shared. - pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget). - crq/import_crq.py: drains automations/crq/changes/ from isptickets into tickets.crq (data layer + map; SLA/dashboard/history deferred). - migrations/13_crq_columns.sql: CRQ mirror of 03 — typed STORED generated columns + indexes on tickets.crq (reuses tickets.eat_ts()). - Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject packages inc/crq. Docs (README, implementation, deployment-and-operations, n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook. tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the existing Tickets map once seeded. Verified locally: ruff-clean new files, engine lists/parses both streams against live S3 (crq=52 files, inc unaffected). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> 2026-06-25 20:16:38 +00:00			`"""`
			`crq/import_crq.py — Fireside Communications · CRQ (new-installation) ingestion.`
			`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`
			Thin entrypoint over the shared engine (`pipeline.py`) for the CRQ dataset:
			`tickets.crq — new-installation requests (FleetOps "Tickets" CRQ tab)`

			`CRQ mirrors INC at the data layer — IDENTICAL 32-column CSV schema and the same`
			`incremental CDC change stream automations/crq/changes/<EAT-ts>.csv in the`
			`isptickets` bucket. This loader upserts on ticket_id, advances the per-dataset
			`watermark (tickets.import_meta dataset='crq'), and archives each consumed file to`
			`automations/crq/processed/. CRQ flows onto the existing Tickets map via`
			`reporting.fn_tickets_for_map (which already unions tickets.crq).`

			`Scope (current): data layer + map only. CRQ has NO post-apply history capture yet`
			`(installation-lifecycle SLA/backlog semantics differ from incidents — a future`
			`migration). Geocoding is CROSS-DATASET and run from the INC entrypoint`
			`(python -m inc.import_inc --geocode-clusters / --geocode-locations) against the`
			`shared gazetteer, which covers both inc and crq.`

			`Usage (needs DATABASE_URL + RUSTFS_* env; see .env.example):`
			`python -m crq.import_crq --from-bucket --apply`
			`python -m crq.import_crq --from-bucket --reseed --apply # one-time bucket cutover`
			`python -m crq.import_crq --crq-csv 2026-06-24T12-55-44.csv --apply`

			`Pre-requisite: migrations applied (run_migrations.py) — tickets.crq + its typed`
fix(crq): migration 15 creates tickets.crq (live DB never materialized it) Live-DB reconciliation before seeding CRQ revealed two divergences: - tickets.crq did NOT exist: 01_tickets_schema.sql was applied 2026-06-15 from a version predating its crq section, so the IF-NOT-EXISTS ledger guard has blocked it ever since (fn_tickets_for_map + resolve_ticket_geoms already reference crq, so they errored if called — masked because the live INC view uses fn_inc_dashboard). - The live ledger carries un-versioned 13_inc_search_fn.sql / 14_inc_filter_options.sql (applied 2026-06-19, absent from this repo). So 13_crq_columns.sql (ALTER-only, number 13) is replaced by 15_crq_table.sql, which CREATEs tickets.crq self-containedly (table + geom trigger + raw/typed indexes) and adds the typed STORED generated columns. Deterministic + idempotent on both the live DB (crq missing) and a fresh DB (crq minimal from 01). Numbered 15 to sit after the live ledger's max. Docs/CLI references updated 13->15. Applied + seeded on the live DB out-of-band (running container, INC image untouched): 39,240 crq rows, 99.99% geocoded (cluster + shared location cache), watermark current, crq now renders on fn_tickets_for_map. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> 2026-06-25 20:55:17 +00:00			`columns (15_crq_table.sql) + geo_clusters/geo_locations + fn_tickets_for_map.`
feat(crq): add CRQ ingestion via shared engine + thin inc/crq entrypoints Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed from import_tickets.py) parameterized by a Dataset config, with thin per-type entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical 32-column source schema and CDC change stream, so the engine is fully shared. - pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget). - crq/import_crq.py: drains automations/crq/changes/ from isptickets into tickets.crq (data layer + map; SLA/dashboard/history deferred). - migrations/13_crq_columns.sql: CRQ mirror of 03 — typed STORED generated columns + indexes on tickets.crq (reuses tickets.eat_ts()). - Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject packages inc/crq. Docs (README, implementation, deployment-and-operations, n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook. tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the existing Tickets map once seeded. Verified locally: ruff-clean new files, engine lists/parses both streams against live S3 (crq=52 files, inc unaffected). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> 2026-06-25 20:16:38 +00:00			`━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━`
			`"""`

			`from __future__ import annotations`

			`import argparse`

			`import pipeline`

			`# CRQ has no post-apply hook yet (history capture is INC-only — see module docstring).`
			`DATASET = pipeline.make_dataset("crq", post_apply=None)`


			`def main() -> None:`
			`ap = argparse.ArgumentParser(`
			`description="Ingest CRQ (installation) tickets from CSV (raw-first)")`
			`ap.add_argument("--apply", action="store_true", help="Write to DB (default: dry-run)")`
			`ap.add_argument("--from-bucket", action="store_true",`
			`help="Drain the incremental CRQ change stream (automations/crq/changes/) "`
			`"from the isptickets S3 bucket: every not-yet-processed file "`
			`"oldest→newest, upsert on ticket_id, advance the watermark, archive")`
			`ap.add_argument("--reseed", action="store_true",`
			`help="Ignore the stored watermark and drain every file in changes/ once "`
			`"(one-time bucket cutover / reseed). Use with --from-bucket --apply")`
			`ap.add_argument("--crq-csv", dest="local_csv", default=None,`
			`help="Local CRQ tickets CSV file (dev)")`
			`args = ap.parse_args()`

			`if not (args.from_bucket or args.local_csv):`
			`ap.error("provide --from-bucket or --crq-csv")`
			`pipeline.ingest(DATASET, args)`


			`if __name__ == "__main__":`
			`main()`