Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed from import_tickets.py) parameterized by a Dataset config, with thin per-type entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical 32-column source schema and CDC change stream, so the engine is fully shared.

- pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget).
- crq/import_crq.py: drains automations/crq/changes/ from isptickets into tickets.crq (data layer + map; SLA/dashboard/history deferred).
- migrations/13_crq_columns.sql: CRQ mirror of 03: typed STORED generated columns + indexes on tickets.crq (reuses tickets.eat_ts()).
- Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject packages inc/crq.

Docs (README, implementation, deployment-and-operations, n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook.

tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the existing Tickets map once seeded.

Verified locally: ruff-clean new files, engine lists/parses both streams against live S3 (crq=52 files, inc unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implementation record — fleettickets (as built)
What is actually built and deployed, as of the Phase-1 completion. Companion to
docs/phase-1-ingestion.md (plan) and docs/phase-2-dashboard.md (next).
Pipeline (pipeline.py engine + inc/,crq/ entrypoints)
The dataset-agnostic CDC engine lives in pipeline.py, parameterized by a small
Dataset config (name, table, automations/<type>/changes|processed/ prefixes, key
regex, optional post_apply hook). Two thin entrypoints supply that config and the CLI:
inc/import_inc.py (python -m inc.import_inc, post_apply=capture_history) and
crq/import_crq.py (python -m crq.import_crq, no history hook). INC and CRQ share an
identical 32-column source schema, so the engine is fully shared; geocoding is
cross-dataset (one gazetteer/budget, unions tickets.inc + tickets.crq) and is run
from the INC entrypoint.
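As a rough illustration of the config shape described above, a Dataset could look like the sketch below. The field names changes_prefix and processed_prefix, the make_dataset helper, the filename pattern inside key_regex, and the zero-argument post_apply signature are assumptions made for this example; the actual definitions in pipeline.py may differ.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Dataset:
    """Per-type configuration handed to the shared engine (illustrative only)."""
    name: str                                   # "inc" or "crq"
    table: str                                  # target table, e.g. "tickets.inc"
    changes_prefix: str                         # where new change files arrive
    processed_prefix: str                       # where drained files are archived
    key_regex: re.Pattern                       # recognises valid change-file keys
    post_apply: Optional[Callable[[], None]] = None  # optional hook after --apply


def make_dataset(name, post_apply=None):
    """Hypothetical helper: derive table and prefixes from the dataset name."""
    base = f"automations/{name}"
    return Dataset(
        name=name,
        table=f"tickets.{name}",
        changes_prefix=f"{base}/changes/",
        processed_prefix=f"{base}/processed/",
        # Assumed key layout: automations/<type>/changes/<stamp>.csv
        key_regex=re.compile(rf"^{re.escape(base)}/changes/(?P<stamp>[^/]+)\.csv$"),
        post_apply=post_apply,
    )


def capture_history():
    """Stand-in for the INC-only hook; the real one calls tickets.capture_history()."""
    return None


INC = make_dataset("inc", post_apply=capture_history)
CRQ = make_dataset("crq")  # no history hook yet
```

With this shape, each entrypoint only has to build its own Dataset and pass it to the engine, which is why the two import modules can stay thin.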
- Source: the incremental CDC stream automations/<inc|crq>/changes/<EAT-timestamp>.csv in the isptickets S3 bucket (endpoint https://s3.rahamafresh.com, path-style, region us-east-1; was the tickets bucket before the 2026-06-25 cutover).
- S3 access via boto3 (no aws-CLI dependency): list_objects_v2 (paginator), get_object, copy_object + delete_object for archiving.
- Watermark: drains every changes/ file newer than tickets.import_meta.metadata.source_max_key, oldest→newest; reruns with no new file are a cheap no-op. --reseed ignores the watermark for a one-time bucket cutover.
- Cleaning: drop is_alarm=true rows + the EXPORT STOPPED… sentinel; drop week_start/week_end, source_s3_bucket/source_s3_key/source_snapshot_id, department, source_type; normalize region→lowercase, raw_status→UPPERCASE.
- Upsert on ticket_id (ON CONFLICT DO UPDATE); never delete. On success, move processed file(s) → automations/<inc|crq>/processed/.
- Geocoding (keyed LocationIQ): --geocode-clusters (coarse, per cluster) and --geocode-locations (precise, actionable INC; strips network codes; 25 km wrong-city guard). Results cache in tickets.geo_clusters / tickets.geo_locations.
- History capture: after each --apply run (ingest or skip), calls tickets.capture_history() → appends new closures + upserts today's backlog snapshot.
- CLI (inc): --from-bucket (drain the INC change stream), --reseed (ignore the watermark; one-time bucket cutover), --inc-csv <file> (local dev), --apply (else dry-run), --geocode-clusters, --geocode-locations, --capture-history.
- CLI (crq): --from-bucket, --reseed, --crq-csv <file>, --apply (ingest only; geocoding + history are not on the CRQ entrypoint).
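The watermark and cleaning rules in the list above can be sketched as two pure functions. This is a minimal illustration, not the engine's code: it assumes the watermark stores a full object key, that the timestamped keys sort lexicographically in time order, that CSV values arrive as strings, and that the sentinel shows up as a field value beginning with "EXPORT STOPPED". The names select_pending and clean_row are hypothetical.

```python
# Columns the ingest drops before upsert, as listed above.
DROPPED_COLUMNS = {
    "week_start", "week_end",
    "source_s3_bucket", "source_s3_key", "source_snapshot_id",
    "department", "source_type",
}
SENTINEL_PREFIX = "EXPORT STOPPED"  # assumed to appear at the start of a field


def select_pending(keys, changes_prefix, watermark=None, reseed=False):
    """Return change-file keys to drain, oldest first.

    Keys at or below the watermark are skipped unless reseed is set.
    """
    candidates = sorted(
        k for k in keys if k.startswith(changes_prefix) and k.endswith(".csv")
    )
    if reseed or watermark is None:
        return candidates
    return [k for k in candidates if k > watermark]


def clean_row(row):
    """Return a cleaned copy of one CSV row, or None if the row is dropped."""
    if str(row.get("is_alarm", "")).lower() == "true":
        return None  # alarm rows are never ingested
    if any(str(v).startswith(SENTINEL_PREFIX) for v in row.values()):
        return None  # export sentinel, not a ticket
    cleaned = {k: v for k, v in row.items() if k not in DROPPED_COLUMNS}
    if cleaned.get("region"):
        cleaned["region"] = cleaned["region"].lower()
    if cleaned.get("raw_status"):
        cleaned["raw_status"] = cleaned["raw_status"].upper()
    return cleaned
```

Because select_pending returns an empty list when nothing is newer than the watermark, a rerun with no new file does no work, which matches the cheap no-op behaviour described above.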
Schema / migrations (tracksolid_db, applied via run_migrations.py)
| Migration | What |
|---|---|
| 01_tickets_schema | tickets.inc/crq (raw-jsonb-first), geo_clusters/geo_locations gazetteers, geom-resolution trigger, reporting.fn_tickets_for_map |
| 02_import_meta | tickets.import_meta (snapshot freshness) + fn_tickets_for_map summary.freshness |
| 03_inc_columns | Unpack raw → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via tickets.eat_ts()) |
| 04_inc_latlng | latitude/longitude = COALESCE(feed, ST_Y/ST_X(geom)) (populated from geocode) |
| 05_inc_geography | geog geography(Point,4326) (= geom::geography) + GiST index for routing |
| 06_inc_mttr_minutes | mttr → integer minutes; drop constant is_alarm/is_auto_created/is_auto_closed |
| 07_inc_drop_service_type | drop constant service_type |
| 08_inc_open_sla_view | tickets.inc_open_sla view (open tickets + derived SLA) |
| 09_inc_dashboard_fn | built — reporting.fn_inc_dashboard(cluster, status, window, from, to): one JSON payload (open GeoJSON + windowed closed GeoJSON + metrics + freshness) for the FleetOps live INC map. See docs/phase-2-dashboard.md |
| 10_inc_history_capture | built — tickets.closure_events (append-only observed closures) + tickets.inc_daily_snapshot (per-EAT-day open backlog + flow) + tickets.capture_history(); the ingest calls it each --apply run. Unlocks backlog-over-time |
| 12_inc_dashboard_by_owner | built — owner/team breakdown extension to fn_inc_dashboard |
| 13_crq_columns | built — CRQ mirror of 03: typed STORED generated columns + indexes on tickets.crq (reuses tickets.eat_ts()). Data-layer parity for the CRQ tab |
tickets.inc columns: ticket_id (PK), raw (jsonb, source of truth),
normalized_status/raw_status, bucket, is_actionable, cluster/region/
location_name, assigned_team/owner, sla_status, mttr (min),
created_at_service/scheduled_at/closed_at/first_seen_at/last_seen_at/
source_created_at/source_updated_at (timestamptz), latitude/longitude,
geom/geog/geo_source, ingested_at. Dropped-but-in-raw: service_type,
is_alarm, is_auto_created, is_auto_closed, and the ingest-time drops.
Deployment
- Coolify app built from this repo's Dockerfile (python:3.12-slim, TZ=Africa/Nairobi, keep-alive tail -f /dev/null). Separate from the FleetOps web app (fleet-ops-staging).
- Scheduled Tasks (two): inc_tickets → python -m inc.import_inc --from-bucket --apply and crq_tickets → python -m crq.import_crq --from-bucket --apply, both cron */20 6-20 * * * in EAT (Coolify runs tasks in EAT, so no UTC conversion).
- Env vars (Coolify): DATABASE_URL (internal DB host), RUSTFS_* (isptickets bucket, serves both inc + crq), GEOCODER_*.
- For a plain host/VM, run_ingest.sh + a crontab line is the alternative.
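To make the schedule concrete, the small sketch below enumerates the wall-clock times that satisfy the cron expression */20 6-20 * * * under standard cron semantics (minute divisible by 20, hour 6 through 20 inclusive). The helper name matches_schedule and the sample date are arbitrary choices for illustration; times are naive local (EAT) values.

```python
from datetime import datetime, timedelta


def matches_schedule(t):
    """True when local time t satisfies the cron spec */20 6-20 * * *."""
    return t.minute % 20 == 0 and 6 <= t.hour <= 20


day = datetime(2026, 1, 1)  # any date works; the spec has no day constraints
runs = [
    day + timedelta(minutes=m)
    for m in range(24 * 60)
    if matches_schedule(day + timedelta(minutes=m))
]

print(len(runs))                   # 45 runs per task per day
print(runs[0].strftime("%H:%M"))   # 06:00
print(runs[-1].strftime("%H:%M"))  # 20:40
```

Each of the two scheduled tasks therefore fires 45 times a day, every 20 minutes from 06:00 to 20:40 EAT.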
Full ops runbook (env management, the Forgejo → Coolify auto-deploy webhook, manual
deploys, bucket cutover, verification): docs/deployment-and-operations.md.
State at hand-off
- tickets.inc ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows); 0 alarm / 0 sentinel (legacy rows cleaned up one-time).
- Geocoding ~99.99% (geom on all but 1 null-cluster ticket); QOA/PTMP cluster codes mapped to Quarry Road / Pipeline.
- Read path verified: reporting.fn_tickets_for_map() + tickets.inc_open_sla.
Data-quality caveats (must inform analytics)
- Source sla_status only meaningful once closed; open SLA must be derived (now − created_at_service, first_seen_at fallback; ~30% lack created_at_service).
- mttr is minutes, null until closed; not wall-clock and not a 48h threshold.
- Lifecycle timestamps = created_at_service → closed_at; the *_seen_at / source_* ones are export bookkeeping (don't use for SLA/closure-time).
- Content lag ~2 days behind wall-clock.
- History: tickets.inc is current-state (upsert). Closure/creation/MTTR event series work directly; backlog-over-time now accrues via tickets.inc_daily_snapshot + tickets.closure_events (written by tickets.capture_history() each ingest); it builds forward from the first capture.
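The derived open-SLA rule in the first caveat can be sketched in Python as below. This only illustrates the fallback logic for analytics code working on fetched rows; the deployed derivation lives in the tickets.inc_open_sla view, whose SQL is not reproduced here. The function name open_age and the dict-shaped row are assumptions.

```python
from datetime import datetime, timezone


def open_age(row, now=None):
    """Age of an open ticket: now - created_at_service, else first_seen_at.

    Returns None when neither timestamp is available.
    """
    now = now or datetime.now(timezone.utc)
    # created_at_service is missing on roughly 30% of rows, hence the fallback.
    start = row.get("created_at_service") or row.get("first_seen_at")
    if start is None:
        return None
    return now - start


now = datetime(2026, 1, 3, 12, 0, tzinfo=timezone.utc)
with_service = {
    "created_at_service": datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc),
    "first_seen_at": datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc),
}
fallback_only = {
    "created_at_service": None,
    "first_seen_at": datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc),
}

print(open_age(with_service, now))   # 2 days, 0:00:00
print(open_age(fallback_only, now))  # 1 day, 0:00:00
```

Ages computed from the first_seen_at fallback understate the true age, since that timestamp records when the export first saw the ticket rather than when service began.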
Roadmap
Phase 2 (built): fn_inc_dashboard read-API → FleetOps live map (open + closed
overlay + metrics); history capture (closure_events + inc_daily_snapshot) for
backlog/closure trends. Remaining: dashboard_api endpoint + FleetOps SPA (other
repos; see docs/dashboard-api-contract.md), FleetNow dispatch off geog,
team closure attribution.
CRQ (this milestone): the shared engine now feeds tickets.crq from
automations/crq/changes/ (crq/import_crq.py), with typed columns (migration 13) and
cross-dataset geocoding — CRQ shows on the Tickets map via fn_tickets_for_map (which
already unions it) and gets its own FleetOps tab. Deferred to a follow-up once
installation-lifecycle semantics are confirmed: the CRQ analogues of migrations
08/09/10 — crq_open_sla, fn_crq_dashboard, and CRQ history capture (tickets.crq
currently has no post_apply hook).