fleettickets/docs/implementation.md
david kiania 5f5d71d500 feat(crq): add CRQ ingestion via shared engine + thin inc/crq entrypoints
Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed
from import_tickets.py) parameterized by a Dataset config, with thin per-type
entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical
32-column source schema and CDC change stream, so the engine is fully shared.

- pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC
  keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations
  now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget).
- crq/import_crq.py: drains automations/crq/changes/ from isptickets into
  tickets.crq (data layer + map; SLA/dashboard/history deferred).
- migrations/13_crq_columns.sql: CRQ mirror of 03 — typed STORED generated
  columns + indexes on tickets.crq (reuses tickets.eat_ts()).
- Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject
  packages inc/crq. Docs (README, implementation, deployment-and-operations,
  n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook.

tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into
reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the
existing Tickets map once seeded. Verified locally: ruff-clean new files, engine
lists/parses both streams against live S3 (crq=52 files, inc unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 23:16:38 +03:00


# Implementation record — fleettickets (as built)
What is actually built and deployed, as of the Phase-1 completion. Companion to
`docs/phase-1-ingestion.md` (plan) and `docs/phase-2-dashboard.md` (next).
## Pipeline (`pipeline.py` engine + `inc/`,`crq/` entrypoints)
The dataset-agnostic CDC engine lives in **`pipeline.py`**, parameterized by a small
`Dataset` config (name, table, `automations/<type>/changes|processed/` prefixes, key
regex, optional `post_apply` hook). Two thin entrypoints supply that config and the CLI:
**`inc/import_inc.py`** (`python -m inc.import_inc`, `post_apply=capture_history`) and
**`crq/import_crq.py`** (`python -m crq.import_crq`, no history hook). INC and CRQ share an
**identical 32-column source schema**, so the engine is fully shared; geocoding is
**cross-dataset** (one gazetteer/budget, unions `tickets.inc` + `tickets.crq`) and is run
from the INC entrypoint.
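
A minimal sketch of what such a `Dataset` config could look like follows. The field names, the key regexes, and the placeholder hook are illustrative assumptions; only the list of config items (name, table, prefixes, key regex, optional `post_apply`) comes from the description above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical shape of the per-dataset config; field names are assumptions."""

    name: str                                         # "inc" or "crq"
    table: str                                        # target table, e.g. "tickets.inc"
    key_regex: re.Pattern[str]                        # matches this dataset's change-file keys
    post_apply: Optional[Callable[[], None]] = None   # optional hook run after an --apply run

    @property
    def changes_prefix(self) -> str:
        return f"automations/{self.name}/changes/"

    @property
    def processed_prefix(self) -> str:
        return f"automations/{self.name}/processed/"


def capture_history() -> None:
    """Placeholder for the INC hook; the real one calls tickets.capture_history()."""


# The regexes below are assumed patterns, not the ones used in pipeline.py.
INC = Dataset("inc", "tickets.inc", re.compile(r"^automations/inc/changes/.+\.csv$"), capture_history)
CRQ = Dataset("crq", "tickets.crq", re.compile(r"^automations/crq/changes/.+\.csv$"))
```

Under this sketch, adding another ticket type would mean declaring one more `Dataset` instance and a thin entrypoint, without touching the engine.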
- **Source:** the incremental CDC stream `automations/<inc|crq>/changes/<EAT-timestamp>.csv`
in the **`isptickets`** S3 bucket (endpoint `https://s3.rahamafresh.com`, path-style,
region `us-east-1`; was the `tickets` bucket before the 2026-06-25 cutover).
- **S3 access via boto3** (no aws-CLI dependency): `list_objects_v2` (paginator),
`get_object`, `copy_object` + `delete_object` for archiving.
- **Watermark:** drains every `changes/` file newer than
`tickets.import_meta.metadata.source_max_key`, oldest→newest; reruns with no new file
are a cheap no-op. `--reseed` ignores the watermark for a one-time bucket cutover.
- **Cleaning:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
`week_start`/`week_end`, `source_s3_bucket`/`source_s3_key`/`source_snapshot_id`,
`department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE.
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete. On success,
**move** processed file(s) → `automations/<inc|crq>/processed/`.
- **Geocoding** (keyed LocationIQ): `--geocode-clusters` (coarse, per cluster) and
`--geocode-locations` (precise, actionable INC; strips network codes; 25 km
wrong-city guard). Results cache in `tickets.geo_clusters` / `tickets.geo_locations`.
- **History capture** (INC only): after each `--apply` run (ingest or skip), the INC
entrypoint calls `tickets.capture_history()`, which appends new closures and upserts
today's backlog snapshot.
- CLI (`inc`): `--from-bucket` (drain the INC change stream), `--reseed` (ignore the
watermark; one-time bucket cutover), `--inc-csv <file>` (local dev), `--apply` (else
dry-run), `--geocode-clusters`, `--geocode-locations`, `--capture-history`.
- CLI (`crq`): `--from-bucket`, `--reseed`, `--crq-csv <file>`, `--apply` (ingest only;
geocoding + history are not on the CRQ entrypoint).
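
The watermark and cleaning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `pipeline.py`: it assumes change-file keys sort lexicographically in time order, that `is_alarm` arrives as a text value, and that the sentinel row carries its `EXPORT STOPPED` text in `ticket_id`. The function names are hypothetical.

```python
from typing import Iterable, Optional

# Columns dropped at ingest time, as listed in the Cleaning bullet.
DROP_COLUMNS = {
    "week_start", "week_end", "source_s3_bucket", "source_s3_key",
    "source_snapshot_id", "department", "source_type",
}


def keys_to_drain(keys: Iterable[str], watermark: Optional[str], reseed: bool = False) -> list[str]:
    """Return change-file keys newer than the watermark, oldest first.

    Assumes the timestamp embedded in each key sorts lexicographically.
    With reseed=True (or no watermark yet) every key is returned.
    """
    ordered = sorted(keys)
    if reseed or watermark is None:
        return ordered
    return [key for key in ordered if key > watermark]


def clean_rows(rows: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    """Drop alarm and sentinel rows, drop unwanted columns, normalize case."""
    cleaned = []
    for row in rows:
        if row.get("is_alarm", "").strip().lower() == "true":
            continue  # alarm rows are not ingested
        if row.get("ticket_id", "").startswith("EXPORT STOPPED"):
            continue  # assumed location of the sentinel marker
        out = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        if out.get("region"):
            out["region"] = out["region"].lower()
        if out.get("raw_status"):
            out["raw_status"] = out["raw_status"].upper()
        cleaned.append(out)
    return cleaned
```

When no key is newer than the watermark, `keys_to_drain` returns an empty list, which matches the cheap no-op rerun described above.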
## Schema / migrations (`tracksolid_db`, applied via `run_migrations.py`)
| Migration | What |
|---|---|
| 01_tickets_schema | `tickets.inc`/`crq` (raw-jsonb-first), `geo_clusters`/`geo_locations` gazetteers, geom-resolution trigger, `reporting.fn_tickets_for_map` |
| 02_import_meta | `tickets.import_meta` (snapshot freshness) + `fn_tickets_for_map` `summary.freshness` |
| 03_inc_columns | Unpack `raw` → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via `tickets.eat_ts()`) |
| 04_inc_latlng | `latitude`/`longitude` = `COALESCE(feed, ST_Y/ST_X(geom))` (populated from geocode) |
| 05_inc_geography | `geog geography(Point,4326)` (= `geom::geography`) + GiST index for routing |
| 06_inc_mttr_minutes | `mttr` → integer **minutes**; drop constant `is_alarm`/`is_auto_created`/`is_auto_closed` |
| 07_inc_drop_service_type | drop constant `service_type` |
| 08_inc_open_sla_view | `tickets.inc_open_sla` view (open tickets + derived SLA) |
| 09_inc_dashboard_fn | **built**: `reporting.fn_inc_dashboard(cluster, status, window, from, to)` returns one JSON payload (open GeoJSON + windowed closed GeoJSON + metrics + freshness) for the FleetOps live INC map. See `docs/phase-2-dashboard.md` |
| 10_inc_history_capture | **built**: `tickets.closure_events` (append-only observed closures) + `tickets.inc_daily_snapshot` (per-EAT-day open backlog + flow) + `tickets.capture_history()`; the ingest calls it each `--apply` run. Unlocks backlog-over-time |
| 12_inc_dashboard_by_owner | **built**: owner/team breakdown extension to `fn_inc_dashboard` |
| 13_crq_columns | **built**: CRQ mirror of `03`, with typed STORED generated columns + indexes on `tickets.crq` (reuses `tickets.eat_ts()`). Data-layer parity for the CRQ tab |
`tickets.inc` columns: `ticket_id` (PK), `raw` (jsonb, source of truth),
`normalized_status`/`raw_status`, `bucket`, `is_actionable`, `cluster`/`region`/
`location_name`, `assigned_team`/`owner`, `sla_status`, `mttr` (min),
`created_at_service`/`scheduled_at`/`closed_at`/`first_seen_at`/`last_seen_at`/
`source_created_at`/`source_updated_at` (timestamptz), `latitude`/`longitude`,
`geom`/`geog`/`geo_source`, `ingested_at`. Dropped-but-in-`raw`: `service_type`,
`is_alarm`, `is_auto_created`, `is_auto_closed`, and the ingest-time drops.
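
To illustrate the EAT→timestamptz conversion that migrations 03 and 13 delegate to `tickets.eat_ts()`, here is a rough Python analogue. The real function is SQL and its accepted input format is not documented here, so the `%Y-%m-%d %H:%M:%S` format below is an assumption, as is the treatment of empty strings as null.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# East Africa Time is a fixed UTC+3 offset with no daylight saving.
EAT = timezone(timedelta(hours=3), name="EAT")


def eat_ts(value: Optional[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> Optional[datetime]:
    """Parse a naive EAT wall-clock string into a timezone-aware datetime.

    The default format is an assumed one; empty or missing input yields None.
    """
    if value is None or not value.strip():
        return None
    return datetime.strptime(value.strip(), fmt).replace(tzinfo=EAT)


parsed = eat_ts("2026-06-25 23:16:38")
print(parsed.astimezone(timezone.utc).isoformat())  # 2026-06-25T20:16:38+00:00
```

Attaching the offset at parse time keeps every stored timestamp unambiguous, whatever the session time zone of the reader.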
## Deployment
- **Coolify** app built from this repo's `Dockerfile` (`python:3.12-slim`,
`TZ=Africa/Nairobi`, keep-alive `tail -f /dev/null`). Separate from the FleetOps
web app (`fleet-ops-staging`).
- **Scheduled Tasks (two):** `inc_tickets` → `python -m inc.import_inc --from-bucket
--apply` and `crq_tickets` → `python -m crq.import_crq --from-bucket --apply`, both cron
`*/20 6-20 * * *` in **EAT** (Coolify runs tasks in EAT, so no UTC conversion is needed).
- **Env vars** (Coolify): `DATABASE_URL` (internal DB host), `RUSTFS_*`
(`isptickets` bucket — serves both inc + crq), `GEOCODER_*`.
- For a plain host/VM, `run_ingest.sh` + a crontab line is the alternative.
Full ops runbook (env management, the Forgejo → Coolify auto-deploy webhook, manual
deploys, bucket cutover, verification): **`docs/deployment-and-operations.md`**.
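
As a quick check of what the `*/20 6-20 * * *` schedule means in practice, the sketch below enumerates the daily run times it implies. It hard-codes the minute step and hour range from that expression instead of parsing cron syntax, and `run_times` is a hypothetical helper.

```python
def run_times(minute_step: int = 20, first_hour: int = 6, last_hour: int = 20) -> list[str]:
    """List HH:MM run times for a '*/step first-last * * *' style schedule."""
    return [
        f"{hour:02d}:{minute:02d}"
        for hour in range(first_hour, last_hour + 1)   # the cron hour range is inclusive
        for minute in range(0, 60, minute_step)
    ]


times = run_times()
print(len(times), times[0], times[-1])  # 45 06:00 20:40
```

Each task therefore runs 45 times a day, from 06:00 to 20:40 EAT, and nothing is ingested overnight.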
## State at hand-off
- `tickets.inc` ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows);
**0 alarm / 0 sentinel** (legacy rows cleaned up one-time).
- Geocoding ~**99.99%** (`geom` on all but 1 null-cluster ticket); `QOA`/`PTMP`
cluster codes mapped to Quarry Road / Pipeline.
- Read path verified: `reporting.fn_tickets_for_map()` + `tickets.inc_open_sla`.
## Data-quality caveats (must inform analytics)
- Source `sla_status` is only meaningful once a ticket is **closed**; open SLA must be
**derived** (`now - created_at_service`, falling back to `first_seen_at`; ~30% lack
`created_at_service`).
- `mttr` is **minutes**, null until closed; not wall-clock and not a 48h threshold.
- Lifecycle timestamps = `created_at_service`→`closed_at`; the `*_seen_at` / `source_*`
ones are export bookkeeping (don't use for SLA/closure-time).
- Content lag ~2 days behind wall-clock.
- **History:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR event
series work directly; **backlog-over-time** now accrues via
`tickets.inc_daily_snapshot` + `tickets.closure_events` (written by
`tickets.capture_history()` each ingest) — builds forward from the first capture.
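
The derived open-ticket age from the first caveat can be sketched as below. This is an illustration of the fallback rule only; the function name is hypothetical, and no SLA threshold is applied because none is stated here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def open_age(
    created_at_service: Optional[datetime],
    first_seen_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> Optional[timedelta]:
    """Age of an open ticket: now minus created_at_service, else first_seen_at."""
    start = created_at_service or first_seen_at
    if start is None:
        return None  # neither timestamp is available
    current = now or datetime.now(timezone.utc)
    return current - start


now = datetime(2026, 6, 25, 12, 0, tzinfo=timezone.utc)
seen = datetime(2026, 6, 24, 12, 0, tzinfo=timezone.utc)
print(open_age(None, seen, now))  # 1 day, 0:00:00
```

Because `first_seen_at` is export bookkeeping, ages computed through the fallback reflect when the export first observed the ticket and are best flagged as approximate in analytics.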
## Roadmap
Phase 2 (built): `fn_inc_dashboard` read-API → FleetOps live map (open + closed
overlay + metrics); history capture (`closure_events` + `inc_daily_snapshot`) for
backlog/closure trends. Remaining: `dashboard_api` endpoint + FleetOps SPA (other
repos; see `docs/dashboard-api-contract.md`), FleetNow **dispatch** off `geog`,
**team closure attribution**.
**CRQ** (this milestone): the shared engine now feeds `tickets.crq` from
`automations/crq/changes/` (`crq/import_crq.py`), with typed columns (migration 13) and
cross-dataset geocoding — CRQ shows on the Tickets map via `fn_tickets_for_map` (which
already unions it) and gets its own FleetOps tab. Deferred to a follow-up once
installation-lifecycle semantics are confirmed: the CRQ analogues of migrations
08/09/10 — `crq_open_sla`, `fn_crq_dashboard`, and CRQ history capture (`tickets.crq`
currently has **no** `post_apply` hook).