fleettickets/docs/implementation.md


# Implementation record — fleettickets (as built)
What is actually built and deployed, as of the Phase-1 completion. Companion to
`docs/phase-1-ingestion.md` (plan) and `docs/phase-2-dashboard.md` (next).
2026-06-25 20:16:38 +00:00
## Pipeline (`pipeline.py` engine + `inc/`,`crq/` entrypoints)
The dataset-agnostic CDC engine lives in **`pipeline.py`**, parameterized by a small
`Dataset` config (name, table, `automations/<type>/changes|processed/` prefixes, key
regex, optional `post_apply` hook). Two thin entrypoints supply that config and the CLI:
**`inc/import_inc.py`** (`python -m inc.import_inc`, `post_apply=capture_history`) and
**`crq/import_crq.py`** (`python -m crq.import_crq`, no history hook). INC and CRQ share an
**identical 32-column source schema**, so the engine is fully shared; geocoding is
**cross-dataset** (one gazetteer/budget, unions `tickets.inc` + `tickets.crq`) and is run
from the INC entrypoint.
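
As a rough illustration of how a small `Dataset` config can parameterize a shared engine, here is a minimal sketch. The field names, the `make_dataset` helper, and the key pattern are assumptions made for this example; they are not taken from `pipeline.py`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical per-dataset config; field names are assumptions."""
    name: str
    table: str
    changes_prefix: str
    processed_prefix: str
    key_regex: re.Pattern
    post_apply: Optional[Callable[[], None]] = None


def make_dataset(kind: str, table: str, post_apply=None) -> Dataset:
    """Build the config for one dataset type ("inc" or "crq")."""
    prefix = f"automations/{kind}/"
    return Dataset(
        name=kind,
        table=table,
        changes_prefix=prefix + "changes/",
        processed_prefix=prefix + "processed/",
        # Assumed pattern: any .csv file directly under changes/.
        key_regex=re.compile(re.escape(prefix) + r"changes/[^/]+\.csv$"),
        post_apply=post_apply,
    )


def capture_history() -> None:
    """Placeholder standing in for the INC history hook named in the text."""


INC = make_dataset("inc", "tickets.inc", post_apply=capture_history)
CRQ = make_dataset("crq", "tickets.crq")  # no history hook
```

With this shape, the engine only ever reads the config, so adding a dataset means adding one more `make_dataset(...)` call rather than touching the engine.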
- **Source:** the incremental CDC stream `automations/<inc|crq>/changes/<EAT-timestamp>.csv`
in the **`isptickets`** S3 bucket (endpoint `https://s3.rahamafresh.com`, path-style,
region `us-east-1`; was the `tickets` bucket before the 2026-06-25 cutover).
- **S3 access via boto3** (no aws-CLI dependency): `list_objects_v2` (paginator),
`get_object`, `copy_object` + `delete_object` for archiving.
- **Watermark:** drains every `changes/` file newer than
`tickets.import_meta.metadata.source_max_key`, oldest→newest; reruns with no new file
are a cheap no-op. `--reseed` ignores the watermark for a one-time bucket cutover.
- **Cleaning:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
`week_start`/`week_end`, `source_s3_bucket`/`source_s3_key`/`source_snapshot_id`,
`department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE.
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete. On success,
**move** processed file(s) → `automations/<inc|crq>/processed/`.
- **Geocoding** (keyed LocationIQ): `--geocode-clusters` (coarse, per cluster) and
`--geocode-locations` (precise, actionable INC; strips network codes; 25 km
wrong-city guard). Results cache in `tickets.geo_clusters` / `tickets.geo_locations`.
- **History capture:** after each `--apply` run (ingest or skip), calls
`tickets.capture_history()` → appends new closures + upserts today's backlog
snapshot.
- CLI (`inc`): `--from-bucket` (drain the INC change stream), `--reseed` (ignore the
watermark; one-time bucket cutover), `--inc-csv <file>` (local dev), `--apply` (else
dry-run), `--geocode-clusters`, `--geocode-locations`, `--capture-history`.
- CLI (`crq`): `--from-bucket`, `--reseed`, `--crq-csv <file>`, `--apply` (ingest only;
geocoding + history are not on the CRQ entrypoint).
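
The watermark and cleaning rules in the list above can be sketched as two small pure functions. This is an illustration under stated assumptions, not the code in `pipeline.py`: it assumes change-file keys sort chronologically as strings, and that the `EXPORT STOPPED…` sentinel appears in `ticket_id` (the real column may differ). The function names are hypothetical.

```python
from typing import Iterable, Optional

# Columns the text says are dropped at ingest time.
DROP_COLUMNS = {
    "week_start", "week_end", "source_s3_bucket", "source_s3_key",
    "source_snapshot_id", "department", "source_type",
}


def keys_to_drain(keys: Iterable[str], watermark: Optional[str],
                  reseed: bool = False) -> list[str]:
    """Return change-file keys newer than the watermark, oldest first.

    Assumes timestamped keys sort chronologically as plain strings.
    """
    ordered = sorted(keys)
    if reseed or watermark is None:
        return ordered  # --reseed ignores the watermark
    return [k for k in ordered if k > watermark]


def clean_row(row: dict) -> Optional[dict]:
    """Apply the cleaning rules to one CSV row; None means the row is dropped."""
    if str(row.get("is_alarm", "")).strip().lower() == "true":
        return None
    # Assumption: the sentinel text shows up in ticket_id.
    if str(row.get("ticket_id", "")).startswith("EXPORT STOPPED"):
        return None
    cleaned = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    if cleaned.get("region"):
        cleaned["region"] = cleaned["region"].lower()
    if cleaned.get("raw_status"):
        cleaned["raw_status"] = cleaned["raw_status"].upper()
    return cleaned
```

When no key is newer than the watermark, `keys_to_drain` returns an empty list, which corresponds to the cheap no-op rerun described above.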
## Schema / migrations (`tracksolid_db`, applied via `run_migrations.py`)
| Migration | What |
|---|---|
| 01_tickets_schema | `tickets.inc`/`crq` (raw-jsonb-first), `geo_clusters`/`geo_locations` gazetteers, geom-resolution trigger, `reporting.fn_tickets_for_map` |
| 02_import_meta | `tickets.import_meta` (snapshot freshness) + `fn_tickets_for_map` `summary.freshness` |
| 03_inc_columns | Unpack `raw` → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via `tickets.eat_ts()`) |
| 04_inc_latlng | `latitude`/`longitude` = `COALESCE(feed, ST_Y/ST_X(geom))` (populated from geocode) |
| 05_inc_geography | `geog geography(Point,4326)` (= `geom::geography`) + GiST index for routing |
| 06_inc_mttr_minutes | `mttr` → integer **minutes**; drop constant `is_alarm`/`is_auto_created`/`is_auto_closed` |
| 07_inc_drop_service_type | drop constant `service_type` |
| 08_inc_open_sla_view | `tickets.inc_open_sla` view (open tickets + derived SLA) |
| 09_inc_dashboard_fn | **built**: `reporting.fn_inc_dashboard(cluster, status, window, from, to)` returns one JSON payload (open GeoJSON + windowed closed GeoJSON + metrics + freshness) for the FleetOps live INC map. See `docs/phase-2-dashboard.md` |
| 10_inc_history_capture | **built**: `tickets.closure_events` (append-only observed closures) + `tickets.inc_daily_snapshot` (per-EAT-day open backlog + flow) + `tickets.capture_history()`; the ingest calls it on each `--apply` run. Unlocks backlog-over-time |
| 12_inc_dashboard_by_owner | **built**: owner/team breakdown extension to `fn_inc_dashboard` |
| 15_crq_table | **built**: materializes `tickets.crq` (table + geom trigger + indexes; `01`'s crq section never ran on the live DB) + the typed STORED generated columns from `03` (reuses `tickets.eat_ts()`). Data-layer parity for the CRQ tab |
`tickets.inc` columns: `ticket_id` (PK), `raw` (jsonb, source of truth),
`normalized_status`/`raw_status`, `bucket`, `is_actionable`, `cluster`/`region`/
`location_name`, `assigned_team`/`owner`, `sla_status`, `mttr` (min),
`created_at_service`/`scheduled_at`/`closed_at`/`first_seen_at`/`last_seen_at`/
`source_created_at`/`source_updated_at` (timestamptz), `latitude`/`longitude`,
`geom`/`geog`/`geo_source`, `ingested_at`. Dropped-but-in-`raw`: `service_type`,
`is_alarm`, `is_auto_created`, `is_auto_closed`, and the ingest-time drops.
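
To make the EAT→`timestamptz` conversion concrete, here is a minimal Python sketch of the same idea. It is not the body of `tickets.eat_ts()` (which is a database function); the input format `YYYY-MM-DD HH:MM:SS` and the name `eat_to_utc` are assumptions for illustration. EAT is a fixed UTC+3 offset with no daylight saving.

```python
from datetime import datetime, timedelta, timezone

# East Africa Time: fixed UTC+3, no DST.
EAT = timezone(timedelta(hours=3), "EAT")


def eat_to_utc(text: str) -> datetime:
    """Interpret a naive timestamp string as EAT and return it as aware UTC.

    Assumes the source format is 'YYYY-MM-DD HH:MM:SS'.
    """
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=EAT).astimezone(timezone.utc)
```

For example, `eat_to_utc("2026-06-25 23:16:38")` yields 20:16:38 UTC on the same day.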
## Deployment
- **Coolify** app built from this repo's `Dockerfile` (`python:3.12-slim`,
`TZ=Africa/Nairobi`, keep-alive `tail -f /dev/null`). Separate from the FleetOps
web app (`fleet-ops-staging`).
- **Scheduled Tasks (two):** `inc_tickets` → `python -m inc.import_inc --from-bucket
--apply` and `crq_tickets` → `python -m crq.import_crq --from-bucket --apply`, both cron
`*/20 6-20 * * *` in **EAT** (Coolify runs tasks in EAT, so no UTC conversion is needed).
- **Env vars** (Coolify): `DATABASE_URL` (internal DB host), `RUSTFS_*`
(`isptickets` bucket — serves both inc + crq), `GEOCODER_*`.
- For a plain host/VM, `run_ingest.sh` + a crontab line is the alternative.
Full ops runbook (env management, the Forgejo → Coolify auto-deploy webhook, manual
deploys, bucket cutover, verification): **`docs/deployment-and-operations.md`**.
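
To make the schedule concrete, the sketch below expands the cron expression `*/20 6-20 * * *` into the wall-clock times it fires each day. The helper name `cron_fire_times` is hypothetical and only handles this one pattern (a minute step within an hour range); it is not a general cron parser.

```python
from datetime import time


def cron_fire_times(minute_step: int = 20, first_hour: int = 6,
                    last_hour: int = 20) -> list[time]:
    """Expand `*/<step> <first>-<last> * * *` into daily fire times."""
    return [
        time(hour, minute)
        for hour in range(first_hour, last_hour + 1)  # hour range is inclusive
        for minute in range(0, 60, minute_step)
    ]


runs = cron_fire_times()
print(len(runs), runs[0].strftime("%H:%M"), runs[-1].strftime("%H:%M"))
# prints: 45 06:00 20:40
```

So each task runs 45 times per day, from 06:00 through 20:40 EAT, and is idle overnight.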
## State at hand-off
- `tickets.inc` ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows);
**0 alarm / 0 sentinel** (legacy rows cleaned up one-time).
- Geocoding ~**99.99%** (`geom` on all but 1 null-cluster ticket); `QOA`/`PTMP`
cluster codes mapped to Quarry Road / Pipeline.
- Read path verified: `reporting.fn_tickets_for_map()` + `tickets.inc_open_sla`.
## Data-quality caveats (must inform analytics)
- Source `sla_status` is only meaningful once a ticket is **closed**; open SLA must be **derived**
(`now - created_at_service`, with `first_seen_at` as the fallback; ~30% of rows lack
`created_at_service`).
- `mttr` is **minutes**, null until closed; not wall-clock and not a 48h threshold.
- Lifecycle timestamps = `created_at_service`→`closed_at`; the `*_seen_at` / `source_*`
ones are export bookkeeping (don't use for SLA/closure-time).
- Content lag ~2 days behind wall-clock.
- **History:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR event
series work directly; **backlog-over-time** now accrues via
`tickets.inc_daily_snapshot` + `tickets.closure_events` (written by
`tickets.capture_history()` each ingest) — builds forward from the first capture.
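
The derived open-SLA rule in the first caveat can be sketched as follows. This is an illustration only, not the definition of `tickets.inc_open_sla`: the function names are hypothetical, and the breach threshold is deliberately a required parameter because the document does not state one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def open_age(created_at_service: Optional[datetime],
             first_seen_at: Optional[datetime],
             now: datetime) -> Optional[timedelta]:
    """Age of an open ticket: now - created_at_service, else now - first_seen_at."""
    start = created_at_service or first_seen_at
    if start is None:
        return None  # neither timestamp is available
    return now - start


def is_breached(age: Optional[timedelta], threshold: timedelta) -> Optional[bool]:
    """Compare the age to a caller-supplied threshold (no default is assumed)."""
    if age is None:
        return None
    return age > threshold


now = datetime(2026, 6, 25, 12, 0, tzinfo=timezone.utc)
age = open_age(None, datetime(2026, 6, 25, 6, 0, tzinfo=timezone.utc), now)
print(age)  # 6:00:00, taken from the first_seen_at fallback
```

Because roughly 30% of rows lack `created_at_service`, the fallback path is common, and ages computed from `first_seen_at` reflect when the export first saw the ticket rather than when it was created.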
## Roadmap
Phase 2 (built): `fn_inc_dashboard` read-API → FleetOps live map (open + closed
overlay + metrics); history capture (`closure_events` + `inc_daily_snapshot`) for
backlog/closure trends. Remaining: `dashboard_api` endpoint + FleetOps SPA (other
repos; see `docs/dashboard-api-contract.md`), FleetNow **dispatch** off `geog`,
**team closure attribution**.
**CRQ** (this milestone): the shared engine now feeds `tickets.crq` from
`automations/crq/changes/` (`crq/import_crq.py`), with the `tickets.crq` table + typed columns (migration 15) and
cross-dataset geocoding — CRQ shows on the Tickets map via `fn_tickets_for_map` (which
already unions it) and gets its own FleetOps tab. Deferred to a follow-up once
installation-lifecycle semantics are confirmed: the CRQ analogues of migrations
08/09/10 — `crq_open_sla`, `fn_crq_dashboard`, and CRQ history capture (`tickets.crq`
currently has **no** `post_apply` hook).