fleettickets/docs/implementation.md
david kiania 066d866b90 fix(crq): migration 15 creates tickets.crq (live DB never materialized it)
Live-DB reconciliation before seeding CRQ revealed two divergences:
- tickets.crq did NOT exist: 01_tickets_schema.sql was applied 2026-06-15 from a
  version predating its crq section, so the IF-NOT-EXISTS ledger guard has blocked
  it ever since (fn_tickets_for_map + resolve_ticket_geoms already reference crq, so
  they errored if called — masked because the live INC view uses fn_inc_dashboard).
- The live ledger carries un-versioned 13_inc_search_fn.sql / 14_inc_filter_options.sql
  (applied 2026-06-19, absent from this repo).

So 13_crq_columns.sql (ALTER-only, number 13) is replaced by 15_crq_table.sql, which
CREATEs tickets.crq self-containedly (table + geom trigger + raw/typed indexes) and
adds the typed STORED generated columns. Deterministic + idempotent on both the live DB
(crq missing) and a fresh DB (crq minimal from 01). Numbered 15 to sit after the live
ledger's max. Docs/CLI references updated 13->15.

Applied + seeded on the live DB out-of-band (running container, INC image untouched):
39,240 crq rows, 99.99% geocoded (cluster + shared location cache), watermark current,
crq now renders on fn_tickets_for_map.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 23:55:17 +03:00

118 lines
7.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Implementation record — fleettickets (as built)
What is actually built and deployed, as of the Phase-1 completion. Companion to
`docs/phase-1-ingestion.md` (plan) and `docs/phase-2-dashboard.md` (next).
## Pipeline (`pipeline.py` engine + `inc/`,`crq/` entrypoints)
The dataset-agnostic CDC engine lives in **`pipeline.py`**, parameterized by a small
`Dataset` config (name, table, `automations/<type>/changes|processed/` prefixes, key
regex, optional `post_apply` hook). Two thin entrypoints supply that config and the CLI:
**`inc/import_inc.py`** (`python -m inc.import_inc`, `post_apply=capture_history`) and
**`crq/import_crq.py`** (`python -m crq.import_crq`, no history hook). INC and CRQ share an
**identical 32-column source schema**, so the engine is fully shared; geocoding is
**cross-dataset** (one gazetteer/budget, unions `tickets.inc` + `tickets.crq`) and is run
from the INC entrypoint.
- **Source:** the incremental CDC stream `automations/<inc|crq>/changes/<EAT-timestamp>.csv`
in the **`isptickets`** S3 bucket (endpoint `https://s3.rahamafresh.com`, path-style,
region `us-east-1`; was the `tickets` bucket before the 2026-06-25 cutover).
- **S3 access via boto3** (no aws-CLI dependency): `list_objects_v2` (paginator),
`get_object`, `copy_object` + `delete_object` for archiving.
- **Watermark:** drains every `changes/` file newer than
`tickets.import_meta.metadata.source_max_key`, oldest→newest; reruns with no new file
are a cheap no-op. `--reseed` ignores the watermark for a one-time bucket cutover.
- **Cleaning:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
`week_start`/`week_end`, `source_s3_bucket`/`source_s3_key`/`source_snapshot_id`,
`department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE.
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete. On success,
**move** processed file(s) → `automations/inc/processed/`.
- **Geocoding** (keyed LocationIQ): `--geocode-clusters` (coarse, per cluster) and
`--geocode-locations` (precise, actionable INC; strips network codes; 25 km
wrong-city guard). Results cache in `tickets.geo_clusters` / `tickets.geo_locations`.
- **History capture:** after each `--apply` run (ingest or skip), calls
`tickets.capture_history()` → appends new closures + upserts today's backlog
snapshot.
- CLI (`inc`): `--from-bucket` (drain the INC change stream), `--reseed` (ignore the
watermark; one-time bucket cutover), `--inc-csv <file>` (local dev), `--apply` (else
dry-run), `--geocode-clusters`, `--geocode-locations`, `--capture-history`.
- CLI (`crq`): `--from-bucket`, `--reseed`, `--crq-csv <file>`, `--apply` (ingest only;
geocoding + history are not on the CRQ entrypoint).
## Schema / migrations (`tracksolid_db`, applied via `run_migrations.py`)
| Migration | What |
|---|---|
| 01_tickets_schema | `tickets.inc`/`crq` (raw-jsonb-first), `geo_clusters`/`geo_locations` gazetteers, geom-resolution trigger, `reporting.fn_tickets_for_map` |
| 02_import_meta | `tickets.import_meta` (snapshot freshness) + `fn_tickets_for_map` `summary.freshness` |
| 03_inc_columns | Unpack `raw` → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via `tickets.eat_ts()`) |
| 04_inc_latlng | `latitude`/`longitude` = `COALESCE(feed, ST_Y/ST_X(geom))` (populated from geocode) |
| 05_inc_geography | `geog geography(Point,4326)` (= `geom::geography`) + GiST index for routing |
| 06_inc_mttr_minutes | `mttr` → integer **minutes**; drop constant `is_alarm`/`is_auto_created`/`is_auto_closed` |
| 07_inc_drop_service_type | drop constant `service_type` |
| 08_inc_open_sla_view | `tickets.inc_open_sla` view (open tickets + derived SLA) |
| 09_inc_dashboard_fn | **built**`reporting.fn_inc_dashboard(cluster, status, window, from, to)`: one JSON payload (open GeoJSON + windowed closed GeoJSON + metrics + freshness) for the FleetOps live INC map. See `docs/phase-2-dashboard.md` |
| 10_inc_history_capture | **built**`tickets.closure_events` (append-only observed closures) + `tickets.inc_daily_snapshot` (per-EAT-day open backlog + flow) + `tickets.capture_history()`; the ingest calls it each `--apply` run. Unlocks backlog-over-time |
| 12_inc_dashboard_by_owner | **built** — owner/team breakdown extension to `fn_inc_dashboard` |
| 15_crq_table | **built** — materializes `tickets.crq` (table + geom trigger + indexes; `01`'s crq section never ran on the live DB) + the typed STORED generated columns from `03` (reuses `tickets.eat_ts()`). Data-layer parity for the CRQ tab |
`tickets.inc` columns: `ticket_id` (PK), `raw` (jsonb, source of truth),
`normalized_status`/`raw_status`, `bucket`, `is_actionable`, `cluster`/`region`/
`location_name`, `assigned_team`/`owner`, `sla_status`, `mttr` (min),
`created_at_service`/`scheduled_at`/`closed_at`/`first_seen_at`/`last_seen_at`/
`source_created_at`/`source_updated_at` (timestamptz), `latitude`/`longitude`,
`geom`/`geog`/`geo_source`, `ingested_at`. Dropped-but-in-`raw`: `service_type`,
`is_alarm`, `is_auto_created`, `is_auto_closed`, and the ingest-time drops.
## Deployment
- **Coolify** app built from this repo's `Dockerfile` (`python:3.12-slim`,
`TZ=Africa/Nairobi`, keep-alive `tail -f /dev/null`). Separate from the FleetOps
web app (`fleet-ops-staging`).
- **Scheduled Tasks (two):** `inc_tickets` → `python -m inc.import_inc --from-bucket
--apply` and `crq_tickets``python -m crq.import_crq --from-bucket --apply`, both cron
`*/20 6-20 * * *` in **EAT** (Coolify runs tasks in EAT — no UTC conversion).
- **Env vars** (Coolify): `DATABASE_URL` (internal DB host), `RUSTFS_*`
(`isptickets` bucket — serves both inc + crq), `GEOCODER_*`.
- For a plain host/VM, `run_ingest.sh` + a crontab line is the alternative.
Full ops runbook (env management, the Forgejo → Coolify auto-deploy webhook, manual
deploys, bucket cutover, verification): **`docs/deployment-and-operations.md`**.
## State at hand-off
- `tickets.inc` ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows);
**0 alarm / 0 sentinel** (legacy rows cleaned up one-time).
- Geocoding ~**99.99%** (`geom` on all but 1 null-cluster ticket); `QOA`/`PTMP`
cluster codes mapped to Quarry Road / Pipeline.
- Read path verified: `reporting.fn_tickets_for_map()` + `tickets.inc_open_sla`.
## Data-quality caveats (must inform analytics)
- Source `sla_status` only meaningful once **closed**; open SLA must be **derived**
(`now created_at_service`, `first_seen_at` fallback; ~30% lack
`created_at_service`).
- `mttr` is **minutes**, null until closed; not wall-clock and not a 48h threshold.
- Lifecycle timestamps = `created_at_service`→`closed_at`; the `*_seen_at` / `source_*`
ones are export bookkeeping (don't use for SLA/closure-time).
- Content lag ~2 days behind wall-clock.
- **History:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR event
series work directly; **backlog-over-time** now accrues via
`tickets.inc_daily_snapshot` + `tickets.closure_events` (written by
`tickets.capture_history()` each ingest) — builds forward from the first capture.
## Roadmap
Phase 2 (built): `fn_inc_dashboard` read-API → FleetOps live map (open + closed
overlay + metrics); history capture (`closure_events` + `inc_daily_snapshot`) for
backlog/closure trends. Remaining: `dashboard_api` endpoint + FleetOps SPA (other
repos; see `docs/dashboard-api-contract.md`), FleetNow **dispatch** off `geog`,
**team closure attribution**.
**CRQ** (this milestone): the shared engine now feeds `tickets.crq` from
`automations/crq/changes/` (`crq/import_crq.py`), with the `tickets.crq` table + typed columns (migration 15) and
cross-dataset geocoding — CRQ shows on the Tickets map via `fn_tickets_for_map` (which
already unions it) and gets its own FleetOps tab. Deferred to a follow-up once
installation-lifecycle semantics are confirmed: the CRQ analogues of migrations
08/09/10 — `crq_open_sla`, `fn_crq_dashboard`, and CRQ history capture (`tickets.crq`
currently has **no** `post_apply` hook).