diff --git a/docs/implementation.md b/docs/implementation.md
new file mode 100644
index 0000000..da902b4
--- /dev/null
+++ b/docs/implementation.md
@@ -0,0 +1,83 @@
+# Implementation record — fleettickets (as built)
+
+What is actually built and deployed as of Phase-1 completion. Companion to
+`docs/phase-1-ingestion.md` (plan) and `docs/phase-2-dashboard.md` (next).
+
+## Pipeline (`import_tickets.py`)
+
+- **Source:** newest `automations/inc/.csv` in the rustfs `tickets`
+  bucket (endpoint `https://s3.rahamafresh.com`, path-style, region `us-east-1`).
+- **S3 access via boto3** (no aws-CLI dependency): `list_objects_v2` (paginator),
+  `get_object`, and `copy_object` + `delete_object` for archiving.
+- **Skip-if-unchanged:** newest S3 **ETag** vs `tickets.import_meta.metadata.source_etag`;
+  equal → skip the DB write (the export re-emits identical content most hours).
+- **Cleaning:** drop `is_alarm=true` rows and the `EXPORT STOPPED…` sentinel; drop
+  `week_start`/`week_end`, `source_s3_bucket`/`source_s3_key`/`source_snapshot_id`,
+  `department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE.
+- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete. On success,
+  **move** processed file(s) → `automations/inc/processed/`.
+- **Geocoding** (keyed LocationIQ): `--geocode-clusters` (coarse, per cluster) and
+  `--geocode-locations` (precise, actionable INC; strips network codes; 25 km
+  wrong-city guard). Results are cached in `tickets.geo_clusters` / `tickets.geo_locations`.
+- CLI: `--from-bucket` (newest INC csv), `--inc-csv ` (local dev), `--apply`
+  (else dry-run), `--geocode-clusters`, `--geocode-locations`.
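The newest-file and skip-if-unchanged steps come down to two small decisions. The following is an illustrative sketch only. It assumes the keys have already been listed (for example via boto3's `list_objects_v2` paginator), that INC filenames follow the `YYYY-MM-DDTHH-mm-ss.csv` pattern from the Phase-1 plan, and that the last ETag has been read from `tickets.import_meta`. `newest_inc_key` and `should_skip` are hypothetical names, not functions taken from `import_tickets.py`.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical helpers; only the key pattern and the ETag rule come from the docs.
# Assumed key layout: automations/inc/YYYY-MM-DDTHH-mm-ss.csv (EAT time in the name).
_INC_KEY = re.compile(r"^automations/inc/(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})\.csv$")


def newest_inc_key(keys: Iterable[str]) -> Optional[str]:
    """Return the INC key with the latest filename timestamp, or None if none match."""
    best_key: Optional[str] = None
    best_ts: Optional[datetime] = None
    for key in keys:
        match = _INC_KEY.match(key)
        if match is None:
            continue  # ignores processed/ archives, CRQ exports and stray objects
        # strptime also rejects impossible dates (e.g. month 13) with ValueError
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H-%M-%S")
        if best_ts is None or ts > best_ts:
            best_key, best_ts = key, ts
    return best_key


def should_skip(newest_etag: str, last_etag: Optional[str]) -> bool:
    """True when the newest object's ETag equals the last processed ETag."""
    if last_etag is None:
        return False  # nothing processed yet, so always ingest
    # S3 APIs return ETags wrapped in double quotes; compare without them.
    return newest_etag.strip('"') == last_etag.strip('"')


if __name__ == "__main__":
    keys = [
        "automations/inc/2026-06-15T16-00-00.csv",
        "automations/inc/2026-06-15T17-00-00.csv",
        "automations/inc/processed/2026-06-15T18-00-00.csv",
    ]
    print(newest_inc_key(keys))  # automations/inc/2026-06-15T17-00-00.csv
```

Because the timestamp format is fixed-width, plain string comparison would give the same order. Parsing with `strptime` has the added benefit of rejecting malformed names. The skip check assumes the store derives ETags from object content, as S3 does for single-part uploads. Under that assumption, a byte-identical hourly file yields the same ETag even though its key is new.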
+
+## Schema / migrations (`tracksolid_db`, applied via `run_migrations.py`)
+
+| Migration | What |
+|---|---|
+| 01_tickets_schema | `tickets.inc`/`crq` (raw-jsonb-first), `geo_clusters`/`geo_locations` gazetteers, geom-resolution trigger, `reporting.fn_tickets_for_map` |
+| 02_import_meta | `tickets.import_meta` (snapshot freshness) + `fn_tickets_for_map` `summary.freshness` |
+| 03_inc_columns | Unpack `raw` → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via `tickets.eat_ts()`) |
+| 04_inc_latlng | `latitude`/`longitude` = `COALESCE(feed, ST_Y/ST_X(geom))` (populated by geocoding) |
+| 05_inc_geography | `geog geography(Point,4326)` (= `geom::geography`) + GiST index for routing |
+| 06_inc_mttr_minutes | `mttr` → integer **minutes**; drop constant `is_alarm`/`is_auto_created`/`is_auto_closed` |
+| 07_inc_drop_service_type | drop constant `service_type` |
+| 08_inc_open_sla_view | `tickets.inc_open_sla` view (open tickets + derived SLA) |
+| 09_inc_dashboard_fn | *(planned)* `reporting.fn_inc_dashboard` — see `docs/phase-2-dashboard.md` |
+
+`tickets.inc` columns: `ticket_id` (PK), `raw` (jsonb, source of truth),
+`normalized_status`/`raw_status`, `bucket`, `is_actionable`, `cluster`/`region`/
+`location_name`, `assigned_team`/`owner`, `sla_status`, `mttr` (min),
+`created_at_service`/`scheduled_at`/`closed_at`/`first_seen_at`/`last_seen_at`/
+`source_created_at`/`source_updated_at` (timestamptz), `latitude`/`longitude`,
+`geom`/`geog`/`geo_source`, `ingested_at`. Dropped-but-in-`raw`: `service_type`,
+`is_alarm`, `is_auto_created`, `is_auto_closed`, and the ingest-time drops.
+
+## Deployment
+
+- **Coolify** app built from this repo's `Dockerfile` (`python:3.12-slim`,
+  `TZ=Africa/Nairobi`, keep-alive `tail -f /dev/null`). Separate from the FleetOps
+  web app (`fleet-ops-staging`).
+- **Scheduled Task:** `python import_tickets.py --from-bucket --apply`, cron
+  `15 7-19 * * *` in **EAT** (hourly at :15, 07:15 to 19:15; Coolify runs tasks in
+  EAT, so no UTC conversion).
+- **Env vars** (Coolify): `DATABASE_URL` (internal DB host), `RUSTFS_*`, `GEOCODER_*`.
+- For a plain host/VM, `run_ingest.sh` + a crontab line is the alternative.
+
+## State at hand-off
+
+- `tickets.inc` ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows);
+  **0 alarm / 0 sentinel** (legacy rows removed in a one-time cleanup).
+- Geocoding ~**99.99%** (`geom` on all but 1 null-cluster ticket); `QOA`/`PTMP`
+  cluster codes mapped to Quarry Road / Pipeline.
+- Read path verified: `reporting.fn_tickets_for_map()` + `tickets.inc_open_sla`.
+
+## Data-quality caveats (must inform analytics)
+
+- Source `sla_status` is only meaningful once **closed**; open SLA must be **derived**
+  (`now − created_at_service`, `first_seen_at` fallback; ~30% lack
+  `created_at_service`).
+- `mttr` is **minutes**, null until closed; not wall-clock and not a 48h threshold.
+- Lifecycle timestamps = `created_at_service`→`closed_at`; the `*_seen_at` / `source_*`
+  ones are export bookkeeping (don't use them for SLA/closure time).
+- Content lag ~2 days behind wall-clock.
+- **History gap:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR
+  *event* series work directly; **open-backlog-over-time** needs an append-only
+  history capture (not yet built).
+
+## Roadmap
+
+Phase 2: `fn_inc_dashboard` read-API → FleetOps live map (open tickets, closed
+overlay, metrics). Then FleetNow **dispatch** off `geog`, **team closure attribution**,
+and **history capture** for backlog trends. **CRQ** = separate future project reusing
+this machinery against `automations/crq/`.
diff --git a/docs/phase-1-ingestion.md b/docs/phase-1-ingestion.md
new file mode 100644
index 0000000..03f6de9
--- /dev/null
+++ b/docs/phase-1-ingestion.md
@@ -0,0 +1,98 @@
+# PRD (Phase 1) — INC hourly CSV ingestion → tracksolid_db → FleetOps Tickets map
+
+> Status: **complete and deployed** (migrations 01–08, boto3 loader, geocoding,
+> Coolify hourly `15 7-19 * * *` EAT).
+> This document is the record of the Phase-1 plan; see `README.md` and
+> `docs/implementation.md` for the as-built state.
+
+## Scope: INC only
+
+**This workflow is strictly for INC** (incident / customer-fault tickets). It
+ingests **only** `automations/inc/.csv`. CRQ (new-installation)
+exports at `automations/crq/` are **out of scope** and are not processed here; the
+field transforms below are likewise INC-only.
+
+## Context
+
+The client (Rahamafresh / Fireside) runs an n8n workflow that exports field-ops
+tickets to our S3-compatible bucket **every hour**:
+
+- `automations/inc/.csv` — **incidents / customer faults** *(in scope)*
+- `automations/crq/.csv` — new-installation requests *(out of scope)*
+
+(See `n8n-hourly-s3-full-data-exports.md`. Sample: `2026-06-15T17-00-00.csv`.)
+
+`fleettickets` owns the **downstream**: the `tickets` schema in the shared
+`tracksolid_db` (raw-jsonb-first `tickets.inc`, geocoding gazetteers, and
+`reporting.fn_tickets_for_map`, which `dashboard_api` serves to the FleetOps
+"Tickets" tab). `tickets.crq` still exists but is not fed by this pipeline.
+
+**The problem:** the loader was written for the *old* export model — JSON
+`{metadata, records}` envelopes at a stable `automations/inc/latest.json`. That
+model is gone; the new exports are **flat CSV, timestamped per hour, with no
+`latest` pointer, no envelope, and no deltas** — every hourly file is a **full
+current-state snapshot**.
+
+**Two driving objectives this pipeline feeds:**
+
+1. **SLA tracking** — contract requires tickets closed within **48h of
+   `created_at_service`**; closed tickets carry source `sla_status` + `mttr`, while
+   open tickets need a derived state (`now − created_at_service` ≥48h breached /
+   ≥36h at-risk).
+2. **Vehicle routing (most important)** — accurately geocoded open tickets so
+   FleetNow can route the nearest vehicles; later: team closure attribution.
+
+## Data contract (verified against live snapshots)
+
+- 32 columns; header row, values double-quoted.
+  INC sample = 31,434 rows.
+- `ticket_id` is the **primary key**; the same ticket recurs across snapshots as it
+  moves `open → closed`. Verified: 31,434 distinct ids per file, **0 in-file dups**,
+  same id set every hour (0 added/dropped) → **upsert is the dedup mechanism, no
+  TRUNCATE**. Consecutive files are often byte-identical → skip-if-unchanged.
+- `is_alarm=true` (~10,132 rows, all `is_actionable=false`) → **dropped**.
+- `latitude`/`longitude` are **empty** in the feed → geocoding required.
+- A garbage **sentinel row** (`ticket_id = "EXPORT STOPPED DUE TO EXCESSIVE SIZE…"`)
+  is commonly the first data line → filtered by `ticket_id` prefix.
+- Timestamps (filenames + data) are **EAT (Africa/Nairobi, UTC+3)**.
+- `bucket` is meaningful (`closed`/`pending`), distinct from `source_s3_bucket`.
+
+## Approach
+
+Keep the **raw-jsonb-first** model and everything downstream; only the loader's
+input path changes: JSON-`latest` → **newest timestamped CSV**, plus move-on-success.
+
+- **Newest file** per `automations/inc/` (parse `YYYY-MM-DDTHH-mm-ss.csv`), via
+  **boto3** (path-style; no aws-CLI dependency).
+- **Skip-if-unchanged**: compare newest S3 **ETag** to the last processed ETag
+  (`tickets.import_meta.metadata.source_etag`); equal → skip DB write.
+- **Cleaning at ingest**: drop `is_alarm=true` + sentinel; drop `week_start`,
+  `week_end`, `source_s3_bucket`, `source_s3_key`, `source_snapshot_id`,
+  `department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE;
+  keep `service_type`* and `bucket`. (*`service_type` later dropped as constant.)
+- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete → closure
+  history accumulates. On success **move** the file(s) to
+  `automations/inc/processed/`.
+- Record snapshot freshness in `tickets.import_meta`.
+- Geocoding unchanged: `--geocode-clusters` (coarse) + `--geocode-locations`
+  (precise, actionable INC; keyed LocationIQ; 25 km wrong-city guard).
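The cleaning rules can be sketched as a generator over the CSV text. This is a hedged sketch, not the loader's code. `clean_rows` is a hypothetical name, `is_alarm` is assumed to arrive as the text `true`/`false`, and the sentinel is matched on the `EXPORT STOPPED` prefix of `ticket_id`, as the data contract describes.

```python
import csv
import io
from typing import Dict, Iterator

# Columns dropped at ingest (list taken from the cleaning rules above).
DROP_COLUMNS = {
    "week_start", "week_end",
    "source_s3_bucket", "source_s3_key", "source_snapshot_id",
    "department", "source_type",
}
SENTINEL_PREFIX = "EXPORT STOPPED"  # the garbage row's ticket_id starts with this


def clean_rows(csv_text: str) -> Iterator[Dict[str, str]]:
    """Yield cleaned INC rows (hypothetical helper, not import_tickets.py code)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("ticket_id") or "").startswith(SENTINEL_PREFIX):
            continue  # sentinel row, not a ticket
        if (row.get("is_alarm") or "").strip().lower() == "true":
            continue  # assumption: is_alarm arrives as the text "true"/"false"
        cleaned = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        if cleaned.get("region"):
            cleaned["region"] = cleaned["region"].lower()
        if cleaned.get("raw_status"):
            cleaned["raw_status"] = cleaned["raw_status"].upper()
        yield cleaned
```

Because filtering happens before the upsert, alarm and sentinel rows never reach `tickets.inc`, and no later cleanup pass is needed.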
+
+## Orchestration
+
+Deployed on **Coolify** (own app, `Dockerfile`, keep-alive worker). Ingest runs as a
+**Scheduled Task**: `python import_tickets.py --from-bucket --apply`, cron
+`15 7-19 * * *` in **EAT**. Env: `DATABASE_URL`, `RUSTFS_*`, `GEOCODER_*`.
+
+## Data-quality findings (carried into Phase 2)
+
+- Source `sla_status` ≠ a plain 48h rule, and `mttr` is not wall-clock — pin the
+  contract's SLA definition before trusting cross-field SLA math.
+- `created_at_service` is null on ~30% of rows (incl. most open tickets) → needs a
+  fallback clock (`first_seen_at`).
+- Split timestamp semantics: lifecycle = `created_at_service`→`closed_at`; export
+  bookkeeping = `created_at`/`updated_at`/`first_seen_at`/`last_seen_at`.
+- `assigned_team` is missing on ~34% of rows (`owner` is better populated).
+- Content lag ~2 days (underlying `…wm_task.xlsx` source date).
+
+## Outcome (as built)
+
+Live in `tracksolid_db`: `tickets.inc` (raw + typed generated columns), geocoded to
+~99.99%, alarm/sentinel removed, hourly refresh with ETag skip + archive. See
+`docs/implementation.md`.
diff --git a/docs/phase-2-dashboard.md b/docs/phase-2-dashboard.md
new file mode 100644
index 0000000..2cb50d8
--- /dev/null
+++ b/docs/phase-2-dashboard.md
@@ -0,0 +1,141 @@
+# PRD (Phase 2) — INC operations dashboard: read-API layer
+
+> Phase 1 (hourly INC CSV ingestion → `tickets.inc`, geocoding, typed generated
+> columns, `inc_open_sla` view) is **complete and deployed** (migrations 01–08,
+> Coolify hourly `15 7-19 * * *` EAT). See `docs/phase-1-ingestion.md` /
+> `docs/implementation.md`. This document is Phase 2.
+
+## Context
+
+FleetOps needs a **live INC operations map** (modelled on FleetNow):
+
+- A map showing **all currently-open INC tickets** alongside **live vehicle
+  positions from FleetNow**.
+- A **bottom timeline bar** that overlays **closed tickets** (alongside FleetNow
+  vehicle routes) for a selected period.
+- **Bottom filters**: `cluster`, ticket `status`, and **time** = today / this week /
+  this month / custom date.
+- **Top metric cards** that react to the selected filters — **ticket** metrics
+  (not vehicle metrics).
+
+**Scope of THIS repo (confirmed): the data / read-API layer only.** `fleettickets`
+exposes parameterized SQL in `tracksolid_db` that `dashboard_api` serves to the
+**FleetOps SPA**. The map UI, timeline bar, filter controls, metric cards, and the
+**FleetNow vehicle positions/routes** are **other repos/systems**. There is no
+vehicle id in the INC feed, so we serve **tickets only**; the SPA overlays FleetNow
+vehicles/routes.
+
+## Confirmed behaviour
+
+- **Open layer (live):** all `is_actionable = true` INC tickets matching the
+  cluster/status filter — **not** time-filtered (open = needs action now).
+- **Closed overlay (windowed):** closed tickets whose `closed_at` falls in the
+  selected window, matching cluster/status.
+- **Metric cards (windowed):** computed for the current selection.
+- **Filters combine with AND**, each optional. **Windows are calendar EAT**
+  (today / ISO-week / month) or an explicit custom `[from, to)`.
+- **Delivery:** one parameterized function returning a single JSON payload
+  `{ open: GeoJSON, closed: GeoJSON, metrics: {…}, window, freshness }`, mirroring
+  the existing `reporting.fn_tickets_for_map` style.
+
+## Deliverable — `migrations/09_inc_dashboard_fn.sql`
+
+A new read function (and supporting index if needed); additive, idempotent
+(`CREATE OR REPLACE`), no change to existing objects.
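The confirmed filter semantics can be modelled in memory to pin down edge cases before writing the SQL. Those semantics are: optional cluster/status filters combined with AND, an unwindowed open layer, and a closed overlay bounded by a half-open `[from, to)` window on `closed_at`. The `Ticket` dataclass and `select_layers` function are hypothetical illustrations, not part of migration 09. Treating a non-null `closed_at` as "closed" is an assumption, and the status values in the usage block are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Sequence


@dataclass
class Ticket:
    """Hypothetical stand-in for the tickets.inc columns this sketch needs."""
    ticket_id: str
    cluster: Optional[str]
    normalized_status: Optional[str]
    is_actionable: bool
    closed_at: Optional[datetime]  # assumption: None means not yet closed


def _matches(t: Ticket, cluster: Optional[str], status: Optional[str]) -> bool:
    # Each filter is optional (None = not applied); supplied filters combine with AND.
    return (cluster is None or t.cluster == cluster) and (
        status is None or t.normalized_status == status
    )


def select_layers(
    tickets: Sequence[Ticket],
    cluster: Optional[str],
    status: Optional[str],
    win_from: datetime,
    win_to: datetime,
) -> Dict[str, object]:
    # Open layer is live: the time window is deliberately not applied.
    open_layer: List[Ticket] = [
        t for t in tickets if t.is_actionable and _matches(t, cluster, status)
    ]
    # Closed overlay: closed_at inside the half-open window [win_from, win_to).
    closed_layer: List[Ticket] = [
        t
        for t in tickets
        if t.closed_at is not None
        and win_from <= t.closed_at < win_to
        and _matches(t, cluster, status)
    ]
    return {
        "open": open_layer,
        "closed": closed_layer,
        "metrics": {"open_now": len(open_layer), "closed_in_window": len(closed_layer)},
    }


if __name__ == "__main__":
    eat = timezone(timedelta(hours=3))
    day = datetime(2026, 6, 15, tzinfo=eat)
    sample = [  # illustrative values only
        Ticket("A", "east", "pending", True, None),
        Ticket("B", "west", "pending", True, None),
        Ticket("C", "east", "closed", False, day + timedelta(hours=10)),
        Ticket("D", "east", "closed", False, day + timedelta(days=1)),  # next window
    ]
    print(select_layers(sample, "east", None, day, day + timedelta(days=1))["metrics"])
    # {'open_now': 1, 'closed_in_window': 1}
```

With a half-open window, a ticket closed exactly at midnight EAT lands in the next day's window and is never counted in both.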
+
+### `reporting.fn_inc_dashboard(...)`
+
+```sql
+reporting.fn_inc_dashboard(
+  p_cluster text        DEFAULT NULL,     -- exact cluster (matches tickets.inc.cluster)
+  p_status  text        DEFAULT NULL,     -- normalized_status
+  p_window  text        DEFAULT 'today',  -- 'today' | 'week' | 'month' | 'custom'
+  p_from    timestamptz DEFAULT NULL,     -- custom window start (inclusive)
+  p_to      timestamptz DEFAULT NULL      -- custom window end (exclusive)
+) RETURNS jsonb
+```
+
+- **Window resolution:** if `p_from`/`p_to` are given → use them (custom). Else compute
+  **EAT calendar bounds** from `p_window`, with `now_eat` the current EAT wall-clock
+  time: `today` = `[date_trunc('day', now_eat), +1 day)`, `week` =
+  `[date_trunc('week', …), +1 week)` (ISO week, Monday start), `month` =
+  `[date_trunc('month', …), +1 month)`; the local bounds are converted back to
+  `timestamptz` via `… AT TIME ZONE 'Africa/Nairobi'`.
+- **Returned JSON:**
+  ```jsonc
+  {
+    "window":  { "from": "...", "to": "...", "preset": "today" },
+    "open":    { "type": "FeatureCollection", "features": [ … ] },  // all open, filtered by cluster/status
+    "closed":  { "type": "FeatureCollection", "features": [ … ] },  // closed_at in window, filtered
+    "metrics": {
+      "open_now": int,
+      "closed_in_window": int,
+      "sla": {
+        "open":   { "breached": int, "at_risk": int, "ok": int, "unknown": int },
+        "closed": { "compliant": int, "breached": int }
+      },
+      "by_status":  { "": int, … },
+      "by_cluster": { "": int, … },
+      "closure_rate": { "per_day_avg": num, "series": [ { "day": "YYYY-MM-DD", "count": int }, … ] },
+      "avg_mttr_min": num
+    },
+    "freshness": { … }  // from tickets.import_meta
+  }
+  ```
+- **Feature properties** (both layers): `ticket_id, normalized_status, cluster,
+  region, location_name, assigned_team, owner, geo_source`. Open adds `sla_state,
+  hours_open`; closed adds `closed_at, mttr, sla_status`. Geometry from `geom`
+  (`ST_AsGeoJSON`). Only `geom IS NOT NULL` rows become features; `metrics` count the
+  full filtered set (note the small geocoding gap).
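A Python model of the window-resolution rule can serve as a reference when checking the SQL's EAT calendar bounds (verification step 3). This is a sketch, not the function body. `resolve_window` is a hypothetical name, and it assumes a custom window requires both `p_from` and `p_to`. It also uses a fixed UTC+3 offset for EAT (Kenya observes no daylight saving), whereas the SQL uses `AT TIME ZONE 'Africa/Nairobi'`.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional, Tuple

# Kenya observes no daylight saving, so a fixed UTC+3 offset stands in for
# 'Africa/Nairobi' here; the SQL itself uses AT TIME ZONE 'Africa/Nairobi'.
EAT = timezone(timedelta(hours=3))


def resolve_window(
    preset: str = "today",
    p_from: Optional[datetime] = None,
    p_to: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime, str]:
    """Return (from, to, preset) for a half-open window. `now` must be tz-aware."""
    if p_from is not None and p_to is not None:
        return p_from, p_to, "custom"  # explicit bounds take precedence
    if preset == "custom":
        raise ValueError("a custom window needs both p_from and p_to")  # assumption
    current = now if now is not None else datetime.now(timezone.utc)
    today = current.astimezone(EAT).date()  # EAT calendar date, not the UTC date
    if preset == "today":
        start, end = today, today + timedelta(days=1)
    elif preset == "week":
        start = today - timedelta(days=today.weekday())  # back to Monday
        end = start + timedelta(days=7)
    elif preset == "month":
        start = today.replace(day=1)
        if start.month == 12:
            end = start.replace(year=start.year + 1, month=1)
        else:
            end = start.replace(month=start.month + 1)
    else:
        raise ValueError(f"unknown window preset: {preset!r}")
    # Local EAT midnights become timezone-aware datetimes (timestamptz equivalents).
    return (
        datetime.combine(start, time(), tzinfo=EAT),
        datetime.combine(end, time(), tzinfo=EAT),
        preset,
    )
```

Python's `weekday()` counts Monday as 0. This matches PostgreSQL's `date_trunc('week', …)`, which truncates to the Monday of the ISO week. Deriving the date after `astimezone(EAT)` matters between 21:00 and 24:00 UTC, when the EAT calendar day is already one ahead of the UTC day.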
+
+### Reuse (don't reinvent)
+
+- **`tickets.inc_open_sla`** (migration 08) — `sla_state` / `hours_open` for the open
+  layer + open-SLA metrics.
+- **Typed generated columns** (migrations 03–07): `cluster`, `normalized_status`,
+  `closed_at`, `mttr` (minutes), `assigned_team`, `geom`, `geo_source`.
+- **`reporting.fn_tickets_for_map`** (migrations 01–02) — GeoJSON
+  (`jsonb_build_object`/`ST_AsGeoJSON`) + `summary.freshness` patterns.
+- **Derived SLA logic** — `now() − COALESCE(created_at_service, first_seen_at)` vs
+  48h/36h.
+
+### Indexes
+
+In place: `ix_inc_closed_at`, `ix_inc_cluster_col`, `ix_inc_norm_status_col`,
+`ix_inc_actionable_col`, `ix_inc_geom`, `ix_inc_geog`. Add composite
+`(closed_at, cluster)` only if EXPLAIN shows it's needed.
+
+### Grants
+
+`GRANT EXECUTE ON FUNCTION reporting.fn_inc_dashboard(text, text, text, timestamptz, timestamptz) TO dashboard_ro` (guarded).
+
+## Dependencies (other repos)
+
+- **`dashboard_api`** — endpoint e.g. `GET /webhook/inc-dashboard?cluster=&status=&window=&from=&to=`
+  calling `fn_inc_dashboard`. *(Contract here; impl there.)*
+- **FleetOps SPA** (`fleetops`) — map, timeline bar, filter UI, metric cards;
+  overlays FleetNow vehicles/routes.
+- **FleetNow** — live vehicle positions + historical routes.
+
+## Data-quality caveats (affect metrics, not delivery)
+
+- Source `sla_status` is only meaningful for **closed** tickets; open SLA is derived.
+- `created_at_service` is null on ~30% of rows → some open tickets are SLA `unknown`
+  (fallback flagged).
+- `mttr` is **minutes**, null until closed; closure/MTTR metrics filter accordingly.
+- Content lag ~2 days → recent days under-count.
+- A few tickets lack `geom` → counted in metrics, absent from map features.
+
+## Verification
+
+1. `SELECT reporting.fn_inc_dashboard();` → valid JSON (open/closed FeatureCollections,
+   metrics, window=today, freshness).
+2. Filters: `p_cluster`, `p_status`, `p_window := 'month'`, and a custom `p_from/p_to`
+   — counts match ad-hoc `SELECT`s on `tickets.inc` / `tickets.inc_open_sla`.
+3.
+   Window math: today/week/month are correct **EAT** calendar ranges.
+4. SLA metrics match the `inc_open_sla` distribution / source `sla_status` in window.
+5. `EXPLAIN ANALYZE` on the windowed closed query uses `ix_inc_closed_at`.
+6. Apply via `run_migrations.py`; ledgered in `tickets.schema_migrations`.
+
+## Out of scope (future)
+
+- **Open-backlog-over-time** / observed open→closed transitions need the append-only
+  history capture (`tickets.closure_events` + daily snapshot) — separate plan.
+- **Dispatch surface** (nearest-vehicle off `geog`) — after analytics.
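For verification step 4, an independent reimplementation of the derived-SLA rule gives something to compare the `inc_open_sla` distribution against. The rule is `now − COALESCE(created_at_service, first_seen_at)` checked against 48h/36h. `derive_open_sla` is a hypothetical helper. Two behaviours are assumptions about `tickets.inc_open_sla`, not documented facts: returning `unknown` only when both timestamps are null, and reporting fallback use as a separate flag.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

BREACH_HOURS = 48.0   # contract: close within 48h of created_at_service
AT_RISK_HOURS = 36.0  # early-warning threshold from the PRD


def derive_open_sla(
    created_at_service: Optional[datetime],
    first_seen_at: Optional[datetime],
    now: datetime,
) -> Tuple[str, Optional[float], bool]:
    """Return (sla_state, hours_open, used_fallback) for an OPEN ticket."""
    # COALESCE(created_at_service, first_seen_at), remembering which clock was used.
    if created_at_service is not None:
        clock, used_fallback = created_at_service, False
    elif first_seen_at is not None:
        clock, used_fallback = first_seen_at, True
    else:
        return "unknown", None, False  # assumption: no clock at all -> unknown
    hours_open = (now - clock).total_seconds() / 3600
    if hours_open >= BREACH_HOURS:
        state = "breached"
    elif hours_open >= AT_RISK_HOURS:
        state = "at_risk"
    else:
        state = "ok"
    return state, hours_open, used_fallback


if __name__ == "__main__":
    now = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)
    print(derive_open_sla(None, now - timedelta(hours=40), now))  # ('at_risk', 40.0, True)
```

Bucketing a sample of open tickets with this helper and comparing the counts per state with `SELECT sla_state, count(*) FROM tickets.inc_open_sla GROUP BY 1` is one way to catch boundary mistakes, such as `>` versus `>=` at exactly 36h or 48h.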