docs: add docs/ — phase-1/phase-2 PRDs + implementation record

- docs/phase-1-ingestion.md  — Phase 1 PRD (INC hourly CSV ingestion; deployed)
- docs/phase-2-dashboard.md  — Phase 2 PRD (inc_dashboard read-API for FleetOps map)
- docs/implementation.md     — as-built record (pipeline, migrations 01-08, deploy, DQ)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
david kiania 2026-06-16 01:05:18 +03:00
parent e17553ccbf
commit f2408f113e
3 changed files with 322 additions and 0 deletions

docs/implementation.md

@ -0,0 +1,83 @@
# Implementation record — fleettickets (as built)
What is actually built and deployed, as of the Phase-1 completion. Companion to
`docs/phase-1-ingestion.md` (plan) and `docs/phase-2-dashboard.md` (next).
## Pipeline (`import_tickets.py`)
- **Source:** newest `automations/inc/<EAT-timestamp>.csv` in the rustfs `tickets`
bucket (endpoint `https://s3.rahamafresh.com`, path-style, region `us-east-1`).
- **S3 access via boto3** (no aws-CLI dependency): `list_objects_v2` (paginator),
`get_object`, `copy_object` + `delete_object` for archiving.
- **Skip-if-unchanged:** newest S3 **ETag** vs `tickets.import_meta.metadata.source_etag`;
equal → skip the DB write (the export re-emits identical content most hours).
- **Cleaning:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
`week_start`/`week_end`, `source_s3_bucket`/`source_s3_key`/`source_snapshot_id`,
`department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE.
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete. On success,
**move** processed file(s) → `automations/inc/processed/`.
- **Geocoding** (keyed LocationIQ): `--geocode-clusters` (coarse, per cluster) and
`--geocode-locations` (precise, actionable INC; strips network codes; 25 km
wrong-city guard). Results cache in `tickets.geo_clusters` / `tickets.geo_locations`.
- CLI: `--from-bucket` (newest INC csv), `--inc-csv <file>` (local dev), `--apply`
(else dry-run), `--geocode-clusters`, `--geocode-locations`.
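The cleaning rule in the list above can be sketched in plain Python. This is an illustrative sketch only, not the code in `import_tickets.py`: the function name `clean_rows`, the constant names, and the exact sentinel prefix match are assumptions, while the dropped columns and the two normalizations are taken from the bullet above.

```python
import csv
import io

# Columns dropped at ingest, as listed in the cleaning rule above.
DROPPED_COLUMNS = {
    "week_start", "week_end",
    "source_s3_bucket", "source_s3_key", "source_snapshot_id",
    "department", "source_type",
}
# Assumption: the sentinel row is recognised by this ticket_id prefix.
SENTINEL_PREFIX = "EXPORT STOPPED"


def clean_rows(csv_text: str) -> list[dict]:
    """Return rows with alarms, the sentinel, and dropped columns removed."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ticket_id = (row.get("ticket_id") or "").strip()
        if ticket_id.startswith(SENTINEL_PREFIX):
            continue  # garbage sentinel line, never a real ticket
        if (row.get("is_alarm") or "").strip().lower() == "true":
            continue  # alarm rows are not ingested
        kept = {k: v for k, v in row.items() if k not in DROPPED_COLUMNS}
        if "region" in kept:
            kept["region"] = (kept["region"] or "").strip().lower()
        if "raw_status" in kept:
            kept["raw_status"] = (kept["raw_status"] or "").strip().upper()
        cleaned.append(kept)
    return cleaned
```

Keeping the filter as a pure function over CSV text makes it easy to exercise in dry-run mode before any database write.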
## Schema / migrations (`tracksolid_db`, applied via `run_migrations.py`)
| Migration | What |
|---|---|
| 01_tickets_schema | `tickets.inc`/`crq` (raw-jsonb-first), `geo_clusters`/`geo_locations` gazetteers, geom-resolution trigger, `reporting.fn_tickets_for_map` |
| 02_import_meta | `tickets.import_meta` (snapshot freshness) + `fn_tickets_for_map` `summary.freshness` |
| 03_inc_columns | Unpack `raw` → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via `tickets.eat_ts()`) |
| 04_inc_latlng | `latitude`/`longitude` = `COALESCE(feed, ST_Y/ST_X(geom))` (populated from geocode) |
| 05_inc_geography | `geog geography(Point,4326)` (= `geom::geography`) + GiST index for routing |
| 06_inc_mttr_minutes | `mttr` → integer **minutes**; drop constant `is_alarm`/`is_auto_created`/`is_auto_closed` |
| 07_inc_drop_service_type | drop constant `service_type` |
| 08_inc_open_sla_view | `tickets.inc_open_sla` view (open tickets + derived SLA) |
| 09_inc_dashboard_fn | *(planned)* `reporting.fn_inc_dashboard` — see `docs/phase-2-dashboard.md` |
`tickets.inc` columns: `ticket_id` (PK), `raw` (jsonb, source of truth),
`normalized_status`/`raw_status`, `bucket`, `is_actionable`, `cluster`/`region`/
`location_name`, `assigned_team`/`owner`, `sla_status`, `mttr` (min),
`created_at_service`/`scheduled_at`/`closed_at`/`first_seen_at`/`last_seen_at`/
`source_created_at`/`source_updated_at` (timestamptz), `latitude`/`longitude`,
`geom`/`geog`/`geo_source`, `ingested_at`. Dropped-but-in-`raw`: `service_type`,
`is_alarm`, `is_auto_created`, `is_auto_closed`, and the ingest-time drops.
## Deployment
- **Coolify** app built from this repo's `Dockerfile` (`python:3.12-slim`,
`TZ=Africa/Nairobi`, keep-alive `tail -f /dev/null`). Separate from the FleetOps
web app (`fleet-ops-staging`).
- **Scheduled Task:** `python import_tickets.py --from-bucket --apply`, cron
`15 7-19 * * *` in **EAT** (Coolify runs tasks in EAT — no UTC conversion).
- **Env vars** (Coolify): `DATABASE_URL` (internal DB host), `RUSTFS_*`, `GEOCODER_*`.
- For a plain host/VM, `run_ingest.sh` + a crontab line is the alternative.
## State at hand-off
- `tickets.inc` ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows);
**0 alarm / 0 sentinel** (legacy rows cleaned up one-time).
- Geocoding ~**99.99%** (`geom` on all but 1 null-cluster ticket); `QOA`/`PTMP`
cluster codes mapped to Quarry Road / Pipeline.
- Read path verified: `reporting.fn_tickets_for_map()` + `tickets.inc_open_sla`.
## Data-quality caveats (must inform analytics)
- Source `sla_status` is only meaningful once a ticket is **closed**; open SLA must be **derived**
(`now - created_at_service`, with `first_seen_at` as fallback; ~30% of rows lack
`created_at_service`).
- `mttr` is **minutes**, null until closed; not wall-clock and not a 48h threshold.
- Lifecycle timestamps = `created_at_service`→`closed_at`; the `*_seen_at` / `source_*`
ones are export bookkeeping (don't use for SLA/closure-time).
- Content lag ~2 days behind wall-clock.
- **History gap:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR
*event* series work directly; **open-backlog-over-time** needs an append-only
history capture (not yet built).
## Roadmap
Phase 2: `fn_inc_dashboard` read-API → FleetOps live map (open + closed overlay +
metrics). Then FleetNow **dispatch** off `geog`, **team closure attribution**, and
**history capture** for backlog trends. **CRQ** = separate future project reusing
this machinery against `automations/crq/`.

docs/phase-1-ingestion.md

@ -0,0 +1,98 @@
# PRD (Phase 1) — INC hourly CSV ingestion → tracksolid_db → FleetOps Tickets map
> Status: **complete and deployed** (migrations 01-08, boto3 loader, geocoding,
> Coolify hourly `15 7-19 * * *` EAT). This document is the record of the Phase-1
> plan; see `README.md` and `docs/implementation.md` for the as-built state.
## Scope: INC only
**This workflow is strictly for INC** (incident / customer-fault tickets). It
ingests **only** `automations/inc/<EAT-timestamp>.csv`. CRQ (new-installation)
exports at `automations/crq/` are **out of scope** and are not processed here; the
field transforms below are likewise INC-only.
## Context
The client (Rahamafresh / Fireside) runs an n8n workflow that exports field-ops
tickets to our S3-compatible bucket **every hour**:
- `automations/inc/<EAT-timestamp>.csv` — **incidents / customer faults** *(in scope)*
- `automations/crq/<EAT-timestamp>.csv` — new-installation requests *(out of scope)*
(See `n8n-hourly-s3-full-data-exports.md`. Sample: `2026-06-15T17-00-00.csv`.)
`fleettickets` owns the **downstream**: the `tickets` schema in the shared
`tracksolid_db` (raw-jsonb-first `tickets.inc`, geocoding gazetteers, and
`reporting.fn_tickets_for_map`, which `dashboard_api` serves to the FleetOps
"Tickets" tab). `tickets.crq` keeps existing but is not fed by this pipeline.
**The problem:** the loader was written for the *old* export model — JSON
`{metadata, records}` envelopes at a stable `automations/inc/latest.json`. That
model is gone; the new exports are **flat CSV, timestamped per hour, with no
`latest` pointer, no envelope, and no deltas** — every hourly file is a **full
current-state snapshot**.
**Two driving objectives this pipeline feeds:**
1. **SLA tracking** — contract requires tickets closed within **48h of
`created_at_service`**; closed carry source `sla_status` + `mttr`, open need a
derived state (`now - created_at_service`: ≥48h breached / ≥36h at-risk).
2. **Vehicle routing (most important)** — accurately geocoded open tickets so
FleetNow can route nearest vehicles; subsequent: team closure attribution.
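The derived open-ticket SLA state from objective 1 can be sketched as a small pure function. This is a hedged illustration: the name `open_sla_state`, the state labels, and the choice to return `unknown` only when neither clock is available are assumptions; the 48h and 36h thresholds and the `first_seen_at` fallback come from this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BREACH_HOURS = 48   # contract threshold stated above
AT_RISK_HOURS = 36  # early-warning threshold stated above


def open_sla_state(
    now: datetime,
    created_at_service: Optional[datetime],
    first_seen_at: Optional[datetime],
) -> str:
    """Classify an open ticket as breached / at_risk / ok / unknown."""
    # Prefer the lifecycle clock; fall back to the first time the export saw it.
    clock_start = created_at_service or first_seen_at
    if clock_start is None:
        return "unknown"  # assumption: no usable clock at all
    hours_open = (now - clock_start).total_seconds() / 3600
    if hours_open >= BREACH_HOURS:
        return "breached"
    if hours_open >= AT_RISK_HOURS:
        return "at_risk"
    return "ok"


if __name__ == "__main__":
    now = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)
    print(open_sla_state(now, now - timedelta(hours=40), None))
```

All three timestamps are expected to be timezone-aware so the subtraction is unambiguous.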
## Data contract (verified against live snapshots)
- 32 columns; header + double-quoted values. INC sample = 31,434 rows.
- `ticket_id` is the **primary key**; the same ticket recurs across snapshots as it
moves `open → closed`. Verified: 31,434 distinct ids per file, **0 in-file dups**,
same id set every hour (0 added/dropped) → **upsert is the dedup mechanism, no
TRUNCATE**. Consecutive files are often byte-identical → skip-if-unchanged.
- `is_alarm=true` (~10,132 rows, all `is_actionable=false`) → **dropped**.
- `latitude`/`longitude` are **empty** in the feed → geocoding required.
- A garbage **sentinel row** (`ticket_id = "EXPORT STOPPED DUE TO EXCESSIVE SIZE…"`)
is commonly the first data line → filtered by `ticket_id` prefix.
- Timestamps (filenames + data) are **EAT (Africa/Nairobi, UTC+3)**.
- `bucket` is meaningful (`closed`/`pending`), distinct from `source_s3_bucket`.
## Approach
Keep the **raw-jsonb-first** model and everything downstream; only the loader's
input path changes: JSON-`latest` → **newest timestamped CSV**, plus move-on-success.
- **Newest file** per `automations/inc/` (parse `YYYY-MM-DDTHH-mm-ss.csv`), via
**boto3** (path-style; no aws-CLI dependency).
- **Skip-if-unchanged**: compare newest S3 **ETag** to the last processed ETag
(`tickets.import_meta.metadata.source_etag`); equal → skip DB write.
- **Cleaning at ingest**: drop `is_alarm=true` + sentinel; drop `week_start`,
`week_end`, `source_s3_bucket`, `source_s3_key`, `source_snapshot_id`,
`department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE;
keep `service_type`* and `bucket`. (*`service_type` later dropped as constant.)
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete → closure
history accumulates. On success **move** the file(s) to
`automations/inc/processed/`.
- Record snapshot freshness in `tickets.import_meta`.
- Geocoding unchanged: `--geocode-clusters` (coarse) + `--geocode-locations`
(precise, actionable INC; keyed LocationIQ; 25 km wrong-city guard).
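The newest-file and skip-if-unchanged steps above can be sketched without touching S3, given a list of object keys and the two ETags. This is a minimal sketch under stated assumptions: `newest_inc_key`, `should_skip`, and `KEY_PATTERN` are hypothetical names, and the real loader obtains keys and ETags through boto3 calls that are not shown here.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches only top-level hourly exports, e.g. automations/inc/2026-06-15T17-00-00.csv.
# Keys under automations/inc/processed/ do not match and are therefore ignored.
KEY_PATTERN = re.compile(
    r"^automations/inc/(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})\.csv$"
)


def newest_inc_key(keys: Iterable[str]) -> Optional[str]:
    """Return the key with the latest filename timestamp, or None if none match."""
    stamped = []
    for key in keys:
        match = KEY_PATTERN.match(key)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H-%M-%S")
            stamped.append((stamp, key))
    return max(stamped)[1] if stamped else None


def should_skip(newest_etag: str, last_etag: Optional[str]) -> bool:
    """True when the newest object has the same ETag as the last processed one."""
    if last_etag is None:
        return False  # nothing processed yet, so always ingest
    # S3 responses wrap ETags in double quotes; compare without them.
    return newest_etag.strip('"') == last_etag.strip('"')
```

Comparing parsed timestamps rather than relying on listing order keeps the selection correct even if the listing is paginated or unsorted.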
## Orchestration
Deployed on **Coolify** (own app, `Dockerfile`, keep-alive worker). Ingest runs as a
**Scheduled Task**: `python import_tickets.py --from-bucket --apply`, cron
`15 7-19 * * *` in **EAT**. Env: `DATABASE_URL`, `RUSTFS_*`, `GEOCODER_*`.
## Data-quality findings (carried into Phase 2)
- Source `sla_status` ≠ a plain 48h rule, and `mttr` is not wall-clock — pin the
contract's SLA definition before trusting cross-field SLA math.
- `created_at_service` is null on ~30% of rows (incl. most open) → needs a fallback
clock (`first_seen_at`).
- Split timestamp semantics: lifecycle = `created_at_service`→`closed_at`; export
bookkeeping = `created_at`/`updated_at`/`first_seen_at`/`last_seen_at`.
- `assigned_team` is missing on ~34% of rows (`owner` is the better field).
- Content lag ~2 days (underlying `…wm_task.xlsx` source date).
## Outcome (as built)
Live in `tracksolid_db`: `tickets.inc` (raw + typed generated columns), geocoded to
~99.99%, alarm/sentinel removed, hourly refresh with ETag skip + archive. See
`docs/implementation.md`.

docs/phase-2-dashboard.md

@ -0,0 +1,141 @@
# PRD (Phase 2) — INC operations dashboard: read-API layer
> Phase 1 (hourly INC CSV ingestion → `tickets.inc`, geocoding, typed generated
> columns, `inc_open_sla` view) is **complete and deployed** (migrations 01-08,
> Coolify hourly `15 7-19 * * *` EAT). See `docs/phase-1-ingestion.md` /
> `docs/implementation.md`. This document is Phase 2.
## Context
FleetOps needs a **live INC operations map** (modelled on FleetNow):
- A map showing **all currently-open INC tickets** alongside **live vehicle
positions from FleetNow**.
- A **bottom timeline bar** that overlays **closed tickets** (alongside FleetNow
vehicle routes) for a selected period.
- **Bottom filters**: `cluster`, ticket `status`, and **time** = today / this week /
this month / custom date.
- **Top metric cards** that react to the selected filters — **ticket** metrics
(not vehicle metrics).
**Scope of THIS repo (confirmed): the data / read-API layer only.** `fleettickets`
exposes parameterized SQL in `tracksolid_db` that `dashboard_api` serves to the
**FleetOps SPA**. The map UI, timeline bar, filter controls, metric cards, and the
**FleetNow vehicle positions/routes** are **other repos/systems**. There is no
vehicle id in the INC feed, so we serve **tickets only**; the SPA overlays FleetNow
vehicles/routes.
## Confirmed behaviour
- **Open layer (live):** all `is_actionable = true` INC tickets matching the
cluster/status filter — **not** time-filtered (open = needs action now).
- **Closed overlay (windowed):** closed tickets whose `closed_at` falls in the
selected window, matching cluster/status.
- **Metric cards (windowed):** computed for the current selection.
- **Filters combine with AND**, each optional. **Windows are calendar EAT**
(today / ISO-week / month) or an explicit custom `[from, to)`.
- **Delivery:** one parameterized function returning a single JSON payload
`{ open: GeoJSON, closed: GeoJSON, metrics: {…}, window, freshness }`, mirroring
the existing `reporting.fn_tickets_for_map` style.
## Deliverable — `migrations/09_inc_dashboard_fn.sql`
A new read function (and supporting index if needed); additive, idempotent
(`CREATE OR REPLACE`), no change to existing objects.
### `reporting.fn_inc_dashboard(...)`
```
reporting.fn_inc_dashboard(
    p_cluster text        DEFAULT NULL,     -- exact cluster (matches tickets.inc.cluster)
    p_status  text        DEFAULT NULL,     -- normalized_status
    p_window  text        DEFAULT 'today',  -- 'today' | 'week' | 'month' | 'custom'
    p_from    timestamptz DEFAULT NULL,     -- custom window start (inclusive)
    p_to      timestamptz DEFAULT NULL      -- custom window end (exclusive)
) RETURNS jsonb
```
- **Window resolution:** if `p_from`/`p_to` given → use them (custom). Else compute
**EAT calendar bounds** from `p_window`: `today` = `[date_trunc('day', now_eat),
+1 day)`, `week` = `date_trunc('week', …)`, `month` = `date_trunc('month', …)`
converted back to `timestamptz` via `… AT TIME ZONE 'Africa/Nairobi'`.
- **Returned JSON:**
```jsonc
{
  "window":  { "from": "...", "to": "...", "preset": "today" },
  "open":    { "type":"FeatureCollection", "features":[ … ] },   // all open, filtered by cluster/status
  "closed":  { "type":"FeatureCollection", "features":[ … ] },   // closed_at in window, filtered
  "metrics": {
    "open_now": int,
    "closed_in_window": int,
    "sla": {
      "open":   { "breached": int, "at_risk": int, "ok": int, "unknown": int },
      "closed": { "compliant": int, "breached": int }
    },
    "by_status":  { "<status>": int, … },
    "by_cluster": { "<cluster>": int, … },
    "closure_rate": { "per_day_avg": num, "series": [ { "day":"YYYY-MM-DD", "count":int }, … ] },
    "avg_mttr_min": num
  },
  "freshness": { … }   // from tickets.import_meta
}
```
- **Feature properties** (both layers): `ticket_id, normalized_status, cluster,
region, location_name, assigned_team, owner, geo_source`. Open adds `sla_state,
hours_open`; closed adds `closed_at, mttr, sla_status`. Geometry from `geom`
(`ST_AsGeoJSON`). Only `geom IS NOT NULL` rows become features; `metrics` count the
full filtered set (note the small geocoding gap).
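The window-resolution rule above can be mirrored outside the database, for example to cross-check the `window` object the function returns. The sketch below is an assumption-laden illustration, not the SQL in migration 09: `resolve_window` and `EAT` are hypothetical names, a fixed UTC+3 offset stands in for `Africa/Nairobi` (which observes no DST), and the exclusive end bounds for `week` and `month` are assumed to be the start of the next week or month, since the text only spells out `+1 day` for `today`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: fixed offset is sufficient because EAT has no daylight saving.
EAT = timezone(timedelta(hours=3), "EAT")


def resolve_window(
    preset: str,
    now: datetime,
    p_from: Optional[datetime] = None,
    p_to: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return the half-open [from, to) window as timezone-aware datetimes."""
    if p_from is not None and p_to is not None:
        return p_from, p_to  # explicit custom window wins
    local = now.astimezone(EAT)
    day_start = local.replace(hour=0, minute=0, second=0, microsecond=0)
    if preset == "today":
        return day_start, day_start + timedelta(days=1)
    if preset == "week":
        # Monday start, matching PostgreSQL date_trunc('week', ...).
        start = day_start - timedelta(days=day_start.weekday())
        return start, start + timedelta(days=7)
    if preset == "month":
        start = day_start.replace(day=1)
        # Jump safely into the next month, then snap to its first day.
        return start, (start + timedelta(days=32)).replace(day=1)
    raise ValueError(f"unsupported window preset: {preset!r}")
```

Note that a UTC evening timestamp can already belong to the next EAT calendar day, which is why the conversion happens before truncation.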
### Reuse (don't reinvent)
- **`tickets.inc_open_sla`** (migration 08) — `sla_state` / `hours_open` for the open
layer + open-SLA metrics.
- **Typed generated columns** (migrations 03-07): `cluster`, `normalized_status`,
`closed_at`, `mttr` (minutes), `assigned_team`, `geom`, `geo_source`.
- **`reporting.fn_tickets_for_map`** (migrations 01-02) — GeoJSON
`jsonb_build_object`/`ST_AsGeoJSON` + `summary.freshness` patterns.
- **Derived SLA logic:** `now() - COALESCE(created_at_service, first_seen_at)` vs
48h/36h.
### Indexes
In place: `ix_inc_closed_at`, `ix_inc_cluster_col`, `ix_inc_norm_status_col`,
`ix_inc_actionable_col`, `ix_inc_geom`, `ix_inc_geog`. Add composite
`(closed_at, cluster)` only if EXPLAIN shows it's needed.
### Grants
`GRANT EXECUTE ON FUNCTION reporting.fn_inc_dashboard(...) TO dashboard_ro` (guarded).
## Dependencies (other repos)
- **`dashboard_api`** — endpoint e.g. `GET /webhook/inc-dashboard?cluster=&status=&window=&from=&to=`
calling `fn_inc_dashboard`. *(Contract here; impl there.)*
- **FleetOps SPA** (`fleetops`) — map, timeline bar, filter UI, metric cards;
overlays FleetNow vehicles/routes.
- **FleetNow** — live vehicle positions + historical routes.
## Data-quality caveats (affect metrics, not delivery)
- Source `sla_status` only meaningful for **closed**; open SLA is derived.
- `created_at_service` null on ~30% → some open are SLA `unknown` (fallback flagged).
- `mttr` is **minutes**, null until closed; closure/MTTR metrics filter accordingly.
- Content lag ~2 days → recent days under-count.
- A few tickets lack `geom` → counted in metrics, absent from map features.
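How the null-`mttr` caveat plays into the closure metrics can be illustrated with a small aggregation. This is a sketch under assumptions, not the planned SQL: `closure_metrics` is a hypothetical name, closures are bucketed by EAT calendar day, and `per_day_avg` is assumed to average over days that have at least one closure (the PRD does not pin this down).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

# Assumption: fixed UTC+3 offset stands in for Africa/Nairobi (no DST).
EAT = timezone(timedelta(hours=3), "EAT")


def closure_metrics(
    closed_rows: Iterable[Tuple[datetime, Optional[int]]],
) -> dict:
    """closed_rows yields (closed_at, mttr_minutes or None) for the window."""
    per_day = Counter()
    mttrs = []
    for closed_at, mttr in closed_rows:
        per_day[closed_at.astimezone(EAT).date().isoformat()] += 1
        if mttr is not None:
            mttrs.append(mttr)  # null MTTR is excluded, not counted as zero
    series = [{"day": day, "count": n} for day, n in sorted(per_day.items())]
    return {
        "closure_rate": {
            "per_day_avg": sum(per_day.values()) / len(per_day) if per_day else 0.0,
            "series": series,
        },
        "avg_mttr_min": sum(mttrs) / len(mttrs) if mttrs else None,
    }
```

Excluding null `mttr` values from the average, rather than treating them as zero, avoids understating repair time.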
## Verification
1. `SELECT reporting.fn_inc_dashboard();` → valid JSON (open/closed FCs, metrics,
window=today, freshness).
2. Filters: `p_cluster`, `p_status`, `p_window := 'month'`, and a custom `p_from/p_to`
— counts match ad-hoc `SELECT`s on `tickets.inc` / `tickets.inc_open_sla`.
3. Window math: today/week/month are correct **EAT** calendar ranges.
4. SLA metrics match the `inc_open_sla` distribution / source `sla_status` in window.
5. `EXPLAIN ANALYZE` on the windowed closed query uses `ix_inc_closed_at`.
6. Apply via `run_migrations.py`; ledgered in `tickets.schema_migrations`.
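Step 1 of the checklist (a valid payload shape) could be automated with a small structural check on the decoded JSON. The sketch below is hypothetical: `check_payload` and the two key tuples are illustrative names, and it only checks the keys named in the returned-JSON contract in this document, not the values.

```python
# Keys taken from the returned-JSON contract in this PRD.
TOP_LEVEL_KEYS = ("window", "open", "closed", "metrics", "freshness")
METRIC_KEYS = (
    "open_now", "closed_in_window", "sla", "by_status",
    "by_cluster", "closure_rate", "avg_mttr_min",
)


def check_payload(payload: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the shape is OK."""
    problems = [
        f"missing top-level key: {key}"
        for key in TOP_LEVEL_KEYS
        if key not in payload
    ]
    for layer in ("open", "closed"):
        fc = payload.get(layer)
        if not isinstance(fc, dict) or fc.get("type") != "FeatureCollection":
            problems.append(f"{layer} is not a GeoJSON FeatureCollection")
        elif not isinstance(fc.get("features"), list):
            problems.append(f"{layer}.features is not a list")
    metrics = payload.get("metrics")
    if isinstance(metrics, dict):
        problems += [f"missing metric: {key}" for key in METRIC_KEYS if key not in metrics]
    elif "metrics" in payload:
        problems.append("metrics is not an object")
    return problems
```

Returning a list of problems instead of raising on the first one gives a complete report in a single verification run.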
## Out of scope (future)
- **Open-backlog-over-time** / observed open→closed transitions need the append-only
history capture (`tickets.closure_events` + daily snapshot) — separate plan.
- **Dispatch surface** (nearest-vehicle off `geog`) — after analytics.