docs: add docs/ — phase-1/phase-2 PRDs + implementation record
- docs/phase-1-ingestion.md — Phase 1 PRD (INC hourly CSV ingestion; deployed)
- docs/phase-2-dashboard.md — Phase 2 PRD (inc_dashboard read-API for FleetOps map)
- docs/implementation.md — as-built record (pipeline, migrations 01-08, deploy, DQ)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent
e17553ccbf
commit
f2408f113e
3 changed files with 322 additions and 0 deletions
83
docs/implementation.md
Normal file
@@ -0,0 +1,83 @@

# Implementation record — fleettickets (as built)

What is actually built and deployed, as of the Phase-1 completion. Companion to
`docs/phase-1-ingestion.md` (plan) and `docs/phase-2-dashboard.md` (next).

## Pipeline (`import_tickets.py`)

- **Source:** newest `automations/inc/<EAT-timestamp>.csv` in the rustfs `tickets`
  bucket (endpoint `https://s3.rahamafresh.com`, path-style, region `us-east-1`).
- **S3 access via boto3** (no aws-CLI dependency): `list_objects_v2` (paginator),
  `get_object`, `copy_object` + `delete_object` for archiving.
- **Skip-if-unchanged:** newest S3 **ETag** vs `tickets.import_meta.metadata.source_etag`;
  equal → skip the DB write (the export re-emits identical content most hours).
- **Cleaning:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
  `week_start`/`week_end`, `source_s3_bucket`/`source_s3_key`/`source_snapshot_id`,
  `department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE.
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete. On success,
  **move** processed file(s) → `automations/inc/processed/`.
- **Geocoding** (keyed LocationIQ): `--geocode-clusters` (coarse, per cluster) and
  `--geocode-locations` (precise, actionable INC; strips network codes; 25 km
  wrong-city guard). Results cache in `tickets.geo_clusters` / `tickets.geo_locations`.
- CLI: `--from-bucket` (newest INC csv), `--inc-csv <file>` (local dev), `--apply`
  (else dry-run), `--geocode-clusters`, `--geocode-locations`.

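The source-selection and skip-if-unchanged steps can be sketched in plain Python. This is a minimal illustration, not the code in `import_tickets.py`: the function names are hypothetical, and the input is assumed to look like the `Contents` entries of a boto3 `list_objects_v2` response (dicts with `Key` and `ETag`). The `YYYY-MM-DDTHH-mm-ss` filename pattern is the one given in the Phase-1 PRD.

```python
from __future__ import annotations

import re
from datetime import datetime

# Key layout from the text: automations/inc/<EAT-timestamp>.csv
KEY_RE = re.compile(r"^automations/inc/(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})\.csv$")


def newest_inc_object(objects: list[dict]) -> dict | None:
    """Return the listing entry with the latest filename timestamp, or None."""
    best, best_ts = None, None
    for obj in objects:
        match = KEY_RE.match(obj["Key"])
        if not match:
            continue  # ignores processed/ archive keys and anything unexpected
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H-%M-%S")
        if best_ts is None or ts > best_ts:
            best, best_ts = obj, ts
    return best


def should_skip(newest: dict, last_etag: str | None) -> bool:
    """True when the newest object's ETag equals the last processed ETag."""
    if last_etag is None:
        return False  # nothing processed yet, so always ingest
    # S3 returns ETags wrapped in double quotes; compare without them.
    return newest["ETag"].strip('"') == last_etag.strip('"')
```

Both timestamps being compared are EAT wall-clock values from filenames, so a naive `datetime` comparison is enough to order them; no timezone conversion is needed for the selection itself.
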
## Schema / migrations (`tracksolid_db`, applied via `run_migrations.py`)

| Migration | What |
|---|---|
| 01_tickets_schema | `tickets.inc`/`crq` (raw-jsonb-first), `geo_clusters`/`geo_locations` gazetteers, geom-resolution trigger, `reporting.fn_tickets_for_map` |
| 02_import_meta | `tickets.import_meta` (snapshot freshness) + `fn_tickets_for_map` `summary.freshness` |
| 03_inc_columns | Unpack `raw` → typed STORED generated columns (text/numeric/bool + EAT→timestamptz via `tickets.eat_ts()`) |
| 04_inc_latlng | `latitude`/`longitude` = `COALESCE(feed, ST_Y/ST_X(geom))` (populated from geocode) |
| 05_inc_geography | `geog geography(Point,4326)` (= `geom::geography`) + GiST index for routing |
| 06_inc_mttr_minutes | `mttr` → integer **minutes**; drop constant `is_alarm`/`is_auto_created`/`is_auto_closed` |
| 07_inc_drop_service_type | drop constant `service_type` |
| 08_inc_open_sla_view | `tickets.inc_open_sla` view (open tickets + derived SLA) |
| 09_inc_dashboard_fn | *(planned)* `reporting.fn_inc_dashboard` — see `docs/phase-2-dashboard.md` |

`tickets.inc` columns: `ticket_id` (PK), `raw` (jsonb, source of truth),
`normalized_status`/`raw_status`, `bucket`, `is_actionable`, `cluster`/`region`/
`location_name`, `assigned_team`/`owner`, `sla_status`, `mttr` (min),
`created_at_service`/`scheduled_at`/`closed_at`/`first_seen_at`/`last_seen_at`/
`source_created_at`/`source_updated_at` (timestamptz), `latitude`/`longitude`,
`geom`/`geog`/`geo_source`, `ingested_at`. Dropped-but-in-`raw`: `service_type`,
`is_alarm`, `is_auto_created`, `is_auto_closed`, and the ingest-time drops.

## Deployment

- **Coolify** app built from this repo's `Dockerfile` (`python:3.12-slim`,
  `TZ=Africa/Nairobi`, keep-alive `tail -f /dev/null`). Separate from the FleetOps
  web app (`fleet-ops-staging`).
- **Scheduled Task:** `python import_tickets.py --from-bucket --apply`, cron
  `15 7-19 * * *` in **EAT** (Coolify runs tasks in EAT — no UTC conversion).
- **Env vars** (Coolify): `DATABASE_URL` (internal DB host), `RUSTFS_*`, `GEOCODER_*`.
- For a plain host/VM, `run_ingest.sh` + a crontab line is the alternative.

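To make the schedule concrete, the cron expression `15 7-19 * * *` can be expanded into its daily run times. The sketch below is illustrative only (the helper name `runs_on` is hypothetical) and models EAT as a fixed UTC+3 offset, which holds because Africa/Nairobi observes no daylight saving.

```python
from datetime import datetime, time, timedelta, timezone

EAT = timezone(timedelta(hours=3), "EAT")  # fixed offset, no DST

# Cron "15 7-19 * * *": minute 15 of every hour from 07 through 19 inclusive.
RUN_TIMES_EAT = [time(hour, 15) for hour in range(7, 20)]


def runs_on(moment: datetime) -> list[datetime]:
    """All scheduled runs for the EAT calendar day that contains `moment`."""
    day = moment.astimezone(EAT).date()
    return [datetime.combine(day, t, tzinfo=EAT) for t in RUN_TIMES_EAT]


if __name__ == "__main__":
    for run in runs_on(datetime.now(timezone.utc)):
        print(run.isoformat(), "=", run.astimezone(timezone.utc).isoformat())
```

That is 13 runs per day, from 07:15 to 19:15 EAT. On a scheduler that evaluates cron in UTC instead, the equivalent expression would be `15 4-16 * * *`.
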
## State at hand-off

- `tickets.inc` ≈ 21,312 rows (current non-alarm INC + a few aged-out history rows);
  **0 alarm / 0 sentinel** (legacy rows cleaned up one-time).
- Geocoding ~**99.99%** (`geom` on all but 1 null-cluster ticket); `QOA`/`PTMP`
  cluster codes mapped to Quarry Road / Pipeline.
- Read path verified: `reporting.fn_tickets_for_map()` + `tickets.inc_open_sla`.

## Data-quality caveats (must inform analytics)

- Source `sla_status` only meaningful once **closed**; open SLA must be **derived**
  (`now − created_at_service`, `first_seen_at` fallback; ~30% lack
  `created_at_service`).
- `mttr` is **minutes**, null until closed; not wall-clock and not a 48h threshold.
- Lifecycle timestamps = `created_at_service`→`closed_at`; the `*_seen_at` / `source_*`
  ones are export bookkeeping (don't use for SLA/closure-time).
- Content lag ~2 days behind wall-clock.
- **History gap:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR
  *event* series work directly; **open-backlog-over-time** needs an append-only
  history capture (not yet built).

## Roadmap

Phase 2: `fn_inc_dashboard` read-API → FleetOps live map (open + closed overlay +
metrics). Then FleetNow **dispatch** off `geog`, **team closure attribution**, and
**history capture** for backlog trends. **CRQ** = separate future project reusing
this machinery against `automations/crq/`.

98
docs/phase-1-ingestion.md
Normal file
@@ -0,0 +1,98 @@

# PRD (Phase 1) — INC hourly CSV ingestion → tracksolid_db → FleetOps Tickets map

> Status: **complete and deployed** (migrations 01–08, boto3 loader, geocoding,
> Coolify hourly `15 7-19 * * *` EAT). This document is the record of the Phase-1
> plan; see `README.md` and `docs/implementation.md` for the as-built state.

## Scope: INC only

**This workflow is strictly for INC** (incident / customer-fault tickets). It
ingests **only** `automations/inc/<EAT-timestamp>.csv`. CRQ (new-installation)
exports at `automations/crq/` are **out of scope** and are not processed here; the
field transforms below are likewise INC-only.

## Context

The client (Rahamafresh / Fireside) runs an n8n workflow that exports field-ops
tickets to our S3-compatible bucket **every hour**:

- `automations/inc/<EAT-timestamp>.csv` — **incidents / customer faults** *(in scope)*
- `automations/crq/<EAT-timestamp>.csv` — new-installation requests *(out of scope)*

(See `n8n-hourly-s3-full-data-exports.md`. Sample: `2026-06-15T17-00-00.csv`.)

`fleettickets` owns the **downstream**: the `tickets` schema in the shared
`tracksolid_db` (raw-jsonb-first `tickets.inc`, geocoding gazetteers, and
`reporting.fn_tickets_for_map`, which `dashboard_api` serves to the FleetOps
"Tickets" tab). `tickets.crq` keeps existing but is not fed by this pipeline.

**The problem:** the loader was written for the *old* export model — JSON
`{metadata, records}` envelopes at a stable `automations/inc/latest.json`. That
model is gone; the new exports are **flat CSV, timestamped per hour, with no
`latest` pointer, no envelope, and no deltas** — every hourly file is a **full
current-state snapshot**.

**Two driving objectives this pipeline feeds:**

1. **SLA tracking** — contract requires tickets closed within **48h of
   `created_at_service`**; closed carry source `sla_status` + `mttr`, open need a
   derived state (`now − created_at_service` ≥48h breached / ≥36h at-risk).
2. **Vehicle routing (most important)** — accurately geocoded open tickets so
   FleetNow can route nearest vehicles; subsequent: team closure attribution.

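The derived open-ticket SLA state from objective 1 can be sketched as follows. This is an illustration under assumptions, not the logic of the `tickets.inc_open_sla` view: the function name is hypothetical, the state labels are borrowed from the Phase-2 metrics payload, and the `first_seen_at` fallback follows the data-quality findings in this PRD.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

BREACH_AFTER = timedelta(hours=48)   # contract limit from created_at_service
AT_RISK_AFTER = timedelta(hours=36)  # early-warning threshold


def open_sla_state(
    now: datetime,
    created_at_service: datetime | None,
    first_seen_at: datetime | None = None,
) -> str:
    """Classify an open ticket as breached / at_risk / ok / unknown."""
    # Fall back to first_seen_at when the service timestamp is missing.
    start = created_at_service or first_seen_at
    if start is None:
        return "unknown"
    age = now - start
    if age >= BREACH_AFTER:
        return "breached"
    if age >= AT_RISK_AFTER:
        return "at_risk"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(open_sla_state(now, now - timedelta(hours=40)))
```

All inputs are expected to be timezone-aware so that the subtraction is well defined regardless of whether the values arrive in EAT or UTC.
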
## Data contract (verified against live snapshots)

- 32 columns; header + double-quoted values. INC sample = 31,434 rows.
- `ticket_id` is the **primary key**; the same ticket recurs across snapshots as it
  moves `open → closed`. Verified: 31,434 distinct ids per file, **0 in-file dups**,
  same id set every hour (0 added/dropped) → **upsert is the dedup mechanism, no
  TRUNCATE**. Consecutive files are often byte-identical → skip-if-unchanged.
- `is_alarm=true` (~10,132 rows, all `is_actionable=false`) → **dropped**.
- `latitude`/`longitude` are **empty** in the feed → geocoding required.
- A garbage **sentinel row** (`ticket_id = "EXPORT STOPPED DUE TO EXCESSIVE SIZE…"`)
  is commonly the first data line → filtered by `ticket_id` prefix.
- Timestamps (filenames + data) are **EAT (Africa/Nairobi, UTC+3)**.
- `bucket` is meaningful (`closed`/`pending`), distinct from `source_s3_bucket`.

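The contract checks listed above (distinct ids, in-file duplicates, alarm rows, sentinel, empty coordinates) could be re-run against any snapshot with a small profiler like the one below. It is a sketch only: `profile_snapshot` is a hypothetical helper, and it assumes booleans are serialized as the string `true` and that the sentinel is recognizable by the `EXPORT STOPPED` prefix of `ticket_id`.

```python
import csv
import io
from collections import Counter

SENTINEL_PREFIX = "EXPORT STOPPED"  # prefix of the garbage sentinel ticket_id


def profile_snapshot(csv_text: str) -> dict:
    """Count the properties the data contract relies on for one CSV snapshot."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Set the sentinel aside first so it does not distort the other counts.
    tickets = [r for r in rows if not r["ticket_id"].startswith(SENTINEL_PREFIX)]
    id_counts = Counter(r["ticket_id"] for r in tickets)
    return {
        "rows": len(tickets),
        "sentinel_rows": len(rows) - len(tickets),
        "distinct_ids": len(id_counts),
        "duplicate_ids": sum(1 for n in id_counts.values() if n > 1),
        "alarm_rows": sum(
            1 for r in tickets if r["is_alarm"].strip().lower() == "true"
        ),
        "rows_with_coords": sum(
            1 for r in tickets if r["latitude"] and r["longitude"]
        ),
    }
```

On a file that honours the contract, `duplicate_ids` and `rows_with_coords` would both be 0 and `distinct_ids` would equal `rows`.
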
## Approach

Keep the **raw-jsonb-first** model and everything downstream; only the loader's
input path changes: JSON-`latest` → **newest timestamped CSV**, plus move-on-success.

- **Newest file** per `automations/inc/` (parse `YYYY-MM-DDTHH-mm-ss.csv`), via
  **boto3** (path-style; no aws-CLI dependency).
- **Skip-if-unchanged**: compare newest S3 **ETag** to the last processed ETag
  (`tickets.import_meta.metadata.source_etag`); equal → skip DB write.
- **Cleaning at ingest**: drop `is_alarm=true` + sentinel; drop `week_start`,
  `week_end`, `source_s3_bucket`, `source_s3_key`, `source_snapshot_id`,
  `department`, `source_type`; normalize `region`→lowercase, `raw_status`→UPPERCASE;
  keep `service_type`* and `bucket`. (*`service_type` later dropped as constant.)
- **Upsert** on `ticket_id` (`ON CONFLICT DO UPDATE`); never delete → closure
  history accumulates. On success **move** the file(s) to
  `automations/inc/processed/`.
- Record snapshot freshness in `tickets.import_meta`.
- Geocoding unchanged: `--geocode-clusters` (coarse) + `--geocode-locations`
  (precise, actionable INC; keyed LocationIQ; 25 km wrong-city guard).

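The cleaning-at-ingest rules can be expressed as one row-level transform. The sketch below is an assumption-laden illustration rather than the loader's actual code: `clean_row` is a hypothetical name, rows are assumed to be `csv.DictReader`-style string dicts, and `is_alarm` is assumed to arrive as the string `true`.

```python
from __future__ import annotations

# Columns dropped at ingest, as listed in the approach above.
DROP_COLUMNS = {
    "week_start", "week_end", "source_s3_bucket", "source_s3_key",
    "source_snapshot_id", "department", "source_type",
}
SENTINEL_PREFIX = "EXPORT STOPPED"


def clean_row(row: dict[str, str]) -> dict[str, str] | None:
    """Return the cleaned row, or None when the whole row must be dropped."""
    if row.get("ticket_id", "").startswith(SENTINEL_PREFIX):
        return None  # garbage sentinel line
    if row.get("is_alarm", "").strip().lower() == "true":
        return None  # alarm rows are out of scope
    cleaned = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    if cleaned.get("region"):
        cleaned["region"] = cleaned["region"].lower()
    if cleaned.get("raw_status"):
        cleaned["raw_status"] = cleaned["raw_status"].upper()
    return cleaned
```

Returning `None` for dropped rows keeps the caller simple: a list comprehension can filter on `is not None` before the upsert step.
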
## Orchestration

Deployed on **Coolify** (own app, `Dockerfile`, keep-alive worker). Ingest runs as a
**Scheduled Task**: `python import_tickets.py --from-bucket --apply`, cron
`15 7-19 * * *` in **EAT**. Env: `DATABASE_URL`, `RUSTFS_*`, `GEOCODER_*`.

## Data-quality findings (carried into Phase 2)

- Source `sla_status` ≠ a plain 48h rule, and `mttr` is not wall-clock — pin the
  contract's SLA definition before trusting cross-field SLA math.
- `created_at_service` is null on ~30% of rows (incl. most open) → needs a fallback
  clock (`first_seen_at`).
- Split timestamp semantics: lifecycle = `created_at_service`→`closed_at`; export
  bookkeeping = `created_at`/`updated_at`/`first_seen_at`/`last_seen_at`.
- `assigned_team` missing ~34% (`owner` better).
- Content lag ~2 days (underlying `…wm_task.xlsx` source date).

## Outcome (as built)

Live in `tracksolid_db`: `tickets.inc` (raw + typed generated columns), geocoded to
~99.99%, alarm/sentinel removed, hourly refresh with ETag skip + archive. See
`docs/implementation.md`.

141
docs/phase-2-dashboard.md
Normal file
@@ -0,0 +1,141 @@

# PRD (Phase 2) — INC operations dashboard: read-API layer

> Phase 1 (hourly INC CSV ingestion → `tickets.inc`, geocoding, typed generated
> columns, `inc_open_sla` view) is **complete and deployed** (migrations 01–08,
> Coolify hourly `15 7-19 * * *` EAT). See `docs/phase-1-ingestion.md` /
> `docs/implementation.md`. This document is Phase 2.

## Context

FleetOps needs a **live INC operations map** (modelled on FleetNow):

- A map showing **all currently-open INC tickets** alongside **live vehicle
  positions from FleetNow**.
- A **bottom timeline bar** that overlays **closed tickets** (alongside FleetNow
  vehicle routes) for a selected period.
- **Bottom filters**: `cluster`, ticket `status`, and **time** = today / this week /
  this month / custom date.
- **Top metric cards** that react to the selected filters — **ticket** metrics
  (not vehicle metrics).

**Scope of THIS repo (confirmed): the data / read-API layer only.** `fleettickets`
exposes parameterized SQL in `tracksolid_db` that `dashboard_api` serves to the
**FleetOps SPA**. The map UI, timeline bar, filter controls, metric cards, and the
**FleetNow vehicle positions/routes** are **other repos/systems**. There is no
vehicle id in the INC feed, so we serve **tickets only**; the SPA overlays FleetNow
vehicles/routes.

## Confirmed behaviour

- **Open layer (live):** all `is_actionable = true` INC tickets matching the
  cluster/status filter — **not** time-filtered (open = needs action now).
- **Closed overlay (windowed):** closed tickets whose `closed_at` falls in the
  selected window, matching cluster/status.
- **Metric cards (windowed):** computed for the current selection.
- **Filters combine with AND**, each optional. **Windows are calendar EAT**
  (today / ISO-week / month) or an explicit custom `[from, to)`.
- **Delivery:** one parameterized function returning a single JSON payload
  `{ open: GeoJSON, closed: GeoJSON, metrics: {…}, window, freshness }`, mirroring
  the existing `reporting.fn_tickets_for_map` style.

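The layer rules above can be mirrored in a few lines of Python for reasoning or for cross-checking counts against the SQL function. This is a sketch with hypothetical names (`select_layers`, `_matches`); it assumes tickets are dicts keyed by the `tickets.inc` column names and treats a non-null `closed_at` as the marker of a closed ticket.

```python
from __future__ import annotations

from datetime import datetime


def _matches(ticket: dict, cluster: str | None, status: str | None) -> bool:
    """Optional filters combined with AND; None means 'no filter'."""
    return (cluster is None or ticket["cluster"] == cluster) and (
        status is None or ticket["normalized_status"] == status
    )


def select_layers(
    tickets: list[dict],
    start: datetime,
    end: datetime,
    cluster: str | None = None,
    status: str | None = None,
) -> tuple[list[dict], list[dict]]:
    """Split tickets into the live open layer and the windowed closed overlay."""
    # Open layer: never time-filtered, only cluster/status.
    open_layer = [
        t for t in tickets
        if t["is_actionable"] and _matches(t, cluster, status)
    ]
    # Closed overlay: closed_at inside the half-open window [start, end).
    closed_layer = [
        t for t in tickets
        if t["closed_at"] is not None
        and start <= t["closed_at"] < end
        and _matches(t, cluster, status)
    ]
    return open_layer, closed_layer
```

The half-open comparison matches the `[from, to)` convention, so a ticket closed exactly at the window end belongs to the next window rather than being counted twice.
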
## Deliverable — `migrations/09_inc_dashboard_fn.sql`

A new read function (and supporting index if needed); additive, idempotent
(`CREATE OR REPLACE`), no change to existing objects.

### `reporting.fn_inc_dashboard(...)`

```
reporting.fn_inc_dashboard(
  p_cluster  text        DEFAULT NULL,     -- exact cluster (matches tickets.inc.cluster)
  p_status   text        DEFAULT NULL,     -- normalized_status
  p_window   text        DEFAULT 'today',  -- 'today' | 'week' | 'month' | 'custom'
  p_from     timestamptz DEFAULT NULL,     -- custom window start (inclusive)
  p_to       timestamptz DEFAULT NULL      -- custom window end (exclusive)
) RETURNS jsonb
```

- **Window resolution:** if `p_from`/`p_to` given → use them (custom). Else compute
  **EAT calendar bounds** from `p_window`: `today` = `[date_trunc('day', now_eat),
  +1 day)`, `week` = `date_trunc('week', …)`, `month` = `date_trunc('month', …)` —
  converted back to `timestamptz` via `… AT TIME ZONE 'Africa/Nairobi'`.
- **Returned JSON:**
  ```jsonc
  {
    "window":   { "from": "...", "to": "...", "preset": "today" },
    "open":     { "type":"FeatureCollection", "features":[ … ] },   // all open, filtered by cluster/status
    "closed":   { "type":"FeatureCollection", "features":[ … ] },   // closed_at in window, filtered
    "metrics": {
      "open_now": int,
      "closed_in_window": int,
      "sla": {
        "open":   { "breached": int, "at_risk": int, "ok": int, "unknown": int },
        "closed": { "compliant": int, "breached": int }
      },
      "by_status":  { "<status>": int, … },
      "by_cluster": { "<cluster>": int, … },
      "closure_rate": { "per_day_avg": num, "series": [ { "day":"YYYY-MM-DD", "count":int }, … ] },
      "avg_mttr_min": num
    },
    "freshness": { … }   // from tickets.import_meta
  }
  ```
- **Feature properties** (both layers): `ticket_id, normalized_status, cluster,
  region, location_name, assigned_team, owner, geo_source`. Open adds `sla_state,
  hours_open`; closed adds `closed_at, mttr, sla_status`. Geometry from `geom`
  (`ST_AsGeoJSON`). Only `geom IS NOT NULL` rows become features; `metrics` count the
  full filtered set (note the small geocoding gap).

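The window-resolution rule can be prototyped outside the database to pin down expected bounds before writing the SQL. The sketch below is not the migration's implementation: `resolve_window` is a hypothetical helper, EAT is modelled as a fixed UTC+3 offset (Africa/Nairobi has no DST), and the upper bounds for `week` (+7 days) and `month` (first day of the next month) are assumptions, since the text only spells out the upper bound for `today`. Weeks start on Monday, as PostgreSQL's `date_trunc('week', …)` does.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

EAT = timezone(timedelta(hours=3), "EAT")


def resolve_window(
    preset: str = "today",
    p_from: datetime | None = None,
    p_to: datetime | None = None,
    now: datetime | None = None,
) -> tuple[datetime, datetime]:
    """Return the half-open [from, to) window as timezone-aware datetimes."""
    if p_from is not None and p_to is not None:
        return p_from, p_to  # explicit custom window wins over the preset
    now_eat = (now or datetime.now(timezone.utc)).astimezone(EAT)
    day_start = now_eat.replace(hour=0, minute=0, second=0, microsecond=0)
    if preset == "today":
        return day_start, day_start + timedelta(days=1)
    if preset == "week":
        start = day_start - timedelta(days=day_start.weekday())  # back to Monday
        return start, start + timedelta(days=7)
    if preset == "month":
        start = day_start.replace(day=1)
        # Jump safely past month end, then snap to the 1st of the next month.
        return start, (start + timedelta(days=32)).replace(day=1)
    raise ValueError(f"unsupported window preset: {preset!r}")
```

Passing `now` explicitly makes the function deterministic, which is convenient for checking edge cases such as a UTC evening that already falls on the next EAT calendar day.
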
### Reuse (don't reinvent)

- **`tickets.inc_open_sla`** (migration 08) — `sla_state` / `hours_open` for the open
  layer + open-SLA metrics.
- **Typed generated columns** (migrations 03–07): `cluster`, `normalized_status`,
  `closed_at`, `mttr` (minutes), `assigned_team`, `geom`, `geo_source`.
- **`reporting.fn_tickets_for_map`** (migrations 01–02) — GeoJSON
  `jsonb_build_object`/`ST_AsGeoJSON` + `summary.freshness` patterns.
- **Derived SLA logic** — `now() − COALESCE(created_at_service, first_seen_at)` vs
  48h/36h.

### Indexes

In place: `ix_inc_closed_at`, `ix_inc_cluster_col`, `ix_inc_norm_status_col`,
`ix_inc_actionable_col`, `ix_inc_geom`, `ix_inc_geog`. Add composite
`(closed_at, cluster)` only if EXPLAIN shows it's needed.

### Grants

`GRANT EXECUTE ON FUNCTION reporting.fn_inc_dashboard(...) TO dashboard_ro` (guarded).

## Dependencies (other repos)

- **`dashboard_api`** — endpoint e.g. `GET /webhook/inc-dashboard?cluster=&status=&window=&from=&to=`
  calling `fn_inc_dashboard`. *(Contract here; impl there.)*
- **FleetOps SPA** (`fleetops`) — map, timeline bar, filter UI, metric cards;
  overlays FleetNow vehicles/routes.
- **FleetNow** — live vehicle positions + historical routes.

## Data-quality caveats (affect metrics, not delivery)

- Source `sla_status` only meaningful for **closed**; open SLA is derived.
- `created_at_service` null on ~30% → some open are SLA `unknown` (fallback flagged).
- `mttr` is **minutes**, null until closed; closure/MTTR metrics filter accordingly.
- Content lag ~2 days → recent days under-count.
- A few tickets lack `geom` → counted in metrics, absent from map features.

## Verification

1. `SELECT reporting.fn_inc_dashboard();` → valid JSON (open/closed FCs, metrics,
   window=today, freshness).
2. Filters: `p_cluster`, `p_status`, `p_window := 'month'`, and a custom `p_from/p_to`
   — counts match ad-hoc `SELECT`s on `tickets.inc` / `tickets.inc_open_sla`.
3. Window math: today/week/month are correct **EAT** calendar ranges.
4. SLA metrics match the `inc_open_sla` distribution / source `sla_status` in window.
5. `EXPLAIN ANALYZE` on the windowed closed query uses `ix_inc_closed_at`.
6. Apply via `run_migrations.py`; ledgered in `tickets.schema_migrations`.

## Out of scope (future)

- **Open-backlog-over-time** / observed open→closed transitions need the append-only
  history capture (`tickets.closure_events` + daily snapshot) — separate plan.
- **Dispatch surface** (nearest-vehicle off `geog`) — after analytics.