# fleettickets Field-ops **ticket** ingestion, geocoding, and read-schema that powers the **Tickets** map in FleetOps. Extracted from the `tracksolid` repo into its own module (it previously lived there as migrations 21–23 + `tools/import_tickets.py`). Two ticket types, identical 32-column source schema and CDC change stream, served through a **shared engine** (`pipeline.py`) with a thin per-type entrypoint each: - **INC** — incident / customer-fault tickets → `tickets.inc` (`inc/import_inc.py`). Full feature set: typed columns, geocoding, SLA view, dashboard fn, history capture. - **CRQ** — new-installation requests → `tickets.crq` (`crq/import_crq.py`). **Data layer + map** (typed columns, geocoding, appears on the Tickets map via `fn_tickets_for_map`). SLA view / dashboard fn / history capture are deferred — installation-lifecycle semantics differ from incidents (see roadmap). CRQ gets its **own FleetOps tab**, same look & feel as INC. Geocoding is **cross-dataset** (one gazetteer, one geocoder budget, covers inc + crq) and is driven from the INC entrypoint. ## What this owns | Piece | What | |---|---| | `migrations/01_tickets_schema.sql` | The `tickets` schema: `tickets.inc` / `tickets.crq` (raw-jsonb-first), `tickets.geo_clusters` + `tickets.geo_locations` gazetteers, geom-resolution trigger, and `reporting.fn_tickets_for_map` (the GeoJSON read function) | | `migrations/02_import_meta.sql` | `tickets.import_meta` (per-dataset snapshot envelope metadata) + `fn_tickets_for_map` re-defined to expose it as `summary.freshness` (same signature — dashboard_api unchanged) | | `migrations/03_inc_columns.sql` | Unpacks `tickets.inc.raw` into **typed STORED generated columns** (status, cluster, region, team, owner, sla_status, mttr, lat/lng, is_* booleans, and EAT→`timestamptz` timestamps via `tickets.eat_ts()`). Computed for all rows + auto-populated on every ingest; `raw` stays the source of truth | | `migrations/04_inc_latlng.sql` | Redefines `latitude`/`longitude` to `COALESCE(feed, ST_Y/ST_X(geom))` so they're **populated from the geocoded position** (feed is always empty); precision per `geo_source` (`location` vs `cluster` centroid) | | `migrations/05_inc_geography.sql` | Adds `geog geography(Point,4326)` (= `geom::geography`) + GiST index for **routing** — `ST_Distance`/`ST_DWithin`/KNN in real metres (nearest-vehicle, radius search) | | `migrations/06_inc_mttr_minutes.sql` | `mttr` generated column → integer **minutes** (source is decimal hours); drops the constant `is_alarm`/`is_auto_created`/`is_auto_closed` columns (kept in `raw`). `is_actionable` retained | | `migrations/07_inc_drop_service_type.sql` | Drops the constant `service_type` column (always `inc`; kept in `raw`) | | `migrations/08_inc_open_sla_view.sql` | `tickets.inc_open_sla` view — open (`is_actionable`) tickets with **derived SLA** (`hours_open`, `sla_state` vs 48h; clock = `created_at_service` ∥ `first_seen_at`), plus team/cluster/`geog` for dispatch | | `migrations/09_inc_dashboard_fn.sql` | `reporting.fn_inc_dashboard(cluster, status, window, from, to)` — one JSON payload (`window` / `open` GeoJSON / `closed` GeoJSON / `metrics` / `freshness`) powering the FleetOps live INC map. Open=live, closed=windowed (EAT calendar / custom); filters AND | | `migrations/10_inc_history_capture.sql` | History for time-series: `tickets.closure_events` (append-only observed closures) + `tickets.inc_daily_snapshot` (per-EAT-day open backlog + flow), populated by `tickets.capture_history()` each ingest. Unlocks **backlog-over-time** | | `migrations/15_crq_table.sql` | **Materializes `tickets.crq`** (table + geom trigger + indexes — `01`'s crq section never ran on the live DB) and unpacks `raw` into the same **typed STORED generated columns** as INC's `03` (reuses `tickets.eat_ts()`). Brings CRQ to data-layer parity | | `pipeline.py` | **Shared engine** — the dataset-agnostic CDC loader (drains `automations//changes/.csv` from the `isptickets` bucket, upserts on `ticket_id` oldest→newest, watermark + per-file archive) and the **cross-dataset** geocoder (clusters + actionable inc/crq locations) | | `inc/import_inc.py` | INC entrypoint (`python -m inc.import_inc`) — INC `Dataset` config + CLI; runs `tickets.capture_history()` after each `--apply`; hosts the shared geocode commands | | `crq/import_crq.py` | CRQ entrypoint (`python -m crq.import_crq`) — CRQ `Dataset` config + CLI (ingest only; no history hook yet) | | `run_migrations.py` | Applies `migrations/*.sql` in order (ledger: `tickets.schema_migrations`) | | `shared.py` | Minimal DB/logging helpers (self-contained — no tracksolid dependency) | ## What this does NOT own (stays where it is) - **The DB** — the `tickets` schema lives in the shared `tracksolid_db`. - **The read-API** — `dashboard_api` (in the tracksolid stack) serves `GET /webhook/tickets`, which calls `reporting.fn_tickets_for_map` (defined here). - **The frontend** — the Tickets map is a tab in the **FleetOps** SPA (`fleetops` repo). ## Data model (raw-first) Each row is `ticket_id` + `raw` (the full source record as `jsonb`) + a derived `geom` / `geo_source`. Everything reads from `raw`, so a change to the source schema needs no migration. For convenient typed/indexable access, `raw` is also **unpacked into STORED generated columns** (migration 03) — e.g. `normalized_status`, `cluster`, `region`, `assigned_team`, `owner`, `sla_status`, `mttr`, `is_actionable`, `created_at_service`/`closed_at` (as EAT→`timestamptz`). These stay in lock-step with `raw` automatically (no loader change); `raw` remains the source of truth. `geom` is resolved: **feed** coords (`raw` lat/lng) → **location** (geocoded `location_name`) → **cluster** centroid → **none**. Source coordinates are empty in the feed, so geocoding is required: - `--geocode-clusters` — one coordinate per cluster (coarse fallback). - `--geocode-locations` — precise per-location for **actionable INC** tickets: strips the network codes from `location_name` (e.g. `NW_`, `ADR_MNT_`, `FDT`, `SDUS`), geocodes the real place via a **keyed** provider (LocationIQ / OpenCage), and **rejects any result >25 km from the cluster centroid** (wrong-city guard). Results cache in `tickets.geo_locations`. ### Columns on `tickets.inc` | Column | Type | Notes | |---|---|---| | `ticket_id` | text (PK) | e.g. `WOT0715527` | | `raw` | jsonb | full source record — the source of truth | | `normalized_status` · `raw_status` | text | use `normalized_status` for filtering (canonical) | | `bucket` | text | lifecycle: `closed` / `pending` | | `is_actionable` | boolean | the open/closed flag (open = `true`) | | `cluster` · `region` · `location_name` | text | `region` lowercased; `cluster` feeds the gazetteer | | `assigned_team` · `owner` | text | closure attribution dimensions | | `sla_status` | text | source `Compliant`/`Breached` — **only meaningful once closed** | | `mttr` | numeric | **minutes** (source is decimal hours); null until closed | | `created_at_service` · `scheduled_at` · `closed_at` · `first_seen_at` · `last_seen_at` · `source_created_at` · `source_updated_at` | timestamptz | EAT→UTC via `tickets.eat_ts()`. **lifecycle** = `created_at_service`→`closed_at`; **export bookkeeping** = `first_seen_at`/`last_seen_at`/`source_*` | | `latitude` · `longitude` | double precision | `COALESCE(feed, geocoded)` — populated from `geom` | | `geom` | geometry(Point,4326) | display / the map | | `geog` | geography(Point,4326) | **routing** — metres-accurate distance (GiST indexed) | | `geo_source` | text | precision: `feed` / `location` / `cluster` / `none` | | `ingested_at` | timestamptz | when we last upserted this row | Dropped from the unpacked columns (still in `raw`): `service_type`, `is_alarm`, `is_auto_created`, `is_auto_closed` (all single-cardinality), plus the ingest-time drops below. **`reporting.fn_tickets_for_map`** reads from `raw` and serves the map; **`tickets.inc_open_sla`** is the open-ticket SLA view for dashboards/dispatch. ## Setup ```bash uv sync cp .env.example .env # fill in DATABASE_URL, RUSTFS_*, GEOCODER_* python run_migrations.py # apply the schema (idempotent) ``` ## Run Run from the repo root so the `inc`/`crq` packages + `pipeline.py`/`shared.py` import. ```bash # drain the incremental change streams (every new file oldest→newest, then archive) python -m inc.import_inc --from-bucket --apply python -m crq.import_crq --from-bucket --apply # geocode — CROSS-DATASET (covers inc + crq); driven from the INC entrypoint, needs GEOCODER_API_KEY python -m inc.import_inc --geocode-clusters --apply # coarse, once python -m inc.import_inc --geocode-locations --apply # precise, actionable inc+crq # from a local CSV instead of the bucket (dev) python -m inc.import_inc --inc-csv 2026-06-15T17-00-00.csv --apply python -m crq.import_crq --crq-csv 2026-06-24T12-55-44.csv --apply ``` Dry-run is the default (omit `--apply`). `--from-bucket` talks to S3 via **boto3** using the `RUSTFS_*` env (path-style addressing; no aws-CLI dependency). ## Deploy (Coolify) The repo ships a [`Dockerfile`](Dockerfile) — a small batch worker with no web server. Coolify builds it and keeps the container alive (`CMD tail -f /dev/null`); each ingest runs as its own **Scheduled Task**, not a system crontab: - **`inc_tickets`:** `python -m inc.import_inc --from-bucket --apply` - **`crq_tickets`:** `python -m crq.import_crq --from-bucket --apply` - **Frequency:** both `*/20 6-20 * * *` (every 20 min, **06:00–20:40 EAT**). This Coolify instance runs scheduled tasks in **EAT (Africa/Nairobi)**, so no UTC conversion is needed. - **Env vars** (Coolify → Environment Variables): `DATABASE_URL` (internal DB host), `RUSTFS_*` (the `isptickets` bucket credentials), `GEOCODER_*`. The same bucket holds both `automations/inc/` and `automations/crq/`, so one credential set serves both tasks. The watermark makes a run with no new change files a cheap no-op. For a plain host/VM instead of Coolify, [`run_ingest.sh`](run_ingest.sh) loads `.env` and runs **both** ingests; schedule it with a crontab line (`CRON_TZ=Africa/Nairobi` / `*/20 6-20 * * *`). Full operational runbook — container, env management (encrypted; via the UI or `artisan tinker`), the **Forgejo → Coolify auto-deploy webhook**, manual deploys, and the source-bucket cutover procedure — is in [`docs/deployment-and-operations.md`](docs/deployment-and-operations.md). ### Bucket cutover (one-time reseed) When the source provider moves the feed to a new bucket (e.g. `tickets` → `isptickets`), the stored watermark holds a key from the *old* bucket's stream, whose timestamp may be newer than the new bucket's first file — which would otherwise be skipped. Point the `RUSTFS_*` creds + `TICKETS_BUCKET` at the new bucket, then drain it once with `--reseed`, which ignores the stored watermark and ingests **every** file in `changes/` oldest→newest: ```bash python -m inc.import_inc --from-bucket --reseed # dry-run first (or -m crq.import_crq) python -m inc.import_inc --from-bucket --reseed --apply # commit + archive ``` Upserts are idempotent (`ticket_id` PK, rows never deleted) and the new stream's periodic full-state re-emissions re-assert current state, so this is non-destructive and converges even across the cutover gap. After it, the watermark is current — resume normal `--from-bucket --apply` runs (no `--reseed`). The old bucket is left untouched. ## Notes - The n8n export writes an **incremental CDC change stream** to `automations/inc/changes/.csv`: a full-state baseline followed by files holding only the rows that changed (with periodic full-state re-emissions). No `latest` pointer, no metadata envelope. The loader drains **every not-yet-processed file oldest→newest** — taking only the newest would drop intermediate deltas. - **Watermark:** the newest file already applied is recorded in `tickets.import_meta.metadata.source_max_key`; runs skip anything at/older than it, so reruns are cheap no-ops. `--reseed` ignores it for a one-time bucket cutover. - **Upsert on `ticket_id`** (PRIMARY KEY) — duplication is impossible; rows are never deleted, so closed-ticket history accumulates. On success each file is **moved** to `automations/inc/processed/`. - **Cleaning at ingest:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop `week_start`/`week_end`, `source_s3_*`/`source_snapshot_id`, `department`/`source_type`; normalize `region` → lowercase and `raw_status` → UPPERCASE. `service_type` and `bucket` (a `closed`/`pending` flag) are kept. - `tickets.import_meta` captures snapshot freshness (surfaced as `summary.freshness` by `fn_tickets_for_map`). - The curated/geocoded coordinates are written `verified = false` — review `tickets.geo_clusters` / `tickets.geo_locations` and flip `verified` once checked. ## Querying ```sql -- map payload (GeoJSON + summary, incl. summary.freshness) — what dashboard_api serves SELECT reporting.fn_tickets_for_map(); -- open-only by default SELECT reporting.fn_tickets_for_map(p_open_only := false); -- all geocoded tickets -- open tickets by SLA (derived) + by cluster — via the view SELECT sla_state, count(*) FROM tickets.inc_open_sla GROUP BY 1; SELECT cluster, count(*), round(avg(hours_open),1) AS avg_hrs FROM tickets.inc_open_sla GROUP BY 1 ORDER BY 2 DESC; -- closures / creations per day (EAT) SELECT (closed_at AT TIME ZONE 'Africa/Nairobi')::date AS d, count(*) FROM tickets.inc WHERE closed_at IS NOT NULL GROUP BY 1 ORDER BY 1 DESC; -- open-backlog-over-time (accrues from first capture; one row per EAT day) SELECT snapshot_date, open_total, open_breached, closed_today FROM tickets.inc_daily_snapshot ORDER BY snapshot_date DESC; -- nearest open tickets to a vehicle (lng, lat) — metres, index-accelerated KNN SELECT ticket_id, cluster, hours_open, round(ST_Distance(geog, ST_SetSRID(ST_MakePoint(:lng,:lat),4326)::geography))::int AS metres FROM tickets.inc_open_sla ORDER BY geog <-> ST_SetSRID(ST_MakePoint(:lng,:lat),4326)::geography LIMIT 10; ``` ## Data-quality & SLA notes Findings to keep in mind (see the PRD for detail): - **Source `sla_status` is only meaningful for *closed* tickets.** It reads `Compliant` for essentially all *open* tickets, so for open work use the **derived** state in `tickets.inc_open_sla` (`now() − created_at_service` vs the contract's 48h). - **`created_at_service` is missing on ~30% of rows** (incl. most open ones); the SLA view falls back to `first_seen_at` and flags it via `sla_clock_source`. - **`mttr` is not wall-clock** `closed_at − created_at_service` and the source's `Breached`/`Compliant` does **not** match a plain 48h threshold — pin the contract's exact SLA definition before trusting cross-field SLA math. - **Content lag:** the feed's *file* timestamps are current, but the ticket *content* trails ~2 days (the underlying `…wm_task.xlsx` source), so creation/closure dates run a couple of days behind wall-clock. - **History:** `tickets.inc` is current-state (upsert). Closure/creation/MTTR *event* series work directly off `closed_at`/`created_at_service`. **Backlog-over-time** now accrues via `tickets.inc_daily_snapshot` (one row per EAT day, written by `tickets.capture_history()` each ingest); observed closures log to `tickets.closure_events`. Past backlog can't be reconstructed — the series builds from the first capture onward. ## Status / roadmap Live: INC ingestion deployed on Coolify (every 20 min `*/20 6-20 * * *` EAT), schema + generated columns + geocoding + the `inc_open_sla` view in `tracksolid_db`. **CRQ (this milestone):** data layer + map — `tickets.crq` fed from `automations/crq/changes/` by `crq/import_crq.py`, the `tickets.crq` table + typed columns (migration 15), cross-dataset geocoding, and visibility on the Tickets map via `fn_tickets_for_map`. One-time seed: drain the isptickets CRQ stream (`python -m crq.import_crq --from-bucket --apply`) — empty watermark + the stream's periodic full-state snapshots converge to current state — then run the shared geocode once. See [`docs/deployment-and-operations.md`](docs/deployment-and-operations.md). Next (Phase 2): bring CRQ to full INC parity once installation-lifecycle semantics are confirmed — a `crq_open_sla` view, `fn_crq_dashboard`, and CRQ history capture (the INC analogues of migrations 08/09/10). Then time-series analytics (closure rate, MTTR/SLA trends), FleetNow vehicle **dispatch** off `geog`, and **team closure attribution**.