fleettickets/README.md
david kiania 073db9b5b8 feat: unpack tickets.inc.raw into typed generated columns (migration 03)
Add STORED generated columns derived from raw (text/numeric/bool/double + EAT
timestamptz via an IMMUTABLE tickets.eat_ts() wrapper). Computed for all existing
rows and auto-populated on every future ingest — raw stays the source of truth,
no loader change. Indexes on status/cluster/team/closed_at/is_actionable for the
SLA/team/closure queries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 23:08:31 +03:00

109 lines
6.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# fleettickets
Field-ops **INC ticket** ingestion, geocoding, and read-schema that powers the
**Tickets** map in FleetOps. Extracted from the `tracksolid` repo into its own module
(it previously lived there as migrations 2123 + `tools/import_tickets.py`).
- **INC** — incident / customer-fault tickets *(this pipeline is **strictly INC**)*
- **CRQ** — new-installation requests *(schema kept, but **out of scope** — not ingested here)*
## What this owns
| Piece | What |
|---|---|
| `migrations/01_tickets_schema.sql` | The `tickets` schema: `tickets.inc` / `tickets.crq` (raw-jsonb-first), `tickets.geo_clusters` + `tickets.geo_locations` gazetteers, geom-resolution trigger, and `reporting.fn_tickets_for_map` (the GeoJSON read function) |
| `migrations/02_import_meta.sql` | `tickets.import_meta` (per-dataset snapshot envelope metadata) + `fn_tickets_for_map` re-defined to expose it as `summary.freshness` (same signature — dashboard_api unchanged) |
| `migrations/03_inc_columns.sql` | Unpacks `tickets.inc.raw` into **typed STORED generated columns** (status, cluster, region, team, owner, sla_status, mttr, lat/lng, is_* booleans, and EAT→`timestamptz` timestamps via `tickets.eat_ts()`). Computed for all rows + auto-populated on every ingest; `raw` stays the source of truth |
| `import_tickets.py` | Ingests the **newest INC CSV** from the rustfs `tickets` bucket (`automations/inc/<EAT-timestamp>.csv`) and upserts on `ticket_id`; geocodes clusters + INC locations |
| `run_migrations.py` | Applies `migrations/*.sql` in order (ledger: `tickets.schema_migrations`) |
| `shared.py` | Minimal DB/logging helpers (self-contained — no tracksolid dependency) |
## What this does NOT own (stays where it is)
- **The DB** — the `tickets` schema lives in the shared `tracksolid_db`.
- **The read-API** — `dashboard_api` (in the tracksolid stack) serves
`GET /webhook/tickets`, which calls `reporting.fn_tickets_for_map` (defined here).
- **The frontend** — the Tickets map is a tab in the **FleetOps** SPA (`fleetops` repo).
## Data model (raw-first)
Each row is `ticket_id` + `raw` (the full source record as `jsonb`) + a derived
`geom` / `geo_source`. Everything reads from `raw`, so a change to the source schema
needs no migration. For convenient typed/indexable access, `raw` is also **unpacked
into STORED generated columns** (migration 03) — e.g. `normalized_status`, `cluster`,
`region`, `assigned_team`, `owner`, `sla_status`, `mttr`, `is_actionable`,
`created_at_service`/`closed_at` (as EAT→`timestamptz`). These stay in lock-step with
`raw` automatically (no loader change); `raw` remains the source of truth. `geom` is resolved: **feed** coords (`raw` lat/lng) → **location**
(geocoded `location_name`) → **cluster** centroid → **none**.
Source coordinates are empty in the feed, so geocoding is required:
- `--geocode-clusters` — one coordinate per cluster (coarse fallback).
- `--geocode-locations` — precise per-location for **actionable INC** tickets: strips the
network codes from `location_name` (e.g. `NW_`, `ADR_MNT_`, `FDT<n>`, `SDUS`), geocodes
the real place via a **keyed** provider (LocationIQ / OpenCage), and **rejects any result
>25 km from the cluster centroid** (wrong-city guard). Results cache in
`tickets.geo_locations`.
## Setup
```bash
uv sync
cp .env.example .env # fill in DATABASE_URL, RUSTFS_*, GEOCODER_*
python run_migrations.py # apply the schema (idempotent)
```
## Run
```bash
# ingest the newest INC CSV from the bucket (skip-if-unchanged, then archive)
python import_tickets.py --from-bucket --apply
# geocode (needs GEOCODER_API_KEY)
python import_tickets.py --geocode-clusters --apply # coarse, once
python import_tickets.py --geocode-locations --apply # precise, actionable INC
# from a local CSV instead of the bucket (dev)
python import_tickets.py --inc-csv 2026-06-15T17-00-00.csv --apply
```
Dry-run is the default (omit `--apply`). `import_tickets.py --from-bucket` talks to S3
via **boto3** using the `RUSTFS_*` env (path-style addressing; no aws-CLI dependency).
## Deploy (Coolify)
The repo ships a [`Dockerfile`](Dockerfile) — a small batch worker with no web server.
Coolify builds it and keeps the container alive (`CMD tail -f /dev/null`); the ingest
runs as a **Scheduled Task**, not a system crontab:
- **Command:** `python import_tickets.py --from-bucket --apply`
- **Frequency:** `15 7-19 * * *` (`:15` past each hour, **07:1519:15 EAT**). This
Coolify instance runs scheduled tasks in **EAT (Africa/Nairobi)**, so no UTC
conversion is needed.
- **Env vars** (Coolify → Environment Variables): `DATABASE_URL` (internal DB host),
`RUSTFS_*`, `GEOCODER_*`.
Skip-if-unchanged makes a run on an already-ingested snapshot a cheap no-op.
For a plain host/VM instead of Coolify, [`run_ingest.sh`](run_ingest.sh) loads `.env`
and runs the ingest; schedule it with a crontab line
(`CRON_TZ=Africa/Nairobi` / `15 7-19 * * *`).
## Notes
- The n8n export writes a **full current-state CSV per hour** to
`automations/inc/<EAT-timestamp>.csv` — no `latest` pointer, no metadata envelope, no
deltas. The loader lists the prefix, takes the **newest** file, and ingests it.
- **Skip-if-unchanged:** the newest file's S3 **ETag** is compared to the last processed
file's ETag (stored in `tickets.import_meta.metadata.source_etag`); if equal, the DB write
is skipped (the export re-emits byte-identical content most hours).
- **Upsert on `ticket_id`** (PRIMARY KEY) — duplication is impossible; rows are never
deleted, so closed-ticket history accumulates. On success the file is **moved** to
`automations/inc/processed/`.
- **Cleaning at ingest:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
`week_start`/`week_end`, `source_s3_*`/`source_snapshot_id`, `department`/`source_type`;
normalize `region` → lowercase and `raw_status` → UPPERCASE. `service_type` and `bucket`
(a `closed`/`pending` flag) are kept.
- `tickets.import_meta` captures snapshot freshness (surfaced as `summary.freshness` by
`fn_tickets_for_map`).
- The curated/geocoded coordinates are written `verified = false` — review
`tickets.geo_clusters` / `tickets.geo_locations` and flip `verified` once checked.