# fleettickets

Field-ops **INC ticket** ingestion, geocoding, and read schema that power the
**Tickets** map in FleetOps. Extracted from the `tracksolid` repo into its own module
(it previously lived there as migrations 21–23 + `tools/import_tickets.py`).

- **INC**: incident / customer-fault tickets *(this pipeline is **strictly INC**)*
- **CRQ**: new-installation requests *(schema kept, but **out of scope**; not ingested here)*

## What this owns

| Piece | What |
|---|---|
| `migrations/01_tickets_schema.sql` | The `tickets` schema: `tickets.inc` / `tickets.crq` (raw-jsonb-first), `tickets.geo_clusters` + `tickets.geo_locations` gazetteers, geom-resolution trigger, and `reporting.fn_tickets_for_map` (the GeoJSON read function) |
| `migrations/02_import_meta.sql` | `tickets.import_meta` (per-dataset snapshot envelope metadata) + `fn_tickets_for_map` re-defined to expose it as `summary.freshness` (same signature, so `dashboard_api` is unchanged) |
| `import_tickets.py` | Ingests the **newest INC CSV** from the rustfs `tickets` bucket (`automations/inc/<EAT-timestamp>.csv`) and upserts on `ticket_id`; geocodes clusters + INC locations |
| `run_migrations.py` | Applies `migrations/*.sql` in order (ledger: `tickets.schema_migrations`) |
| `shared.py` | Minimal DB/logging helpers (self-contained; no tracksolid dependency) |

## What this does NOT own (stays where it is)

- **The DB**: the `tickets` schema lives in the shared `tracksolid_db`.
- **The read-API**: `dashboard_api` (in the tracksolid stack) serves
  `GET /webhook/tickets`, which calls `reporting.fn_tickets_for_map` (defined here).
- **The frontend**: the Tickets map is a tab in the **FleetOps** SPA (`fleetops` repo).

## Data model (raw-first)

Each row is just `ticket_id` + `raw` (the full source record as `jsonb`) + a derived
`geom` / `geo_source`. Everything reads from `raw`, so a change to the source schema
needs no migration. `geom` is resolved in priority order: **feed** coords (`raw` lat/lng)
→ **location** (geocoded `location_name`) → **cluster** centroid → **none**.

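The resolution order above can be sketched in a few lines of Python. This only illustrates the precedence; the real logic lives in the geom-resolution trigger in `migrations/01_tickets_schema.sql`. The function name `resolve_geom` and the exact `geo_source` strings are assumptions made for the example.

```python
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (lat, lng)


def resolve_geom(
    feed: Optional[Coord],
    location: Optional[Coord],
    cluster: Optional[Coord],
) -> Tuple[Optional[Coord], str]:
    """Return (coord, geo_source), taking the first candidate that is present.

    Hypothetical helper: mirrors the documented precedence
    feed -> location -> cluster -> none.
    """
    candidates = (("feed", feed), ("location", location), ("cluster", cluster))
    for source, coord in candidates:
        if coord is not None:
            return coord, source
    return None, "none"
```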
Source coordinates are empty in the feed, so geocoding is required:
- `--geocode-clusters`: one coordinate per cluster (coarse fallback).
- `--geocode-locations`: precise per-location coordinates for **actionable INC** tickets.
  It strips the network codes from `location_name` (e.g. `NW_`, `ADR_MNT_`, `FDT<n>`, `SDUS`),
  geocodes the real place via a **keyed** provider (LocationIQ / OpenCage), and **rejects
  any result more than 25 km from the cluster centroid** (wrong-city guard). Results cache in
  `tickets.geo_locations`.

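As an illustration of the wrong-city guard, here is a minimal sketch that measures great-circle distance with the haversine formula and accepts a geocoder result only when it falls within 25 km of the cluster centroid. Whether the loader uses haversine specifically is an assumption, and the names `haversine_km` and `accept_geocode` are hypothetical.

```python
import math
from typing import Tuple

Coord = Tuple[float, float]  # (lat, lng) in decimal degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
MAX_KM_FROM_CLUSTER = 25.0   # threshold taken from the README


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two (lat, lng) points in kilometres."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def accept_geocode(candidate: Coord, centroid: Coord,
                   max_km: float = MAX_KM_FROM_CLUSTER) -> bool:
    """Reject results farther than max_km from the cluster centroid."""
    return haversine_km(candidate, centroid) <= max_km
```

A result in the same city passes, while a same-named place in another city is dropped and the ticket falls back to the cluster centroid.
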
## Setup

```bash
uv sync
cp .env.example .env        # fill in DATABASE_URL, RUSTFS_*, GEOCODER_*
python run_migrations.py    # apply the schema (idempotent)
```

## Run

```bash
# ingest the newest INC CSV from the bucket (skip-if-unchanged, then archive)
python import_tickets.py --from-bucket --apply

# geocode (needs GEOCODER_API_KEY)
python import_tickets.py --geocode-clusters --apply    # coarse, once
python import_tickets.py --geocode-locations --apply   # precise, actionable INC

# from a local CSV instead of the bucket (dev)
python import_tickets.py --inc-csv 2026-06-15T17-00-00.csv --apply
```

Dry-run is the default (omit `--apply`). `import_tickets.py --from-bucket` talks to S3
via **boto3** using the `RUSTFS_*` environment variables (path-style addressing; no
aws-CLI dependency).

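To make the bucket step concrete, the following is a rough sketch of the list, pick-newest, and skip-if-unchanged flow using boto3 with path-style addressing. It is not the code in `import_tickets.py`. The environment variable names (`RUSTFS_ENDPOINT`, `RUSTFS_ACCESS_KEY`, `RUSTFS_SECRET_KEY`) and all function names are hypothetical, and picking the newest file by sorting keys assumes the timestamped file names sort chronologically.

```python
import os
from typing import Any, Dict, List, Optional

BUCKET = "tickets"
INC_PREFIX = "automations/inc/"
PROCESSED_PREFIX = INC_PREFIX + "processed/"


def make_client() -> Any:
    """Build an S3 client for rustfs using path-style addressing."""
    # Imported lazily so the pure helpers below work without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=os.environ["RUSTFS_ENDPOINT"],              # hypothetical name
        aws_access_key_id=os.environ["RUSTFS_ACCESS_KEY"],       # hypothetical name
        aws_secret_access_key=os.environ["RUSTFS_SECRET_KEY"],   # hypothetical name
        config=Config(s3={"addressing_style": "path"}),
    )


def list_inc_csvs(client: Any, bucket: str = BUCKET) -> List[Dict[str, Any]]:
    """List unprocessed INC CSV objects, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    found: List[Dict[str, Any]] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=INC_PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".csv") and not key.startswith(PROCESSED_PREFIX):
                found.append(obj)
    return found


def newest_csv(objects: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Pick the newest file; assumes timestamped keys sort chronologically."""
    return max(objects, key=lambda o: o["Key"], default=None)


def should_skip(obj: Dict[str, Any], last_etag: Optional[str]) -> bool:
    """True when the object's ETag matches the last processed ETag."""
    if last_etag is None:
        return False
    # S3 returns ETags wrapped in double quotes; compare the bare values.
    return obj["ETag"].strip('"') == last_etag.strip('"')
```

Passing the client in as an argument keeps the listing logic easy to exercise with a stub during development.
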
## Deploy (Coolify)

The repo ships a [`Dockerfile`](Dockerfile): a small batch worker with no web server.
Coolify builds it and keeps the container alive (`CMD tail -f /dev/null`); the ingest
runs as a **Scheduled Task**, not a system crontab:

- **Command:** `python import_tickets.py --from-bucket --apply`
- **Frequency:** `15 7-19 * * *` (`:15` past each hour, 07:00–19:00). If Coolify runs
  scheduled tasks in **UTC**, use `15 4-16 * * *` (EAT is UTC+3); if it exposes a
  per-task timezone, set `Africa/Nairobi` and keep `15 7-19 * * *`.
- **Env vars** (Coolify → Environment Variables): `DATABASE_URL` (internal DB host),
  `RUSTFS_*`, `GEOCODER_*`.

Skip-if-unchanged makes a run on an already-ingested snapshot a cheap no-op.

For a plain host/VM instead of Coolify, [`run_ingest.sh`](run_ingest.sh) loads `.env`
and runs the ingest; schedule it with a crontab line
(`CRON_TZ=Africa/Nairobi` / `15 7-19 * * *`).

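The EAT-to-UTC shift in the cron expression is a fixed three-hour offset, which the following small sketch double-checks. It uses a fixed UTC+3 offset rather than a tz database lookup, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# EAT is a fixed UTC+3 offset with no daylight saving time.
EAT = timezone(timedelta(hours=3), "EAT")


def eat_hour_to_utc(hour: int, minute: int = 15) -> int:
    """Convert an EAT wall-clock hour to the matching UTC hour."""
    local = datetime(2026, 1, 1, hour, minute, tzinfo=EAT)  # the date is arbitrary
    return local.astimezone(timezone.utc).hour


def utc_cron_for_eat_window(first_hour: int, last_hour: int, minute: int = 15) -> str:
    """Build an hourly cron expression in UTC for an EAT hour window."""
    start, end = eat_hour_to_utc(first_hour, minute), eat_hour_to_utc(last_hour, minute)
    if start > end:
        raise ValueError("window wraps past midnight UTC; split it into two ranges")
    return f"{minute} {start}-{end} * * *"


print(utc_cron_for_eat_window(7, 19))  # 15 4-16 * * *
```
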
## Notes

- The n8n export writes a **full current-state CSV per hour** to
  `automations/inc/<EAT-timestamp>.csv`, with no `latest` pointer, no metadata envelope,
  and no deltas. The loader lists the prefix, takes the **newest** file, and ingests it.
- **Skip-if-unchanged:** the newest file's S3 **ETag** is compared to the last processed
  file's ETag (stored in `tickets.import_meta.metadata.source_etag`); if equal, the DB write
  is skipped (the export re-emits byte-identical content most hours).
- **Upsert on `ticket_id`** (PRIMARY KEY): duplicate rows are impossible, and rows are never
  deleted, so closed-ticket history accumulates. On success the file is **moved** to
  `automations/inc/processed/`.
- **Cleaning at ingest:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
  `week_start`/`week_end`, `source_s3_*`/`source_snapshot_id`, `department`/`source_type`;
  normalize `region` → lowercase and `raw_status` → UPPERCASE. `service_type` and `bucket`
  (a `closed`/`pending` flag) are kept.
- `tickets.import_meta` captures snapshot freshness (surfaced as `summary.freshness` by
  `fn_tickets_for_map`).
- The curated/geocoded coordinates are written with `verified = false`; review
  `tickets.geo_clusters` / `tickets.geo_locations` and flip `verified` once checked.

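The cleaning rules listed above could look roughly like this when applied to rows read with `csv.DictReader`. This is a sketch under assumptions, not the loader's code: it assumes the sentinel shows up as a field value that starts with `EXPORT STOPPED`, that `is_alarm` arrives as the text `true`, and the function name `clean_rows` is hypothetical.

```python
from typing import Dict, Iterable, Iterator

DROP_COLUMNS = {
    "week_start", "week_end",
    "source_snapshot_id",
    "department", "source_type",
}
DROP_PREFIXES = ("source_s3_",)
SENTINEL_PREFIX = "EXPORT STOPPED"  # only the documented prefix is matched


def clean_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield cleaned rows, skipping alarms and the export sentinel."""
    for row in rows:
        # Drop alarm rows (assumes the flag is serialized as "true").
        if str(row.get("is_alarm", "")).strip().lower() == "true":
            continue
        # Drop the sentinel row (assumed to appear as a field value).
        if any(str(value).startswith(SENTINEL_PREFIX) for value in row.values()):
            continue
        cleaned = {
            key: value
            for key, value in row.items()
            if key not in DROP_COLUMNS and not key.startswith(DROP_PREFIXES)
        }
        # Normalize casing; service_type and bucket pass through untouched.
        if cleaned.get("region"):
            cleaned["region"] = cleaned["region"].lower()
        if cleaned.get("raw_status"):
            cleaned["raw_status"] = cleaned["raw_status"].upper()
        yield cleaned
```

Because each cleaned row is still a plain dict, it can be stored as the `raw` jsonb value without further mapping.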