fleettickets/README.md
david kiania 68f2b99cd3 feat: S3 via boto3 + Dockerfile for Coolify deploy
- Replace the aws-CLI subprocess calls with boto3 (list_objects_v2 paginator,
  get_object, copy_object+delete_object) using path-style addressing + RUSTFS_*
  env. Removes the external aws-CLI dependency so it runs in a slim container.
- Add boto3 to pyproject dependencies.
- Add Dockerfile (python:3.12-slim, deps, TZ=Africa/Nairobi, keep-alive CMD) and
  .dockerignore for Coolify; document Coolify Scheduled Task setup in README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 20:08:05 +03:00

104 lines
5.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# fleettickets
Field-ops **INC ticket** ingestion, geocoding, and read-schema that powers the
**Tickets** map in FleetOps. Extracted from the `tracksolid` repo into its own module
(it previously lived there as migrations 2123 + `tools/import_tickets.py`).
- **INC** — incident / customer-fault tickets *(this pipeline is **strictly INC**)*
- **CRQ** — new-installation requests *(schema kept, but **out of scope** — not ingested here)*
## What this owns
| Piece | What |
|---|---|
| `migrations/01_tickets_schema.sql` | The `tickets` schema: `tickets.inc` / `tickets.crq` (raw-jsonb-first), `tickets.geo_clusters` + `tickets.geo_locations` gazetteers, geom-resolution trigger, and `reporting.fn_tickets_for_map` (the GeoJSON read function) |
| `migrations/02_import_meta.sql` | `tickets.import_meta` (per-dataset snapshot envelope metadata) + `fn_tickets_for_map` re-defined to expose it as `summary.freshness` (same signature — dashboard_api unchanged) |
| `import_tickets.py` | Ingests the **newest INC CSV** from the rustfs `tickets` bucket (`automations/inc/<EAT-timestamp>.csv`) and upserts on `ticket_id`; geocodes clusters + INC locations |
| `run_migrations.py` | Applies `migrations/*.sql` in order (ledger: `tickets.schema_migrations`) |
| `shared.py` | Minimal DB/logging helpers (self-contained — no tracksolid dependency) |
## What this does NOT own (stays where it is)
- **The DB** — the `tickets` schema lives in the shared `tracksolid_db`.
- **The read-API** — `dashboard_api` (in the tracksolid stack) serves
`GET /webhook/tickets`, which calls `reporting.fn_tickets_for_map` (defined here).
- **The frontend** — the Tickets map is a tab in the **FleetOps** SPA (`fleetops` repo).
## Data model (raw-first)
Each row is just `ticket_id` + `raw` (the full source record as `jsonb`) + a derived
`geom` / `geo_source`. Everything reads from `raw`, so a change to the source schema
needs no migration. `geom` is resolved: **feed** coords (`raw` lat/lng) → **location**
(geocoded `location_name`) → **cluster** centroid → **none**.
Source coordinates are empty in the feed, so geocoding is required:
- `--geocode-clusters` — one coordinate per cluster (coarse fallback).
- `--geocode-locations` — precise per-location for **actionable INC** tickets: strips the
network codes from `location_name` (e.g. `NW_`, `ADR_MNT_`, `FDT<n>`, `SDUS`), geocodes
the real place via a **keyed** provider (LocationIQ / OpenCage), and **rejects any result
>25 km from the cluster centroid** (wrong-city guard). Results cache in
`tickets.geo_locations`.
## Setup
```bash
uv sync
cp .env.example .env # fill in DATABASE_URL, RUSTFS_*, GEOCODER_*
python run_migrations.py # apply the schema (idempotent)
```
## Run
```bash
# ingest the newest INC CSV from the bucket (skip-if-unchanged, then archive)
python import_tickets.py --from-bucket --apply
# geocode (needs GEOCODER_API_KEY)
python import_tickets.py --geocode-clusters --apply # coarse, once
python import_tickets.py --geocode-locations --apply # precise, actionable INC
# from a local CSV instead of the bucket (dev)
python import_tickets.py --inc-csv 2026-06-15T17-00-00.csv --apply
```
Dry-run is the default (omit `--apply`). `import_tickets.py --from-bucket` talks to S3
via **boto3** using the `RUSTFS_*` env (path-style addressing; no aws-CLI dependency).
## Deploy (Coolify)
The repo ships a [`Dockerfile`](Dockerfile) — a small batch worker with no web server.
Coolify builds it and keeps the container alive (`CMD tail -f /dev/null`); the ingest
runs as a **Scheduled Task**, not a system crontab:
- **Command:** `python import_tickets.py --from-bucket --apply`
- **Frequency:** `15 7-19 * * *` (`:15` past each hour, 07:0019:00). If Coolify runs
scheduled tasks in **UTC**, use `15 4-16 * * *` (EAT is UTC+3); if it exposes a
per-task timezone, set `Africa/Nairobi` and keep `15 7-19 * * *`.
- **Env vars** (Coolify → Environment Variables): `DATABASE_URL` (internal DB host),
`RUSTFS_*`, `GEOCODER_*`.
Skip-if-unchanged makes a run on an already-ingested snapshot a cheap no-op.
For a plain host/VM instead of Coolify, [`run_ingest.sh`](run_ingest.sh) loads `.env`
and runs the ingest; schedule it with a crontab line
(`CRON_TZ=Africa/Nairobi` / `15 7-19 * * *`).
## Notes
- The n8n export writes a **full current-state CSV per hour** to
`automations/inc/<EAT-timestamp>.csv` — no `latest` pointer, no metadata envelope, no
deltas. The loader lists the prefix, takes the **newest** file, and ingests it.
- **Skip-if-unchanged:** the newest file's S3 **ETag** is compared to the last processed
file's ETag (stored in `tickets.import_meta.metadata.source_etag`); if equal, the DB write
is skipped (the export re-emits byte-identical content most hours).
- **Upsert on `ticket_id`** (PRIMARY KEY) — duplication is impossible; rows are never
deleted, so closed-ticket history accumulates. On success the file is **moved** to
`automations/inc/processed/`.
- **Cleaning at ingest:** drop `is_alarm=true` rows + the `EXPORT STOPPED…` sentinel; drop
`week_start`/`week_end`, `source_s3_*`/`source_snapshot_id`, `department`/`source_type`;
normalize `region` → lowercase and `raw_status` → UPPERCASE. `service_type` and `bucket`
(a `closed`/`pending` flag) are kept.
- `tickets.import_meta` captures snapshot freshness (surfaced as `summary.freshness` by
`fn_tickets_for_map`).
- The curated/geocoded coordinates are written `verified = false` — review
`tickets.geo_clusters` / `tickets.geo_locations` and flip `verified` once checked.