The S3 source switched from full hourly snapshots at automations/inc/<ts>.csv to an incremental CDC stream at automations/inc/changes/<ts>.csv (first file = full baseline, each later file = only the rows that changed, keyed by ticket_id; no deletions). The loader still pointed at the old root path and only ingested the single newest file, so after the switch it found nothing (no new tickets ingested) and, even with the path fixed, would silently drop intermediate deltas.

Changes:
- point ingestion at automations/inc/changes/ (_CHANGE_KEY_RE)
- ingest EVERY not-yet-processed file in ascending timestamp order (baseline first, then each delta), upserting each
- replace the single-ETag skip with a per-file timestamp watermark (import_meta.metadata->>'source_max_key'); rows + watermark commit in one txn per file, then archive to processed/, so a mid-run failure leaves a consistent, resumable state
- docs: rename n8n-hourly-s3-full-data-exports.md -> n8n-s3-ticket-exports.md and rewrite it for the incremental stream; fix the reference in docs/phase-1-ingestion.md

Verified live against prod: re-seeded baseline + 5 deltas (26,529 rows), files archived to processed/, watermark advanced, re-run is a no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
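The file-selection step described in the commit message (every not-yet-processed change file, oldest first, gated by a timestamp watermark) can be sketched roughly as follows. This is only an illustration: the pattern below is a hypothetical stand-in for _CHANGE_KEY_RE, the helper name pending_keys is invented, and the timestamp format is assumed to match the 2026-06-15T17-00-00.csv style used elsewhere in this README.

```python
import re

# Hypothetical pattern; the real _CHANGE_KEY_RE in import_tickets.py is not shown here.
CHANGE_KEY_RE = re.compile(
    r"^automations/inc/changes/(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})\.csv$"
)


def pending_keys(keys, watermark):
    """Return change-file keys newer than the watermark, oldest first.

    `watermark` is the timestamp of the last committed file, or None on a
    fresh database (in which case the baseline file is included).
    """
    stamped = []
    for key in keys:
        match = CHANGE_KEY_RE.match(key)
        if match is None:
            continue  # not a timestamped file directly under changes/
        ts = match.group(1)
        if watermark is None or ts > watermark:
            stamped.append((ts, key))
    # Zero-padded timestamps sort chronologically as plain strings.
    return [key for _, key in sorted(stamped)]


keys = [
    "automations/inc/changes/2026-06-15T09-00-00.csv",
    "automations/inc/changes/2026-06-15T07-00-00.csv",
    "automations/inc/changes/2026-06-15T08-00-00.csv",
    "automations/inc/processed/2026-06-14T19-00-00.csv",
]
print(pending_keys(keys, "2026-06-15T07-00-00"))
```

Each returned key would then be upserted and its timestamp written as the new watermark in the same transaction, which is what makes a re-run after a mid-run failure pick up at the first uncommitted file.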
fleettickets
Field-ops INC ticket ingestion, geocoding, and read-schema that powers the
Tickets map in FleetOps. Extracted from the tracksolid repo into its own module
(it previously lived there as migrations 21–23 + tools/import_tickets.py).
- INC — incident / customer-fault tickets (this pipeline is strictly INC)
- CRQ — new-installation requests (schema kept, but out of scope — not ingested here)
What this owns
| Piece | What |
|---|---|
| migrations/01_tickets_schema.sql | The tickets schema: tickets.inc / tickets.crq (raw-jsonb-first), tickets.geo_clusters + tickets.geo_locations gazetteers, geom-resolution trigger, and reporting.fn_tickets_for_map (the GeoJSON read function) |
| migrations/02_import_meta.sql | tickets.import_meta (per-dataset snapshot envelope metadata) + fn_tickets_for_map re-defined to expose it as summary.freshness (same signature, so dashboard_api is unchanged) |
| migrations/03_inc_columns.sql | Unpacks tickets.inc.raw into typed STORED generated columns (status, cluster, region, team, owner, sla_status, mttr, lat/lng, is_* booleans, and EAT→timestamptz timestamps via tickets.eat_ts()). Computed for all rows + auto-populated on every ingest; raw stays the source of truth |
| migrations/04_inc_latlng.sql | Redefines latitude/longitude to COALESCE(feed, ST_Y/ST_X(geom)) so they're populated from the geocoded position (feed is always empty); precision per geo_source (location vs cluster centroid) |
| migrations/05_inc_geography.sql | Adds geog geography(Point,4326) (= geom::geography) + GiST index for routing: ST_Distance/ST_DWithin/KNN in real metres (nearest-vehicle, radius search) |
| migrations/06_inc_mttr_minutes.sql | mttr generated column → integer minutes (source is decimal hours); drops the constant is_alarm/is_auto_created/is_auto_closed columns (kept in raw). is_actionable retained |
| migrations/07_inc_drop_service_type.sql | Drops the constant service_type column (always inc; kept in raw) |
| migrations/08_inc_open_sla_view.sql | tickets.inc_open_sla view: open (is_actionable) tickets with derived SLA (hours_open, sla_state vs 48h; clock = created_at_service ∥ first_seen_at), plus team/cluster/geog for dispatch |
| migrations/09_inc_dashboard_fn.sql | reporting.fn_inc_dashboard(cluster, status, window, from, to): one JSON payload (window / open GeoJSON / closed GeoJSON / metrics / freshness) powering the FleetOps live INC map. Open=live, closed=windowed (EAT calendar / custom); filters AND |
| migrations/10_inc_history_capture.sql | History for time-series: tickets.closure_events (append-only observed closures) + tickets.inc_daily_snapshot (per-EAT-day open backlog + flow), populated by tickets.capture_history() each ingest. Unlocks backlog-over-time |
| import_tickets.py | Ingests the newest INC CSV from the rustfs tickets bucket (automations/inc/<EAT-timestamp>.csv) and upserts on ticket_id; geocodes clusters + INC locations |
| run_migrations.py | Applies migrations/*.sql in order (ledger: tickets.schema_migrations) |
| shared.py | Minimal DB/logging helpers (self-contained, no tracksolid dependency) |
What this does NOT own (stays where it is)
- The DB: the tickets schema lives in the shared tracksolid_db.
- The read-API: dashboard_api (in the tracksolid stack) serves GET /webhook/tickets, which calls reporting.fn_tickets_for_map (defined here).
- The frontend: the Tickets map is a tab in the FleetOps SPA (fleetops repo).
Data model (raw-first)
Each row is ticket_id + raw (the full source record as jsonb) + a derived
geom / geo_source. Everything reads from raw, so a change to the source schema
needs no migration. For convenient typed/indexable access, raw is also unpacked
into STORED generated columns (migration 03) — e.g. normalized_status, cluster,
region, assigned_team, owner, sla_status, mttr, is_actionable,
created_at_service/closed_at (as EAT→timestamptz). These stay in lock-step with
raw automatically (no loader change); raw remains the source of truth. geom is resolved: feed coords (raw lat/lng) → location
(geocoded location_name) → cluster centroid → none.
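The resolution order above (feed, then location, then cluster centroid, then none) amounts to a first-non-null pick that also records where the point came from. In the repo this is done by the geom-resolution trigger in SQL; the Python below is only an illustrative sketch, and the function name and the (lat, lng) tuple representation are assumptions.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # (lat, lng)


def resolve_geom(
    feed: Optional[Point],
    location: Optional[Point],
    cluster: Optional[Point],
) -> Tuple[Optional[Point], str]:
    """Pick the most precise available point and report its geo_source."""
    candidates = (("feed", feed), ("location", location), ("cluster", cluster))
    for source, point in candidates:
        if point is not None:
            return point, source
    return None, "none"


# Feed coordinates are empty, so the geocoded location wins over the centroid.
print(resolve_geom(None, (-1.2676, 36.8108), (-1.2921, 36.8219)))
```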
Source coordinates are empty in the feed, so geocoding is required:
- --geocode-clusters: one coordinate per cluster (coarse fallback).
- --geocode-locations: precise per-location for actionable INC tickets. It strips the network codes from location_name (e.g. NW_, ADR_MNT_, FDT<n>, SDUS), geocodes the real place via a keyed provider (LocationIQ / OpenCage), and **rejects any result more than 25 km from the cluster centroid** (wrong-city guard). Results cache in tickets.geo_locations.
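The wrong-city guard is essentially a great-circle distance check against the cluster centroid. A minimal sketch, assuming a haversine distance and a "keep if within 25 km" rule; the function names are hypothetical and the loader's actual distance computation is not shown in this README.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
MAX_KM_FROM_CENTROID = 25.0  # threshold quoted in the text above


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accept_geocode(result, centroid, max_km=MAX_KM_FROM_CENTROID):
    """Keep a geocoder hit only if it lies within max_km of the cluster centroid."""
    return haversine_km(result[0], result[1], centroid[0], centroid[1]) <= max_km


nairobi_centroid = (-1.2921, 36.8219)
print(accept_geocode((-1.2676, 36.8108), nairobi_centroid))  # a few km away
print(accept_geocode((-4.0435, 39.6682), nairobi_centroid))  # Mombasa, hundreds of km away
```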
Columns on tickets.inc
| Column | Type | Notes |
|---|---|---|
| ticket_id | text (PK) | e.g. WOT0715527 |
| raw | jsonb | full source record, the source of truth |
| normalized_status · raw_status | text | use normalized_status for filtering (canonical) |
| bucket | text | lifecycle: closed / pending |
| is_actionable | boolean | the open/closed flag (open = true) |
| cluster · region · location_name | text | region lowercased; cluster feeds the gazetteer |
| assigned_team · owner | text | closure attribution dimensions |
| sla_status | text | source Compliant/Breached; only meaningful once closed |
| mttr | numeric | minutes (source is decimal hours); null until closed |
| created_at_service · scheduled_at · closed_at · first_seen_at · last_seen_at · source_created_at · source_updated_at | timestamptz | EAT→UTC via tickets.eat_ts(). lifecycle = created_at_service→closed_at; export bookkeeping = first_seen_at/last_seen_at/source_* |
| latitude · longitude | double precision | COALESCE(feed, geocoded), populated from geom |
| geom | geometry(Point,4326) | display / the map |
| geog | geography(Point,4326) | routing: metres-accurate distance (GiST indexed) |
| geo_source | text | precision: feed / location / cluster / none |
| ingested_at | timestamptz | when we last upserted this row |
Dropped from the unpacked columns (still in raw): service_type, is_alarm,
is_auto_created, is_auto_closed (all single-cardinality), plus the ingest-time
drops below. reporting.fn_tickets_for_map reads from raw and serves the map;
tickets.inc_open_sla is the open-ticket SLA view for dashboards/dispatch.
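The timestamp columns are described as EAT→UTC conversions done by tickets.eat_ts() in SQL. As a rough Python equivalent for checking values by hand: the sketch below assumes the source strings look like "YYYY-MM-DD HH:MM:SS" (the real feed format is not documented here) and uses a fixed +03:00 offset, which is safe because EAT has no daylight saving.

```python
from datetime import datetime, timedelta, timezone

# East Africa Time is a fixed UTC+3 offset (no DST).
EAT = timezone(timedelta(hours=3), "EAT")


def eat_to_utc(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a naive EAT wall-clock string and return an aware UTC datetime.

    Empty or blank input maps to None, mirroring a NULL column.
    The default format is an assumption for illustration.
    """
    if not text or not text.strip():
        return None
    local = datetime.strptime(text.strip(), fmt).replace(tzinfo=EAT)
    return local.astimezone(timezone.utc)


print(eat_to_utc("2026-06-15 17:00:00"))  # 2026-06-15 14:00:00+00:00
```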
Setup
uv sync
cp .env.example .env # fill in DATABASE_URL, RUSTFS_*, GEOCODER_*
python run_migrations.py # apply the schema (idempotent)
Run
# ingest the newest INC CSV from the bucket (skip-if-unchanged, then archive)
python import_tickets.py --from-bucket --apply
# geocode (needs GEOCODER_API_KEY)
python import_tickets.py --geocode-clusters --apply # coarse, once
python import_tickets.py --geocode-locations --apply # precise, actionable INC
# from a local CSV instead of the bucket (dev)
python import_tickets.py --inc-csv 2026-06-15T17-00-00.csv --apply
Dry-run is the default (omit --apply). import_tickets.py --from-bucket talks to S3
via boto3 using the RUSTFS_* env (path-style addressing; no aws-CLI dependency).
Deploy (Coolify)
The repo ships a Dockerfile — a small batch worker with no web server.
Coolify builds it and keeps the container alive (CMD tail -f /dev/null); the ingest
runs as a Scheduled Task, not a system crontab:
- Command: python import_tickets.py --from-bucket --apply
- Frequency: 15 7-19 * * * (:15 past each hour, 07:15–19:15 EAT). This Coolify instance runs scheduled tasks in EAT (Africa/Nairobi), so no UTC conversion is needed.
- Env vars (Coolify → Environment Variables): DATABASE_URL (internal DB host), RUSTFS_*, GEOCODER_*.
Skip-if-unchanged makes a run on an already-ingested snapshot a cheap no-op.
For a plain host/VM instead of Coolify, run_ingest.sh loads .env
and runs the ingest; schedule it with a crontab line
(CRON_TZ=Africa/Nairobi / 15 7-19 * * *).
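To sanity-check what the 15 7-19 * * * schedule means in practice, the run times can be enumerated and shifted to UTC. This is a small arithmetic sketch, not part of the repo; the helper names are invented and the fixed +03:00 offset stands in for Africa/Nairobi.

```python
from datetime import date, datetime, time, timedelta, timezone

EAT = timezone(timedelta(hours=3), "EAT")


def daily_runs_eat():
    """Run times produced by '15 7-19 * * *': minute 15 of hours 7 through 19."""
    return [time(hour, 15) for hour in range(7, 20)]


def to_utc(run, on_day):
    """Convert one EAT run time on a given day to an aware UTC datetime."""
    local = datetime.combine(on_day, run, tzinfo=EAT)
    return local.astimezone(timezone.utc)


runs = daily_runs_eat()
day = date(2026, 6, 15)
print(len(runs), runs[0], runs[-1])                               # 13 07:15:00 19:15:00
print(to_utc(runs[0], day).time(), to_utc(runs[-1], day).time())  # 04:15:00 16:15:00
```

So on a scheduler that runs in UTC and has no CRON_TZ support, the same window would be hours 4 through 16.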
Notes
- The n8n export writes a full current-state CSV per hour to automations/inc/<EAT-timestamp>.csv: no latest pointer, no metadata envelope, no deltas. The loader lists the prefix, takes the newest file, and ingests it.
- Skip-if-unchanged: the newest file's S3 ETag is compared to the last processed file's ETag (stored in tickets.import_meta.metadata.source_etag); if equal, the DB write is skipped (the export re-emits byte-identical content most hours).
- Upsert on ticket_id (PRIMARY KEY): duplication is impossible; rows are never deleted, so closed-ticket history accumulates. On success the file is moved to automations/inc/processed/.
- Cleaning at ingest: drop is_alarm=true rows + the EXPORT STOPPED… sentinel; drop week_start/week_end, source_s3_*/source_snapshot_id, department/source_type; normalize region → lowercase and raw_status → UPPERCASE. service_type and bucket (a closed/pending flag) are kept.
- tickets.import_meta captures snapshot freshness (surfaced as summary.freshness by fn_tickets_for_map).
- The curated/geocoded coordinates are written verified = false; review tickets.geo_clusters / tickets.geo_locations and flip verified once checked.
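The cleaning rules in the notes above can be expressed as one pass over the CSV rows. The sketch below is an assumption-laden illustration rather than the loader's code: it assumes rows arrive as dicts of strings (as from csv.DictReader), that is_alarm is the literal text "true", and that the sentinel row can be recognised by a ticket_id starting with "EXPORT STOPPED".

```python
DROP_EXACT = {"week_start", "week_end", "source_snapshot_id", "department", "source_type"}
DROP_PREFIXES = ("source_s3_",)
SENTINEL_PREFIX = "EXPORT STOPPED"  # assumption: the sentinel shows up in ticket_id


def clean_rows(rows):
    """Apply the ingest-time drops and normalizations to a list of row dicts."""
    cleaned = []
    for row in rows:
        if str(row.get("is_alarm", "")).strip().lower() == "true":
            continue  # alarm rows are not ingested
        if str(row.get("ticket_id", "")).startswith(SENTINEL_PREFIX):
            continue  # export sentinel, not a ticket
        out = {
            key: value
            for key, value in row.items()
            if key not in DROP_EXACT and not key.startswith(DROP_PREFIXES)
        }
        if out.get("region"):
            out["region"] = out["region"].strip().lower()
        if out.get("raw_status"):
            out["raw_status"] = out["raw_status"].strip().upper()
        cleaned.append(out)
    return cleaned


sample = [
    {"ticket_id": "WOT0715527", "is_alarm": "false", "region": "Nairobi",
     "raw_status": "closed", "week_start": "2026-06-15", "source_s3_key": "x",
     "service_type": "inc", "bucket": "closed"},
    {"ticket_id": "WOT0715528", "is_alarm": "true", "region": "Coast", "raw_status": "open"},
    {"ticket_id": "EXPORT STOPPED", "is_alarm": "false"},
]
print(clean_rows(sample))
```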
Querying
-- map payload (GeoJSON + summary, incl. summary.freshness) — what dashboard_api serves
SELECT reporting.fn_tickets_for_map(); -- open-only by default
SELECT reporting.fn_tickets_for_map(p_open_only := false); -- all geocoded tickets
-- open tickets by SLA (derived) + by cluster — via the view
SELECT sla_state, count(*) FROM tickets.inc_open_sla GROUP BY 1;
SELECT cluster, count(*), round(avg(hours_open),1) AS avg_hrs
FROM tickets.inc_open_sla GROUP BY 1 ORDER BY 2 DESC;
-- closures per day (EAT); swap closed_at for created_at_service to count creations
SELECT (closed_at AT TIME ZONE 'Africa/Nairobi')::date AS d, count(*)
FROM tickets.inc WHERE closed_at IS NOT NULL GROUP BY 1 ORDER BY 1 DESC;
-- open-backlog-over-time (accrues from first capture; one row per EAT day)
SELECT snapshot_date, open_total, open_breached, closed_today
FROM tickets.inc_daily_snapshot ORDER BY snapshot_date DESC;
-- nearest open tickets to a vehicle (lng, lat) — metres, index-accelerated KNN
SELECT ticket_id, cluster, hours_open,
round(ST_Distance(geog, ST_SetSRID(ST_MakePoint(:lng,:lat),4326)::geography))::int AS metres
FROM tickets.inc_open_sla
ORDER BY geog <-> ST_SetSRID(ST_MakePoint(:lng,:lat),4326)::geography
LIMIT 10;
Data-quality & SLA notes
Findings to keep in mind (see the PRD for detail):
- Source sla_status is only meaningful for closed tickets. It reads Compliant for essentially all open tickets, so for open work use the derived state in tickets.inc_open_sla (now() − created_at_service vs the contract's 48h).
- created_at_service is missing on ~30% of rows (incl. most open ones); the SLA view falls back to first_seen_at and flags it via sla_clock_source.
- mttr is not wall-clock closed_at − created_at_service, and the source's Breached/Compliant does not match a plain 48h threshold. Pin the contract's exact SLA definition before trusting cross-field SLA math.
- Content lag: the feed's file timestamps are current, but the ticket content trails ~2 days (the underlying …wm_task.xlsx source), so creation/closure dates run a couple of days behind wall-clock.
- History: tickets.inc is current-state (upsert). Closure/creation/MTTR event series work directly off closed_at / created_at_service. Backlog-over-time now accrues via tickets.inc_daily_snapshot (one row per EAT day, written by tickets.capture_history() each ingest); observed closures log to tickets.closure_events. Past backlog can't be reconstructed; the series builds from the first capture onward.
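The derived open-ticket SLA described above (clock = created_at_service, falling back to first_seen_at, compared against 48h) can be reproduced outside the database for spot checks. This is a hedged sketch: the real logic lives in the tickets.inc_open_sla view, and the sla_state labels and sla_clock_source values used here are placeholders, not the view's actual strings.

```python
from datetime import datetime, timedelta, timezone

SLA_HOURS = 48  # the contract threshold quoted above


def open_sla(now, created_at_service, first_seen_at):
    """Derive hours_open and an SLA state for one open ticket."""
    if created_at_service is not None:
        clock, source = created_at_service, "created_at_service"
    else:
        clock, source = first_seen_at, "first_seen_at"  # fallback clock
    hours_open = (now - clock) / timedelta(hours=1)
    state = "breached" if hours_open > SLA_HOURS else "within_sla"  # hypothetical labels
    return {
        "hours_open": round(hours_open, 1),
        "sla_state": state,
        "sla_clock_source": source,
    }


now = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)
print(open_sla(now, now - timedelta(hours=50), now - timedelta(hours=49)))
print(open_sla(now, None, now - timedelta(hours=10)))
```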
Status / roadmap
Live: INC ingestion deployed on Coolify (hourly 15 7-19 * * * EAT), schema +
generated columns + geocoding + the inc_open_sla view in tracksolid_db.
Next (Phase 2): time-series analytics (closure rate, MTTR/SLA trends), then FleetNow
vehicle dispatch off geog, and team closure attribution. CRQ is a
separate future project that will reuse this machinery against automations/crq/.