No description
Find a file
david kiania df054c92be feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive)
Rework import_tickets.py from the retired JSON `latest.json` model to the new
hourly full-snapshot CSV export. Strictly INC (CRQ out of scope).

- Ingest the newest automations/inc/<EAT-timestamp>.csv; skip-if-unchanged by
  comparing S3 ETag to tickets.import_meta.metadata.source_etag.
- Upsert on ticket_id (PK; no dups, never delete -> closure history accrues).
  No truncate. On success, move processed files to automations/inc/processed/.
- Clean at ingest: drop is_alarm=true + the "EXPORT STOPPED..." sentinel; drop
  week_*, source_s3_*/source_snapshot_id, department/source_type; lowercase
  region, uppercase raw_status; keep service_type + bucket.
- Force path-style S3 addressing; --inc-csv for local dev; --from-bucket for cron.
- Add migrations/02 (import_meta + freshness); refresh README/.env.example/docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 19:33:16 +03:00
migrations feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
.env.example feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
.gitignore feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
import_tickets.py feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
n8n-hourly-s3-full-data-exports.md feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
n8n-s3-export-workflows.md feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
pyproject.toml feat: fleettickets — INC/CRQ ticket ingestion, geocoding + read-schema 2026-06-11 20:13:50 +03:00
README.md feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive) 2026-06-15 19:33:16 +03:00
run_migrations.py feat: fleettickets — INC/CRQ ticket ingestion, geocoding + read-schema 2026-06-11 20:13:50 +03:00
shared.py feat: fleettickets — INC/CRQ ticket ingestion, geocoding + read-schema 2026-06-11 20:13:50 +03:00

fleettickets

Field-ops INC ticket ingestion, geocoding, and read-schema that powers the Tickets map in FleetOps. Extracted from the tracksolid repo into its own module (it previously lived there as migrations 2123 + tools/import_tickets.py).

  • INC — incident / customer-fault tickets (this pipeline is strictly INC)
  • CRQ — new-installation requests (schema kept, but out of scope — not ingested here)

What this owns

Piece What
migrations/01_tickets_schema.sql The tickets schema: tickets.inc / tickets.crq (raw-jsonb-first), tickets.geo_clusters + tickets.geo_locations gazetteers, geom-resolution trigger, and reporting.fn_tickets_for_map (the GeoJSON read function)
migrations/02_import_meta.sql tickets.import_meta (per-dataset snapshot envelope metadata) + fn_tickets_for_map re-defined to expose it as summary.freshness (same signature — dashboard_api unchanged)
import_tickets.py Ingests the newest INC CSV from the rustfs tickets bucket (automations/inc/<EAT-timestamp>.csv) and upserts on ticket_id; geocodes clusters + INC locations
run_migrations.py Applies migrations/*.sql in order (ledger: tickets.schema_migrations)
shared.py Minimal DB/logging helpers (self-contained — no tracksolid dependency)

What this does NOT own (stays where it is)

  • The DB — the tickets schema lives in the shared tracksolid_db.
  • The read-APIdashboard_api (in the tracksolid stack) serves GET /webhook/tickets, which calls reporting.fn_tickets_for_map (defined here).
  • The frontend — the Tickets map is a tab in the FleetOps SPA (fleetops repo).

Data model (raw-first)

Each row is just ticket_id + raw (the full source record as jsonb) + a derived geom / geo_source. Everything reads from raw, so a change to the source schema needs no migration. geom is resolved: feed coords (raw lat/lng) → location (geocoded location_name) → cluster centroid → none.

Source coordinates are empty in the feed, so geocoding is required:

  • --geocode-clusters — one coordinate per cluster (coarse fallback).
  • --geocode-locations — precise per-location for actionable INC tickets: strips the network codes from location_name (e.g. NW_, ADR_MNT_, FDT<n>, SDUS), geocodes the real place via a keyed provider (LocationIQ / OpenCage), and **rejects any result

    25 km from the cluster centroid** (wrong-city guard). Results cache in tickets.geo_locations.

Setup

uv sync
cp .env.example .env        # fill in DATABASE_URL, RUSTFS_*, GEOCODER_*
python run_migrations.py    # apply the schema (idempotent)

Run

# ingest the newest INC CSV from the bucket (skip-if-unchanged, then archive)
python import_tickets.py --from-bucket --apply

# geocode (needs GEOCODER_API_KEY)
python import_tickets.py --geocode-clusters  --apply   # coarse, once
python import_tickets.py --geocode-locations --apply   # precise, actionable INC

# from a local CSV instead of the bucket (dev)
python import_tickets.py --inc-csv 2026-06-15T17-00-00.csv --apply

Dry-run is the default (omit --apply). import_tickets.py --from-bucket shells out to the aws CLI using the RUSTFS_* env (no boto3 dependency). Hourly on the instance via cron 5 * * * * (a few minutes after each export).

Notes

  • The n8n export writes a full current-state CSV per hour to automations/inc/<EAT-timestamp>.csv — no latest pointer, no metadata envelope, no deltas. The loader lists the prefix, takes the newest file, and ingests it.
  • Skip-if-unchanged: the newest file's S3 ETag is compared to the last processed file's ETag (stored in tickets.import_meta.metadata.source_etag); if equal, the DB write is skipped (the export re-emits byte-identical content most hours).
  • Upsert on ticket_id (PRIMARY KEY) — duplication is impossible; rows are never deleted, so closed-ticket history accumulates. On success the file is moved to automations/inc/processed/.
  • Cleaning at ingest: drop is_alarm=true rows + the EXPORT STOPPED… sentinel; drop week_start/week_end, source_s3_*/source_snapshot_id, department/source_type; normalize region → lowercase and raw_status → UPPERCASE. service_type and bucket (a closed/pending flag) are kept.
  • tickets.import_meta captures snapshot freshness (surfaced as summary.freshness by fn_tickets_for_map).
  • The curated/geocoded coordinates are written verified = false — review tickets.geo_clusters / tickets.geo_locations and flip verified once checked.