david kiania f2408f113e docs: add docs/ — phase-1/phase-2 PRDs + implementation record

- docs/phase-1-ingestion.md  — Phase 1 PRD (INC hourly CSV ingestion; deployed)
- docs/phase-2-dashboard.md  — Phase 2 PRD (inc_dashboard read-API for FleetOps map)
- docs/implementation.md     — as-built record (pipeline, migrations 01-08, deploy, DQ)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-16 01:05:18 +03:00

5.1 KiB

Raw Blame History

PRD (Phase 1) — INC hourly CSV ingestion → tracksolid_db → FleetOps Tickets map

Status: complete and deployed (migrations 01–08, boto3 loader, geocoding, Coolify hourly 15 7-19 * * * EAT). This document is the record of the Phase-1 plan; see README.md and docs/implementation.md for the as-built state.

Scope: INC only

This workflow is strictly for INC (incident / customer-fault tickets). It ingests only automations/inc/<EAT-timestamp>.csv. CRQ (new-installation) exports at automations/crq/ are out of scope and are not processed here; the field transforms below are likewise INC-only.

Context

The client (Rahamafresh / Fireside) runs an n8n workflow that exports field-ops tickets to our S3-compatible bucket every hour:

automations/inc/<EAT-timestamp>.csv — incidents / customer faults (in scope)
automations/crq/<EAT-timestamp>.csv — new-installation requests (out of scope)

(See n8n-hourly-s3-full-data-exports.md. Sample: 2026-06-15T17-00-00.csv.)

fleettickets owns the downstream: the tickets schema in the shared tracksolid_db (raw-jsonb-first tickets.inc, geocoding gazetteers, and reporting.fn_tickets_for_map, which dashboard_api serves to the FleetOps "Tickets" tab). tickets.crq keeps existing but is not fed by this pipeline.

The problem: the loader was written for the old export model — JSON {metadata, records} envelopes at a stable automations/inc/latest.json. That model is gone; the new exports are flat CSV, timestamped per hour, with no latest pointer, no envelope, and no deltas — every hourly file is a full current-state snapshot.

Two driving objectives this pipeline feeds:

SLA tracking — contract requires tickets closed within 48h of created_at_service; closed carry source sla_status + mttr, open need a derived state (now − created_at_service ≥48h breached / ≥36h at-risk).
Vehicle routing (most important) — accurately geocoded open tickets so FleetNow can route nearest vehicles; subsequent: team closure attribution.

Data contract (verified against live snapshots)

32 columns; header + double-quoted values. INC sample = 31,434 rows.
ticket_id is the primary key; the same ticket recurs across snapshots as it moves open → closed. Verified: 31,434 distinct ids per file, 0 in-file dups, same id set every hour (0 added/dropped) → upsert is the dedup mechanism, no TRUNCATE. Consecutive files are often byte-identical → skip-if-unchanged.
is_alarm=true (~10,132 rows, all is_actionable=false) → dropped.
latitude/longitude are empty in the feed → geocoding required.
A garbage sentinel row (ticket_id = "EXPORT STOPPED DUE TO EXCESSIVE SIZE…") is commonly the first data line → filtered by ticket_id prefix.
Timestamps (filenames + data) are EAT (Africa/Nairobi, UTC+3).
bucket is meaningful (closed/pending), distinct from source_s3_bucket.

Approach

Keep the raw-jsonb-first model and everything downstream; only the loader's input path changes: JSON-latest → newest timestamped CSV, plus move-on-success.

Newest file per automations/inc/ (parse YYYY-MM-DDTHH-mm-ss.csv), via boto3 (path-style; no aws-CLI dependency).
Skip-if-unchanged: compare newest S3 ETag to the last processed ETag (tickets.import_meta.metadata.source_etag); equal → skip DB write.
Cleaning at ingest: drop is_alarm=true + sentinel; drop week_start, week_end, source_s3_bucket, source_s3_key, source_snapshot_id, department, source_type; normalize region→lowercase, raw_status→UPPERCASE; keep service_type* and bucket. (*service_type later dropped as constant.)
Upsert on ticket_id (ON CONFLICT DO UPDATE); never delete → closure history accumulates. On success move the file(s) to automations/inc/processed/.
Record snapshot freshness in tickets.import_meta.
Geocoding unchanged: --geocode-clusters (coarse) + --geocode-locations (precise, actionable INC; keyed LocationIQ; 25 km wrong-city guard).

Orchestration

Deployed on Coolify (own app, Dockerfile, keep-alive worker). Ingest runs as a Scheduled Task: python import_tickets.py --from-bucket --apply, cron 15 7-19 * * * in EAT. Env: DATABASE_URL, RUSTFS_*, GEOCODER_*.

Data-quality findings (carried into Phase 2)

Source sla_status ≠ a plain 48h rule, and mttr is not wall-clock — pin the contract's SLA definition before trusting cross-field SLA math.
created_at_service is null on ~30% of rows (incl. most open) → needs a fallback clock (first_seen_at).
Split timestamp semantics: lifecycle = created_at_service→closed_at; export bookkeeping = created_at/updated_at/first_seen_at/last_seen_at.
assigned_team missing ~34% (owner better).
Content lag ~2 days (underlying …wm_task.xlsx source date).

Outcome (as built)

Live in tracksolid_db: tickets.inc (raw + typed generated columns), geocoded to ~99.99%, alarm/sentinel removed, hourly refresh with ETag skip + archive. See docs/implementation.md.

5.1 KiB Raw Blame History Unescape Escape