8 commits

Author SHA1 Message Date
david kiania
509338c076 feat(import_tickets): migrate INC ingest to isptickets bucket + --reseed cutover
Provider moved the INC CDC feed to a new bucket (tickets -> isptickets, new
per-bucket creds; same s3.rahamafresh.com endpoint, identical 32-col schema).
This is config + a one-time reseed, not a rewrite — the loader already drains
automations/inc/changes/ oldest->newest with a source_max_key watermark.

- default _BUCKET -> isptickets (TICKETS_BUCKET still overrides)
- add --reseed: ignore the stored watermark and drain every changes/ file once
  (the old-bucket watermark may post-date the new bucket's first file). Crash-safe
  via the existing per-file watermark-advance + archive loop.
- refresh stale "newest-file / full-snapshot-per-hour" docstring/comments to the
  CDC reality; .env.example + README updated (new bucket + reseed runbook).

Verified live dry-run: 41/41 files drained (watermark None), alarm/sentinel
filter active, exit 0.
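
A minimal sketch of how the --reseed flag might interact with the stored watermark. The function name `pending_keys` is hypothetical, and the sketch assumes the keys under changes/ embed a timestamp that sorts lexicographically; the loader's actual selection logic is not shown in this commit.

```python
def pending_keys(keys, watermark, reseed=False):
    """Return the change-file keys still to drain, oldest first.

    Assumption: keys embed a timestamp that sorts lexicographically,
    so plain string comparison orders them oldest -> newest.
    """
    ordered = sorted(keys)
    if reseed or watermark is None:
        # --reseed (or no watermark yet): drain every file once.
        return ordered
    # Normal run: only files strictly newer than the stored watermark.
    return [k for k in ordered if k > watermark]
```

With a watermark carried over from the old bucket that post-dates every file in the new bucket, a normal run would select nothing, while `reseed=True` selects all files.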

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 18:20:15 +03:00
david kiania
a4b90a33d8 fix(inc): ingest the incremental changes/ stream (baseline + deltas)
The S3 source switched from full hourly snapshots at
automations/inc/<ts>.csv to an incremental CDC stream at
automations/inc/changes/<ts>.csv (first file = full baseline, each later
file = only the rows that changed, keyed by ticket_id; no deletions).

The loader still pointed at the old root path and only ingested the single
newest file, so after the switch it found nothing (no new tickets ingested)
and, even with the path fixed, would silently drop intermediate deltas.

Changes:
- point ingestion at automations/inc/changes/ (_CHANGE_KEY_RE)
- ingest EVERY not-yet-processed file in ascending timestamp order
  (baseline first, then each delta), upserting each
- replace the single-ETag skip with a per-file timestamp watermark
  (import_meta.metadata->>'source_max_key'); rows + watermark commit in one
  txn per file, then archive to processed/ — so a mid-run failure leaves a
  consistent, resumable state
- docs: rename n8n-hourly-s3-full-data-exports.md -> n8n-s3-ticket-exports.md
  and rewrite it for the incremental stream; fix the reference in
  docs/phase-1-ingestion.md

Verified live against prod: re-seeded baseline + 5 deltas (26,529 rows),
files archived to processed/, watermark advanced, re-run is a no-op.
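
The "rows + watermark commit in one txn per file" idea can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database as a stand-in for the real Postgres schema, and the table layout, `make_db`, `ingest_file` and `watermark` are hypothetical names, not the loader's code.

```python
import sqlite3


def make_db():
    """Create a throwaway database with simplified stand-in tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inc (ticket_id TEXT PRIMARY KEY, raw_status TEXT)")
    conn.execute(
        "CREATE TABLE import_meta (name TEXT PRIMARY KEY, source_max_key TEXT)"
    )
    conn.commit()
    return conn


def ingest_file(conn, key, rows):
    """Upsert one file's rows and advance the watermark atomically."""
    with conn:  # commits on success, rolls back if anything raises
        conn.executemany(
            "INSERT INTO inc (ticket_id, raw_status) VALUES (?, ?) "
            "ON CONFLICT(ticket_id) DO UPDATE SET raw_status = excluded.raw_status",
            rows,
        )
        conn.execute(
            "INSERT INTO import_meta (name, source_max_key) VALUES ('inc', ?) "
            "ON CONFLICT(name) DO UPDATE SET source_max_key = excluded.source_max_key",
            (key,),
        )


def watermark(conn):
    """Return the stored watermark, or None if nothing was ingested yet."""
    row = conn.execute(
        "SELECT source_max_key FROM import_meta WHERE name = 'inc'"
    ).fetchone()
    return row[0] if row else None
```

Because the watermark only moves when the file's rows commit, a failure part-way through a delta leaves the previous watermark in place and the next run retries that file. Archiving to processed/ would happen after the commit, outside this transaction.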

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:37:17 +03:00
david kiania
e71c8914f1 feat(geocode): two-pass estate fallback for building-level location_names
Building-level names (e.g. 'KAHAWA WENDANI ALVO HOUSE') aren't in OSM, so the
precise forward-geocode 404s and tickets stay on the bare cluster centroid
(observed 0/133 placed). geocode_locations now tries an ordered set of
candidates per location (compose_queries): full precise -> estate (leading 2
tokens) -> leading token, each constrained by the existing cluster viewbox +
25km distance check, accepting the FIRST in-range hit. This places tickets in
the right neighbourhood (e.g. 'KAHAWA WENDANI', 'BAMBURI') instead of the broad
cluster centroid. Wrong-area matches for ambiguous coarse tokens are rejected by
the distance check and fall through; genuinely unmatchable tickets keep the
honest cluster-centroid fallback (no pure-cluster candidate, which would only
mislabel the centroid as geo_source='location'). Verified the cascade finds
hits against live LocationIQ on real samples.
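
The candidate cascade described above might look roughly like this. It is a sketch under assumptions: `compose_queries` is named in the commit but its real signature is not shown, `first_in_range` and `haversine_km` are hypothetical helpers, the geocoder is injected as a plain callable returning `(lat, lon)` or `None`, and the viewbox constraint is left out for brevity.

```python
import math


def compose_queries(location_name):
    """Ordered candidates: full name, leading 2 tokens, leading token."""
    tokens = location_name.split()
    raw = [" ".join(tokens), " ".join(tokens[:2]), " ".join(tokens[:1])]
    queries = []
    for q in raw:
        if q and q not in queries:  # de-duplicate short names
            queries.append(q)
    return queries


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def first_in_range(location_name, centroid, geocode, max_km=25.0):
    """Return (query, (lat, lon)) for the first in-range hit, else None."""
    for query in compose_queries(location_name):
        hit = geocode(query)
        if hit is not None and haversine_km(hit, centroid) <= max_km:
            return query, hit
    return None  # caller keeps the cluster-centroid fallback
```

Returning `None` rather than the centroid keeps the "honest fallback" decision with the caller, so an unmatched ticket is never labelled as a location hit.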

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 18:51:58 +03:00
david kiania
dca2c94c75 fix: address valid findings from 20260618 bug report
Verified each finding against the code (+ profiled the 31k-row CSV sample);
implemented only the genuinely valid fixes:

- import_tickets.py: fold _record_meta into the upsert transaction so rows +
  snapshot meta commit atomically (BUG 2); guard _ts_from_key against
  regex-matching-but-invalid dates so the sort can't crash (BUG 11);
  extract_place now splits glued NW prefixes (~1.7k rows, e.g. NWKIAMBU→KIAMBU)
  and only drops a trailing '-<seg>' when it's a unit/instruction code, keeping
  real-word tails like '-MALL' (BUG 14). Scoped glued-split to NW only —
  CO/NE/SE begin real words (COAST/NEW/SEASONS) per the data.
- Dockerfile + pyproject.toml: install from pyproject (single source of truth)
  instead of mirroring deps; add build-system + py-modules so `pip install .`
  works for the flat-module layout (BUG 9).
- migrations/03_inc_columns.sql: document the eat_ts IMMUTABLE/tzdata footgun
  and the manual-recompute path (BUG 6).
- .gitignore: narrow *.json → *.local.json so real fixtures can be versioned;
  ignore build/ and *.egg-info/ (BUG 10).

Reclassified/skipped as invalid or by-design: BUG 1, 3, 4, 5, 7, 8, 12, 13.
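
As an illustration of the BUG 11 guard, here is a sketch of a key parser that tolerates names the regex accepts but that are not valid dates. The key format in `KEY_RE` is an assumption made for the example (the real pattern is not shown here), and `ts_from_key` and `sorted_keys` are hypothetical stand-ins for the repo's helpers.

```python
import re
from datetime import datetime

# Assumed key shape for illustration: .../YYYY-MM-DDTHH-MM-SS.csv
KEY_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2})-(\d{2})-(\d{2})\.csv$")


def ts_from_key(key):
    """Parse the timestamp in a key; None if absent or not a real date."""
    match = KEY_RE.search(key)
    if match is None:
        return None
    try:
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:
        # Digits matched the pattern but form an impossible date (e.g. month 13).
        return None


def sorted_keys(keys):
    """Sort parseable keys oldest first, skipping unparseable ones."""
    dated = [(ts_from_key(k), k) for k in keys]
    return [k for ts, k in sorted((ts, k) for ts, k in dated if ts is not None)]
```

Filtering out `None` before sorting avoids comparing `None` with `datetime`, which would raise a `TypeError`.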

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 13:41:38 +03:00
david kiania
764dee986f feat: history capture — closure_events + daily backlog snapshot (migration 10)
- tickets.closure_events: append-only observed closures (PK ticket_id, closed_at;
  observed_at = first sighting; survives row churn).
- tickets.inc_daily_snapshot: one row per EAT day — open backlog (+ SLA split, by
  cluster/status) and created/closed flow; upserted each run.
- tickets.capture_history(): appends new closures + upserts today's snapshot.
- import_tickets calls it after each --apply run (ingest or skip); add
  --capture-history CLI flag for standalone runs.
Verified: backfilled 21,282 closures; today's snapshot recorded (open_total 30).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:19:23 +03:00
david kiania
68f2b99cd3 feat: S3 via boto3 + Dockerfile for Coolify deploy
- Replace the aws-CLI subprocess calls with boto3 (list_objects_v2 paginator,
  get_object, copy_object+delete_object) using path-style addressing + RUSTFS_*
  env. Removes the external aws-CLI dependency so it runs in a slim container.
- Add boto3 to pyproject dependencies.
- Add Dockerfile (python:3.12-slim, deps, TZ=Africa/Nairobi, keep-alive CMD) and
  .dockerignore for Coolify; document Coolify Scheduled Task setup in README.
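
A hedged sketch of the boto3 calls named above (list_objects_v2 paginator, copy_object + delete_object, path-style addressing). The helper names and parameters are hypothetical; in the real module the endpoint and credentials come from RUSTFS_* environment variables whose exact names are not listed in this commit.

```python
def make_client(endpoint_url, access_key, secret_key):
    """Build an S3 client with path-style addressing for an S3-compatible store."""
    import boto3  # imported lazily so the helpers below work with any client
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        config=Config(s3={"addressing_style": "path"}),
    )


def list_keys(s3, bucket, prefix):
    """List every key under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return sorted(keys)


def archive_object(s3, bucket, key, processed_prefix):
    """'Move' an object: S3 has no rename, so copy then delete the source."""
    dest = processed_prefix + key.rsplit("/", 1)[-1]
    s3.copy_object(
        Bucket=bucket, Key=dest, CopySource={"Bucket": bucket, "Key": key}
    )
    s3.delete_object(Bucket=bucket, Key=key)
    return dest
```

Deleting only after the copy succeeds means an interrupted move leaves the source in place rather than losing the file.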

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 20:08:05 +03:00
david kiania
df054c92be feat: INC hourly-CSV ingestion (newest-file, ETag dedup, clean + archive)
Rework import_tickets.py from the retired JSON `latest.json` model to the new
hourly full-snapshot CSV export. Strictly INC (CRQ out of scope).

- Ingest the newest automations/inc/<EAT-timestamp>.csv; skip-if-unchanged by
  comparing S3 ETag to tickets.import_meta.metadata.source_etag.
- Upsert on ticket_id (PK; no dups, never delete -> closure history accrues).
  No truncate. On success, move processed files to automations/inc/processed/.
- Clean at ingest: drop is_alarm=true + the "EXPORT STOPPED..." sentinel; drop
  week_*, source_s3_*/source_snapshot_id, department/source_type; lowercase
  region, uppercase raw_status; keep service_type + bucket.
- Force path-style S3 addressing; --inc-csv for local dev; --from-bucket for cron.
- Add migrations/02 (import_meta + freshness); refresh README/.env.example/docs.
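
The clean-at-ingest rules listed above could be expressed roughly as below. This is a sketch, not the module's code: `clean_row` is a hypothetical name, the row is assumed to be a dict of strings (as from `csv.DictReader`), and since the commit does not say which column carries the "EXPORT STOPPED..." sentinel, the sketch checks every value for that prefix.

```python
DROP_PREFIXES = ("week_", "source_s3_")
DROP_COLUMNS = {"source_snapshot_id", "department", "source_type"}
SENTINEL_PREFIX = "EXPORT STOPPED"  # full sentinel text is elided in the commit


def clean_row(row):
    """Return a cleaned copy of one CSV row, or None if it must be dropped."""
    if str(row.get("is_alarm", "")).strip().lower() == "true":
        return None  # alarms are out of scope
    if any(str(value).startswith(SENTINEL_PREFIX) for value in row.values()):
        return None  # export sentinel row, not a ticket (assumed: any column)
    cleaned = {
        name: value
        for name, value in row.items()
        if name not in DROP_COLUMNS and not name.startswith(DROP_PREFIXES)
    }
    if cleaned.get("region") is not None:
        cleaned["region"] = cleaned["region"].lower()
    if cleaned.get("raw_status") is not None:
        cleaned["raw_status"] = cleaned["raw_status"].upper()
    return cleaned
```

Columns not named in the drop rules, such as service_type and bucket, pass through unchanged.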

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 19:33:16 +03:00
david kiania
4631cc6382 feat: fleettickets — INC/CRQ ticket ingestion, geocoding + read-schema
Standalone module extracted from the tracksolid repo (was migrations 21-23 +
tools/import_tickets.py). Owns the `tickets` schema in the shared tracksolid_db.

- migrations/01_tickets_schema.sql: consolidated final-state schema (tickets.inc/
  crq raw-jsonb-first, geo_clusters + geo_locations gazetteers, geom trigger,
  reporting.fn_tickets_for_map)
- import_tickets.py: rustfs bucket ingest + cluster/location geocoding
  (LocationIQ/OpenCage, viewbox-bounded + cluster-distance guard)
- run_migrations.py, shared.py (self-contained), pyproject, .env.example, README

The DB stays in tracksolid_db; dashboard_api keeps serving /webhook/tickets; the
Tickets map stays a FleetOps tab.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:13:50 +03:00