Building-level names (e.g. 'KAHAWA WENDANI ALVO HOUSE') aren't in OSM, so the
precise forward-geocode 404s and tickets stay on the bare cluster centroid
(observed 0/133 placed). geocode_locations now tries an ordered set of
candidates per location (compose_queries): full precise -> estate (leading 2
tokens) -> leading token, each constrained by the existing cluster viewbox +
25km distance check, accepting the FIRST in-range hit. This places tickets in
the right neighbourhood (e.g. 'KAHAWA WENDANI', 'BAMBURI') instead of the broad
cluster centroid. Wrong-area matches for ambiguous coarse tokens are rejected by
the distance check and fall through; genuinely unmatchable tickets keep the
honest cluster-centroid fallback (no pure-cluster candidate, which would only
mislabel the centroid as geo_source='location'). Verified the cascade finds
hits against live LocationIQ on real samples.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified each finding against the code (+ profiled the 31k-row CSV sample);
implemented only the genuinely valid fixes:
- import_tickets.py: fold _record_meta into the upsert transaction so rows +
snapshot meta commit atomically (BUG 2); guard _ts_from_key against
regex-matching-but-invalid dates so the sort can't crash (BUG 11);
extract_place now splits glued NW prefixes (~1.7k rows, e.g. NWKIAMBU→KIAMBU)
and only drops a trailing '-<seg>' when it's a unit/instruction code, keeping
real-word tails like '-MALL' (BUG 14). Scoped glued-split to NW only —
CO/NE/SE begin real words (COAST/NEW/SEASONS) per the data.
- Dockerfile + pyproject.toml: install from pyproject (single source of truth)
instead of mirroring deps; add build-system + py-modules so `pip install .`
works for the flat-module layout (BUG 9).
- migrations/03_inc_columns.sql: document the eat_ts IMMUTABLE/tzdata footgun
and the manual-recompute path (BUG 6).
- .gitignore: narrow *.json → *.local.json so real fixtures can be versioned;
ignore build/ and *.egg-info/ (BUG 10).
Reclassified/skipped as invalid or by-design: BUG 1, 3, 4, 5, 7, 8, 12, 13.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- tickets.closure_events: append-only observed closures (PK ticket_id, closed_at;
observed_at = first sighting; survives row churn).
- tickets.inc_daily_snapshot: one row per EAT day — open backlog (+ SLA split, by
cluster/status) and created/closed flow; upserted each run.
- tickets.capture_history(): appends new closures + upserts today's snapshot.
- import_tickets calls it after each --apply run (ingest or skip); add
--capture-history CLI flag for standalone runs.
Verified: backfilled 21,282 closures; today's snapshot recorded (open_total 30).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Replace the aws-CLI subprocess calls with boto3 (list_objects_v2 paginator,
get_object, copy_object+delete_object) using path-style addressing + RUSTFS_*
env. Removes the external aws-CLI dependency so it runs in a slim container.
- Add boto3 to pyproject dependencies.
- Add Dockerfile (python:3.12-slim, deps, TZ=Africa/Nairobi, keep-alive CMD) and
.dockerignore for Coolify; document Coolify Scheduled Task setup in README.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rework import_tickets.py from the retired JSON `latest.json` model to the new
hourly full-snapshot CSV export. Strictly INC (CRQ out of scope).
- Ingest the newest automations/inc/<EAT-timestamp>.csv; skip-if-unchanged by
comparing S3 ETag to tickets.import_meta.metadata.source_etag.
- Upsert on ticket_id (PK; no dups, never delete -> closure history accrues).
No truncate. On success, move processed files to automations/inc/processed/.
- Clean at ingest: drop is_alarm=true + the "EXPORT STOPPED..." sentinel; drop
week_*, source_s3_*/source_snapshot_id, department/source_type; lowercase
region, uppercase raw_status; keep service_type + bucket.
- Force path-style S3 addressing; --inc-csv for local dev; --from-bucket for cron.
- Add migrations/02 (import_meta + freshness); refresh README/.env.example/docs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>