Commit graph

95 commits

Author SHA1 Message Date
kianiadee
94cbd2a85e docs: document FleetNow merged dashboard + read-API topology + FIX-D03
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
- New '3. Map dashboards & read-API' subsection: the three SPAs (liveposition,
  fleetintelligence, fleetnow), how dashboard_api is deployed (standalone bridge
  container, not Coolify), and that FleetNow lives in its own repo.
- FIX-D03: fleetnow CORS origin + the deploy-script strip/guard fixes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:09:48 +03:00
kianiadee
d95e5c2dbd config(cors): allow fleetnow.rahamafresh.com origin on dashboard_api
The merged FleetNow dashboard (separate repo, Coolify) reads this read-API, so
its origin must be in DASHBOARD_CORS_ORIGINS. Added to the code default; live
config is set via the env in ~/deploy_dashboard_api.sh on the host.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 10:07:46 +03:00
david kiania
9986d3b411 docs(claude): add pending Grafana redeploy to open items (post ops/dwh_gold purge)
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 20:39:56 +03:00
david kiania
8c5a43f3b8 chore(db): purge unused ops + dwh_gold schemas
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Drop the dormant ops (workshop / tickets / dispatch / SLA / odometer)
and dwh_gold (nightly ETL aggregates) schemas plus their dependents —
features never implemented, no live writer or scheduled refresh.

- Prod DB (already applied): DROP SCHEMA ops/dwh_gold CASCADE, plus
  tracksolid.dispatch_log, v_sla_inflight, v_utilisation_daily.
- migrations/12_drop_ops.sql + 13_drop_dwh_gold.sql (forward, all
  IF EXISTS) registered in run_migrations.py for rebuild durability.
- grafana: removed 8 now-broken panels (In-flight SLA, Idle Cost,
  Utilisation Heatmap, Row 7 Field-Service SLAs) from daily_operations;
  panel count 21 -> 13.
- docs: scrubbed CLAUDE.md, PLATFORM_OVERVIEW.html (-19KB), DATA_FLOW.md;
  pre-drop seed snapshot in docs/reports/260605_ops_purge_backup.md.

The separate tracksolid_dwh server (31.97.44.246:5888) is unrelated
and untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 18:11:03 +03:00
david kiania
e060933c55 docs: record fleet-dashboard filter fix + v_trips self-refresh
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Update PLATFORM_OVERVIEW.html (§2 migration, §4 read-API, §5 refresh_log,
§7 ops notes) and CLAUDE.md §7 fix history (FIX-D01, FIX-D02) to reflect
the two 2026-06-05 fixes that closed out the n8n→fleetapi cutover:
form-urlencoded POST body parsing, and moving the reporting.v_trips
matview refresh from the retired n8n job into dashboard_api.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 16:58:20 +03:00
david kiania
30b351576c feat(api): self-refresh reporting.v_trips in dashboard_api
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
The Fleet Trips dashboard reads reporting.v_trips (a materialized view).
Its refresh was a scheduled n8n workflow; when n8n was retired the matview
froze (last refresh 2026-06-01) so the dashboard showed no recent trips
even though tracksolid.trips kept ingesting live.

Move the refresh into the owned stack: a background loop in dashboard_api
runs REFRESH MATERIALIZED VIEW CONCURRENTLY reporting.v_trips every
VTRIPS_REFRESH_INTERVAL_S (default 300s). Safe across uvicorn --workers
via a pg advisory lock (one worker refreshes per tick); runs in a thread
so the ~9s refresh never blocks the event loop; logs to
reporting.refresh_log (source='dashboard_api') for continuity. Uses a
dedicated autocommit connection because REFRESH ... CONCURRENTLY cannot
run inside a transaction block.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 16:25:45 +03:00
david kiania
f1387d1476 fix(api): parse form-urlencoded POST body in fleet-dashboard handler
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
The Fleet Trips SPA posts application/x-www-form-urlencoded, but the
POST /webhook/fleet-dashboard handler read the body with request.json().
That threw on every request, the except swallowed it to body={}, and all
filters (vehicle_numbers, cost_centre, assigned_city) plus period/dates
were dropped — so every query returned the full unfiltered fleet (1,266
trips) regardless of the dropdowns. The map/KPIs/trips never changed,
which read as "the dropdowns don't work."

Parse by Content-Type: urllib.parse.parse_qs for form bodies (no new
dependency — avoids python-multipart), JSON still accepted defensively
for n8n-compat callers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 13:23:10 +03:00
david kiania
26fa1a4dc5 docs: add Grafana dashboards appendix + link PLATFORM_OVERVIEW from CLAUDE.md
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Adds section 6 (Grafana dashboards) to PLATFORM_OVERVIEW.html, generated from
the provisioned dashboard JSON: every panel in the NOC Fleet (9 panels) and
Daily Operations (23 panels) dashboards with type and source view/table.
Renumbers Operational notes to section 7. Links the doc from the CLAUDE.md
codebase map.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:50:48 +03:00
david kiania
83a2d06148 docs: add PLATFORM_OVERVIEW.html — current-state platform reference
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Self-contained HTML reference generated from the live DB, documenting the
platform after the maps moved off n8n onto dashboard_api (fleetapi). Covers
architecture/data flow, the n8n→fleetapi migration, deployment topology,
the read-API endpoint reference, and the full database schema — every table
(with columns + row estimates), view, and function across tracksolid /
reporting / ops / dwh_gold / public — plus operational notes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:47:18 +03:00
david kiania
00e81a063b feat(db): capture reporting.* map-dashboard schema as migration 11
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
The reporting schema (fn_live_positions/fn_vehicle_track/fn_trips_for_map,
the v_trips materialized view + indexes, filter/summary views, refresh_log)
backs the dashboard_api map endpoints but existed only on the prod DB, in no
migration — a rebuild would have lost it. Captured the live DDL into
migrations/11_reporting_schema.sql (idempotent: IF NOT EXISTS / CREATE OR
REPLACE, search_path set for unqualified base-table refs, guarded grants) and
registered it in run_migrations.py. Verified it applies cleanly against prod
inside a rolled-back transaction.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:32:44 +03:00
david kiania
831f683b83 fix(api): expose /webhook/live-positions/track so map trail matches SPA path
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
The Live Positions SPA calls GET /webhook/live-positions/track, but the
read-API only exposed /webhook/vehicle-track. Clicking a vehicle to view its
1-hour trail therefore 404'd even after repointing N8N_BASE. Register the SPA's
actual path as a route alias to the same handler (vehicle-track kept as alias),
so the only frontend change remains the base URL. Docstring updated to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 00:54:58 +03:00
david kiania
5703d70aa6 feat(api): dedicated FastAPI read-API for map dashboards (replaces n8n)
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
n8n was a thin HTTP->SQL proxy for the Live Position and Fleet Trips maps and
proved fragile (credential reloads, :latest drift, shared connection limits).
This service calls the same proven reporting.* functions directly, reusing the
existing psycopg2 pool / Docker image / Coolify deploy.

Endpoints mirror the n8n webhook paths so the only frontend change is N8N_BASE:
  GET  /webhook/live-positions  -> {summary, geojson}   (fn_live_positions)
  GET  /webhook/vehicle-track   -> GeoJSON Feature       (fn_vehicle_track)
  GET  /webhook/fleet-dashboard -> filter options
  POST /webhook/fleet-dashboard -> trips payload         (fn_trips_for_map)

Response shapes replicate the n8n "Build response JSON" nodes exactly; empty
filters/sentinels ('', null, undefined) normalize to SQL wildcards. CORS limited
to the dashboard origins. Added dashboard_api service to docker-compose (port
8890, Coolify-routed). SQL contracts validated against prod.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 04:23:37 +03:00
david kiania
e5b0e192d8 chore(repo): reorganize tree into migrations/ data/ legacy/ docs/
Group root-level files (accreted from incremental changes) by purpose
without moving any deployment entrypoint or breaking imports:

- migrations/  : numbered SQL 02-10
- data/        : source CSVs
- legacy/      : superseded pre-_rev scripts + old pipeline notes (not deployed)
- docs/{manuals,reference,reports}/ : loose manuals, references, reports
- strip stray ** / *** prefixes from 5 doc filenames
- delete empty documents.txt / push_webhook.md

Reference updates so nothing breaks:
- run_migrations.py  -> /app/migrations/<file>
- run_migrations.sh  -> $SCRIPT_DIR/migrations
- import_drivers_csv.py -> data/<csv>
- docker-compose.yaml -> runbook path comment
- CLAUDE.md -> codebase map + inline doc references

Deployed Python (3 services + ts_shared_rev + run_migrations) and the
documented ops one-shots stay at root, preserving the flat-import layout
and all documented commands. Verified: py_compile clean across all modules,
every MIGRATIONS entry resolves under migrations/, CI-referenced paths
(tests/, mypy targets, db_audit) and the grafana build context intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 02:27:30 +03:00
david kiania
2309464ab8 FIX-M21: alarm cross-feed + stale-IMEI recovery for live_positions
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Cherry-pick of c8f5907 (originally FIX-M20 on main) onto
quality-program-2026-04-12 — renamed to FIX-M21 here to avoid clashing
with this branch's existing [FIX-M20] (trip enrichment, commit 144dede).
Behaviour and code are unchanged from the main-branch original; the
annotation tag is the only difference.

Background
----------
A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two
freshness gaps that share a single root cause: tracksolid.live_positions
was being written by only one path (the 60s polled sweep), and that path
silently omits devices that don't have a "current" fix in Jimi's
location.list response. Effect on the dashboard:

  * 18 vehicles show OFFLINE for days-to-months — last fix is whatever
    the sweep wrote before Jimi dropped them.
  * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback
    because their dedicated tracker has been silent; the camera's lat/lng
    arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but
    we discard it after writing to tracksolid.alarms.

Verified upstream subscription state: only /pushalarm is registered with
Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are
inactive. This change uses only data that already arrives.

What's in this commit
---------------------
ts_shared_rev.py
  * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None)
    — single time-guarded upsert all three writers will share. Guards on
    is_valid_fix() (filters Zero-Island and out-of-range) and
    EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or
    webhook retries can't rewind a fresher marker. COALESCE on optional
    columns so sparse callers don't blank dense ones' values.
  * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices
    whose live_positions.gps_time is NULL or older than the threshold,
    ordered NULLS FIRST so worst-offenders are in batch #1.
  * ensure_device(cur, imei, device_name=None) — relocated from
    webhook_receiver_rev so every live_positions writer can satisfy the
    FK without re-defining the helper. The original underscore-prefixed
    name in webhook_receiver_rev becomes a backwards-compat alias.

webhook_receiver_rev.py
  * /pushalarm — after the alarm row insert, call upsert_live_position
    with the alarm's lat/lng and alarmTime. Sits inside the existing
    per-item SAVEPOINT, so a cross-feed failure rolls back only that
    one alarm's cross-feed, not the alarm row.

ingest_movement_rev.py
  * poll_live_positions — inline INSERT replaced with upsert_live_position
    (extras dict carries the sweep-only columns). Same data, time-guarded.
  * get_device_locations — inline INSERT replaced; also gains an
    ensure_device call so it can be safely fed arbitrary IMEIs.
  * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and
    hands it to get_device_locations. Scheduled every 10 minutes plus a
    startup catch-up call. Uses jimi.device.location.get which returns
    *last-known* fix, so devices the 60s sweep drops can be re-warmed.

Expected post-deploy effect (estimates, see
06_live_location/260521_timescale_location_upgrade_major.md §4)
  * ~1,100-1,600 additional live_positions upserts/day from the alarm
    cross-feed, after the time-guard rejects ~70-80% of races vs the
    fresher 60s sweep.
  * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence
    (JC400P emits ~107 alarms/day per device).
  * 8-14 of the 24 OFFLINE plates expected to recover via location.get's
    last-known-fix path within the first 30 minutes.
  * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour.
  * No 06_live_location code changes required — reads through
    reporting.v_live_positions transparently.

Tests
-----
12 webhook integration tests pass (3 new: cross-feed fires on valid fix;
skips without lat/lng; skips Zero-Island). 8 new unit tests in
test_stale_imeis.py cover the stale selector, the poll wrapper, and the
time-guard contract on upsert_live_position. Full suite: 77 passed.

Deployment
----------
No schema migration. Both webhook_receiver and ingest_movement
containers must be rebuilt — source is image-baked, not bind-mounted.
Rollback is git revert + rebuild.

Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md
Verification playbook:  06_live_location/260521_timescale_location_upgrade_verification.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 22:33:21 +03:00
David Kiania
3b79d5a62e revert(infra): remove pgAdmin4 sidecar and configs
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 00:34:10 +03:00
David Kiania
bc020cb1a8 feat(infra): add pgAdmin4 web sidecar pointed at pgbouncer
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Phase 2 of the pgbouncer + pgAdmin rollout. pgAdmin4 runs as a Coolify-
managed container on the same Docker network as pgbouncer, with a
pre-registered server entry so the tracksolid_db (via pgbouncer) tree
appears immediately on first login.

Net effect: admin tooling moves on-VM (low latency, persistent workspace
in pgadmin-data volume) and connects through pgbouncer:6432 in transaction
mode, so opening many Query Tool tabs no longer exhausts max_connections.
The desktop pgAdmin can be retired once this is verified live, after
which host port 5433 can also be closed.

Requires PGADMIN_DEFAULT_EMAIL and PGADMIN_DEFAULT_PASSWORD in the
Coolify env, plus a subdomain mapping to this service on port 80.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 14:03:32 +03:00
David Kiania
f3ad612a1c fix(infra): drop pgbouncer image tag — pin 1.23.1 unavailable
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
The pinned tag failed to pull on Coolify deploy. Switching to the
untagged edoburu/pgbouncer (rolling latest) so the sidecar can come up.
Will revisit pinning to a known-good tag once verified live.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 13:48:31 +03:00
David Kiania
e811dd8f34 feat(infra): add pgbouncer sidecar to cap tracksolid_db connections
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 13:21:35 +03:00
David Kiania
737ca67712 feat(analytics): add v_driver_clock_daily/today views for tardiness monitoring
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Two read-only views in the tracksolid schema feeding n8n's working-hours
checks: per-IMEI per-Nairobi-day reporting/closing times, start/end
locations + Nominatim addresses, and trip-count/km/drive-hours context.
No policy embedded; cost-centre filtering and tardiness thresholds live
in n8n.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 14:03:40 +03:00
David Kiania
f94d14864f feat(trips): add --skip-geocode flag to backfill script
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
The historical trips table is much larger than the spec assumed (7,634
rows on prod, not the 8 the CLAUDE.md snapshot suggested). Reverse-geocoding
all of them via Nominatim's 1 req/sec TOS throttle would take ~4¼ hours
end-to-end.

--skip-geocode bypasses the Nominatim calls entirely. Geometry, plate, and
idle backfills run in minutes; addresses stay NULL on historical rows and
will only be populated for future trips by poll_trips.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 22:12:07 +03:00
David Kiania
144dedee90 feat(trips): [FIX-M20] enrich tracksolid.trips with coords, route polyline, addresses, plate
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Polling jimi.device.track.mileage does not return start/end coordinates,
fuel, idle, or trip sequence — leaving most trip columns NULL. This change
closes those gaps using data we already have in position_history plus a
best-effort Nominatim lookup.

Migration 09_trips_enrichment.sql adds:
  • route_geom (LineString), start_address, end_address, vehicle_plate,
    waypoints_count on tracksolid.trips
  • GIST indexes on the three geometry columns
  • view tracksolid.v_trips_enriched exposing daily_seq + trip_date_eat
    (replaces reliance on the device-supplied trip_seq, which is only
     populated when /pushtripreport fires)

ingest_movement_rev.py::poll_trips now:
  • extracts idleSecond from the poll response (was previously dropped)
  • per-trip: SELECTs start fix, end fix, ST_MakeLine route, and waypoint
    count from position_history within (start_time, end_time)
  • reverse-geocodes start/end via the new ts_shared_rev.reverse_geocode
    helper (Nominatim, LRU-cached at ~11m precision, 1 req/sec, never raises)
  • caches vehicle_plate from a per-cycle plates dict
  • ON CONFLICT preserves webhook-supplied data when /pushtripreport later
    delivers native coords/fuel/trip_seq

backfill_trips_enrichment.py is a one-shot script (dry-run by default,
--apply to commit, --imei / --since flags) that runs the same enrichment
against historical NULL rows and COALESCEs only — never overwrites.

DWH bronze mirrors and Grafana panels intentionally not touched (frozen
on this branch until the schema work lands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 21:30:20 +03:00
David Kiania
898fd25a5a feat(analytics): Phase 0 — analytics-config migration and CSV importer rewrite
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Phase 0 of the three-stakeholder analytics redesign:

- 08_analytics_config.sql: ops.cost_rates + ops.kpi_targets with seed
  fuel rates (KES 195/L NBO+MBA, UGX 5200/L KLA) and 6 seed KPI
  targets (utilisation_pct, idle_pct global+osp-patrol,
  fuel_kes_per_100km, mttr_hours, alarms_per_100km). Granted SELECT to
  grafana_ro. Wired into run_migrations.py MIGRATIONS.

- import_drivers_csv.py: full rewrite for the new Mitieng CSV
  (20260427_FSG_Vehicles_mitieng.csv). Snake_case columns, drops
  _infer_city() plate-prefix logic in favour of reading assigned_city
  directly. Adds cost_centre, assigned_route, vehicle_category,
  vehicle_brand, fuel_100km, depot_address. Treats the literal "NULL"
  string as missing. Reuses clean(), clean_num(), clean_ts(),
  get_conn(), get_logger() from ts_shared_rev. Special-cases numeric
  and timestamptz columns in the UPDATE clause.

- audit_device_reconciliation.py: read-only audit comparing the CSV
  against tracksolid.devices. Reports per-account row counts, IMEIs
  on one side only, and devices on both sides whose metadata is still
  NULL.

- 260427_device_reconciliation.md + 260427_audit_output.txt: Phase 0.2
  reconciliation record. First run: DB has 172 devices, CSV has 162,
  delta +10 (10 IMEIs in DB-only, mostly fireside-account auto-syncs).
  Importer run with --only-null --apply filled 154 rows; coverage now
  assigned_city 152/172, cost_centre 150/172.

Applied to stage on 2026-04-27 23:35 UTC.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 23:42:37 +03:00
David Kiania
5418fc48c5 fix(api): map new Mitieng CSV columns in tracksolid_update_v2
Switches column references from the old title-case logistics CSV
(IMEI, Device Name, License Plate No., Telephone, Fuel/100km, ...) to
the snake_case Mitieng export shipped on 2026-04-27 (imei, device_name,
vehicle_number, driver_phone, fuel_100km, ...). Without this, the bulk
device-update API tool fails with KeyError: 'IMEI' on the new CSV.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 23:42:20 +03:00
David Kiania
0b45f8d0f7 fix(grafana): raise geomap maxZoom from 12 to 22 for full-resolution drill-in
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Carto basemap tiles render up to ~19-20; OpenLayers caps at 28. 22 leaves
no practical ceiling for street-level inspection while keeping the EAC-
bounded minZoom in place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 18:32:14 +03:00
David Kiania
bf17d5fa80 fix(grafana): tighten Active Vehicles map to Kenya, Uganda and Tanzania
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Recentred geomap view from lat -2.0/lon 35.5/zoom 5 to lat -3.0/lon 34.5/
zoom 5.5 (Lake Victoria area, the geographic intersection of the three
countries) and raised minZoom to 5.5 so the dashboard can't be panned out
to show neighbouring countries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 18:31:01 +03:00
David Kiania
80c0e6510f fix(grafana): stop SI auto-scaling on km/hours stats; bound geomap to East Africa
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Grafana's lengthkm and h units auto-scale with SI prefixes — fleet km
totals rendered as "Mm" (megametres) and drive-hour totals as days/weeks,
which read as "millions" and "weeks" on the Daily Ops dashboard. Switched
the affected panels (Fleet km today, Drive/Idle hours today, the per-vehicle
roll-up table, the driver leaderboard, and the 7-day distance trend) to
unit "none" with decimals: 1 so values stay in km/h with units carried by
panel titles and column displayNames.

Geomap view recentred to lat -2.0, lon 35.5, zoom 5 with minZoom 5 /
maxZoom 12 so the Active Vehicles map opens on the East African Community
region and cannot zoom out past it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 17:25:58 +03:00
David Kiania
34f5fa1b9c feat(dwh): bronze pipeline migrations, runbook, and execution manual
DWH pipeline (new):
  - dwh/261001_dwh_control.sql — watermarks + per-run audit log schema
  - dwh/261002_bronze_constraints_audit.sql — ON CONFLICT key assertion
  - dwh/261003_dwh_roles.sql — dwh_owner / grafana_ro contract assertion
  - dwh/261004_dwh_observability_views.sql — v_table_freshness,
    v_recent_failures, v_watermark_lag (readable by grafana_ro)
  - docs/DWH_PIPELINE.md — operations runbook (setup, troubleshooting,
    manual re-run, back-fill, rotation)
  - DWH_Execution_Manual.md — reusable playbook for future data
    projects (extract → blob → load pattern, 7 design principles,
    snapshot-vs-incremental matrix, verification gates)
  - docs/superpowers/{specs,plans}/2026-04-24-n8n-dwh-bronze-pipeline-*
    — design spec + 27-task implementation plan

Security:
  - dwh/260423_dwh_ddl_v1.sql — redacted plaintext role passwords to
    'CHANGE_ME_BEFORE_APPLY' placeholders; added SECURITY header
    documenting generation + rotation flow

Docs:
  - CLAUDE.md — §3 adds tracksolid_dwh@31.97.44.246:5888 target,
    §4 adds dwh/ + docs/DWH_PIPELINE.md to codebase map, §5 adds
    bronze + dwh_control schema roll-up, §10 adds deploy task +
    password rotation follow-up

Also includes miscellaneous in-progress files accumulated on this
branch (workspace, analytics notes, vehicle CSVs, extract helpers,
renamed markdown archives).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 01:07:53 +03:00
David Kiania
85cb408dea feat(backup): timestamp and schedule in Africa/Nairobi local time
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
- Default TZ=Africa/Nairobi baked into the sidecar image; override via
  compose TZ env var if another region is ever needed.
- Rename BACKUP_TIMES_UTC → BACKUP_TIMES (legacy var still honored for
  back-compat). Times are now interpreted in the container's local TZ,
  so "02:30" means 02:30 EAT, not UTC.
- Log timestamps and dump filenames use %FT%T%z / %Y%m%d_%H%M%S_%Z
  (e.g. tracksolid_db_20260424_115729_EAT.sql.gz) so the TZ is visible
  on every artifact.
- Prune cutoff computed in local time; YYYYMMDD regex unchanged so it
  still matches legacy UTC filenames during the transition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 11:30:20 +03:00
David Kiania
c585e67482 feat(backup): run pg_dump multiple times per day via BACKUP_TIMES_UTC
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Replace the single BACKUP_HOUR/BACKUP_MINUTE slot with a comma-separated
list of UTC times. Scheduler walks all slots and sleeps until the soonest
future one, so four daily backups become a one-line env change:

    BACKUP_TIMES_UTC=02:30,08:30,14:30,20:30  (default)

Legacy BACKUP_HOUR/BACKUP_MINUTE still honored as a single slot for
backwards compatibility with existing .env files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 11:00:02 +03:00
David Kiania
3807d9554c fix(db): mount TimescaleDB HA volume at correct PGDATA path
The timescale/timescaledb-ha image uses /home/postgres/pgdata/data as
PGDATA, not /var/lib/postgresql/data. The previous mount pointed at an
empty directory that postgres never wrote to, so Coolify redeploys
destroyed all data with the container's overlay filesystem.

Pin PGDATA explicitly and move the named timescale-data volume to
/home/postgres/pgdata so the real data dir is persisted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 10:59:53 +03:00
David Kiania
fa110f4313 feat: [FIX-M19] multi-account ingest across fireside sub-accounts
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Fleet lives across three Tracksolid sub-accounts:
  fireside         —  63 devices
  Fireside@HQ      —  52 devices
  Fireside_MSA     —  41 devices

Previously sync_devices / poll_live_positions / poll_parking only
queried a single TARGET_ACCOUNT, so ~64% of the fleet was invisible to
the pipeline.

Changes:
  - ts_shared_rev.py: new TARGETS list (env TRACKSOLID_TARGETS,
    comma-separated; falls back to the single TARGET_ACCOUNT).
  - ts_shared_rev.py: new get_active_imeis_by_target() helper that
    groups active IMEIs by their stored account so parking calls can
    pass the right account param per batch.
  - ingest_movement_rev.py: sync_devices and poll_live_positions loop
    over every target and dedupe by IMEI before upserting. poll_parking
    loops over imeis_by_target so each batch carries the matching
    account.
  - CLAUDE.md: FIX-M19 entry.

Requires new env var TRACKSOLID_TARGETS="fireside,Fireside@HQ,Fireside_MSA"
on the ingest services in Coolify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 10:43:07 +03:00
David Kiania
417627675e fix: [FIX-M18] pull driverName/vehicleNumber/sim from detail endpoint
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
jimi.user.device.list returns null for vehicleName, vehicleNumber,
driverName, driverPhone, and sim even after those fields are set via
jimi.open.device.update — the values only surface through
jimi.track.device.detail. sync_devices() now reads from dtl first with
d as fallback, which unblocks backfill of the 144 CSV-driven updates
pushed on 2026-04-22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 18:21:25 +03:00
David Kiania
778686e7ce docs: CLAUDE.md audit — add backup sidecar, missing files, update open items
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 16:01:38 +03:00
David Kiania
108c1be057 feat: nightly pg_dump sidecar uploads to rustfs fleet-db bucket
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Adds a `db_backup` sidecar that dumps tracksolid_db every night at
02:30 UTC (configurable via BACKUP_HOUR/BACKUP_MINUTE), gzips the
output, and uploads to s3://fleet-db/daily/<dbname>_<ts>.sql.gz on
the rustfs S3-compatible instance (s3.rahamafresh.com). Prunes
objects older than BACKUP_KEEP_DAYS (default 30).

Required .env additions (Coolify UI):
  RUSTFS_ENDPOINT=https://s3.rahamafresh.com
  RUSTFS_ACCESS_KEY=...
  RUSTFS_SECRET_KEY=...
  RUSTFS_BUCKET=fleet-db

Mitigates data loss when Coolify service recreation wipes the
service-ID-scoped timescale-data volume.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:53:23 +03:00
David Kiania
257643cae2 fix: auto-register devices on push + allow CSV import to insert new rows
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Three changes that together close the FK-violation loop on /pushalarm:

1. import_drivers_csv.py: when an IMEI is in the CSV but not in
   tracksolid.devices, INSERT a new row instead of skipping. Unblocks
   the 140 X3/JC400P devices listed as a HIGH open item in CLAUDE.md §10.

2. webhook_receiver_rev.py: new _ensure_device() helper upserts a stub
   devices row (status='unknown') before inserting an alarm. Handles the
   third class of devices — not in API sync, not in CSV (e.g. the
   X3-63282 Kampala device flagged in CLAUDE.md §10).

3. CSV refreshed from Downloads (Apr 21 version, 140 active rows).

Also fixes alarm error log previously showing "None" (read deviceImei
instead of the integration push's imei field).

CSV import already applied live on the instance (63 → 201 devices).
Webhook patch requires a Coolify redeploy to pick up _ensure_device().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:29:32 +03:00
David Kiania
636dd2b8b0 fix: parse actual Jimi push format (msgType+data, field name remap)
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Diagnostic logging revealed the real Jimi integration push format:
  Content-Type: application/x-www-form-urlencoded
  Body: msgType=jimi.push.device.alarm&data=<URL-encoded JSON>

Differences from docs:
  - data is one JSON object per POST (not a data_list array)
  - alarm uses imei+alarmTime, NOT deviceImei+gateTime

_parse_request now reads form field `data` (falls back to `data_list`) and
JSON-decodes a single object or array. push_alarm handler accepts either
field naming for forward-compat.

Removes diagnostic INFO log now that format is confirmed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:10:08 +03:00
David Kiania
c54794eb4c diag: log raw push body + content-type at INFO level
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Temporary diagnostic to see what format Jimi actually sends on /pushalarm.
New container is parsing to empty items (pushes arrive but no DB insert),
so we need to see the real body shape. Remove once format is confirmed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:04:55 +03:00
David Kiania
ef36ebebea fix: handle JSON body push format from Jimi integration API
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Jimi's integration push API (tracksolidprodocs.jimicloud.com) sends
Content-Type: application/json with body {"token":"...","data_list":[...]},
not form-encoded. FastAPI Form() silently defaulted to "" so all pushes
were discarded with "Failed to parse data_list:" warnings.

Replaces per-endpoint Form() params with a shared _parse_request() helper
that tries JSON body first, falls back to form-encoded. All seven push
endpoints (pushobd, pushfaultinfo, pushalarm, pushgps, pushhb,
pushtripreport, pushevent) updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 11:44:08 +03:00
David Kiania
85d02c81a5 feat: Daily Operations dashboard + tracksolid analytics views
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Add a second Grafana dashboard focused on daily operational KPIs and live
dispatch, keeping the NOC Live dashboard untouched.

- grafana/provisioning/dashboards-json/daily_operations_dashboard.json
  New dashboard covering §7 Blueprint Panels 3-8 and the §4 dispatch lens:
  freshness banner, today-at-a-glance stat row, active vehicles map,
  currently-idle table, vehicles-not-moved-today, per-vehicle daily KPI
  roll-up, driver behaviour leaderboard, distance trend, alarm frequency,
  idle cost MTD, utilisation heatmap, SLA row (collapsed, data-gated).

- 07_analytics_views.sql
  Nine views in tracksolid.* wrapping the BA-file [DASHBOARD]-tagged
  queries. Each view carries COMMENT ON VIEW with its spec section.
  SELECT granted to grafana_ro. Smoke-tested against live DB.

- run_migrations.py
  Register 06 and 07 in MIGRATIONS list with idempotent seed checks so
  future fresh deploys apply them correctly.

- CLAUDE.md
  Retire the tracksolid_2 schema references (schema no longer exists);
  §9 Fleet State dated 2026-04-19 with correct pipeline status (running,
  875 runs/24h, 0 failures) and accurate position_history row counts
  (hypertable stats don't show in pg_stat_user_tables).

- docs/superpowers/specs/2026-04-19-daily-operations-dashboard-design.md
  Design spec covering architecture, views, panel layout, deployment,
  rollback, and known data gaps.
2026-04-19 13:44:18 +03:00
David Kiania
4371a0d6e6 docs: CLAUDE.md audit — add commands section, fix stale DB access note
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Add §0 Commands (uv, pytest, ruff, mypy, docker exec query pattern).
Fix §3 DB access: DATABASE_URL is internal-only since fix 152fce8.
Add docs/superpowers/ to codebase map.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 00:27:59 +03:00
David Kiania
18e7e668c0 docs: fleet intelligence pitch deck copy and one-pager (25 slides + A4 leave-behind)
Full slide-by-slide copy for elicitation pitch: 6 pain questions, feature
reveal, business case, optional add-ons (RustFS + DuckDB), and one-pager.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 17:05:07 +03:00
David Kiania
c6e4a227c8 docs: add blob storage and data warehouse as optional pitch products
RustFS (S3-compatible blob) and DuckDB (historical analytics) added as
optional add-on tiers with elicitation pain questions and tier model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 17:00:39 +03:00
David Kiania
9f4406d863 docs: fleet intelligence partner pitch design (elicitation method)
6-question pain-first pitch structure targeting regional GPS resellers,
with deck architecture, one-pager layout, and partnership model options.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 16:41:04 +03:00
David Kiania
152fce81a8 fix: point DATABASE_URL at timescale_db container (not legacy 31.97.44.246:5888)
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Ingest scripts were connecting to the old tracksolid_2 database instead of
the timescale_db container in this stack. Grafana was already correct
(uses service name timescale_db:5432). Also strip leading space and quotes
from DATABASE_URL and API_BASE_URL so os.getenv() returns clean values.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 15:43:49 +03:00
David Kiania
07ef491695 fix: change DB host port 5888→5433 (5888 already allocated by legacy DB)
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 14:19:20 +03:00
David Kiania
160f477318 infra: expose timescale_db port 5888 for direct pgcli access
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Maps host port 5888 → container port 5432 so the DB can be reached
directly from the MacBook (requires UFW allow 5888/tcp on the server).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 14:14:32 +03:00
David Kiania
244112154a docs: update CLAUDE.md with session learnings (18 Apr 2026)
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
- §3: note tracksolid_2 as live schema, tracksolid as empty target;
  add DB direct access tip (31.97.44.246:5888, leading space in .env)
- §4: add import_drivers_csv.py and migration 06 to codebase map
- §5: document tracksolid_2 live tables with column differences
  (assigned_team vs cost_centre, city vs assigned_city); add ops.*
- §8: add rule 9 (Forgejo API auth via keychain) and rule 10
  (always check active schema before querying)
- §9: update fleet state — pipeline stopped Apr 6, CSV fleet pending,
  0 driver names, 19 stale positions
- §10: replace driver-name manual item with deploy + CSV import tasks

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 12:26:21 +03:00
David Kiania
274473c544 docs: update analytics report with live DB state (18 Apr 2026)
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
- §1: add current deployment state table — 63 devices, 0 driver names,
  5 trips, pipeline stopped 6 Apr (401 token expiry); note tracksolid_2
  vs tracksolid schema split
- §6: status column per question (Ready/Needs data/Blocked) reflecting
  actual DB state; add cost-per-ticket, city drift, odometer rows
- §8: add Step 0 full deployment sequence (git pull → migrations 01-06
  → container rebuild → sync_driver_audit → import_drivers_csv);
  Step 3 updated to reference import script; Step 5 collapsed to pointer
- Footer: db-state stamp and update date

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 08:39:58 +03:00
David Kiania
cebcf74ba2 feat: business analytics expansion + driver CSV import
- 01_BusinessAnalytics.md: add §0 usage tags, §2.4 cost-per-ticket,
  §3.6–3.8 alarm/drift/odometer, §4.4–4.5 dispatch log + SLA metrics,
  §9 fleet readiness scorecard, §10 service-interval forecaster,
  Appendix B threshold calibration guide (773 → 1437 lines)

- 06_business_analytics_migration.sql: schema support for all new
  analytics sections — assigned_city column, dispatch_log table,
  ops schema, service_log, odometer_readings, tickets skeleton,
  vw_service_forecast view

- import_drivers_csv.py: one-shot script to populate driver_name,
  vehicle_number, vehicle_models, cost_centre, assigned_city, sim,
  iccid, imsi from 20260414_FS__Logistics - final_fixed.csv (144 rows);
  dry-run by default, --apply to commit, --only-null for safe additive mode

- 20260414_FS__Logistics - final_fixed.csv: source data committed for
  reproducibility and container exec workflow

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 08:30:34 +03:00
David Kiania
8867be9d3d perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Audit fixes across the ingestion stack:

Observability
- Move log_ingestion out of batch loops in poll_alarms and poll_parking
  (was emitting N cumulative log rows per run instead of one).
- Add missing log_ingestion + t0 to poll_trips.
- Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT
  DO NOTHING no longer inflates the metric.

Resilience
- SAVEPOINT-per-item added to poll_alarms, poll_live_positions,
  poll_trips, poll_parking so one bad row no longer aborts the batch
  (webhook handlers already had this; pollers were inconsistent).

Performance
- /pushgps and poll_track_list now use psycopg2.extras.execute_values
  with ON CONFLICT DO NOTHING — 10-50x write throughput on larger
  batches.
- sync_devices and sync_driver_audit fetch jimi.track.device.detail
  concurrently via ThreadPoolExecutor(max_workers=8), cutting the
  daily registry sync from ~24s to ~3s for an 80-device fleet.
- poll_track_list split into two phases: parallel API fetch (4 workers,
  no DB connection held) then one batched write. Previously the DB
  connection was held across every per-IMEI HTTP call, risking pool
  starvation.

Security
- _validate_token uses hmac.compare_digest for constant-time token
  comparison (closes timing side-channel).
- _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default
  5000) so a pathological push cannot blow memory.

Tests
- Fix test_null_alarm_type_skipped: its INSERT-count assertion was
  catching the ingestion_log insert written by log_ingestion. Filter
  that out so the test checks only data-table inserts.
- Full suite: 66 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 00:33:55 +03:00