Fleet-platform status — week 1 of 5
Greenfield rebuild of the Fireside telematics stack · Solo engineer · Started 2026-05-22
Where we are
All Week-1 and Week-2 work is shipped. Week-3 is 80% done — only ntfy alerts, parity check, and the 7-day soak remain.
Two large scope additions (trip detection backend + UI, multi-vehicle overlay) have landed on top of the original plan, and the platform was migrated to a fully Coolify-managed deployment on 2026-05-27.
Headline metrics
| Metric | Value | Notes |
| Vehicles live | ~144 | across 4 Tracksolid subaccounts |
| Polling cadence | 30 s | main poll · 10 min stale sweep |
| SQL migrations | 20 | forward-only, all applied |
| Live URL | https://api.rahamafresh.com | JWT login at /login.html |
Timeline & pacing
| Phase | Window | Status | Notes |
| P1 — Foundation + live tracking | weeks 1-5 | In progress | Day 6 of 35. Well ahead of plan. |
| P2 — Trips + history + geocoding | weeks 6-8 | Partially front-loaded | Trip detection + reverse-geocoder already shipped in P1. |
| P3 — Operations tooling + cutover | weeks 7-9 | Not started | Push cut-over (see Push-receiver cut-over plan) is the entry point. |
| P4 — Driver KPIs + cost allocation | weeks 10-12 | Not started | Depends on P3 driver roster. |
Capabilities live today
Live map
- ~144 vehicles refreshed every 15 s on a light Carto Positron basemap
- Markers always tinted by cost centre; opacity carries state (moving 1.0 / parked 0.75 / offline grey 0.55)
- White directional arrow on every moving vehicle, zoom-scaled
- Plate-tail label per vehicle, hidden below zoom 11 to declutter
- Hover popup: plate, driver, speed, heading, reverse-geocoded address, age of last fix
- Fireside Group HQ POI (red dot at -1.2409, 36.7288) with permanent label from zoom 9
Filters (multi-select dropdowns)
- Cost centre + assigned city pickers with "All …" default + per-option checkboxes
- Cost-centre options carry a colour swatch matching the marker tint — filter doubles as a live colour legend
- Single selection → server-side filter (counts in FLEET NOW reflect the filter)
- Multi-selection → client-side narrowing via setFilter (markers hide, counts stay full for population context)
Trip dock (click any vehicle)
- Slides up from the bottom; day totals header (trips / km / drive·idle·stop minutes)
- Per-trip cards with start→end times, distance, duration, idling minutes, end-reason badge
- Each trip drawn on the map in its own colour from a 12-colour palette; matching swatch on the trip card's left edge
- Click a trip card → animated marker traces the polyline over ~10 s
- Date picker (defaults to today EAT, scrolls back through history)
- CSV download (per-trip rows: date · plate · reporting_time · trip_id · start · end · duration · distance · idling · end_reason)
- ⌘-click a second vehicle → multi-vehicle overlay; routes in distinct colours; dock switches to compact rows + aggregate KPIs
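The trip-card animation (a marker tracing the polyline over ~10 s) comes down to "where is the point a given fraction of the way along this path?". A minimal Python sketch of that interpolation, not the dashboard's front-end code: `point_at` and `haversine_km` are hypothetical names, and pacing the marker by distance (constant apparent speed) rather than by fix timestamps is an assumption.

```python
import math
from bisect import bisect_left
from itertools import accumulate

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def point_at(path: list[tuple[float, float]], frac: float) -> tuple[float, float]:
    """Point `frac` (0..1) of the way along `path`, measured by distance."""
    if not path:
        raise ValueError("empty path")
    if len(path) == 1:
        return path[0]
    # cum[i] = distance from the first vertex to vertex i
    cum = [0.0, *accumulate(haversine_km(p, q) for p, q in zip(path, path[1:]))]
    target = min(max(frac, 0.0), 1.0) * cum[-1]
    i = bisect_left(cum, target)  # first vertex whose cumulative distance >= target
    if i == 0:
        return path[0]
    seg = cum[i] - cum[i - 1]
    r = (target - cum[i - 1]) / seg if seg else 0.0
    (lat0, lon0), (lat1, lon1) = path[i - 1], path[i]
    # Linear interpolation inside one segment is adequate at fix-to-fix spacing.
    return (lat0 + r * (lat1 - lat0), lon0 + r * (lon1 - lon0))
```

For a 10 s animation at 60 frames per second, frame k would be drawn at `point_at(path, k / 600)`.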
Auth + API
- JWT login at /login.html (bcrypt + PyJWT, scopes read:fleet / write:ops / admin:fleet)
- Rate limited (60/min dashboards, 30/min CSV)
- All times displayed in EAT (Africa/Nairobi); all storage in UTC; conversion at the serve layer
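Since storage is UTC and display is EAT, any per-day query (the trip dock's date picker, for instance) needs the UTC bounds of an EAT calendar day. A minimal sketch; `eat_day_bounds_utc` and `to_eat` are hypothetical helpers, and the fixed +03:00 offset relies on Africa/Nairobi not observing daylight saving time (`zoneinfo.ZoneInfo("Africa/Nairobi")` gives the same result where tz data is installed).

```python
from datetime import date, datetime, time, timedelta, timezone

# Africa/Nairobi is UTC+3 all year (no DST), so a fixed offset is exact.
EAT = timezone(timedelta(hours=3), "EAT")


def eat_day_bounds_utc(day: date) -> tuple[datetime, datetime]:
    """Half-open [start, end) UTC interval covering one EAT calendar day."""
    start_eat = datetime.combine(day, time.min, tzinfo=EAT)
    end_eat = start_eat + timedelta(days=1)
    return start_eat.astimezone(timezone.utc), end_eat.astimezone(timezone.utc)


def to_eat(ts_utc: datetime) -> datetime:
    """Convert a stored (timezone-aware) UTC timestamp for display."""
    return ts_utc.astimezone(EAT)
```

So 2026-05-27 in the date picker covers 2026-05-26 21:00 UTC up to 2026-05-27 21:00 UTC, and a fix stored at 22:30 UTC on 26 May appears under 27 May in the dashboard.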
Architecture
One FastAPI image, three container roles selected by APP_ROLE. Fate isolation is the point — a heavy report in the worker doesn't stall the gateway.
Tracksolid Pro API                Browser
        │                            │
        │ poll every 30s             │ HTTPS · JWT
        ▼                            ▼
   ┌─────────┐                  ┌─────────┐
   │  cron   │                  │ gateway │
   │ (poll)  │                  │ (HTTP)  │
   └────┬────┘                  └────┬────┘
        │                            │
        └────────┐    ┌──────────────┘
                 ▼    ▼
            ┌──────────────┐   LISTEN events_raw_new
            │  events.raw  │ ─────────────────────┐
            └──────┬───────┘                      │
                   │                              ▼
                   ▼                        ┌──────────┐
            ┌──────────────┐                │  worker  │
            │ events.parsed│ ◀─── parse ────┤ (parser) │
            └──────┬───────┘                └──────────┘
                   │
                   ▼  project (single writer)
       ┌────────────────────────┐
       │ state.live_positions   │
       │ state.position_history │
       └───────────┬────────────┘
                   │
                   ▼  read
          serve.fn_live_view
          serve.fn_vehicle_trips
                   │
                   ▼
          Dashboard (browser)
| Role | Container | Workload |
| gateway | fleet-platform-gateway | HTTP: push receivers, dashboard read API, JWT issuance, static UI |
| worker | fleet-platform-worker | LISTEN events_raw_new → parser → projectors (single-writer) |
| cron | fleet-platform-cron | APScheduler: polling (30s/10m), reverse geocoder (30s), SLO worker (60s), contract checker (daily 02:00 UTC) |
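Selecting a role from the single image can be as small as a lookup on APP_ROLE at process start. A sketch only: the three `run_*` functions are hypothetical placeholders (not the repo's actual module layout), and the role names are taken from the table above. Failing fast on an unknown or missing role keeps a misconfigured container from silently starting the wrong workload.

```python
import os
from typing import Callable, Mapping


def run_gateway() -> str:  # hypothetical: would start the FastAPI HTTP server
    return "gateway"


def run_worker() -> str:  # hypothetical: would LISTEN events_raw_new, parse, project
    return "worker"


def run_cron() -> str:  # hypothetical: would register and start the APScheduler jobs
    return "cron"


ROLES: dict[str, Callable[[], str]] = {
    "gateway": run_gateway,
    "worker": run_worker,
    "cron": run_cron,
}


def resolve_role(env: Mapping[str, str] = os.environ) -> Callable[[], str]:
    """Map APP_ROLE to its entry point; exit with a clear message if it is unknown."""
    role = env.get("APP_ROLE", "").strip().lower()
    try:
        return ROLES[role]
    except KeyError:
        raise SystemExit(
            f"APP_ROLE must be one of {sorted(ROLES)}, got {role!r}"
        ) from None
```

The container entrypoint would then simply call `resolve_role()()`.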
Deployment
- Coolify-managed Docker Compose app since 2026-05-27 (previously started by hand with docker run)
- Compose file: docker-compose.coolify.yml at repo root
- Forgejo registry + git: repo.rahamafresh.com/kianiadee/fleet-platform
- Networks: each container is attached to (a) its Coolify project network and (b) bo3nov… (the DB project's network, where the timescale_db alias resolves); (c) the gateway is also on the coolify shared network so Traefik can reach it
- TimescaleDB: separate Coolify project. DB user postgres, database fleet_platform. Read-only reporting role: reporting_reader
- Domain: api.rahamafresh.com via Coolify-generated Traefik labels + Let's Encrypt
- Deploy flow: git push origin main → Coolify UI → Redeploy (auto-deploy webhook not wired yet)
Migration gotcha
When piping SQL migrations to psql, strip the -- migrate:down section first: psql treats that marker as an ordinary comment, so it would run the down section straight after the up section. Use ($PG holds the TimescaleDB container's name):
awk '/^-- migrate:down/{exit} {print}' db/migrations/NNNN.sql \
  | docker exec -i "$PG" psql -U postgres -d fleet_platform -v ON_ERROR_STOP=1
Data model
Layered by purpose, not by feature. Read top-down: events are truth, state is derived.
| Schema | Tables | Purpose |
| events | raw · parsed · parser_errors | Immutable log (hypertable). Every push and every poll lands here verbatim before any interpretation. |
| state | live_positions · position_history · geocoded_positions | Derived projections. Single-writer (the projector). Rebuildable from events. |
| domain | accounts · vehicles · devices | Business entities. Auto-provisioned by the projector on first sight; CSV/admin edits later. |
| serve | fn_live_view · fn_vehicle_trips · helper fns | Read-side SQL functions. Dashboard payloads are computed here, not in Python. |
| slo | targets · measurements · v_current_status | SLO-as-data. The SLO worker (cron container) writes measurements every 60 s. UI surface removed at user request; data still populates. |
| ops | contract_check_log | Daily Tracksolid contract probe log; drives the contract_drift_days SLO. |
| auth | accounts · tokens | JWT issuance + scope. |
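"Rebuildable from events" holds as long as the projector is deterministic and tolerant of arrival order. A minimal in-memory sketch of that property (`ParsedFix` and `project` are hypothetical names; the real projector writes SQL tables): history is keyed by (imei, occurred_at), so replaying the same events is idempotent, and a live position only advances to a strictly newer fix, so a late-arriving event cannot move a marker backwards.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ParsedFix:
    imei: str
    occurred_at: datetime
    lat: float
    lon: float


def project(events, live=None, history=None):
    """Apply parsed fixes to live/history state; safe to re-run over the same events."""
    live = {} if live is None else live
    history = {} if history is None else history
    for ev in events:
        # position_history: first write wins for a given (imei, occurred_at)
        history.setdefault((ev.imei, ev.occurred_at), ev)
        # live_positions: only a strictly newer fix replaces the current one
        current = live.get(ev.imei)
        if current is None or ev.occurred_at > current.occurred_at:
            live[ev.imei] = ev
    return live, history
```

A full rebuild is then just `project(all_parsed_events)` into empty state; the result does not depend on the order in which the events are replayed.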
API endpoints
| Method | Path | Auth | What it returns |
| POST | /api/auth/token | Form login | JWT access + refresh |
| GET | /api/views/live | read:fleet | FleetNow counters + GeoJSON of all active vehicles + SLO snapshot |
| GET | /api/views/vehicle/{id}/trips?date=YYYY-MM-DD | read:fleet | Per-day trip breakdown (totals + trips[] with paths + stops) |
| GET | /api/views/vehicle/{id}/trips.csv?date=YYYY-MM-DD | read:fleet | One row per trip, downloadable |
| POST | /push/jimi/{pushgps,pushalarm,pushhb,pushobd,…} | Shared token (form body) | Verbatim INSERT into events.raw + NOTIFY. Receivers built; Tracksolid still pushes to the legacy URL. See push cut-over plan. |
| GET | /health/{gateway,worker,cron} | Open | Container + DB liveness |
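The trips.csv row shape (one row per trip, in the column order given for the trip dock's CSV download) maps directly onto `csv.DictWriter`. A sketch: `trips_to_csv` is a hypothetical helper, and the units of duration, distance and idling are not specified here, so callers are assumed to pass them pre-formatted.

```python
import csv
import io

# Column order as listed for the trip dock CSV download.
TRIP_CSV_COLUMNS = [
    "date", "plate", "reporting_time", "trip_id", "start", "end",
    "duration", "distance", "idling", "end_reason",
]


def trips_to_csv(rows: list[dict]) -> str:
    """Render one CSV row per trip.

    Missing keys become empty cells; unexpected keys raise ValueError
    instead of being silently dropped.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TRIP_CSV_COLUMNS, extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Raising on unknown keys means a renamed field in the trips payload fails loudly in tests rather than producing a CSV with a silently empty column.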
Migrations history
| # | File | What it adds |
| 01 | init_schemas | Schemas + Postgres extensions (Timescale, PostGIS) |
| 02 | events | events.raw / parsed / parser_errors hypertables + NOTIFY triggers |
| 03 | domain | accounts, vehicles, devices |
| 04 | state_live | live_positions + position_history hypertable |
| 05 | slo | slo.targets / measurements / v_current_status |
| 06 | auth | auth.accounts (bcrypt) + tokens |
| 07, 09, 11 | serve_fn_live_view v1→v3 | Dashboard read function, evolved with each UI iteration |
| 08 | live_positions_richer | Added mc_type, mileage, gps_signal, satellites, device_name, pos_type |
| 10 | geocoded_positions | Nominatim cache table |
| 12 | label_short_from_plate | serve._label_short — plate-tail extraction |
| 13 | driver_from_device_name | serve._driver_name — heuristic driver-name parse |
| 14 | real_plates_consolidate | One-shot dedup of plate-equivalent vehicle rows |
| 15-16 | CSV import | Removed — rolled back by mig 17 |
| 17 | rollback_csv_import | Full CSV revert (re-split vehicles, drop CSV cols, restore fn_live_view v3) |
| 18 | ops_contract_check_log | Daily Tracksolid endpoint probe log |
| 19 | fn_vehicle_trips | PL/pgSQL state machine for trip detection |
| 20 | normalize_assigned_city | Data hygiene — collapsed Nairobi/nairobi |
Trip detection algorithm
One server-side function, serve.fn_vehicle_trips(vehicle_id, date_eat), makes a single forward pass over the day's rows in state.position_history.
Rules
- Reporting time = first acc_state=1 fix of the day in EAT
- Trip starts at every ACC_ON transition (or at the first fix if already on/moving)
- Trip ends at whichever of these comes first:
  - ACC_OFF + stationary (< 5 km/h) for ≥ 5 min → work_stop
  - No new fix for ≥ 5 min (engine assumed off) → nofix_stop
  - Fix gap > 30 min → long_gap
  - End of day's data → day_end
- Within a trip: ACC_ON + stationary ≥ 5 min is logged as an idling segment (no split; the engine is still running)
- Distance only accumulates when the current fix is > 5 km/h (excludes GPS jitter at standstill)
- Fallback: when acc_state is null across the day (some Tracksolid devices don't expose it), the algorithm degrades to speed-only segmentation and flags data_quality.has_acc_data: false in the response
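The rules above can be sketched as one forward pass in Python. This illustrates the state machine; it is not a port of serve.fn_vehicle_trips. Assumptions: the `Fix`/`Trip` types and the precomputed `km_from_prev` field are hypothetical, a gap over 30 min is treated as long_gap and a 5 to 30 min gap as nofix_stop, and idling segments and reporting time are left out for brevity. When no fix carries acc_state, any 5-minute stationary run ends the trip (speed-only mode).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

STOP_KMH = 5.0                      # below this a fix counts as stationary
WORK_STOP = timedelta(minutes=5)    # ACC_OFF + stationary this long ends a trip
NOFIX_STOP = timedelta(minutes=5)   # silence this long ends a trip
LONG_GAP = timedelta(minutes=30)    # silence longer than this is a long_gap


@dataclass
class Fix:
    t: datetime
    speed_kmh: float
    acc_on: Optional[bool]  # None when the device doesn't report acc_state
    km_from_prev: float     # assumed: distance from the previous fix, precomputed


@dataclass
class Trip:
    start: datetime
    end: Optional[datetime] = None
    distance_km: float = 0.0
    end_reason: Optional[str] = None


def detect_trips(fixes: list[Fix]) -> list[Trip]:
    """Single forward pass over one EAT day of fixes, oldest first."""
    trips: list[Trip] = []
    cur: Optional[Trip] = None
    stopped_since: Optional[datetime] = None  # start of current engine-off stationary run
    prev: Optional[Fix] = None
    has_acc = any(f.acc_on is not None for f in fixes)

    def close(end: datetime, reason: str) -> None:
        nonlocal cur, stopped_since
        cur.end, cur.end_reason = end, reason
        trips.append(cur)
        cur, stopped_since = None, None

    for f in fixes:
        moving = f.speed_kmh > STOP_KMH
        # Gap rules end the open trip at the last fix before the silence.
        if cur is not None and prev is not None:
            gap = f.t - prev.t
            if gap > LONG_GAP:
                close(prev.t, "long_gap")
            elif gap >= NOFIX_STOP:
                close(prev.t, "nofix_stop")
        if cur is None:
            # Start on ACC on, or on movement (the only signal without ACC data).
            if f.acc_on or moving:
                cur = Trip(start=f.t)
            prev = f
            continue
        if moving:  # distance only accumulates above 5 km/h
            cur.distance_km += f.km_from_prev
        # Speed-only mode: with no ACC data, treat every stationary run as engine off.
        engine_off = (f.acc_on is False) if has_acc else True
        if engine_off and f.speed_kmh < STOP_KMH:
            if stopped_since is None:
                stopped_since = f.t
            if f.t - stopped_since >= WORK_STOP:
                close(stopped_since, "work_stop")
        else:
            stopped_since = None
        prev = f
    if cur is not None:
        close(prev.t, "day_end")
    return trips
```

Note that a work_stop trip ends at the moment the vehicle stopped, not at the fix that confirmed the 5-minute threshold, so the stop itself isn't counted as trip time.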
Calibration vs legacy DB
| Vehicle | Pattern | Legacy trips | New algo | Verdict |
| KDE 638J | full day, clean reporting | 15 | 15 | Perfect alignment |
| KDK 728K | half day, noisy stop-and-go | 33 | 9 | Cleaner — legacy over-segments traffic stops |
| KMGW 538W | half day | 20 | 8 | Legacy splits on sub-minute gaps |
| KDB 585E | busy day, many short trips | 21 | 18 | Close — most boundaries match |
| KDV 683Z | moderate | 13 | 7 | Same pattern as 538W |
5-min thresholds (work stop + no-fix stop) locked in. Sim tool at scripts/simulate_trips_from_legacy.py replays any legacy JSON dump through the algorithm offline.
Known issues & follow-ups
| Issue | Severity | Status | Notes |
| Polyline straight-line artifacts between fixes | Visual | Mitigated | Dropped polling 60s→30s. Permanent fix is push cut-over (denser stream) or map-matching (OSRM/Valhalla); both deferred |
| APP_GIT_SHA shows unknown in containers | Cosmetic | Open | Coolify isn't injecting SOURCE_COMMIT; needs a compose tweak |
| Some vehicles report has_acc_data=false | Data quality | Accepted | Algorithm falls back to speed-only detection; flagged in response |
| state.position_history has no (imei, occurred_at) unique constraint | Latent | Address before push cut-over | Bites only if push + polling overlap; needed for ingest idempotency. See push plan |
| Auto-deploy webhook from Forgejo → Coolify not wired | DX | Open | Manual "Redeploy" click required after each push |
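Ingest idempotency for state.position_history comes down to a unique key plus an insert that tolerates conflicts. The sketch uses Python's built-in sqlite3 (SQLite 3.24 or newer) as a runnable stand-in for TimescaleDB; the table is cut down to a few columns and `insert_fix` is a hypothetical helper. The key is (imei, occurred_at), as in the issue above: if source were part of the key, a push fix and a poll fix for the same instant would both be kept. The same `INSERT … ON CONFLICT (imei, occurred_at) DO NOTHING` statement is valid PostgreSQL; because position_history is a hypertable, Timescale also requires any unique index on it to include the time partitioning column (assumed here to be occurred_at).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE position_history ("
    " imei TEXT NOT NULL, occurred_at TEXT NOT NULL,"
    " source TEXT NOT NULL, lat REAL, lon REAL)"
)
# Dedup key: the same device reporting the same instant is the same fix,
# whichever channel (push or poll) delivered it.
conn.execute(
    "CREATE UNIQUE INDEX position_history_dedup"
    " ON position_history (imei, occurred_at)"
)


def insert_fix(conn, imei, occurred_at, source, lat, lon) -> bool:
    """Insert a fix; return False when the same (imei, occurred_at) already exists."""
    cur = conn.execute(
        "INSERT INTO position_history (imei, occurred_at, source, lat, lon)"
        " VALUES (?, ?, ?, ?, ?)"
        " ON CONFLICT (imei, occurred_at) DO NOTHING",
        (imei, occurred_at, source, lat, lon),
    )
    return cur.rowcount == 1  # 0 rows changed means the conflict was skipped
```

Counting the False returns during the mirror window gives the dedup hit rate directly.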
Roadmap
P1 — remaining
| # | Item | Status | Effort |
| 05 | Coolify rollback smoke test | Pending | ~1 h |
| 14 | ntfy.sh container + SLO breach alerts | Pending | ~half day |
| 16 | parity_check.py vs legacy DB | Pending | ~half day |
| 17 | 7-day soak + dispatcher sign-off | Pending | 7 days calendar |
P2 — next
- History page (per-vehicle timeline)
- Routes page (per-trip detail + KPIs)
- Geofence ingest + entry/exit events
- Trips/idling/stops projector (materialized — currently on-demand)
P3 — operations + cut-over
- Push receiver cut-over (see next section)
- Driver roster (domain.drivers + domain.driver_assignments with effective dates)
- Device lifecycle admin UI
- Alarm console
- Legacy decommission
P4 — driver KPIs + cost allocation
- Driver scorecards, shift attribution
- Fuel ingest from existing WhatsApp microservice
- Cost allocation by cost-centre
- Executive summary dashboards
Push-receiver cut-over plan (P3)
Currently Tracksolid posts to a legacy endpoint at https://tshook.rahamafresh.com/pushalarm (a separate project we no longer get any value from), so we poll every 30 s as a workaround. The cut-over moves the main fix stream off polling and onto real-time push.
Why it's worth doing
| Today (polling) | After (push) |
| ~30 s minimum lag from event to dashboard | ~1-5 s (push is event-driven) |
| ~1 fix/min/vehicle when stationary, ~2/min when moving | ~5-15 s between fixes on motion; immediate for ACC/alarm events |
| Polyline cuts straight across roads (low-density fixes) | Polyline traces actual movement (high-density fixes) |
| Alarms (ACC ON/OFF, SOS, geofence) buried in the polled snapshot | Alarms arrive as their own typed events instantly |
| ~4 Tracksolid API calls every 30 s = 11,520/day | Zero outbound API calls for the main fix stream |
What's already in place
- 7 push receivers wired at /push/jimi/{pushgps,pushalarm,pushhb,pushobd,pushfaultinfo,pushtripreport,pushevent}
- Shared-token auth via TRACKSOLID_PUSH_TOKEN in the form body (matches Tracksolid's documented push pattern)
- Gateway contract honoured: form parse + token verify + INSERT into events.raw + NOTIFY + return {code:0, msg:"success"}. Nothing else.
- Parsers for 4 of 7 types: pushgps, pushalarm, pushhb, pushevent (the other 3 land in events.raw but aren't parsed yet; they can be parsed later)
- Rate limit 1000/min per endpoint via slowapi
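The gateway contract (parse the form, check the token, store verbatim, acknowledge) is small enough to sketch without a framework. Hypothetical throughout: the form field names `token` and `data_list` (check Tracksolid's push documentation for the real ones), the rejection response shape, and `insert_raw` standing in for the INSERT into events.raw, whose NOTIFY comes from the trigger added in migration 02. `hmac.compare_digest` keeps the token check constant-time. `smoke_test` is an in-process miniature of the synthetic-payload smoke test: urlencode a body, parse it back as the gateway would, and confirm exactly one raw row is recorded.

```python
import hmac
import json
from typing import Callable, Mapping
from urllib.parse import parse_qs, urlencode

ACK = {"code": 0, "msg": "success"}           # contract response
REJECT = {"code": 1, "msg": "invalid token"}  # hypothetical rejection shape


def parse_form(body: str) -> dict[str, str]:
    """application/x-www-form-urlencoded body -> single-valued dict."""
    return {k: v[0] for k, v in parse_qs(body, keep_blank_values=True).items()}


def handle_push(
    kind: str,
    form: Mapping[str, str],
    expected_token: str,
    insert_raw: Callable[[str, dict], None],
) -> dict:
    """Verify the shared token, store the form verbatim, acknowledge. Nothing else."""
    token = form.get("token", "")  # hypothetical field name
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return REJECT
    insert_raw(kind, dict(form))  # verbatim; parsing happens later in the worker
    return ACK


def smoke_test(expected_token: str) -> list:
    """In-process version of the synthetic-payload smoke test."""
    rows: list = []
    body = urlencode({
        "token": expected_token,
        # illustrative payload only; real field names per Tracksolid's docs
        "data_list": json.dumps([{"imei": "860000000000001", "lat": -1.2409, "lng": 36.7288}]),
    })
    reply = handle_push("pushgps", parse_form(body), expected_token,
                        lambda kind, payload: rows.append((kind, payload)))
    assert reply == ACK, reply
    return rows
```

Keeping the receiver this thin is what lets parser bugs be fixed later by re-parsing events.raw, without losing any pushed data.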
What needs to happen (in order)
- Add dedup to state.position_history (migration 21): a unique index on (imei, occurred_at), with inserts using ON CONFLICT DO NOTHING (the clause only skips rows that collide with a unique index, so both pieces are needed). Keep source out of the key; otherwise a push fix and a poll fix for the same instant would both be stored. Without this, push + polling overlap will duplicate fixes during the mirror window and inflate trip distances.
- Synthetic-payload smoke test: curl a realistic Tracksolid push body at each receiver and confirm that an events.raw row appears, the parser produces an events.parsed row, and the projector updates state.live_positions. This validates the path end-to-end before depending on real traffic.
- Tracksolid console: add the new URL alongside the legacy URL — this is a vendor-portal step, done by whoever manages the Fireside Tracksolid account. The exact URL list to paste:
https://api.rahamafresh.com/push/jimi/pushgps
https://api.rahamafresh.com/push/jimi/pushalarm
https://api.rahamafresh.com/push/jimi/pushhb
https://api.rahamafresh.com/push/jimi/pushevent
https://api.rahamafresh.com/push/jimi/pushobd
https://api.rahamafresh.com/push/jimi/pushfaultinfo
https://api.rahamafresh.com/push/jimi/pushtripreport
Token: the value of TRACKSOLID_PUSH_TOKEN (set in Coolify env).
- Mirror window (≥3 days) — both push and polling run. Compare daily counts per IMEI between push-derived and poll-derived fixes. Watch for: parser errors, auth failures, payload-shape surprises, dedup hit rate.
- Cut polling cadence — once mirror data shows push is delivering >95% of fixes, drop main polling from 30 s → 10 min as a sparse safety net (or disable entirely). Keep the stale-IMEI sweep for offline-recovery.
- Tracksolid console: remove legacy URL — once dispatchers confirm the new dashboard is showing identical or better real-time data, drop the legacy URL from Tracksolid. Hot-standby on our side for 48 h as fallback.
- Decommission legacy receiver project: final step; the old project at tshook.rahamafresh.com can be shut down.
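The mirror-window comparison and the >95% gate for cutting polling can be computed per IMEI. A sketch under stated assumptions: fixes are identified by (imei, occurred_at), as in the dedup key; "push is delivering >95% of fixes" is read as the share of distinct fixes seen by either channel that push delivered; `push_coverage` and `lagging_imeis` are hypothetical helpers. Comparing sets of timestamps rather than raw daily counts keeps a burst of duplicates from masking missing fixes.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable

FixKey = tuple[str, datetime]  # (imei, occurred_at)


def push_coverage(push: Iterable[FixKey], poll: Iterable[FixKey]) -> dict[str, float]:
    """Per IMEI: share of distinct fixes (push ∪ poll) that arrived via push."""
    via_push: dict[str, set] = defaultdict(set)
    seen: dict[str, set] = defaultdict(set)
    for imei, ts in push:
        via_push[imei].add(ts)
        seen[imei].add(ts)
    for imei, ts in poll:
        seen[imei].add(ts)
    return {imei: len(via_push.get(imei, ())) / len(all_ts) for imei, all_ts in seen.items()}


def lagging_imeis(coverage: dict[str, float], threshold: float = 0.95) -> list[str]:
    """IMEIs that don't yet clear the gate (push share must exceed the threshold)."""
    return sorted(imei for imei, share in coverage.items() if share <= threshold)
```

An empty `lagging_imeis(...)` list at the end of the mirror window is one concrete reading of "push is delivering >95% of fixes"; a non-empty one names the devices to investigate before polling is reduced.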
What needs decisions before starting
- Tracksolid admin access — who edits the push URL list, what's their lead time
- Mirror duration — 3 days (faster ship), 7 days per PRD (full week), or indefinite (no commit to a cut-over)
- Polling fate after push — disable entirely / keep at 5-min sparse / keep at 30 s belt-and-braces
- Auth scheme check: the current implementation uses a form-body token; the PRD specifies HMAC X-Jimi-Signature. Either the PRD spec is aspirational and Tracksolid only offers shared-token auth (likely), or Tracksolid does offer HMAC and we should switch. Verify before the mirror starts.
Expected outcomes
- Dashboard latency drops from ~30 s to ~5 s for live position updates
- Trip polylines visually trace real routes (no more straight-line shortcuts)
- Alarm and ACC events become first-class instead of derived from polling snapshots
- Trip detection becomes more accurate (denser fix stream, fewer false nofix_stop boundaries)
- Tracksolid API call budget freed up for the contract checker + ad-hoc queries
Decisions log (significant ones)
| Date | Decision | Rationale |
| 2026-05-22 | Greenfield rebuild, no legacy reuse | Branch divergence + race conditions in legacy made incremental patching unviable |
| 2026-05-23 | Three container roles from one image | Fate isolation without microservices overhead |
| 2026-05-24 | CSV roster import | To enrich devices with real plates/drivers/cost-centres |
| 2026-05-25 | CSV import fully rolled back | Suffix-merge regression dropped vehicle count 144 → 124; underlying merge problem must be solved before any retry |
| 2026-05-26 | 5-min thresholds (work stop, no-fix stop) | Calibrated against 5 legacy report dumps; matches dispatcher mental model on clean data, cleaner on noisy data |
| 2026-05-27 | Migrate from manual docker run → Coolify Compose | Ad-hoc deploys were brittle; needed permanent infrastructure |
| 2026-05-27 | Polling 60 s → 30 s | Mitigation for sparse polyline artifacts pending push cut-over |
| 2026-05-27 | Remove SLO panel from top bar | User pref — backend still computes, UI just hides |
| 2026-05-27 | Light Carto Positron basemap + HQ POI | Higher contrast for cost-centre marker tints; reference landmark |
| 2026-05-27 | Per-trip colour coding in single-vehicle mode | Trip cards ↔ map polylines pair visually at a glance |