Fleet-platform status — week 1 of 5

Greenfield rebuild of the Fireside telematics stack · Solo engineer · Started 2026-05-22

Where we are
All Week-1 and Week-2 work is shipped. Week-3 is 80% done — only ntfy alerts, parity check, and the 7-day soak remain. Two large scope additions (trip detection backend + UI, multi-vehicle overlay) have landed on top of the original plan, and the platform was migrated to a fully Coolify-managed deployment two days ago.

Headline metrics

Vehicles live: ~144, across 4 Tracksolid subaccounts
Polling cadence: 30 s main poll · 10 m stale sweep
SQL migrations: 20, forward-only, all applied
Live URL: JWT login at /login.html

Timeline & pacing

Phase | Window | Status | Notes
P1: Foundation + live tracking | weeks 1-5 | In progress | Day 6 of 35. Well ahead of plan.
P2: Trips + history + geocoding | weeks 6-8 | Partially front-loaded | Trip detection + reverse-geocoder already shipped in P1.
P3: Operations tooling + cutover | weeks 7-9 | Not started | Push cut-over (this doc's last section) is the entry point.
P4: Driver KPIs + cost allocation | weeks 10-12 | Not started | Depends on P3 driver roster.

Capabilities live today

Live map

Filters (multi-select dropdowns)

Trip dock (click any vehicle)

Auth + API

Architecture

One FastAPI image, three container roles selected by APP_ROLE. Fate isolation is the point — a heavy report in the worker doesn't stall the gateway.
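A minimal sketch of how one image could pick its role from APP_ROLE at start-up. The three run_* functions are hypothetical placeholders, not the platform's actual entry points; only the APP_ROLE variable and the three role names come from this report.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def run_gateway() -> str:
    # Placeholder: would start the HTTP server (push receivers, read API, static UI).
    return "gateway"


def run_worker() -> str:
    # Placeholder: would LISTEN on events_raw_new and run the parser + projectors.
    return "worker"


def run_cron() -> str:
    # Placeholder: would start the scheduler (polling, geocoder, SLO worker).
    return "cron"


ROLES: Dict[str, Callable[[], str]] = {
    "gateway": run_gateway,
    "worker": run_worker,
    "cron": run_cron,
}


def select_role(env: Optional[Mapping[str, str]] = None) -> Callable[[], str]:
    """Return the entry point for the role named in APP_ROLE; fail fast otherwise."""
    env = os.environ if env is None else env
    role = env.get("APP_ROLE", "").strip().lower()
    if role not in ROLES:
        raise ValueError(f"APP_ROLE must be one of {sorted(ROLES)}, got {role!r}")
    return ROLES[role]
```

Calling select_role()() in the container entry point would then run exactly one role per container. Failing fast on an unknown value keeps a misconfigured container from silently starting the wrong workload.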

Tracksolid Pro API          Browser
        │                      │
        │ poll every 30s       │ HTTPS · JWT
        ▼                      ▼
   ┌─────────┐            ┌─────────┐
   │  cron   │            │ gateway │
   │ (poll)  │            │ (HTTP)  │
   └────┬────┘            └────┬────┘
        │                      │
        └──────────┐  ┌────────┘
                   ▼  ▼
            ┌───────────────┐   LISTEN events_raw_new
            │  events.raw   │ ──────────────────────┐
            └───────┬───────┘                       │
                    │                               ▼
                    ▼                          ┌──────────┐
            ┌───────────────┐                  │  worker  │
            │ events.parsed │ ◀─────parse──────┤ (parser) │
            └───────┬───────┘                  └──────────┘
                    │
                    ▼ project (single writer)
        ┌────────────────────────┐
        │ state.live_positions   │
        │ state.position_history │
        └───────────┬────────────┘
                    │
                    ▼ read
            serve.fn_live_view
            serve.fn_vehicle_trips
                    │
                    ▼
            Dashboard (browser)

Role | Container | Workload
gateway | fleet-platform-gateway | HTTP: push receivers, dashboard read API, JWT issuance, static UI
worker | fleet-platform-worker | LISTEN events_raw_new → parser → projectors (single-writer)
cron | fleet-platform-cron | APScheduler: polling (30s/10m), reverse geocoder (30s), SLO worker (60s), contract checker (daily 02:00 UTC)

Deployment

Migration gotcha
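When piping SQL migrations to psql, strip the -- migrate:down section first. psql treats the marker as an ordinary comment, so the down statements below it would run straight after the up statements. Use:
# Print everything up to, but not including, the migrate:down marker.
awk '/^-- migrate:down/{exit} {print}' db/migrations/NNNN.sql \
    | docker exec -i "$PG" psql -U postgres -d fleet_platform -v ON_ERROR_STOP=1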
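The same filter can be sketched in Python, for example if migrations are ever applied from a script rather than a shell pipe. The function name strip_down_section is hypothetical; the only convention assumed is the one above, that the down section starts at a line beginning with -- migrate:down.

```python
def strip_down_section(sql: str) -> str:
    """Return the migration text up to, but not including, the migrate:down marker."""
    kept = []
    for line in sql.splitlines(keepends=True):
        if line.startswith("-- migrate:down"):
            break  # everything from here on is the rollback half
        kept.append(line)
    return "".join(kept)


example = (
    "-- migrate:up\n"
    "CREATE TABLE t (id int);\n"
    "-- migrate:down\n"
    "DROP TABLE t;\n"
)
print(strip_down_section(example))
```

Like the awk one-liner, this matches the marker only at the start of a line, so a mention of migrate:down inside another comment is left alone.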

Data model

Layered by purpose, not by feature. Read top-down: events are truth, state is derived.

Schema | Tables | Purpose
events | raw · parsed · parser_errors | Immutable log (hypertable). Every push and every poll lands here verbatim before any interpretation.
state | live_positions · position_history · geocoded_positions | Derived projections. Single-writer (the projector). Rebuildable from events.
domain | accounts · vehicles · devices | Business entities. Auto-provisioned by the projector on first-sight; CSV/admin edits later.
serve | fn_live_view · fn_vehicle_trips · helper fns | Read-side SQL functions. Dashboard payloads are computed here, not in Python.
slo | targets · measurements · v_current_status | SLO-as-data. The SLO worker (cron container) writes measurements every 60 s. UI surface removed at user request; data still populates.
ops | contract_check_log | Daily Tracksolid contract probe log; drives the contract_drift_days SLO.
auth | accounts · tokens | JWT issuance + scope.
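To illustrate "events are truth, state is derived", here is a small in-memory sketch of rebuilding both projections from parsed events. The ParsedFix fields and the project function are assumptions for illustration; the real projector writes to Postgres and its column set is not shown in this report.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ParsedFix:
    imei: str
    occurred_at: str  # ISO-8601 UTC in one fixed format, so string order is time order
    lat: float
    lon: float


def project(events: Iterable[ParsedFix]) -> Tuple[Dict[str, ParsedFix], List[ParsedFix]]:
    """Replay events into (live_positions, position_history)."""
    live: Dict[str, ParsedFix] = {}
    history: List[ParsedFix] = []
    # Sorting first makes the rebuild independent of arrival order.
    for ev in sorted(events, key=lambda e: (e.imei, e.occurred_at)):
        history.append(ev)
        live[ev.imei] = ev  # the latest fix per device wins
    return live, history
```

Because the function depends only on the event log, dropping the state tables and replaying would give the same result, which is the property that makes a single writer safe to restart.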

API endpoints

Method | Path | Auth | What it returns
POST | /api/auth/token | Form login | JWT access + refresh
GET | /api/views/live | read:fleet | FleetNow counters + GeoJSON of all active vehicles + SLO snapshot
GET | /api/views/vehicle/{id}/trips?date=YYYY-MM-DD | read:fleet | Per-day trip breakdown (totals + trips[] with paths + stops)
GET | /api/views/vehicle/{id}/trips.csv?date=YYYY-MM-DD | read:fleet | One row per trip, downloadable
POST | /push/jimi/{pushgps,pushalarm,pushhb,pushobd,…} | Shared token (form body) | Verbatim INSERT into events.raw + NOTIFY. Receivers built; Tracksolid still pushes to the legacy URL. See push cut-over plan.
GET | /health/{gateway,worker,cron} | Open | Container + DB liveness
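A sketch of how a test client might build a request for one of the push receivers, using only the standard library. The form field name "token" and the payload keys are assumptions; this report only says the shared token travels in the form body. The base URL is taken from the cut-over URL list later in this document.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.rahamafresh.com"  # from the cut-over URL list


def build_push_request(kind: str, token: str, payload: dict) -> Request:
    """Build (but do not send) a form-encoded POST for /push/jimi/{kind}."""
    # "token" as the field name is a guess; check the receiver code before relying on it.
    body = urlencode({"token": token, **payload}).encode("utf-8")
    return Request(
        f"{BASE_URL}/push/jimi/{kind}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_push_request("pushgps", "example-token", {"example_field": "value"})
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it. Keeping construction separate from sending makes the request easy to inspect in a smoke test before any real traffic depends on it.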

Migrations history

# | File | What it adds
01 | init_schemas | Schemas + Postgres extensions (Timescale, PostGIS)
02 | events | events.raw / parsed / parser_errors hypertables + NOTIFY triggers
03 | domain | accounts, vehicles, devices
04 | state_live | live_positions + position_history hypertable
05 | slo | slo.targets / measurements / v_current_status
06 | auth | auth.accounts (bcrypt) + tokens
07, 09, 11 | serve_fn_live_view v1→v3 | Dashboard read function, evolved with each UI iteration
08 | live_positions_richer | Added mc_type, mileage, gps_signal, satellites, device_name, pos_type
10 | geocoded_positions | Nominatim cache table
12 | label_short_from_plate | serve._label_short: plate-tail extraction
13 | driver_from_device_name | serve._driver_name: heuristic driver-name parse
14 | real_plates_consolidate | One-shot dedup of plate-equivalent vehicle rows
15-16 | CSV import | Removed; rolled back by mig 17
17 | rollback_csv_import | Full CSV revert (re-split vehicles, drop CSV cols, restore fn_live_view v3)
18 | ops_contract_check_log | Daily Tracksolid endpoint probe log
19 | fn_vehicle_trips | PL/pgSQL state machine for trip detection
20 | normalize_assigned_city | Data hygiene: collapsed Nairobi/nairobi
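For readers unfamiliar with the plate-tail label from migration 12, a rough Python equivalent follows. The real serve._label_short is SQL and its exact rules are not given here; this sketch assumes the tail is simply the last whitespace-separated token of the plate, which matches how this report shortens KMGW 538W to 538W.

```python
def label_short(plate: str) -> str:
    """Return the last whitespace-separated token of a plate, upper-cased."""
    parts = plate.split()
    return parts[-1].upper() if parts else ""


print(label_short("KMGW 538W"))
print(label_short("KDE 638J"))
```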

Trip detection algorithm

One server-side function (serve.fn_vehicle_trips(vehicle_id, date_eat)), single forward pass over state.position_history for the day.

Rules

Calibration vs legacy DB

Vehicle | Pattern | Legacy trips | New algo | Verdict
KDE 638J | full day, clean reporting | 15 | 15 | Perfect alignment
KDK 728K | half day, noisy stop-and-go | 33 | 9 | Cleaner: legacy over-segments traffic stops
KMGW 538W | half day | 20 | 8 | Legacy splits on sub-minute gaps
KDB 585E | busy day, many short trips | 21 | 18 | Close: most boundaries match
KDV 683Z | moderate | 13 | 7 | Same pattern as 538W

5-min thresholds (work stop + no-fix stop) locked in. Sim tool at scripts/simulate_trips_from_legacy.py replays any legacy JSON dump through the algorithm offline.
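The shape of a single forward pass with the two 5-minute thresholds can be sketched as below. This is not the PL/pgSQL in serve.fn_vehicle_trips: the detailed rules are not reproduced in this report, so the speed cut-off MOVING_KMH, the speed-only notion of "moving", and the end_reason labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

STOP_THRESHOLD_S = 5 * 60  # work stop: stationary this long ends the trip
GAP_THRESHOLD_S = 5 * 60   # no-fix stop: silence this long ends the trip
MOVING_KMH = 3.0           # hypothetical cut-off between idle and moving


@dataclass
class Trip:
    start_ts: int
    end_ts: int
    end_reason: str


def detect_trips(fixes: Sequence[Tuple[int, float]]) -> List[Trip]:
    """fixes: (unix_ts, speed_kmh) pairs, already sorted by time."""
    trips: List[Trip] = []
    start: Optional[int] = None  # start of the open trip, if any
    last_moving = 0              # last moving fix inside the open trip
    prev_ts = 0                  # timestamp of the previous fix

    for ts, speed in fixes:
        if start is not None:
            if ts - prev_ts >= GAP_THRESHOLD_S:
                trips.append(Trip(start, last_moving, "no_fix_stop"))
                start = None
            elif ts - last_moving >= STOP_THRESHOLD_S:
                trips.append(Trip(start, last_moving, "work_stop"))
                start = None
        if speed >= MOVING_KMH:
            if start is None:
                start = ts  # first moving fix opens a new trip
            last_moving = ts
        prev_ts = ts

    if start is not None:
        trips.append(Trip(start, last_moving, "end_of_data"))
    return trips
```

Each fix is visited once and only three scalars of state are carried, which is what makes a per-day pass cheap. A stop shorter than the threshold never closes the trip, so brief traffic halts stay inside one trip, in line with the calibration verdicts above.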

Known issues & follow-ups

Issue | Severity | Status | Notes
Polyline straight-line artifacts between fixes | Visual | Mitigated | Dropped polling 60s→30s. Permanent fix is push cut-over (denser stream) or map-matching (OSRM/Valhalla); both deferred
APP_GIT_SHA shows unknown in containers | Cosmetic | Open | Coolify isn't injecting SOURCE_COMMIT; need compose tweak
Some vehicles report has_acc_data=false | Data quality | Accepted | Algorithm falls back to speed-only detection; flagged in response
state.position_history has no (imei, occurred_at) unique constraint | Latent | Address before push cut-over | Bites only if push + polling overlap; needed for ingest idempotency. See push plan
Auto-deploy webhook not wired Forgejo → Coolify | DX | Open | Manual "Redeploy" click required after each push
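The idempotency gap in the position_history row can be demonstrated with an in-memory SQLite database standing in for Postgres. The table here is a cut-down, hypothetical version of state.position_history, and the index name uq_fix is invented; the point is only that a unique key on (imei, occurred_at) plus ON CONFLICT DO NOTHING makes a repeated fix a no-op.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE position_history ("
        "imei TEXT NOT NULL, occurred_at TEXT NOT NULL, "
        "source TEXT NOT NULL, lat REAL, lon REAL)"
    )
    # The key that the Known issues table says is currently missing.
    db.execute("CREATE UNIQUE INDEX uq_fix ON position_history (imei, occurred_at)")
    return db


def insert_fix(db: sqlite3.Connection, imei: str, occurred_at: str,
               source: str, lat: float, lon: float) -> bool:
    """Insert a fix; return False if the same (imei, occurred_at) already exists."""
    cur = db.execute(
        "INSERT INTO position_history (imei, occurred_at, source, lat, lon) "
        "VALUES (?, ?, ?, ?, ?) ON CONFLICT DO NOTHING",
        (imei, occurred_at, source, lat, lon),
    )
    return cur.rowcount == 1
```

With this in place, the poll copy and the push copy of one fix collapse to a single row regardless of which arrives first. The count of rejected inserts is also a cheap way to measure a dedup hit rate.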

Roadmap

P1 — remaining

# | Item | Status | Effort
05 | Coolify rollback smoke test | Pending | ~1 h
14 | ntfy.sh container + SLO breach alerts | Pending | ~half day
16 | parity_check.py vs legacy DB | Pending | ~half day
17 | 7-day soak + dispatcher sign-off | Pending | 7 days calendar

P2 — next

P3 — operations + cut-over

P4 — driver KPIs + cost allocation

Push-receiver cut-over plan (P3)

Currently Tracksolid posts to a legacy endpoint at https://tshook.rahamafresh.com/pushalarm (a separate project that no longer benefits us). We poll every 30 s as a workaround. The cut-over moves us off polling and onto real-time push.

Why it's worth doing

Today (polling) | After (push)
~30 s minimum lag from event to dashboard | ~1-5 s (push is event-driven)
~1 fix/min/vehicle when stationary, ~2/min when moving | ~5-15 s between fixes on motion; immediate for ACC/alarm events
Polyline cuts straight across roads (low-density fixes) | Polyline traces actual movement (high-density fixes)
Alarms (ACC ON/OFF, SOS, geofence) buried in the polled snapshot | Alarms arrive as their own typed events instantly
~4 Tracksolid API calls every 30 s = 11,520/day | Zero outbound API calls for the main fix stream
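The daily call figure in the last row is plain arithmetic, shown here so other cadences can be checked the same way. The 600 s case is the 10-minute safety-net cadence proposed in the cut-over steps; treating the call count per poll as a constant 4 is taken from the table's approximate figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def calls_per_day(calls_per_poll: int, interval_s: int) -> int:
    """Outbound API calls per day at a fixed polling interval."""
    return calls_per_poll * (SECONDS_PER_DAY // interval_s)


print(calls_per_day(4, 30))   # 11520, the figure in the table
print(calls_per_day(4, 600))  # 576, at a 10-minute safety-net cadence
```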

What's already in place

What needs to happen (in order)

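  1. Add dedup to state.position_history (migration 21). Add a unique index on (imei, occurred_at), the key named under Known issues, and have the projector insert with ON CONFLICT DO NOTHING; the clause only takes effect once a unique index exists. Keep source out of the key: the push row and the poll row for the same fix carry different sources and would not collide. Without this, push + polling overlap will duplicate fixes during the mirror window and inflate trip distances.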
  2. Synthetic-payload smoke test — curl a realistic Tracksolid push body at each receiver, confirm events.raw row appears, parser produces an events.parsed row, projector updates state.live_positions. Validates the path end-to-end before depending on real traffic.
  3. Tracksolid console: add the new URL alongside the legacy URL — this is a vendor-portal step, done by whoever manages the Fireside Tracksolid account. The exact URL list to paste:
    https://api.rahamafresh.com/push/jimi/pushgps
    https://api.rahamafresh.com/push/jimi/pushalarm
    https://api.rahamafresh.com/push/jimi/pushhb
    https://api.rahamafresh.com/push/jimi/pushevent
    https://api.rahamafresh.com/push/jimi/pushobd
    https://api.rahamafresh.com/push/jimi/pushfaultinfo
    https://api.rahamafresh.com/push/jimi/pushtripreport
    Token: the value of TRACKSOLID_PUSH_TOKEN (set in Coolify env).
  4. Mirror window (≥3 days) — both push and polling run. Compare daily counts per IMEI between push-derived and poll-derived fixes. Watch for: parser errors, auth failures, payload-shape surprises, dedup hit rate.
  5. Cut polling cadence — once mirror data shows push is delivering >95% of fixes, drop main polling from 30 s → 10 min as a sparse safety net (or disable entirely). Keep the stale-IMEI sweep for offline-recovery.
  6. Tracksolid console: remove legacy URL — once dispatchers confirm the new dashboard is showing identical or better real-time data, drop the legacy URL from Tracksolid. Hot-standby on our side for 48 h as fallback.
  7. Decommission legacy receiver project — final step; the old project at tshook.rahamafresh.com can be shut down.
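The mirror-window comparison in steps 4 and 5 could be scripted along these lines. This is a sketch under assumptions: fixes are identified by (imei, occurred_at), coverage is the share of poll-derived fixes that also arrived via push, and the 95% bar is applied per IMEI, which is stricter than the fleet-wide reading the step may intend. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Fix = Tuple[str, str]  # (imei, occurred_at)


def push_coverage(poll: Iterable[Fix], push: Iterable[Fix]) -> Dict[str, float]:
    """Per IMEI: fraction of poll-derived fixes that push also delivered."""
    push_seen = set(push)
    total: Counter = Counter()
    matched: Counter = Counter()
    for imei, occurred_at in set(poll):
        total[imei] += 1
        if (imei, occurred_at) in push_seen:
            matched[imei] += 1
    return {imei: matched[imei] / total[imei] for imei in total}


def ready_to_cut(coverage: Dict[str, float], threshold: float = 0.95) -> bool:
    """True only if there is data and every IMEI clears the threshold."""
    return bool(coverage) and all(v > threshold for v in coverage.values())
```

Running this once per day of the mirror window gives a simple go/no-go signal for step 5, and the per-IMEI breakdown points at individual devices whose push configuration needs attention.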

What needs decisions before starting

Expected outcomes

Decisions log (significant ones)

Date | Decision | Rationale
2026-05-22 | Greenfield rebuild, no legacy reuse | Branch divergence + race conditions in legacy made incremental patching unviable
2026-05-23 | Three container roles from one image | Fate isolation without microservices overhead
2026-05-24 | CSV roster import | To enrich devices with real plates/drivers/cost-centres
2026-05-25 | CSV import fully rolled back | Suffix-merge regression dropped vehicle count 144 → 124; underlying merge problem must be solved before any retry
2026-05-26 | 5-min thresholds (work stop, no-fix stop) | Calibrated against 5 legacy report dumps; matches dispatcher mental model on clean data, cleaner on noisy data
2026-05-27 | Migrate from manual docker run → Coolify Compose | Ad-hoc deploys were brittle; needed permanent infrastructure
2026-05-27 | Polling 60 s → 30 s | Mitigation for sparse polyline artifacts pending push cut-over
2026-05-27 | Remove SLO panel from top bar | User preference; backend still computes, UI just hides
2026-05-27 | Light Carto Positron basemap + HQ POI | Higher contrast for cost-centre marker tints; reference landmark
2026-05-27 | Per-trip colour coding in single-vehicle mode | Trip cards ↔ map polylines pair visually at a glance

— end —