# Product Requirements Document — Fleet Telematics Platform (Rebuild)

| | |
|---|---|
| **Document** | PRD v1.1 |
| **Date** | 2026-05-22 (rev. b — incorporates engineering review of 2026-05-22) |
| **Status** | Draft for approval |
| **Author posture** | Senior product manager |

---
## 0. How to read this document
This PRD describes **what** the fleet platform must do for the business and **why**, in product terms. It pairs with the architecture document, which describes **how** engineering will build it. PRD phases are organised by **user-visible value delivered**, not by engineering layers. Engineering's 12-week rollout (Phases A-G in the architecture doc) is the build sequence that *produces* the product phases described here; the mapping is given in §13.
A reader should be able to:
- Approve or reject the scope of any phase without reading the architecture doc.
- Trace any requirement to a user persona and a measurable success criterion.
- Identify what is **deliberately out of scope** so we don't relitigate it later.
---
## 1. Executive summary
We operate a fleet of ~80 vehicles across Kenya and Uganda, instrumented with Jimicloud/Tracksolid trackers (GT06E, X3, AT4 families) and JC400P dashcams — roughly 180 devices in total. The current platform, built incrementally across three Git repositories over roughly twelve months, has reached the limits of incremental patching: data races, contract-drift surprises, silent data loss, branch divergence between production and `main`, and dashboards whose business logic is encoded in 1,400-line HTML files.
The product is sound. The architecture is the problem.
This PRD proposes a **greenfield rebuild** of the platform on a unified architecture (one repo, one service, one database, one branch, image-tag deploys, event-sourced ingest, explicit SLOs, lifecycle-aware device state). Built in four product phases over approximately nine to ten weeks, the new platform delivers feature parity with today's system by Phase 3 and unlocks driver KPIs, behaviour scoring, service-cycle tracking, and finance/executive intelligence in Phase 4.
The business case rests on three pillars:
1. **Operational reliability** — SLO-driven monitoring replaces the current pattern of "find out from a dispatcher that the dashboard is wrong". Mean time to detect a contract drift drops from ~90 days to <24 hours.
2. **Development velocity:** adding a new dashboard, new ingest source, or new business rule becomes a one-day change in one place, not a three-process, three-deploy rebuild.
3. **Driver and service intelligence:** first-class shift reporting (ACC sign-on / sign-off with geocoded location), driver-behaviour scoring, and km-based service-cycle tracking, all of which the current platform structurally cannot provide.
The rebuild does **not** require buying new hardware, changing telematics vendors, retraining operators, or pausing current operations. The current stack runs in parallel during build and for 48 hours after cutover as hot-standby.
**Scope boundary — routing and ticket-driven dispatch.** Suggest-route capabilities and ticket-allocation (vehicle-to-ServiceNow-ticket) are **explicitly out of scope of this PRD** and are tracked as a separate companion project. The reasoning: those capabilities are driven by external inputs (client tickets raised in ServiceNow) and constitute a distinct product domain (ticket lifecycle, allocation rules, and route optimisation) that deserves its own PRD, its own stakeholders, and its own technology choices. This platform provides the telematics foundation that any future routing project will consume; it does not deliver routing itself.
---
## 2. Background and problem statement
### 2.1 What we have today
- **Source-of-truth telematics:** Jimicloud Pro / Tracksolid Pro APIs, multi-account (TARGETS env var holds N sub-account credentials).
- **Ingest path:** Three Python containers (`webhook_receiver`, `ingest_movement`, `ingest_events`) plus n8n workflows, writing to a TimescaleDB instance with PostGIS.
- **Serving path:** Two static HTML dashboards (live + historical) talking to the database via n8n-proxied endpoints; a Grafana instance for ad-hoc operational queries.
- **Deployment:** Coolify on a single VPS, Traefik termination, rustfs as the static-asset and backup target.
- **Production branch:** `quality-program-2026-04-12`. `main` exists but has drifted. The most recent FIX-M21 had to be cherry-picked manually.
### 2.2 What hurts
The current platform has been kept healthy by a stream of point fixes labelled FIX-M01 through FIX-M21 (and FIX-E01 through FIX-E06 on the alarm side). These fixes share recurring patterns:
| Pattern | Examples |
|---|---|
| Upstream API field rename caught months late | FIX-E06 (`alarmType` → `alertTypeId`), FIX-M11 (distance unit drift) |
| Race conditions across multiple writers to the same "current state" table | FIX-M21 (time-guarded upsert for cross-feed + sweep + rescue) |
| Bad data slipping past at ingest time | FIX-M03 (zero-island fixes), FIX-M12 (BCD timestamps) |
| Dedup logic rewritten in production six times | Rounds 1-6 of `reporting.v_live_positions` |
| Synchronous external calls in the write path | Nominatim reverse-geocode inside `poll_trips` |
| "Just one more flag" in queries | `WHERE enabled_flag=1` repeated ~40 times across SQL |
These are not bugs. They are **categories** of latent failure that the architecture invites. A fresh codebase on the same architecture will, on a long enough timeline, produce the same categories.
### 2.3 What we want instead
A platform where:
- Every signal is captured immutably before it is interpreted, so contract drift becomes a re-parse, not a data loss.
- There is one writer per state table, so write-path race conditions cannot occur.
- SLO thresholds are data, not constants scattered across SQL and JS.
- Lifecycle state (provisioned / active / suspended / decommissioned) is separate from operational state (moving / parked / offline).
- Dashboards are render-only; business logic lives in the API.
- "What's in production?" is answerable from `docker inspect`, not by guessing which branch was deployed.
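
The first property in the list above (capture before interpretation) can be illustrated with a minimal sketch. It assumes an in-memory list standing in for `events.raw`; the payload shape, the IMEIs, and the `parse_v1` / `parse_v2` / `reparse` helpers are hypothetical, and the field rename mirrors the FIX-E06 example in §2.2. It is not the platform's parser.

```python
from typing import Callable, Optional

# Stand-in for events.raw: verbatim payloads, never mutated after capture.
RAW_LOG = [
    {"imei": "860000000000001", "alarmType": "sos"},      # old contract
    {"imei": "860000000000002", "alertTypeId": "speed"},  # after the upstream rename
]


def parse_v1(payload: dict) -> Optional[dict]:
    """Parser written against the old contract; it only knows `alarmType`."""
    kind = payload.get("alarmType")
    return {"imei": payload["imei"], "alarm": kind} if kind else None


def parse_v2(payload: dict) -> Optional[dict]:
    """Parser updated for the rename; it accepts either field name."""
    kind = payload.get("alertTypeId") or payload.get("alarmType")
    return {"imei": payload["imei"], "alarm": kind} if kind else None


def reparse(parser: Callable[[dict], Optional[dict]]) -> list:
    """Rebuild the parsed view from the raw log using the given parser."""
    return [row for row in map(parser, RAW_LOG) if row is not None]


parsed_before_fix = reparse(parse_v1)  # the renamed event is not understood yet
parsed_after_fix = reparse(parse_v2)   # same raw rows, nothing re-fetched upstream
```

Because the raw payload was stored untouched, the event that `parse_v1` could not interpret is recovered by running the newer parser over the same rows.
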
---
## 3. Goals, non-goals, and guiding principles
### 3.1 Product goals
| # | Goal | Measure |
|---|---|---|
| G1 | Reduce mean time to detect upstream contract drift | From ~90 days today to <24 hours |
| G2 | Eliminate write-path race conditions on current-state tables | Zero retroactive guards added after Phase 2 |
| G3 | Achieve and monitor an explicit fix-freshness SLO | 95% of active devices have a fix within 90 s, measured continuously |
| G4 | Reduce time to add a new dashboard | From ~5 days today to <1 day |
| G5 | Reduce time to add a new ingest source | From ~3 days today to <0.5 day |
| G6 | Make "what's in production?" trivially answerable | One command returns the running image SHA |
| G7 | Separate lifecycle from operational state in all dashboards | Decommissioned devices do not appear in operational views |
| G8 | Keep the driver roster current as a first-class operational tool | 95% of active vehicles have a non-null currently-assigned driver; reassignment-for-leave flow used in production without spreadsheet fallback |
| G9 | Surface vehicles due for service before they become overdue | Service-due dashboard is the canonical "what's due this week" source; service spreadsheet retired |
| G10 | Make driver KPIs (shift start/end + behaviour) data-driven, not anecdote-driven | Per-driver shift records with geocoded start/end, behaviour score published weekly, used in at least one operational decision per ops manager per month |
### 3.2 Non-goals
The following are explicitly **out of scope** for this rebuild. Each can be a future initiative; none gates Phase 4.
- Replacing Jimicloud / Tracksolid as the telematics provider.
- Replacing the hardware (GT06E, X3, AT4, JC400P).
- Multi-region deployment or HA across regions.
- A native mobile app (the dashboards are responsive web; native is a separate product).
- A customer-facing tracking portal (current platform is internal-ops only; that stays).
- Migration of historical raw data beyond what is required for analytics continuity (we will migrate the last 90 days of parsed events; older data stays in the legacy DB as a read-only archive).
- Real-time video streaming from the JC400P cameras (out of scope; current platform does not do this either).
- Driver-side mobile app for trip-tagging, dispatch acknowledgement, or proof-of-delivery.
- Customer-facing APIs (the platform's API is internal; partner integrations are a future phase).
- **Routing, route suggestion, ETA prediction, multi-stop optimisation.** These are scoped as a separate companion project because they are downstream of ticket allocation, which is itself downstream of ServiceNow tickets raised by external clients. The companion project owns the ticket-lifecycle integration, the allocation policy, the routing engine choice (pgRouting, OSRM, Valhalla, or third-party API), and the dispatch UX. This platform's responsibility is to expose the telematics primitives that project will consume (live vehicle position, vehicle assignment to driver, segment-of-road observed speeds if needed), not to ship routing itself.
- **Ticket-driven dispatch / ServiceNow integration.** Same reasoning as above. The dispatch-decision audit trail in this platform (if retained at all; see §14 Q16) is a thin local log, not the dispatch system. The companion project will define how tickets, allocation, dispatch state, and proof of completion flow.
### 3.3 Guiding principles
These principles inform every requirement that follows.
1. **Event log first, derived state second.** No write path mutates "current state" directly. Every signal lands in an immutable log; projectors derive state from it.
2. **Contracts are typed and verified.** Pydantic models for every endpoint, daily contract-check against sandbox upstreams.
3. **One codebase, one database, one repo — fate-isolated runtime roles.** Microservices are a scaling answer for organisations we don't have. But running gateway + workers + cron in a single Python process is a fate-sharing failure mode we don't need. The platform is one codebase that runs in three container roles (gateway / worker / cron) from the same image. Architecturally one service; operationally three containers that can fail independently.
4. **Lifecycle ≠ operational state.** Devices have a lifecycle (provisioned/active/suspended/decommissioned); vehicles have an operational state (moving/parked/offline/unknown). They are computed and stored separately.
5. **SLOs are first-class data.** Thresholds live in `slo.targets`, not in constants.
6. **Image-tag deploys, not branch deploys.** CI builds and tags; production runs a specific tag.
7. **Dashboards are thin renderers.** All business logic server-side; the JS draws what the API tells it to draw.
8. **External enrichment is async.** Nominatim, Mapbox, future enrichers never block ingest. Internal stage-to-stage continuation is event-driven (`LISTEN/NOTIFY`), not poll-driven, so we don't manufacture our own delay budget.
9. **Secure by default.** Every endpoint requires authentication from day one. Public-read access to operational data, even "just the live map", is not a posture the rebuilt platform inherits from the legacy system.
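
Principle 4 can be made concrete with a small sketch that keeps the two kinds of state in separate fields and filters operational views on lifecycle alone. The enum values are the ones named in this document; the `DeviceRow` type, the `operational_view` helper, and the sample IMEIs are illustrative assumptions, not the platform's schema.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(str, Enum):
    PROVISIONED = "provisioned"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DECOMMISSIONED = "decommissioned"


class OperationalState(str, Enum):
    MOVING = "moving"
    PARKED = "parked"
    OFFLINE = "offline"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class DeviceRow:
    imei: str
    lifecycle: Lifecycle           # administrative fact, changed by an admin action
    operational: OperationalState  # derived from telemetry, changes minute to minute


def operational_view(devices):
    """Operational views list only devices whose lifecycle is `active`."""
    return [d for d in devices if d.lifecycle is Lifecycle.ACTIVE]


fleet = [
    DeviceRow("860000000000001", Lifecycle.ACTIVE, OperationalState.MOVING),
    DeviceRow("860000000000002", Lifecycle.ACTIVE, OperationalState.OFFLINE),
    DeviceRow("860000000000003", Lifecycle.DECOMMISSIONED, OperationalState.OFFLINE),
]

# Only the active-but-silent device is reported as an offline problem.
offline_now = [
    d.imei for d in operational_view(fleet)
    if d.operational is OperationalState.OFFLINE
]
```

With the two fields kept apart, a decommissioned device that last reported weeks ago never reaches the "offline" list a dispatcher sees.
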
---
## 4. Target users and personas
The platform serves five primary personas. Each persona's needs drive specific phases.
### 4.1 Dispatcher — "Where are my vehicles right now?"
- **Day-in-the-life:** Monitors a wall of live vehicle positions. Coordinates drivers around incidents via phone/WhatsApp. Confirms arrival at customer sites. Handles "the customer says the truck never showed up" calls.
- **Tools used today:** `live.rahamafresh.com` (live dashboard), Slack, phone, WhatsApp with drivers.
- **Pain points today:** OFFLINE markers that turn out to be "device powered off two weeks ago, decommissioned", not "in trouble right now". Limited ability to coordinate from a single screen; the actual dispatch conversation relies on the phone.
- **Phase that primarily serves them:** Phase 1 (live tracking), Phase 2 (historical playback for retrospective queries). Suggest-route tooling is delivered by the separate companion project.
- **Key KPIs:** Fix freshness, time-to-detect device anomaly, accuracy of vehicle-at-site confirmation.
### 4.2 Operations manager — "Is the fleet healthy this week?"
- **Day-in-the-life:** Reviews weekly fleet performance. Identifies vehicles with degraded telematics. Triages device replacements. Reconciles trips against driver timesheets and customer deliveries.
- **Tools used today:** Historical dashboard (`fleetintelligence.rahamafresh.com`), Grafana, ad-hoc CSV exports.
- **Pain points today:** No SLO view ("am I above or below promise?"). The "OFFLINE 24h+" KPI mixes decommissioned vehicles with broken ones. Trip reports are slow to query.
- **Phase that primarily serves them:** Phase 2 (historical + trip analytics), Phase 3 (operations tooling).
- **Key KPIs:** Trip ingest lag, SLO attainment, decommissioned-device hygiene.
### 4.3 Fleet administrator — "Provision, retire, reassign, service."
The fleet administrator owns three running rosters: **devices**, **driver assignments**, and **service schedule**. These three rosters are the operational backbone of the fleet team's week.
- **Day-in-the-life:**
  - *Devices.* Activates new devices, suspends devices for non-payment, decommissions retired vehicles, reconciles invoices against active devices.
  - *Driver assignments.* Assigns a primary driver per vehicle (drawing from the HR-synced driver list; names, phones, and employment status auto-arrive). Reassigns vehicles when a driver is on leave, off sick, on training, or otherwise off-roster. Maintains a historical log of who drove what when (for incident attribution and HR reconciliation).
  - *Service schedule.* Tracks distance covered per vehicle against the service-interval policy (default 5,000 km between services). Flags vehicles due or overdue for service. Records completed services so the running total resets. The km figure auto-corrects from driver-submitted odometer readings at each fuel submission, so the running total is more accurate than the GPS-only figure alone.
- **Tools used today:** Spreadsheet for driver-to-vehicle mapping (frequently stale). Separate spreadsheet for service log (km readings entered manually from odometer photos). Occasionally direct DB access. Tracksolid web console for device-side admin.
- **Pain points today:**
  - `devices.enabled_flag=1` is set everywhere and nowhere; the lifecycle is implicit.
  - Driver-to-vehicle mapping lives in a sheet; when a driver goes on leave, the sheet is rarely updated, and trips end up attributed to the wrong person or to nobody.
  - Service tracking is reactive: vehicles get serviced when something fails or when someone notices the odometer photo looks high, not on a planned schedule.
  - Driver phone numbers and status (terminated, on leave) drift between HR's record and operational reality; the platform doesn't see HR changes until someone manually copies them across.
- **Phase that primarily serves them:** Phase 3 (device admin + driver-roster UI + service-due dashboard).
- **Key KPIs:** Time to provision a new device, time to complete a driver reassignment, percentage of trips attributed to a named driver, vehicles serviced on-schedule vs late.
### 4.4 Finance / cost-centre owner — "What did this cost?"
- **Day-in-the-life:** Allocates fleet costs by cost centre, by assigned city, by customer. Reviews fuel anomalies, idle time, after-hours usage.
- **Tools used today:** CSV exports, manual spreadsheet work.
- **Pain points today:** Cost-centre tagging is partial. After-hours-usage queries require a SQL specialist. Fuel/temperature data is captured but not surfaced.
- **Phase that primarily serves them:** Phase 4 (analytics, fuel/temperature surfaces, cost-centre allocation views).
- **Key KPIs:** Cost-allocation completeness, anomaly detection lead time.
### 4.5 Executive / sponsor — "Is the platform delivering?"
- **Day-in-the-life:** Monthly review of fleet performance. Capex decisions on devices and vehicles. Quarterly conversations with the telematics vendor.
- **Tools used today:** Manual decks built from spreadsheets.
- **Pain points today:** Numbers are hand-curated and slow. No durable executive view.
- **Phase that primarily serves them:** Phase 4 (executive summary view, SLO attainment dashboard).
- **Key KPIs:** Monthly fleet uptime, cost-per-km trend, incident count.
### 4.6 Secondary actors
- **Drivers** are *subjects* of the platform, not users. They interact via the vehicle and via voice/chat with dispatch; no driver-facing app is in scope.
- **The telematics vendor (Jimicloud)** is an upstream system, not a user. We track their API contracts via the contract checker.
- **Customers** are not platform users. Customer-facing tracking is out of scope.
---
## 5. Phases overview
The product is delivered in **four phases over approximately nine to ten weeks**, each phase shipping a self-contained increment of value. Each phase has a hard deliverable, a measurable success criterion, and a defined scope cut. No phase blocks production usage of earlier phases.
| Phase | Theme | Weeks | Primary persona | Headline deliverable |
|---|---|---|---|---|
| **P1** | Foundation + live tracking | 1-3 | Dispatcher | A working live-position dashboard, deployed, on the new architecture, against a parallel data source |
| **P2** | Historical + trip analytics | 4-6 | Operations manager | Historical playback + trip reports with feature parity for a chosen 30-day test window |
| **P3** | Operations tooling + cutover | 7-8 | Operations manager + Fleet admin | SLO dashboards, device-lifecycle admin UI, **driver-roster + reassignment UI**, **service-due dashboard**, alarm console, legacy decommission |
| **P4** | Intelligence + driver KPIs | 9+ | Operations manager + Finance + Executive + HR | **Driver shift reporting (ACC sign-on / sign-off + geocoded location)**, **driver-behaviour scoring**, cost-centre allocation, fuel/temperature surfaces, executive monthly view |
Phases 1-3 are committed scope for this rebuild. Phase 4's driver KPI work is committed; the remaining Phase 4 surfaces (cost, anomalies, executive view) are committed in principle with details confirmed at end of Phase 3.
Routing and ticket-driven dispatch are **not part of this PRD** (see §3.2 non-goals and §1 scope boundary); they are tracked separately.
---
## 6. Phase 1 — Foundation and live tracking (weeks 1-3)
### 6.1 Objective
Establish the platform's architectural foundation and ship a live-position dashboard that demonstrates feature parity with today's `live.rahamafresh.com`, running in parallel against the same telematics sources. By end of Phase 1, the new platform is reachable, deployable, monitored, and shows live vehicle positions correctly.
### 6.2 Why this is Phase 1
Live tracking is the highest-traffic, highest-stakes use case (dispatch makes decisions on it every minute). It exercises the full stack (push receiver, parser, projector, serve function, dashboard renderer) at the smallest scope. If the architecture works for live tracking, the rest follows. If it doesn't, we discover that in week 3, not week 12.
### 6.3 User stories
- **U1.1 Dispatcher sees current positions.** As a dispatcher, I can open the live dashboard and see every active vehicle plotted on a map with its current location, last-update time, vehicle plate, and operational state (moving / parked / offline / unknown), refreshed every 15 seconds.
- **U1.2 Dispatcher filters by cost centre and city.** As a dispatcher, I can filter the live view by cost centre and assigned city without page reload.
- **U1.3 Dispatcher distinguishes broken from dormant.** As a dispatcher, I can tell at a glance which "offline" vehicles are decommissioned (and therefore not my problem) versus which are unexpectedly silent.
- **U1.4 Ops sees SLO breach.** As an ops manager, I can see at any time what percentage of active devices have a fix within the freshness SLO (default 90 s) and which vehicles are below SLO right now.
- **U1.5 Engineer replays an event.** As an engineer, I can take a raw Jimi payload from `events.raw`, re-parse it with the current parser, and confirm what `state.live_positions` would have been written, without re-fetching from Jimi.
### 6.4 Functional requirements
- **F1.1 Multi-account ingest.** Receive push events from all Jimi sub-accounts (current TARGETS list) via HMAC-signed webhooks. Identify each event by `(account_id, imei)` from day one; no retroactive multi-tenancy.
- **F1.2 Polled ingest.** Poll `jimi.user.device.location.list` per account on a 60-second cadence (catch-up on startup, ongoing). Poll `jimi.user.device.location.get` for stale IMEIs every 10 minutes.
- **F1.3 Immutable event log + minimal-gateway contract.** Every push and every poll response writes a row to `events.raw` with verbatim payload, source, signature, `received_at`, `parser_version`. The push-receiving gateway performs HMAC verify + INSERT + `NOTIFY events_raw_new` + 200 OK, and nothing else, per request. No parsing, no PostGIS, no projector work on the push path. Parser and projector work happens in the worker container role (architecture §2.3, §6).
- **F1.4 Versioned parser, event-driven.** A parser worker holds a `LISTEN events_raw_new` connection, drains new rows to `events.parsed` on arrival (typically within milliseconds), applying Pydantic-typed transformations. Parser version is recorded per row. Re-parsing is a SQL statement. A 5-second timer-based sweep catches the rare missed-NOTIFY case; under normal operation it is a no-op.
- **F1.5 Single-writer projector, event-driven.** One projector holds a `LISTEN events_parsed_new` connection and updates `state.live_positions` on arrival of each `position_fix` event. Ordering invariant: process events in `occurred_at` order; never overwrite a newer fix with an older one. Stage-to-stage lag (parser → projector) is bounded by NOTIFY propagation, not by polling intervals.
- **F1.6 Dedup rule applied once.** The tracker-first dedup rule (tracker mc_type priority → 24h freshness gate → fall back to camera if all trackers stale → intra-type tiebreak by most-recent fix → activation_time tiebreak) is implemented in **one** SQL function (`serve.fn_live_view`) and **one** Pydantic projection. No client-side dedup.
- **F1.7 Live API endpoint.** `GET /api/views/live?filters=…` returns a render-ready payload: `{summary: {…KPIs…}, geojson: {…vehicles…}, slo_status: {…breaches…}}`.
- **F1.8 Live dashboard.** A static HTML page (`index-live.html`) imports `fleet-core.js`, authenticates against `/api/auth/token`, calls the live endpoint every 15 s with a JWT, renders KPIs and a MapLibre map. No business logic in JS. No anonymous access path; users sign in before the map renders.
- **F1.9 SLO measurement.** A `slo_measurement` worker computes fix-freshness every 60 s and writes to `slo.measurements`. Grafana dashboards render against `slo.*`.
- **F1.10 Contract checker.** A daily job calls each Jimi endpoint against a sandbox account, validates the response against the current Pydantic model, alerts on drift.
- **F1.11 Device lifecycle.** `domain.devices.lifecycle` is NOT NULL with values `provisioned | active | suspended | decommissioned`. The live view shows only `active` devices.
- **F1.12 Parallel deployment.** The new platform runs alongside the old; both receive Jimi pushes. The new dashboard is reachable at `live-v2.rahamafresh.com` and requires a JWT from day one (no public-read on the new platform; see §3.3 principle 9, §15 Q1). The old dashboard remains canonical for dispatch until end of Phase 3.
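
The ordering invariant in F1.5 (never overwrite a newer fix with an older one) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the dictionary stands in for `state.live_positions`, and the `PositionFix` and `LivePositionProjector` names, the account ID, and the coordinates are hypothetical. The real projector would express the same guard in its database write.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PositionFix:
    account_id: str
    imei: str
    occurred_at: datetime
    lat: float
    lon: float


class LivePositionProjector:
    """Sole writer of the in-memory stand-in for state.live_positions."""

    def __init__(self):
        self.live = {}  # (account_id, imei) -> latest PositionFix

    def apply(self, fix: PositionFix) -> bool:
        """Apply one fix; return False if it is not newer than the stored one."""
        key = (fix.account_id, fix.imei)
        current = self.live.get(key)
        if current is not None and fix.occurred_at <= current.occurred_at:
            return False  # late or duplicate arrival: keep the newer fix
        self.live[key] = fix
        return True


def at(second: int) -> datetime:
    return datetime(2026, 5, 22, 9, 0, second, tzinfo=timezone.utc)


projector = LivePositionProjector()
newer = PositionFix("acct-1", "860000000000001", at(30), -1.2921, 36.8219)
older = PositionFix("acct-1", "860000000000001", at(10), -1.3000, 36.8000)

applied_newer = projector.apply(newer)
applied_older = projector.apply(older)  # arrives late and must not win
```

Keying on `(account_id, imei)` follows F1.1. With a single writer, the comparison and the write cannot interleave with another process.
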
### 6.5 Non-functional requirements
- **NFR1.1 Fix freshness.** 95% of active devices have a fix within 90 s during business hours (07:00-19:00 EAT).
- **NFR1.2 Push receiver latency.** p95 < 100 ms (Jimi-side measurable timeout).
- **NFR1.3 Live endpoint latency.** p95 < 300 ms, p99 < 800 ms.
- **NFR1.4 Parser lag.** p95 of `received_at` to `events.parsed` insertion: < 30 s.
- **NFR1.5 Availability.** 99.5% monthly for push receiver and live endpoint. (Higher targets in P3.)
- **NFR1.6 Security.** All inbound webhooks HMAC-verified. All dashboard endpoints require a valid JWT (read endpoints included, no anonymous access path). Public-read posture from the legacy platform is not preserved (§15 Q1 closed, §3.3 principle 9).
- **NFR1.7 Observability.** Every request logged in structured JSON with `event_id`, `imei`, `endpoint`, `parser_version`, `latency_ms`.
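
The webhook check required by NFR1.6 can be sketched with the Python standard library. The signing scheme shown here (HMAC-SHA256 over the raw request body, hex-encoded) is an assumption for illustration only; this PRD does not specify the scheme Jimi actually uses, and the secret, body, and function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = sign(secret, body)
    # Compare as bytes so an arbitrary attacker-supplied string cannot raise.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


SECRET = b"per-account-shared-secret"  # hypothetical value
BODY = b'{"imei": "860000000000001", "lat": -1.2921, "lng": 36.8219}'

good_signature = sign(SECRET, BODY)
accepted = verify_webhook(SECRET, BODY, good_signature)
rejected_tampered = verify_webhook(SECRET, BODY + b" ", good_signature)
rejected_forged = verify_webhook(SECRET, BODY, "00" * 32)
```

Verification runs against the verbatim bytes received, before any JSON decoding, which fits the minimal-gateway contract in F1.3.
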
### 6.6 Success criteria
Phase 1 is done when, in continuous production observation for 7 days:
1. The new platform receives 100% of Jimi pushes the old platform receives (per ingest-log comparison).
2. The new live dashboard renders the same vehicle positions as the old, within ±15 s.
3. `slo.v_current_status` shows fix-freshness SLO attainment at or above 95%.
4. Zero retroactive write-path guards have been added (no FIX-M21 equivalent).
5. The contract checker has run green for 7 consecutive days.
6. A demonstration of replay (truncate `state.live_positions`, re-project from `events.parsed`, confirm restored) succeeds in <30 minutes.
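
Criterion 3 depends on how fix freshness is computed. The sketch below shows one plausible reading: the share of active devices whose latest fix is no older than 90 s. The function name, the input shape, and the choice to count never-seen devices as breaches (and an empty fleet as fully compliant) are assumptions. In the platform the threshold would come from `slo.targets` rather than a default argument.

```python
from datetime import datetime, timedelta, timezone


def fix_freshness(last_fix_by_imei, now, max_age_s=90):
    """Return (attainment fraction, sorted IMEIs currently breaching the SLO).

    `last_fix_by_imei` maps each active device to its latest fix time,
    or to None if no fix has been seen for it.
    """
    if not last_fix_by_imei:
        return 1.0, []  # assumption: an empty fleet cannot breach
    limit = timedelta(seconds=max_age_s)
    stale = sorted(
        imei for imei, seen in last_fix_by_imei.items()
        if seen is None or now - seen > limit
    )
    fresh = len(last_fix_by_imei) - len(stale)
    return fresh / len(last_fix_by_imei), stale


now = datetime(2026, 5, 22, 9, 0, 0, tzinfo=timezone.utc)
active_devices = {
    "A": now - timedelta(seconds=20),
    "B": now - timedelta(seconds=89),
    "C": now - timedelta(seconds=91),  # just past the 90 s threshold
    "D": None,                         # never reported
}

attainment, breaching = fix_freshness(active_devices, now)
meets_slo = attainment >= 0.95
```

Returning the breaching IMEIs alongside the percentage serves U1.4, where ops needs both the headline number and the list of vehicles below SLO.
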
### 6.7 Out of scope for Phase 1
Historical playback, trips, parking events, alarms, fuel, temperature, OBD, dispatcher write-actions, routing, mobile responsive (works but not polished), authentication beyond JWT scaffolding.
---
## 7. Phase 2 — Historical and trip analytics (weeks 4-6)
### 7.1 Objective
Ship the historical-track and trip-report capabilities at feature parity with today's `fleetintelligence.rahamafresh.com`, on the new architecture. By end of Phase 2, ops managers can reproduce any analysis they do today against the new platform.
### 7.2 Why this is Phase 2
Live tracking gives dispatch what they need. Historical + trip analytics gives ops management what they need. These two cover ~90% of current platform usage. Building Phase 2 immediately after Phase 1 means the same architectural muscles are exercised on a wider surface (CAGGs, longer time ranges, larger result sets, more filters), without leaving the team for a month to come back.
### 7.3 User stories
- **U2.1 Ops manager plays back a track.** As an ops manager, I can select a vehicle and a 24-hour window and see the vehicle's track animated on a map, with speed and direction overlaid.
- **U2.2 Ops manager reviews trips.** As an ops manager, I can see all trips for a vehicle, plate, cost centre, or assigned city in a date range, with start/end time, start/end address, distance, duration, idle time, and max speed.
- **U2.3 Ops manager exports.** As an ops manager, I can export any historical view to CSV for downstream finance reconciliation.
- **U2.4 Ops manager reviews parking events.** As an ops manager, I can see parking events with start/end time and address, filter by duration (>1h, >8h, overnight).
- **U2.5 — Finance reconciles by cost centre.** As a finance owner, I can see total distance and trip count per cost centre for a billing period.
- **U2.6 — Engineer back-computes.** As an engineer, I can re-run the trip projector against `events.parsed` for any date range without re-fetching from Jimi.
### 7.4 Functional requirements
- **F2.1 — Trip projector.** A projector reads `events.parsed` of kinds `trip_open`, `trip_close`, and `position_fix` and writes `state.trips` with start/end time, start/end position, distance, duration, idle time, max speed.
- **F2.2 — Parking projector.** A projector derives parking events from position-fix streams (speed=0 for >5 min) and writes `state.parking_events`.
- **F2.3 — Geocoding worker.** A separate worker drains a `geocode_queue` table (positions needing addresses) and writes `state.geocoded_positions`. Nominatim primary, Mapbox fallback. Never blocks ingest.
- **F2.4 — Historical API endpoints.** `GET /api/views/history?filters=…` returns `{summary, geojson, slo_status}` for a date range. `GET /api/views/history/animation?…` returns time-stamped position frames for playback.
- **F2.5 — Trip API endpoint.** `GET /api/views/trips?filters=…` returns trip records. Filters: vehicle, plate, cost centre, assigned city, date range, min duration, min distance.
- **F2.6 — Parking API endpoint.** `GET /api/views/parking?filters=…` similar shape.
- **F2.7 — Continuous aggregates.** TimescaleDB CAGGs for daily and weekly trip rollups per cost centre, refreshed every 15 min.
- **F2.8 — Historical dashboard.** `index-history.html` page with form-driven filter UI, playback control, KPI tiles. Shares `fleet-core.js` with the live dashboard.
- **F2.9 — CSV export.** Every dashboard view has a "Download CSV" action that exports the current filtered result.
- **F2.10 — Migration of recent history.** The last 90 days of trips and positions from the legacy DB are imported into `events.raw` (synthesised events with `source=legacy_import`), then re-parsed and projected on the new platform. Older data remains in the legacy DB as a read-only archive available to ops via Grafana.
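
The parking rule in F2.2 (speed = 0 for more than 5 minutes) can be sketched as a pass over a time-ordered stream of fixes. The input shape and the `parking_events` name are hypothetical. Ending an event at the last stationary fix, rather than at the first moving fix, and leaving a still-open run unemitted are design assumptions the projector would need to settle.

```python
from datetime import datetime, timedelta, timezone

MIN_PARK = timedelta(minutes=5)  # F2.2: stationary for more than 5 min


def parking_events(fixes):
    """Yield (start, end) for each stationary run longer than MIN_PARK.

    `fixes` is an iterable of (occurred_at, speed_kmh) in time order.
    A run still open when the input ends is not emitted, because its
    end time is not yet known.
    """
    start = last = None
    for occurred_at, speed in fixes:
        if speed == 0:
            if start is None:
                start = occurred_at
            last = occurred_at
        else:
            if start is not None and last - start > MIN_PARK:
                yield start, last
            start = last = None


T0 = datetime(2026, 5, 22, 8, 0, tzinfo=timezone.utc)


def minute(m: int) -> datetime:
    return T0 + timedelta(minutes=m)


stream = [
    (minute(0), 40), (minute(1), 0), (minute(4), 0), (minute(9), 0),  # 8 min stop
    (minute(10), 25), (minute(11), 0), (minute(13), 0),               # 2 min stop
    (minute(14), 30),
]
events = list(parking_events(stream))
```

Only the eight-minute stop qualifies; the two-minute stop is treated as ordinary traffic.
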
### 7.5 Non-functional requirements
- **NFR2.1 — Historical API latency.** p95 < 1.5 s for a 24-hour vehicle track; < 3 s for a 7-day cost-centre rollup.
- **NFR2.2 Trip ingest lag.** Trips closed in Jimi appear in `state.trips` within 600 s (SLO `trip_lag`).
- **NFR2.3 Geocoding hit rate.** At least 80% of trip endpoints have a non-null address within 24 h of the trip closing.
- **NFR2.4 CSV export.** Up to 100,000 rows in <30 s.
### 7.6 Success criteria
Phase 2 is done when, for a chosen 30-day test window:
1. The new historical dashboard renders the same trip records as the legacy one (sample of 100 trips: 100% match on start time and end time, distance within ±1%, and at least 90% of addresses either matching or differing only as none-vs-something).
2. Trip lag SLO is met for 7 consecutive days.
3. A 90-day backfill of trips from the legacy DB has been imported, re-parsed, and projected successfully.
4. Ops manager has signed off on parity for their workflow.
### 7.7 Out of scope for Phase 2
Active alarms console (Phase 3), driver-behaviour scoring (Phase 4). Routing and ticket-driven dispatch are out of scope of this PRD entirely (companion project).
---
## 8. Phase 3 — Operations tooling and cutover (weeks 7-8)
### 8.1 Objective
Equip ops managers and fleet administrators with first-class tools for the work they currently do in spreadsheets, Slack, and direct database access. By end of Phase 3, the platform is the canonical home for fleet operations, and the legacy platform is decommissioned.
### 8.2 Why this is Phase 3
Phases 1 and 2 ship feature parity for live and historical views. Phase 3 ships the **operational hygiene** that the legacy platform was missing: SLO visibility, alarm triage, device lifecycle UI, dispatcher workflow. This is where the rebuild starts to be visibly better than what it replaces, and it ends with the legacy stack going dark.
### 8.3 User stories
- **U3.1 Ops sees SLO health.** As an ops manager, I can see a dashboard of all platform SLOs (fix freshness, trip lag, parser lag, contract drift) with current value, threshold, and trend over the last 7 days.
- **U3.2 Ops triages alarms.** As an ops manager, I can see an alarm console listing recent alarms (panic button, speeding, geofence breach, etc.) with vehicle, plate, time, location, and ack-status. I can acknowledge and add a note.
- **U3.3 Admin provisions a device.** As a fleet administrator, I can add a new device to the platform: enter IMEI, vehicle plate, cost centre, assigned city, set lifecycle to `provisioned`. On first valid push from that device, transition to `active`.
- **U3.4 Admin suspends a device.** As a fleet administrator, I can suspend a device (e.g. subscription expired) and the device disappears from operational views but remains visible to admin views.
- **U3.5 Admin decommissions a device.** As a fleet administrator, I can decommission a device permanently. It vanishes from all operational views.
- **U3.6 Admin audits.** As a fleet administrator, I can see an audit log of all lifecycle transitions for any device.
- **U3.8 Admin assigns a primary driver.** As a fleet administrator, I can assign a primary driver to a vehicle with an effective-from date and an optional end date. The assignment is the default attribution for all trips and behaviour scoring on that vehicle.
- **U3.9 Admin reassigns a vehicle for leave or absence.** As a fleet administrator, when a driver goes on leave (annual leave, sick leave, training, suspension, etc.), I can mark them off-roster for a date range and assign a temporary substitute driver to their vehicle for that range. When the date range ends, the primary driver automatically resumes attribution; no second action is required.
- **U3.10 Admin sees vehicles without an assigned driver.** As a fleet administrator, I can see a list of vehicles that are currently active but have no driver assigned (or whose assigned driver is currently off-roster with no substitute). This list is the daily "needs reassignment" worklist.
- **U3.11 Admin sees driver history.** As a fleet administrator, I can see the historical sequence of drivers for any vehicle, and the historical sequence of vehicles for any driver. Useful for incident investigation, HR reconciliation, and customer queries.
- **U3.12 Admin sees service-due dashboard.** As a fleet administrator, I can see every active vehicle with its odometer-since-last-service running total, the service-interval policy applied to it (default 5,000 km), and how many km remain until next service. The view sorts by urgency: overdue first, then due-soon (within 500 km), then comfortable.
- **U3.13 Admin records a completed service.** As a fleet administrator, I can record a completed service: vehicle, date, service type, odometer reading at service, optional notes, optional cost. The running total resets at that reading.
- **U3.14 Admin sets a per-vehicle service interval.** As a fleet administrator, I can override the default 5,000 km service interval for a specific vehicle class (e.g., heavy trucks at 10,000 km) so the service-due math reflects the vehicle's actual maintenance plan.
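
The urgency ordering described in U3.12 can be illustrated with a minimal Python sketch. It assumes the status names `overdue`, `due_soon`, and `ok` that F3.16 defines, treats a vehicle with exactly 0 km remaining as overdue (the PRD does not say which side of the boundary that falls on), and breaks ties within a band by fewest km remaining. `ServiceRow` and `service_worklist` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

DEFAULT_INTERVAL_KM = 5000  # default service interval from U3.12
DUE_SOON_BUFFER_KM = 500    # "due-soon" window from U3.12


@dataclass
class ServiceRow:
    """Hypothetical row shape; field names are illustrative only."""
    plate: str
    km_since_last_service: float
    interval_km: float = DEFAULT_INTERVAL_KM

    @property
    def km_to_next_service(self) -> float:
        return self.interval_km - self.km_since_last_service

    @property
    def status(self) -> str:
        remaining = self.km_to_next_service
        if remaining <= 0:  # assumption: exactly 0 km remaining counts as overdue
            return "overdue"
        if remaining <= DUE_SOON_BUFFER_KM:
            return "due_soon"
        return "ok"


_URGENCY = {"overdue": 0, "due_soon": 1, "ok": 2}


def service_worklist(rows):
    """Sort overdue first, then due-soon, then ok; fewest km remaining first within a band."""
    return sorted(rows, key=lambda r: (_URGENCY[r.status], r.km_to_next_service))
```

A per-vehicle override (U3.14) would simply populate `interval_km` with a value other than the default before the row is classified.
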
### 8.4 Functional requirements
- **F3.1 SLO dashboard.** A page (`/ops/slos`) rendering all rows from `slo.targets` with current state from `slo.measurements`. Green/amber/red badges with thresholds visible.
- **F3.2 Alarm projector.** Projector reads `events.parsed` of kind `alarm` and writes `state.alarms` with `ack_status`, `ack_by`, `ack_note`.
- **F3.3 Alarm console.** A page (`/ops/alarms`) listing recent alarms with filters (severity, vehicle, date range), ack action, and note field.
- **F3.4 Device admin UI.** A page (`/admin/devices`) with table view, lifecycle transitions, audit log.
- **F3.5 Lifecycle audit log.** `domain.devices_audit` table records every lifecycle transition with `actor`, `at`, `from_lifecycle`, `to_lifecycle`, `reason`.
- **F3.8 Authentication.** JWT-based login for ops and admin pages. Two scopes: `read:fleet`, `admin`. Dispatcher and ops manager = `read:fleet`. Fleet admin = `admin`. (A `write:dispatch` scope is intentionally not introduced here; ticket-driven dispatch is the companion project's domain.)
- **F3.9 Cutover plan.** Push events are mirrored to both old and new platforms for 7 days, after which DNS cuts over to the new platform. The legacy push receiver continues to accept events as standby for 48 h, then is decommissioned.
- **F3.10 Driver model.** `domain.drivers` holds driver identity (`driver_id` UUID, `full_name`, `employee_ref`, `phone`, `status`). `status ∈ {active, on_leave, suspended, terminated}`. `domain.driver_status_log` records every status transition with `actor`, `at`, `from`, `to`, `reason`, `effective_from`, `effective_to`.
- **F3.11 Driver assignment model.** `domain.driver_assignments` is the time-ranged source of truth for which driver was on which vehicle: `(assignment_id, vehicle_id, driver_id, role, effective_from, effective_to, reason, created_by, created_at)`. `role ∈ {primary, substitute}`. Effective ranges may not overlap for the same `(vehicle_id, role)`. The "currently assigned driver" for a vehicle at time `t` is: substitute whose range covers `t` if one exists, otherwise primary whose range covers `t`, otherwise NULL.
- **F3.12 Assignment lookup function.** `serve.fn_driver_at(vehicle_id, at_time) → driver_id` returns the assigned driver at a point in time. Used by every projector that attributes activity to a driver (trips, shifts, behaviour scoring).
- **F3.13 Driver-roster admin UI.** Page `/admin/drivers` lists drivers with current status and current vehicle. Page `/admin/assignments` provides the calendar-style reassignment UI (driver on leave → date range → choose substitute → save).
- **F3.14 Service policy model.** `domain.service_policies` defines the service interval per vehicle class (`vehicle_class`, `interval_km`, `description`). Default policy: `interval_km = 5000`. A vehicle inherits its class's policy unless overridden in `domain.vehicles.service_interval_km_override`.
- **F3.15 Service event model.** `ops.service_log` records completed services: `(service_id, vehicle_id, serviced_at, service_type, odometer_km_at_service, cost, notes, recorded_by, recorded_at)`. Once recorded, the running-total math resets at this row.
- **F3.16 Service-due projector.** A projector computes per-vehicle `km_since_last_service` from `state.position_history` (sum of segment distances since `serviced_at`) and writes `state.service_status (vehicle_id, last_serviced_at, km_at_last_service, current_odometer_km, km_since_last_service, interval_km, km_to_next_service, status)`. `status ∈ {overdue, due_soon, ok}` derived from `km_to_next_service` versus a configurable buffer (default 500 km for due_soon).
- **F3.17 Service-due API endpoint.** `GET /api/views/service?filters=…` returns the service worklist payload `{summary: {overdue_count, due_soon_count, ok_count}, vehicles: [{vehicle_id, plate, ...service_status fields}]}` sorted by urgency.
- **F3.18 Service-due dashboard.** A page (`/ops/service`) renders the service worklist with overdue (red) at top, due-soon (amber) next, ok (green) collapsed. Each row links to the vehicle's service-log history. A "Record service" action opens a form that creates an `ops.service_log` row.
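
The resolution rule in F3.11 and F3.12 (substitute wins over primary, otherwise NULL) can be sketched in Python as follows. This is an illustration of the precedence logic only, not the `serve.fn_driver_at` implementation. It assumes half-open ranges `[effective_from, effective_to)` and treats a missing `effective_to` as open-ended; the PRD does not state either convention. The `Assignment` class and `driver_at` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    """Illustrative subset of the domain.driver_assignments columns."""
    vehicle_id: str
    driver_id: str
    role: str  # "primary" or "substitute"
    effective_from: datetime
    effective_to: Optional[datetime] = None  # assumption: None means open-ended

    def covers(self, t: datetime) -> bool:
        # Assumption: half-open range, so the end instant itself is not covered.
        return self.effective_from <= t and (
            self.effective_to is None or t < self.effective_to
        )


def driver_at(assignments, vehicle_id: str, t: datetime) -> Optional[str]:
    """Return the substitute covering t if one exists, else the primary, else None."""
    for role in ("substitute", "primary"):
        for a in assignments:
            if a.vehicle_id == vehicle_id and a.role == role and a.covers(t):
                return a.driver_id
    return None
```

Because the substitute range is checked first, the primary driver resumes attribution as soon as the substitute's range ends, with no second write to the assignments table.
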
**Fuel / refuelling ingest from the existing WhatsApp microservice**
The company already operates a WhatsApp-based fuelling microservice: drivers send a message containing a photo of the odometer, litres added, fuel station, and time. That microservice continues to own the driver-facing WhatsApp interaction. This platform consumes its output as a new ingest source, using the same event-sourcing pattern as Jimi pushes.
- **F3.19 Fuel submission ingest endpoint.** `POST /push/fuel` accepts a payload from the existing microservice with `(submission_id, imei_or_plate, driver_phone, odometer_km, litres, fuel_station_name, fuel_station_geom?, photo_ref, submitted_at, signature)`. HMAC-verified against a shared secret. Payload written verbatim to `events.raw` with `source = 'whatsapp_fuel'`. Parser produces a typed `fuel_submission` event in `events.parsed` keyed by the platform's `vehicle_id` (resolved from IMEI or normalised plate) and `driver_id` (resolved from `driver_phone`).
- **F3.20 Fuel / odometer projector.** Reads `events.parsed` of kind `fuel_submission` and writes:
- `state.fuel_log`: `(fuel_id, vehicle_id, driver_id, fuel_station_name, fuel_station_geom, odometer_km_submitted, litres, photo_ref, submitted_at)`.
- `state.odometer_readings`: appends `(reading_id, vehicle_id, source, odometer_km, observed_at, confidence)` where `source = 'driver_submission'` and `confidence` is computed from the agreement between submitted km and platform-derived (GPS-summed) km at the same timestamp.
- **F3.21 Odometer-truth view and variance-gated reset.** `serve.fn_odometer_status(vehicle_id) → (gps_derived_km, last_submitted_km, last_submitted_at, divergence_km, confidence)` exposes both sources side-by-side. Confidence is computed from the variance between the submitted km delta and the GPS-derived km delta over the same interval, normalised: `variance = |Δodometer_submitted - Δdistance_gps| / Δdistance_gps`. If `variance ≤ 5%`, the reading is `confidence = high`. If `variance > 5%`, the reading is `confidence = low`.
**High-confidence readings** reset the service-due clock: the service-due projector (F3.16) consumes the submitted km as the new baseline for `km_since_last_service`. The fuel submission is accepted into `state.fuel_log` normally.
**Low-confidence readings are quarantined**, not used. The submission is still written to `state.fuel_log` (we don't lose the record) and the reading is still appended to `state.odometer_readings` with `source = 'driver_submission'`, `confidence = 'low'` (we don't lose audit trail), but `state.service_status` is **not** updated from a low-confidence reading. A row is written to `ops.admin_alerts` of kind `odometer_variance_exceeded` with `vehicle_id`, `variance_pct`, `submitted_km`, `gps_derived_km`, and the photo reference; the fleet administrator reviews against the photo and either confirms (promoting the reading to high-confidence via an explicit admin action that resets the clock) or rejects (the reading stays in audit but never updates service status).
Rationale: an unverified manual entry (typo, fraud, or honest mistake) should not silently corrupt service-due math. A reading that looks wrong against the GPS-derived figure stays in the audit trail but doesn't get to move the service clock until a human has looked at the photo.
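
The variance gate in F3.21 reduces to a few lines of arithmetic. The Python sketch below applies the formula as written, with the 5% threshold as a parameter. `classify_reading` is a hypothetical name, and treating a zero or negative GPS-derived distance as low confidence is an assumption: the formula is undefined in that case and the PRD does not cover it.

```python
from typing import Optional, Tuple


def classify_reading(
    delta_submitted_km: float,
    delta_gps_km: float,
    threshold: float = 0.05,
) -> Tuple[str, Optional[float]]:
    """Return (confidence, variance) for a driver-submitted odometer delta.

    variance = |delta_submitted - delta_gps| / delta_gps, per F3.21.
    """
    if delta_gps_km <= 0:
        # Assumption: with no GPS distance to compare against, quarantine the reading.
        return ("low", None)
    variance = abs(delta_submitted_km - delta_gps_km) / delta_gps_km
    confidence = "high" if variance <= threshold else "low"
    return (confidence, variance)


# Worked example: the driver reports 1,040 km since the last reading, GPS sums to 1,000 km.
# variance = |1040 - 1000| / 1000 = 0.04, which is within 5%, so the reading is high-confidence.
confidence, variance = classify_reading(1040, 1000)
```

A `"high"` result would let the service-due projector take the submitted reading as its new baseline; a `"low"` result would raise the `odometer_variance_exceeded` alert instead.
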
**HR roster sync — `domain.drivers` stays current automatically**
Driver identity, phone numbers, and employment status are mastered in the HR system, not in this platform. We sync via the platform's existing event-sourcing pattern rather than via FDW + materialised view, so HR data follows the same `events.raw → parsed → projected` lifecycle as Jimi and fuel submissions. Cadence is 3 hours: HR data is not minute-by-minute volatile, and a force-refresh action handles the rare urgent case.
- **F3.22 HR driver sync worker.** Scheduled every 3 hours. Pulls the HR extract (table, view, or API; see Q19) and writes one row per driver verbatim to `events.raw` with `source = 'hr_extract'`. Parser produces `hr_driver_snapshot` events in `events.parsed` with normalised fields (E.164 phone, trimmed name, validated `employee_ref`, status in `{active, on_leave, suspended, terminated}`). Projector applies them to `domain.drivers` (upsert by `employee_ref`).
- **F3.23 Force-refresh HR.** A fleet-admin action triggers an immediate HR pull outside the schedule. Used when an urgent termination or suspension needs to take effect inside the 3-hour window. Logged in `domain.devices_audit`-style audit so we know why a refresh fired and who fired it.
- **F3.24 HR sync staleness metric.** `slo.targets` row `hr_sync_lag` measures "minutes since last successful HR sync". SLO threshold: 240 min (3 h cadence + 1 h buffer) during business hours. Breach surfaces in the SLO dashboard like any other.
- **F3.25 Quarantined odometer review action.** A fleet-admin page (`/admin/odometer-review`) lists open `odometer_variance_exceeded` alerts with submitted km, GPS-derived km, variance percentage, photo, driver, and submission time. The admin either **confirms** (the reading is promoted to `confidence = high` retroactively, `state.service_status` is updated with this reading as the baseline, and the alert is closed with `resolution = confirmed`) or **rejects** (the reading stays as `confidence = low` in `state.odometer_readings`, the alert is closed with `resolution = rejected`, optional reason captured). Resolution is logged in `domain.devices_audit`-style audit with actor, timestamp, resolution, and reason. No bulk-action option in v1; per-row review is the point.
### 8.5 Non-functional requirements
- **NFR3.1 Availability.** 99.9% monthly for the platform after cutover.
- **NFR3.2 Auth latency.** Token-issue endpoint p95 < 200 ms.
- **NFR3.3 Audit completeness.** Every lifecycle transition has an audit row. Audit rows are never deleted.
### 8.6 Success criteria
Phase 3 is done when:
1. Legacy `live.rahamafresh.com` and `fleetintelligence.rahamafresh.com` are no longer the canonical dashboards. DNS points to the new platform.
2. Ops manager and fleet administrator have signed off on workflow parity + improvements.
3. All SLOs in `slo.v_current_status` have been green for 7 consecutive days.
4. The contract checker has detected at least one synthetic upstream change in staging (proof it works).
5. Zero data loss during cutover (verified by comparing `events.raw` counts in the mirror window).
6. ≥ 95% of active vehicles have a non-null `serve.fn_driver_at(vehicle_id, now())`, i.e. the driver roster is functionally complete, not aspirational.
7. The service-due dashboard is the canonical source for "what's due this week", confirmed by fleet admin having retired the service spreadsheet for two consecutive weeks.
8. The reassignment-for-leave flow has been used in production at least 3 times without ops needing to fall back to the spreadsheet.
9. ≥ 80% of fuel submissions from the WhatsApp microservice land in `events.raw` within 60 s of submission (per microservice telemetry vs platform `received_at`).
10. ≥ 90% of active vehicles have at least one driver-submitted odometer reading within the last 30 days. Of those readings, ≥ 75% pass the high-confidence variance gate (F3.21) on first submission. Quarantined readings have a median admin-review-and-resolve time of ≤ 3 working days.
11. HR sync `hr_sync_lag` SLO is green (≤ 240 min) for 95% of business hours across 7 consecutive days; force-refresh has been used at least once and verified to take effect within 60 s.
### 8.7 Out of scope for Phase 3
Driver-behaviour scoring (Phase 4), customer-facing portal (out of scope entirely). Routing and ticket-driven dispatch are out of scope of this PRD entirely (companion project).
---
## 9. Phase 4 — Intelligence and driver KPIs (weeks 9+)
### 9.1 Objective
Convert the data the platform now collects cleanly into business intelligence:
- **Driver KPIs:** shift sign-on and sign-off detection (first ACC_ON of the day, last ACC_OFF) with geocoded location at each, total drive time per shift, idle time, and a composite **driver-behaviour score** built from speeding, harsh-acceleration, and harsh-braking events.
- **Cost allocation:** distance, fuel, idle time, and after-hours usage per cost centre and per assigned city.
- **Anomaly surfaces:** fuel and temperature anomalies surfaced for ops triage.
- **Executive view:** a one-page monthly summary backed by data, not hand curation.
By end of Phase 4, the **ops manager has a daily driver-performance worklist**, the **HR-adjacent attribution** (who drove what when, where they signed on, where they signed off) is in the platform and not in a spreadsheet, **finance has self-service cost allocation**, and **executives consume a monthly view that builds itself**.
Driver KPIs are the headline of Phase 4. They depend on the P3 driver-roster work being complete and accurate: the platform cannot attribute a shift to "Driver X" if the assignment data isn't there.
### 9.2 Why this is Phase 4
Driver KPIs depend on three earlier-phase prerequisites being in place: (a) the event log carries ACC_ON/ACC_OFF cleanly (P1), (b) trips and idle minutes are projected (P2), and (c) the driver roster is current and trustworthy (P3). Putting driver KPIs earlier would mean attributing shifts to the wrong people; putting them later would mean we've left obvious value on the table after the platform is otherwise complete.
Driver KPIs and service tracking together close the loop on the fleet team's three rosters: device roster (P3), driver roster (P3), service roster (P3) → driver performance (P4) → cost allocation (P4) → executive summary (P4). The data flows from operational hygiene into business intelligence in one direction.
**Phase 4 scope.** Driver shift reporting and behaviour scoring are **committed**. Cost allocation, fuel/temperature anomaly surfaces, and the executive view are **committed in principle, detailed at end of Phase 3** based on what the data and the team have taught us by then. Stretch items (deviation alerts, timesheet export) are post-rebuild backlog.
### 9.3 User stories
Driver-related stories are committed scope. The remaining stories are refined at end of Phase 3.
**Driver KPIs — shift reporting (committed)**
- **U4.1 Ops sees when each driver started work.** As an ops manager, I can see, for any date, every driver's shift start time and shift start location (geocoded address). Shift start is defined as the first ACC_ON event of the day for the driver's assigned vehicle. If a driver did not sign on, the row is present with status `no_shift`.
- **U4.2 Ops sees when each driver finished work.** As an ops manager, for the same date, I can see every driver's shift end time and shift end location (geocoded address). Shift end is the last ACC_OFF event of the day for that driver's assigned vehicle, after their shift-start ACC_ON. If a shift is still open at the time of viewing (driver currently working), status is `open`.
- **U4.3 Ops sees total drive time and idle time per shift.** As an ops manager, for any shift I can see total driving minutes (ACC_ON with speed > 0), total idle minutes (ACC_ON with speed = 0), and total stopped-engine-on minutes within the shift window.
- **U4.4 — Ops sees a driver's week.** As an ops manager, I can pick a driver and a week and see the seven-day timeline of shifts with start/end times, distance, and behaviour score per shift.
**Driver KPIs — behaviour scoring (committed)**
- **U4.5 — Ops scores drivers on quality of driving.** As an ops manager, I can see a per-driver behaviour score for any window. The score is a composite of: speeding events (sustained position-fix speed above a configurable fleet threshold, or Jimi's own `speeding` alarm), harsh-acceleration events (from Jimi alarms), harsh-braking events (from Jimi alarms), harsh-cornering events (from Jimi alarms, if device supports them), idle-percentage. Each component is normalised per 100 km driven so longer-distance drivers aren't penalised.
- **U4.6 — Ops drills into harsh events.** As an ops manager, for any harsh event I can see the timestamp, location (geocoded), speed at event, and the surrounding 60 seconds of position trail. Useful for "is this a real concern or a road feature?" judgement.
- **U4.7 — Ops compares drivers.** As an ops manager, I can compare any subset of drivers (e.g., a depot's roster) side-by-side on the composite score and on each component, for any window.
**Cost, anomalies, executive (committed)**
- **U4.8 — Finance allocates by cost centre.** As a finance owner, I can see total distance, fuel consumption, idle time, after-hours usage per cost centre for any billing period.
- **U4.9 — Ops detects fuel anomalies.** As an ops manager, I can see vehicles with unusual fuel consumption (sudden drop = theft suspicion, sudden rise = sensor failure) for a window.
- **U4.10 — Ops monitors cold-chain.** As an ops manager (for refrigerated vehicles), I can see temperature compliance per trip, with breaches flagged.
- **U4.11 — Executive sees the month.** As an executive, I can see a one-page monthly summary: fleet uptime SLO attainment, cost-per-km trend, incident count, top-5 vehicles by distance, top-5 cost centres by spend, **top-5 and bottom-5 drivers by behaviour score**, vehicles serviced on-schedule vs late.
**Stretch (post-rebuild backlog)**
- **U4.12 — HR exports timesheets.** *(Stretch)* HR can export a per-driver timesheet (sign-on, sign-off, total hours) for any pay period directly from the platform.
Deviation alerts and multi-stop optimisation — previously listed here as stretch — are now in the routing companion project's scope, not this platform's.
### 9.4 Functional requirements
**Driver shift derivation (committed)**
- **F4.1 — Shift projector.** A projector reads `events.parsed` of kinds `acc_on`, `acc_off`, and `position_fix` and writes `state.driver_shifts`: `(shift_id, driver_id, vehicle_id, shift_date, started_at, started_geom, started_address, ended_at, ended_geom, ended_address, drive_minutes, idle_minutes, distance_km, status)`. `status ∈ {open, closed, no_shift}`. Driver attribution uses `serve.fn_driver_at(vehicle_id, started_at)` from Phase 3.
- **F4.2 — Shift definition.** Shift start = first ACC_ON of the day where the prior ACC_OFF was ≥ `min_break_hours` ago (default 6 hours, configurable per cost centre). Shift end = last ACC_OFF before the next shift-start trigger. Lunch breaks (short ACC_OFFs within a shift) do not end the shift; they are subtracted from drive time and reported as idle.
- **F4.3 — Day boundary handling.** A shift starting before midnight and ending after midnight is one shift. The reporting "day" is the calendar date of the `started_at`. A shift starting after midnight is its own day even if the prior shift ended after midnight of the same calendar day.
- **F4.4 — Geocoding of shift endpoints.** Shift start and shift end positions are enqueued to the geocoding worker (Phase 2) at projector time so addresses are present within the geocoder's SLO window.
- **F4.5 — Shift API endpoint.** `GET /api/views/shifts?filters=…` returns shift records. Filters: driver, vehicle, cost centre, date range, status. Includes a `current_shifts` flag for "show me everyone currently signed on right now".
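
The shift rules in F4.2 and F4.3 can be sketched as a single pass over one vehicle's ignition events. This Python sketch is illustrative, not the projector itself. It assumes events arrive time-ordered, that timestamps are already in the reporting time zone, and that a trailing ACC_OFF with no later ACC_ON closes the shift (a later ACC_ON inside the break window would reopen it on the next run). `derive_shifts` is a hypothetical name.

```python
from datetime import datetime, timedelta


def derive_shifts(events, min_break_hours: float = 6.0):
    """Group (timestamp, kind) events, kind in {"acc_on", "acc_off"}, into shifts."""
    min_break = timedelta(hours=min_break_hours)
    shifts = []
    start = None       # started_at of the shift being built
    last_off = None    # most recent ACC_OFF inside that shift
    engine_on = False

    def emit(status):
        shifts.append({
            "shift_date": start.date(),  # F4.3: reporting day follows started_at
            "started_at": start,
            "ended_at": last_off if status == "closed" else None,
            "status": status,
        })

    for ts, kind in events:
        if kind == "acc_on":
            if start is None:
                start = ts
            elif not engine_on and last_off is not None and ts - last_off >= min_break:
                emit("closed")              # long break: previous shift ended at last ACC_OFF
                start, last_off = ts, None
            # otherwise a short break (for example lunch): the shift continues
            engine_on = True
        elif kind == "acc_off" and start is not None:
            last_off = ts
            engine_on = False

    if start is not None:
        emit("open" if engine_on else "closed")
    return shifts
```

A shift that starts at 22:00 and ends at 02:00 comes out as one record dated on the day it started, which is the day-boundary behaviour F4.3 asks for. Drive and idle minutes, geocoding, and the `no_shift` rows are left out of this sketch.
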
**Driver behaviour scoring (committed)**
- **F4.6 — Behaviour event extraction.** A projector identifies behaviour events from `events.parsed`:
- `speeding`: sustained position-fix speed above a fleet-configurable threshold (default 80 km/h urban / 100 km/h highway distinguished by `road_class` tag in `domain.vehicles` if known, otherwise single fleet-wide threshold of 90 km/h) for > 30 s, **or** an explicit `speeding` alarm from Jimi.
- `harsh_accel`: alarm of type `harshAcceleration` from Jimi.
- `harsh_brake`: alarm of type `harshBraking` from Jimi.
- `harsh_corner`: alarm of type `harshCornering` from Jimi (if device supports it).
Each event is written to `state.behaviour_events` with `(driver_id, vehicle_id, occurred_at, kind, severity, geom, speed_kmh)`.
Note: a more sophisticated speeding rule that uses statutory `maxspeed` per road segment (requires OSM road topology) is intentionally deferred — it belongs in the routing companion project where OSM ingestion already lives. The fleet-wide threshold gets us 80% of the value without the cost.
- **F4.7 — Behaviour score function.** `serve.fn_driver_behaviour_score(driver_id, period_start, period_end) → numeric` returns a 0–100 score. Component weights are configurable in `domain.behaviour_weights` (default: speeding 35, harsh_brake 25, harsh_accel 25, harsh_corner 15). Each component is normalised per 100 km driven in the window. 100 = no events; 0 = every km has a harsh event.
- **F4.8 — Behaviour API endpoint.** `GET /api/views/drivers?filters=…` returns driver records with current score, score trend, top-3 event types, week-over-week change. `GET /api/views/drivers/{id}/events?…` returns the event list with location and speed for drill-down.
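
F4.7 fixes the weights and the two endpoints of the scale (100 for no events, 0 when every km has a harsh event) but not the curve between them. The Python sketch below fills that gap with a linear mapping, which is an assumption rather than the specified formula: each component loses one point per event per 100 km, and the composite is the weighted average. `behaviour_score` is a hypothetical name, and returning `None` for a window with no distance driven is also an assumption.

```python
from typing import Mapping, Optional

# Default weights from F4.7.
DEFAULT_WEIGHTS = {
    "speeding": 35,
    "harsh_brake": 25,
    "harsh_accel": 25,
    "harsh_corner": 15,
}


def behaviour_score(
    event_counts: Mapping[str, int],
    distance_km: float,
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> Optional[float]:
    """Weighted 0-100 score from event counts normalised per 100 km driven."""
    if distance_km <= 0:
        return None  # assumption: no driving in the window means no score
    total_weight = sum(weights.values())
    weighted = 0.0
    for kind, weight in weights.items():
        per_100_km = event_counts.get(kind, 0) * 100.0 / distance_km
        # Assumption: linear scale, 100 events per 100 km (one per km) maps to 0.
        component = max(0.0, 100.0 - per_100_km)
        weighted += weight * component
    return weighted / total_weight


# Worked example: 500 km driven with 10 speeding and 5 harsh-braking events.
# speeding: 2 per 100 km -> 98; harsh_brake: 1 per 100 km -> 99; others 100.
# score = (35*98 + 25*99 + 25*100 + 15*100) / 100 = 99.05
example = behaviour_score({"speeding": 10, "harsh_brake": 5}, 500)
```

Under a linear mapping realistic scores cluster close to 100, so the blinded review in the success criteria may well argue for a steeper scale as well as different weights.
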
**Cost, anomalies, executive (committed)**
- **F4.9 — Cost-allocation view.** `serve.fn_cost_allocation(period, cost_centre?)` returns the finance-ready breakdown: distance, fuel litres, fuel cost (from `state.fuel_log` driver-submitted entries — real cost per litre × litres, with station attribution), idle hours, after-hours hours, per cost centre, per assigned city. Where `state.fuel_log` data is missing, falls back to estimated consumption from `state.fuel_readings` (Jimi OBD tap) and surfaces the basis (`cost_basis ∈ {actual, estimated}`) in the response.
- **F4.10 — Fuel anomaly detector.** Worker consumes both `state.fuel_readings` (Jimi OBD tap) and `state.fuel_log` (driver-submitted fillups). Applies a z-score rule (≥ 3σ against the vehicle's trailing 30-day mean) to each independently. Cross-checks the two sources: when a fillup is recorded in `state.fuel_log` but OBD doesn't show the expected tank-level rise (or vice versa), the anomaly is tagged `source_disagreement` and surfaces with higher severity — catches fuel-card-vs-tank discrepancies (a theft signature) and sensor failures (an OBD signature). Anomalies written to `ops.anomalies` for review.
- **F4.11 — Temperature compliance projector.** For cold-chain vehicles, reads `state.temperature_readings`, computes time-out-of-band per trip, surfaces breaches.
- **F4.12 — Executive monthly view.** `serve.fn_executive_summary(month)` returns the one-page payload: SLO attainment, cost-per-km trend, incident count, distance leaders, cost-centre spend, driver-score leaders + laggards, services on-schedule percentage.
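
The z-score rule in F4.10 can be sketched with the standard library alone. The sketch below assumes one numeric reading per day per vehicle (what exactly is measured is left open here), uses the sample standard deviation, and returns no score when the trailing window has fewer than two points or zero spread. Those choices, and the names `z_score` and `is_anomalous`, are illustrative assumptions. The `source_disagreement` cross-check is not covered.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def z_score(trailing: Sequence[float], value: float) -> Optional[float]:
    """Z-score of value against the trailing window (for example the last 30 days)."""
    if len(trailing) < 2:
        return None  # assumption: too little history to judge
    sigma = stdev(trailing)  # sample standard deviation
    if sigma == 0:
        return None  # assumption: a perfectly flat history gives no usable scale
    return (value - mean(trailing)) / sigma


def is_anomalous(trailing: Sequence[float], value: float, threshold: float = 3.0) -> bool:
    """True when the reading sits at least `threshold` sigmas from the trailing mean."""
    z = z_score(trailing, value)
    return z is not None and abs(z) >= threshold
```

Taking the absolute value flags both directions, which matches the two cases in U4.9: a sudden drop and a sudden rise.
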
**Stretch**
- **F4.13 — Timesheet export.** *(Stretch)* `GET /api/views/timesheets?driver=…&pay_period=…` returns CSV ready for HR import.
### 9.5 Success criteria
Driver-related criteria are committed. Cost/anomaly/executive criteria are refined at end of Phase 3.
**Driver shift reporting (committed)**
1. For 7 consecutive working days, every active vehicle whose driver is rostered has a `state.driver_shifts` row with non-null `started_at`, `started_address`, `ended_at`, `ended_address`. Vehicles without an assigned driver are flagged as `unattributed_shift` (not as missing data).
2. Sign-on and sign-off addresses are populated (non-null) for ≥95% of closed shifts within 24 h of shift close.
3. Ops manager has used the shift view to make at least 3 operational decisions (timesheet reconciliation, late-start investigation, route-coverage review) in the first 30 days.
**Driver behaviour scoring (committed)**
4. Behaviour events are extracted for every active driver continuously for 14 consecutive days.
5. The composite score is published per driver per week and visible in the ops dashboard.
6. A blinded review with the ops manager confirms that the top-5 and bottom-5 drivers by score match the ops manager's independent ranking of the same group with ≥70% agreement (sanity-check the scoring weights).
**Cost / anomalies / executive (refined at end of P3)**
7. Finance has signed off on cost-allocation reports.
8. Executive consumes the monthly view without manual deck construction for 3 consecutive months.
### 9.6 Out of scope for Phase 4 (still and forever for this PRD cycle)
Customer-facing portal, driver mobile app, native iOS/Android, multi-region active-active deployment, video streaming from cameras, automated disciplinary action triggered by behaviour score (the score informs human decisions; it does not take them), real-time driver feedback in the cab (out of scope without a driver-facing app), payroll integration (timesheets export to CSV as the integration surface in stretch). Routing, ETA prediction, deviation alerts, and ticket-driven dispatch remain in the companion project's scope.
---
## 10. Cross-cutting requirements
These apply to every phase, not to any single one.
### 10.1 Data and privacy
- **D1 — Data retention.** `events.raw` retained for 365 days (then archived to rustfs as monthly parquet dumps). `state.*` retained indefinitely. PII (driver names, vehicle plates, **driver shift locations and home-area sign-on points**) treated per Kenya Data Protection Act 2019 and Uganda DPPA 2019 — access logged, exports auditable.
- **D2 — Data residency.** All production data resides on rahamafresh.com infrastructure (VPS region: as currently configured). No SaaS data exit.
- **D3 — Backup.** Daily logical dump to rustfs; weekly events.raw and monthly geo slices. Restore drill quarterly.
- **D4 — Right-to-delete.** Driver records can be anonymised on request without breaking referential integrity (FK on `driver_id` set to a "redacted" sentinel). Anonymisation preserves `state.driver_shifts` and `state.behaviour_events` rows but replaces `driver_id` with the sentinel; shift locations and behaviour-event geometries are retained for fleet-level analysis but no longer attributable to a person.
- **D5 — Driver-shift sensitivity.** Shift start/end locations reveal a driver's home or near-home depot pattern. Access to per-driver shift views requires `read:fleet` scope and is logged with `actor`, `driver_id_viewed`, `at`. Aggregated views (depot-level start-time distribution) require no special scope.
### 10.2 Security
- **S1 — Inbound webhook auth.** HMAC-signed (shared secret with Jimi). Signature header `X-Jimi-Signature`. Signature mismatch → 401, log, alert.
- **S2 — Internal API auth.** All API endpoints require a valid JWT — read and write, dashboards and admin pages. No anonymous access path on the new platform. Three scopes: `read:fleet` (live, history, trips, parking, alarms, shifts, behaviour read), `write:ops` (driver assignment changes, alarm acks, service-log entries, odometer review actions), `admin:fleet` (device lifecycle transitions, audit access, system configuration). 15-min access tokens, 30-day refresh tokens. Refresh-token revocation list.
- **S3 — Database access.** Application connects via `pgbouncer` with a per-role user (`app_writer`, `app_reader`, `migrations`). Grafana uses `reporting_reader`. No direct DB access from outside the VPS.
- **S4 — Secrets.** `.env` in dev, Coolify env vars in prod. Never in git. `.env.example` documents every key.
- **S5 — TLS.** All public endpoints behind Traefik with Let's Encrypt. HSTS enabled. No plain HTTP.
- **S6 — Rate limiting.** slowapi: dashboards 60 req/min/IP, push 1000 req/min total.
- **S7 — Audit.** Lifecycle transitions, dispatch decisions, alarm acks, and admin actions all written to audit tables with actor, time, payload.
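
S1 and F3.19 both call for HMAC verification of inbound payloads but do not name the hash or the signature encoding. The Python sketch below assumes HMAC-SHA256 with a lowercase hex digest over the raw request body; both are assumptions to confirm against the sender's documentation. `sign` and `verify_signature` are hypothetical helper names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    # Encode both sides so compare_digest accepts any header content.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

A handler would read the `X-Jimi-Signature` header, call `verify_signature` on the unparsed body, and return 401 on a mismatch before anything is written to `events.raw`. Verifying the raw bytes matters: re-serialising parsed JSON can change whitespace or key order and break the signature.
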
### 10.3 Performance and scale
- **P1 — Target scale.** Current: ~180 devices, ~5,000 alarms/day, ~5,000 position fixes/day from push, ~120k position fixes/day from polling. Design target: 5x current (~900 devices, ~600k fixes/day) without architectural change. Beyond 5x: a discussion about read replicas and partition strategy.
- **P2 — Database sizing.** TimescaleDB-HA on the existing VPS class (8 vCPU / 32 GB / NVMe). Hypertable chunk interval: 1 day for events, 7 days for state hypertables.
- **P3 — Cache.** No external cache (Redis) in v1. Postgres query plan + CAGGs cover the read patterns. Re-evaluate at 3x scale.
### 10.4 Observability
- **O1 — Structured logs.** JSON to stdout, aggregated by Coolify, retained 30 days.
- **O2 — Metrics.** Postgres views in `slo.*`. Grafana renders. No Prometheus in v1.
- **O3 — Alerting.** Grafana alerts to Slack (channel TBD) on SLO breach. PagerDuty-style on-call: deferred (no on-call rotation today).
- **O4 — Tracing.** OpenTelemetry-ready (FastAPI middleware) but no collector in v1. Trace IDs propagated in logs.
### 10.5 Internationalisation and locale
- **I1 — Time zone.** All UI in EAT (UTC+3). All storage in UTC. Conversion at the serve layer.
- **I2 — Language.** English only.
- **I3 — Units.** Distance in km, speed in km/h, temperature in °C, fuel in litres.
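
The UTC-in-storage, EAT-in-UI rule in I1 can be shown with a short Python sketch. East Africa Time has no daylight-saving transitions, so a fixed UTC+3 offset is sufficient here. `to_display` and the output format are illustrative choices, not part of the spec.

```python
from datetime import datetime, timedelta, timezone

EAT = timezone(timedelta(hours=3), name="EAT")  # fixed offset, no DST


def to_display(ts_utc: datetime) -> str:
    """Render a stored UTC timestamp in EAT for the UI."""
    if ts_utc.tzinfo is None:
        # Refuse naive timestamps rather than guess their zone.
        raise ValueError("expected a timezone-aware timestamp")
    return ts_utc.astimezone(EAT).strftime("%Y-%m-%d %H:%M %Z")


# 21:30 UTC on 22 May is 00:30 EAT on 23 May, so the calendar date changes too.
shown = to_display(datetime(2026, 5, 22, 21, 30, tzinfo=timezone.utc))
```

The date rollover in the example is the reason conversion belongs in the serve layer: any per-day grouping done on raw UTC values would put late-evening activity on the wrong local day.
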
### 10.6 Accessibility
- **A1 — Contrast.** All map overlays, KPI tiles, and status badges meet WCAG AA contrast against their background.
- **A2 — Keyboard navigation.** All filter forms and table actions keyboard-accessible.
- **A3 — Screen reader.** KPI tiles have ARIA labels. Map markers have alt text via popup.
### 10.7 Compatibility
- **C1 — Browser support.** Latest Chrome, Edge, Safari, Firefox (n and n-1). No IE.
- **C2 — Mobile.** Responsive layout works at ≥375px width. Native mobile is out of scope.
- **C3 — Print.** Trip reports and parking reports print sensibly (CSS print stylesheet).
---
## 11. Risks and mitigations
| # | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R1 | Jimicloud changes API contract during rebuild | Medium | High | Contract checker built in Phase 1; daily runs catch drift within 24h. |
| R2 | Push mirror in legacy stack drops events during parallel run | Low | High | Compare `events.raw` counts every 6h during mirror window; reconcile via polled catch-up. |
| R3 | Team capacity stretched across product phases and feature requests | High | Medium | Phase scope is firm; new feature requests go to Phase 4 stretch backlog or to the routing companion project. Stakeholder agreement up front. |
| R4 | Cutover discovers a feature parity gap | Medium | Medium | Side-by-side comparison in weeks 7–8 with ops manager sign-off before DNS cut. |
| R5 | Coolify deploy mechanism doesn't support clean image-tag rollback | Low | Medium | Validated in Phase 1 week 1 (CI/CD smoke test before any business logic). |
| R6 | Gitea or registry outage blocks production deploys | Low | Low | Last-known-good image cached on the VPS; `docker run` can be invoked directly in emergency. |
| R7 | Data migration of 90-day backfill takes longer than a weekend | Medium | Low | Backfill is non-blocking; old platform stays canonical until backfill complete. |
| R8 | A dependency vendor sunset (TimescaleDB, MapLibre, etc.) during build | Very low | Medium | All chosen tech is OSS with multi-year support cycles; no SaaS lock-in. |
| R9 | Driver roster drifts out of date — vehicles move with no assigned driver, or with a driver on leave who didn't get reassigned | Medium | Medium | P3 dashboard surfaces "unassigned-but-moving" vehicles as a daily worklist; behaviour events for unattributed shifts roll up to a `_unassigned` bucket so the data isn't lost, just not personally attributed. |
| R10 | Behaviour score weights are wrong — top scorers and bottom scorers don't match ops manager's intuition | Medium | Low | Weights are configurable rows in `domain.behaviour_weights`, not constants; blinded review in P4 success criteria catches the mismatch and tunes weights without code change. |
| R11 | Odometer readings drift from actual vehicle odometer (Jimi distance is GPS-derived, not OBD-read) so service-due math is wrong | Low | Medium | **Variance-gated reset at every fuel submission** (F3.21): driver-submitted odometer at fillup resets the service-due clock when within 5% of the GPS-derived figure (`confidence = high`); readings outside that envelope are quarantined to `ops.admin_alerts` and do **not** mutate `state.service_status` until a fleet admin reviews the photo and either confirms or rejects (F3.25). The reading is preserved in the audit trail either way. Manual admin correction remains available as a backstop. |
| R12 | Privacy concerns around driver shift locations — staff side or union side | Low | Medium | D5 access logging + admin-only personal view + aggregate views without scope gate. Disclose to drivers what is captured (transparency builds trust). |
| R13 | Companion routing project starts late and dispatchers continue without route-suggestion tooling | Medium | Low | This PRD does not own the timeline of the routing project. Dispatchers retain phone/WhatsApp coordination (status quo) until that project ships. Live position + historical playback in this platform remain available throughout. |
| R14 | HR extract schema changes (field rename, type change, new status value) and our sync starts producing garbage | Medium | Medium | Contract checker pattern applies: a daily validation job runs the HR extract through the Pydantic model in dry-run mode and alerts on drift. Parser is version-pinned; new HR schema = bumped parser version, no silent data loss. |
| R15 | Driver submits fraudulent fuel data — wrong odometer, inflated litres, fake station | Medium | Medium | Photo is retained in `state.fuel_log.photo_ref` for human spot-audit. Submitted odometer is cross-checked against GPS-derived figure (F3.21 confidence flag); divergence > 5% flags review. Submitter phone is cross-checked against assigned-driver phone at submission time; mismatch flags review. Monthly anomaly batch (F4.10) surfaces patterns. Detection is not bulletproof but raises the cost of fraud meaningfully. |
| R16 | Companion WhatsApp comms project starts late and dispatcher-to-driver coordination remains on personal phones | Medium | Low | Same posture as R13 (routing). This PRD does not own that project's timeline. Existing voice/WhatsApp coordination continues unaffected. |
| R17 | A heavy historical query, OOM in a single worker, or runaway scheduled job takes down live tracking — single-process fate sharing | Medium | High | Architecture deploys three container roles from one image: `platform-gateway` (push receivers + dashboard reads), `platform-worker` (parsers + projectors + geocoder), `platform-cron` (scheduled polls + contract checks). Failures are isolated to the role that hits them; the gateway keeps returning 200 OK to Jimi while a worker is restarting. Verified by F1.x cutover criteria (kill a worker container during normal traffic, confirm push ingest continues). |
| R18 | Stage-to-stage lag (parser polling, projector polling) consumes the fix-freshness SLO budget before the data even gets to the dashboard | Low | Medium | Parser and projector wake on `LISTEN/NOTIFY` from the upstream stage's write, not on fixed polling intervals. Internal stage-to-stage lag is bounded by NOTIFY propagation (typically <100 ms), so the 90 s freshness SLO budget is spent on Jimi transport and dashboard polling, not on our scheduler. Timer-based sweeps are retained as a fallback for missed NOTIFY (e.g. connection blip) but are no-ops under normal operation. |
| R19 | Public-read dashboards leak driver shift home-area locations, plate-to-customer mappings, and HR-derived driver identity to anyone with the URL | Medium | High | Public-read posture is not preserved on the rebuilt platform. JWT required on every endpoint from Phase 1, including live map reads. Three scopes (`read:fleet`, `write:ops`, `admin:fleet`); driver shift views require explicit access logging (D5). Q1 closed in favour of authenticated access. |
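The variance gate in R11 (also used as a fraud signal in R15) can be sketched as below. This is an illustration under stated assumptions, not the platform's implementation: the names are hypothetical, divergence is measured relative to the GPS-derived figure, the 5% boundary is treated as inclusive, and the `low` label is invented here because the source names only `confidence = high`.

```python
from dataclasses import dataclass

VARIANCE_THRESHOLD = 0.05  # the 5% envelope from R11


@dataclass(frozen=True)
class OdometerDecision:
    confidence: str           # "high" per R11; "low" is an assumed label
    apply_reset: bool         # may the service-due clock be reset?
    needs_admin_review: bool  # quarantine for fleet-admin review
    divergence: float         # relative gap vs the GPS-derived figure


def gate_odometer(submitted_km: float, gps_derived_km: float) -> OdometerDecision:
    """Decide whether a driver-submitted reading may reset the service clock."""
    if gps_derived_km <= 0:
        # Assumption: with no usable GPS baseline, always quarantine.
        return OdometerDecision("low", False, True, float("inf"))
    divergence = abs(submitted_km - gps_derived_km) / gps_derived_km
    if divergence <= VARIANCE_THRESHOLD:
        return OdometerDecision("high", True, False, divergence)
    # Outside the envelope: keep the reading for audit, but do not mutate state.
    return OdometerDecision("low", False, True, divergence)


print(gate_odometer(10_200, 10_000))  # 2% gap: inside the envelope
print(gate_odometer(11_000, 10_000))  # 10% gap: quarantined for review
```

In either branch the reading itself would still be retained, which matches the audit-trail requirement in R11.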
---
## 12. Dependencies and engineering mapping
### 12.1 External dependencies
- **Jimicloud / Tracksolid Pro API access**: existing, multi-account credentials in `.env`.
- **Nominatim** (self-hosted or public) for reverse geocoding.
- **Mapbox** (fallback geocoder, basemap option): token needed.
- **rustfs** for static assets and backups.
- **Let's Encrypt** for TLS.
OSM road topology (Geofabrik extracts) is **not a dependency of this platform**. It belongs to the routing companion project. If the companion project chooses pgRouting and wants to share the same database, that conversation happens at companion-project planning time and may involve adding a `geo` schema then.
### 12.2 Internal dependencies
- VPS capacity headroom for parallel run (legacy + new for ~6 weeks). Current VPS class should suffice; confirm at Phase 1 kickoff.
- Stakeholder availability for Phase 2 sign-off (ops manager) and Phase 3 sign-off (ops manager + fleet admin).
- Open question resolution (see §14) before the relevant phases start.
### 12.3 Mapping to engineering phases (architecture doc)
| PRD phase | Engineering phases (from architecture doc) | Notes |
|---|---|---|
| P1 Foundation + live | Eng A (Foundation), Eng B (Event log + parser), Eng C (Projectors, subset: live_positions) | Live tracking is the smallest end-to-end slice. Eng A ships the three container roles (gateway / worker / cron) from one image; subsequent phases extend the worker and cron roles without changing the gateway contract. |
| P2 Historical + trips | Eng C (Projectors, full), Eng D (Serve layer) | Historical adds more projectors + more SQL functions |
| P3 Ops tooling + cutover | Eng E (Dashboards, full) + Eng G (Cutover) + driver-roster + service-due additions | Ops UI completes the dashboard set; driver assignment + service tracking added to the admin surface; cutover happens at end of P3 |
| P4 Intelligence + driver KPIs | Driver-shift projector + behaviour-event projector + behaviour-score function + cost/anomaly/executive views | Depends on P3 driver roster. Architecture doc's Eng F (Routing) is **dropped** from this rebuild's scope. |
The architecture document's Phase F (Routing: OSM loader, map-match projector, segment-speed CAGG, route endpoint, pgRouting) is removed from this PRD's committed scope. If the routing companion project chooses to build on the same database, the architecture doc remains a useful reference for that project, but it is not work this PRD funds or schedules.
---
## 13. Stakeholders and sign-off
| Role | Name | Sign-off required for |
|---|---|---|
| Product sponsor | TBD | Overall PRD, phase scopes, budget |
| Engineering lead | TBD | Architecture, capacity, schedule |
| Ops manager | TBD | P2 + P3 feature parity, cutover go/no-go |
| Fleet administrator | TBD | P3 admin UI, lifecycle model |
| Dispatcher representative | TBD | P1 live UX |
| HR / People lead | TBD | P4 driver KPI scope (shift reporting + behaviour scoring), Q14 driver transparency communication |
| Workshop / Maintenance lead | TBD | P3 service-due dashboard + service interval policy (Q10) |
| Finance | TBD | P4 cost-allocation model |
| Fuel microservice owner | TBD | P3 fuel-submission ingest (F3.19–F3.21), payload contract (Q17) |
| HR systems owner | TBD | P3 HR sync (F3.22–F3.24), extract source decisions (Q19) |
A sign-off is "I have read the requirements for the phases that affect my work and I agree they describe what I need." It is not "I will not change my mind later." Change is expected and tracked.
---
## 14. Open questions
These are decisions needed before the affected phase starts.
| # | Question | Needed before | Owner | Default if undecided |
|---|---|---|---|---|
| Q1 | ~~Authentication posture for dashboards: public-read (today) or JWT-gated?~~ **Decided.** All endpoints require JWT from day one, including live dashboard reads. Public-read not preserved. | **Decided** | Sponsor + Engineering | **JWT-gated on all endpoints.** Driver-shift home-area data, HR-sourced driver identity, plate-to-customer mappings: none of these are public-read-safe. Reflected in §3.3 principle 9, F1.8, F1.12, NFR1.6, S2, R19. |
| Q2 | Confirmed SLO numerical targets (fix freshness, trip lag) | P1 (freshness), P2 (trip lag) | Ops manager | 90 s / 600 s |
| Q3 | Image registry: ghcr.io vs self-hosted registry.rahamafresh.com | P1 | Engineering | ghcr.io (lower ops burden) |
| Q4 | Driver assignment model: primary driver per vehicle with substitute override for leave (recommended), or per-trip assignment? | P3 | Ops manager + Fleet admin | Primary + substitute with time-bounded ranges (per §F3.11). |
| Q5 | Slack channel for SLO alerts | P3 | Sponsor | `#fleet-ops` (TBD existence) |
| Q6 | What stays of n8n? Any genuine cross-system workflows worth keeping? | P3 | Engineering + Ops | Decommission entirely unless concrete workflow surfaces |
| Q7 | Customer-facing tracking: confirmed out of scope, or a future phase? | P4 sign-off | Sponsor | Out of scope (PRD non-goal) |
| Q8 | Data retention beyond 365 days for `events.raw`: archive or delete? | P3 (when first chunk ages out) | Sponsor + legal | Archive to rustfs as monthly parquet |
| Q9 | Phase 4 scope details for cost-allocation and fuel/temperature priorities (driver KPIs are committed) | End of P3 | Sponsor + ops + finance | Confirmed at P3 wrap |
| Q10 | Service interval policy: one default (5,000 km) or per vehicle class? Which vehicle classes exist and what is each class's interval? | P3 | Fleet admin + workshop | Default 5,000 km, with `service_interval_km_override` per vehicle to start, vehicle classes added when the classes are known. |
| Q11 | Minimum break between shifts (the gap that distinguishes "shift end" from "lunch break"). Default 6 h; confirm against actual roster patterns. | P4 | Ops manager + HR | 6 h, configurable per cost centre if depots differ. |
| Q12 | Treatment of unassigned-vehicle movement: bucket under `_unassigned` driver, surface as alert, both? Privacy implication: tracking unattributed driving still records location. | P4 | Ops manager + HR + sponsor | Bucket under `_unassigned`, surface daily worklist via P3 dashboard. |
| Q13 | Behaviour score weights: do we accept the proposed defaults (speeding 35, harsh_brake 25, harsh_accel 25, harsh_corner 15) or does ops have a different priority? | P4 | Ops manager | Proposed defaults; tunable post-launch via blinded review. |
| Q14 | Driver transparency communication: do we inform drivers what is captured (shift locations, behaviour events) before P4 launches? | P4 launch | Sponsor + HR | Yes; transparency reduces R12 and is the right thing to do regardless. |
| Q15 | Companion routing project: kick off in parallel with this rebuild, or sequence after cutover? Affects whether dispatchers see suggest-route tooling in 2026 or 2027. | P3 sign-off | Sponsor + Engineering | Sequence after cutover; pre-empts capacity contention during P1–P3. |
| Q16 | Should this platform retain a thin dispatch-audit log (`ops.tickets` or similar) for "who decided to send vehicle X where", or is even that the routing project's responsibility? | P3 | Sponsor + Ops + Companion-project lead | Drop entirely from this PRD; if companion project doesn't ship it, we revisit. |
| Q17 | Fuel microservice payload contract: exact field names, types, photo storage convention. The microservice is in production and continues to evolve; we adapt to it and version-pin the parser. | P3 | Engineering + Fuel microservice owner | We adopt the microservice's current schema as `parser_version = 1`. Future microservice changes bump parser version; both versions remain replayable from `events.raw`. |
| Q18 | HR ingest pattern: sync-via-event-log (recommended) or FDW + materialised view? | **Decided** | Sponsor + Engineering | **Sync-via-event-log with 3-hour cadence + force-refresh action.** Same `events.raw → parsed → projected → state` pipeline as Jimi pushes and fuel submissions. Documented in F3.22–F3.24. |
| Q19 | HR extract source details: which database / API endpoint, which table or view, credentials, network reachability from the platform's VPS. | P3 | HR systems owner + Engineering | TBD; needs HR systems owner conversation. Default if HR cannot expose a stable extract: nightly CSV drop to rustfs, ingested by a file-watcher worker following the same `events.raw` pattern. |
| Q20 | Companion WhatsApp comms project: scope, timeline, tech choice (Meta API vs Evolution + Chatwoot). | After P3 sign-off | Sponsor + comms-project lead | Sequence after cutover (same logic as Q15 for routing). Tech choice belongs to that project, not this one. |
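To make the Q4 default (primary driver plus time-bounded substitute) and the Q12 default (`_unassigned` bucket) concrete, here is a small sketch of how attribution could be resolved for a given day. The `Assignment` shape, the `resolve_driver` helper, the inclusive end date, and the sample plate and driver names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

UNASSIGNED = "_unassigned"  # sentinel bucket from Q12


@dataclass(frozen=True)
class Assignment:
    vehicle: str
    driver: str
    role: str             # "primary" or "substitute"
    start: date
    end: Optional[date]   # inclusive; None means open-ended (assumption)


def resolve_driver(assignments: list[Assignment], vehicle: str, on_day: date) -> str:
    """Return the driver attributed to `vehicle` on `on_day`."""
    active = [
        a for a in assignments
        if a.vehicle == vehicle
        and a.start <= on_day
        and (a.end is None or on_day <= a.end)
    ]
    # A substitute overrides the primary for the duration of its range.
    for role in ("substitute", "primary"):
        for a in active:
            if a.role == role:
                return a.driver
    return UNASSIGNED


roster = [
    Assignment("KDA 001A", "alice", "primary", date(2026, 1, 1), None),
    Assignment("KDA 001A", "bob", "substitute", date(2026, 6, 1), date(2026, 6, 14)),
]
print(resolve_driver(roster, "KDA 001A", date(2026, 6, 10)))  # inside the substitute range
print(resolve_driver(roster, "KDA 001A", date(2026, 6, 15)))  # range ended, back to primary
```

Because the substitute range is time-bounded, attribution falls back to the primary without any explicit "end of leave" action.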
---
## 15. Glossary
- **ACC_ON / ACC_OFF**: Ignition-on and ignition-off signals from the tracker, sent when the vehicle's accessory power is turned on or off. The day's first ACC_ON marks the driver's shift start; the last ACC_OFF marks shift end.
- **Active device**: A device whose lifecycle state is `active` (provisioned + has reported within retention threshold + not suspended or decommissioned).
- **Behaviour score**: A 0–100 per-driver composite of speeding, overspeed-vs-segment, harsh-acceleration, harsh-braking, and harsh-cornering events, normalised per 100 km driven. 100 = clean driving in the window; 0 = a harsh event on every km.
- **CAGG**: TimescaleDB Continuous Aggregate; a materialised view kept fresh by the database.
- **Cross-feed**: The mechanism by which an alarm push's latent position is also written to `live_positions` (introduced as FIX-M21 in the current system; in the rebuild it is automatic via event sourcing).
- **Dedup rule**: The logic for selecting which device's fix represents a vehicle when multiple devices (tracker + camera) are on one plate.
- **Driver assignment**: A time-ranged record stating which driver was on which vehicle in which role (primary or substitute) for which date range. Substitute assignments override the primary for their range.
- **Driver shift**: A derived period from first ACC_ON of a day to last ACC_OFF, attributed to whichever driver was assigned (primary or substitute) to the vehicle at shift start. Carries geocoded start and end locations.
- **Fuel submission**: A driver-originated record from the existing WhatsApp fuelling microservice containing odometer reading, litres added, fuel station, time, and photo. Ingested via `POST /push/fuel`, written to `events.raw` (`source = 'whatsapp_fuel'`), projected to `state.fuel_log` and `state.odometer_readings`.
- **HR extract sync**: A scheduled 3-hour pull (with admin force-refresh) of the HR system's driver roster into `events.raw` (`source = 'hr_extract'`), projected to `domain.drivers`. The platform's source of truth for driver identity, phone, and employment status. Same event-sourcing pipeline as Jimi pushes and fuel submissions.
- **Lifecycle state**: A device's administrative state machine (`provisioned`, `active`, `suspended`, `decommissioned`).
- **Operational state**: A vehicle's derived current state (`moving`, `parked`, `offline`, `unknown`).
- **Projector**: A worker that reads `events.parsed` and writes to a `state.*` table.
- **Odometer source**: Where a km reading came from. `gps_derived` (summed from position fixes; always available but approximate), `driver_submission` (from a fuel submission; accurate but periodic), or `admin_correction` (manually entered). `serve.fn_odometer_status` exposes both `gps_derived` and `driver_submission` side-by-side with a divergence + confidence indicator.
- **Service interval**: The km-between-services policy for a vehicle. Default 5,000 km, per-vehicle-class override allowed (e.g. heavy trucks at 10,000 km), per-vehicle override allowed.
- **Service status**: A vehicle's position in its service cycle: `ok` (comfortably within interval), `due_soon` (within 500 km of next service), `overdue` (`km_since_last_service` ≥ `interval_km`).
- **SLO**: Service Level Objective; an explicit numerical commitment (e.g. "95% of active devices have a fix within 90 s").
- **Stale IMEI**: A device whose latest fix is older than the freshness threshold (default 30 min for rescue, 90 s for SLO).
- **Substitute driver**: A driver temporarily assigned to a vehicle while the primary driver is on leave / off-roster. Substitute assignments are time-bounded; when the range ends, attribution returns to the primary automatically.
- **Unattributed shift**: A shift on a vehicle that had no driver assigned (primary or substitute) at shift start. Recorded under the `_unassigned` driver sentinel for fleet-level analysis; surfaced as a daily worklist for fleet admin.
- **Zero island**: A latitude/longitude of exactly (0,0), which is in the Gulf of Guinea and indicates a sensor error, never a real fix.
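The service-status terms above can be read as a simple classification over distance driven since the last service. The sketch below uses the 5,000 km default and the 500 km `due_soon` window from the glossary; the function name and the inclusive boundary handling are assumptions made for illustration.

```python
DEFAULT_INTERVAL_KM = 5_000   # default service interval from the glossary
DUE_SOON_WINDOW_KM = 500      # "within 500 km of next service"


def service_status(km_since_last_service: float,
                   interval_km: float = DEFAULT_INTERVAL_KM) -> str:
    """Classify a vehicle as ok / due_soon / overdue (hypothetical helper)."""
    if km_since_last_service >= interval_km:
        # Assumption: reaching the interval exactly already counts as overdue.
        return "overdue"
    if interval_km - km_since_last_service <= DUE_SOON_WINDOW_KM:
        return "due_soon"
    return "ok"


print(service_status(1_000))          # well inside the default interval
print(service_status(4_700))          # 300 km left on the default interval
print(service_status(9_600, 10_000))  # per-class override, e.g. heavy trucks
```

Passing `interval_km` explicitly is how a per-class or per-vehicle override would flow through in this sketch.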
---
## 16. Companion documents
**Companion projects (not yet authored):**
- *Routing + ServiceNow ticket dispatch PRD*: to be written. Scope: ServiceNow inbound ticket integration, vehicle-to-ticket allocation policy, suggest-route engine (pgRouting / OSRM / Valhalla / third-party choice deferred to that project), dispatcher route-suggestion UX, deviation alerts, ETA prediction, multi-stop optimisation. Depends on this platform for live vehicle position, driver assignment, and (optionally) segment-speed observations.
- *WhatsApp comms PRD*: to be written. Scope: dispatcher-to-driver broadcasts, inbound conversation handling, service-reminder templates, alarm notifications, opt-in/opt-out workflow, choice between Meta WhatsApp Business API and Evolution + Chatwoot, conversation-state UX for dispatchers. Depends on this platform for `domain.drivers` (identity, phone) and emits its own conversation log. **Does not** include the fuel/refuelling microservice, which is in scope for this PRD as an ingest source.
**External services this PRD integrates with (existing, not built by this project):**
- *WhatsApp fuelling microservice*: already in production. Driver sends a WhatsApp message with odometer photo, litres, station, time; the microservice parses and publishes a structured payload. This PRD's `POST /push/fuel` endpoint consumes that payload (F3.19). The microservice's WhatsApp interaction model is owned by the microservice; this PRD owns only the downstream ingest and projection. Payload contract evolves per Q17.
- *HR system*: existing system of record for driver identity and employment status. This PRD reads from it via the HR sync worker (F3.22) on a 3-hour cadence. This PRD does not write to it.
---