Full slide-by-slide copy for elicitation pitch: 6 pain questions, feature
reveal, business case, optional add-ons (RustFS + DuckDB), and one-pager.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RustFS (S3-compatible blob) and DuckDB (historical analytics) added as
optional add-on tiers with elicitation pain questions and tier model.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ingest scripts were connecting to the old tracksolid_2 database instead of
the timescale_db container in this stack. Grafana was already correct
(uses service name timescale_db:5432). Also strip leading space and quotes
from DATABASE_URL and API_BASE_URL so os.getenv() returns clean values.
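The cleanup can be sketched as a small helper — `clean_env` is a hypothetical name for illustration; the actual scripts may inline the `.strip()` calls:

```python
import os

def clean_env(name: str, default: str = "") -> str:
    # Strip stray whitespace, then any wrapping quotes, so values pasted
    # into .env as  DATABASE_URL=' "postgres://..."'  come back clean.
    return os.getenv(name, default).strip().strip("'\"")

os.environ["DATABASE_URL"] = ' "postgres://app:pw@timescale_db:5432/tracksolid"'
print(clean_env("DATABASE_URL"))
```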
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Maps host port 5888 → container port 5432 so the DB can be reached
directly from the MacBook (requires UFW allow 5888/tcp on the server).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- §3: note tracksolid_2 as live schema, tracksolid as empty target;
add DB direct access tip (31.97.44.246:5888, leading space in .env)
- §4: add import_drivers_csv.py and migration 06 to codebase map
- §5: document tracksolid_2 live tables with column differences
(assigned_team vs cost_centre, city vs assigned_city); add ops.*
- §8: add rule 9 (Forgejo API auth via keychain) and rule 10
(always check active schema before querying)
- §9: update fleet state — pipeline stopped Apr 6, CSV fleet pending,
0 driver names, 19 stale positions
- §10: replace driver-name manual item with deploy + CSV import tasks
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit fixes across the ingestion stack:
Observability
- Move log_ingestion out of batch loops in poll_alarms and poll_parking
(was emitting N cumulative log rows per run instead of one).
- Add missing log_ingestion + t0 to poll_trips.
- Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT
DO NOTHING no longer inflates the metric.
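The rowcount fix, sketched with sqlite3 as a stand-in so the example runs anywhere — `cur.rowcount` behaves the same way under psycopg2: it reports rows actually written, not rows attempted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE alarms (id INTEGER PRIMARY KEY, name TEXT)")

inserted = 0
for row in [(1, "vibration"), (1, "vibration"), (2, "power cut")]:
    cur.execute(
        "INSERT INTO alarms (id, name) VALUES (?, ?) ON CONFLICT DO NOTHING",
        row,
    )
    inserted += cur.rowcount  # 0 when the conflict clause skipped the row

print(inserted)  # 2 — the duplicate (1, 'vibration') is not counted
```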
Resilience
- SAVEPOINT-per-item added to poll_alarms, poll_live_positions,
poll_trips, poll_parking so one bad row no longer aborts the batch
(webhook handlers already had this; pollers were inconsistent).
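A minimal sketch of the SAVEPOINT-per-item pattern, again using sqlite3 so it is runnable; the same `SAVEPOINT` / `ROLLBACK TO` / `RELEASE` SQL applies under psycopg2:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txn control
cur = conn.cursor()
cur.execute("CREATE TABLE pos (imei TEXT NOT NULL, lat REAL)")
cur.execute("BEGIN")

rows = [("860000000000001", 1.28), (None, 2.04), ("860000000000002", 0.51)]
for row in rows:
    cur.execute("SAVEPOINT item")  # one savepoint per item
    try:
        cur.execute("INSERT INTO pos (imei, lat) VALUES (?, ?)", row)
        cur.execute("RELEASE SAVEPOINT item")
    except sqlite3.IntegrityError:
        # The bad row (NULL imei) rolls back only to the savepoint,
        # leaving earlier inserts in the still-open transaction intact.
        cur.execute("ROLLBACK TO SAVEPOINT item")
        cur.execute("RELEASE SAVEPOINT item")

cur.execute("COMMIT")
print(cur.execute("SELECT COUNT(*) FROM pos").fetchone()[0])  # 2
```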
Performance
- /pushgps and poll_track_list now use psycopg2.extras.execute_values
with ON CONFLICT DO NOTHING — 10-50x write throughput on larger
batches.
- sync_devices and sync_driver_audit fetch jimi.track.device.detail
concurrently via ThreadPoolExecutor(max_workers=8), cutting the
daily registry sync from ~24s to ~3s for an 80-device fleet.
- poll_track_list split into two phases: parallel API fetch (4 workers,
no DB connection held) then one batched write. Previously the DB
connection was held across every per-IMEI HTTP call, risking pool
starvation.
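The fetch-then-write split above can be sketched as follows. `fetch_track` is a stand-in for the per-IMEI HTTP call, and sqlite3's `executemany` stands in for `psycopg2.extras.execute_values` so the sketch runs without a database server:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def fetch_track(imei):
    """Stand-in for the per-IMEI jimi.device.track.list HTTP call."""
    return [(imei, 1712900000 + i * 60, -1.29, 36.82) for i in range(3)]

imeis = ["86000000000000%d" % i for i in range(8)]

# Phase 1: parallel API fetch — no DB connection is held while the slow
# HTTP calls run, so the pool cannot be starved.
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = list(pool.map(fetch_track, imeis))
rows = [row for chunk in chunks for row in chunk]

# Phase 2: one batched write. Production uses execute_values with
# ON CONFLICT DO NOTHING; the single round trip per batch is where the
# throughput gain comes from.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE position_history "
            "(imei TEXT, ts INTEGER, lat REAL, lng REAL, PRIMARY KEY (imei, ts))")
cur.executemany(
    "INSERT INTO position_history VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING",
    rows,
)
conn.commit()
print(cur.execute("SELECT COUNT(*) FROM position_history").fetchone()[0])  # 24
```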
Security
- _validate_token uses hmac.compare_digest for constant-time token
comparison (closes timing side-channel).
- _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default
5000) so a pathological push cannot blow memory.
Tests
- Fix test_null_alarm_type_skipped: its INSERT-count assertion was
catching the ingestion_log insert written by log_ingestion. Filter
that out so the test checks only data-table inserts.
- Full suite: 66 passed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
57 unit tests covering clean helpers, API signing, and field mapping fixes
(FIX-E06, FIX-M16, BUG-01, BUG-03); integration tests for webhook endpoints
with mocked DB; Forgejo CI workflow with TimescaleDB service container.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLAUDE.md: cached context file covering project identity, tech stack,
codebase map, schema quick-ref, API gotchas, fix history, working rules,
fleet state, and open items. Structured for maximum cache efficiency —
stable content first, dynamic state at the end.
docs/CONNECTIONS.md: connection parameter shapes (no secrets) for SSH,
DB, API, container resolution, Forgejo, Grafana, n8n.
docs/PROJECT_CONTEXT.md: client business context (telco field service,
3 cities, service types), data quality gaps, KPI framework by domain,
integration roadmap.
docs/KPI_FRAMEWORK.md: living KPI register with status tracking,
thresholds, client feedback log, and review checklist. To be co-developed
with client iteratively.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Post-deployment snapshot at ~00:15 EAT 2026-04-12. Key changes vs 260410:
- 3 trips recorded (FRED KMGW 538W HULETI, 6.94 km total) — pipeline validated
- FIX-M16 distance unit fix confirmed: implied speed matches API avgSpeed exactly
- 70 track_list fixes in 24h (was 13) — dense trail from active driving
- KDK 829A GP returned to primary depot from secondary Nairobi East cluster
- Uganda anomaly (X3-63282) persists — flagged for management
- Driver name root cause confirmed: not assigned in Tracksolid Pro UI
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[FIX-M16] jimi.device.track.mileage returns distance in metres despite
docs claiming km. Confirmed: avgSpeed × runTimeSecond / 3600 = distance/1000.
poll_trips() now divides raw value by 1000 before storing as distance_km.
3 existing bad rows corrected in prod DB (distance_km / 1000).
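The confirmation check is plain arithmetic (illustrative numbers, not real API output):

```python
# Fields as returned by jimi.device.track.mileage:
avg_speed_kmh = 36.0   # avgSpeed, km/h
run_time_s = 694       # runTimeSecond
raw_distance = 6940.0  # distance — metres, despite docs claiming km

implied_km = avg_speed_kmh * run_time_s / 3600   # 6.94
stored_km = raw_distance / 1000                  # the FIX-M16 correction

print(implied_km, stored_km)
```

The two values match exactly, which is what confirmed the unit mismatch.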
[FIX-M17] sync_devices() ON CONFLICT clause was only updating 5 of 26
fields, silently dropping driver_phone, sim, iccid, vehicle_name, status
etc. on subsequent syncs. Expanded to update all device fields so driver
assignments made in Tracksolid Pro UI propagate to DB on next daily sync.
Add sync_driver_audit.py: one-shot script to compare API vs DB device
registry, report driver/IMEI gaps, and force a full field upsert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coolify only copies docker-compose.yaml and .env to its working directory —
the ./grafana/provisioning bind mount source was always empty on the server,
so Grafana started with no datasource or dashboard configured (causing the
'Failed to load home dashboard' error).
Fix: build a custom Grafana image (grafana/Dockerfile) that COPYs the
provisioning directory at image build time. Grafana substitutes
${GRAFANA_DB_RO_PASSWORD} at startup from the env var now in Coolify's store.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
grafana_ro DB role was created with placeholder password 'SET_PASSWORD_IN_ENV'
and GRAFANA_DB_RO_PASSWORD was never set in .env, so Grafana's TracksolidDB
datasource could not authenticate — causing 'Failed to load home dashboard'.
Fix:
- Add GRAFANA_DB_RO_PASSWORD to .env with a secure generated password
- Add sync_role_passwords() to run_migrations.py — runs ALTER ROLE on every
startup so DB password stays in sync with the env var (idempotent)
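A sketch of the idempotent sync, reduced to the statements it would emit (the real `sync_role_passwords()` executes them via psycopg2; `ROLE_ENV_VARS` is an assumed mapping):

```python
import os

# Role → env var mapping; grafana_ro is the role from the migration above.
ROLE_ENV_VARS = {"grafana_ro": "GRAFANA_DB_RO_PASSWORD"}

def role_password_statements(environ=os.environ) -> list:
    """Build the ALTER ROLE statements run on every startup.

    Idempotent by construction: ALTER ROLE simply re-sets the password,
    so DB and env var stay in sync no matter how often this runs.
    """
    stmts = []
    for role, env_var in ROLE_ENV_VARS.items():
        password = environ.get(env_var)
        if not password:
            continue  # unset — leave the role alone rather than break login
        # ALTER ROLE cannot take bind parameters, so quote manually.
        escaped = password.replace("'", "''")
        stmts.append(f"ALTER ROLE {role} PASSWORD '{escaped}'")
    return stmts

print(role_password_statements({"GRAFANA_DB_RO_PASSWORD": "s3cret"}))
```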
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LOGIN/LOGOUT events from Jimi now persist to tracksolid.device_events.
Table already existed with correct schema (imei, event_type, event_time,
timezone, unique constraint). Follows same SAVEPOINT + log_ingestion
pattern as all other DB-writing endpoints.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BUG-01: OBD event_time — try unix_to_ts before clean_ts (Jimi sends epoch ints)
BUG-02: push_alarm — guard alarm_type not null (NULL breaks ON CONFLICT dedup)
BUG-03: push_trip_report — _parse_trip_ts handles Jimi BCD format YYMMDDHHmmss
BUG-04: SAVEPOINT per item in all 5 DB endpoints (FK violation on one item no
longer aborts the whole batch; SAVEPOINT now inside try for safety)
BUG-05: Add /pushevent endpoint (log-only; was returning 404 to Jimi)
FIX: push_fault_info — skip null fault_code (NULL != NULL in PG unique index)
FIX: log_ingestion — pass SQL NULL not string "None" when no error occurred
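BUG-01 and BUG-03 both come down to timestamp parsing order and format. A sketch under assumed helper names (`parse_event_time`, `parse_trip_ts`) — the real functions may differ in signature:

```python
from datetime import datetime, timezone

def parse_event_time(value):
    """BUG-01 ordering: try epoch seconds first (Jimi sends OBD event_time
    as epoch ints), then fall back to a plain string timestamp."""
    try:
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    except (ValueError, TypeError, OverflowError):
        return datetime.strptime(str(value), "%Y-%m-%d %H:%M:%S")

def parse_trip_ts(bcd: str):
    """BUG-03: Jimi's BCD trip format is YYMMDDHHmmss."""
    return datetime.strptime(bcd, "%y%m%d%H%M%S")

print(parse_event_time(1712900000))
print(parse_trip_ts("260411073800"))  # 2026-04-11 07:38:00
```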
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reflects accurate field names, behaviours, and status from production:
Polling endpoints:
- 5.1 location.list: add full response schema (direction, gpsSignal, gpsNum,
powerValue, elecQuantity, posType, locDesc); add implementation note
(311 calls, ~19 devices/sweep, ~200ms, missing devices silently omitted)
- 5.4 track.mileage: add maxSpeed field (BUG-03); add distance unit note
(BUG-02 — values are km from API, corrected via migration 04)
- 5.5 track.list: add altitude/satellite fields; add POLL-01 implementation
note (30-min schedule, 35-min lookback, source='track_list', ~137s/call)
- 5.7 parking: clarify acc_type=0 required; note durSecond vs stopSecond;
add POLL-02 production status (60 calls, 0 rows, overnight expected)
- Rate limits: document track.list latency (~137s per call)
Alarms:
- 6.1: replace vague note with explicit poll-vs-push field name table
(alertTypeId/alarmTypeName vs alarmType/alarmName); confirm BUG-01 fix
verified in production (type 3 / "Vibration alert" now stored correctly)
Webhooks:
- 10.1 /pushevent: mark implemented (PUSH-01), db table
- 10.2 /pushhb: mark as not yet wired, table ready
- 10.4 /pushalarm: mark implemented, cross-ref field name table
- 10.7 /pushoil: mark implemented (PUSH-02), unit int→text note
- 10.9 /pushtem: mark implemented (PUSH-03)
- 10.10 /pushlbs: mark implemented (PUSH-04)
- 10.20 /pushobd: mark implemented, document OBD scalar extraction
- 10.21 /pushfaultinfo: mark not yet wired, table ready
- 10.22 /pushtripreport: mark implemented
Appendix B: full rewrite — split into polling and push tables with
accurate status (✅/⚠️/not used), call counts, and DB table references
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Full live-query refresh against tracksolid_db at 07:38 EAT 2026-04-11.
All data sourced directly from the server via 10 targeted psql queries.
Report covers: all 17 table row counts, full 63-device registry with
odometer/SIM/expiry, live position detail for all 19 reporting devices
with GPS signal quality, geographic cluster map, position_history by
source (poll=124 / track_list=13 = 137 total), alarm detail confirming
BUG-01 fix, ingestion log health (399 calls, 0 failures), subscription
status breakdown, silent device full list (44 devices), schema additions
verification, Grafana readiness matrix, and P0/P1/P2 action plan.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updated report reflects state after migrations 04 and 05 are fully applied.
Includes: all 13 table row counts, fleet composition (63 devices / 4 models),
live position coverage (19/63), position history breakdown by source (poll vs
track_list), alarm detail (2 vibration alerts, BUG-01 fix confirmed), schema
health checklist, ingestion log polling summary, odometer service flags,
Uganda anomaly flag for X3-63282, data quality gap priority table, and
Grafana readiness assessment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Containers share one DB — when ingest_movement applies migration 04 first,
ingest_events and webhook_receiver start later, find distance_m already
renamed, and exit with a spurious FATAL because their own runners try to
re-apply 04; the schema_migrations row is only picked up on the next restart.
Added sentinels for all four migrations so any container self-heals
on first startup regardless of which container ran first:
04 — trips.distance_km column exists
05 — tracksolid.device_events table exists
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Migration 02 and 03 were applied before the schema_migrations tracking
table existed, so they had no record and the runner tried to re-run them,
hitting non-idempotent TimescaleDB policy/trigger/cagg statements.
seed_pre_tracking_migrations() checks for sentinel schema objects and
inserts records for any migration that was clearly already applied:
- 02: tracksolid.devices table exists
- 03: position_history.altitude column exists
Called immediately after ensure_tracking_table() on every startup.
Safe on fresh databases (objects absent → nothing seeded → runs normally).
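The sentinel logic can be sketched as below — `object_exists` and `record` stand in for the information_schema lookup and the schema_migrations INSERT in the real runner:

```python
# Sentinel objects that prove a migration already ran (from the commits above).
SENTINELS = {
    "02": ("table", "tracksolid.devices"),
    "03": ("column", "position_history.altitude"),
    "04": ("column", "trips.distance_km"),
    "05": ("table", "tracksolid.device_events"),
}

def seed_pre_tracking_migrations(object_exists, record):
    """Record any migration whose sentinel object is already present.

    On a fresh database nothing matches, so nothing is seeded and every
    migration runs normally.
    """
    seeded = []
    for mig_id, (kind, name) in sorted(SENTINELS.items()):
        if object_exists(kind, name):
            record(mig_id)
            seeded.append(mig_id)
    return seeded
```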
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On a fresh database the tracksolid schema doesn't exist yet —
migration 02 creates it, but ensure_tracking_table() ran first.
Added CREATE SCHEMA IF NOT EXISTS tracksolid before the table DDL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add 04_bug_fix_migration.sql and 05_enhancement_migration.sql to list
- Use schema_migrations table to skip already-applied migrations (prevents
migration 04's RENAME from failing on re-run after first deployment)
- Expand CRITICAL_TABLES to include all 5 new tables from migration 05
- record_applied() writes to schema_migrations after each success
- Cleaner output: APPLY / SKIP / OK per file with summary count
On next Coolify redeploy each container will skip 02-05 (already applied)
and apply any new migrations added in future commits.
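The APPLY / SKIP decision reduces to a pure function — a sketch, with `applied` standing in for the contents of tracksolid.schema_migrations:

```python
import re
from pathlib import Path

MIGRATION_RE = re.compile(r"^\d{2}_.*\.sql$")

def plan_migrations(sql_dir, applied):
    """Return (to_apply, to_skip) for numbered NN_*.sql files, in order."""
    files = sorted(p.name for p in Path(sql_dir).iterdir()
                   if MIGRATION_RE.match(p.name))
    to_apply = [f for f in files if f.split("_")[0] not in applied]
    to_skip = [f for f in files if f.split("_")[0] in applied]
    return to_apply, to_skip
```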
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
run_migrations.sh auto-discovers numbered SQL files (NN_*.sql),
tracks applied migrations in tracksolid.schema_migrations table,
and skips already-applied files — safe to run on every deployment.
Usage: bash /app/run_migrations.sh
Coolify: add to Post-deployment Command in service settings.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Captures first-night state of the tracksolid_db pipeline:
- 63 devices registered, 19 with live positions, 4 active today
- 3 vehicles with fresh GPS (<10 min): Westlands x2, Athi River x1
- X3-63282 located in Uganda — flagged for investigation
- KDK 829A GP (239k km) and Belta KCU-647D (234k km) flagged for service review
- Migration 04 and 05 not yet applied (distance_m column still present)
- Parking fix and trip polling not yet active (containers not redeployed)
- Prioritised action list for full operational readiness
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coolify regenerates the container suffix on every redeploy, making
hardcoded names stale. All three docs now use:
TS_DB=$(docker ps --filter "name=timescale_db" --format "{{.Names}}" | head -1)
OPERATIONS_MANUAL.md: replaced bare connection string with full
tsdb() shell function, one-liner pattern, and multi-container
label-filter guidance.
tracksolid_DB_manual.md: updated header and connection example.
01_BusinessAnalytics.md: updated Step 5 migration commands.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers fleet utilisation, driver behaviour (speeding, harsh driving,
tardiness, after-hours movement), real-time dispatch queries, km per
driver per day, full business question inventory, Grafana dashboard
blueprint, and the 5-step roadmap to unlock remaining capabilities.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POLL-01 (FIX-M14): Add poll_track_list() calling jimi.device.track.list
- Runs every 30 min with 35-min lookback window (5-min overlap prevents gaps)
- Inserts all device waypoints into position_history with source='track_list'
- Increases position density from ~1 fix/min to 2-6 fixes/min per active vehicle
- Single shared DB connection for all devices per cycle (efficient)
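The window arithmetic behind the 5-minute overlap, as a small sketch (`poll_window` is an illustrative name):

```python
from datetime import datetime, timedelta, timezone

POLL_INTERVAL_MIN = 30
LOOKBACK_MIN = 35  # 5-minute overlap so no waypoints fall in a gap

def poll_window(now=None):
    """Return (start, end) for one poll_track_list cycle.

    Consecutive windows overlap by 5 minutes; duplicates from the overlap
    are discarded on insert by ON CONFLICT DO NOTHING.
    """
    end = now or datetime.now(timezone.utc)
    return end - timedelta(minutes=LOOKBACK_MIN), end

t = datetime(2026, 4, 11, 7, 30, tzinfo=timezone.utc)
w1 = poll_window(t)
w2 = poll_window(t + timedelta(minutes=POLL_INTERVAL_MIN))
overlap = w1[1] - w2[0]
print(overlap)  # 0:05:00
```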
POLL-03 (FIX-M15): Add get_device_locations() utility function
- Calls jimi.device.location.get for up to 50 specific IMEIs on demand
- Used for alarm enrichment, stale device recovery, dashboard precision refresh
Manual updates:
- position_history section rewritten to document dual ingestion sources
- Three new queries: data density check, harsh driving detection, route trace
- Known Data Issues: issues 10 and 11 added and marked Fixed
- API coverage table updated to reflect all three endpoints now in use
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add sections 16–21: Daily, Weekly, Monthly, Quarterly analytics,
new table docs (device_events, fuel_readings, temperature_readings,
lbs_readings, geofences), and updated Known Data Issues
- Fix all distance queries: remove erroneous /1000000.0 division
(column is now distance_km in kilometres after migration 04)
- Update alarms section to reflect BUG-01 field mapping fix
- Update parking section to reflect POLL-02 acc_type/durSecond fix
- Rewrite "verify distance" section as accuracy cross-check query
- Expand row count query to include 5 new tables from migration 05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BUG-01 [FIX-E06]: jimi.device.alarm.list poll response uses alertTypeId/
alarmTypeName/alertTime, not the webhook field names. All 1,054 stored alarm
records had null alarm_type/alarm_name as a result. Corrected field mapping
in ingest_events_rev.py; also added alarm_name and source columns to INSERT.
BUG-02 [FIX-M11/M12]: trips.distance_m was storing millimetres due to an
erroneous * 1000 on an already-km API value. Removed the multiplication in
poll_trips() and push_trip_report(). Column renamed to distance_km in
migration 04 (historical rows divided by 1,000,000 to correct to km).
All SQL in both ingestion files updated to reference distance_km.
POLL-02 [FIX-M13]: parking poll returned 0 rows because the required
account and acc_type=0 parameters were missing. Also fixed response field
mapping: durSecond was incorrectly read as 'seconds'.
Migration 04: corrects and renames distance_m → distance_km.
Migration 05: adds normalized OBD columns, alarm/device enrichment columns,
new tables (device_events, fuel_readings, temperature_readings, lbs_readings,
geofences), expands dwh_gold fact table, and adds refresh_daily_metrics() ETL.
tracksolid_DB_manual.md updated to reflect column rename and mark fixed issues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers pre-deployment checklist, post-deploy verification steps for each
panel, database verification queries, troubleshooting guide, and
day-to-day NOC operations reference.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a fully-provisioned Grafana dashboard for NOC operators to monitor
80 vehicles in real-time: live geomap with direction arrows, speed, driver
info, and colour-coded plates. Includes datasource and dashboard provider
YAMLs, dashboard JSON (schemaVersion 39 / Grafana 11.0.0), and
docker-compose updates to mount provisioning at container start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port 8000 was already in use on the host. Updated uvicorn to listen
on 8888. Added 6 importable n8n workflow JSON files for Jimi push
data forwarding (OBD, faults, alarms, GPS, heartbeats, trips).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive guide covering:
- Service architecture and scheduled tasks
- Per-service verification SQL queries grouped by service
- Health dashboard queries for monitoring
- Polling vs push coexistence and dedup strategy
- Environment variables, data retention, troubleshooting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dict.get("result", []) returns None when the key exists but maps to a null
value — the default only applies when the key is missing. Changed to
resp.get("result") or [], which handles both cases.
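The difference in one self-contained snippet:

```python
resp_missing = {}
resp_null = {"result": None}   # key present, value null — the bug
resp_list = {"result": [1, 2]}

# .get's default only applies when the key is absent:
assert resp_missing.get("result", []) == []
assert resp_null.get("result", []) is None

# `or []` normalises both the missing-key and explicit-null cases:
assert (resp_null.get("result") or []) == []
assert (resp_list.get("result") or []) == [1, 2]
```

Note `or []` also maps any falsy result (e.g. an empty list) to a fresh empty list, which is harmless here.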
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change image from timescaledb-ha:pg16-ts2.15-oss to pg16-ts2.15
(OSS edition lacks compression, retention, continuous aggregates)
- Add postgresql-client to Dockerfile for psql binary
- Rewrite run_migrations.py to use psql instead of psycopg2
(psql runs each statement independently; psycopg2 wraps the
entire file in one transaction so one error rolls back everything)
- Add schema verification: exits 1 if critical tables missing,
preventing services from starting with broken schema
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coolify doesn't support the service_completed_successfully depends_on condition.
Each Python service now runs 'python run_migrations.py' before its
main process. SQL is idempotent so concurrent runs are safe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New run_migrations.py: executes 02_*.sql and 03_*.sql in order
- New db_migrate service: runs once before all other services start
- All services now depend on db_migrate (service_completed_successfully)
- Tolerates re-deploy: catches errors from already-existing objects
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port 8000 was already allocated on the host. On Coolify, Traefik routes
external traffic to the container internally — no host port needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>