Commit graph

11 commits

Author SHA1 Message Date
david kiania
b11294009b fix(security,ingest): 260702 audit — secure the stack, correct poller counters
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Waiting to run
Tests / test (pull_request) Waiting to run
Security:
- .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image
  layers; install pinned from uv.lock (reproducible builds) (SEC-04/05).
- docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by
  default; remote tooling moves to an SSH tunnel (SEC-01).
- webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed
  when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01).

Correctness:
- FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/
  poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new
  events inserted" logs and negative ingestion_log.rows_inserted.
- FIX-W02: parse application/json push bodies (were silently dropped).
- FIX-W03: move webhook DB work off the event loop via asyncio.to_thread.
- FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid +
  Nominatim (1 req/s) network calls.
- FIX-M24: sync_devices disables devices absent from every target (guarded).
- FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed).
- get_token(): don't relabel already-aware timestamptz expiries (BUG-P9).

Observability/lifecycle:
- migration 21: v_ingest_health restricted to active pipeline endpoints so
  one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified).
- FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d).
- remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config.

+5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*.
Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file
also carries unrelated in-progress edits).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 09:51:02 +03:00
david kiania
2309464ab8 FIX-M21: alarm cross-feed + stale-IMEI recovery for live_positions
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Cherry-pick of c8f5907 (originally FIX-M20 on main) onto
quality-program-2026-04-12 — renamed to FIX-M21 here to avoid clashing
with this branch's existing [FIX-M20] (trip enrichment, commit 144dede).
Behaviour and code are unchanged from the main-branch original; the
annotation tag is the only difference.

Background
----------
A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two
freshness gaps that share a single root cause: tracksolid.live_positions
was being written by only one path (the 60s polled sweep), and that path
silently omits devices that don't have a "current" fix in Jimi's
location.list response. Effect on the dashboard:

  * 18 vehicles show OFFLINE for days-to-months — last fix is whatever
    the sweep wrote before Jimi dropped them.
  * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback
    because their dedicated tracker has been silent; the camera's lat/lng
    arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but
    we discard it after writing to tracksolid.alarms.

Verified upstream subscription state: only /pushalarm is registered with
Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are
inactive. This change uses only data that already arrives.

What's in this commit
---------------------
ts_shared_rev.py
  * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None)
    — single time-guarded upsert all three writers will share. Guards on
    is_valid_fix() (filters Zero-Island and out-of-range) and
    EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or
    webhook retries can't rewind a fresher marker. COALESCE on optional
    columns so sparse callers don't blank dense ones' values.
  * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices
    whose live_positions.gps_time is NULL or older than the threshold,
    ordered NULLS FIRST so worst-offenders are in batch #1.
  * ensure_device(cur, imei, device_name=None) — relocated from
    webhook_receiver_rev so every live_positions writer can satisfy the
    FK without re-defining the helper. The original underscore-prefixed
    name in webhook_receiver_rev becomes a backwards-compat alias.

webhook_receiver_rev.py
  * /pushalarm — after the alarm row insert, call upsert_live_position
    with the alarm's lat/lng and alarmTime. Sits inside the existing
    per-item SAVEPOINT, so a cross-feed failure rolls back only that
    one alarm's cross-feed, not the alarm row.

ingest_movement_rev.py
  * poll_live_positions — inline INSERT replaced with upsert_live_position
    (extras dict carries the sweep-only columns). Same data, time-guarded.
  * get_device_locations — inline INSERT replaced; also gains an
    ensure_device call so it can be safely fed arbitrary IMEIs.
  * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and
    hands it to get_device_locations. Scheduled every 10 minutes plus a
    startup catch-up call. Uses jimi.device.location.get which returns
    *last-known* fix, so devices the 60s sweep drops can be re-warmed.

Expected post-deploy effect (estimates, see
06_live_location/260521_timescale_location_upgrade_major.md §4)
  * ~1,100-1,600 additional live_positions upserts/day from the alarm
    cross-feed, after the time-guard rejects ~70-80% of races vs the
    fresher 60s sweep.
  * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence
    (JC400P emits ~107 alarms/day per device).
  * 8-14 of the 24 OFFLINE plates expected to recover via location.get's
    last-known-fix path within the first 30 minutes.
  * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour.
  * No 06_live_location code changes required — reads through
    reporting.v_live_positions transparently.

Tests
-----
12 webhook integration tests pass (3 new: cross-feed fires on valid fix;
skips without lat/lng; skips Zero-Island). 8 new unit tests in
test_stale_imeis.py cover the stale selector, the poll wrapper, and the
time-guard contract on upsert_live_position. Full suite: 77 passed.

Deployment
----------
No schema migration. Both webhook_receiver and ingest_movement
containers must be rebuilt — source is image-baked, not bind-mounted.
Rollback is git revert + rebuild.

Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md
Verification playbook:  06_live_location/260521_timescale_location_upgrade_verification.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 22:33:21 +03:00
David Kiania
257643cae2 fix: auto-register devices on push + allow CSV import to insert new rows
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Three changes that together close the FK-violation loop on /pushalarm:

1. import_drivers_csv.py: when an IMEI is in the CSV but not in
   tracksolid.devices, INSERT a new row instead of skipping. Unblocks
   the 140 X3/JC400P devices listed as a HIGH open item in CLAUDE.md §10.

2. webhook_receiver_rev.py: new _ensure_device() helper upserts a stub
   devices row (status='unknown') before inserting an alarm. Handles the
   third class of devices — not in API sync, not in CSV (e.g. the
   X3-63282 Kampala device flagged in CLAUDE.md §10).

3. CSV refreshed from Downloads (Apr 21 version, 140 active rows).

Also fixes alarm error log previously showing "None" (read deviceImei
instead of the integration push's imei field).

CSV import already applied live on the instance (63 → 201 devices).
Webhook patch requires a Coolify redeploy to pick up _ensure_device().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:29:32 +03:00
David Kiania
636dd2b8b0 fix: parse actual Jimi push format (msgType+data, field name remap)
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Diagnostic logging revealed the real Jimi integration push format:
  Content-Type: application/x-www-form-urlencoded
  Body: msgType=jimi.push.device.alarm&data=<URL-encoded JSON>

Differences from docs:
  - data is one JSON object per POST (not a data_list array)
  - alarm uses imei+alarmTime, NOT deviceImei+gateTime

_parse_request now reads form field `data` (falls back to `data_list`) and
JSON-decodes a single object or array. push_alarm handler accepts either
field naming for forward-compat.

Removes diagnostic INFO log now that format is confirmed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:10:08 +03:00
David Kiania
c54794eb4c diag: log raw push body + content-type at INFO level
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Temporary diagnostic to see what format Jimi actually sends on /pushalarm.
New container is parsing to empty items (pushes arrive but no DB insert),
so we need to see the real body shape. Remove once format is confirmed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:04:55 +03:00
David Kiania
ef36ebebea fix: handle JSON body push format from Jimi integration API
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Jimi's integration push API (tracksolidprodocs.jimicloud.com) sends
Content-Type: application/json with body {"token":"...","data_list":[...]},
not form-encoded. FastAPI Form() silently defaulted to "" so all pushes
were discarded with "Failed to parse data_list:" warnings.

Replaces per-endpoint Form() params with a shared _parse_request() helper
that tries JSON body first, falls back to form-encoded. All seven push
endpoints (pushobd, pushfaultinfo, pushalarm, pushgps, pushhb,
pushtripreport, pushevent) updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 11:44:08 +03:00
David Kiania
8867be9d3d perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Audit fixes across the ingestion stack:

Observability
- Move log_ingestion out of batch loops in poll_alarms and poll_parking
  (was emitting N cumulative log rows per run instead of one).
- Add missing log_ingestion + t0 to poll_trips.
- Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT
  DO NOTHING no longer inflates the metric.

Resilience
- SAVEPOINT-per-item added to poll_alarms, poll_live_positions,
  poll_trips, poll_parking so one bad row no longer aborts the batch
  (webhook handlers already had this; pollers were inconsistent).

Performance
- /pushgps and poll_track_list now use psycopg2.extras.execute_values
  with ON CONFLICT DO NOTHING — 10-50x write throughput on larger
  batches.
- sync_devices and sync_driver_audit fetch jimi.track.device.detail
  concurrently via ThreadPoolExecutor(max_workers=8), cutting the
  daily registry sync from ~24s to ~3s for an 80-device fleet.
- poll_track_list split into two phases: parallel API fetch (4 workers,
  no DB connection held) then one batched write. Previously the DB
  connection was held across every per-IMEI HTTP call, risking pool
  starvation.

Security
- _validate_token uses hmac.compare_digest for constant-time token
  comparison (closes timing side-channel).
- _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default
  5000) so a pathological push cannot blow memory.

Tests
- Fix test_null_alarm_type_skipped: its INSERT-count assertion was
  catching the ingestion_log insert written by log_ingestion. Filter
  that out so the test checks only data-table inserts.
- Full suite: 66 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 00:33:55 +03:00
David Kiania
87ecab4a72 Wire /pushevent to device_events table (was log-only)
LOGIN/LOGOUT events from Jimi now persist to tracksolid.device_events.
Table already existed with correct schema (imei, event_type, event_time,
timezone, unique constraint). Follows same SAVEPOINT + log_ingestion
pattern as all other DB-writing endpoints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 18:42:22 +03:00
David Kiania
b1e4d6e85f Fix 5 webhook bugs: SAVEPOINTs, NULL guards, BCD timestamps, /pushevent, log NULL fix
BUG-01: OBD event_time — try unix_to_ts before clean_ts (Jimi sends epoch ints)
BUG-02: push_alarm — guard alarm_type not null (NULL breaks ON CONFLICT dedup)
BUG-03: push_trip_report — _parse_trip_ts handles Jimi BCD format YYMMDDHHmmss
BUG-04: SAVEPOINT per item in all 5 DB endpoints (FK violation on one item no
        longer aborts the whole batch; SAVEPOINT now inside try for safety)
BUG-05: Add /pushevent endpoint (log-only; was returning 404 to Jimi)
FIX:    push_fault_info — skip null fault_code (NULL != NULL in PG unique index)
FIX:    log_ingestion — pass SQL NULL not string "None" when no error occurred

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 18:19:13 +03:00
David Kiania
c05b47abe2 Fix alarm field mapping, distance unit bug, parking params; add schema migrations
BUG-01 [FIX-E06]: jimi.device.alarm.list poll response uses alertTypeId/
alarmTypeName/alertTime, not the webhook field names. All 1,054 stored alarm
records had null alarm_type/alarm_name as a result. Corrected field mapping
in ingest_events_rev.py; also added alarm_name and source columns to INSERT.

BUG-02 [FIX-M11/M12]: trips.distance_m was storing millimetres due to an
erroneous * 1000 on an already-km API value. Removed the multiplication in
poll_trips() and push_trip_report(). Column renamed to distance_km in
migration 04 (historical rows divided by 1,000,000 to correct to km).
All SQL in both ingestion files updated to reference distance_km.

POLL-02 [FIX-M13]: parking poll returned 0 rows because the required
account and acc_type=0 parameters were missing. Also fixed response field
mapping: durSecond was incorrectly read as 'seconds'.

Migration 04: corrects and renames distance_m → distance_km.
Migration 05: adds normalized OBD columns, alarm/device enrichment columns,
new tables (device_events, fuel_readings, temperature_readings, lbs_readings,
geofences), expands dwh_gold fact table, and adds refresh_daily_metrics() ETL.

tracksolid_DB_manual.md updated to reflect column rename and mark fixed issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 22:18:30 +03:00
David Kiania
de70972d6a Add webhook receiver, consolidate shared utilities, expand telemetry coverage
- Add FastAPI webhook receiver (webhook_receiver_rev.py) for Jimi push data:
  OBD diagnostics, DTC fault codes, alarms, GPS, heartbeats, trip reports
- Add schema migration (03_webhook_schema_migration.sql) for webhook tables:
  fault_codes, heartbeats, expanded obd_readings/trips/position_history/alarms
- Consolidate duplicated _safe/_shutdown into shared safe_task/setup_shutdown
  in ts_shared_rev.py (DRY refactor)
- Add auto-commit to get_conn() context manager (prevents forgotten commits)
- Fix poll_trips to capture runTimeSecond and maxSpeed from API
- Add poll_parking via jimi.open.platform.report.parking
- Remove broken poll_obd (OBD is push-only, no polling endpoint exists)
- Fix alarms schema: add lat/lng/acc_status columns + dedup constraint
- Fix obd_readings schema: add dedup constraint
- Fix trigger DO block: replace nonexistent has_column with information_schema
- Narrow api_post exception handling to RequestException/ValueError
- Add webhook_receiver service to docker-compose.yaml
- Add fastapi/uvicorn/python-multipart to pyproject.toml
- Add clean_ts timestamp validator to ts_shared_rev.py
- Add Tracksolid Pro API documentation (tracksolidApiDocumentation.md)
- Populate .gitignore with Python/OS/secrets patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 16:31:17 +03:00