Commit graph

10 commits

Author SHA1 Message Date
david kiania
c8f5907d4f FIX-M20: alarm cross-feed + stale-IMEI recovery for live_positions
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Background
----------
A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two
freshness gaps that share a single root cause: tracksolid.live_positions
was being written by only one path (the 60s polled sweep), and that path
silently omits devices that don't have a "current" fix in Jimi's
location.list response. Effect on the dashboard:

  * 18 vehicles show OFFLINE for days-to-months — last fix is whatever
    the sweep wrote before Jimi dropped them.
  * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback
    because their dedicated tracker has been silent; the camera's lat/lng
    arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but
    we discard it after writing to tracksolid.alarms.

Verified upstream subscription state: only /pushalarm is registered with
Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are
inactive. This change uses only data that already arrives.

What's in this PR
-----------------
ts_shared_rev.py
  * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None)
    — single time-guarded upsert all three writers will share. Guards on
    is_valid_fix() (filters Zero-Island and out-of-range) and
    EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or
    webhook retries can't rewind a fresher marker. COALESCE on optional
    columns so sparse callers don't blank dense ones' values.
  * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices
    whose live_positions.gps_time is NULL or older than the threshold,
    ordered NULLS FIRST so worst-offenders are in batch #1.
  * ensure_device(cur, imei, device_name=None) — relocated from
    webhook_receiver_rev so every live_positions writer can satisfy the
    FK without re-defining the helper. The original underscore-prefixed
    name in webhook_receiver_rev becomes a backwards-compat alias.

webhook_receiver_rev.py
  * /pushalarm — after the alarm row insert, call upsert_live_position
    with the alarm's lat/lng and alarmTime. Sits inside the existing
    per-item SAVEPOINT, so a cross-feed failure rolls back only that
    one alarm's cross-feed, not the alarm row.

ingest_movement_rev.py
  * poll_live_positions — inline INSERT replaced with upsert_live_position
    (extras dict carries the sweep-only columns). Same data, time-guarded.
  * get_device_locations — inline INSERT replaced; also gains an
    ensure_device call so it can be safely fed arbitrary IMEIs.
  * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and
    hands it to get_device_locations. Scheduled every 10 minutes plus a
    startup catch-up call. Uses jimi.device.location.get which returns
    *last-known* fix, so devices the 60s sweep drops can be re-warmed.

Expected post-deploy effect (estimates, see
260521_timescale_location_upgrade_major.md §4)
  * ~1,100-1,600 additional live_positions upserts/day from the alarm
    cross-feed, after the time-guard rejects ~70-80% of races vs the
    fresher 60s sweep.
  * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence
    (JC400P emits ~107 alarms/day per device).
  * 8-14 of the 24 OFFLINE plates expected to recover via location.get's
    last-known-fix path within the first 30 minutes.
  * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour.
  * No 06_live_location code changes required — reads through
    reporting.v_live_positions transparently.

Tests
-----
12 webhook integration tests pass (3 new: cross-feed fires on valid fix;
skips without lat/lng; skips Zero-Island). 8 new unit tests in
test_stale_imeis.py cover the stale selector, the poll wrapper, and the
time-guard contract on upsert_live_position. Full suite: 77 passed.

Deployment
----------
No schema migration. Both webhook_receiver and ingest_movement
containers must be rebuilt — source is image-baked, not bind-mounted.
Rollback is git revert + rebuild.

Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 21:05:26 +03:00
David Kiania
fa110f4313 feat: [FIX-M19] multi-account ingest across fireside sub-accounts
Some checks failed
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
Fleet lives across three Tracksolid sub-accounts:
  fireside         —  63 devices
  Fireside@HQ      —  52 devices
  Fireside_MSA     —  41 devices

Previously sync_devices / poll_live_positions / poll_parking only
queried a single TARGET_ACCOUNT, so ~64% of the fleet was invisible to
the pipeline.

Changes:
  - ts_shared_rev.py: new TARGETS list (env TRACKSOLID_TARGETS,
    comma-separated; falls back to the single TARGET_ACCOUNT).
  - ts_shared_rev.py: new get_active_imeis_by_target() helper that
    groups active IMEIs by their stored account so parking calls can
    pass the right account param per batch.
  - ingest_movement_rev.py: sync_devices and poll_live_positions loop
    over every target and dedupe by IMEI before upserting. poll_parking
    loops over imeis_by_target so each batch carries the matching
    account.
  - CLAUDE.md: FIX-M19 entry.

Requires new env var TRACKSOLID_TARGETS="fireside,Fireside@HQ,Fireside_MSA"
on the ingest services in Coolify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 10:43:07 +03:00
David Kiania
417627675e fix: [FIX-M18] pull driverName/vehicleNumber/sim from detail endpoint
Some checks failed
Static Analysis / static (push) Has been cancelled
Tests / test (push) Has been cancelled
Static Analysis / static (pull_request) Has been cancelled
Tests / test (pull_request) Has been cancelled
jimi.user.device.list returns null for vehicleName, vehicleNumber,
driverName, driverPhone, and sim even after those fields are set via
jimi.open.device.update — the values only surface through
jimi.track.device.detail. sync_devices() now reads from dtl first with
d as fallback, which unblocks backfill of the 144 CSV-driven updates
pushed on 2026-04-22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 18:21:25 +03:00
David Kiania
8867be9d3d perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch
Some checks are pending
Static Analysis / static (push) Waiting to run
Tests / test (push) Waiting to run
Audit fixes across the ingestion stack:

Observability
- Move log_ingestion out of batch loops in poll_alarms and poll_parking
  (was emitting N cumulative log rows per run instead of one).
- Add missing log_ingestion + t0 to poll_trips.
- Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT
  DO NOTHING no longer inflates the metric.

Resilience
- SAVEPOINT-per-item added to poll_alarms, poll_live_positions,
  poll_trips, poll_parking so one bad row no longer aborts the batch
  (webhook handlers already had this; pollers were inconsistent).

Performance
- /pushgps and poll_track_list now use psycopg2.extras.execute_values
  with ON CONFLICT DO NOTHING — 10-50x write throughput on larger
  batches.
- sync_devices and sync_driver_audit fetch jimi.track.device.detail
  concurrently via ThreadPoolExecutor(max_workers=8), cutting the
  daily registry sync from ~24s to ~3s for an 80-device fleet.
- poll_track_list split into two phases: parallel API fetch (4 workers,
  no DB connection held) then one batched write. Previously the DB
  connection was held across every per-IMEI HTTP call, risking pool
  starvation.

Security
- _validate_token uses hmac.compare_digest for constant-time token
  comparison (closes timing side-channel).
- _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default
  5000) so a pathological push cannot blow memory.

Tests
- Fix test_null_alarm_type_skipped: its INSERT-count assertion was
  catching the ingestion_log insert written by log_ingestion. Filter
  that out so the test checks only data-table inserts.
- Full suite: 66 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 00:33:55 +03:00
David Kiania
6a0ceb78dd Fix trip distance unit (metres→km) and full device sync on upsert
[FIX-M16] jimi.device.track.mileage returns distance in metres despite
docs claiming km. Confirmed: avgSpeed × runTimeSecond / 3600 = distance/1000.
poll_trips() now divides raw value by 1000 before storing as distance_km.
3 existing bad rows corrected in prod DB (distance_km / 1000).

[FIX-M17] sync_devices() ON CONFLICT clause was only updating 5 of 26
fields, silently dropping driver_phone, sim, iccid, vehicle_name, status
etc. on subsequent syncs. Expanded to update all device fields so driver
assignments made in Tracksolid Pro UI propagate to DB on next daily sync.

Add sync_driver_audit.py: one-shot script to compare API vs DB device
registry, report driver/IMEI gaps, and force a full field upsert.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 00:06:57 +03:00
David Kiania
3797a4e2ca Implement POLL-01 high-res GPS trails and POLL-03 on-demand location refresh
POLL-01 (FIX-M14): Add poll_track_list() calling jimi.device.track.list
- Runs every 30 min with 35-min lookback window (5-min overlap prevents gaps)
- Inserts all device waypoints into position_history with source='track_list'
- Increases position density from ~1/min to 2-6 fixes/min per active vehicle
- Single shared DB connection for all devices per cycle (efficient)

POLL-03 (FIX-M15): Add get_device_locations() utility function
- Calls jimi.device.location.get for up to 50 specific IMEIs on demand
- Used for alarm enrichment, stale device recovery, dashboard precision refresh

Manual updates:
- position_history section rewritten to document dual ingestion sources
- Three new queries: data density check, harsh driving detection, route trace
- Known Data Issues: issues 10 and 11 added and marked Fixed
- API coverage table updated to reflect all three endpoints now in use

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 22:46:00 +03:00
David Kiania
c05b47abe2 Fix alarm field mapping, distance unit bug, parking params; add schema migrations
BUG-01 [FIX-E06]: jimi.device.alarm.list poll response uses alertTypeId/
alarmTypeName/alertTime, not the webhook field names. All 1,054 stored alarm
records had null alarm_type/alarm_name as a result. Corrected field mapping
in ingest_events_rev.py; also added alarm_name and source columns to INSERT.

BUG-02 [FIX-M11/M12]: trips.distance_m was storing millimetres due to an
erroneous * 1000 on an already-km API value. Removed the multiplication in
poll_trips() and push_trip_report(). Column renamed to distance_km in
migration 04 (historical rows divided by 1,000,000 to correct to km).
All SQL in both ingestion files updated to reference distance_km.

POLL-02 [FIX-M13]: parking poll returned 0 rows because the required
account and acc_type=0 parameters were missing. Also fixed response field
mapping: durSecond was incorrectly read as 'seconds'.

Migration 04: corrects and renames distance_m → distance_km.
Migration 05: adds normalized OBD columns, alarm/device enrichment columns,
new tables (device_events, fuel_readings, temperature_readings, lbs_readings,
geofences), expands dwh_gold fact table, and adds refresh_daily_metrics() ETL.

tracksolid_DB_manual.md updated to reflect column rename and mark fixed issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 22:18:30 +03:00
David Kiania
e1402f6af1 Fix NoneType crash: API returns null result instead of missing key
dict.get("result", []) returns None when key exists with null value.
Changed to resp.get("result") or [] which handles both cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 17:27:48 +03:00
David Kiania
de70972d6a Add webhook receiver, consolidate shared utilities, expand telemetry coverage
- Add FastAPI webhook receiver (webhook_receiver_rev.py) for Jimi push data:
  OBD diagnostics, DTC fault codes, alarms, GPS, heartbeats, trip reports
- Add schema migration (03_webhook_schema_migration.sql) for webhook tables:
  fault_codes, heartbeats, expanded obd_readings/trips/position_history/alarms
- Consolidate duplicated _safe/_shutdown into shared safe_task/setup_shutdown
  in ts_shared_rev.py (DRY refactor)
- Add auto-commit to get_conn() context manager (prevents forgotten commits)
- Fix poll_trips to capture runTimeSecond and maxSpeed from API
- Add poll_parking via jimi.open.platform.report.parking
- Remove broken poll_obd (OBD is push-only, no polling endpoint exists)
- Fix alarms schema: add lat/lng/acc_status columns + dedup constraint
- Fix obd_readings schema: add dedup constraint
- Fix trigger DO block: replace nonexistent has_column with information_schema
- Narrow api_post exception handling to RequestException/ValueError
- Add webhook_receiver service to docker-compose.yaml
- Add fastapi/uvicorn/python-multipart to pyproject.toml
- Add clean_ts timestamp validator to ts_shared_rev.py
- Add Tracksolid Pro API documentation (tracksolidApiDocumentation.md)
- Populate .gitignore with Python/OS/secrets patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 16:31:17 +03:00
David Kiania
6205c483ee Deploy v2.0 Production Telemetry Stack 2026-04-07 21:34:40 +03:00