Commit graph

5 commits

Author SHA1 Message Date
david kiania
c8f5907d4f FIX-M20: alarm cross-feed + stale-IMEI recovery for live_positions
Background
----------
A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two
freshness gaps that share a single root cause: tracksolid.live_positions
was being written by only one path (the 60s polled sweep), and that path
silently omits devices that don't have a "current" fix in Jimi's
location.list response. Effect on the dashboard:

  * 18 vehicles show OFFLINE for days-to-months — last fix is whatever
    the sweep wrote before Jimi dropped them.
  * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback
    because their dedicated tracker has been silent; the camera's lat/lng
    arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but
    we discard it after writing to tracksolid.alarms.

Verified upstream subscription state: only /pushalarm is registered with
Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are
inactive. This change uses only data that already arrives.

What's in this PR
-----------------
ts_shared_rev.py
  * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None)
    — single time-guarded upsert all three writers will share. Guards on
    is_valid_fix() (filters Zero-Island and out-of-range) and
    EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or
    webhook retries can't rewind a fresher marker. COALESCE on optional
    columns so a caller that omits them doesn't blank values written by
    a caller that supplies them.
  * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices
    whose live_positions.gps_time is NULL or older than the threshold,
    ordered NULLS FIRST so worst-offenders are in batch #1.
  * ensure_device(cur, imei, device_name=None) — relocated from
    webhook_receiver_rev so every live_positions writer can satisfy the
    FK without re-defining the helper. The original underscore-prefixed
    name in webhook_receiver_rev becomes a backwards-compat alias.
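
The time-guard and COALESCE behaviour described for upsert_live_position
can be sketched as follows. This is a simplified illustration, not the code
in ts_shared_rev.py: it uses SQLite as a stand-in for PostgreSQL, integer
epoch seconds for gps_time, a single optional column (speed) in place of
the real optional columns and extras dict, and a hypothetical table layout
created by make_demo_db().

```python
import sqlite3


def is_valid_fix(lat, lng):
    """Reject missing, Zero-Island (0, 0), and out-of-range coordinates."""
    if lat is None or lng is None:
        return False
    if lat == 0 and lng == 0:
        return False
    return -90 <= lat <= 90 and -180 <= lng <= 180


# SQLite stand-in for the PostgreSQL statement. `speed` represents one
# optional column; the real column set is an assumption here.
UPSERT_SQL = """
INSERT INTO live_positions (imei, lat, lng, gps_time, speed)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (imei) DO UPDATE SET
    lat = excluded.lat,
    lng = excluded.lng,
    gps_time = excluded.gps_time,
    speed = COALESCE(excluded.speed, live_positions.speed)
WHERE excluded.gps_time > live_positions.gps_time
"""


def upsert_live_position(cur, imei, lat, lng, gps_time, speed=None):
    """Write the fix only if it is valid and newer than the stored one."""
    if not is_valid_fix(lat, lng):
        return
    cur.execute(UPSERT_SQL, (imei, lat, lng, gps_time, speed))


def make_demo_db():
    """Create an in-memory table with a hypothetical, reduced schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE live_positions ("
        "imei TEXT PRIMARY KEY, lat REAL, lng REAL, "
        "gps_time INTEGER, speed REAL)"
    )
    return conn
```

With this guard, a late-arriving alarm whose gps_time is older than the
stored row leaves the row untouched, and a newer write that passes
speed=None keeps the previously stored speed.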

webhook_receiver_rev.py
  * /pushalarm — after the alarm row insert, call upsert_live_position
    with the alarm's lat/lng and alarmTime. Sits inside the existing
    per-item SAVEPOINT, so a cross-feed failure rolls back only that
    one alarm's cross-feed, not the alarm row.
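
A minimal sketch of the per-item SAVEPOINT pattern that the cross-feed sits
inside. It is illustrative only: SQLite stands in for PostgreSQL, the
alarms table is reduced to two hypothetical columns, and insert_batch is
not a function from webhook_receiver_rev.py.

```python
import sqlite3


def insert_batch(conn, items):
    """Insert each (imei, alarm_type) pair under its own SAVEPOINT.

    A failing item is rolled back on its own; the rest of the batch
    still commits. Returns the number of rows kept.
    """
    cur = conn.cursor()
    kept = 0
    cur.execute("BEGIN")
    for imei, alarm_type in items:
        cur.execute("SAVEPOINT item")
        try:
            cur.execute(
                "INSERT INTO alarms (imei, alarm_type) VALUES (?, ?)",
                (imei, alarm_type),
            )
            cur.execute("RELEASE SAVEPOINT item")
            kept += 1
        except sqlite3.IntegrityError:
            # Undo only this item's work, then drop the savepoint.
            cur.execute("ROLLBACK TO SAVEPOINT item")
            cur.execute("RELEASE SAVEPOINT item")
    cur.execute("COMMIT")
    return kept
```

The connection is assumed to be opened with
sqlite3.connect(":memory:", isolation_level=None) so that BEGIN, SAVEPOINT
and COMMIT are issued explicitly. In PostgreSQL the ROLLBACK TO SAVEPOINT
step is what lets the transaction continue after a failed statement.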

ingest_movement_rev.py
  * poll_live_positions — inline INSERT replaced with upsert_live_position
    (extras dict carries the sweep-only columns). Same data, time-guarded.
  * get_device_locations — inline INSERT replaced; also gains an
    ensure_device call so it can be safely fed arbitrary IMEIs.
  * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and
    hands it to get_device_locations. Scheduled every 10 minutes plus a
    startup catch-up call. Uses jimi.device.location.get, which returns the
    *last-known* fix, so devices the 60s sweep drops can be re-warmed.
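
The stale-IMEI selection and the poll wrapper described above could look
roughly like this. It is a pure-Python sketch over in-memory dicts rather
than the SQL in get_stale_imeis; the device dict keys, the fetch_locations
callback, and the batch size of 100 are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def select_stale_imeis(devices, now, stale_minutes=30):
    """Return IMEIs of enabled devices with no fix or a fix older than the cutoff.

    Devices with gps_time of None sort first, then oldest fix first,
    mirroring an ORDER BY gps_time NULLS FIRST.
    """
    cutoff = now - timedelta(minutes=stale_minutes)
    stale = [
        d for d in devices
        if d["enabled_flag"] == 1
        and (d["gps_time"] is None or d["gps_time"] < cutoff)
    ]
    stale.sort(key=lambda d: (d["gps_time"] is not None, d["gps_time"] or now))
    return [d["imei"] for d in stale]


def batches(imeis, size):
    """Yield consecutive slices of at most `size` IMEIs."""
    for start in range(0, len(imeis), size):
        yield imeis[start:start + size]


def poll_stale_locations(devices, fetch_locations, now, batch_size=100):
    """Hand stale IMEIs to `fetch_locations` one batch at a time."""
    stale = select_stale_imeis(devices, now)
    for batch in batches(stale, batch_size):
        fetch_locations(batch)
    return len(stale)
```

Because never-seen devices sort first, they land in the first batch handed
to the location lookup.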

Expected post-deploy effect (estimates, see
260521_timescale_location_upgrade_major.md §4)
  * ~1,100-1,600 additional live_positions upserts/day from the alarm
    cross-feed, after the time-guard rejects ~70-80% of races vs the
    fresher 60s sweep.
  * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence
    (JC400P emits ~107 alarms/day per device).
  * 8-14 of the 24 OFFLINE plates expected to recover via location.get's
    last-known-fix path within the first 30 minutes.
  * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour.
  * No 06_live_location code changes required — reads through
    reporting.v_live_positions transparently.
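
As a back-of-envelope check, the upsert estimate is consistent with applying
the stated 70-80% rejection rate to the 5,287 alarms/day figure from the
Background section. This is an assumption about how the number was derived,
not a reproduction of §4 of the plan; the function name is illustrative.

```python
ALARMS_PER_DAY = 5287          # /pushalarm volume quoted in Background
ALARMS_PER_CAMERA_PER_DAY = 107  # JC400P rate quoted above
CAMERA_FALLBACK_PLATES = 3


def accepted_upserts(alarms_per_day, reject_pct):
    """Alarms that survive the time-guard, using integer arithmetic."""
    return alarms_per_day * (100 - reject_pct) // 100


low = accepted_upserts(ALARMS_PER_DAY, 80)   # 80% rejected
high = accepted_upserts(ALARMS_PER_DAY, 70)  # 70% rejected
camera_alarms = CAMERA_FALLBACK_PLATES * ALARMS_PER_CAMERA_PER_DAY
print(low, high, camera_alarms)  # 1057 1586 321
```

The resulting range of roughly 1,057 to 1,586 per day lines up with the
quoted ~1,100-1,600.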

Tests
-----
12 webhook integration tests pass (3 new: cross-feed fires on valid fix;
skips without lat/lng; skips Zero-Island). 8 new unit tests in
test_stale_imeis.py cover the stale selector, the poll wrapper, and the
time-guard contract on upsert_live_position. Full suite: 77 passed.

Deployment
----------
No schema migration. Both webhook_receiver and ingest_movement
containers must be rebuilt — source is image-baked, not bind-mounted.
Rollback is git revert + rebuild.

Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 21:05:26 +03:00
David Kiania
fa110f4313 feat: [FIX-M19] multi-account ingest across fireside sub-accounts
Fleet lives across three Tracksolid sub-accounts:
  fireside         —  63 devices
  Fireside@HQ      —  52 devices
  Fireside_MSA     —  41 devices

Previously sync_devices / poll_live_positions / poll_parking only
queried a single TARGET_ACCOUNT, so ~64% of the fleet was invisible to
the pipeline.

Changes:
  - ts_shared_rev.py: new TARGETS list (env TRACKSOLID_TARGETS,
    comma-separated; falls back to the single TARGET_ACCOUNT).
  - ts_shared_rev.py: new get_active_imeis_by_target() helper that
    groups active IMEIs by their stored account so parking calls can
    pass the right account param per batch.
  - ingest_movement_rev.py: sync_devices and poll_live_positions loop
    over every target and dedupe by IMEI before upserting. poll_parking
    loops over imeis_by_target so each batch carries the matching
    account.
  - CLAUDE.md: FIX-M19 entry.
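
The target list, per-account grouping, and IMEI dedupe described in these
changes can be sketched as below. The function names load_targets,
group_imeis_by_target, and dedupe_by_imei are hypothetical, as are the row
and device shapes; only the TRACKSOLID_TARGETS and TARGET_ACCOUNT variable
names come from the commit message.

```python
import os
from collections import defaultdict


def load_targets(env=None):
    """Read TRACKSOLID_TARGETS (comma-separated), else fall back to TARGET_ACCOUNT."""
    env = os.environ if env is None else env
    raw = env.get("TRACKSOLID_TARGETS", "")
    targets = [t.strip() for t in raw.split(",") if t.strip()]
    if targets:
        return targets
    fallback = env.get("TARGET_ACCOUNT", "").strip()
    return [fallback] if fallback else []


def group_imeis_by_target(rows):
    """Group (imei, account) rows so each batch can carry its own account."""
    grouped = defaultdict(list)
    for imei, account in rows:
        grouped[account].append(imei)
    return dict(grouped)


def dedupe_by_imei(devices):
    """Keep the first record seen for each IMEI across all targets."""
    seen = {}
    for device in devices:
        seen.setdefault(device["imei"], device)
    return list(seen.values())
```

Passing a plain dict as env makes the fallback easy to exercise without
touching the real process environment.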

Requires new env var TRACKSOLID_TARGETS="fireside,Fireside@HQ,Fireside_MSA"
on the ingest services in Coolify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 10:43:07 +03:00
David Kiania
b1e4d6e85f Fix 5 webhook bugs: SAVEPOINTs, NULL guards, BCD timestamps, /pushevent, log NULL fix
BUG-01: OBD event_time — try unix_to_ts before clean_ts (Jimi sends epoch ints)
BUG-02: push_alarm — guard alarm_type not null (NULL breaks ON CONFLICT dedup)
BUG-03: push_trip_report — _parse_trip_ts handles Jimi BCD format YYMMDDHHmmss
BUG-04: SAVEPOINT per item in all 5 DB endpoints (FK violation on one item no
        longer aborts the whole batch; SAVEPOINT now inside try for safety)
BUG-05: Add /pushevent endpoint (log-only; was returning 404 to Jimi)
FIX:    push_fault_info — skip null fault_code (NULL != NULL in PG unique index)
FIX:    log_ingestion — pass SQL NULL not string "None" when no error occurred
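
A rough sketch of the timestamp handling behind BUG-01 and BUG-03: accept
either a 12-digit BCD string (YYMMDDHHmmss) or epoch seconds. The function
name parse_jimi_ts is hypothetical, and treating both inputs as UTC is an
assumption; the actual _parse_trip_ts, unix_to_ts and clean_ts helpers may
differ.

```python
from datetime import datetime, timezone


def parse_jimi_ts(raw):
    """Parse a BCD 'YYMMDDHHmmss' string or epoch seconds; return None if unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text.isdigit():
        return None
    if len(text) == 12:
        # BCD-style timestamp, e.g. "260411181913" for 2026-04-11 18:19:13.
        try:
            parsed = datetime.strptime(text, "%y%m%d%H%M%S")
        except ValueError:
            return None
        return parsed.replace(tzinfo=timezone.utc)  # UTC is an assumption
    # Anything else that is all digits is treated as epoch seconds.
    return datetime.fromtimestamp(int(text), tz=timezone.utc)
```

Checking the 12-digit form first keeps a BCD value from being misread as a
very large epoch number.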

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 18:19:13 +03:00
David Kiania
de70972d6a Add webhook receiver, consolidate shared utilities, expand telemetry coverage
- Add FastAPI webhook receiver (webhook_receiver_rev.py) for Jimi push data:
  OBD diagnostics, DTC fault codes, alarms, GPS, heartbeats, trip reports
- Add schema migration (03_webhook_schema_migration.sql) for webhook tables:
  fault_codes, heartbeats, expanded obd_readings/trips/position_history/alarms
- Consolidate duplicated _safe/_shutdown into shared safe_task/setup_shutdown
  in ts_shared_rev.py (DRY refactor)
- Add auto-commit to get_conn() context manager (prevents forgotten commits)
- Fix poll_trips to capture runTimeSecond and maxSpeed from API
- Add poll_parking via jimi.open.platform.report.parking
- Remove broken poll_obd (OBD is push-only, no polling endpoint exists)
- Fix alarms schema: add lat/lng/acc_status columns + dedup constraint
- Fix obd_readings schema: add dedup constraint
- Fix trigger DO block: replace nonexistent has_column with information_schema
- Narrow api_post exception handling to RequestException/ValueError
- Add webhook_receiver service to docker-compose.yaml
- Add fastapi/uvicorn/python-multipart to pyproject.toml
- Add clean_ts timestamp validator to ts_shared_rev.py
- Add Tracksolid Pro API documentation (tracksolidApiDocumentation.md)
- Populate .gitignore with Python/OS/secrets patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 16:31:17 +03:00
David Kiania
6205c483ee Deploy v2.0 Production Telemetry Stack
2026-04-07 21:34:40 +03:00