tracksolid_timescale_grafan.../webhook_receiver_rev.py

742 lines
32 KiB
Python
Raw Normal View History

"""
webhook_receiver_rev.py Fireside Communications · Tracksolid Webhook Receiver
RESPONSIBILITY: Receives real-time push data from Jimi Tracksolid Pro servers.
Jimi's Data Push API POSTs telemetry to these endpoints as it arrives from
devices, providing real-time ingestion without polling. This is the ONLY way
to receive OBD diagnostics and DTC fault codes those data types have no
polling endpoint.
ENDPOINTS:
/pushobd OBD CAN bus diagnostics (Priority 1)
/pushfaultinfo DTC fault codes (Priority 1)
/pushalarm Alarm events (Priority 2)
/pushgps GPS positions (Priority 2)
/pushhb Device heartbeats (Priority 2)
/pushtripreport Trip reports (Priority 2)
/pushevent Device LOGIN/LOGOUT events (Priority 3)
/health Healthcheck for Docker/monitoring
REVISIONS (QA-Verified):
[BUG-01] OBD event_time: try unix_to_ts before clean_ts (handles epoch timestamps).
[BUG-02] push_alarm: guard also checks alarm_type is not null (prevents FK violation).
[BUG-03] push_trip_report: _parse_trip_ts handles Jimi BCD format YYMMDDHHmmss.
[BUG-04] SAVEPOINT per item in all DB-writing endpoints (one bad item won't abort batch).
[BUG-05] Added /pushevent endpoint writes to tracksolid.device_events.
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
[FIX-W01] 260702 SEC-02: startup CRITICAL warning when JIMI_WEBHOOK_TOKEN is
empty; WEBHOOK_REQUIRE_TOKEN=1 refuses to start unauthenticated.
[FIX-W02] 260702 BUG-P3: _parse_request handles application/json bodies
({"token", "data_list"}) in addition to the observed form-encoded
msgType/data format a vendor-side format switch no longer
silently discards pushes.
[FIX-W03] 260702 BUG-P2: all DB work moved off the asyncio event loop into
sync _process_* functions run via asyncio.to_thread a large push
no longer stalls /health and concurrent requests on the worker.
[FIX-W04] 260702 BUG-P8: push_alarm rejects alarm_time outside a sane window
(WEBHOOK_EVENT_MAX_AGE_DAYS / WEBHOOK_EVENT_MAX_FUTURE_DAYS) so
devices with reset clocks (2019 timestamps observed live) stop
polluting tracksolid.alarms.
"""
from __future__ import annotations
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
import asyncio
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
import hmac
import json
import os
import time
from contextlib import asynccontextmanager
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
from datetime import datetime, timedelta, timezone
from typing import Optional
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
# Cap on items per webhook POST. Prevents a malformed/malicious push from
# monopolising a worker or blowing the DB pool. Jimi normally sends ≤ 200.
MAX_ITEMS_PER_POST = int(os.getenv("WEBHOOK_MAX_ITEMS", "5000"))
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
from psycopg2.extras import execute_values
from ts_shared_rev import (
close_pool,
get_conn,
log_ingestion,
clean,
clean_num,
clean_int,
clean_ts,
is_valid_fix,
get_logger,
FIX-M21: alarm cross-feed + stale-IMEI recovery for live_positions Cherry-pick of c8f5907 (originally FIX-M20 on main) onto quality-program-2026-04-12 — renamed to FIX-M21 here to avoid clashing with this branch's existing [FIX-M20] (trip enrichment, commit 144dede). Behaviour and code are unchanged from the main-branch original; the annotation tag is the only difference. Background ---------- A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two freshness gaps that share a single root cause: tracksolid.live_positions was being written by only one path (the 60s polled sweep), and that path silently omits devices that don't have a "current" fix in Jimi's location.list response. Effect on the dashboard: * 18 vehicles show OFFLINE for days-to-months — last fix is whatever the sweep wrote before Jimi dropped them. * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback because their dedicated tracker has been silent; the camera's lat/lng arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but we discard it after writing to tracksolid.alarms. Verified upstream subscription state: only /pushalarm is registered with Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are inactive. This change uses only data that already arrives. What's in this commit --------------------- ts_shared_rev.py * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None) — single time-guarded upsert all three writers will share. Guards on is_valid_fix() (filters Zero-Island and out-of-range) and EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or webhook retries can't rewind a fresher marker. COALESCE on optional columns so sparse callers don't blank dense ones' values. * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices whose live_positions.gps_time is NULL or older than the threshold, ordered NULLS FIRST so worst-offenders are in batch #1. * ensure_device(cur, imei, device_name=None) — relocated from webhook_receiver_rev so every live_positions writer can satisfy the FK without re-defining the helper. The original underscore-prefixed name in webhook_receiver_rev becomes a backwards-compat alias. webhook_receiver_rev.py * /pushalarm — after the alarm row insert, call upsert_live_position with the alarm's lat/lng and alarmTime. Sits inside the existing per-item SAVEPOINT, so a cross-feed failure rolls back only that one alarm's cross-feed, not the alarm row. ingest_movement_rev.py * poll_live_positions — inline INSERT replaced with upsert_live_position (extras dict carries the sweep-only columns). Same data, time-guarded. * get_device_locations — inline INSERT replaced; also gains an ensure_device call so it can be safely fed arbitrary IMEIs. * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and hands it to get_device_locations. Scheduled every 10 minutes plus a startup catch-up call. Uses jimi.device.location.get which returns *last-known* fix, so devices the 60s sweep drops can be re-warmed. Expected post-deploy effect (estimates, see 06_live_location/260521_timescale_location_upgrade_major.md §4) * ~1,100-1,600 additional live_positions upserts/day from the alarm cross-feed, after the time-guard rejects ~70-80% of races vs the fresher 60s sweep. * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence (JC400P emits ~107 alarms/day per device). * 8-14 of the 24 OFFLINE plates expected to recover via location.get's last-known-fix path within the first 30 minutes. * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour. * No 06_live_location code changes required — reads through reporting.v_live_positions transparently. Tests ----- 12 webhook integration tests pass (3 new: cross-feed fires on valid fix; skips without lat/lng; skips Zero-Island). 8 new unit tests in test_stale_imeis.py cover the stale selector, the poll wrapper, and the time-guard contract on upsert_live_position. Full suite: 77 passed. Deployment ---------- No schema migration. Both webhook_receiver and ingest_movement containers must be rebuilt — source is image-baked, not bind-mounted. Rollback is git revert + rebuild. Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md Verification playbook: 06_live_location/260521_timescale_location_upgrade_verification.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 18:05:26 +00:00
ensure_device,
upsert_live_position,
)
log = get_logger("webhook")
# ── Configuration ─────────────────────────────────────────────────────────────
WEBHOOK_TOKEN = os.getenv("JIMI_WEBHOOK_TOKEN", "")
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
# [FIX-W04] Sanity window for pushed event timestamps. Devices with reset
# clocks push alarm_time values years in the past (2019 observed live).
_EVENT_MAX_AGE = timedelta(days=int(os.getenv("WEBHOOK_EVENT_MAX_AGE_DAYS", "30")))
_EVENT_MAX_FUTURE = timedelta(days=int(os.getenv("WEBHOOK_EVENT_MAX_FUTURE_DAYS", "2")))
# ── Lifespan ──────────────────────────────────────────────────────────────────
@asynccontextmanager
async def lifespan(app: FastAPI):
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
# [FIX-W01] Fail loudly (or closed) when running without push authentication.
if not WEBHOOK_TOKEN:
if os.getenv("WEBHOOK_REQUIRE_TOKEN", "0") == "1":
raise RuntimeError(
"WEBHOOK_REQUIRE_TOKEN=1 but JIMI_WEBHOOK_TOKEN is empty — "
"refusing to start an unauthenticated public webhook."
)
log.critical(
"JIMI_WEBHOOK_TOKEN is EMPTY — every /push* endpoint accepts "
"UNAUTHENTICATED writes. Configure a push token in the Tracksolid "
"console and set JIMI_WEBHOOK_TOKEN (+ WEBHOOK_REQUIRE_TOKEN=1)."
)
log.info("Webhook receiver starting (v1.2)...")
yield
log.info("Webhook receiver shutting down...")
close_pool()
app = FastAPI(title="Tracksolid Webhook Receiver", lifespan=lifespan)
# ── Helpers ───────────────────────────────────────────────────────────────────
SUCCESS = {"code": 0, "msg": "success"}
def _validate_token(token: str) -> None:
"""Raise 403 if token is invalid. Skips validation if JIMI_WEBHOOK_TOKEN is empty."""
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
if WEBHOOK_TOKEN and not hmac.compare_digest(token, WEBHOOK_TOKEN):
raise HTTPException(status_code=403, detail="Invalid token")
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
def _cap_items(parsed) -> list[dict]:
items = parsed if isinstance(parsed, list) else [parsed]
if len(items) > MAX_ITEMS_PER_POST:
log.warning("push: truncated %d%d items", len(items), MAX_ITEMS_PER_POST)
items = items[:MAX_ITEMS_PER_POST]
return items
async def _parse_request(request: Request) -> tuple[str, list[dict]]:
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
"""Extract token + items from either a JSON body or a form-encoded body.
Two formats exist in the wild:
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
1. Observed live (integration push):
Content-Type: application/x-www-form-urlencoded
Body: msgType=<topic>&data=<URL-encoded JSON object or array>
(`data` holds one JSON object per event, not an array.)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
2. Documented Data Push API [FIX-W02]:
Content-Type: application/json
Body: {"token": "...", "data_list": [{...}, ...]}
Both are handled here so no endpoint needs to know which format arrived
and a vendor-side format switch can't silently discard data.
"""
body = await request.body()
if not body:
return "", []
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
ctype = request.headers.get("content-type", "").lower()
# [FIX-W02] JSON push format.
if "application/json" in ctype:
try:
payload = json.loads(body)
except (json.JSONDecodeError, UnicodeDecodeError):
log.warning("push: JSON body parse failed — %.200s", body)
return "", []
if not isinstance(payload, dict):
return "", _cap_items(payload) if isinstance(payload, list) else []
token = str(payload.get("token", "") or "")
raw = payload.get("data_list") or payload.get("data") or []
if isinstance(raw, str):
try:
raw = json.loads(raw)
except (json.JSONDecodeError, TypeError):
log.warning("push: data JSON parse failed — %.200s", raw)
return token, []
return token, _cap_items(raw)
# Form-encoded push format (observed live).
try:
form = await request.form()
except Exception:
log.warning("push: form parse failed", exc_info=True)
return "", []
token = str(form.get("token", ""))
raw_data = form.get("data") or form.get("data_list") or ""
if not raw_data:
return token, []
try:
parsed = json.loads(raw_data)
except (json.JSONDecodeError, TypeError):
log.warning("push: data JSON parse failed — %.200s", raw_data)
return token, []
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return token, _cap_items(parsed)
def unix_to_ts(v) -> Optional[str]:
"""Convert Unix timestamp (seconds or milliseconds) to ISO string."""
if v is None:
return None
try:
ts = int(v)
if ts > 1e12:
ts = ts // 1000
return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
except (ValueError, TypeError, OSError):
return None
def _parse_trip_ts(v) -> Optional[str]:
"""[BUG-03] Parse trip timestamps. Handles ISO strings and Jimi BCD formats."""
iso = clean_ts(v)
if iso:
return iso
s = clean(v)
if s is None:
return None
try:
if len(s) == 12: # YYMMDDHHmmss
return datetime.strptime(s, "%y%m%d%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
if len(s) == 14: # YYYYMMDDHHmmss
return datetime.strptime(s, "%Y%m%d%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
except (ValueError, TypeError):
pass
log.warning("Cannot parse trip timestamp: %r", v)
return None
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
def _is_sane_event_ts(ts: Optional[str]) -> bool:
"""[FIX-W04] True when a pushed event timestamp falls inside the sane
window (not older than WEBHOOK_EVENT_MAX_AGE_DAYS, not further ahead than
WEBHOOK_EVENT_MAX_FUTURE_DAYS). Unparseable values count as insane."""
if not ts:
return False
try:
dt = datetime.fromisoformat(str(ts).replace("Z", "+00:00"))
except (ValueError, TypeError):
return False
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
now = datetime.now(timezone.utc)
return (now - _EVENT_MAX_AGE) <= dt <= (now + _EVENT_MAX_FUTURE)
def _make_geom_params(lat, lng):
"""Return (lng, lat, lng, lat) tuple for the CASE WHEN ST_MakePoint pattern."""
return (lng, lat, lng, lat)
FIX-M21: alarm cross-feed + stale-IMEI recovery for live_positions Cherry-pick of c8f5907 (originally FIX-M20 on main) onto quality-program-2026-04-12 — renamed to FIX-M21 here to avoid clashing with this branch's existing [FIX-M20] (trip enrichment, commit 144dede). Behaviour and code are unchanged from the main-branch original; the annotation tag is the only difference. Background ---------- A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two freshness gaps that share a single root cause: tracksolid.live_positions was being written by only one path (the 60s polled sweep), and that path silently omits devices that don't have a "current" fix in Jimi's location.list response. Effect on the dashboard: * 18 vehicles show OFFLINE for days-to-months — last fix is whatever the sweep wrote before Jimi dropped them. * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback because their dedicated tracker has been silent; the camera's lat/lng arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but we discard it after writing to tracksolid.alarms. Verified upstream subscription state: only /pushalarm is registered with Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are inactive. This change uses only data that already arrives. What's in this commit --------------------- ts_shared_rev.py * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None) — single time-guarded upsert all three writers will share. Guards on is_valid_fix() (filters Zero-Island and out-of-range) and EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or webhook retries can't rewind a fresher marker. COALESCE on optional columns so sparse callers don't blank dense ones' values. * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices whose live_positions.gps_time is NULL or older than the threshold, ordered NULLS FIRST so worst-offenders are in batch #1. * ensure_device(cur, imei, device_name=None) — relocated from webhook_receiver_rev so every live_positions writer can satisfy the FK without re-defining the helper. The original underscore-prefixed name in webhook_receiver_rev becomes a backwards-compat alias. webhook_receiver_rev.py * /pushalarm — after the alarm row insert, call upsert_live_position with the alarm's lat/lng and alarmTime. Sits inside the existing per-item SAVEPOINT, so a cross-feed failure rolls back only that one alarm's cross-feed, not the alarm row. ingest_movement_rev.py * poll_live_positions — inline INSERT replaced with upsert_live_position (extras dict carries the sweep-only columns). Same data, time-guarded. * get_device_locations — inline INSERT replaced; also gains an ensure_device call so it can be safely fed arbitrary IMEIs. * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and hands it to get_device_locations. Scheduled every 10 minutes plus a startup catch-up call. Uses jimi.device.location.get which returns *last-known* fix, so devices the 60s sweep drops can be re-warmed. Expected post-deploy effect (estimates, see 06_live_location/260521_timescale_location_upgrade_major.md §4) * ~1,100-1,600 additional live_positions upserts/day from the alarm cross-feed, after the time-guard rejects ~70-80% of races vs the fresher 60s sweep. * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence (JC400P emits ~107 alarms/day per device). * 8-14 of the 24 OFFLINE plates expected to recover via location.get's last-known-fix path within the first 30 minutes. * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour. * No 06_live_location code changes required — reads through reporting.v_live_positions transparently. Tests ----- 12 webhook integration tests pass (3 new: cross-feed fires on valid fix; skips without lat/lng; skips Zero-Island). 8 new unit tests in test_stale_imeis.py cover the stale selector, the poll wrapper, and the time-guard contract on upsert_live_position. Full suite: 77 passed. Deployment ---------- No schema migration. Both webhook_receiver and ingest_movement containers must be rebuilt — source is image-baked, not bind-mounted. Rollback is git revert + rebuild. Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md Verification playbook: 06_live_location/260521_timescale_location_upgrade_verification.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 18:05:26 +00:00
# Backwards-compat shim. The implementation was relocated to ts_shared_rev
# (as `ensure_device`) so ingest_movement_rev and any future writer can share
# the FK-guard without re-defining it. Existing call sites in this file
# continue to use the underscore-prefixed name.
_ensure_device = ensure_device
# ── Health Check ──────────────────────────────────────────────────────────────
@app.get("/health")
def health():
return {"status": "ok"}
# ── 1. OBD Diagnostics (Priority 1) ──────────────────────────────────────────
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
def _process_obd(items: list[dict]) -> int:
"""Blocking DB work for /pushobd — runs in a worker thread [FIX-W03]."""
t0 = time.time()
inserted = 0
with get_conn() as conn:
with conn.cursor() as cur:
for item in items:
try:
cur.execute("SAVEPOINT sp")
imei = clean(item.get("deviceImei"))
obd = item.get("obdJson", {})
if isinstance(obd, str):
try:
obd = json.loads(obd)
except json.JSONDecodeError:
obd = {}
# [BUG-01] Try unix epoch first, fall back to ISO string.
event_time = (
unix_to_ts(obd.get("event_time"))
or clean_ts(obd.get("event_time"))
)
if not imei or not event_time:
cur.execute("RELEASE SAVEPOINT sp")
continue
lat = clean_num(obd.get("lat"))
lng = clean_num(obd.get("lng"))
cur.execute("""
INSERT INTO tracksolid.obd_readings (
imei, reading_time, car_type, acc_state, status_flags,
lat, lng, geom, obd_data, updated_at
) VALUES (
%s, %s, %s, %s, %s, %s, %s,
CASE WHEN %s IS NOT NULL AND %s IS NOT NULL
THEN ST_SetSRID(ST_MakePoint(%s, %s), 4326)
ELSE NULL END,
%s, NOW()
) ON CONFLICT (imei, reading_time) DO UPDATE SET
obd_data = EXCLUDED.obd_data,
updated_at = NOW()
""", (
imei, event_time,
clean_int(obd.get("car_type")),
clean_int(obd.get("AccState")),
clean_int(obd.get("statusFlags")),
lat, lng,
*_make_geom_params(lat, lng),
json.dumps(obd),
))
cur.execute("RELEASE SAVEPOINT sp")
inserted += 1
except Exception:
cur.execute("ROLLBACK TO SAVEPOINT sp")
log.warning("Failed to process OBD item for %s", item.get("deviceImei"), exc_info=True)
log_ingestion(cur, "webhook/pushobd", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
@app.post("/pushobd")
async def push_obd(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_obd, items)
log.info("pushobd: %d/%d items processed.", inserted, len(items))
return JSONResponse(content=SUCCESS)
# ── 2. DTC Fault Codes (Priority 1) ──────────────────────────────────────────
def _process_fault_info(items: list[dict]) -> int:
"""Blocking DB work for /pushfaultinfo — runs in a worker thread [FIX-W03]."""
t0 = time.time()
inserted = 0
with get_conn() as conn:
with conn.cursor() as cur:
for item in items:
try:
cur.execute("SAVEPOINT sp")
imei = clean(item.get("deviceImei"))
gate_time = clean_ts(item.get("gateTime"))
if not imei or not gate_time:
cur.execute("RELEASE SAVEPOINT sp")
continue
fault_codes = item.get("faultCodeList", [])
if isinstance(fault_codes, str):
try:
fault_codes = json.loads(fault_codes)
except json.JSONDecodeError:
fault_codes = []
lat = clean_num(item.get("lat"))
lng = clean_num(item.get("lng"))
evt_time = unix_to_ts(item.get("eventTime")) or clean_ts(item.get("eventTime"))
for code in fault_codes:
fault_code = clean(code)
# Guard NULL: ON CONFLICT won't deduplicate NULL fault_codes
# because NULL != NULL in Postgres unique constraints.
if not fault_code:
continue
cur.execute("""
INSERT INTO tracksolid.fault_codes (
imei, reported_at, fault_code, status_flags,
lat, lng, geom, event_time
) VALUES (
%s, %s, %s, %s, %s, %s,
CASE WHEN %s IS NOT NULL AND %s IS NOT NULL
THEN ST_SetSRID(ST_MakePoint(%s, %s), 4326)
ELSE NULL END,
%s
) ON CONFLICT (imei, reported_at, fault_code) DO NOTHING
""", (
imei, gate_time, fault_code,
clean_int(item.get("statusFlags")),
lat, lng,
*_make_geom_params(lat, lng),
evt_time,
))
inserted += 1
cur.execute("RELEASE SAVEPOINT sp")
except Exception:
cur.execute("ROLLBACK TO SAVEPOINT sp")
log.warning("Failed to process fault item for %s", item.get("deviceImei"), exc_info=True)
log_ingestion(cur, "webhook/pushfaultinfo", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
@app.post("/pushfaultinfo")
async def push_fault_info(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_fault_info, items)
log.info("pushfaultinfo: %d fault codes from %d items.", inserted, len(items))
return JSONResponse(content=SUCCESS)
# ── 3. Alarm Events (Priority 2) ─────────────────────────────────────────────
def _process_alarms(items: list[dict]) -> int:
"""Blocking DB work for /pushalarm — runs in a worker thread [FIX-W03]."""
t0 = time.time()
inserted = 0
with get_conn() as conn:
with conn.cursor() as cur:
for item in items:
try:
cur.execute("SAVEPOINT sp")
# Jimi integration push uses `imei` + `alarmTime`, NOT the
# `deviceImei` + `gateTime` fields shown in the API docs.
imei = clean(item.get("imei") or item.get("deviceImei"))
alarm_type = clean(item.get("alarmType"))
alarm_time = clean_ts(item.get("alarmTime") or item.get("gateTime"))
# [BUG-02] Also guard alarm_type — NULL alarm_type violates NOT NULL constraint.
if not imei or not alarm_time or not alarm_type:
cur.execute("RELEASE SAVEPOINT sp")
continue
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
# [FIX-W04] Reject device-clock garbage (2019 timestamps observed).
if not _is_sane_event_ts(alarm_time):
log.warning("pushalarm: rejected insane alarm_time %r for %s",
alarm_time, imei)
cur.execute("RELEASE SAVEPOINT sp")
continue
# Ensure parent devices row exists to satisfy FK constraint.
_ensure_device(cur, imei, clean(item.get("deviceName")))
lat = clean_num(item.get("lat"))
lng = clean_num(item.get("lng"))
cur.execute("""
INSERT INTO tracksolid.alarms (
imei, alarm_type, alarm_name, alarm_time, geom,
lat, lng, speed, source, updated_at
) VALUES (
%s, %s, %s, %s,
CASE WHEN %s IS NOT NULL AND %s IS NOT NULL
THEN ST_SetSRID(ST_MakePoint(%s, %s), 4326)
ELSE NULL END,
%s, %s, %s, 'push', NOW()
) ON CONFLICT (imei, alarm_type, alarm_time) DO NOTHING
""", (
imei, alarm_type, clean(item.get("alarmName")), alarm_time,
*_make_geom_params(lat, lng),
lat, lng,
clean_num(item.get("speed")),
))
FIX-M21: alarm cross-feed + stale-IMEI recovery for live_positions Cherry-pick of c8f5907 (originally FIX-M20 on main) onto quality-program-2026-04-12 — renamed to FIX-M21 here to avoid clashing with this branch's existing [FIX-M20] (trip enrichment, commit 144dede). Behaviour and code are unchanged from the main-branch original; the annotation tag is the only difference. Background ---------- A field audit of liveposition.rahamafresh.com on 2026-05-21 surfaced two freshness gaps that share a single root cause: tracksolid.live_positions was being written by only one path (the 60s polled sweep), and that path silently omits devices that don't have a "current" fix in Jimi's location.list response. Effect on the dashboard: * 18 vehicles show OFFLINE for days-to-months — last fix is whatever the sweep wrote before Jimi dropped them. * 3 vehicles (KDK 780K, KCQ 618K, KCZ 476E) depend on dashcam fallback because their dedicated tracker has been silent; the camera's lat/lng arrives via /pushalarm webhooks (5,287/day, 100% lat/lng fill) but we discard it after writing to tracksolid.alarms. Verified upstream subscription state: only /pushalarm is registered with Jimi; the n8n forwarders for /pushgps, /pushtripreport, /pushobd are inactive. This change uses only data that already arrives. What's in this commit --------------------- ts_shared_rev.py * upsert_live_position(cur, imei, lat, lng, gps_time, ..., extras=None) — single time-guarded upsert all three writers will share. Guards on is_valid_fix() (filters Zero-Island and out-of-range) and EXCLUDED.gps_time > stored.gps_time so late-arriving alarms or webhook retries can't rewind a fresher marker. COALESCE on optional columns so sparse callers don't blank dense ones' values. * get_stale_imeis(stale_minutes=30) — SELECT enabled_flag=1 devices whose live_positions.gps_time is NULL or older than the threshold, ordered NULLS FIRST so worst-offenders are in batch #1. * ensure_device(cur, imei, device_name=None) — relocated from webhook_receiver_rev so every live_positions writer can satisfy the FK without re-defining the helper. The original underscore-prefixed name in webhook_receiver_rev becomes a backwards-compat alias. webhook_receiver_rev.py * /pushalarm — after the alarm row insert, call upsert_live_position with the alarm's lat/lng and alarmTime. Sits inside the existing per-item SAVEPOINT, so a cross-feed failure rolls back only that one alarm's cross-feed, not the alarm row. ingest_movement_rev.py * poll_live_positions — inline INSERT replaced with upsert_live_position (extras dict carries the sweep-only columns). Same data, time-guarded. * get_device_locations — inline INSERT replaced; also gains an ensure_device call so it can be safely fed arbitrary IMEIs. * poll_stale_locations() — new wrapper. Pulls get_stale_imeis() and hands it to get_device_locations. Scheduled every 10 minutes plus a startup catch-up call. Uses jimi.device.location.get which returns *last-known* fix, so devices the 60s sweep drops can be re-warmed. Expected post-deploy effect (estimates, see 06_live_location/260521_timescale_location_upgrade_major.md §4) * ~1,100-1,600 additional live_positions upserts/day from the alarm cross-feed, after the time-guard rejects ~70-80% of races vs the fresher 60s sweep. * The 3 camera-fallback plates flip to "seconds-after-alarm" cadence (JC400P emits ~107 alarms/day per device). * 8-14 of the 24 OFFLINE plates expected to recover via location.get's last-known-fix path within the first 30 minutes. * Dashboard's "Offline 24h+" KPI: 24 → 10-14 within the first hour. * No 06_live_location code changes required — reads through reporting.v_live_positions transparently. Tests ----- 12 webhook integration tests pass (3 new: cross-feed fires on valid fix; skips without lat/lng; skips Zero-Island). 8 new unit tests in test_stale_imeis.py cover the stale selector, the poll wrapper, and the time-guard contract on upsert_live_position. Full suite: 77 passed. Deployment ---------- No schema migration. Both webhook_receiver and ingest_movement containers must be rebuilt — source is image-baked, not bind-mounted. Rollback is git revert + rebuild. Plan & monitoring SQL: 06_live_location/260521_timescale_location_upgrade_major.md Verification playbook: 06_live_location/260521_timescale_location_upgrade_verification.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 18:05:26 +00:00
# [FIX-M21] Cross-feed: every Jimi alarm carries lat/lng.
# Refresh live_positions so dashboard markers don't have to
# wait up to 60s for the next polled sweep. Time-guarded
# inside the helper — alarms older than the current fix lose.
upsert_live_position(
cur, imei, lat, lng, alarm_time,
speed=clean_num(item.get("speed")),
)
cur.execute("RELEASE SAVEPOINT sp")
inserted += 1
except Exception:
cur.execute("ROLLBACK TO SAVEPOINT sp")
log.warning("Failed to process alarm for %s",
item.get("imei") or item.get("deviceImei"), exc_info=True)
log_ingestion(cur, "webhook/pushalarm", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
@app.post("/pushalarm")
async def push_alarm(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_alarms, items)
log.info("pushalarm: %d/%d items processed.", inserted, len(items))
return JSONResponse(content=SUCCESS)
# ── 4. GPS Positions (Priority 2) ────────────────────────────────────────────
def _process_gps(items: list[dict]) -> int:
"""Blocking validate+write for /pushgps — runs in a worker thread [FIX-W03]."""
t0 = time.time()
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
# Validation phase — pre-clean and filter without touching the DB.
# Per-row INSERT with SAVEPOINT was ~1 ms/row overhead at this volume;
# one batched execute_values is 10-50× faster for the same rows.
rows = []
for item in items:
imei = clean(item.get("deviceImei"))
gps_time = clean_ts(item.get("gpsTime"))
lat = clean_num(item.get("lat"))
lng = clean_num(item.get("lng"))
if not imei or not gps_time or not is_valid_fix(lat, lng):
continue
rows.append((
imei, gps_time, lng, lat, lat, lng,
clean_num(item.get("gpsSpeed")),
clean_num(item.get("direction")),
str(item.get("acc")) if item.get("acc") is not None else None,
clean_int(item.get("satelliteNum")),
clean_num(item.get("distance")),
clean_num(item.get("altitude")),
clean_int(item.get("postType")),
))
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
inserted = 0
if rows:
with get_conn() as conn:
with conn.cursor() as cur:
execute_values(
cur,
"""
INSERT INTO tracksolid.position_history (
imei, gps_time, geom, lat, lng, speed, direction,
acc_status, satellite, current_mileage,
altitude, post_type, source
) VALUES %s
ON CONFLICT (imei, gps_time) DO NOTHING
""",
rows,
template="(%s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326),"
" %s, %s, %s, %s, %s, %s, %s, %s, %s, 'push')",
page_size=len(rows),
)
inserted = cur.rowcount
log_ingestion(cur, "webhook/pushgps", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
else:
# No valid rows, still record the call for observability.
with get_conn() as conn:
with conn.cursor() as cur:
log_ingestion(cur, "webhook/pushgps", len(items), 0, 0,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
perf+fix: SAVEPOINT-per-item pollers, batched GPS inserts, parallel detail fetch Audit fixes across the ingestion stack: Observability - Move log_ingestion out of batch loops in poll_alarms and poll_parking (was emitting N cumulative log rows per run instead of one). - Add missing log_ingestion + t0 to poll_trips. - Count inserted via cur.rowcount instead of naive +=1 so ON CONFLICT DO NOTHING no longer inflates the metric. Resilience - SAVEPOINT-per-item added to poll_alarms, poll_live_positions, poll_trips, poll_parking so one bad row no longer aborts the batch (webhook handlers already had this; pollers were inconsistent). Performance - /pushgps and poll_track_list now use psycopg2.extras.execute_values with ON CONFLICT DO NOTHING — 10-50x write throughput on larger batches. - sync_devices and sync_driver_audit fetch jimi.track.device.detail concurrently via ThreadPoolExecutor(max_workers=8), cutting the daily registry sync from ~24s to ~3s for an 80-device fleet. - poll_track_list split into two phases: parallel API fetch (4 workers, no DB connection held) then one batched write. Previously the DB connection was held across every per-IMEI HTTP call, risking pool starvation. Security - _validate_token uses hmac.compare_digest for constant-time token comparison (closes timing side-channel). - _parse_data_list caps incoming items at WEBHOOK_MAX_ITEMS (default 5000) so a pathological push cannot blow memory. Tests - Fix test_null_alarm_type_skipped: its INSERT-count assertion was catching the ingestion_log insert written by log_ingestion. Filter that out so the test checks only data-table inserts. - Full suite: 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 21:33:55 +00:00
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
@app.post("/pushgps")
async def push_gps(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_gps, items)
log.info("pushgps: %d/%d items inserted.", inserted, len(items))
return JSONResponse(content=SUCCESS)
# ── 5. Device Heartbeats (Priority 2) ────────────────────────────────────────
def _process_heartbeats(items: list[dict]) -> int:
"""Blocking DB work for /pushhb — runs in a worker thread [FIX-W03]."""
t0 = time.time()
inserted = 0
with get_conn() as conn:
with conn.cursor() as cur:
for item in items:
try:
cur.execute("SAVEPOINT sp")
imei = clean(item.get("deviceImei"))
gate_time = clean_ts(item.get("gateTime"))
if not imei or not gate_time:
cur.execute("RELEASE SAVEPOINT sp")
continue
cur.execute("""
INSERT INTO tracksolid.heartbeats (
imei, gate_time, power_level, gsm_signal,
acc_status, power_status, fortify
) VALUES (%s, %s, %s, %s, %s, %s, %s)
ON CONFLICT (imei, gate_time) DO NOTHING
""", (
imei, gate_time,
clean_int(item.get("powerLevel")),
clean_int(item.get("gsmSign")),
clean_int(item.get("acc")),
clean_int(item.get("powerStatus")),
clean_int(item.get("fortify")),
))
cur.execute("RELEASE SAVEPOINT sp")
inserted += 1
except Exception:
cur.execute("ROLLBACK TO SAVEPOINT sp")
log.warning("Failed to process heartbeat for %s", item.get("deviceImei"), exc_info=True)
log_ingestion(cur, "webhook/pushhb", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
@app.post("/pushhb")
async def push_heartbeat(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_heartbeats, items)
log.info("pushhb: %d/%d items processed.", inserted, len(items))
return JSONResponse(content=SUCCESS)
# ── 6. Trip Reports (Priority 2) ─────────────────────────────────────────────
def _process_trip_reports(items: list[dict]) -> int:
"""Blocking DB work for /pushtripreport — runs in a worker thread [FIX-W03]."""
t0 = time.time()
inserted = 0
with get_conn() as conn:
with conn.cursor() as cur:
for item in items:
try:
cur.execute("SAVEPOINT sp")
imei = clean(item.get("deviceImei"))
# [BUG-03] Use _parse_trip_ts to handle Jimi BCD format YYMMDDHHmmss.
begin_time = _parse_trip_ts(item.get("beginTime"))
end_time = _parse_trip_ts(item.get("endTime"))
if not imei or not begin_time:
cur.execute("RELEASE SAVEPOINT sp")
continue
# [FIX-M11] API sends km. Store directly as distance_km.
distance_km = clean_num(item.get("miles"))
begin_lat = clean_num(item.get("beginLat"))
begin_lng = clean_num(item.get("beginLng"))
end_lat = clean_num(item.get("endLat"))
end_lng = clean_num(item.get("endLng"))
cur.execute("""
INSERT INTO tracksolid.trips (
imei, start_time, end_time, distance_km,
start_geom, end_geom,
fuel_consumed_l, idle_time_s, trip_seq, source,
updated_at
) VALUES (
%s, %s, %s, %s,
CASE WHEN %s IS NOT NULL AND %s IS NOT NULL
THEN ST_SetSRID(ST_MakePoint(%s, %s), 4326)
ELSE NULL END,
CASE WHEN %s IS NOT NULL AND %s IS NOT NULL
THEN ST_SetSRID(ST_MakePoint(%s, %s), 4326)
ELSE NULL END,
%s, %s, %s, 'push', NOW()
) ON CONFLICT (imei, start_time) DO UPDATE SET
end_time = EXCLUDED.end_time,
distance_km = EXCLUDED.distance_km,
end_geom = EXCLUDED.end_geom,
fuel_consumed_l = EXCLUDED.fuel_consumed_l,
idle_time_s = EXCLUDED.idle_time_s,
updated_at = NOW()
""", (
imei, begin_time, end_time, distance_km,
begin_lng, begin_lat, begin_lng, begin_lat,
end_lng, end_lat, end_lng, end_lat,
clean_num(item.get("oils")),
clean_int(item.get("idleTimes")),
clean_int(item.get("tripSeq")),
))
cur.execute("RELEASE SAVEPOINT sp")
inserted += 1
except Exception:
cur.execute("ROLLBACK TO SAVEPOINT sp")
log.warning("Failed to process trip for %s", item.get("deviceImei"), exc_info=True)
log_ingestion(cur, "webhook/pushtripreport", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
@app.post("/pushtripreport")
async def push_trip_report(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_trip_reports, items)
log.info("pushtripreport: %d/%d items processed.", inserted, len(items))
return JSONResponse(content=SUCCESS)
# ── 7. Device Events (LOGIN / LOGOUT) ────────────────────────────────────────
def _process_events(items: list[dict]) -> int:
"""Blocking DB work for /pushevent — runs in a worker thread [FIX-W03]."""
t0 = time.time()
inserted = 0
with get_conn() as conn:
with conn.cursor() as cur:
for item in items:
try:
cur.execute("SAVEPOINT sp")
imei = clean(item.get("deviceImei"))
event_type = clean(item.get("type"))
event_time = clean_ts(item.get("gateTime"))
if not imei or not event_type or not event_time:
cur.execute("RELEASE SAVEPOINT sp")
continue
cur.execute("""
INSERT INTO tracksolid.device_events
(imei, event_type, event_time, timezone)
VALUES (%s, %s, %s, %s)
ON CONFLICT (imei, event_type, event_time) DO NOTHING
""", (imei, event_type, event_time, clean(item.get("timezone"))))
cur.execute("RELEASE SAVEPOINT sp")
inserted += 1
except Exception:
cur.execute("ROLLBACK TO SAVEPOINT sp")
log.warning("Failed to process event for %s", item.get("deviceImei"), exc_info=True)
log_ingestion(cur, "webhook/pushevent", len(items), 0, inserted,
int((time.time() - t0) * 1000), True)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
return inserted
@app.post("/pushevent")
async def push_event(request: Request):
token, items = await _parse_request(request)
_validate_token(token)
if not items:
return JSONResponse(content=SUCCESS)
fix(security,ingest): 260702 audit — secure the stack, correct poller counters Security: - .dockerignore + Dockerfile: stop baking .env / the 346MB OSM pbf into image layers; install pinned from uv.lock (reproducible builds) (SEC-04/05). - docker-compose: DB port binds ${DB_BIND_ADDR:-127.0.0.1} — loopback-only by default; remote tooling moves to an SSH tunnel (SEC-01). - webhook_receiver: CRITICAL startup warning + WEBHOOK_REQUIRE_TOKEN=1 fail-closed when JIMI_WEBHOOK_TOKEN is empty (SEC-02 / FIX-W01). Correctness: - FIX-M22/E07: capture cur.rowcount BEFORE RELEASE SAVEPOINT in poll_alarms/ poll_trips/poll_parking — the RELEASE reported -1, producing "Alarms: -4 new events inserted" logs and negative ingestion_log.rows_inserted. - FIX-W02: parse application/json push bodies (were silently dropped). - FIX-W03: move webhook DB work off the event loop via asyncio.to_thread. - FIX-M23: poll_trips phased so no txn/connection is held across Tracksolid + Nominatim (1 req/s) network calls. - FIX-M24: sync_devices disables devices absent from every target (guarded). - FIX-W04: reject device-clock-garbage alarm_time (2019 timestamps observed). - get_token(): don't relabel already-aware timestamptz expiries (BUG-P9). Observability/lifecycle: - migration 21: v_ingest_health restricted to active pipeline endpoints so one-shot tools stop wedging /health/ingest at 'stale' (dry-run verified). - FIX-M25: daily purge_audit_logs() trims ingestion_log (90d) + refresh_log (180d). - remove orphaned duplicate migrations/10_driver_clock_views.sql; ruff lint config. +5 webhook tests (82 pass). Report/plan/work-log in docs/reports/260702_*. Local only; not deployed. CLAUDE.md fix-history edits left uncommitted (that file also carries unrelated in-progress edits). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 06:51:02 +00:00
inserted = await asyncio.to_thread(_process_events, items)
log.info("pushevent: %d/%d items processed.", inserted, len(items))
return JSONResponse(content=SUCCESS)