David Kiania 004fed7ab9 Add operations manual with verification queries per service
Comprehensive guide covering:
- Service architecture and scheduled tasks
- Per-service verification SQL queries grouped by service
- Health dashboard queries for monitoring
- Polling vs push coexistence and dedup strategy
- Environment variables, data retention, troubleshooting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 17:59:05 +03:00


Fireside Communications — Tracksolid Pro Telemetry Stack

Operations Manual & Verification Guide


1. Service Architecture

JIMI TRACKSOLID PRO API
        |
        +-- POLLING (Pull)              +-- PUSH (Webhook)
        |   (Fallback / Catch-up)       |   (Real-time)
        |                               |
   ingest_movement    ingest_events    webhook_receiver
   (60s/15m/daily)    (5m polling)     (FastAPI :8000)
        |                  |                |
        +------------------+----------------+
                           |
                    timescale_db
                    (PG16 + TimescaleDB + PostGIS)
                           |
                        grafana
                    (Dashboards :3000)

| Service | Purpose | Restart Policy |
|---|---|---|
| timescale_db | PostgreSQL 16 + TimescaleDB 2.15 + PostGIS 3 | always |
| ingest_movement | GPS positions, trips, parking, device sync (polling) | always |
| ingest_events | Alarm event polling (catch-up/fallback) | always |
| webhook_receiver | Real-time push data from Jimi (OBD, faults, GPS, alarms, heartbeats, trips) | always |
| grafana | Visualization dashboards (read-only DB access) | always |

2. Verification Tests — Grouped by Service

2.1 timescale_db

Check that the database is healthy:

SELECT PostGIS_Full_Version();
SELECT * FROM timescaledb_information.hypertables;

Verify all critical tables exist:

SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema IN ('tracksolid', 'dwh_gold')
ORDER BY table_schema, table_name;

Expected tables:

  • tracksolid.devices
  • tracksolid.api_token_cache
  • tracksolid.ingestion_log
  • tracksolid.live_positions
  • tracksolid.position_history (hypertable)
  • tracksolid.trips
  • tracksolid.parking_events
  • tracksolid.alarms
  • tracksolid.obd_readings
  • tracksolid.fault_codes
  • tracksolid.heartbeats (hypertable)
  • dwh_gold.dim_vehicles
  • dwh_gold.fact_daily_fleet_metrics

Verify hypertables are configured:

SELECT hypertable_schema, hypertable_name, compression_enabled
FROM timescaledb_information.hypertables;

Expected:

| hypertable_schema | hypertable_name | compression_enabled |
|---|---|---|
| tracksolid | position_history | true |
| tracksolid | heartbeats | false |

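Verify retention and compression policies:

-- Filter on proc_name: default job names are capitalized (e.g. 'Retention Policy [<job_id>]'),
-- so a case-sensitive LIKE '%retention%' does not match them
SELECT job_id, proc_name, hypertable_schema, hypertable_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE proc_name IN ('policy_retention', 'policy_compression');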

2.2 ingest_movement

This service runs four scheduled tasks. Verify each one is producing data.

2.2.1 Device Registry Sync (Daily @ 02:00 UTC)

  • API endpoint: jimi.user.device.list + jimi.track.device.detail
  • Table: tracksolid.devices
  • Schedule: Daily at 02:00 UTC, plus once on startup

-- Check devices are registered
SELECT COUNT(*) AS total_devices,
       COUNT(*) FILTER (WHERE enabled_flag = 1) AS enabled,
       MIN(last_synced_at) AS oldest_sync,
       MAX(last_synced_at) AS latest_sync
FROM tracksolid.devices;
-- Sample device records
SELECT imei, device_name, vehicle_number, driver_name, enabled_flag, last_synced_at
FROM tracksolid.devices
ORDER BY last_synced_at DESC
LIMIT 10;

Healthy indicator: latest_sync within the last 24 hours, total_devices > 0.
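A monitoring script could apply this indicator as follows. This is a sketch: device_sync_is_healthy and the 15-minute grace period are assumptions, added so that a check running while the 02:00 UTC sync is still in progress does not report a false failure. It also assumes last_synced_at is a timestamptz column, read as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a grace period (not part of the stack) covering the time the
# 02:00 UTC sync needs to finish before last_synced_at moves forward.
SYNC_GRACE = timedelta(minutes=15)


def device_sync_is_healthy(total_devices, latest_sync, now=None):
    """Healthy indicator from the query above: devices exist and the
    latest sync is no older than 24 hours (plus the grace period)."""
    now = now or datetime.now(timezone.utc)
    if total_devices <= 0 or latest_sync is None:
        return False
    return now - latest_sync <= timedelta(hours=24) + SYNC_GRACE
```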

2.2.2 Live Positions (Every 60 seconds)

  • API endpoint: jimi.user.device.location.list
  • Tables: tracksolid.live_positions, tracksolid.position_history
  • Schedule: Every 60 seconds

-- Check live positions are being updated
SELECT COUNT(*) AS tracked_devices,
       MIN(updated_at) AS oldest_update,
       MAX(updated_at) AS latest_update,
       ROUND(AVG(EXTRACT(EPOCH FROM (NOW() - updated_at)))) AS avg_age_seconds
FROM tracksolid.live_positions;
-- Fleet status overview
SELECT connectivity_status, COUNT(*) AS device_count
FROM tracksolid.v_fleet_status
GROUP BY connectivity_status;
-- Position history volume (last 24h)
SELECT COUNT(*) AS records_24h,
       COUNT(DISTINCT imei) AS active_devices,
       MIN(gps_time) AS earliest,
       MAX(gps_time) AS latest
FROM tracksolid.position_history
WHERE gps_time > NOW() - INTERVAL '24 hours';

Healthy indicator: latest_update within the last 2 minutes and avg_age_seconds < 120.

2.2.3 Trip Reports (Every 15 minutes)

  • API endpoint: jimi.device.track.mileage
  • Table: tracksolid.trips
  • Schedule: Every 15 minutes (1-hour lookback)

-- Trip data freshness
SELECT COUNT(*) AS trips_24h,
       COUNT(DISTINCT imei) AS vehicles_with_trips,
       ROUND((SUM(distance_m) / 1000.0)::numeric, 1) AS total_km,  -- two-argument ROUND() accepts only numeric
       ROUND(AVG(avg_speed_kmh)::numeric, 1) AS fleet_avg_speed,
       MAX(updated_at) AS latest_trip_update
FROM tracksolid.trips
WHERE start_time > NOW() - INTERVAL '24 hours';
-- Trips with driving time (runTimeSecond captured)
SELECT imei, start_time, end_time,
       ROUND((distance_m / 1000.0)::numeric, 1) AS km,
       avg_speed_kmh, max_speed_kmh,
       driving_time_s, source
FROM tracksolid.trips
WHERE start_time > NOW() - INTERVAL '24 hours'
ORDER BY start_time DESC
LIMIT 10;

Healthy indicator: trips_24h > 0 during business hours, and latest_trip_update within the last 20 minutes.
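Because each 15-minute run looks back one hour, consecutive query windows overlap and the same trip is returned by several runs. The sketch below counts that overlap, assuming each run queries [run_time - lookback, run_time); the service's exact boundary handling is not documented here. The repeats are harmless because trips are upserted with ON CONFLICT (imei, start_time) DO UPDATE (section 4).

```python
from datetime import datetime, timedelta, timezone


def poll_windows(first_run, interval, lookback, runs):
    """Query windows (start, end) for `runs` consecutive polls, assuming
    each run asks for [run_time - lookback, run_time)."""
    return [
        (first_run + i * interval - lookback, first_run + i * interval)
        for i in range(runs)
    ]


def times_fetched(ts, windows):
    """How many windows contain timestamp `ts` (start inclusive, end exclusive)."""
    return sum(1 for start, end in windows if start <= ts < end)


if __name__ == "__main__":
    t0 = datetime(2026, 4, 8, 12, 0, tzinfo=timezone.utc)
    trip_windows = poll_windows(t0, timedelta(minutes=15), timedelta(hours=1), runs=8)
    print(times_fetched(t0 + timedelta(minutes=20), trip_windows))  # prints 4
```

With interval=timedelta(minutes=5) and lookback=timedelta(minutes=30), the same helper shows each alarm in 2.3.1 being requested about six times.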

2.2.4 Parking Events (Every 15 minutes)

  • API endpoint: jimi.open.platform.report.parking
  • Table: tracksolid.parking_events
  • Schedule: Every 15 minutes (1-hour lookback)

-- Parking data freshness
SELECT COUNT(*) AS events_24h,
       COUNT(DISTINCT imei) AS vehicles_parked,
       ROUND((AVG(duration_seconds) / 60.0)::numeric, 1) AS avg_park_minutes,
       MAX(start_time) AS latest_event
FROM tracksolid.parking_events
WHERE start_time > NOW() - INTERVAL '24 hours';

Healthy indicator: events_24h > 0 while the fleet is active.

2.2.5 Ingestion Log (All ingest_movement tasks)

-- Movement pipeline health
SELECT endpoint, run_at, success, rows_upserted, rows_inserted, duration_ms, error_message
FROM tracksolid.ingestion_log
WHERE endpoint IN (
    'jimi.user.device.list+detail',
    'jimi.user.device.location.list',
    'jimi.device.track.mileage',
    'jimi.open.platform.report.parking'
)
ORDER BY run_at DESC
LIMIT 20;

2.3 ingest_events

2.3.1 Alarm Polling (Every 5 minutes)

  • API endpoint: jimi.device.alarm.list
  • Table: tracksolid.alarms
  • Schedule: Every 5 minutes (30-minute lookback)

-- Alarm data freshness
SELECT COUNT(*) AS alarms_24h,
       COUNT(DISTINCT imei) AS devices_with_alarms,
       COUNT(DISTINCT alarm_type) AS alarm_types,
       MAX(alarm_time) AS latest_alarm
FROM tracksolid.alarms
WHERE alarm_time > NOW() - INTERVAL '24 hours';
-- Alarm breakdown by type
SELECT alarm_type, source, COUNT(*) AS count
FROM tracksolid.alarms
WHERE alarm_time > NOW() - INTERVAL '7 days'
GROUP BY alarm_type, source
ORDER BY count DESC;

2.3.2 Ingestion Log (ingest_events tasks)

SELECT endpoint, run_at, success, rows_inserted, duration_ms, error_message
FROM tracksolid.ingestion_log
WHERE endpoint = 'jimi.device.alarm.list'
ORDER BY run_at DESC
LIMIT 10;

Healthy indicator: successful runs roughly every 5 minutes, with duration_ms < 10000.
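The same indicator can be checked in code. In this sketch the rows are assumed to be (run_at, success, duration_ms) tuples from the query above, newest first; alarm_polling_problems is a hypothetical helper, and the 7-minute gap tolerance is an illustrative value, not a setting of the stack.

```python
from datetime import datetime, timedelta, timezone

# Illustrative tolerance: the 5-minute schedule plus slack for slow runs.
MAX_GAP = timedelta(minutes=7)
MAX_DURATION_MS = 10_000  # from the healthy indicator above


def alarm_polling_problems(rows):
    """rows: list of (run_at, success, duration_ms), newest first.
    Returns human-readable problems; an empty list means healthy."""
    problems = []
    for run_at, success, duration_ms in rows:
        if not success:
            problems.append(f"{run_at.isoformat()}: run failed")
        elif duration_ms is not None and duration_ms >= MAX_DURATION_MS:
            problems.append(f"{run_at.isoformat()}: took {duration_ms} ms")
    # Rows are newest first, so newer - older is the gap between runs.
    for (newer, _, _), (older, _, _) in zip(rows, rows[1:]):
        if newer - older > MAX_GAP:
            problems.append(f"gap of {newer - older} before {newer.isoformat()}")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    runs = [(now - timedelta(minutes=5 * i), True, 2500) for i in range(6)]
    print(alarm_polling_problems(runs))  # prints []
```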


2.4 webhook_receiver

2.4.1 Health Check

# Via the public Coolify domain
curl -f https://<your-webhook-domain>/health
# Or from another container on the same Docker network
curl -f http://webhook_receiver:8000/health
# Expected: {"status":"ok"}
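For scripted checks, a minimal Python equivalent of the curl command is sketched below. The URL is the same placeholder; check_health and is_healthy are hypothetical helper names. urlopen raises HTTPError on non-2xx responses, which plays the role of curl's -f flag.

```python
import json
import urllib.request

# Placeholder: substitute your own domain (or the internal Docker address).
HEALTH_URL = "https://<your-webhook-domain>/health"


def is_healthy(body):
    """True when the body is a JSON object whose status is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url=HEALTH_URL, timeout=5.0):
    # Raises urllib.error.HTTPError for non-2xx responses, like curl -f.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return is_healthy(resp.read())
```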

2.4.2 OBD Diagnostics (/pushobd) — Priority 1

Table: tracksolid.obd_readings

Note: Push-only. No data arrives until the Jimi platform is configured to send webhooks.

-- OBD data volume
SELECT COUNT(*) AS total_readings,
       COUNT(DISTINCT imei) AS devices_reporting,
       MAX(reading_time) AS latest_reading,
       COUNT(*) FILTER (WHERE obd_data IS NOT NULL) AS with_full_payload
FROM tracksolid.obd_readings;
-- Recent OBD data sample
SELECT imei, reading_time, car_type, acc_state,
       obd_data->>'dataID1' AS rpm_or_data1,
       obd_data->>'dataID2' AS data2
FROM tracksolid.obd_readings
ORDER BY reading_time DESC
LIMIT 10;
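To relate a /pushobd payload to the columns queried above, the sketch below parses the data_list form field into row dictionaries. The field names (deviceImei, obdJson, event_time, AccState, dataID1) come from the test payload in 2.4.9; the mapping onto imei, reading_time, acc_state and obd_data is an assumption about what the receiver does, not its actual code.

```python
import json
from datetime import datetime


def parse_obd_data_list(data_list):
    """Turn the data_list JSON string into rows shaped like tracksolid.obd_readings."""
    rows = []
    for item in json.loads(data_list):
        obd = item.get("obdJson") or {}
        rows.append({
            "imei": item["deviceImei"],
            # Assumption: "YYYY-MM-DD HH:MM:SS"; the payload does not state a time zone.
            "reading_time": datetime.strptime(obd["event_time"], "%Y-%m-%d %H:%M:%S"),
            "acc_state": obd.get("AccState"),
            # Full payload kept, so obd_data->>'dataID1' style lookups work.
            "obd_data": obd,
        })
    return rows
```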

2.4.3 DTC Fault Codes (/pushfaultinfo) — Priority 1

Table: tracksolid.fault_codes

-- Fault code summary
SELECT COUNT(*) AS total_faults,
       COUNT(DISTINCT imei) AS affected_devices,
       COUNT(DISTINCT fault_code) AS unique_codes,
       MAX(reported_at) AS latest_fault
FROM tracksolid.fault_codes;
-- Most common fault codes
SELECT fault_code, COUNT(*) AS occurrences,
       COUNT(DISTINCT imei) AS affected_devices
FROM tracksolid.fault_codes
GROUP BY fault_code
ORDER BY occurrences DESC
LIMIT 20;

2.4.4 Push Alarms (/pushalarm)

Table: tracksolid.alarms (source = 'push')

-- Push vs poll alarm comparison
SELECT source, COUNT(*) AS count, MAX(alarm_time) AS latest
FROM tracksolid.alarms
WHERE alarm_time > NOW() - INTERVAL '7 days'
GROUP BY source;

2.4.5 Push GPS (/pushgps)

Table: tracksolid.position_history (source = 'push')

-- Push vs poll position comparison
SELECT source, COUNT(*) AS count, MAX(gps_time) AS latest
FROM tracksolid.position_history
WHERE gps_time > NOW() - INTERVAL '24 hours'
GROUP BY source;

2.4.6 Heartbeats (/pushhb)

Table: tracksolid.heartbeats

-- Heartbeat volume
SELECT COUNT(*) AS total_heartbeats,
       COUNT(DISTINCT imei) AS reporting_devices,
       MAX(gate_time) AS latest_heartbeat
FROM tracksolid.heartbeats;
-- Device health from heartbeats (last 24h)
SELECT imei,
       COUNT(*) AS heartbeat_count,
       ROUND(AVG(power_level)) AS avg_power,
       ROUND(AVG(gsm_signal)) AS avg_signal,
       MAX(gate_time) AS last_seen
FROM tracksolid.heartbeats
WHERE gate_time > NOW() - INTERVAL '24 hours'
GROUP BY imei
ORDER BY heartbeat_count DESC
LIMIT 20;

2.4.7 Push Trip Reports (/pushtripreport)

Table: tracksolid.trips (source = 'push')

-- Push trips with fuel data
SELECT imei, start_time, end_time,
       ROUND((distance_m / 1000.0)::numeric, 1) AS km,
       fuel_consumed_l, idle_time_s, source
FROM tracksolid.trips
WHERE source = 'push'
ORDER BY start_time DESC
LIMIT 10;

2.4.8 Ingestion Log (All webhook endpoints)

SELECT endpoint, run_at, success, rows_inserted, duration_ms
FROM tracksolid.ingestion_log
WHERE endpoint LIKE 'webhook/%'
ORDER BY run_at DESC
LIMIT 20;

2.4.9 Test Webhook Manually

Send a test OBD push with an empty token (this passes only while JIMI_WEBHOOK_TOKEN is empty, which disables token validation):

curl -X POST https://<your-webhook-domain>/pushobd \
  --data-urlencode 'token=' \
  --data-urlencode 'data_list=[{"deviceImei":"TEST_IMEI","obdJson":{"event_time":"2026-04-08 12:00:00","lat":51.5,"lng":-0.1,"AccState":1,"dataID1":2500}}]'
# Expected: {"code":0,"msg":"success"}

Note: Replace TEST_IMEI with an IMEI that already exists in tracksolid.devices. Because of the foreign-key constraint, readings for unknown IMEIs are skipped instead of inserted.
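For scripted tests, the same request can be built with Python's standard library. urlencode percent-encodes the JSON in data_list so spaces, quotes and braces survive form decoding, and parse_qs shows what a form parser on the receiving side recovers. The domain is the same placeholder as in the curl example; the request is constructed here but not sent.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Test payload from the curl example; replace TEST_IMEI with a registered IMEI.
data_list = [{
    "deviceImei": "TEST_IMEI",
    "obdJson": {"event_time": "2026-04-08 12:00:00", "lat": 51.5, "lng": -0.1,
                "AccState": 1, "dataID1": 2500},
}]

# Empty token: only accepted while JIMI_WEBHOOK_TOKEN is empty.
body = urlencode({"token": "", "data_list": json.dumps(data_list)}).encode()

request = Request(
    "https://<your-webhook-domain>/pushobd",  # placeholder domain
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; not executed here.

# Round trip: what a form parser on the receiving side would decode.
decoded = parse_qs(body.decode(), keep_blank_values=True)
```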


2.5 grafana

# Verify Grafana is accessible
curl -f https://<your-grafana-domain>/api/health
# Expected: {"commit":"...","database":"ok","version":"11.0.0"}

Configure data source in Grafana UI:

  • Type: PostgreSQL
  • Host: timescale_db:5432 (internal Docker network)
  • Database: tracksolid_db
  • User: grafana_ro
  • SSL Mode: disable (internal network)

3. Overall Health Dashboard Queries

3.1 Ingestion Pipeline Status (All Services)

SELECT * FROM tracksolid.v_ingestion_health
ORDER BY seconds_ago ASC;

Typical healthy output:

| endpoint | run_at | success | seconds_ago | Status |
|---|---|---|---|---|
| jimi.user.device.location.list | recent | true | < 120 | OK |
| jimi.device.alarm.list | recent | true | < 600 | OK |
| jimi.device.track.mileage | recent | true | < 1200 | OK |
| webhook/pushobd | recent | true | varies | OK |

Alert thresholds:

  • seconds_ago > 300 for live positions = WARNING
  • seconds_ago > 900 for trips/alarms = WARNING
  • success = false for any endpoint = CRITICAL

3.2 Data Volume Summary (Last 24 Hours)

SELECT 'devices' AS metric, COUNT(*)::TEXT AS value FROM tracksolid.devices WHERE enabled_flag = 1
UNION ALL
SELECT 'live_positions', COUNT(*)::TEXT FROM tracksolid.live_positions WHERE updated_at > NOW() - INTERVAL '2 minutes'
UNION ALL
SELECT 'position_history_24h', COUNT(*)::TEXT FROM tracksolid.position_history WHERE gps_time > NOW() - INTERVAL '24 hours'
UNION ALL
SELECT 'trips_24h', COUNT(*)::TEXT FROM tracksolid.trips WHERE start_time > NOW() - INTERVAL '24 hours'
UNION ALL
SELECT 'alarms_24h', COUNT(*)::TEXT FROM tracksolid.alarms WHERE alarm_time > NOW() - INTERVAL '24 hours'
UNION ALL
SELECT 'parking_24h', COUNT(*)::TEXT FROM tracksolid.parking_events WHERE start_time > NOW() - INTERVAL '24 hours'
UNION ALL
SELECT 'obd_total', COUNT(*)::TEXT FROM tracksolid.obd_readings
UNION ALL
SELECT 'fault_codes_total', COUNT(*)::TEXT FROM tracksolid.fault_codes
UNION ALL
SELECT 'heartbeats_24h', COUNT(*)::TEXT FROM tracksolid.heartbeats WHERE gate_time > NOW() - INTERVAL '24 hours';

3.3 Token Health

SELECT account, access_token IS NOT NULL AS has_token,
       expires_at,
       ROUND(EXTRACT(EPOCH FROM (expires_at - NOW())) / 60) AS minutes_until_expiry,
       obtained_at
FROM tracksolid.api_token_cache;

Healthy indicator: minutes_until_expiry > 0. The services refresh the token automatically when fewer than 30 minutes remain.
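The refresh rule amounts to a single comparison. This sketch mirrors the documented 30-minute margin rather than the services' actual code; token_needs_refresh is a hypothetical name, and expires_at is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

REFRESH_MARGIN = timedelta(minutes=30)  # documented refresh threshold


def token_needs_refresh(expires_at, now=None):
    """True when the cached token is missing, expired, or within 30 minutes of expiry."""
    now = now or datetime.now(timezone.utc)
    return expires_at is None or expires_at - now < REFRESH_MARGIN
```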

3.4 Database Size

SELECT hypertable_schema || '.' || hypertable_name AS table_name,
       pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name))) AS size,
       num_chunks
FROM timescaledb_information.hypertables;

SELECT schemaname || '.' || relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE schemaname = 'tracksolid'
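ORDER BY pg_total_relation_size(relid) DESC;

For the hypertables (position_history, heartbeats), this second query reports only the nearly empty parent table, because their data lives in chunk tables in the _timescaledb_internal schema. Use the hypertable_size() query above for their real size.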

4. Polling vs Push Coexistence

Both polling and webhook services can write to the same tables. Deduplication is handled via ON CONFLICT clauses:

| Data Type | Polling | Webhook | Dedup Strategy |
|---|---|---|---|
| GPS | poll_live_positions (60s) | /pushgps | ON CONFLICT (imei, gps_time) DO NOTHING |
| Alarms | poll_alarms (5m) | /pushalarm | ON CONFLICT (imei, alarm_type, alarm_time) DO NOTHING |
| Trips | poll_trips (15m) | /pushtripreport | ON CONFLICT (imei, start_time) DO UPDATE |
| OBD | None (push-only) | /pushobd | ON CONFLICT (imei, reading_time) DO UPDATE |
| Fault Codes | None (push-only) | /pushfaultinfo | ON CONFLICT (imei, reported_at, fault_code) DO NOTHING |
| Heartbeats | None (push-only) | /pushhb | ON CONFLICT (imei, gate_time) DO NOTHING |
| Parking | poll_parking (15m) | None | ON CONFLICT (imei, start_time, event_type) DO NOTHING |

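The source column ('poll' or 'push') records where each row came from, on tables that have it. Under DO NOTHING the first writer wins, so on those tables source reflects whichever path delivered the row first; keep this in mind when reading the push vs poll comparisons in 2.4.4 and 2.4.5.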
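The two conflict behaviours can be modelled with a dictionary keyed on the conflict columns. This is an in-memory illustration, not the services' SQL. In particular, which columns the real DO UPDATE clauses overwrite (including source) is an assumption here, modelled as "every column in the new row". The IMEI and timestamps in the usage lines are sample values.

```python
def insert_do_nothing(table, key, row):
    """Mimic INSERT ... ON CONFLICT (key) DO NOTHING: the first writer wins."""
    if key not in table:
        table[key] = row


def upsert_do_update(table, key, row):
    """Mimic INSERT ... ON CONFLICT (key) DO UPDATE, assuming every column
    supplied by the new row overwrites the stored value."""
    table[key] = {**table.get(key, {}), **row}


if __name__ == "__main__":
    positions = {}
    key = ("860000000000001", "2026-04-08T12:00:00Z")  # (imei, gps_time)
    insert_do_nothing(positions, key, {"source": "push", "speed": 40})
    insert_do_nothing(positions, key, {"source": "poll", "speed": 40})
    print(positions[key]["source"])  # prints push
```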


5. Environment Variables

| Variable | Required | Used By | Description |
|---|---|---|---|
| TRACKSOLID_APP_KEY | Yes | All Python services | OAuth2 application key |
| TRACKSOLID_APP_SECRET | Yes | All Python services | OAuth2 application secret |
| TRACKSOLID_USER_ID | Yes | All Python services | Tracksolid account user ID |
| TRACKSOLID_PWD_MD5 | Yes | All Python services | MD5 hash of user password |
| TRACKSOLID_TARGET_ACCOUNT | No | ingest_movement | Target account; defaults to TRACKSOLID_USER_ID |
| TRACKSOLID_API_URL | No | All Python services | API base URL; defaults to https://eu-open.tracksolidpro.com/route/rest |
| DATABASE_URL | Yes | All Python services | Full PostgreSQL connection string |
| POSTGRES_DB | Yes | timescale_db | Database name |
| POSTGRES_USER | Yes | timescale_db | Database superuser |
| POSTGRES_PASSWORD | Yes | timescale_db | Database password |
| GRAFANA_ADMIN_PASSWORD | Yes | grafana | Grafana admin UI password |
| JIMI_WEBHOOK_TOKEN | No | webhook_receiver | Webhook auth token (empty = skip validation) |
| DB_POOL_MAX | No | All Python services | Max DB connections (default: 12) |
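A sketch of how a Python service might read these variables and apply the documented defaults. load_settings is a hypothetical helper, not the services' actual loader. md5_hex shows one way to produce TRACKSOLID_PWD_MD5, assuming the platform expects the lowercase hex digest of the UTF-8 password.

```python
import hashlib
import os

DEFAULT_API_URL = "https://eu-open.tracksolidpro.com/route/rest"
REQUIRED = ("TRACKSOLID_APP_KEY", "TRACKSOLID_APP_SECRET", "TRACKSOLID_USER_ID",
            "TRACKSOLID_PWD_MD5", "DATABASE_URL")


def load_settings(env=os.environ):
    """Validate required variables and fill in the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    return {
        **{name: env[name] for name in REQUIRED},
        "TRACKSOLID_TARGET_ACCOUNT": env.get("TRACKSOLID_TARGET_ACCOUNT") or env["TRACKSOLID_USER_ID"],
        "TRACKSOLID_API_URL": env.get("TRACKSOLID_API_URL") or DEFAULT_API_URL,
        "JIMI_WEBHOOK_TOKEN": env.get("JIMI_WEBHOOK_TOKEN", ""),  # empty = skip validation
        "DB_POOL_MAX": int(env.get("DB_POOL_MAX", "12")),
    }


def md5_hex(password):
    """Lowercase hex MD5 digest, e.g. for generating TRACKSOLID_PWD_MD5."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()
```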

6. Scheduled Task Summary

| Service | Task | Interval | API Endpoint | Tables |
|---|---|---|---|---|
| ingest_movement | sync_devices | Daily 02:00 UTC | jimi.user.device.list + jimi.track.device.detail | devices |
| ingest_movement | poll_live_positions | 60 seconds | jimi.user.device.location.list | live_positions, position_history |
| ingest_movement | poll_trips | 15 minutes | jimi.device.track.mileage | trips |
| ingest_movement | poll_parking | 15 minutes | jimi.open.platform.report.parking | parking_events |
| ingest_events | poll_alarms | 5 minutes | jimi.device.alarm.list | alarms |
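Converting the intervals into runs per day gives a rough sense of API load and of how many ingestion_log rows to expect per task, assuming one log row per run. Treat the counts as lower bounds on API calls: sync_devices uses two endpoints and also runs on startup, and a single run may issue more than one request.

```python
from datetime import timedelta

# Intervals from the schedule table above.
SCHEDULE = {
    "sync_devices": timedelta(days=1),
    "poll_live_positions": timedelta(seconds=60),
    "poll_trips": timedelta(minutes=15),
    "poll_parking": timedelta(minutes=15),
    "poll_alarms": timedelta(minutes=5),
}


def runs_per_day(interval):
    """Whole scheduled runs in 24 hours (timedelta / timedelta gives a float)."""
    return int(timedelta(days=1) / interval)


DAILY_RUNS = {task: runs_per_day(iv) for task, iv in SCHEDULE.items()}
TOTAL_RUNS = sum(DAILY_RUNS.values())  # 1921 scheduled runs per day
```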

7. Data Retention

| Table | Policy | Managed By |
|---|---|---|
| position_history | Dropped after 90 days | TimescaleDB retention policy (auto) |
| heartbeats | Dropped after 30 days | TimescaleDB retention policy (auto) |
| position_history | Compressed after 14 days | TimescaleDB compression policy (auto) |
| All other tables | Kept indefinitely | Manual cleanup if needed |
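To predict what the policies will do to a given row, the sketch below classifies it by age. It is approximate by design. TimescaleDB compresses and drops whole chunks, and only once a chunk's entire time range is older than the threshold; the policy jobs also run on a schedule, so rows can stay uncompressed or undropped somewhat past the nominal age. expected_state is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Policy ages from the table above.
POLICIES = {
    "position_history": {"compress_after": timedelta(days=14), "drop_after": timedelta(days=90)},
    "heartbeats": {"compress_after": None, "drop_after": timedelta(days=30)},
}


def expected_state(table, ts, now):
    """Nominal policy state of a row with timestamp `ts` (ignores chunk boundaries)."""
    policy = POLICIES[table]
    age = now - ts
    if age > policy["drop_after"]:
        return "eligible for drop"
    if policy["compress_after"] is not None and age > policy["compress_after"]:
        return "eligible for compression"
    return "uncompressed"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(expected_state("position_history", now - timedelta(days=20), now))
```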

8. Troubleshooting

No data in any table

-- Check ingestion log for errors
SELECT * FROM tracksolid.ingestion_log
WHERE success = false
ORDER BY run_at DESC LIMIT 20;

Token auth failures

-- Check token status
SELECT account, expires_at,
       CASE WHEN expires_at < NOW() THEN 'EXPIRED' ELSE 'VALID' END AS status
FROM tracksolid.api_token_cache;

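If the token has expired, the service refreshes it automatically. Persistent failures indicate credential problems in .env (TRACKSOLID_APP_KEY, TRACKSOLID_APP_SECRET, TRACKSOLID_USER_ID, TRACKSOLID_PWD_MD5).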

Webhook not receiving data

  1. Verify the webhook domain is configured in Coolify and routed to webhook_receiver:8000
  2. Verify the Jimi Tracksolid Pro platform is configured to push to your webhook URL
  3. Check /health endpoint is reachable
  4. Check ingestion log: SELECT * FROM tracksolid.ingestion_log WHERE endpoint LIKE 'webhook/%' ORDER BY run_at DESC LIMIT 10;

High ingestion latency

-- Check slow endpoints
SELECT endpoint, ROUND(AVG(duration_ms)) AS avg_ms, MAX(duration_ms) AS max_ms
FROM tracksolid.ingestion_log
WHERE run_at > NOW() - INTERVAL '1 hour'
GROUP BY endpoint
ORDER BY avg_ms DESC;

Rate limiting

Look for "Rate limit hit" in the container logs. The services back off automatically for 10 to 30 seconds. Persistent rate limiting may require reducing the polling frequency or contacting Jimi support.
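The documented back-off can be sketched as a retry loop. RateLimited and call_with_backoff are hypothetical names, and drawing each delay uniformly from 10 to 30 seconds is an assumption about how the range is used. The sleep function and random source are injectable so the loop can be tested without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception raised when the API reports a rate limit."""


def call_with_backoff(fn, attempts=5, low=10.0, high=30.0,
                      sleep=time.sleep, rng=random.uniform):
    """Call fn(); on RateLimited wait a random 10-30 s and retry, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == attempts:
                raise  # give up and surface the rate limit to the caller
            sleep(rng(low, high))
```

Spreading the delay over the range, rather than using a fixed pause, keeps several services from retrying at the same moment.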