# Grafana NOC Fleet Dashboard — Operational Manual ## Table of Contents 1. [Pre-Deployment Checklist](#1-pre-deployment-checklist) 2. [Deploy](#2-deploy) 3. [Post-Deployment Verification](#3-post-deployment-verification) 4. [Dashboard Panel Verification](#4-dashboard-panel-verification) 5. [Database Verification Queries](#5-database-verification-queries) 6. [Troubleshooting](#6-troubleshooting) 7. [Day-to-Day NOC Operations](#7-day-to-day-noc-operations) 8. [Maintenance](#8-maintenance) --- ## 1. Pre-Deployment Checklist Run these checks before starting Grafana for the first time. ### 1.1 Environment Variables Confirm `.env` contains all required Grafana variables: ```bash grep -E 'GRAFANA_ADMIN_PASSWORD|GRAFANA_DB_RO_PASSWORD' .env ``` Expected output — both lines must be present and non-empty: ``` GRAFANA_ADMIN_PASSWORD= GRAFANA_DB_RO_PASSWORD= ``` If `GRAFANA_DB_RO_PASSWORD` is missing, add it before continuing: ```bash echo "GRAFANA_DB_RO_PASSWORD=" >> .env ``` ### 1.2 Verify grafana_ro User Exists in Postgres ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "\du grafana_ro" ``` Expected output: ``` Role name | Attributes ------------+--------------------------- grafana_ro | Cannot login, ... ``` If the role is missing, create it: ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " CREATE ROLE grafana_ro WITH LOGIN PASSWORD '' NOSUPERUSER NOCREATEDB NOCREATEROLE; GRANT USAGE ON SCHEMA tracksolid TO grafana_ro; GRANT SELECT ON ALL TABLES IN SCHEMA tracksolid TO grafana_ro; ALTER DEFAULT PRIVILEGES IN SCHEMA tracksolid GRANT SELECT ON TABLES TO grafana_ro; " ``` ### 1.3 Verify Provisioning Files Are Present ```bash ls -R grafana/provisioning/ ``` Expected output: ``` grafana/provisioning/: dashboards dashboards-json datasources grafana/provisioning/datasources: tracksolid_postgres.yaml grafana/provisioning/dashboards: noc_fleet.yaml grafana/provisioning/dashboards-json: noc_fleet_dashboard.json ``` ### 1.4 Verify Database Has Live Data ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT COUNT(*) AS devices, COUNT(lp.imei) AS with_position FROM tracksolid.devices d LEFT JOIN tracksolid.live_positions lp USING (imei) WHERE d.enabled_flag = 1; " ``` Expected: `devices` = total enabled vehicles, `with_position` > 0 (at least some vehicles have GPS fixes). --- ## 2. Deploy ### Start Grafana ```bash docker compose up -d grafana ``` ### Confirm Container Is Running ```bash docker compose ps grafana ``` Expected `STATUS`: `Up` (not `Restarting` or `Exit`). ### Tail Startup Logs ```bash docker compose logs --follow grafana ``` Watch for these lines — they confirm provisioning loaded successfully: ``` msg="Starting Grafana" msg="Provisioning datasource" name=TracksolidDB msg="Finished provisioning data sources" msg="Inserting/updating dashboard" name="NOC Fleet Operations — Live" msg="HTTP Server Listen" address=0.0.0.0:3000 ``` Press `Ctrl+C` to stop following once you see the server listening. --- ## 3. Post-Deployment Verification ### 3.1 Check Provisioning Loaded ```bash docker compose logs grafana | grep -E "provision|Inserting|dashboard|datasource" ``` | What to look for | Means | |---|---| | `Provisioning datasource` with `name=TracksolidDB` | Datasource YAML was read | | `Finished provisioning data sources` | Datasource created successfully | | `Inserting/updating dashboard` with `noc-fleet-live` | Dashboard JSON was loaded | | `Failed to provision` or `Error` | See Troubleshooting section | ### 3.2 Verify Datasource via API ```bash curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \ http://localhost:3000/api/datasources/name/TracksolidDB \ | python3 -m json.tool | grep -E '"name"|"uid"|"type"|"url"' ``` Expected: ```json "name": "TracksolidDB", "type": "postgres", "uid": "tracksolid_pg", "url": "timescale_db:5432", ``` ### 3.3 Test Datasource Connection via API ```bash curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \ -X POST http://localhost:3000/api/datasources/uid/tracksolid_pg/health \ | python3 -m json.tool ``` Expected: ```json { "message": "Database Connection OK", "status": "OK" } ``` If status is `ERROR` — see [Troubleshooting: Datasource Connection Fails](#datasource-connection-fails). ### 3.4 Verify Dashboard Is Registered ```bash curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \ http://localhost:3000/api/dashboards/uid/noc-fleet-live \ | python3 -m json.tool | grep -E '"title"|"uid"|"version"' ``` Expected: ```json "title": "NOC Fleet Operations — Live", "uid": "noc-fleet-live", "version": 1, ``` ### 3.5 Open the Dashboard in a Browser Navigate to: `http://localhost:3000/d/noc-fleet-live` Login with: - **Username:** `admin` - **Password:** value of `GRAFANA_ADMIN_PASSWORD` from `.env` The NOC dashboard should load as the home page automatically. --- ## 4. Dashboard Panel Verification Work through each panel to confirm it renders correctly. ### 4.1 Stat Panels (Row 1) | Panel | Expected | Red Flag | |---|---|---| | Total Vehicles | Integer matching enabled device count (should be ~80) | Shows 0 or `-` | | Online Now | Integer ≤ Total Vehicles | Shows `-` | | Recent (5-30m) | Integer ≤ Total Vehicles | Shows `-` | | Offline | Integer ≤ Total Vehicles | Shows `-` | | Moving Now | Integer ≤ Online Now | Shows `-` | | Avg Speed (km/h) | Numeric value, green < 80 / amber 80-120 / red > 120 | Shows `No data` | **Online + Recent + Offline should sum to Total Vehicles.** Check the sum: ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT connectivity_status, COUNT(*) AS vehicles FROM tracksolid.v_fleet_status GROUP BY connectivity_status ORDER BY connectivity_status; " ``` ### 4.2 Geomap Panel Work through this checklist visually: - [ ] Map loads with dark Carto basemap (not a grey blank tile) - [ ] Arrow markers appear on the map (not dots or circles) - [ ] Markers are clustered around East Africa (Nairobi / Mombasa / Kampala area) - [ ] Arrows point in different directions — not all the same - [ ] Clicking a marker opens a tooltip showing: Plate, Driver, Speed, Heading, Status, Location - [ ] `lat` and `lng` fields are NOT visible in the tooltip (hidden by field overrides) - [ ] `imei` field is NOT visible in the tooltip **Verify direction data exists:** ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT MIN(direction) AS min_dir, MAX(direction) AS max_dir, COUNT(*) FILTER (WHERE direction IS NOT NULL) AS with_direction, COUNT(*) AS total FROM tracksolid.live_positions; " ``` If `with_direction` = 0, all arrows will point North (0°) — this means the GPS devices haven't sent bearing data yet, which is normal for parked vehicles. ### 4.3 Vehicle Status Table - [ ] Table shows all enabled vehicles (row count matches Total Vehicles stat) - [ ] Rows sort: Online (green) → Recent (amber) → Offline (red) - [ ] "Status" column has colour-coded background - [ ] "Speed (km/h)" column shows colour-coded text (green/amber/red) - [ ] "Last Fix" column shows a readable timestamp (not a raw epoch number) - [ ] "Min Ago" column shows integers (minutes since last GPS fix) - [ ] Vehicles with no GPS fix show `null` in speed/location columns but still appear **Spot-check a specific vehicle:** ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT d.vehicle_number, d.driver_name, lp.speed, lp.gps_time, lp.lat, lp.lng FROM tracksolid.devices d LEFT JOIN tracksolid.live_positions lp USING (imei) WHERE d.enabled_flag = 1 LIMIT 5; " ``` ### 4.4 Ingestion Health Panel (Collapsed) Expand the panel by clicking its title row. - [ ] Rows appear for each polling endpoint (`live_positions`, `trips`, `alarms`, etc.) - [ ] "Result" column shows `OK` (green) for all active endpoints - [ ] "Last Run" timestamps are recent (within the last few minutes for live_positions) - [ ] No `FAIL` (red) entries ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT endpoint, success, seconds_ago, error_message FROM tracksolid.v_ingestion_health ORDER BY endpoint; " ``` ### 4.5 Auto-Refresh Verification 1. Note the "Last Fix" timestamp for any vehicle in the table 2. Wait 30 seconds 3. Confirm the page auto-refreshes (watch the Grafana spinner in the top-right) 4. Confirm "Last Fix" values have updated for Online vehicles --- ## 5. Database Verification Queries Run these directly against the database to validate the data powering each panel. ### Fleet Status Summary ```sql SELECT connectivity_status, COUNT(*) AS vehicles, ROUND(AVG(speed)::numeric, 1) AS avg_speed_kmh FROM tracksolid.v_fleet_status GROUP BY connectivity_status ORDER BY connectivity_status; ``` ### Live Position Freshness ```sql SELECT COUNT(*) AS total_positions, COUNT(*) FILTER (WHERE gps_time >= NOW() - INTERVAL '5 minutes') AS fresh_5m, COUNT(*) FILTER (WHERE gps_time >= NOW() - INTERVAL '30 minutes') AS fresh_30m, MAX(gps_time) AS newest_fix, MIN(gps_time) AS oldest_fix FROM tracksolid.live_positions; ``` ### Moving Vehicles ```sql SELECT d.vehicle_number, d.driver_name, lp.speed, lp.direction, lp.loc_desc, lp.gps_time FROM tracksolid.devices d INNER JOIN tracksolid.live_positions lp USING (imei) WHERE d.enabled_flag = 1 AND lp.speed > 0 AND lp.acc_status = '1' ORDER BY lp.speed DESC; ``` ### Vehicles With No GPS Fix ```sql SELECT d.vehicle_number, d.vehicle_name, d.driver_name, d.city FROM tracksolid.devices d LEFT JOIN tracksolid.live_positions lp USING (imei) WHERE d.enabled_flag = 1 AND lp.imei IS NULL ORDER BY d.vehicle_number; ``` ### Ingestion Pipeline Health ```sql SELECT endpoint, run_at, success, imei_count, rows_upserted, duration_ms, seconds_ago, error_message FROM tracksolid.v_ingestion_health ORDER BY endpoint; ``` ### Stale Positions (no update in > 1 hour) ```sql SELECT d.vehicle_number, d.driver_name, d.city, lp.gps_time, EXTRACT(EPOCH FROM (NOW() - lp.gps_time))::int / 60 AS minutes_since_fix FROM tracksolid.devices d INNER JOIN tracksolid.live_positions lp USING (imei) WHERE d.enabled_flag = 1 AND lp.gps_time < NOW() - INTERVAL '1 hour' ORDER BY lp.gps_time ASC; ``` --- ## 6. Troubleshooting ### Grafana Container Keeps Restarting ```bash docker compose logs grafana --tail=50 ``` Common causes: | Error in logs | Fix | |---|---| | `failed to connect to server` | Database not healthy yet — wait 30s and retry | | `permission denied` on provisioning path | Check the `./grafana/provisioning` directory exists and is readable | | `GF_SECURITY_ADMIN_PASSWORD not set` | Add `GRAFANA_ADMIN_PASSWORD` to `.env` | ### Datasource Connection Fails ```bash docker compose logs grafana | grep -i "datasource\|postgres\|connect" ``` Check the password is correct: ```bash docker compose exec timescale_db psql -U grafana_ro -d tracksolid_db -c "SELECT 1;" ``` If this fails with `authentication failed`, reset the password: ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \ "ALTER ROLE grafana_ro WITH PASSWORD '';" ``` Then update `GRAFANA_DB_RO_PASSWORD` in `.env` and restart: ```bash docker compose restart grafana ``` ### Dashboard Shows "No Data" **Step 1 — confirm the datasource UID matches:** ```bash curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \ http://localhost:3000/api/datasources \ | python3 -m json.tool | grep uid ``` Must show `"uid": "tracksolid_pg"`. If the UID is different, the dashboard JSON references the wrong datasource. **Step 2 — test the query directly in Grafana:** 1. Open the dashboard 2. Click the panel title → Edit 3. Switch to the Query tab 4. Click "Run query" 5. If it errors, the SQL or connection is the issue **Step 3 — verify data exists:** ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \ "SELECT COUNT(*) FROM tracksolid.live_positions;" ``` If this returns 0, the ingestion pipeline has not populated data yet. Check ingestion service logs: ```bash docker compose logs ingest_movement --tail=30 docker compose logs ingest_events --tail=30 ``` ### Geomap Shows No Markers **Check coordinates are valid:** ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT COUNT(*) AS total, COUNT(*) FILTER (WHERE lat IS NOT NULL AND lng IS NOT NULL) AS with_coords, COUNT(*) FILTER (WHERE lat BETWEEN -90 AND 90 AND lng BETWEEN -180 AND 180) AS valid_coords FROM tracksolid.live_positions; " ``` If `with_coords` = 0: GPS data has not arrived yet. Wait for the next ingest cycle (60 seconds). If `valid_coords` < `with_coords`: some coordinates are out of range — data quality issue on the device side. **Check the basemap is loading:** Open browser DevTools (F12) → Network tab → filter for `tile` or `carto`. If Carto tile requests are failing, there may be no internet access from the browser. Try switching the basemap to OpenStreetMap in the Grafana panel editor temporarily. ### Arrows All Point North (No Direction Data) This is expected for parked vehicles — direction is only meaningful when the vehicle is moving. Confirm: ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT COUNT(*) FILTER (WHERE direction > 0) AS with_bearing, COUNT(*) AS total FROM tracksolid.live_positions; " ``` If `with_bearing` = 0 even for moving vehicles, the GPS device firmware may not be sending bearing in the position payload. Check with the Tracksolid account settings. ### Dashboard Not Loading as Home Page Confirm the environment variable is set: ```bash docker compose exec grafana env | grep DASHBOARDS ``` Expected: ``` GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/etc/grafana/provisioning/dashboards-json/noc_fleet_dashboard.json ``` If missing, confirm `env_file: .env` is present in the grafana service block of `docker-compose.yaml` and restart: ```bash docker compose restart grafana ``` ### Provisioning Changes Not Reflecting Grafana polls the provisioning directory every 30 seconds. If changes to the dashboard JSON are not appearing: ```bash docker compose restart grafana docker compose logs grafana | grep -i "provision\|dashboard" ``` --- ## 7. Day-to-Day NOC Operations ### Morning Health Check (run at shift start) ```bash # 1. Confirm all services are up docker compose ps # 2. Check ingestion pipeline ran recently docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \ "SELECT endpoint, run_at, success, seconds_ago FROM tracksolid.v_ingestion_health ORDER BY endpoint;" # 3. Check live position freshness docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \ "SELECT connectivity_status, COUNT(*) FROM tracksolid.v_fleet_status GROUP BY 1 ORDER BY 1;" ``` ### Investigating a Specific Vehicle Replace `KAA 123A` with the actual plate: ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT d.vehicle_number, d.vehicle_name, d.driver_name, d.driver_phone, lp.lat, lp.lng, lp.speed, lp.direction, lp.acc_status, lp.gps_time, lp.loc_desc, EXTRACT(EPOCH FROM (NOW() - lp.gps_time))::int / 60 AS minutes_ago FROM tracksolid.devices d LEFT JOIN tracksolid.live_positions lp USING (imei) WHERE d.vehicle_number = 'KAA 123A'; " ``` ### Checking Recent Trips for a Vehicle ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT t.start_time, t.end_time, ROUND(t.distance_m / 1000.0, 2) AS distance_km, t.avg_speed_kmh, t.max_speed_kmh, ROUND(t.driving_time_s / 3600.0, 2) AS driving_hours FROM tracksolid.trips t INNER JOIN tracksolid.devices d USING (imei) WHERE d.vehicle_number = 'KAA 123A' AND t.start_time >= NOW() - INTERVAL '24 hours' ORDER BY t.start_time DESC; " ``` ### Checking Recent Alarms ```bash docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c " SELECT d.vehicle_number, d.driver_name, a.alarm_type, a.alarm_name, a.alarm_time, a.speed FROM tracksolid.alarms a INNER JOIN tracksolid.devices d USING (imei) WHERE a.alarm_time >= NOW() - INTERVAL '24 hours' ORDER BY a.alarm_time DESC LIMIT 20; " ``` --- ## 8. Maintenance ### Restart Grafana Only ```bash docker compose restart grafana ``` ### Restart Full Stack ```bash docker compose down && docker compose up -d ``` ### Update the Dashboard JSON 1. Edit `grafana/provisioning/dashboards-json/noc_fleet_dashboard.json` 2. Grafana auto-reloads within 30 seconds (no restart needed) 3. Commit the change: `git add grafana/ && git commit -m "Update NOC dashboard"` ### Check Container Resource Usage ```bash docker stats grafana timescale_db --no-stream ``` ### Grafana Data Volume Size ```bash docker system df -v | grep grafana-data ``` ### View Grafana Version ```bash docker compose exec grafana grafana-server -v ``` Expected: `Version 11.0.0` --- ## Quick Reference | Task | Command | |---|---| | Start Grafana | `docker compose up -d grafana` | | Stop Grafana | `docker compose stop grafana` | | Restart Grafana | `docker compose restart grafana` | | View logs | `docker compose logs grafana --tail=50` | | Check provisioning | `docker compose logs grafana \| grep -i provision` | | Test datasource | `curl -s -u admin:$GRAFANA_ADMIN_PASSWORD http://localhost:3000/api/datasources/uid/tracksolid_pg/health` | | Open dashboard | `http://localhost:3000/d/noc-fleet-live` | | Fleet summary | `SELECT connectivity_status, COUNT(*) FROM tracksolid.v_fleet_status GROUP BY 1;` | | Ingestion health | `SELECT * FROM tracksolid.v_ingestion_health ORDER BY endpoint;` |