tracksolid_timescale_grafan.../grafanaOperationalManual.md
David Kiania 82761e1e3f Add Grafana NOC operational manual
Covers pre-deployment checklist, post-deploy verification steps for each
panel, database verification queries, troubleshooting guide, and
day-to-day NOC operations reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 00:12:48 +03:00

18 KiB

Grafana NOC Fleet Dashboard — Operational Manual

Table of Contents

  1. Pre-Deployment Checklist
  2. Deploy
  3. Post-Deployment Verification
  4. Dashboard Panel Verification
  5. Database Verification Queries
  6. Troubleshooting
  7. Day-to-Day NOC Operations
  8. Maintenance

1. Pre-Deployment Checklist

Run these checks before starting Grafana for the first time.

1.1 Environment Variables

Confirm .env contains all required Grafana variables:

grep -E 'GRAFANA_ADMIN_PASSWORD|GRAFANA_DB_RO_PASSWORD' .env

Expected output — both lines must be present and non-empty:

GRAFANA_ADMIN_PASSWORD=<your admin password>
GRAFANA_DB_RO_PASSWORD=<grafana_ro postgres password>

If GRAFANA_DB_RO_PASSWORD is missing, add it before continuing:

echo "GRAFANA_DB_RO_PASSWORD=<password>" >> .env

1.2 Verify grafana_ro User Exists in Postgres

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "\du grafana_ro"

Expected output:

 Role name  |       Attributes
------------+---------------------------
 grafana_ro | Cannot login, ...

If the role is missing, create it:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
  CREATE ROLE grafana_ro WITH LOGIN PASSWORD '<password>' NOSUPERUSER NOCREATEDB NOCREATEROLE;
  GRANT USAGE ON SCHEMA tracksolid TO grafana_ro;
  GRANT SELECT ON ALL TABLES IN SCHEMA tracksolid TO grafana_ro;
  ALTER DEFAULT PRIVILEGES IN SCHEMA tracksolid GRANT SELECT ON TABLES TO grafana_ro;
"

1.3 Verify Provisioning Files Are Present

ls -R grafana/provisioning/

Expected output:

grafana/provisioning/:
dashboards  dashboards-json  datasources

grafana/provisioning/datasources:
tracksolid_postgres.yaml

grafana/provisioning/dashboards:
noc_fleet.yaml

grafana/provisioning/dashboards-json:
noc_fleet_dashboard.json

1.4 Verify Database Has Live Data

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT COUNT(*) AS devices,
       COUNT(lp.imei) AS with_position
FROM tracksolid.devices d
LEFT JOIN tracksolid.live_positions lp USING (imei)
WHERE d.enabled_flag = 1;
"

Expected: devices = total enabled vehicles, with_position > 0 (at least some vehicles have GPS fixes).


2. Deploy

Start Grafana

docker compose up -d grafana

Confirm Container Is Running

docker compose ps grafana

Expected STATUS: Up (not Restarting or Exit).

Tail Startup Logs

docker compose logs --follow grafana

Watch for these lines — they confirm provisioning loaded successfully:

msg="Starting Grafana"
msg="Provisioning datasource" name=TracksolidDB
msg="Finished provisioning data sources"
msg="Inserting/updating dashboard" name="NOC Fleet Operations — Live"
msg="HTTP Server Listen" address=0.0.0.0:3000

Press Ctrl+C to stop following once you see the server listening.


3. Post-Deployment Verification

3.1 Check Provisioning Loaded

docker compose logs grafana | grep -E "provision|Inserting|dashboard|datasource"
What to look for Means
Provisioning datasource with name=TracksolidDB Datasource YAML was read
Finished provisioning data sources Datasource created successfully
Inserting/updating dashboard with noc-fleet-live Dashboard JSON was loaded
Failed to provision or Error See Troubleshooting section

3.2 Verify Datasource via API

curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \
  http://localhost:3000/api/datasources/name/TracksolidDB \
  | python3 -m json.tool | grep -E '"name"|"uid"|"type"|"url"'

Expected:

"name": "TracksolidDB",
"type": "postgres",
"uid": "tracksolid_pg",
"url": "timescale_db:5432",

3.3 Test Datasource Connection via API

curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \
  -X POST http://localhost:3000/api/datasources/uid/tracksolid_pg/health \
  | python3 -m json.tool

Expected:

{
    "message": "Database Connection OK",
    "status": "OK"
}

If status is ERROR — see Troubleshooting: Datasource Connection Fails.

3.4 Verify Dashboard Is Registered

curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \
  http://localhost:3000/api/dashboards/uid/noc-fleet-live \
  | python3 -m json.tool | grep -E '"title"|"uid"|"version"'

Expected:

"title": "NOC Fleet Operations — Live",
"uid": "noc-fleet-live",
"version": 1,

3.5 Open the Dashboard in a Browser

Navigate to: http://localhost:3000/d/noc-fleet-live

Login with:

  • Username: admin
  • Password: value of GRAFANA_ADMIN_PASSWORD from .env

The NOC dashboard should load as the home page automatically.


4. Dashboard Panel Verification

Work through each panel to confirm it renders correctly.

4.1 Stat Panels (Row 1)

Panel Expected Red Flag
Total Vehicles Integer matching enabled device count (should be ~80) Shows 0 or -
Online Now Integer ≤ Total Vehicles Shows -
Recent (5-30m) Integer ≤ Total Vehicles Shows -
Offline Integer ≤ Total Vehicles Shows -
Moving Now Integer ≤ Online Now Shows -
Avg Speed (km/h) Numeric value, green < 80 / amber 80-120 / red > 120 Shows No data

Online + Recent + Offline should sum to Total Vehicles.

Check the sum:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT
  connectivity_status,
  COUNT(*) AS vehicles
FROM tracksolid.v_fleet_status
GROUP BY connectivity_status
ORDER BY connectivity_status;
"

4.2 Geomap Panel

Work through this checklist visually:

  • Map loads with dark Carto basemap (not a grey blank tile)
  • Arrow markers appear on the map (not dots or circles)
  • Markers are clustered around East Africa (Nairobi / Mombasa / Kampala area)
  • Arrows point in different directions — not all the same
  • Clicking a marker opens a tooltip showing: Plate, Driver, Speed, Heading, Status, Location
  • lat and lng fields are NOT visible in the tooltip (hidden by field overrides)
  • imei field is NOT visible in the tooltip

Verify direction data exists:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT
  MIN(direction) AS min_dir,
  MAX(direction) AS max_dir,
  COUNT(*) FILTER (WHERE direction IS NOT NULL) AS with_direction,
  COUNT(*) AS total
FROM tracksolid.live_positions;
"

If with_direction = 0, all arrows will point North (0°) — this means the GPS devices haven't sent bearing data yet, which is normal for parked vehicles.

4.3 Vehicle Status Table

  • Table shows all enabled vehicles (row count matches Total Vehicles stat)
  • Rows sort: Online (green) → Recent (amber) → Offline (red)
  • "Status" column has colour-coded background
  • "Speed (km/h)" column shows colour-coded text (green/amber/red)
  • "Last Fix" column shows a readable timestamp (not a raw epoch number)
  • "Min Ago" column shows integers (minutes since last GPS fix)
  • Vehicles with no GPS fix show null in speed/location columns but still appear

Spot-check a specific vehicle:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT
  d.vehicle_number, d.driver_name, lp.speed,
  lp.gps_time, lp.lat, lp.lng
FROM tracksolid.devices d
LEFT JOIN tracksolid.live_positions lp USING (imei)
WHERE d.enabled_flag = 1
LIMIT 5;
"

4.4 Ingestion Health Panel (Collapsed)

Expand the panel by clicking its title row.

  • Rows appear for each polling endpoint (live_positions, trips, alarms, etc.)
  • "Result" column shows OK (green) for all active endpoints
  • "Last Run" timestamps are recent (within the last few minutes for live_positions)
  • No FAIL (red) entries
docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT endpoint, success, seconds_ago, error_message
FROM tracksolid.v_ingestion_health
ORDER BY endpoint;
"

4.5 Auto-Refresh Verification

  1. Note the "Last Fix" timestamp for any vehicle in the table
  2. Wait 30 seconds
  3. Confirm the page auto-refreshes (watch the Grafana spinner in the top-right)
  4. Confirm "Last Fix" values have updated for Online vehicles

5. Database Verification Queries

Run these directly against the database to validate the data powering each panel.

Fleet Status Summary

SELECT
  connectivity_status,
  COUNT(*) AS vehicles,
  ROUND(AVG(speed)::numeric, 1) AS avg_speed_kmh
FROM tracksolid.v_fleet_status
GROUP BY connectivity_status
ORDER BY connectivity_status;

Live Position Freshness

SELECT
  COUNT(*) AS total_positions,
  COUNT(*) FILTER (WHERE gps_time >= NOW() - INTERVAL '5 minutes')  AS fresh_5m,
  COUNT(*) FILTER (WHERE gps_time >= NOW() - INTERVAL '30 minutes') AS fresh_30m,
  MAX(gps_time) AS newest_fix,
  MIN(gps_time) AS oldest_fix
FROM tracksolid.live_positions;

Moving Vehicles

SELECT
  d.vehicle_number, d.driver_name,
  lp.speed, lp.direction, lp.loc_desc, lp.gps_time
FROM tracksolid.devices d
INNER JOIN tracksolid.live_positions lp USING (imei)
WHERE d.enabled_flag = 1
  AND lp.speed > 0
  AND lp.acc_status = '1'
ORDER BY lp.speed DESC;

Vehicles With No GPS Fix

SELECT
  d.vehicle_number, d.vehicle_name, d.driver_name, d.city
FROM tracksolid.devices d
LEFT JOIN tracksolid.live_positions lp USING (imei)
WHERE d.enabled_flag = 1
  AND lp.imei IS NULL
ORDER BY d.vehicle_number;

Ingestion Pipeline Health

SELECT
  endpoint,
  run_at,
  success,
  imei_count,
  rows_upserted,
  duration_ms,
  seconds_ago,
  error_message
FROM tracksolid.v_ingestion_health
ORDER BY endpoint;

Stale Positions (no update in > 1 hour)

SELECT
  d.vehicle_number, d.driver_name, d.city,
  lp.gps_time,
  EXTRACT(EPOCH FROM (NOW() - lp.gps_time))::int / 60 AS minutes_since_fix
FROM tracksolid.devices d
INNER JOIN tracksolid.live_positions lp USING (imei)
WHERE d.enabled_flag = 1
  AND lp.gps_time < NOW() - INTERVAL '1 hour'
ORDER BY lp.gps_time ASC;

6. Troubleshooting

Grafana Container Keeps Restarting

docker compose logs grafana --tail=50

Common causes:

Error in logs Fix
failed to connect to server Database not healthy yet — wait 30s and retry
permission denied on provisioning path Check the ./grafana/provisioning directory exists and is readable
GF_SECURITY_ADMIN_PASSWORD not set Add GRAFANA_ADMIN_PASSWORD to .env

Datasource Connection Fails

docker compose logs grafana | grep -i "datasource\|postgres\|connect"

Check the password is correct:

docker compose exec timescale_db psql -U grafana_ro -d tracksolid_db -c "SELECT 1;"

If this fails with authentication failed, reset the password:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \
  "ALTER ROLE grafana_ro WITH PASSWORD '<new_password>';"

Then update GRAFANA_DB_RO_PASSWORD in .env and restart:

docker compose restart grafana

Dashboard Shows "No Data"

Step 1 — confirm the datasource UID matches:

curl -s -u admin:${GRAFANA_ADMIN_PASSWORD} \
  http://localhost:3000/api/datasources \
  | python3 -m json.tool | grep uid

Must show "uid": "tracksolid_pg". If the UID is different, the dashboard JSON references the wrong datasource.

Step 2 — test the query directly in Grafana:

  1. Open the dashboard
  2. Click the panel title → Edit
  3. Switch to the Query tab
  4. Click "Run query"
  5. If it errors, the SQL or connection is the issue

Step 3 — verify data exists:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \
  "SELECT COUNT(*) FROM tracksolid.live_positions;"

If this returns 0, the ingestion pipeline has not populated data yet. Check ingestion service logs:

docker compose logs ingest_movement --tail=30
docker compose logs ingest_events --tail=30

Geomap Shows No Markers

Check coordinates are valid:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT COUNT(*) AS total,
       COUNT(*) FILTER (WHERE lat IS NOT NULL AND lng IS NOT NULL) AS with_coords,
       COUNT(*) FILTER (WHERE lat BETWEEN -90 AND 90 AND lng BETWEEN -180 AND 180) AS valid_coords
FROM tracksolid.live_positions;
"

If with_coords = 0: GPS data has not arrived yet. Wait for the next ingest cycle (60 seconds).

If valid_coords < with_coords: some coordinates are out of range — data quality issue on the device side.

Check the basemap is loading:

Open browser DevTools (F12) → Network tab → filter for tile or carto. If Carto tile requests are failing, there may be no internet access from the browser. Try switching the basemap to OpenStreetMap in the Grafana panel editor temporarily.

Arrows All Point North (No Direction Data)

This is expected for parked vehicles — direction is only meaningful when the vehicle is moving. Confirm:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT COUNT(*) FILTER (WHERE direction > 0) AS with_bearing,
       COUNT(*) AS total
FROM tracksolid.live_positions;
"

If with_bearing = 0 even for moving vehicles, the GPS device firmware may not be sending bearing in the position payload. Check with the Tracksolid account settings.

Dashboard Not Loading as Home Page

Confirm the environment variable is set:

docker compose exec grafana env | grep DASHBOARDS

Expected:

GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/etc/grafana/provisioning/dashboards-json/noc_fleet_dashboard.json

If missing, confirm env_file: .env is present in the grafana service block of docker-compose.yaml and restart:

docker compose restart grafana

Provisioning Changes Not Reflecting

Grafana polls the provisioning directory every 30 seconds. If changes to the dashboard JSON are not appearing:

docker compose restart grafana
docker compose logs grafana | grep -i "provision\|dashboard"

7. Day-to-Day NOC Operations

Morning Health Check (run at shift start)

# 1. Confirm all services are up
docker compose ps

# 2. Check ingestion pipeline ran recently
docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \
  "SELECT endpoint, run_at, success, seconds_ago FROM tracksolid.v_ingestion_health ORDER BY endpoint;"

# 3. Check live position freshness
docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c \
  "SELECT connectivity_status, COUNT(*) FROM tracksolid.v_fleet_status GROUP BY 1 ORDER BY 1;"

Investigating a Specific Vehicle

Replace KAA 123A with the actual plate:

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT
  d.vehicle_number, d.vehicle_name, d.driver_name, d.driver_phone,
  lp.lat, lp.lng, lp.speed, lp.direction,
  lp.acc_status, lp.gps_time, lp.loc_desc,
  EXTRACT(EPOCH FROM (NOW() - lp.gps_time))::int / 60 AS minutes_ago
FROM tracksolid.devices d
LEFT JOIN tracksolid.live_positions lp USING (imei)
WHERE d.vehicle_number = 'KAA 123A';
"

Checking Recent Trips for a Vehicle

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT
  t.start_time, t.end_time,
  ROUND(t.distance_m / 1000.0, 2) AS distance_km,
  t.avg_speed_kmh, t.max_speed_kmh,
  ROUND(t.driving_time_s / 3600.0, 2) AS driving_hours
FROM tracksolid.trips t
INNER JOIN tracksolid.devices d USING (imei)
WHERE d.vehicle_number = 'KAA 123A'
  AND t.start_time >= NOW() - INTERVAL '24 hours'
ORDER BY t.start_time DESC;
"

Checking Recent Alarms

docker compose exec timescale_db psql -U $POSTGRES_USER -d tracksolid_db -c "
SELECT
  d.vehicle_number, d.driver_name,
  a.alarm_type, a.alarm_name, a.alarm_time, a.speed
FROM tracksolid.alarms a
INNER JOIN tracksolid.devices d USING (imei)
WHERE a.alarm_time >= NOW() - INTERVAL '24 hours'
ORDER BY a.alarm_time DESC
LIMIT 20;
"

8. Maintenance

Restart Grafana Only

docker compose restart grafana

Restart Full Stack

docker compose down && docker compose up -d

Update the Dashboard JSON

  1. Edit grafana/provisioning/dashboards-json/noc_fleet_dashboard.json
  2. Grafana auto-reloads within 30 seconds (no restart needed)
  3. Commit the change: git add grafana/ && git commit -m "Update NOC dashboard"

Check Container Resource Usage

docker stats grafana timescale_db --no-stream

Grafana Data Volume Size

docker system df -v | grep grafana-data

View Grafana Version

docker compose exec grafana grafana-server -v

Expected: Version 11.0.0


Quick Reference

Task Command
Start Grafana docker compose up -d grafana
Stop Grafana docker compose stop grafana
Restart Grafana docker compose restart grafana
View logs docker compose logs grafana --tail=50
Check provisioning docker compose logs grafana | grep -i provision
Test datasource curl -s -u admin:$GRAFANA_ADMIN_PASSWORD http://localhost:3000/api/datasources/uid/tracksolid_pg/health
Open dashboard http://localhost:3000/d/noc-fleet-live
Fleet summary SELECT connectivity_status, COUNT(*) FROM tracksolid.v_fleet_status GROUP BY 1;
Ingestion health SELECT * FROM tracksolid.v_ingestion_health ORDER BY endpoint;