diff --git a/docs/superpowers/specs/2026-04-12-bug-reduction-quality-program-design.md b/docs/superpowers/specs/2026-04-12-bug-reduction-quality-program-design.md
new file mode 100644
index 0000000..3951747
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-12-bug-reduction-quality-program-design.md
@@ -0,0 +1,148 @@
+# Bug Reduction Quality Program — Design Spec
+
+**Date:** 2026-04-12
+**Project:** Fireside Communications Fleet Telemetry Ingestion Platform
+**Repo:** `55_ts_coolify_gemini_prod`
+**Status:** Approved — Implementation in Progress
+
+## Problem
+
+The platform has been running in production since late 2025 ingesting GPS and telemetry data from ~63 fleet vehicles. All bugs discovered to date (FIX-M11, FIX-M13, FIX-M16, FIX-E06, BUG-01 through BUG-05) were caught manually in production — via data inspection, Grafana anomalies, or customer reports. There are:
+
+- Zero automated tests
+- No linting or type-checking configuration
+- No CI/CD pipeline
+- No programmatic DB health monitoring
+
+Any code change risks silent regressions. Any API field mapping change risks data going silently to NULL. Any schema change risks data corruption that may not be noticed for days.
+
+## Goal
+
+A layered quality program that:
+1. **Finds existing bugs and data issues** without modifying source code
+2. **Prevents future regressions** by locking in known-correct behaviour
+3. **Monitors production DB health** on a daily schedule
+
+## Constraints
+
+- Existing source files MUST NOT be modified in Phase 1
+- All additions are new files only (config, tests, CI workflows, audit scripts)
+- Must run in CI (Forgejo Actions, self-hosted runner) and production (scheduled DB audit)
+
+---
+
+## Architecture: Three Parallel Workstreams
+
+### Workstream 1 — Static Analysis
+
+**Tools:** `ruff` (linting) + `mypy` (type checking)
+**Trigger:** Every push / pull request via Forgejo Actions
+**Risk:** Zero — read-only analysis of existing source
+
+Surfaces:
+- Undefined names, unused imports (ruff/F rules)
+- Likely bugs: mutable defaults, string formatting issues (ruff/B rules)
+- Type errors: untyped returns, Optional not handled (mypy)
+- Modern Python upgrade opportunities (ruff/UP rules)
+
+First run will be noisy — output becomes the bug backlog.
+
+### Workstream 2 — Test Suite
+
+**Framework:** pytest + pytest-asyncio
+**Trigger:** Every push / pull request via Forgejo Actions
+**Isolation:** Integration tests use a Docker TimescaleDB service container
+
+**Unit tests** (pure Python, no DB):
+- `test_clean_helpers.py` — `clean()`, `clean_num()`, `clean_ts()`, `is_valid_fix()` — these gate all data into the DB
+- `test_api_signing.py` — `build_sign()` MD5 signature correctness
+- `test_field_mapping.py` — locks in the three most bug-prone field mappings:
+  - FIX-E06: poll alarms use `alertTypeId`/`alarmTypeName`/`alertTime` (not `alarmType`)
+  - FIX-M16: trip distance arrives in metres, stored as km (÷ 1000)
+  - BUG-03: BCD timestamps `YYMMDDHHmmss` parsed correctly
+
+**Integration tests** (real TimescaleDB):
+- `test_movement_pipeline.py` — `poll_live_positions()` full round-trip, UPSERT idempotency
+- `test_events_pipeline.py` — `poll_alarms()` field mapping, NULL alarm_type rejection
+- `test_webhook_endpoints.py` — FastAPI endpoints with mock Jimi payloads, SAVEPOINT isolation
+
+### Workstream 3 — DB Audit
+
+**Runner:** `db_audit/run_audit.py` (Python)
+**Trigger:** Daily at 06:00 EAT (03:00 UTC) via scheduled Forgejo workflow + `workflow_dispatch` for manual runs
+**Output:** Rows written to `tracksolid.health_checks` table; queryable from Grafana
+
+Six health checks:
+
+| Check | File | Critical | Warning |
+|---|---|---|---|
+| Stale devices | `stale_devices.sql` | — | Any enabled device with no GPS fix >2h |
+| NULL integrity | `null_integrity.sql` | Any NULL imei or gps_time in telemetry tables | — |
+| Distance outliers | `distance_outliers.sql` | — | Any trip >500km or <0km in last 7 days |
+| Duplicate positions | `duplicate_positions.sql` | Any (imei, gps_time) duplicate in position_history | — |
+| Data gaps | `data_gaps.sql` | — | Any enabled device with no data in 7 days |
+| Enum drift | `enum_drift.sql` | — | Unexpected value in source/severity columns |
+
+Exit code: `1` on any `critical`, `0` on `ok`/`warning`.
+
+---
+
+## File Layout
+
+```
+55_ts_coolify_gemini_prod/
+├── pyproject.toml          ← ADD: ruff + mypy + pytest config + dev deps
+├── .forgejo/
+│   └── workflows/
+│       ├── ci-static.yml
+│       ├── ci-tests.yml
+│       └── scheduled-audit.yml
+├── tests/
+│   ├── conftest.py
+│   ├── fixtures/
+│   │   ├── api_responses.py
+│   │   └── schema.sql
+│   ├── unit/
+│   │   ├── test_clean_helpers.py
+│   │   ├── test_api_signing.py
+│   │   └── test_field_mapping.py
+│   └── integration/
+│       ├── test_movement_pipeline.py
+│       ├── test_events_pipeline.py
+│       └── test_webhook_endpoints.py
+└── db_audit/
+    ├── run_audit.py
+    ├── checks/
+    │   ├── stale_devices.sql
+    │   ├── null_integrity.sql
+    │   ├── distance_outliers.sql
+    │   ├── duplicate_positions.sql
+    │   ├── data_gaps.sql
+    │   └── enum_drift.sql
+    └── schema/
+        └── health_checks_table.sql
+```
+
+---
+
+## Forgejo Runner Setup
+
+Before CI can run, a self-hosted runner must be registered on the Coolify server:
+
+1. Forgejo → Settings → Actions → Runners → Register Runner → copy token
+2. On Coolify server: `docker run -d --name forgejo-runner gitea/act_runner:latest register --instance https://repo.rahamafresh.com --token --name coolify-runner --labels self-hosted`
+3. Verify runner appears as active in Forgejo
+
+Required Forgejo secrets:
+- `DATABASE_URL` — production DB connection string (for scheduled audit)
+- `TEST_DATABASE_URL` — set automatically by CI service container
+
+---
+
+## Verification
+
+| Workstream | Pass Criteria |
+|---|---|
+| Static Analysis | Push triggers CI-static; ruff + mypy produce output report; job exits non-zero on violations |
+| Test Suite | Push triggers CI-tests; all unit tests pass; integration tests pass against service container DB |
+| DB Audit | Manual run populates `health_checks` table; findings match known issues (44 silent devices, etc.); scheduled run fires at 06:00 EAT |
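
As a reviewer-side illustration of the Workstream 3 exit-code rule (`1` on any `critical`, `0` on `ok`/`warning`), here is a minimal Python sketch. The `CheckResult` type and the `audit_exit_code` function are hypothetical names introduced for this example. The spec does not show how `db_audit/run_audit.py` represents check results, and the sketch assumes the three status strings are the only values a check can report.

```python
from dataclasses import dataclass
from typing import Iterable

# The three statuses named in the spec. Treating them as the complete set
# is an assumption of this sketch.
VALID_STATUSES = {"ok", "warning", "critical"}


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one health check (hypothetical shape, not the real run_audit.py type)."""

    check: str        # check name, e.g. "null_integrity"
    status: str       # "ok", "warning", or "critical"
    detail: str = ""  # optional free-text description of the finding


def audit_exit_code(results: Iterable[CheckResult]) -> int:
    """Return 1 if any check is critical, otherwise 0 (ok and warning both pass)."""
    results = list(results)
    for result in results:
        # Fail loudly on an unexpected status rather than silently passing it.
        if result.status not in VALID_STATUSES:
            raise ValueError(
                f"unknown status {result.status!r} for check {result.check!r}"
            )
    return 1 if any(r.status == "critical" for r in results) else 0
```

Under this sketch, a run that yields only `ok` and `warning` results returns `0`, so a warning-only finding such as stale devices would not fail the scheduled workflow, while a single `critical` result returns `1`. Rejecting unknown statuses is a design choice of the sketch, not something the spec requires.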