kianiadee/fleetanalytics_mcp


infra(db-roles): dedicated non-superuser roles for the six superuser apps #3

Open

kianiadee wants to merge 2 commits from infra/app-db-roles into main

kianiadee commented

2026-06-19 20:52:19 +00:00

Owner

## Summary

Dedicated **non-superuser** Postgres roles for the six service connections that currently run as the `postgres` superuser. Those superuser connections are the root cause of the `too many connections` peaks and a standing least-privilege risk.

Superuser sessions can consume the `superuser_reserved_connections` slots and are exempt from per-role `CONNECTION LIMIT` caps, so the 100-slot ceiling can fill with no admin headroom left. Each new role gets a hard **`CONNECTION LIMIT`** plus bounded timeouts, so the connection budget becomes **bounded and visible**.
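To make the failure mode concrete, here is a simplified Python model of the admission rules described above, based on the PostgreSQL documentation for `max_connections`, `superuser_reserved_connections` and `CONNECTION LIMIT`. It is a sketch, not server code: it ignores replication and background-worker connections and the PostgreSQL 16 `reserved_connections` setting, and it assumes the default of 3 reserved slots.

```python
from dataclasses import dataclass


@dataclass
class Server:
    max_connections: int = 100
    superuser_reserved_connections: int = 3  # PostgreSQL default; assumed here


def admits(server: Server, in_use: int, is_superuser: bool,
           role_in_use: int = 0, role_conn_limit: int = -1) -> bool:
    """Simplified model: would a new session be accepted?

    in_use: sessions already connected server-wide.
    role_in_use: sessions already open for the connecting role.
    role_conn_limit: the role's CONNECTION LIMIT (-1 means unlimited).
    """
    # Nobody gets in once every slot is taken.
    if in_use >= server.max_connections:
        return False
    # Superusers may take the reserved slots and are exempt from CONNECTION LIMIT.
    if is_superuser:
        return True
    # Non-superusers are refused once only the reserved slots remain.
    if in_use >= server.max_connections - server.superuser_reserved_connections:
        return False
    # The per-role cap applies to non-superusers only.
    return role_conn_limit < 0 or role_in_use < role_conn_limit


if __name__ == "__main__":
    srv = Server()
    print(admits(srv, in_use=98, is_superuser=True))    # True: superuser uses a reserved slot
    print(admits(srv, in_use=97, is_superuser=False))   # False: only reserved slots remain
    print(admits(srv, in_use=40, is_superuser=False,
                 role_in_use=10, role_conn_limit=10))   # False: role is at its limit
```

Because the six apps connect as `postgres`, they always take the superuser branch: nothing stops them from taking the last three slots, which are exactly the admin headroom the reserved slots exist to protect.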

## The six connections (confirmed live)

| Service | Database | New role | Connection limit |
|---|---|---|---|
| `webhook_receiver` | tracksolid_db | `webhook_app` (write) | 10 |
| `ingest_worker` | tracksolid_db | `ingest_app` (write) | 10 |
| `worker` | tracksolid_db | `worker_app` (read) | 5 |
| `dashboard_api` backend | tracksolid_db | `dashboard_app` (read) | 8 |
| `gateway` | **fleet_platform** | `gateway_app` (rw) | 15 |
| `cron` | **fleet_platform** | `cron_app` (rw) | 5 |

Budget: new roles 53 + existing readers ~28 ≈ **81 < 100** ✅. `gateway` and `cron` use a separate database on the same server, so they count against the same ceiling.
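The budget line is simple arithmetic, but it is worth re-running whenever a limit changes. A minimal sketch, assuming every role can reach its `CONNECTION LIMIT` at the same time and treating the "~28" existing readers as a fixed number. It also compares against `max_connections - superuser_reserved_connections` (100 - 3, assuming the default reserved count), which is the ceiling non-superuser roles actually see and is slightly stricter than the `< 100` check.

```python
# Connection limits from the table above.
ROLE_LIMITS = {
    "webhook_app": 10,
    "ingest_app": 10,
    "worker_app": 5,
    "dashboard_app": 8,
    "gateway_app": 15,  # fleet_platform DB, same server
    "cron_app": 5,      # fleet_platform DB, same server
}
EXISTING_READERS = 28   # the PR's "~28" estimate, taken as exact here
MAX_CONNECTIONS = 100
SUPERUSER_RESERVED = 3  # assumption: PostgreSQL default superuser_reserved_connections


def budget(limits: dict[str, int], existing: int) -> int:
    """Worst case: every new role at its CONNECTION LIMIT plus the existing readers."""
    return sum(limits.values()) + existing


new_total = sum(ROLE_LIMITS.values())
worst_case = budget(ROLE_LIMITS, EXISTING_READERS)
non_superuser_ceiling = MAX_CONNECTIONS - SUPERUSER_RESERVED
print(new_total, worst_case, non_superuser_ceiling)  # 53 81 97
assert worst_case < non_superuser_ceiling
```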

## Files

- **`scripts/app_roles_tracksolid_db.sql`**: `ts_app_read` / `ts_app_write` capability groups plus the four login roles, all NOSUPERUSER, each with a `CONNECTION LIMIT` and per-role GUCs (`statement_timeout`, `idle_session_timeout`, `idle_in_transaction_session_timeout`, `lock_timeout`).
- **`scripts/app_roles_fleet_platform.sql`**: `fp_app_rw` over the fleet_platform schemas (auth/domain/events/geo/ops/serve/slo/state) plus `gateway_app` / `cron_app`.
- **`scripts/MIGRATE_APPS_OFF_SUPERUSER.md`**: the runbook, covering
  - **discovery**: what each app writes and whether it runs DDL;
  - the connection-budget table;
  - the **object-ownership step** for migration-running apps: reassign the app schemas to the existing `tracksolid_owner`, *scoped*, never `REASSIGN OWNED BY postgres` globally;
  - the one-at-a-time cutover order;
  - **instant rollback**: revert `DATABASE_URL` only.
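The role shape described for `app_roles_tracksolid_db.sql` can be sketched as a small Python generator that emits the DDL. This is not the script in the PR: the timeout values are hypothetical placeholders, the capability groups are assumed to exist already (created `NOLOGIN`), and password handling is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class AppRole:
    name: str
    group: str        # capability group, e.g. ts_app_read / ts_app_write
    conn_limit: int   # hard CONNECTION LIMIT for this login role
    gucs: dict[str, str] = field(default_factory=dict)  # per-role settings


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote an SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def role_ddl(role: AppRole) -> list[str]:
    """CREATE ROLE for a non-superuser login role, then one ALTER ROLE ... SET per GUC."""
    r = quote_ident(role.name)
    stmts = [
        f"CREATE ROLE {r} LOGIN NOSUPERUSER NOCREATEDB NOCREATEROLE "
        f"CONNECTION LIMIT {role.conn_limit} IN ROLE {quote_ident(role.group)};"
    ]
    for guc, value in role.gucs.items():
        # GUC names come from a trusted dict, so they are not quoted.
        stmts.append(f"ALTER ROLE {r} SET {guc} = {quote_literal(value)};")
    return stmts


# Hypothetical timeout values; the real ones live in the SQL scripts.
TIMEOUTS = {
    "statement_timeout": "30s",
    "idle_session_timeout": "10min",
    "idle_in_transaction_session_timeout": "60s",
    "lock_timeout": "5s",
}

if __name__ == "__main__":
    for stmt in role_ddl(AppRole("webhook_app", "ts_app_write", 10, TIMEOUTS)):
        print(stmt)
```

Setting the timeouts with `ALTER ROLE ... SET` means they take effect at login for that role without any change to application code; `idle_session_timeout` requires PostgreSQL 14 or later.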

## Honest caveats

- Grants are **best-effort by app function** (ingestion = write telemetry; gateway/cron = read-write app state; worker/dashboard = read). The runbook's discovery step (Step 1) must confirm each one before cutover; widen a grant when the app hits `permission denied`.
- All operational objects are owned by `postgres`, so these roles can write **rows** but cannot run **DDL** on existing tables. Apps that run migrations at deploy time need the ownership step (runbook Step 3).
- **Nothing is applied.** The SQL is drafted and structurally checked; I did not run role DDL against prod (it's gated). Happy to validate it in a rolled-back transaction on request.
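For validating in a rolled-back transaction, one approach is to wrap the role DDL and a few privilege probes in `BEGIN` ... `ROLLBACK`; `CREATE ROLE`, `GRANT` and `ALTER ROLE ... SET` are transactional in PostgreSQL, so nothing persists. A minimal sketch: the table names and expected privileges below are hypothetical, and a `false` in the `ok` column is the signal to widen a grant before cutover.

```python
def quote_literal(value: str) -> str:
    """Single-quote an SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def rollback_probe(ddl: list[str], checks: list[tuple[str, str, str]]) -> str:
    """Build a psql script that applies ddl, probes privileges, then rolls back.

    checks: (role, schema-qualified table, privilege) triples.
    """
    lines = ["BEGIN;"]
    lines.extend(ddl)
    for role, table, priv in checks:
        r, t, p = quote_literal(role), quote_literal(table), quote_literal(priv)
        lines.append(
            f"SELECT {r} AS role, {t} AS tbl, {p} AS priv, "
            f"has_table_privilege({r}, {t}, {p}) AS ok;"
        )
    # Always undo: roles, grants and role settings are transactional.
    lines.append("ROLLBACK;")
    return "\n".join(lines)


if __name__ == "__main__":
    script = rollback_probe(
        ["-- paste the role DDL here"],  # placeholder
        [
            ("webhook_app", "app_schema.some_table", "INSERT"),   # hypothetical table
            ("dashboard_app", "app_schema.some_table", "SELECT"),
        ],
    )
    print(script)
```

Running the output with `psql -v ON_ERROR_STOP=1 -f probe.sql` stops at the first error (for example a misspelled table); the open transaction is then discarded when the session ends, so nothing is left behind.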

## Relationship to the other PRs

- PR #1: MCP reliability/security/build + footprint.
- PR #2: PgBouncer (optional once these roles and limits are in).
- This PR removes the actual cause (superuser pools) and bounds each app's connections.

🤖 Generated with Claude Code


kianiadee added 1 commit 2026-06-19 20:52:20 +00:00

infra(db-roles): dedicated non-superuser roles for the six apps on postgres (e1472adc3a)

Six service connections run as the postgres SUPERUSER across two databases on the
shared 100-connection server — the root of the "too many connections" peaks and a
standing least-privilege risk. Superuser sessions ignore per-role CONNECTION LIMIT
and can consume the superuser-reserved slots.

Drafts (apply as postgres; nothing applied here):
- scripts/app_roles_tracksolid_db.sql — webhook_app, ingest_app, worker_app,
  dashboard_app. Capability groups (ts_app_read / ts_app_write), per-app NOSUPERUSER
  login roles with hard CONNECTION LIMIT + bounded GUCs (statement_timeout,
  idle_session_timeout, idle_in_transaction_session_timeout, lock_timeout).
- scripts/app_roles_fleet_platform.sql — gateway_app, cron_app (the apps on the
  separate fleet_platform DB), fp_app_rw group over its schemas.
- scripts/MIGRATE_APPS_OFF_SUPERUSER.md — runbook: discovery (what each app actually
  writes / whether it runs DDL), connection-budget table (sum ≈ 81 < 100), the
  object-ownership step for migration-running apps (reassign app schemas to the
  existing tracksolid_owner — scoped, never REASSIGN OWNED globally), one-at-a-time
  cutover, and instant rollback (DATABASE_URL only).

Grants are best-effort by app function and explicitly call out where to verify before
cutover; all objects are postgres-owned, so row DML works but DDL needs the ownership
step. See the runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
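Cutover and rollback both come down to which credentials `DATABASE_URL` carries. A minimal sketch of that swap using only the standard library, assuming URL-style DSNs (`postgresql://user:password@host:port/db?...`); the host and password below are hypothetical. Keeping the old value alongside the new one is what makes the rollback a one-line revert.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_role(database_url: str, user: str, password: str) -> str:
    """Return database_url with new credentials; scheme, host, port, db and query are kept."""
    parts = urlsplit(database_url)
    host = parts.hostname or ""
    if ":" in host:  # re-bracket IPv6 literals
        host = f"[{host}]"
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    old_url = "postgresql://postgres:secret@db.internal:5432/tracksolid_db?sslmode=require"
    new_url = with_role(old_url, "dashboard_app", "p@ss")
    print(new_url)
    # postgresql://dashboard_app:p%40ss@db.internal:5432/tracksolid_db?sslmode=require
    # Rollback: set DATABASE_URL back to old_url and restart the service.
```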

kianiadee added 1 commit 2026-06-19 21:08:57 +00:00

infra(db-roles): validated Option A — shared tracksolid_owner for migrators (e571eeabed)

Discovery (live) corrected the design: webhook_receiver, ingest_worker, and worker
all run run_migrations.py (DDL) and write telemetry — worker is the same image as
ingest_worker, not a reader. Because they ALTER objects they must own them, so all
three connect as the shared non-superuser tracksolid_owner (the role the repo already
intends to own these schemas). dashboard_api backend stays a reader (dashboard_app).

- app_roles_tracksolid_db.sql rewritten: tracksolid_owner LOGIN + CONNECTION LIMIT 30
  + GUCs + USAGE/CREATE; Timescale-aware ownership reassignment (skips table-linked
  sequences, ALTER MATERIALIZED VIEW for continuous aggregates, leaves reporting.v_trips
  with reporting_refresher, reassigns functions); dashboard_app read role.
- Reassignment validated in a rolled-back transaction on the live DB: reassigns the
  31-chunk position_history hypertable + the v_mileage_daily_cagg continuous aggregate,
  and as tracksolid_owner can ALTER the hypertable and create/drop tables.
- Runbook updated: discovery marked done, ownership folded into the apply (safe while
  apps still run as postgres — superuser bypasses ownership), corrected cutover order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
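The Timescale-aware reassignment rules in the commit above can be expressed as a small statement generator. This is a sketch, not the commit's SQL: it assumes the object list has already been read from the catalogs (for example `pg_class`, `pg_depend` for sequence ownership, `pg_proc` for functions), and the schema, sequence and function names in the sample are hypothetical. It encodes the listed rules: hypertables via `ALTER TABLE`, continuous aggregates via `ALTER MATERIALIZED VIEW`, sequences linked to a table column skipped (PostgreSQL refuses to change their owner directly; they follow their table), explicitly kept objects such as `reporting.v_trips` left alone, and functions addressed by their argument list.

```python
from collections.abc import Collection, Iterable
from dataclasses import dataclass

# Statement keyword per object kind. Hypertables are tables;
# TimescaleDB continuous aggregates take ALTER MATERIALIZED VIEW.
ALTER_KEYWORD = {
    "table": "TABLE",
    "sequence": "SEQUENCE",
    "view": "VIEW",
    "matview": "MATERIALIZED VIEW",
    "cagg": "MATERIALIZED VIEW",
    "function": "FUNCTION",
}


@dataclass(frozen=True)
class CatalogObject:
    schema: str
    name: str
    kind: str                      # a key of ALTER_KEYWORD
    linked_to_table: bool = False  # sequence OWNED BY a column (serial/identity)
    identity_args: str = ""        # function argument types, e.g. "integer, text"


def qi(name: str) -> str:
    """Double-quote an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def reassign_statements(objects: Iterable[CatalogObject], new_owner: str,
                        keep: Collection[str] = frozenset()) -> list[str]:
    """Scoped ALTER ... OWNER TO statements, one per object; never REASSIGN OWNED."""
    out = []
    for obj in objects:
        if f"{obj.schema}.{obj.name}" in keep:
            continue  # deliberately left with its current owner
        if obj.kind == "sequence" and obj.linked_to_table:
            continue  # moves with its table; changing it directly is refused
        target = f"{qi(obj.schema)}.{qi(obj.name)}"
        if obj.kind == "function":
            target += f"({obj.identity_args})"
        out.append(f"ALTER {ALTER_KEYWORD[obj.kind]} {target} OWNER TO {qi(new_owner)};")
    return out


if __name__ == "__main__":
    sample = [  # hypothetical schema, sequence and function names
        CatalogObject("app", "position_history", "table"),
        CatalogObject("app", "position_history_id_seq", "sequence", linked_to_table=True),
        CatalogObject("app", "v_mileage_daily_cagg", "cagg"),
        CatalogObject("reporting", "v_trips", "matview"),  # kind assumed; it is kept anyway
        CatalogObject("app", "touch_updated_at", "function"),
    ]
    for stmt in reassign_statements(sample, "tracksolid_owner", keep={"reporting.v_trips"}):
        print(stmt)
```

Emitting one explicit statement per object keeps the change scoped and reviewable, and the output can be run inside a rolled-back transaction first, as the commit describes.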

This pull request can be merged automatically.

This branch is out-of-date with the base branch



Reference: kianiadee/fleetanalytics_mcp#3
