tracksolid_timescale_grafan.../docs/reference/260507_pgbouncer_deployment.md

# pgbouncer sidecar deployment

**Date:** 2026-05-07
**Branch:** `quality-program-2026-04-12`
**Status:** Phase 1 deployed. Phase 2 (pgAdmin4 sidecar) was rolled back on 2026-05-08 — see git history (`bc020cb`, then reverted).

---

## Context

**Driver:** pgAdmin4 running on the maintainer's laptop has been exhausting
`tracksolid_db`'s `max_connections`. Each Query Tool tab in pgAdmin holds its
own long-lived backend connection; combined with the existing peak of ~50–60
connections from the ingest pipeline, the budget tips over and cascades —
pgcli (and anything else trying to connect) starts failing.

**Goal:** Add pgbouncer in front of `timescale_db` to enforce a connection
budget via transaction-mode pooling. Desktop pgAdmin (or any other admin
client) connects through pgbouncer and is multiplexed onto a small fixed
pool of backends.

**Frozen scope (unchanged this round):**
- DWH bronze pipeline (`dwh/*.sql`, `tracksolid_dwh@31.97.44.246:5888`)
- n8n DWH workflows (`n8n-workflows/dwh_extract*`, `dwh_load_bronze*`)
- Grafana provisioning (`grafana/provisioning/datasources/...`)
- Python ingest containers (`ingest_movement_rev.py`, `ingest_events_rev.py`,
  `webhook_receiver_rev.py`) — they keep talking to `timescale_db:5432`
  directly. Cutover, if desired, is a separate plan.
- `db_backup` sidecar — `pg_dump` is incompatible with transaction-mode
  pooling and stays on `timescale_db:5432`.

---

## Phase 1 — pgbouncer sidecar, no client cutover

Add a new service to `docker-compose.yaml`. Internal Docker network only;
no host port binding.

```yaml
  pgbouncer:
    image: edoburu/pgbouncer
    restart: always
    depends_on:
      timescale_db:
        condition: service_healthy
    env_file: .env
    environment:
      - DB_HOST=timescale_db
      - DB_PORT=5432
      - DB_USER=${POSTGRES_USER}
      - DB_PASSWORD=${POSTGRES_PASSWORD}
      - DB_NAME=${POSTGRES_DB}
      - POOL_MODE=transaction
      - AUTH_TYPE=scram-sha-256
      - MAX_CLIENT_CONN=200
      - DEFAULT_POOL_SIZE=15
      - MIN_POOL_SIZE=2
      - RESERVE_POOL_SIZE=5
      - SERVER_RESET_QUERY=DISCARD ALL
      - SERVER_IDLE_TIMEOUT=600
      - ADMIN_USERS=${POSTGRES_USER}
      - LISTEN_PORT=6432
      - AUTH_USER=pgbouncer
      - AUTH_QUERY=SELECT uname, phash FROM public.user_lookup($$1)
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h 127.0.0.1 -p 6432 -U ${POSTGRES_USER}"]
      interval: 30s
      timeout: 5s
      retries: 3
```

**Why these values:**
- `POOL_MODE=transaction` — recycles backend on every transaction boundary.
  Cuts pgAdmin's per-tab idle conn from 1 backend → ~0 when idle.
- `DEFAULT_POOL_SIZE=15` — total backend slots per (user, db) pair. Sits
  comfortably under Postgres `max_connections` (default 100) leaving room for
  ingest's existing ~50–60.
- `MAX_CLIENT_CONN=200` — pgAdmin can open as many tabs as it wants; they
  queue rather than fail.
- `RESERVE_POOL_SIZE=5` — emergency slack when `default_pool_size` saturates.
- `SERVER_RESET_QUERY=DISCARD ALL` — wipes session state between transactions
  so leaked `SET`s from one client don't bleed into the next.

### Auth: SCRAM passthrough via `auth_query`

Avoids hand-maintaining `userlist.txt`. pgbouncer authenticates as a
dedicated `pgbouncer` Postgres role and looks up SCRAM hashes for the
requesting user via a SECURITY DEFINER function.

New migration `10_pgbouncer_auth.sql` (08 and 09 are taken by
`08_analytics_config.sql` and `09_trips_enrichment.sql`):

```sql
-- Role created with placeholder password; run_migrations.py:sync_role_passwords
-- replaces it with PGBOUNCER_AUTH_PASSWORD on every container startup.
-- Same convention used today for grafana_ro.
DO $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'pgbouncer') THEN
    CREATE ROLE pgbouncer LOGIN PASSWORD 'SET_PASSWORD_IN_ENV';
  END IF;
END
$$;

CREATE OR REPLACE FUNCTION public.user_lookup(in_user text,
    OUT uname text, OUT phash text) RETURNS record AS $$
BEGIN
  SELECT usename, passwd FROM pg_catalog.pg_shadow
   WHERE usename = in_user INTO uname, phash;
  RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE ALL ON FUNCTION public.user_lookup(text) FROM public;
GRANT EXECUTE ON FUNCTION public.user_lookup(text) TO pgbouncer;
```

Two changes to `run_migrations.py`:
1. Append `"10_pgbouncer_auth.sql"` to `MIGRATIONS`.
2. Extend `sync_role_passwords()` `roles` dict with
   `"pgbouncer": os.getenv("PGBOUNCER_AUTH_PASSWORD")`.

The migration is applied by the next ingest container restart and recorded
in `tracksolid.schema_migrations`. `sync_role_passwords` then ALTER ROLEs
the password from the env var so the placeholder is never live.

### New env vars in `.env`

- `PGBOUNCER_AUTH_PASSWORD` — password for the new `pgbouncer` Postgres role
- (existing vars reused: `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`)

### Phase 1 verification

1. Apply migration via ingest container restart; confirm in
   `tracksolid.schema_migrations` that `10_pgbouncer_auth.sql` is recorded.
2. `docker compose up -d pgbouncer`.
3. From inside any compose service:
   ```bash
   psql -h pgbouncer -p 6432 -U postgres -d tracksolid_db -c 'SELECT 1'
   ```
4. From the pgbouncer container's admin console:
   ```bash
   psql -h 127.0.0.1 -p 6432 -U postgres -d pgbouncer -c 'SHOW POOLS;'
   ```
   Confirm pool mode = `transaction`, server connections within
   `default_pool_size`.
5. `SHOW STATS;` and `SHOW CLIENTS;` should both respond.
6. Confirm no client has cut over: `tracksolid.ingestion_log` continues
   accumulating; Grafana panels keep refreshing.

---

## Files to modify / create

| Path | Change |
|---|---|
| `260507_pgbouncer_deployment.md` | THIS FILE — runbook for the rollout |
| `docker-compose.yaml` | Add `pgbouncer` service |
| `10_pgbouncer_auth.sql` | NEW — creates `pgbouncer` role + `public.user_lookup` SECURITY DEFINER function |
| `.env` | Add `PGBOUNCER_AUTH_PASSWORD` (do not commit values) |
| `docs/CONNECTIONS.md` | Add a "pgbouncer" section: pool mode, exposure, who uses it, how to connect for ad-hoc admin |
| `CLAUDE.md` §3 / §4 | Note that admin tooling now goes through `pgbouncer:6432`; ingest/grafana/backup remain direct; reference this runbook |

## Files NOT to modify (frozen scope)

- `grafana/provisioning/datasources/tracksolid_postgres.yaml`
- `n8n-workflows/dwh_extract*.json`, `n8n-workflows/dwh_load_bronze*.json`
- `dwh/*.sql`
- `ingest_movement_rev.py`, `ingest_events_rev.py`, `webhook_receiver_rev.py`,
  `ts_shared_rev.py`
- `backup/` — `pg_dump` keeps using `timescale_db:5432` directly

---

## Reused conventions and utilities

- `run_migrations.py` already applies new `NN_*.sql` files in order against
  `tracksolid_db` and tracks them in `tracksolid.schema_migrations`. Phase 1
  adds `10_pgbouncer_auth.sql` to this flow — no new tooling needed.
- `env_file: .env` + `depends_on: <svc> condition: service_healthy` mirrors
  the existing pattern at `docker-compose.yaml:28–31, 39–42, 50–53, 67–70,
  87–90`.
- Container-name resolution rule from CLAUDE.md §3 still applies for any
  `docker exec` against the new service:
  ```bash
  docker ps --filter name=pgbouncer --format "{{.Names}}" | head -1
  ```

---

## Out-of-scope follow-ups (separate plans)

1. **Cut over Python ingest to pgbouncer.** Change `DATABASE_URL` in `.env`
   from `timescale_db:5432` to `pgbouncer:6432`. Requires verifying psycopg2
   pool + SAVEPOINTs against transaction-mode pgbouncer (low risk per
   exploration — no LISTEN/NOTIFY, no advisory locks across statements, no
   prepared statements in the codebase).
2. **Rotate `dwh_owner` / `grafana_ro` plaintext passwords** still in
   `dwh/260423_dwh_ddl_v1.sql` (pre-existing item from CLAUDE.md §10).

---

## Rollback

If pgbouncer misbehaves:

1. **Stop the service without touching the rest of the stack:**
   ```bash
   docker compose stop pgbouncer
   docker compose rm -f pgbouncer
   ```
   Ingest, Grafana, webhook, backup are unaffected — they were never cut
   over.
2. **Revert the SQL migration if needed:**
   ```sql
   DROP FUNCTION public.user_lookup(text);
   DROP ROLE pgbouncer;
   DELETE FROM tracksolid.schema_migrations
    WHERE filename = '10_pgbouncer_auth.sql';
   ```
3. **Revert compose changes** by checking out the prior `docker-compose.yaml`.

---

## End-to-end verification checklist

- [ ] `10_pgbouncer_auth.sql` applied — visible in
      `tracksolid.schema_migrations`
- [ ] `pgbouncer` service healthy — `docker compose ps` shows `healthy`
- [ ] `psql -h pgbouncer -p 6432 -U postgres -d tracksolid_db -c 'SELECT 1'`
      from inside the network
- [ ] `SHOW POOLS;` in pgbouncer admin shows `transaction` mode
- [ ] Existing pipelines unaffected: `tracksolid.ingestion_log` continues
      growing at current rate; Grafana dashboards still render
- [ ] pgcli no longer hits "too many connections" when used alongside pgAdmin
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								# pgbouncer sidecar deployment
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
 								**Date:** 2026-05-07
 								**Branch:** `quality-program-2026-04-12`
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								**Status:** Phase 1 deployed. Phase 2 (pgAdmin4 sidecar) was rolled back on 2026-05-08 — see git history (`bc020cb`, then reverted).
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
 								---
 								## Context
 								**Driver:** pgAdmin4 running on the maintainer's laptop has been exhausting
 								`tracksolid_db`'s `max_connections`. Each Query Tool tab in pgAdmin holds its
 								own long-lived backend connection; combined with the existing peak of ~50–60
 								connections from the ingest pipeline, the budget tips over and cascades —
 								pgcli (and anything else trying to connect) starts failing.
 								**Goal:** Add pgbouncer in front of `timescale_db` to enforce a connection
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								budget via transaction-mode pooling. Desktop pgAdmin (or any other admin
 								client) connects through pgbouncer and is multiplexed onto a small fixed
 								pool of backends.
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
 								**Frozen scope (unchanged this round):**
 								- DWH bronze pipeline (`dwh/*.sql`, `tracksolid_dwh@31.97.44.246:5888`)
 								- n8n DWH workflows (`n8n-workflows/dwh_extract*`, `dwh_load_bronze*`)
 								- Grafana provisioning (`grafana/provisioning/datasources/...`)
 								- Python ingest containers (`ingest_movement_rev.py`, `ingest_events_rev.py`,
 								  `webhook_receiver_rev.py`) — they keep talking to `timescale_db:5432`
 								  directly. Cutover, if desired, is a separate plan.
 								- `db_backup` sidecar — `pg_dump` is incompatible with transaction-mode
 								  pooling and stays on `timescale_db:5432`.
 								---
 								## Phase 1 — pgbouncer sidecar, no client cutover
 								Add a new service to `docker-compose.yaml`. Internal Docker network only;
 								no host port binding.
 								```yaml
 								  pgbouncer:
-												fix(infra): drop pgbouncer image tag — pin 1.23.1 unavailable

The pinned tag failed to pull on Coolify deploy. Switching to the
untagged edoburu/pgbouncer (rolling latest) so the sidecar can come up.
Will revisit pinning to a known-good tag once verified live.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:48:31 +00:00
+								    image: edoburu/pgbouncer
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								    restart: always
 								    depends_on:
 								      timescale_db:
 								        condition: service_healthy
 								    env_file: .env
 								    environment:
 								      - DB_HOST=timescale_db
 								      - DB_PORT=5432
 								      - DB_USER=${POSTGRES_USER}
 								      - DB_PASSWORD=${POSTGRES_PASSWORD}
 								      - DB_NAME=${POSTGRES_DB}
 								      - POOL_MODE=transaction
 								      - AUTH_TYPE=scram-sha-256
 								      - MAX_CLIENT_CONN=200
 								      - DEFAULT_POOL_SIZE=15
 								      - MIN_POOL_SIZE=2
 								      - RESERVE_POOL_SIZE=5
 								      - SERVER_RESET_QUERY=DISCARD ALL
 								      - SERVER_IDLE_TIMEOUT=600
 								      - ADMIN_USERS=${POSTGRES_USER}
 								      - LISTEN_PORT=6432
 								      - AUTH_USER=pgbouncer
 								      - AUTH_QUERY=SELECT uname, phash FROM public.user_lookup($$1)
 								    healthcheck:
 								      test: ["CMD-SHELL", "pg_isready -h 127.0.0.1 -p 6432 -U ${POSTGRES_USER}"]
 								      interval: 30s
 								      timeout: 5s
 								      retries: 3
 								```
 								**Why these values:**
 								- `POOL_MODE=transaction` — recycles backend on every transaction boundary.
 								  Cuts pgAdmin's per-tab idle conn from 1 backend → ~0 when idle.
 								- `DEFAULT_POOL_SIZE=15` — total backend slots per (user, db) pair. Sits
 								  comfortably under Postgres `max_connections` (default 100) leaving room for
 								  ingest's existing ~50–60.
 								- `MAX_CLIENT_CONN=200` — pgAdmin can open as many tabs as it wants; they
 								  queue rather than fail.
 								- `RESERVE_POOL_SIZE=5` — emergency slack when `default_pool_size` saturates.
 								- `SERVER_RESET_QUERY=DISCARD ALL` — wipes session state between transactions
 								  so leaked `SET`s from one client don't bleed into the next.
 								### Auth: SCRAM passthrough via `auth_query`
 								Avoids hand-maintaining `userlist.txt`. pgbouncer authenticates as a
 								dedicated `pgbouncer` Postgres role and looks up SCRAM hashes for the
 								requesting user via a SECURITY DEFINER function.
 								New migration `10_pgbouncer_auth.sql` (08 and 09 are taken by
 								`08_analytics_config.sql` and `09_trips_enrichment.sql`):
 								```sql
 								-- Role created with placeholder password; run_migrations.py:sync_role_passwords
 								-- replaces it with PGBOUNCER_AUTH_PASSWORD on every container startup.
 								-- Same convention used today for grafana_ro.
 								DO $$
 								BEGIN
 								  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'pgbouncer') THEN
 								    CREATE ROLE pgbouncer LOGIN PASSWORD 'SET_PASSWORD_IN_ENV';
 								  END IF;
 								END
 								$$;
 								CREATE OR REPLACE FUNCTION public.user_lookup(in_user text,
 								    OUT uname text, OUT phash text) RETURNS record AS $$
 								BEGIN
 								  SELECT usename, passwd FROM pg_catalog.pg_shadow
 								   WHERE usename = in_user INTO uname, phash;
 								  RETURN;
 								END;
 								$$ LANGUAGE plpgsql SECURITY DEFINER;
 								REVOKE ALL ON FUNCTION public.user_lookup(text) FROM public;
 								GRANT EXECUTE ON FUNCTION public.user_lookup(text) TO pgbouncer;
 								```
 								Two changes to `run_migrations.py`:
 . Append `"10_pgbouncer_auth.sql"` to `MIGRATIONS`.
 . Extend `sync_role_passwords()` `roles` dict with
 								   `"pgbouncer": os.getenv("PGBOUNCER_AUTH_PASSWORD")`.
 								The migration is applied by the next ingest container restart and recorded
 								in `tracksolid.schema_migrations`. `sync_role_passwords` then ALTER ROLEs
 								the password from the env var so the placeholder is never live.
 								### New env vars in `.env`
 								- `PGBOUNCER_AUTH_PASSWORD` — password for the new `pgbouncer` Postgres role
 								- (existing vars reused: `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`)
 								### Phase 1 verification
 . Apply migration via ingest container restart; confirm in
 								   `tracksolid.schema_migrations` that `10_pgbouncer_auth.sql` is recorded.
 . `docker compose up -d pgbouncer`.
 . From inside any compose service:
 								   ```bash
 								   psql -h pgbouncer -p 6432 -U postgres -d tracksolid_db -c 'SELECT 1'
 								   ```
 . From the pgbouncer container's admin console:
 								   ```bash
 								   psql -h 127.0.0.1 -p 6432 -U postgres -d pgbouncer -c 'SHOW POOLS;'
 								   ```
 								   Confirm pool mode = `transaction`, server connections within
 								   `default_pool_size`.
 . `SHOW STATS;` and `SHOW CLIENTS;` should both respond.
 . Confirm no client has cut over: `tracksolid.ingestion_log` continues
 								   accumulating; Grafana panels keep refreshing.
 								---
 								## Files to modify / create
 								| Path | Change |
 								|---|---|
 								| `260507_pgbouncer_deployment.md` | THIS FILE — runbook for the rollout |
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								| `docker-compose.yaml` | Add `pgbouncer` service |
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								| `10_pgbouncer_auth.sql` | NEW — creates `pgbouncer` role + `public.user_lookup` SECURITY DEFINER function |
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								| `.env` | Add `PGBOUNCER_AUTH_PASSWORD` (do not commit values) |
 								| `docs/CONNECTIONS.md` | Add a "pgbouncer" section: pool mode, exposure, who uses it, how to connect for ad-hoc admin |
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								| `CLAUDE.md` §3 / §4 | Note that admin tooling now goes through `pgbouncer:6432`; ingest/grafana/backup remain direct; reference this runbook |
 								## Files NOT to modify (frozen scope)
 								- `grafana/provisioning/datasources/tracksolid_postgres.yaml`
 								- `n8n-workflows/dwh_extract*.json`, `n8n-workflows/dwh_load_bronze*.json`
 								- `dwh/*.sql`
 								- `ingest_movement_rev.py`, `ingest_events_rev.py`, `webhook_receiver_rev.py`,
 								  `ts_shared_rev.py`
 								- `backup/` — `pg_dump` keeps using `timescale_db:5432` directly
 								---
 								## Reused conventions and utilities
 								- `run_migrations.py` already applies new `NN_*.sql` files in order against
 								  `tracksolid_db` and tracks them in `tracksolid.schema_migrations`. Phase 1
 								  adds `10_pgbouncer_auth.sql` to this flow — no new tooling needed.
 								- `env_file: .env` + `depends_on: <svc> condition: service_healthy` mirrors
 								  the existing pattern at `docker-compose.yaml:28–31, 39–42, 50–53, 67–70,
 –90`.
 								- Container-name resolution rule from CLAUDE.md §3 still applies for any
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								  `docker exec` against the new service:
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								  ```bash
 								  docker ps --filter name=pgbouncer --format "{{.Names}}" | head -1
 								  ```
 								---
 								## Out-of-scope follow-ups (separate plans)
 . **Cut over Python ingest to pgbouncer.** Change `DATABASE_URL` in `.env`
 								   from `timescale_db:5432` to `pgbouncer:6432`. Requires verifying psycopg2
 								   pool + SAVEPOINTs against transaction-mode pgbouncer (low risk per
 								   exploration — no LISTEN/NOTIFY, no advisory locks across statements, no
 								   prepared statements in the codebase).
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+. **Rotate `dwh_owner` / `grafana_ro` plaintext passwords** still in
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								   `dwh/260423_dwh_ddl_v1.sql` (pre-existing item from CLAUDE.md §10).
 								---
 								## Rollback
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								If pgbouncer misbehaves:
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+. **Stop the service without touching the rest of the stack:**
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								   ```bash
-												revert(infra): remove pgAdmin4 sidecar and configs

Reverts the Phase 2 pgAdmin web sidecar from bc020cb. pgbouncer (Phase 1)
stays in place. On the instance the pgadmin container has been stopped
and removed and the pgadmin-data volume dropped; Coolify subdomain and
PGADMIN_DEFAULT_* env vars to be removed in the UI separately.

Files:
- docker-compose.yaml: drop pgadmin service block + pgadmin-data volume
- pgadmin/servers.json: delete (directory removed)
- 260507_pgbouncer_deployment.md: strip Phase 2, runbook is pgbouncer-only

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 21:34:10 +00:00
+								   docker compose stop pgbouncer
 								   docker compose rm -f pgbouncer
-												feat(infra): add pgbouncer sidecar to cap tracksolid_db connections

Phase 1 of the pgbouncer + pgAdmin rollout (runbook:
260507_pgbouncer_deployment.md). pgAdmin4 on the maintainer's laptop has
been exhausting tracksolid_db's max_connections, cascading to pgcli and
operations. Adds an internal-only pgbouncer service in transaction mode
with a small backend pool (default 15) so admin-tool sprawl can no
longer starve the ingest pipeline.

No client cutover this round - ingest, Grafana, webhook, and backup all
keep talking to timescale_db:5432 directly. SCRAM passthrough is wired
via a new pgbouncer role + public.user_lookup() function (migration 10).
The role is created with a placeholder password; sync_role_passwords()
in run_migrations.py replaces it from PGBOUNCER_AUTH_PASSWORD on every
container startup, mirroring the existing grafana_ro convention.

Requires PGBOUNCER_AUTH_PASSWORD to be set in .env before deploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-05-07 10:21:35 +00:00
+								   ```
 								   Ingest, Grafana, webhook, backup are unaffected — they were never cut
 								   over.
 . **Revert the SQL migration if needed:**
 								   ```sql
 								   DROP FUNCTION public.user_lookup(text);
 								   DROP ROLE pgbouncer;
 								   DELETE FROM tracksolid.schema_migrations
 								    WHERE filename = '10_pgbouncer_auth.sql';
 								   ```
 . **Revert compose changes** by checking out the prior `docker-compose.yaml`.
 								---
 								## End-to-end verification checklist
 								- [ ] `10_pgbouncer_auth.sql` applied — visible in
 								      `tracksolid.schema_migrations`
 								- [ ] `pgbouncer` service healthy — `docker compose ps` shows `healthy`
 								- [ ] `psql -h pgbouncer -p 6432 -U postgres -d tracksolid_db -c 'SELECT 1'`
 								      from inside the network
 								- [ ] `SHOW POOLS;` in pgbouncer admin shows `transaction` mode
 								- [ ] Existing pipelines unaffected: `tracksolid.ingestion_log` continues
 								      growing at current rate; Grafana dashboards still render
 								- [ ] pgcli no longer hits "too many connections" when used alongside pgAdmin