PgBouncer for timescale_db (connection pooling)
Scope note: this is stack-wide infrastructure, shared by every service that talks to
timescale_db. It is parked in the analytics-MCP repo only because that is where the "too many connections" investigation happened. It arguably belongs in the backend/ingestion repo (tracksolid_timescale_grafana_prod); move it there when convenient.
Why
The DB runs at max_connections = 100. About nine services each keep a persistent
pool open — and several connect as the postgres superuser, holding connections
idle for hours. When those pools fill under load simultaneously, the sum crosses
~97 and new connections fail with FATAL: sorry, too many clients already.
PgBouncer fixes this structurally: clients connect to PgBouncer (cheap, thousands allowed), and it multiplexes them onto a small, fixed set of real backend connections. The DB's connection count then depends on the pool size you choose, not on how many app pools exist.
9 app pools ──▶ PgBouncer :6432 ──▶ ≤25 real backends ──▶ timescale_db :5432
 (hundreds)     (transaction mode)
Files
| File | Purpose |
|---|---|
| pgbouncer.ini | pooling + auth config (transaction mode, auth_query) |
| auth_setup.sql | creates pgbouncer_auth + pgbouncer.user_lookup() on the DB |
| userlist.txt.example | how to generate the real (gitignored) userlist.txt |
| docker-compose.yml | the PgBouncer service (join the DB network) |
Deploy (once)
# 0) on the host, generate a password for the auth role
( umask 077; openssl rand -hex 24 > ~/.pgbouncer_auth.pw )
# 1) create the auth role + lookup function (as postgres superuser)
DB=$(docker ps --filter name=timescale_db --format '{{.Names}}' | head -1)
docker exec -i "$DB" psql -U postgres -d tracksolid_db -v ON_ERROR_STOP=1 \
-v pgb_pw="$(cat ~/.pgbouncer_auth.pw)" < pgbouncer/auth_setup.sql
# 2) build userlist.txt from the stored verifier (formats always match this way)
docker exec -i "$DB" psql -U postgres -d tracksolid_db -tAc \
"SELECT '\"pgbouncer_auth\" \"' || passwd || '\"' \
FROM pg_shadow WHERE usename='pgbouncer_auth'" > pgbouncer/userlist.txt
# 3) set the real DB network name in docker-compose.yml (networks.dbnet.name), then:
docker compose -f pgbouncer/docker-compose.yml up -d
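Step 2 only helps if the generated file really contains a stored verifier. A minimal sketch of a pre-flight check follows; `check_userlist` is a hypothetical helper, not part of the repo, and it assumes the verifier in pg_shadow is either a SCRAM-SHA-256 or an md5 entry.

```shell
# Hypothetical helper (not in the repo): sanity-check a generated userlist.txt.
# Succeeds only if the file holds exactly one line of the form
#   "pgbouncer_auth" "<verifier>"
# where <verifier> looks like a SCRAM-SHA-256 or md5 entry from pg_shadow.
check_userlist() {
  file=$1
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Exactly one non-empty line is expected.
  [ "$(grep -c . "$file")" -eq 1 ] || { echo "expected exactly one entry" >&2; return 1; }
  # The second field must be a stored hash, not an empty or plaintext value.
  grep -Eq '^"pgbouncer_auth" "(SCRAM-SHA-256\$[^"]+|md5[0-9a-f]{32})"$' "$file" \
    || { echo "unexpected userlist format" >&2; return 1; }
}
```

Running `check_userlist pgbouncer/userlist.txt` before step 3 would catch an empty result, for example when the role was created without a password.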
Cut services over (incrementally)
Repoint each app's DATABASE_URL host/port from timescale_db:5432 to
pgbouncer:6432 — same dbname, user, and password — and redeploy it.
Migrate the superuser app pools first (webhook_receiver, ingest_worker,
dashboard_api backend, worker/cron/gateway) — they are the heaviest
consumers. Do them one at a time and watch SHOW POOLS; (below).
⚠️ Transaction-pooling caveats — read before cutting over
pool_mode = transaction returns the backend to the pool at every COMMIT/ROLLBACK,
so session-scoped features don't survive across transactions:
- Server-side prepared statements: the app must not rely on them, or the driver must be set not to cache them (e.g. asyncpg statement_cache_size=0; libpq simple query / psycopg2 default is fine). PgBouncer ≥1.21 supports prepared statements in transaction mode if you set max_prepared_statements > 0; enable that if an app needs them.
- SET/RESET that must persist between transactions, session LISTEN/NOTIFY, advisory locks held across transactions, WITH HOLD cursors, session temp tables.
- Per-connection options startup GUCs are ignored (see ignore_startup_parameters). Apps that set GUCs via the options= DSN param must instead pin them at the role level: ALTER ROLE <app> SET statement_timeout = '...'; etc.
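Moving GUCs from the DSN to the role can be sketched as a translation from an options string to ALTER ROLE statements. `options_to_alter_role` and the role name `app_role` are hypothetical; the sketch only handles the `-c name=value` form with whitespace-free values, and its output should be reviewed before being run as a superuser.

```shell
# Hypothetical helper: turn a libpq-style options string ("-c name=value ...")
# into ALTER ROLE statements that pin the same GUCs at the role level.
# Limitations: values containing spaces and the "-cname=value" form are not handled.
options_to_alter_role() {
  role=$1
  opts=$2
  printf '%s\n' "$opts" | tr -s ' ' '\n' | {
    expect_setting=0
    while IFS= read -r word; do
      if [ "$expect_setting" -eq 1 ]; then
        # Split "name=value" at the first "=".
        printf "ALTER ROLE \"%s\" SET %s = '%s';\n" "$role" "${word%%=*}" "${word#*=}"
        expect_setting=0
      elif [ "$word" = "-c" ]; then
        expect_setting=1
      fi
    done
  }
}

# Example:
#   options_to_alter_role app_role '-c statement_timeout=30000 -c lock_timeout=5s'
```

Role-level settings are applied when a backend connection starts for that role, so they hold for every pooled backend regardless of which client is using it.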
The analytics MCP specifically
The MCP sends options=-c default_transaction_read_only=on -c statement_timeout=30000
on its DSN and calls set_session(readonly=True). Behind transaction pooling:
- The options GUCs are dropped, but analytics_ro already has default_transaction_read_only=on and statement_timeout=30s pinned at the role level (scripts/analytics_ro_role.sql), so read-only enforcement is preserved.
- set_session(readonly=True) issues a SET that can leak across pooled clients. Before pointing the MCP at PgBouncer, either drop that call (role default covers it) or run the MCP only in session pooling (add a second [databases] alias with pool_mode=session).

Given the MCP is a minor consumer, the simplest path is to leave the MCP connecting directly and pool only the heavy superuser apps.
Operating
# admin console
docker exec -it pgbouncer psql -h 127.0.0.1 -p 6432 -U pgbouncer_auth pgbouncer
# SHOW POOLS; -- cl_active / cl_waiting / sv_active per pool
# SHOW CLIENTS; -- connected clients
# SHOW STATS; -- throughput
# sanity: confirm the DB now sees a small, stable backend count
docker exec -i "$DB" psql -U postgres -d tracksolid_db -c \
"SELECT usename, count(*) FROM pg_stat_activity GROUP BY 1 ORDER BY 2 DESC;"
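During a cutover, the signal to watch in SHOW POOLS is clients queuing for a backend. A minimal sketch of a filter for that follows; `pools_with_waiting` is a hypothetical helper that assumes the output was produced with psql's unaligned format (`-A -F'|'`, header row included).

```shell
# Hypothetical helper: read `SHOW POOLS;` output (psql -A -F'|') on stdin and
# print every pool that has clients waiting for a server connection.
# Columns are located by header name because their order differs across
# PgBouncer versions.
pools_with_waiting() {
  awk -F'|' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      ncols = NF
      if (!("cl_waiting" in col)) { print "no cl_waiting column" > "/dev/stderr"; exit 2 }
      next
    }
    # Skip the "(N rows)" footer and any other short line.
    NF == ncols && $(col["cl_waiting"]) + 0 > 0 {
      print $(col["database"]) "/" $(col["user"]) ": " $(col["cl_waiting"]) " waiting"
    }'
}
```

It could be fed from the admin console shown above, for example by adding `-A -F'|' -c 'SHOW POOLS;'` to that psql command and piping the result into `pools_with_waiting`. No output means no pool currently has queued clients.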
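Sizing rule: PgBouncer keeps one pool per (database, user) pair, so the worst-case number of backends it opens = Σ over pairs of (default_pool_size + reserve_pool_size), unless max_db_connections caps the total per database. Keep that well under max_connections (100), leaving headroom
for superuser/admin/background-worker connections that bypass PgBouncer. The shipped
config (20 + 5 reserve, one database) tops out at ~25 backends per connecting role; with several distinct app roles, the overall ceiling stays at ~25 only if max_db_connections is also set.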
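The arithmetic is worth running before each cutover, because PgBouncer sizes pools per (database, user) pair rather than per database. A minimal sketch, assuming every pair uses the same default_pool_size and reserve_pool_size and that no max_db_connections cap is set (`worst_case_backends` is a hypothetical helper):

```shell
# Hypothetical helper: worst-case server connections PgBouncer can open,
# assuming one pool per (database, user) pair and no max_db_connections cap.
# Arguments: number of pairs, default_pool_size, reserve_pool_size.
worst_case_backends() {
  echo $(( $1 * ($2 + $3) ))
}

worst_case_backends 1 20 5   # one role on one database: 25
worst_case_backends 4 20 5   # four distinct roles on one database: 100
```

With four distinct roles the worst case already equals max_connections, which is the situation a per-database cap is meant to prevent.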