fix: harden MCP server reliability, build reproducibility, and auth #1
Summary
Addresses the intermittent query failures on the live MCP instance (the container itself is healthy: RestartCount=0, no OOM, so the failures are application/query-level), plus build-reproducibility and security hardening.

Reliability (`analytics_mcp.py`)
- `_is_disconnect()` distinguishes a genuine connection loss (class-08 / `57P0x` SQLSTATE, or a socket-level `OperationalError` with `pgcode=None`) from an in-session query error like `statement_timeout` (`QueryCanceled` / `57014`), which leaves the connection usable.
- `query()` retries once, only on a genuine disconnect, so a recycled-but-stale connection is invisible to the analyst. A real query error (timeout, bad SQL) still surfaces immediately, so slow queries are never run twice.
- Concurrency is bounded (`POOL_MAX`) so more than `POOL_MAX` concurrent queries queue instead of overflowing the pool and raising `PoolError` (a 500 to the analyst).
- Lazy pool init (`minconn=0`) + retry on init, so a brief DB outage at deploy time no longer crash-loops the worker.

Build reproducibility
- Commit `uv.lock` (was gitignored) and build with `uv sync --frozen`, so a redeploy can't silently pull a newer, behaviour-changed `mcp`/`starlette`.

Security
- Constant-time token comparison (`hmac.compare_digest`).
- `/healthz` no longer leaks the analyst/token count.
- `deploy.sh`: Docker log rotation (bounds disk usage; logs are flooded by internet bot scans) + Traefik rate-limit middleware.

Correctness / usability
- A blocklisted keyword inside a string literal (e.g. `WHERE summary ILIKE '%please delete%'`) no longer rejects a valid read. The blocklist still rejects data-modifying CTEs, multi-statements, and writes (and writes are impossible anyway via the `analytics_ro` GRANTs + read-only rolled-back txn).
- … (`reporting`, `tracksolid`, `tickets`, `fuel`).

Verification
- `py_compile` + `ruff` clean (no new warnings vs. `main`).
- … `statement_timeout` / guard errors → not retried).
- `uv lock --check` clean.

Not included (follow-ups)
- `docs/ANALYTICS_MCP.md` embeds an older copy of the pool code as a walkthrough; left as-is to avoid churn, worth a later refresh.
- … `deploy.sh`). The new Traefik rate-limit label only applies via the `deploy.sh` path.

🤖 Generated with Claude Code
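
For reviewers, the disconnect classification and single retry described under Reliability can be sketched roughly as below. This is an illustration, not the code in `analytics_mcp.py`: `DbError`, `is_disconnect` and `run_with_one_retry` are hypothetical stand-ins that use only the standard library, with `pgcode` and the `operational` flag modelling what a driver error would carry.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class DbError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE."""

    def __init__(self, message: str, pgcode: Optional[str] = None,
                 operational: bool = False) -> None:
        super().__init__(message)
        self.pgcode = pgcode            # five-character SQLSTATE, or None
        self.operational = operational  # True for socket-level failures


def is_disconnect(exc: DbError) -> bool:
    """Return True only when the connection itself is gone."""
    if exc.pgcode is None:
        # No SQLSTATE means the server never answered; only a
        # socket-level (operational) error counts as a disconnect.
        return exc.operational
    # Class 08 is "connection exception"; 57P0x covers server shutdown
    # and "cannot connect now". 57014 (query canceled, e.g. by
    # statement_timeout) matches neither prefix.
    return exc.pgcode.startswith("08") or exc.pgcode.startswith("57P0")


def run_with_one_retry(run: Callable[[], T]) -> T:
    """Call run(); retry exactly once, and only after a disconnect."""
    try:
        return run()
    except DbError as exc:
        if not is_disconnect(exc):
            raise  # timeout or bad SQL: surface immediately, no re-run
    return run()
```

A timeout therefore propagates on the first attempt, while a stale connection costs one extra attempt that the caller never sees.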
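
The `POOL_MAX` queueing behaviour from the Reliability section could look something like the following. It assumes a `threading.BoundedSemaphore` sized to the pool; the description does not say which primitive the PR actually uses, and `pool_slot`, `peak_concurrency` and the value `POOL_MAX = 4` are made up for the demonstration.

```python
import threading
import time
from contextlib import contextmanager
from typing import Iterator

POOL_MAX = 4  # assumed value, for illustration only

# One permit per pooled connection: callers beyond POOL_MAX block here
# instead of asking the pool for a connection it cannot hand out.
_slots = threading.BoundedSemaphore(POOL_MAX)


@contextmanager
def pool_slot() -> Iterator[None]:
    """Hold one permit for the duration of a query."""
    _slots.acquire()
    try:
        yield
    finally:
        _slots.release()


def peak_concurrency(workers: int) -> int:
    """Run `workers` fake queries and report the highest overlap seen."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def fake_query() -> None:
        nonlocal active, peak
        with pool_slot():
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # stands in for the real query
            with lock:
                active -= 1

    threads = [threading.Thread(target=fake_query) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With twelve workers, `peak_concurrency(12)` never exceeds `POOL_MAX`; the surplus callers wait for a permit rather than failing.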
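
The Security item that mentions `hmac.compare_digest` amounts to a comparison along these lines. This is a minimal sketch: the `tokens` mapping and the helper names `token_matches` and `find_analyst` are assumptions, not the PR's actual data structures.

```python
import hmac
from typing import Mapping, Optional


def token_matches(presented: str, expected: str) -> bool:
    """Compare two tokens in a way that resists timing analysis."""
    # Encoding to bytes avoids the TypeError that compare_digest
    # raises for non-ASCII str arguments.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected.encode("utf-8"))


def find_analyst(presented: str, tokens: Mapping[str, str]) -> Optional[str]:
    """Return the analyst whose token matches, or None.

    `tokens` maps a hypothetical analyst name to that analyst's token.
    Every entry is checked, with no early exit on the first match.
    """
    match: Optional[str] = None
    for analyst, expected in tokens.items():
        if token_matches(presented, expected):
            match = analyst
    return match
```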
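
The string-literal fix under Correctness / usability can be pictured as blanking out quoted literals before applying the blocklist. The sketch below is not the PR's guard: `is_read_only` and the keyword list are hypothetical, and it deliberately ignores dollar-quoted strings, comments and backslash escapes, which a production guard would need to handle.

```python
import re

# A standard SQL single-quoted literal, where '' is an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")

# Hypothetical blocklist of statement keywords that modify data or schema.
_WRITE_KEYWORD = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Heuristic check that `sql` is a single read-only statement."""
    # Replace each literal with an empty one so that words inside
    # strings cannot trip the blocklist.
    stripped = _STRING_LITERAL.sub("''", sql)
    body = stripped.strip().rstrip(";")
    if ";" in body:
        return False  # more than one statement
    return _WRITE_KEYWORD.search(body) is None
```

Under these assumptions a filter such as `WHERE summary ILIKE '%please delete%'` passes, while a data-modifying CTE or a second statement is still rejected. The database-side protections the description mentions (read-only GRANTs and a rolled-back transaction) remain the real safeguard.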