fleetanalytics_mcp

8 commits 5 branches 0 tags 231 KiB

Author	SHA1	Message	Date
kianiadee	a36542dbc9	Merge pull request 'fix: harden MCP server reliability, build reproducibility, and auth' (#1) from fix/reliability-pool-build-guard into main	2026-06-19 21:40:07 +00:00
kiania	c02c127798	fix(connections): shrink MCP DB-connection footprint on a shared 100-conn DB The DB is at max_connections=100 and several stack services hold persistent pools (several as the postgres superuser, idle for hours), so peaks hit "too many connections". The MCP is a minor contributor but easy to bound: - Dockerfile: uvicorn --workers 2 → 1. The MCP's connection budget is workers × MCP_POOL_MAX, so this caps it at 8 backends instead of 16. Scale via MCP_POOL_MAX, not workers, so the budget stays obvious. (Pairs with the minconn=0 lazy pool already on this branch: 0 connections held when idle.) - analytics_ro_role.sql: add idle_session_timeout=5min so the DB reaps the MCP's idle POOLED connections (idle_in_transaction never reaps them — they're idle outside a txn) and returns the slots. Safe because the server now discards + transparently retries a reaped connection instead of erroring. Note: the dominant fix is stack-wide (get the superuser app pools onto bounded, timed roles; consider PgBouncer; or raise max_connections) — out of this repo's scope but documented in the review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 23:38:22 +03:00
kiania	5e3fc3910b	fix: harden MCP server reliability, build reproducibility, and auth Addresses intermittent query failures on the live instance (container itself is healthy — failures are application/query-level) plus security hardening. Reliability (analytics_mcp.py): - Discard dead pooled connections instead of recycling them. A broken socket (DB restart, network blip, crash) previously poisoned the pool and every later query handed that connection failed until container recreation. New _is_disconnect() classifies real connection loss (class-08 / 57P0x SQLSTATE, or socket-level OperationalError with pgcode=None) vs. an in-session query error like statement_timeout (QueryCanceled / 57014), which is NOT a disconnect and leaves the connection usable. - query() retries ONCE, only on a genuine disconnect, so a recycled-but-stale connection is invisible to the analyst (a real query error still surfaces). - Bound concurrent checkouts with a semaphore (POOL_MAX) so >POOL_MAX concurrent queries QUEUE instead of overflowing the pool and raising PoolError (a 500 to the analyst). - Lazy pool (minconn=0) + retry on init, so a brief DB outage at deploy time no longer crash-loops the worker. Build reproducibility: - Commit uv.lock (was gitignored) and build with `uv sync --frozen` so redeploys can't silently pull a newer, behaviour-changed mcp/starlette. Security: - Constant-time Bearer-token comparison (hmac.compare_digest). - /healthz no longer leaks the analyst/token count. - Dockerfile runs as a non-root user. - deploy.sh: Docker log rotation (bound disk) + Traefik rate-limit middleware. Also: relax the SQL guard so a forbidden keyword inside a string literal (e.g. WHERE summary ILIKE '%please delete%') no longer rejects a valid read; the blocklist still rejects data-modifying CTEs (and writes are impossible anyway via the analytics_ro GRANTs + read-only rolled-back txn). Fix stale docstrings. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 23:28:58 +03:00
kiania	0355047fdd	docs(analytics-mcp): document tickets + fuel schemas and MCP_READABLE_SCHEMAS Reflect the live state: readable data-surface table (reporting/tracksolid/ tickets/fuel + owners), the owner-keyed default-privilege gotcha, the tickets.inc typed-vs-raw column note, the env knob, code-only redeploy that reuses tokens, and tickets example prompts. Status flipped to deployed & live. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 11:37:25 +03:00
kiania	f83f67e73f	feat(access): expose tickets + fuel schemas to analytics_ro (read-only) The analytics_ro role only had USAGE/SELECT on reporting + tracksolid, so the tickets schema (INC/CRQ, 8 tables + 1 view + 7 fns) and fuel schema were invisible to the MCP server — queries failed with permission denied. - analytics_ro_role.sql: GRANT USAGE/SELECT/EXECUTE on tickets + fuel. Default privileges for these are keyed to postgres (their owner), not tracksolid_owner, so future objects auto-grant correctly. - analytics_mcp.py: READABLE_SCHEMAS now includes tickets + fuel and is overridable via MCP_READABLE_SCHEMAS, so the introspection helpers (list_tables/describe_table/sample_table) work for them too. - deploy.sh: reuse existing analyst tokens from the running container when MCP_AUTH_TOKENS is unset, so a code-only redeploy needs no secret. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 11:37:25 +03:00
david kiania	0c4848c656	fix: disable MCP DNS-rebinding host guard behind reverse proxy The MCP SDK's transport-security DNS-rebinding protection only accepts a localhost Host header by default and returns 421 behind Traefik (Host = fleetmcp.*). It targets browser attacks on localhost-bound servers and does not apply to a public, TLS-terminated, Bearer-authenticated service. Off by default now; re-enableable via MCP_DNS_REBINDING_PROTECTION=1 + MCP_ALLOWED_HOSTS. Also: deploy.sh health echo uses python (slim image has no curl). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 00:00:40 +03:00
david kiania	6d79ad32fb	docs: fix stale deploy-script name in bootstrap closing message Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 23:52:36 +03:00
david kiania	1eda59fe06	feat: read-only Fleet Analytics MCP server Standalone, hosted MCP server that lets the decision & analytics team query the fleet database (reporting.* / tracksolid.*) from Claude — read-only, for reporting and decisions, never edit/delete. - analytics_mcp.py: FastMCP streamable-HTTP server. Tools: query (guarded single SELECT/WITH, auto-LIMIT, write/DDL blocked), list_schemas, list_tables, describe_table, list_functions, sample_table. Per-analyst Bearer auth; /healthz exempt. No ts_shared_rev import (carries no ingestion secrets). - Read-only enforced at four layers: analytics_ro GRANTs, default_transaction_read_only=on, rolled-back txn, SQL keyword guard. - scripts/: analytics_ro_role.sql + bootstrap_analytics_ro.sh (dedicated least-privilege role, password in host-only ~/.analytics_ro.pw). - Dockerfile + pyproject (uv, package=false) for Coolify build; deploy.sh manual host fallback (standalone Traefik bridge on the tracksolid_db host). - docs/ANALYTICS_MCP.{md,html} + README: architecture, deploy runbook, add-to-Claude, verification, security notes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 23:43:24 +03:00
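
The disconnect handling described in commit 5e3fc3910b (discard a dead pooled connection, retry once, let real query errors surface) can be sketched roughly as follows. This is a minimal illustration, not the code in analytics_mcp.py: the OperationalError class is a stand-in for the database driver's exception, and query_with_retry, checkout, release, discard, and run are hypothetical names chosen for the example. Only the classification rule (class-08 or 57P0x SQLSTATE, or no SQLSTATE at all) comes from the commit message.

```python
from typing import Callable, Optional


class OperationalError(Exception):
    """Stand-in for the driver's OperationalError (assumption for this sketch)."""

    def __init__(self, message: str, pgcode: Optional[str] = None) -> None:
        super().__init__(message)
        self.pgcode = pgcode


def is_disconnect(exc: Exception) -> bool:
    """Return True only when the connection itself is gone."""
    if not isinstance(exc, OperationalError):
        return False
    if exc.pgcode is None:
        # Socket-level failure: no SQLSTATE was ever received from the server.
        return True
    # Class 08 is "connection exception"; 57P0x covers server shutdown/crash.
    # 57014 (query canceled, e.g. statement_timeout) matches neither prefix.
    return exc.pgcode.startswith("08") or exc.pgcode.startswith("57P0")


def query_with_retry(
    checkout: Callable[[], object],
    release: Callable[[object], None],
    discard: Callable[[object], None],
    run: Callable[[object], object],
) -> object:
    """Run a query, retrying exactly once if the first connection was dead."""
    for attempt in range(2):
        conn = checkout()
        try:
            result = run(conn)
        except Exception as exc:
            if is_disconnect(exc):
                discard(conn)  # never hand a broken connection back to the pool
                if attempt == 0:
                    continue  # one retry on a fresh connection
            else:
                release(conn)  # in-session error: the connection is still usable
            raise
        release(conn)
        return result
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With this shape, a statement_timeout reaches the caller on the first attempt, while a stale connection costs one extra checkout and is otherwise invisible.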
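
The relaxed SQL guard mentioned in the same commit (a forbidden keyword inside a string literal no longer rejects a valid read, but data-modifying CTEs are still blocked) might look something like the sketch below. The keyword list, the regular expressions, and the function name is_allowed are assumptions made for illustration; the log does not show the server's actual blocklist or parsing approach. The sketch only handles plain single-quoted literals, not dollar-quoted strings, comments, or quoted identifiers.

```python
import re

# Hypothetical blocklist; the real server's list is not shown in the log.
FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "truncate", "grant")

# A single-quoted SQL literal, where '' is an escaped quote inside the literal.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_FORBIDDEN_WORD = re.compile(r"\b(?:" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def is_allowed(sql: str) -> bool:
    """Accept a single SELECT/WITH statement with no forbidden keyword in its code."""
    # Blank out literal contents so words inside strings cannot trip the guard.
    code_only = _STRING_LITERAL.sub("''", sql)
    head = code_only.lstrip().lower()
    if not (head.startswith("select") or head.startswith("with")):
        return False
    return _FORBIDDEN_WORD.search(code_only) is None


# Keyword appears only inside a literal: allowed.
print(is_allowed("SELECT * FROM some_table WHERE summary ILIKE '%please delete%'"))
# Data-modifying CTE: still rejected.
print(is_allowed("WITH d AS (DELETE FROM some_table RETURNING *) SELECT * FROM d"))
```

A guard like this is a convenience layer only; as the commit notes, writes are already prevented by the analytics_ro GRANTs and the read-only, rolled-back transaction.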