5 commits

Author SHA1 Message Date
kiania
af6fdbcd3f fix(logging): attribute each query to its analyst caller
The BearerAuth middleware matched a per-analyst token but only stashed it on
request.state, which the FastMCP tools never see — so the query log line logged
rows/sql with no caller, defeating the per-token attribution the auth design
promises. Bridge the caller name through a ContextVar (anyio propagates it into
the worker thread that runs each sync tool) and include it in the query() log.
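
The bridging described above can be illustrated with a minimal stdlib sketch. It assumes `asyncio.to_thread` as a stand-in for anyio's worker-thread dispatch (both copy the current context into the thread); `current_caller`, `query_tool`, and `handle_request` are hypothetical names, not the ones in analytics_mcp.py.

```python
import asyncio
import contextvars

# Hypothetical name: holds the analyst matched by the auth layer for this request.
current_caller = contextvars.ContextVar("current_caller", default="unknown")


def query_tool(sql: str) -> str:
    # Sync tool body: runs in a worker thread but still sees the request's caller,
    # because the context was copied when the thread call was scheduled.
    return f"caller={current_caller.get()} sql={sql}"


async def handle_request(caller: str, sql: str) -> str:
    # Stand-in for the auth middleware: record the caller, then dispatch the tool.
    token = current_caller.set(caller)
    try:
        return await asyncio.to_thread(query_tool, sql)
    finally:
        current_caller.reset(token)


def demo() -> list:
    async def main() -> list:
        # Two concurrent requests; each task gets its own copy of the context.
        return await asyncio.gather(
            handle_request("alice", "SELECT 1"),
            handle_request("bob", "SELECT 2"),
        )

    return asyncio.run(main())
```

Each request runs in its own task, so one analyst's name never leaks into another analyst's log line even when the tools run concurrently.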

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 16:54:07 +03:00
kiania
5e3fc3910b fix: harden MCP server reliability, build reproducibility, and auth
Addresses intermittent query failures on the live instance (container itself
is healthy — failures are application/query-level) plus security hardening.

Reliability (analytics_mcp.py):
- Discard dead pooled connections instead of recycling them. A broken socket
  (DB restart, network blip, crash) previously poisoned the pool: every later
  query handed that connection failed until the container was recreated. New
  _is_disconnect() classifies real connection loss (class-08 / 57P0x SQLSTATE,
  or socket-level OperationalError with pgcode=None) vs. an in-session query
  error like statement_timeout (QueryCanceled / 57014), which is NOT a
  disconnect and leaves the connection usable.
- query() retries ONCE, only on a genuine disconnect, so a recycled-but-stale
  connection is invisible to the analyst (a real query error still surfaces).
- Bound concurrent checkouts with a semaphore (POOL_MAX) so >POOL_MAX
  concurrent queries QUEUE instead of overflowing the pool and raising
  PoolError (a 500 to the analyst).
- Lazy pool (minconn=0) + retry on init, so a brief DB outage at deploy time
  no longer crash-loops the worker.
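
A rough sketch of the disconnect classification and the retry-once rule from the list above. It is not the code in analytics_mcp.py: `FakeDbError` is a hypothetical stand-in for a driver exception carrying a SQLSTATE in `pgcode`, and `run_with_one_retry` is an illustrative name.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FakeDbError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE."""

    def __init__(self, pgcode: Optional[str], socket_level: bool = False):
        super().__init__(pgcode or "connection lost")
        self.pgcode = pgcode
        self.socket_level = socket_level


def is_disconnect(exc: FakeDbError) -> bool:
    # No SQLSTATE at all: treat as a disconnect only if it came from the socket.
    if exc.pgcode is None:
        return exc.socket_level
    # Class 08 = connection exception; 57P0x = server shutdown / cannot connect.
    # 57014 (query_canceled, e.g. statement_timeout) matches neither prefix.
    return exc.pgcode.startswith("08") or exc.pgcode.startswith("57P0")


def run_with_one_retry(run: Callable[[], T]) -> T:
    # First attempt; only a genuine disconnect earns a second one.
    try:
        return run()
    except FakeDbError as exc:
        if not is_disconnect(exc):
            raise
    # Second and final attempt: any error here propagates to the caller.
    return run()
```

In a real pool the first failure would also discard the broken connection before the retry checks out a fresh one; that step is omitted here.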

Build reproducibility:
- Commit uv.lock (was gitignored) and build with `uv sync --frozen` so
  redeploys can't silently pull a newer, behaviour-changed mcp/starlette.

Security:
- Constant-time Bearer-token comparison (hmac.compare_digest).
- /healthz no longer leaks the analyst/token count.
- Dockerfile runs as a non-root user.
- deploy.sh: Docker log rotation (bound disk) + Traefik rate-limit middleware.
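
For the constant-time comparison, a minimal sketch using the standard library's `hmac.compare_digest`. The `match_caller` helper and the name-to-token mapping are assumptions for illustration; the actual middleware may store and match tokens differently.

```python
import hmac
from typing import Mapping, Optional


def match_caller(presented: str, tokens: Mapping[str, str]) -> Optional[str]:
    """Return the analyst whose token equals `presented`, or None."""
    matched = None
    presented_bytes = presented.encode("utf-8")
    for name, expected in tokens.items():
        # compare_digest takes time independent of where the inputs differ.
        if hmac.compare_digest(presented_bytes, expected.encode("utf-8")):
            matched = name
    # No early exit: every token is compared regardless of where a match occurs.
    return matched
```

Encoding to bytes avoids the `TypeError` that `compare_digest` raises for non-ASCII `str` inputs.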

Also: relax the SQL guard so a forbidden keyword inside a string literal (e.g.
WHERE summary ILIKE '%please delete%') no longer rejects a valid read; the
blocklist still rejects data-modifying CTEs (and writes are impossible anyway
via the analytics_ro GRANTs + read-only rolled-back txn). Fix stale docstrings.
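
One way the relaxed guard could work, as a sketch under stated assumptions: blank out single-quoted string literals first, then scan what remains for blocked keywords. The keyword list and the `is_read_only` name are hypothetical, and dollar-quoted strings and comments are not handled here.

```python
import re

# Single-quoted SQL literal, with '' as the escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")

# Hypothetical blocklist; the real guard's list may differ.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    # Replace each literal with an empty one so its contents cannot trigger the guard.
    without_literals = _STRING_LITERAL.sub("''", sql)
    return _FORBIDDEN.search(without_literals) is None
```

A keyword inside a literal now passes, while a data-modifying CTE such as `WITH d AS (DELETE FROM t RETURNING *) SELECT * FROM d` is still rejected because its `DELETE` sits outside any literal.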

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 23:28:58 +03:00
kiania
f83f67e73f feat(access): expose tickets + fuel schemas to analytics_ro (read-only)
The analytics_ro role only had USAGE/SELECT on reporting + tracksolid, so
the tickets schema (INC/CRQ, 8 tables + 1 view + 7 fns) and fuel schema
were invisible to the MCP server — queries failed with permission denied.

- analytics_ro_role.sql: GRANT USAGE/SELECT/EXECUTE on tickets + fuel.
  Default privileges for these are keyed to postgres (their owner), not
  tracksolid_owner, so future objects auto-grant correctly.
- analytics_mcp.py: READABLE_SCHEMAS now includes tickets + fuel and is
  overridable via MCP_READABLE_SCHEMAS, so the introspection helpers
  (list_tables/describe_table/sample_table) work for them too.
- deploy.sh: reuse existing analyst tokens from the running container when
  MCP_AUTH_TOKENS is unset, so a code-only redeploy needs no secret.
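
A small sketch of how an environment override like MCP_READABLE_SCHEMAS might be read. The comma-separated format, the default tuple, and the `readable_schemas` helper are assumptions; the commit only states that the variable exists and which schemas are included.

```python
import os
from typing import Mapping, Tuple

# Assumed defaults, taken from the schemas named in this log.
DEFAULT_SCHEMAS: Tuple[str, ...] = ("reporting", "tracksolid", "tickets", "fuel")


def readable_schemas(env: Mapping[str, str] = os.environ) -> Tuple[str, ...]:
    # Assumed format: comma-separated schema names, surrounding whitespace ignored.
    raw = env.get("MCP_READABLE_SCHEMAS", "")
    names = tuple(part.strip() for part in raw.split(",") if part.strip())
    # An unset or empty override falls back to the built-in defaults.
    return names or DEFAULT_SCHEMAS
```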

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 11:37:25 +03:00
david kiania
0c4848c656 fix: disable MCP DNS-rebinding host guard behind reverse proxy
The MCP SDK's transport-security DNS-rebinding protection only accepts a
localhost Host header by default and returns 421 behind Traefik (Host =
fleetmcp.*). It targets browser attacks on localhost-bound servers and does
not apply to a public, TLS-terminated, Bearer-authenticated service. Off by
default now; re-enableable via MCP_DNS_REBINDING_PROTECTION=1 + MCP_ALLOWED_HOSTS.
Also: deploy.sh health echo uses python (slim image has no curl).
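
To illustrate the opt-in behaviour only: the sketch below shows an off-by-default host check driven by the two variables named above. The real check lives in the MCP SDK's transport-security layer; `host_allowed`, the comma-separated format of MCP_ALLOWED_HOSTS, and the example hostname are assumptions.

```python
from typing import Mapping


def host_allowed(host: str, env: Mapping[str, str]) -> bool:
    # Protection is off unless explicitly enabled with "1".
    if env.get("MCP_DNS_REBINDING_PROTECTION", "0") != "1":
        return True
    # Assumed format: comma-separated hostnames, compared case-insensitively.
    allowed = {
        h.strip().lower()
        for h in env.get("MCP_ALLOWED_HOSTS", "").split(",")
        if h.strip()
    }
    return host.strip().lower() in allowed
```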

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 00:00:40 +03:00
david kiania
1eda59fe06 feat: read-only Fleet Analytics MCP server
Standalone, hosted MCP server that lets the decision & analytics team query
the fleet database (reporting.* / tracksolid.*) from Claude — read-only, for
reporting and decisions, never edit/delete.

- analytics_mcp.py: FastMCP streamable-HTTP server. Tools: query (guarded
  single SELECT/WITH, auto-LIMIT, write/DDL blocked), list_schemas,
  list_tables, describe_table, list_functions, sample_table. Per-analyst
  Bearer auth; /healthz exempt. No ts_shared_rev import (carries no ingestion
  secrets).
- Read-only enforced at four layers: analytics_ro GRANTs,
  default_transaction_read_only=on, rolled-back txn, SQL keyword guard.
- scripts/: analytics_ro_role.sql + bootstrap_analytics_ro.sh (dedicated
  least-privilege role, password in host-only ~/.analytics_ro.pw).
- Dockerfile + pyproject (uv, package=false) for Coolify build; deploy.sh
  manual host fallback (standalone Traefik bridge on the tracksolid_db host).
- docs/ANALYTICS_MCP.{md,html} + README: architecture, deploy runbook,
  add-to-Claude, verification, security notes.
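
The auto-LIMIT behaviour of the query tool could be approximated as below. This is only one possible approach (wrapping the statement in a capped subquery); the server may instead append or rewrite a LIMIT clause. `with_row_cap` and the default cap are hypothetical.

```python
def with_row_cap(sql: str, max_rows: int = 1000) -> str:
    # Drop surrounding whitespace and a trailing semicolon so the statement nests cleanly.
    inner = sql.strip().rstrip(";").strip()
    # Wrapping works for both SELECT and WITH statements and never exceeds max_rows,
    # even when the inner query already carries a larger LIMIT of its own.
    return f"SELECT * FROM ({inner}) AS capped LIMIT {int(max_rows)}"
```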

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 23:43:24 +03:00