No description
Find a file
kiania 5e3fc3910b fix: harden MCP server reliability, build reproducibility, and auth
Addresses intermittent query failures on the live instance (container itself
is healthy — failures are application/query-level) plus security hardening.

Reliability (analytics_mcp.py):
- Discard dead pooled connections instead of recycling them. A broken socket
  (DB restart, network blip, crash) previously poisoned the pool and every
  later query handed that connection failed until container recreation. New
  _is_disconnect() classifies real connection loss (class-08 / 57P0x SQLSTATE,
  or socket-level OperationalError with pgcode=None) vs. an in-session query
  error like statement_timeout (QueryCanceled / 57014), which is NOT a
  disconnect and leaves the connection usable.
- query() retries ONCE, only on a genuine disconnect, so a recycled-but-stale
  connection is invisible to the analyst (a real query error still surfaces).
- Bound concurrent checkouts with a semaphore (POOL_MAX) so >POOL_MAX
  concurrent queries QUEUE instead of overflowing the pool and raising
  PoolError (a 500 to the analyst).
- Lazy pool (minconn=0) + retry on init, so a brief DB outage at deploy time
  no longer crash-loops the worker.

Build reproducibility:
- Commit uv.lock (was gitignored) and build with `uv sync --frozen` so
  redeploys can't silently pull a newer, behaviour-changed mcp/starlette.

Security:
- Constant-time Bearer-token comparison (hmac.compare_digest).
- /healthz no longer leaks the analyst/token count.
- Dockerfile runs as a non-root user.
- deploy.sh: Docker log rotation (bound disk) + Traefik rate-limit middleware.

Also: relax the SQL guard so a forbidden keyword inside a string literal (e.g.
WHERE summary ILIKE '%please delete%') no longer rejects a valid read; the
blocklist still rejects data-modifying CTEs (and writes are impossible anyway
via the analytics_ro GRANTs + read-only rolled-back txn). Fix stale docstrings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 23:28:58 +03:00
docs docs(analytics-mcp): document tickets + fuel schemas and MCP_READABLE_SCHEMAS 2026-06-17 11:37:25 +03:00
scripts feat(access): expose tickets + fuel schemas to analytics_ro (read-only) 2026-06-17 11:37:25 +03:00
.dockerignore fix: harden MCP server reliability, build reproducibility, and auth 2026-06-19 23:28:58 +03:00
.gitignore fix: harden MCP server reliability, build reproducibility, and auth 2026-06-19 23:28:58 +03:00
analytics_mcp.py fix: harden MCP server reliability, build reproducibility, and auth 2026-06-19 23:28:58 +03:00
deploy.sh fix: harden MCP server reliability, build reproducibility, and auth 2026-06-19 23:28:58 +03:00
Dockerfile fix: harden MCP server reliability, build reproducibility, and auth 2026-06-19 23:28:58 +03:00
pyproject.toml feat: read-only Fleet Analytics MCP server 2026-06-16 23:43:24 +03:00
README.md fix: disable MCP DNS-rebinding host guard behind reverse proxy 2026-06-17 00:00:40 +03:00
uv.lock fix: harden MCP server reliability, build reproducibility, and auth 2026-06-19 23:28:58 +03:00

Fleet Analytics MCP (read-only)

A read-only MCP server that lets the decision & analytics team query the Fireside fleet database (tracksolid_db — PostgreSQL 16 + TimescaleDB + PostGIS) directly from Claude — for reporting and decisions, never edit/delete.

It exposes a guarded general SELECT tool plus schema-introspection tools over the reporting.* (curated analytics layer) and tracksolid.* (raw telemetry) schemas, connecting as a dedicated least-privilege analytics_ro role. It is hosted on the same Coolify host as the database (the DB is internal-only and not reachable from a laptop), and authed with per-analyst Bearer tokens.

Sibling of the tracksolid_timescale_grafana_prod backend (the DB/ingestion stack) and the dashboard_api read bridge. This repo owns only the analytics MCP server and its analytics_ro role.

Read-only is enforced at four layers

  1. Role GRANTsanalytics_ro has only USAGE+SELECT on reporting/tracksolid and EXECUTE on reporting functions; no INSERT/UPDATE/DELETE, not the matview owner.
  2. default_transaction_read_only = on — set on the role and on every connection.
  3. Rolled-back transactions — every query runs in a txn that is rolled back, never committed.
  4. SQL guard — the query tool accepts a single SELECT/WITH statement only and rejects write/DDL keywords (clean errors instead of DB faults).

It deliberately does not import the backend's ts_shared_rev, so it carries none of the Tracksolid ingestion secrets — it needs only DATABASE_URL + MCP_AUTH_TOKENS.

Tools

Tool Purpose
query(sql, max_rows=1000) guarded read-only SELECT/WITH; auto-LIMIT; returns {row_count, truncated, rows}
list_schemas() readable schemas + object counts
list_tables(schema) tables + views in a schema
describe_table(schema, table) columns, types, nullability, defaults
list_functions(schema='reporting') reporting.fn_* signatures
sample_table(schema, table, n=20) first n rows (wrapper over query)

Layout

analytics_mcp.py                 # the MCP server (FastMCP streamable-HTTP; uvicorn target analytics_mcp:app)
Dockerfile                       # Coolify-buildable image (port 8892)
deploy.sh                        # manual host deploy (standalone Traefik bridge) — fallback to Coolify
scripts/analytics_ro_role.sql    # the read-only role DDL
scripts/bootstrap_analytics_ro.sh# host bootstrap: generate pw → apply role SQL
docs/ANALYTICS_MCP.md / .html    # full implementation guide + runbook

Deploy

The DB is internal-only, so the server runs on the same Coolify host as timescale_db.

0. Create the read-only role (once, on the host):

scp scripts/analytics_ro_role.sql scripts/bootstrap_analytics_ro.sh kianiadee@twala.rahamafresh.com:~/
ssh kianiadee@twala.rahamafresh.com 'bash ~/bootstrap_analytics_ro.sh'   # writes ~/.analytics_ro.pw (0600)

1a. Coolify-managed app (recommended): create a Coolify application from this repo (Forgejo repo.rahamafresh.com/kianiadee/fleetanalytics_mcp.git), Dockerfile build, app port 8892, attach the domain fleetmcp.rahamafresh.com (prod) / fleetmcp.fivetitude.com (staging). Set as secrets:

  • DATABASE_URL=postgresql://analytics_ro:<pw>@timescale_db:5432/tracksolid_db
  • MCP_AUTH_TOKENS=alice:<tok>,bob:<tok> (per-analyst)

Then connect the app to the network that can reach timescale_db (the tracksolid stack's network) so the timescale_db hostname resolves. Coolify manages the Traefik labels + TLS from the domain you set.

Optional env (sensible defaults):

  • MCP_MAX_ROWS (default 10000) — hard ceiling on rows a query may return.
  • MCP_DNS_REBINDING_PROTECTION (default 0/off) — the MCP SDK's localhost Host-header guard. It returns 421 behind a reverse proxy, so it's off by default (the service is TLS-terminated + Bearer-authed). Set 1 to enforce, with MCP_ALLOWED_HOSTS.
  • MCP_ALLOWED_HOSTS — comma-separated allowlist used only when the guard is on (default: the two fleetmcp.* domains + localhost).

1b. Manual host deploy (fallback): check this repo out on the host and run deploy.sh — it builds the image, derives the read-only DATABASE_URL from the running stack, and runs a standalone Traefik bridge. See the script header.

Add to Claude (for analysts)

claude mcp add --transport http fireside-analytics https://fleetmcp.rahamafresh.com \
  --header "Authorization: Bearer <your-token>"
claude mcp list      # → "fireside-analytics: connected"

Claude Desktop / claude.ai: add a custom connector with the same URL and Authorization: Bearer <your-token> header.

Local dev

uv sync
DATABASE_URL=postgresql://... MCP_AUTH_TOKENS=me:dev uv run uvicorn analytics_mcp:app --port 8892

Full reference, security notes, and verification checklist: docs/ANALYTICS_MCP.md.