fleetanalytics_mcp/Dockerfile

# fleetanalytics-mcp — read-only Fleet Analytics MCP server.
# Coolify auto-detects this Dockerfile: set the app port to 8892, attach the
# domain (e.g. fleetmcp.rahamafresh.com) in the Coolify UI, set DATABASE_URL
# (analytics_ro DSN) + MCP_AUTH_TOKENS as secrets, and connect the app to the
# network that can reach timescale_db. See README.md / docs/ANALYTICS_MCP.md.
FROM python:3.12-slim
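# uv for fast, reproducible dependency installs. Note that the :latest tag is itself
# unpinned; pin a specific uv image tag if the build must be fully reproducible.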
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
# Install ONLY dependencies (flat module — the project itself is not a package).
# Copy the lockfile and build --frozen so rebuilds are reproducible: without it,
# `uv sync` re-resolves the >= ranges in pyproject.toml and a redeploy could pull a
# newer, behaviour-changed mcp/starlette and break the running server.
COPY pyproject.toml uv.lock ./
RUN uv sync --no-dev --no-install-project --frozen
ENV PATH="/app/.venv/bin:$PATH"
COPY analytics_mcp.py ./
# Run as a non-root user (least privilege; nothing here needs root).
RUN useradd -m -u 10001 app && chown -R app:app /app
USER app
EXPOSE 8892
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8892/healthz').status==200 else 1)" || exit 1
# Single worker: this is a low-traffic read-only proxy for a handful of analysts, and
# the DB connection budget = workers × MCP_POOL_MAX. One worker (× default pool 8) caps
# the MCP at 8 backends (two workers would allow 16), which matters on a shared 100-connection DB.
# Scale out by raising MCP_POOL_MAX, not workers, so the budget stays obvious.
CMD ["uvicorn", "analytics_mcp:app", "--host", "0.0.0.0", "--port", "8892", "--workers", "1"]