# n8n Docker-log Alerting (ntfy + WhatsApp)

## Context

The user runs a Coolify host at `twala.rahamafresh.com` with ~50 containers across ~15 logically distinct services (tracksolid telemetry pipeline, Coolify itself, n8n stacks, Supabase, Chatwoot, Evolution API, Dekart, Forgejo, Ente, Garage, etc.). They want **n8n to read Docker logs directly, segment by service, apply per-service thresholds, and notify via ntfy and WhatsApp**.

Dozzle is explicitly out of scope as an integration source: it stays as a human-facing log viewer, and the integration design must not depend on it.

Why this matters: today, errors in any container are invisible until someone opens Dozzle. Critical issues (panics, OOMs, ingest failures on the tracksolid pipeline) can sit unnoticed for hours. The goal is per-service alerting with severity-aware routing, with thresholds tunable per service so that noisy services don't drown out quiet ones.

## Decisions (locked with the user)

| Choice | Decision |
| --- | --- |
| n8n instance | `n8n-o55elukmxacgp1s2xcwktyam` (queue mode: main + worker + task-runners + Postgres + Redis) |
| Docker log access | New **read-only log-proxy** container; n8n never touches `/var/run/docker.sock` |
| Service grouping | Auto-derive from each container's `COOLIFY_RESOURCE_UUID` env var |
| Channels | Self-hosted ntfy (new Coolify service) **+** existing Evolution API (WhatsApp) |
| Git | Workspace `/Users/kianiadee/Downloads/projects/03_dozzle_n8n` is **not** a git repo yet; the user creates a separate repo later |

## Architecture

```
                                        n8n queue-mode (o55elukmxacgp1s2xcwktyam)
                                        ┌────────────────────────────────────────┐
Docker Engine       log-proxy           │ Workflow: Poll & Evaluate (per group)  │
┌──────────────┐    (new svc)           │  1. GET /logs/<group>?since=<cursor>   │
│ /var/run/    │ ◄─── RO socket ────►   │  2. regex → severity                   │
│ docker.sock  │    HTTP API            │  3. threshold + cooldown via           │
└──────────────┘    (internal net)    ◄─┤     getWorkflowStaticData()            │
                                        │  4. emit Alert event                   │
                                        │                                        │
                                        │ Workflow: Notify (single, parametric)  │
                                        │   severity=critical → ntfy + WhatsApp  │
                                        │   severity=error    → ntfy             │
                                        │   severity=warn     → ntfy (low prio)  │
                                        └────────────────────┬───────────────────┘
                                                             │
                                      ┌──────────────────────┴──────────────────────┐
                                      ▼                                             ▼
                       ntfy (self-hosted via Coolify)                 Evolution API (api-vc4ok...)
                       POST /<topic>                                  POST /message/sendText/<instance>
```

## Components

### 1. log-proxy (new container)

**Purpose**: the only thing with `docker.sock` access. A dumb pipe with no alerting logic.

**Image**: small Python/FastAPI or Node/Fastify app (~50 lines). Build from source in this repo.

**Mount**: `/var/run/docker.sock` read-only. A read-only bind mount does not restrict the Docker API calls that can be made over the socket, so the read-only guarantee comes from the proxy exposing only GET endpoints.

**Network**: joined to the n8n stack's Coolify network so n8n can reach it by hostname; **no Traefik route** (not publicly reachable).

**API** (no auth needed because it is internal only; optional bearer token for defence in depth):

- `GET /services` returns `[{ "group": "bo3no...", "name": "tracksolid", "containers": [...] }, ...]`
  - Groups containers by `COOLIFY_RESOURCE_UUID` env var.
  - Filtered to the allow-list in `/config/groups.yml`; UUIDs not listed are skipped entirely.
- `GET /logs/<group>?since=<ts>&until=<ts>&limit=2000`
  - Calls Docker Engine API `GET /containers/{id}/logs?stdout=1&stderr=1&since=...&until=...&timestamps=1` for every container in the group.
  - Returns NDJSON or JSON array of `{ container, ts, stream, line }`.
  - `since` defaults to "now − 60s" if absent; `until` defaults to "now".
- `GET /healthz`

**Why a proxy and not direct socket-into-n8n**: any n8n editor user becomes root-on-host if n8n has the socket. The proxy keeps the blast radius small and the API surface inspectable.

### 2. Self-hosted ntfy

Deploy via Coolify's one-click marketplace (or as a Docker Compose service).

- Suggested FQDN (matches your existing pattern): `ntfy.rahamafresh.com`
- Auth: enable `auth-default-access: deny-all`; create per-topic users (one publisher user for n8n, plus client users for each subscriber).
- Topics: one per service group, e.g.
  `tracksolid-alerts`, `coolify-alerts`, `evolution-api-alerts`. Subscribe on phones via the ntfy mobile app.

### 3. n8n workflows (in `n8n-o55elukmxacgp1s2xcwktyam`)

**A. Poll & Evaluate** (one workflow per service group, which is easiest to tune independently)

Nodes:

1. **Schedule Trigger**: every 30s (tunable per group).
2. **Static Data Read**: pull `last_cursor` from `$getWorkflowStaticData('global').cursor`.
3. **HTTP Request**: `GET http://log-proxy:8080/logs/<group>?since=<cursor>`.
4. **Function (Pattern Match)**: for each line, run severity regexes (from workflow Variables) and emit `{ severity, pattern, container, ts, line, fingerprint }` where `fingerprint = sha256(group:pattern:container)` (used for cooldown).
5. **Function (Threshold + Cooldown)**:
   - `critical`: emit immediately if not in cooldown.
   - `error`: count rolling matches per fingerprint over `window` minutes; emit when the threshold is crossed.
   - `warn`: same, but with a larger window and threshold.
   - Cooldown: `staticData.cooldowns[fingerprint] = now + cooldown_minutes`; skip while still hot.
6. **Static Data Write**: update `cursor = max(ts seen)` and `cooldowns`.
7. **Execute Workflow**: call the **Notify** workflow once per emitted Alert.

**B. Notify** (single parametric workflow; called by each Poll workflow)

Input: `{ group, severity, pattern, container, ts, line, fingerprint }`

Nodes:

1. **Switch** on `severity`.
2. **critical** branch:
   - **HTTP Request** → ntfy: `POST https://ntfy.rahamafresh.com/<group>-alerts` with priority=5, tags=`rotating_light`.
   - **HTTP Request** → Evolution API: `POST https://<evolution-host>/message/sendText/<instance>` with `{ number, text }`. Credentials via the n8n credentials store.
3. **error** branch: ntfy only, priority=4.
4. **warn** branch: ntfy only, priority=2 (ntfy's `low`, matching the "low prio" routing used elsewhere in this plan).
5. **Append-row** (Postgres node, optional) → `alerts_audit` table for history.

### 4. Defaults (tunable per group via workflow Variables)

| Severity | Default patterns | Threshold | Cooldown | Routing |
| --- | --- | --- | --- | --- |
| critical | `panic`, `FATAL`, `OOMKilled`, `out of memory`, `segmentation fault` | immediate (1 match) | 30 min | ntfy + WhatsApp |
| error | `\bERROR\b`, `Exception`, `Traceback`, `5\d\d ` (HTTP 5xx) | 10 / 5 min | 15 min | ntfy |
| warn | `\bWARN(ING)?\b`, `deadlock`, `timeout` | 50 / 15 min | 30 min | ntfy (low prio) |

These live as a JSON object in each workflow's Variables, so per-group tuning is one edit.

### 5. Group naming

Friendly names are mapped from the Coolify resource UUID and sourced from `groups.yml`, which is mounted into log-proxy. **`groups.yml` is also the allow-list**: only UUIDs listed here are monitored. Anything else the proxy sees on the host is ignored, so non-mission-critical apps don't generate noise or burn polling cycles.

```yaml
bo3nov2ija7g8wn9b1g2paxs: tracksolid
o55elukmxacgp1s2xcwktyam: n8n-prod
usoksgg8o40044g0cw08s8wc: n8n-simple
vc4ok84gw4s0kcgwwg8gooco: evolution-api
ks4sc8k4804swk0c0c4kk44c: chatwoot
foo048cw4skg8kswwsowwo0c: forgejo
u7rj0du43d33ncurig2t6ni1: dekart
e11bva63bu7swlq6zyfckxm3: rustfs
now8k08wcs044scwggos0wos: dozzle
# Coolify core, Supabase, shutterdiplomacy → handled as their own groups
#
# Explicitly NOT monitored (non-mission-critical, per user 2026-05-17):
#   dy82njm7qgb5f2m573d1u3rh  garage
#   r77s24tgmfifmpfqe86xyqsp  ente
#   vw0wk0cg8gkwgwogsg4k0gsg  excalidraw
```

Implication on the proxy: `GET /services` returns only allow-listed groups; `GET /logs/<group>` 404s for non-allow-listed UUIDs. To start monitoring a service later, add a single line to `groups.yml` and clone a Poll workflow.
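
The allow-list grouping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the proxy's actual code: each container is assumed to be reduced to a dict with a `Name` and a Docker-style `Env` list of `KEY=VALUE` strings, and the function names (`resource_uuid`, `group_services`) and the sample container names are hypothetical.

```python
from collections import defaultdict

ENV_KEY = "COOLIFY_RESOURCE_UUID"


def resource_uuid(env):
    """Return the Coolify resource UUID from a KEY=VALUE env list, or None."""
    for entry in env or []:
        key, _, value = entry.partition("=")
        if key == ENV_KEY and value:
            return value
    return None


def group_services(containers, allow_list):
    """Group container names by UUID, keeping only allow-listed UUIDs."""
    grouped = defaultdict(list)
    for container in containers:
        uuid = resource_uuid(container.get("Env"))
        if uuid in allow_list:  # None and unknown UUIDs are skipped entirely
            grouped[uuid].append(container["Name"])
    return [
        {"group": uuid, "name": allow_list[uuid], "containers": sorted(names)}
        for uuid, names in sorted(grouped.items())
    ]


# Two allow-listed UUIDs taken from groups.yml above.
allow_list = {
    "bo3nov2ija7g8wn9b1g2paxs": "tracksolid",
    "vc4ok84gw4s0kcgwwg8gooco": "evolution-api",
}

# Hypothetical sample data; real entries would come from the Docker Engine API.
containers = [
    {"Name": "ingest-b", "Env": ["COOLIFY_RESOURCE_UUID=bo3nov2ija7g8wn9b1g2paxs"]},
    {"Name": "ingest-a", "Env": ["TZ=UTC", "COOLIFY_RESOURCE_UUID=bo3nov2ija7g8wn9b1g2paxs"]},
    {"Name": "garage-1", "Env": ["COOLIFY_RESOURCE_UUID=dy82njm7qgb5f2m573d1u3rh"]},
    {"Name": "stray", "Env": ["TZ=UTC"]},
]

services = group_services(containers, allow_list)
print(services)
```

Keeping the filter inside the grouping step means a non-allow-listed container never appears in any response, which is what lets `GET /logs/<group>` simply return 404 for unknown groups.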
## Workspace layout

```
/Users/kianiadee/Downloads/projects/03_dozzle_n8n/   ← no git yet
├── log-proxy/
│   ├── Dockerfile
│   ├── app.py                    (FastAPI: /services, /logs/<group>, /healthz)
│   ├── requirements.txt
│   └── groups.yml                (UUID → friendly-name map)
├── ntfy/
│   └── README.md                 (Coolify deploy notes + topic / user setup)
├── n8n/
│   └── workflows/
│       ├── poll-tracksolid.json
│       ├── poll-coolify.json
│       ├── poll-evolution.json
│       ├── poll-<group>.json     ← one per group, derived from a template
│       └── notify.json           ← parametric fan-out
├── coolify/
│   └── log-proxy.compose.yml     (for Coolify "Docker Compose" service)
└── README.md                     (operating runbook: how to add a group, tune thresholds, rotate ntfy creds)
```

## Implementation steps (ordered)

1. **Build log-proxy** locally (`log-proxy/`). Test against the remote docker socket via `docker context` or just deploy and iterate.
2. **Deploy log-proxy via Coolify** as a Docker Compose service. Attach to the same network as `n8n-o55...`. No Traefik route. Verify `GET /services` and `GET /logs/<group>` from inside the n8n container (`docker exec n8n-o55... wget -qO- http://log-proxy:8080/services`).
3. **Deploy self-hosted ntfy via Coolify** at `ntfy.rahamafresh.com`. Configure deny-all default and one publisher user. Subscribe phones to a test topic.
4. **Build the parametric Notify workflow** in n8n. Add credentials: `ntfy_publisher` (HTTP basic), `evolution_api` (header auth). Test by manually firing each branch.
5. **Build the Poll & Evaluate workflow** for **one group first** (suggest `tracksolid`, which has the highest business value). Validate thresholds with a synthetic log line (`docker exec ingest_events-bo3no... sh -c 'echo FATAL test > /proc/1/fd/1'` or similar). The redirect matters: a plain `echo` under `docker exec` prints to the exec session, not to the container's log stream.
6. **Clone the Poll workflow per remaining group**. Tune patterns / thresholds in Variables.
7. **Tune & quiet**: run for 24h, capture false positives, adjust regex / thresholds.
8. **Document** in `README.md` how to add a new group when Coolify spins up a new service.
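
Before wiring step 5 into n8n, the pattern-match stage can be prototyped offline. The sketch below is written in Python for readability; the n8n Function node itself would hold equivalent JavaScript. It assumes case-sensitive matching, first-match-wins ordering from critical down to warn, and a hypothetical `classify` helper; none of these choices are fixed by the plan.

```python
import hashlib
import re

# Default patterns from the Defaults table, highest severity first.
PATTERNS = {
    "critical": [r"panic", r"FATAL", r"OOMKilled", r"out of memory", r"segmentation fault"],
    "error": [r"\bERROR\b", r"Exception", r"Traceback", r"5\d\d "],
    "warn": [r"\bWARN(ING)?\b", r"deadlock", r"timeout"],
}


def fingerprint(group, pattern, container):
    """sha256(group:pattern:container), used later as the cooldown key."""
    return hashlib.sha256(f"{group}:{pattern}:{container}".encode()).hexdigest()


def classify(group, entry):
    """Return an alert candidate for the first matching pattern, or None."""
    for severity, patterns in PATTERNS.items():
        for pattern in patterns:
            if re.search(pattern, entry["line"]):
                return {
                    "group": group,
                    "severity": severity,
                    "pattern": pattern,
                    "container": entry["container"],
                    "ts": entry["ts"],
                    "line": entry["line"],
                    "fingerprint": fingerprint(group, pattern, entry["container"]),
                }
    return None


# Hypothetical log entry in the { container, ts, stream, line } shape.
sample = classify(
    "tracksolid",
    {"container": "ingest-a", "ts": "2026-01-01T00:00:00Z", "line": "FATAL test"},
)
print(sample["severity"], sample["pattern"])
```

Because the fingerprint excludes the log line itself, repeated occurrences of the same pattern in the same container collapse onto one cooldown key.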
## Critical files

- `log-proxy/app.py`: the only thing with docker.sock access. Treat as security-sensitive; no write endpoints, no shell-out.
- `log-proxy/groups.yml`: single source of truth for UUID → friendly name. Keep in sync as Coolify services are added.
- `n8n/workflows/notify.json`: fan-out logic; any new channel (Slack, email) is added here, not in each poll workflow.
- `n8n/workflows/poll-<group>.json`: per-group thresholds. The Variables block at the top is the only thing operators normally edit.
- `coolify/log-proxy.compose.yml`: controls log-proxy deployment + network attachment. Misconfiguring the network means n8n can't reach the proxy.

## Reused / existing infrastructure

- **n8n queue mode** `n8n-o55elukmxacgp1s2xcwktyam`: runs the workflows; its built-in Postgres + Redis cover persistence and queueing. No new DB needed.
- **Evolution API** `api-vc4ok84gw4s0kcgwwg8gooco`: already deployed; we only consume its REST API.
- **Coolify Sentinel** `coolify-sentinel`: left untouched; could later feed container-down events into the same Notify workflow if desired.
- **Coolify networks + Traefik**: handle internal service discovery and TLS for ntfy.
- **All Coolify-managed containers already carry `COOLIFY_RESOURCE_UUID`**: confirmed via `docker inspect` on the Dozzle container in the previous session. This is what makes auto-grouping possible without a hand-written container list.

## Open items to gather at implementation time

- `ntfy.rahamafresh.com` DNS record (or chosen FQDN).
- Evolution API: instance name, API key, target WhatsApp number(s).
- Confirmation of which Coolify network `n8n-o55...` runs on (read from `docker inspect` at implementation start).
- Optional: bearer token value for log-proxy if defence-in-depth is wanted.

## Verification

1. **log-proxy unit checks**: from inside the n8n container, `curl http://log-proxy:8080/services` returns all allow-listed groups; `curl "http://log-proxy:8080/logs/tracksolid?since=$(( $(date +%s) - 300 ))"` returns recent lines from all tracksolid containers. (The arithmetic form avoids GNU-only `date -d`, which BusyBox-based images may not support.)
2. **End-to-end critical alert**: run `docker exec <tracksolid-container> sh -c 'echo "FATAL synthetic test from $(date)" > /proc/1/fd/1'` so the line lands in that container's log stream; within 30s, ntfy topic `tracksolid-alerts` receives a priority-5 message AND the WhatsApp number receives the same.
3. **Threshold smoke test**: emit 11 lines containing `ERROR` to a single container over 30s; expect exactly one ntfy notification, not eleven.
4. **Cooldown smoke test**: trigger the same critical alert twice within the cooldown window; expect only one notification.
5. **Cursor durability**: restart the n8n worker; confirm the cursor in `getWorkflowStaticData` persisted in Postgres and no logs were re-processed or skipped. Run this against the activated workflow, since n8n does not save static data from manual test executions.
6. **Per-group isolation**: deliberately spam errors in one group; confirm other groups' workflows are unaffected (separate workflow = separate static data, separate schedule).
7. **Read-only safety**: from inside n8n, attempt `POST http://log-proxy:8080/anything` and expect 404/405. Confirm `docker.sock` is not mounted inside n8n.
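
Checks 3 and 4 can be rehearsed offline before touching live containers. The following is a rough Python model of the threshold and cooldown rules, not the workflow's real code (which would be JavaScript in an n8n Function node with state kept in workflow static data). It assumes "threshold crossed" means "at least N matches inside the window" and that the hit counter resets after an alert; the `Evaluator` class and `RULES` layout are hypothetical.

```python
from collections import defaultdict, deque

# Defaults from the table: (matches needed, window seconds, cooldown seconds).
RULES = {
    "critical": (1, 0, 30 * 60),
    "error": (10, 5 * 60, 15 * 60),
    "warn": (50, 15 * 60, 30 * 60),
}


class Evaluator:
    def __init__(self):
        self.hits = defaultdict(deque)  # fingerprint -> match times in the window
        self.cooldowns = {}             # fingerprint -> time the cooldown ends

    def should_alert(self, severity, fingerprint, now):
        """Record one match at time `now` (epoch seconds); True if it should notify."""
        threshold, window, cooldown = RULES[severity]
        hits = self.hits[fingerprint]
        hits.append(now)
        # Drop matches that have fallen out of the rolling window.
        while hits and hits[0] < now - window:
            hits.popleft()
        if now < self.cooldowns.get(fingerprint, 0):
            return False  # still hot: suppress
        if len(hits) < threshold:
            return False  # not enough matches yet
        self.cooldowns[fingerprint] = now + cooldown
        hits.clear()
        return True


evaluator = Evaluator()

# Check 3: eleven ERROR matches spread over 30 seconds.
error_alerts = [evaluator.should_alert("error", "fp-error", 1000 + 3 * i) for i in range(11)]

# Check 4: the same critical match twice inside the 30-minute cooldown.
critical_alerts = [
    evaluator.should_alert("critical", "fp-critical", 1000),
    evaluator.should_alert("critical", "fp-critical", 1060),
]
print(sum(error_alerts), sum(critical_alerts))
```

If the model and the live workflow disagree on either count, the difference usually points at how the window boundary or the cooldown timestamp is being compared.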