# n8n S3 Ticket Exports: Incremental (CDC) Stream

Updated on June 23, 2026.

> **History.** This doc previously described a full-snapshot-per-hour model
> ("No delta files … No `changes/` directories"). That is no longer accurate.
> As of the June 22, 2026 re-seed, the source switched to an **incremental
> change-data-capture (CDC) stream** under `automations//changes/`.
> The structure below was verified by direct S3 inspection of the `tickets`
> bucket on June 23, 2026. Workflow-internal details (IDs, node behaviour) are
> carried over from the prior version and may be stale; trust the bucket.

## Overview

The FTTH ticket export now writes an **incremental** CSV stream per dataset:

- The **first** file in a stream is a full current-state **baseline**.
- Every **later** file holds **only the rows that changed** since the prior
  export: new and updated tickets, keyed by `ticket_id`.
- **Deletions are never emitted** (tickets are closed in place, not removed).

Consumers must ingest **every not-yet-processed file in ascending timestamp
order** (baseline first, then each delta) and **upsert on `ticket_id`**.
Taking only the newest file silently drops the intermediate deltas.

CSV files only. Filenames use the execution time in the `Africa/Nairobi`
timezone (format below). All files share one identical flat-CSV schema
(header + rows), the same column set the previous full snapshots used.
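To see why every file must be replayed, here is a minimal sketch of the baseline-plus-deltas fold. It assumes each file has already been parsed into a list of row dicts. The `replay` helper, the ticket IDs and the status values are hypothetical. Only the ordering rule and the upsert on `ticket_id` come from this doc.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
ExportFile = Tuple[str, List[Row]]  # (filename, parsed rows)


def replay(files: List[ExportFile]) -> Dict[str, Row]:
    """Fold export files oldest-first into current state keyed by ticket_id."""
    state: Dict[str, Row] = {}
    # Fixed-width YYYY-MM-DDTHH-mm-ss names sort lexicographically in time order.
    for _name, rows in sorted(files, key=lambda f: f[0]):
        for row in rows:
            # Delta rows are full records, so replacing the whole row is the upsert.
            # No deletions are emitted, so keys are never removed.
            state[row["ticket_id"]] = row
    return state


# Hypothetical rows; only the filenames mirror the observed stream.
baseline = ("2026-06-22T15-50-39.csv", [
    {"ticket_id": "T1", "normalized_status": "open"},
    {"ticket_id": "T2", "normalized_status": "open"},
])
delta_1 = ("2026-06-22T15-53-04.csv", [{"ticket_id": "T1", "normalized_status": "closed"}])
delta_2 = ("2026-06-22T17-10-41.csv", [{"ticket_id": "T3", "normalized_status": "open"}])

full = replay([delta_2, baseline, delta_1])  # input order does not matter; replay sorts
skipped = replay([baseline, delta_2])        # intermediate delta_1 never applied
```

`full` ends with `T1` closed, `T2` open and `T3` open. `skipped` jumps from the baseline straight to a later delta, so it still reports `T1` as open. The update carried by the intermediate delta is lost, and that is the failure mode the ordering rule guards against.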
## Output Layout

The change stream lives under a `changes/` prefix per dataset in the `tickets`
bucket:

```text
automations/crq/changes/YYYY-MM-DDTHH-mm-ss.csv
automations/inc/changes/YYYY-MM-DDTHH-mm-ss.csv
```

Observed `tickets` bucket layout (June 23, 2026):

```text
automations/inc/
├── changes/            ← ACTIVE incremental stream (baseline + deltas)
│   ├── 2026-06-22T15-50-39.csv   (baseline, ~15 MB, 34,849 rows)
│   ├── 2026-06-22T15-53-04.csv   (delta, 1 row)
│   ├── 2026-06-22T17-10-41.csv   (delta, 22 rows)
│   └── 2026-06-22T17-15-41.csv   (delta, 131 rows)
├── processed/          ← our pipeline's archive of consumed files
├── full/               ← present but EMPTY (legacy prefix)
├── latest.csv/         ← present but EMPTY (legacy prefix)
└── latest.json/        ← present but EMPTY (legacy prefix)
```

Notes:

- There are **no longer any `automations/inc/.csv` files at the root** of
  `inc/`. The last full snapshots (through `2026-06-18T17-00-05.csv`) were
  archived to `processed/`. New data arrives **only** under `changes/`.
- The `full/`, `latest.csv/`, and `latest.json/` prefixes still appear in
  listings but contain **no objects** (leftover/legacy; ignore them). There is
  no `latest` pointer and no JSON/metadata envelope.
- **CRQ mirrors INC**: `automations/crq/changes/` carries the same incremental
  stream (with matching baseline timestamps and additional newer deltas). CRQ
  remains out of scope for `import_tickets.py` (INC-only), but the source-side
  shape is the same. CRQ's old root snapshots (`automations/crq/.csv`) are
  still present because nothing archives them; they are not consumed.
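Selecting the active stream from a bucket listing can be sketched as follows. The sketch assumes the keys were already fetched, for example with boto3's `list_objects_v2` paginator, which is not shown. `list_change_files` is a hypothetical helper that:

- keeps only `.csv` objects directly under `automations/inc/changes/`, which drops `processed/` and the empty legacy prefixes;
- parses each filename stamp as `Africa/Nairobi` time;
- returns the files oldest first.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

PREFIX = "automations/inc/changes/"
STAMP_FORMAT = "%Y-%m-%dT%H-%M-%S"  # the doc's YYYY-MM-DDTHH-mm-ss
# Africa/Nairobi (EAT) is a fixed UTC+3 with no daylight saving, so a fixed
# offset avoids depending on installed tz data.
EAT = timezone(timedelta(hours=3), "EAT")


def list_change_files(keys: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (execution_time, key) for CSVs directly under PREFIX, oldest first."""
    found: List[Tuple[datetime, str]] = []
    for key in keys:
        # Skips processed/, the empty legacy prefixes, and anything outside changes/.
        if not key.startswith(PREFIX) or not key.endswith(".csv"):
            continue
        name = key[len(PREFIX):-len(".csv")]
        if "/" in name:
            continue  # nested deeper than changes/, not part of the stream
        try:
            stamp = datetime.strptime(name, STAMP_FORMAT).replace(tzinfo=EAT)
        except ValueError:
            continue  # not a timestamped export file
        found.append((stamp, key))
    found.sort()
    return found


listing = [
    "automations/inc/changes/2026-06-22T17-15-41.csv",
    "automations/inc/changes/2026-06-22T15-50-39.csv",
    "automations/inc/processed/2026-06-18T17-00-05.csv",
    "automations/inc/changes/2026-06-22T17-10-41.csv",
    "automations/inc/changes/2026-06-22T15-53-04.csv",
    "automations/inc/changes/",
]
ordered = list_change_files(listing)
```

Here `ordered` holds the four `changes/` files in the order shown in the layout tree above. The `processed/` archive entry and the bare prefix key are dropped, and the baseline's stamp converts to 12:50:39 UTC. `zoneinfo.ZoneInfo("Africa/Nairobi")` would give the same offset wherever tz data is installed.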
## CSV Schema

Header (32 columns), identical across baseline and delta files:

```text
ticket_id, source_type, service_type, bucket, raw_status, normalized_status,
created_at_service, scheduled_at, closed_at, last_seen_at, first_seen_at,
week_start, week_end, cluster, region, location_name, latitude, longitude,
department, assigned_team, owner, sla_status, mttr, is_auto_created,
is_auto_closed, is_alarm, is_actionable, source_s3_bucket, source_s3_key,
source_snapshot_id, created_at, updated_at
```

Each row is a complete record (not a partial diff): a delta row carries the
ticket's full current state, so a plain upsert on `ticket_id` is correct. The
baseline still contains `is_alarm=true` rows and may include a leading
`EXPORT STOPPED…` truncation-sentinel row in `ticket_id`; both are filtered by
the consumer (see `import_tickets.py`).

## Timestamp Format

```text
YYYY-MM-DDTHH-mm-ss
e.g. 2026-06-22T15-50-39
```

Generated once at the start of each execution, formatted in `Africa/Nairobi`
(EAT). Note this is the *execution* time, not a top-of-the-hour schedule.
Incremental files appear whenever a change batch is exported, and several
within the same hour are normal (e.g. `15-50-39` then `15-53-04`). Because the
stamp is fixed-width and zero-padded, and EAT is a fixed UTC+3 with no
daylight-saving shifts, sorting filenames as strings gives the same order as
sorting by execution time.

## How `import_tickets.py` Consumes It

`python import_tickets.py --from-bucket --apply` (see `run_ingest.sh`):

1. Lists `automations/inc/changes/.csv` and sorts ascending by timestamp.
2. Skips files at or older than the **watermark**
   (`tickets.import_meta.metadata->>'source_max_key'`, the newest file already
   applied); on a fresh stream it processes everything present.
3. For each pending file, oldest→newest: drops `is_alarm=true` and sentinel
   rows, strips `DROP_FIELDS`, normalizes `region`/`raw_status`, then upserts
   on `ticket_id`. The row upsert and the watermark advance **commit in one
   transaction per file**, after which the file is moved to
   `automations/inc/processed/`.
4. A mid-run failure therefore leaves folder state consistent with the
   watermark. A failure before a file's commit rolls back its rows and the
   watermark together, so the next run processes that file again. A failure
   after the commit but before the move leaves the file in `changes/`, where
   it is skipped as at or older than the watermark. Either way, the next run
   resumes cleanly from where it stopped.

The first file applied onto an empty watermark is recorded as
`export_type="baseline"` in `tickets.import_meta`; every file after is
`"delta"`.

## Workflow Context (carried over; verify before relying on)

The export originates from the FTTH Automation Ticket export workflow, which
calls the authenticated Scoreboard export endpoint and uploads CSV(s) to the
`tickets` bucket. A sibling workflow exports `fuel_records/.csv` to the `fuel`
bucket. Source DB queries are read-only, and the workflows do not delete or
update source rows.

The previously documented workflow IDs and the claim of "two files per hour,
full snapshots, no `changes/`" predate the June 22 switch to the incremental
stream. Re-confirm them against the live n8n configuration before treating
them as current.