david kiania 5f5d71d500 feat(crq): add CRQ ingestion via shared engine + thin inc/crq entrypoints

Split the INC-only loader into a dataset-agnostic engine (pipeline.py, renamed
from import_tickets.py) parameterized by a Dataset config, with thin per-type
entrypoints inc/import_inc.py and crq/import_crq.py. CRQ shares INC's identical
32-column source schema and CDC change stream, so the engine is fully shared.

- pipeline.py: Dataset config (name/table/prefixes/key_regex/post_apply); INC
  keeps the capture_history post-apply hook, CRQ has none yet. geocode_locations
  now unions tickets.crq (geocoding is cross-dataset: one gazetteer/budget).
- crq/import_crq.py: drains automations/crq/changes/ from isptickets into
  tickets.crq (data layer + map; SLA/dashboard/history deferred).
- migrations/13_crq_columns.sql: CRQ mirror of 03 — typed STORED generated
  columns + indexes on tickets.crq (reuses tickets.eat_ts()).
- Deployment: Dockerfile/run_ingest.sh run both via `python -m`; pyproject
  packages inc/crq. Docs (README, implementation, deployment-and-operations,
  n8n export ref, phase-1) updated for the split + the one-time CRQ seed runbook.

tickets.crq already exists (mig 01, LIKE tickets.inc) and is unioned into
reporting.fn_tickets_for_map + resolve_ticket_geoms, so CRQ appears on the
existing Tickets map once seeded. Verified locally: ruff-clean new files, engine
lists/parses both streams against live S3 (crq=52 files, inc unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-25 23:16:38 +03:00

10 KiB

Raw Blame History

Deployment & Operations — fleettickets

Operational runbook for the INC + CRQ ingest pipelines as deployed on Coolify (host kianiadee@twala.rahamafresh.com, key ~/.ssh/id_ed25519). Covers the container, environment, schedule, auto-deploy webhook, the source-bucket cutover procedure, and verification. Secrets are referenced by where to retrieve them, never by value.

One image, two datasets. INC and CRQ share an identical 32-column source schema and the same isptickets bucket; they run as two Scheduled Tasks off the one container, via thin entrypoints python -m inc.import_inc / python -m crq.import_crq over the shared pipeline.py engine. Everything below applies to both unless noted.

What's deployed

Thing	Detail
Coolify app	`fleettickets` — id `15`, uuid `g14mwzo73q20g70vc6fzumya`, build pack `dockerfile`, git `main`
Container	built from this repo's `Dockerfile` (`python:3.12-slim`, `TZ=Africa/Nairobi`); kept alive with `tail -f /dev/null` (no web server)
Ingest (INC)	Coolify Scheduled Task `inc_tickets` → `python -m inc.import_inc --from-bucket --apply`
Ingest (CRQ)	Coolify Scheduled Task `crq_tickets` → `python -m crq.import_crq --from-bucket --apply`
DB	`tickets` schema in the shared `tracksolid_db` (internal host `timescale_db:5432`)
Source	`isptickets` S3 bucket, `automations/{inc,crq}/changes/<EAT-ts>.csv` CDC streams (see `../n8n-s3-ticket-exports.md` and `../README.md`)

Resolve the live container name (Coolify appends a random suffix):

ssh -i ~/.ssh/id_ed25519 kianiadee@twala.rahamafresh.com \
  'docker ps --filter name=g14mwzo73q20g70vc6fzumya --format "{{.Names}}" | head -1'

Schedule (cron)

Both Scheduled Tasks (inc_tickets, crq_tickets) run */20 6-20 * * * — every 20 min, 06:00–20:40 EAT. Coolify evaluates task cron in the server timezone (server_settings.server_timezone = Africa/Nairobi), so no UTC conversion — write EAT directly. The --from-bucket run is a cheap no-op when no new change file has arrived (watermark guard, per dataset), so a dense schedule is safe.

To change the frequency, edit the task in the Coolify UI, or in coolify-db:

UPDATE scheduled_tasks SET frequency = '*/20 6-20 * * *', updated_at = now()
WHERE name IN ('inc_tickets', 'crq_tickets');

The crq_tickets task is added the same way INC was — in the Coolify UI (Scheduled Tasks → Add) with command python -m crq.import_crq --from-bucket --apply, container fleettickets, cron */20 6-20 * * *. Coolify's scheduler re-reads scheduled_tasks each minute, so the change is picked up without a redeploy. Execution history: scheduled_task_executions.

The repo's Dockerfile, run_ingest.sh, and README.md document this same cron for the plain-host/VM fallback (CRON_TZ=Africa/Nairobi).

Environment variables

Set on the Coolify app (Environment Variables). Names only — values live in Coolify:

Var	Purpose
`DATABASE_URL`	`tracksolid_db` (internal `timescale_db:5432`)
`RUSTFS_ENDPOINT`	`https://s3.rahamafresh.com`
`RUSTFS_ACCESS_KEY` / `RUSTFS_SECRET_KEY`	`isptickets` bucket credentials
`RUSTFS_REGION`	`us-east-1`
`TICKETS_BUCKET`	`isptickets`
`GEOCODER_PROVIDER` / `GEOCODER_API_KEY`	keyed geocoder (LocationIQ/OpenCage)

Env vars are Laravel-encrypted in coolify-db — never raw-UPDATE them. Change them in the Coolify UI, or via artisan tinker (which re-encrypts on save):

ssh -i ~/.ssh/id_ed25519 kianiadee@twala.rahamafresh.com 'docker exec -i coolify php artisan tinker' <<'PHP'
$e = \App\Models\EnvironmentVariable::where('resourceable_type','App\\Models\\Application')
       ->where('resourceable_id',15)->where('key','TICKETS_BUCKET')->first();
$e->value = 'isptickets'; $e->save(); echo $e->value.PHP_EOL;
PHP

An env change only takes effect after the container is recreated (a redeploy — see below), since Coolify injects env at container create time.

Deploys

Auto-deploy (Forgejo → Coolify webhook)

A push to main should auto-deploy. This needs both the Coolify per-app Auto-Deploy toggle (Configuration → Advanced) and a webhook on the Forgejo repo. The webhook was missing originally (the toggle alone is not enough); it now exists as hook id 3 on kianiadee/fleettickets:

Field	Value
URL	`https://stage.rahamafresh.com/webhooks/source/gitea/events/manual`
Type / content-type	`gitea` / `json`
Events / branch filter	`push` / `main`
Secret	the app's `manual_webhook_secret_gitea` (Coolify HMAC-validates `X-Hub-Signature-256`)

Recreate / inspect it via the Forgejo API (auth: git credential fill, host repo.rahamafresh.com, basic auth to /api/v1 — no tea/gh needed). Get the secret by decrypting it in Coolify:

ssh -i ~/.ssh/id_ed25519 kianiadee@twala.rahamafresh.com \
  "docker exec -i coolify php artisan tinker --execute=\"echo \\App\\Models\\Application::find(15)->manual_webhook_secret_gitea;\""

# list / test the webhook (USER:PASS from git credential fill)
curl -s -u "$USER:$PASS" https://repo.rahamafresh.com/api/v1/repos/kianiadee/fleettickets/hooks
curl -s -u "$USER:$PASS" -X POST https://repo.rahamafresh.com/api/v1/repos/kianiadee/fleettickets/hooks/3/tests

A successful test shows a webhook hit in docker logs coolify (no invalid_signature audit) and a new row in application_deployment_queues.

Manual deploy (no push)

Trigger the same action as Coolify's Deploy button via tinker:

ssh -i ~/.ssh/id_ed25519 kianiadee@twala.rahamafresh.com 'docker exec -i coolify php artisan tinker' <<'PHP'
$app = \App\Models\Application::where('uuid','g14mwzo73q20g70vc6fzumya')->first();
$uuid = new \Visus\Cuid2\Cuid2;
echo json_encode(queue_application_deployment(
  application: $app, deployment_uuid: $uuid, force_rebuild: false, is_api: true)).PHP_EOL;
echo $uuid.PHP_EOL;
PHP

Watch it: SELECT id, status, created_at FROM application_deployment_queues WHERE application_id = '15' ORDER BY created_at DESC LIMIT 3; (note: application_id is the numeric id stored as text).

Source-bucket cutover (when the provider moves buckets)

If the provider moves the INC feed to a new bucket (as happened tickets → isptickets, 2026-06-25):

Inspect the new bucket (read-only) — confirm automations/{inc,crq}/changes/ layout, timestamp range, schema parity.
Update env (UI or tinker): RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY, TICKETS_BUCKET → the new bucket (endpoint usually unchanged). Both datasets read the same bucket, so one env change serves both tasks.
Reconcile the DB to current. The loader drains every changes/ file newer than the watermark (tickets.import_meta.metadata.source_max_key, per dataset), oldest→newest, upserting on ticket_id:
- If the watermark predates the new bucket's first file, a normal --from-bucket --apply drains the whole new stream — no reseed needed.
- Otherwise use --reseed (ignores the watermark, drains all changes/ once): python -m inc.import_inc --from-bucket --reseed --apply (see README "Bucket cutover"). The new stream's periodic full-state re-emissions make this converge even across the cutover gap. Idempotent upserts + never-delete make it non-destructive.
- For a one-off, you can run it in the live container with the new creds inlined: docker exec -e TICKETS_BUCKET=… -e RUSTFS_ACCESS_KEY=… -e RUSTFS_SECRET_KEY=… <container> sh -c "cd /app && python -m inc.import_inc --from-bucket --apply".
Re-geocode new clusters/locations: python -m inc.import_inc --geocode-clusters --apply then --geocode-locations --apply (cross-dataset; existing gazetteer persists; only new keys are looked up).
Redeploy so the Scheduled Task's container picks up the new env (push main → webhook, or manual deploy). Old bucket is left untouched for rollback.

Bringing CRQ online (one-time seed)

CRQ was added 2026-06-25 (data layer + map). To seed tickets.crq from zero on the live DB — once the code + migration 13_crq_columns.sql are deployed (run_migrations.py applies it on build):

Verify the migration applied: SELECT 1 FROM tickets.schema_migrations WHERE filename='13_crq_columns.sql'; and \d tickets.crq shows the typed columns.
Seed from isptickets (empty crq watermark → drains all automations/crq/changes/ files oldest→newest; the stream's periodic full-state snapshots converge to current state — same convergence the INC cutover relied on, so no --reseed needed):
```
python -m crq.import_crq --from-bucket            # dry-run first ("N of N change file(s)…")
python -m crq.import_crq --from-bucket --apply     # commit + archive to crq/processed/
```
(Or in the live container with docker exec … sh -c "cd /app && python -m crq.import_crq --from-bucket --apply".)
Geocode (cross-dataset; most clusters already resolved from INC, so few new lookups): python -m inc.import_inc --geocode-clusters --apply then --geocode-locations --apply.
Confirm CRQ on the map: SELECT reporting.fn_tickets_for_map() -> 'summary'; shows a non-zero crq count. The crq_tickets Scheduled Task then keeps it current.

Verification

DB=$(docker ps --filter name=timescale_db --format "{{.Names}}" | head -1)
docker exec -i "$DB" psql -U postgres -d tracksolid_db <<'SQL'
-- watermark + freshness
SELECT export_type, records_ingested, ingested_at, metadata->>'source_max_key'
FROM tickets.import_meta WHERE dataset='inc';
-- counts
SELECT count(*) total_inc,
       count(*) FILTER (WHERE (raw->>'is_actionable')::boolean) AS open
FROM tickets.inc;
-- map payload sanity
SELECT reporting.fn_tickets_for_map() -> 'summary' ->> 'ticket_count';
SQL

New bucket changes/ empties as files move to automations/inc/processed/.
A plain --from-bucket --apply reports "nothing new" until the next change file lands.
FleetOps Tickets map freshness reflects the new ingested_at.

Rollback

Bucket: revert the three env vars to the old bucket + creds and redeploy. The old bucket and its processed/ history are untouched; upserts are idempotent and rows are never deleted, so re-running is safe.
Cron: UPDATE scheduled_tasks SET frequency = <old> WHERE name='inc_tickets';

10 KiB Raw Blame History Unescape Escape