Owner: Robert Taylor (Eng) · Department: Engineering · Status: Live · Version: 1.0
Effective Date: 2026-05-13 · Last Reviewed: 2026-06-13 · Next Review Date: 2026-09-13
Source of Truth: this page · Maturity: 4 (Operational)
| Field | Value |
|---|---|
| Severity | P2 - High |
| PagerDuty Service | HempDash Backend |
| Escalation Policy | Platform Standard |
| Response SLA | 15 minutes |
| Owner | On-Call Engineer |
| Alert Name | WARNING-APP-CRON-FAILED_JOBS |
Celery is the background task system for the Python backend. It handles:
| Task | Schedule | Queue | Impact if Down |
|---|---|---|---|
| detect_stuck_orders | Every 5 min | default | No SLA breach detection |
| monitor_driver_eta_and_push | Every 10s | default | No real-time ETA updates |
| nightly_vendor_settlement_batch | Midnight UTC | settlement_queue | Vendor payouts delayed |
| reconcile_daily_payouts | 2 AM UTC | default | No payout reconciliation |
| cleanup_stale_payments | 6 AM UTC | default | Stale payment intents accumulate |
| poll_open_checkr_reports_task | Every 6 min | checkr_polling_queue | Background checks stuck |
| process_subscription_renewals_task | Every 1 min | subscription_queue | Renewals delayed |
| check_training_expiry_task | Midnight UTC | default | No expiry notifications |
Broker: Redis (via CELERY_BROKER_URL)
Backend: Redis (via CELERY_BACKEND_URL)
Beat Scheduler: Celery beat (runs on the same Railway service)
Monitoring: Prometheus exporter on port 9808
celery_workers_active gauge drops to 0
# Check if celery worker is running via Railway
railway logs --service hempdash-backend-worker --limit 50
# Check Prometheus metrics (if available)
curl -s http://localhost:9808/metrics | grep celery_workers
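The metrics check above can also be scripted. The following is a minimal sketch, assuming the exporter serves `celery_workers_active` in the standard Prometheus text format on port 9808; the helper names `fetch_metrics`, `read_gauge`, and `workers_down` are illustrative and not part of the HempDash codebase.

```python
from typing import Optional
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:9808/metrics", timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the exporter."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def read_gauge(metrics_text: str, name: str) -> Optional[float]:
    """Sum all samples of `name` (one per label set); None if the metric is absent."""
    total = None
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        # A sample looks like `name value` or `name{labels} value`.
        if line.startswith(name + " "):
            rest = line[len(name):]
        elif line.startswith(name + "{"):
            rest = line[line.rfind("}") + 1:]
        else:
            continue
        fields = rest.split()
        if not fields:
            continue
        value = float(fields[0])
        total = value if total is None else total + value
    return total


def workers_down(metrics_text: str) -> bool:
    """Treat a missing gauge or a value of 0 as 'no active workers'."""
    active = read_gauge(metrics_text, "celery_workers_active")
    return active is None or active == 0
```

Run on the worker host, `workers_down(fetch_metrics())` returns `True` when the gauge is missing or reads 0, which is the condition described above.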
-- Check if scheduled tasks ran recently
-- (from the backend database)
SELECT task_name, last_run_at, status
FROM celery_task_log
ORDER BY last_run_at DESC
LIMIT 10;
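To turn the query result into a pass/fail check, each task's `last_run_at` can be compared against the interval listed in the schedule table above. This is a minimal sketch, assuming timestamps are timezone-aware UTC values and that a task counts as overdue once more than two intervals have elapsed; the `grace_factor` threshold and the function names are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Expected gap between runs, taken from the schedule table.
# Daily tasks are treated as a 24-hour interval.
EXPECTED_INTERVAL = {
    "detect_stuck_orders": timedelta(minutes=5),
    "monitor_driver_eta_and_push": timedelta(seconds=10),
    "nightly_vendor_settlement_batch": timedelta(days=1),
    "reconcile_daily_payouts": timedelta(days=1),
    "cleanup_stale_payments": timedelta(days=1),
    "poll_open_checkr_reports_task": timedelta(minutes=6),
    "process_subscription_renewals_task": timedelta(minutes=1),
    "check_training_expiry_task": timedelta(days=1),
}


def is_overdue(task_name, last_run_at, now, grace_factor=2.0):
    """True if more than grace_factor x the expected interval has elapsed."""
    interval = EXPECTED_INTERVAL[task_name]
    return (now - last_run_at) > interval * grace_factor


def overdue_tasks(rows, now=None):
    """rows: iterable of (task_name, last_run_at) pairs, e.g. from the query above.

    Tasks not listed in EXPECTED_INTERVAL are ignored.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, last_run in rows
        if name in EXPECTED_INTERVAL and is_overdue(name, last_run, now)
    )
```

A non-empty result from `overdue_tasks` suggests the beat scheduler or the relevant queue's worker has stopped, even if the process itself still appears to be running.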
# Test Redis connectivity
railway run redis-cli ping
# Expected: PONG
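Where `redis-cli` is not available in the container, the same probe can be approximated from Python using only the standard library. This sketch assumes `CELERY_BROKER_URL` is a plain `redis://` URL without TLS or password authentication; `broker_address` and `redis_ping` are hypothetical helper names.

```python
import socket
from urllib.parse import urlparse


def broker_address(broker_url):
    """Extract (host, port) from a redis:// URL; the port defaults to 6379."""
    parsed = urlparse(broker_url)
    return parsed.hostname, parsed.port or 6379


def redis_ping(broker_url, timeout=2.0):
    """Send an inline PING and return True if the server answers +PONG.

    Returns False on connection errors or any other reply, for example an
    authentication error when the server requires a password.
    """
    host, port = broker_address(broker_url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        return False
    return reply.startswith(b"+PONG")
```

For example, `redis_ping(os.environ["CELERY_BROKER_URL"])` (after `import os`) returning `False` points at the broker rather than the worker process.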
If Redis is down, ALL Celery tasks stop: the broker is the nervous system.
Railway kills processes that exceed memory limits. Check:
Killed or OOM messages
# Check DLQ size in worker logs
railway logs --service hempdash-backend-worker | grep "DLQ"
Tasks that exceed max retries or fail critical validation go to the DLQ. A growing DLQ indicates a systemic issue.
# Check Sentry for celery task errors
# Filter by: service=celery, transaction=<task_name>
# Via Railway CLI
railway service restart --service hempdash-backend-worker
Or via Railway Dashboard: Service > Settings > Restart
The worker will:
Note: Tasks submitted while Redis was down are lost. The beat scheduler will re-submit periodic tasks on the next schedule cycle.
STARTED status that never finish:
railway logs --service hempdash-backend-worker | grep "STARTED"
acks_late=True
| Time | Action |
|---|---|
| 0 min | On-call checks worker status and restarts if needed |
| 15 min | If restart fails, check Redis broker |
| 30 min | If recurring OOM, increase resources or identify memory leak |
| 60 min | Escalate to Jonathan if settlement/payout tasks are affected |
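For tooling that annotates an open incident with the expected next action, the timeline above can be encoded as a lookup. This is a sketch only: the `current_step` helper is hypothetical, and the thresholds and action text are copied from the table.

```python
import bisect

# (minutes since alert, action) pairs from the escalation timeline.
ESCALATION_STEPS = [
    (0, "On-call checks worker status and restarts if needed"),
    (15, "If restart fails, check Redis broker"),
    (30, "If recurring OOM, increase resources or identify memory leak"),
    (60, "Escalate to Jonathan if settlement/payout tasks are affected"),
]


def current_step(minutes_since_alert):
    """Return the latest timeline action whose start time has been reached."""
    if minutes_since_alert < 0:
        raise ValueError("minutes_since_alert must be non-negative")
    thresholds = [minutes for minutes, _ in ESCALATION_STEPS]
    # bisect_right finds how many thresholds are <= the elapsed time.
    index = bisect.bisect_right(thresholds, minutes_since_alert) - 1
    return ESCALATION_STEPS[index][1]
```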
nightly_vendor_settlement_batch if missed
reconcile_daily_payouts if missed
celery_task_failures_total for patterns
app/worker.py (Celery app definition)
celerybeat_schedule.py (task schedules)
app/tasks/ (40+ task files)
app/monitoring/celery_monitor.py