Owner: Robert Taylor (Eng) · Department: Engineering · Status: Live · Version: 1.0
Effective Date: 2026-05-13 · Last Reviewed: 2026-06-13 · Next Review Date: 2026-09-13
Source of Truth: this page · Maturity: 4 (Operational)
| Field | Value |
|---|---|
| Severity | P2 - High |
| PagerDuty Service | HempDash Platform |
| Escalation Policy | Platform Standard |
| Response SLA | 15 minutes |
| Owner | On-Call Engineer |
| Alert Name | WARNING-INT-*-DEGRADED or CRITICAL-INT-*-DOWN |
A circuit breaker protects the platform from cascading failures when an external service is down. When it opens, calls to that service are rejected immediately instead of being sent to the failing dependency.
This is protective behavior, not a bug. The question is: why did the service fail 5 times in a row?
| Service | Threshold | Cooldown | Recovery | File |
|---|---|---|---|---|
| PaymentCloud | 5 failures | 60s | 2 successes | app/services/payments/circuit_breaker.py |
| Sezzle | 5 failures | 60s | 2 successes | same |
| Dwolla | 5 failures | 60s | 2 successes | same |
| Stripe | 5 failures | 60s | 2 successes | same |
Config env vars: CIRCUIT_BREAKER_FAILURE_THRESHOLD, CIRCUIT_BREAKER_TIMEOUT_SECONDS, CIRCUIT_BREAKER_SUCCESS_THRESHOLD
State stored in Redis with 300s TTL. Falls back to in-memory if Redis unavailable.
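As a rough illustration of how the three environment variables above could map onto the thresholds in the table, here is a minimal sketch. The `BreakerConfig` class and its field names are hypothetical and are not taken from `circuit_breaker.py`; the defaults (5 failures, 60s, 2 successes) simply mirror the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Defaults mirror the payment-provider table; the real module may differ.
DEFAULT_FAILURE_THRESHOLD = 5
DEFAULT_TIMEOUT_SECONDS = 60
DEFAULT_SUCCESS_THRESHOLD = 2


@dataclass(frozen=True)
class BreakerConfig:
    """Hypothetical container for the breaker settings (illustrative only)."""

    failure_threshold: int = DEFAULT_FAILURE_THRESHOLD  # failures before opening
    timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS      # cooldown while open
    success_threshold: int = DEFAULT_SUCCESS_THRESHOLD  # successes needed to close

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "BreakerConfig":
        # Use the documented variable names; fall back to defaults when unset.
        return cls(
            failure_threshold=int(
                env.get("CIRCUIT_BREAKER_FAILURE_THRESHOLD", DEFAULT_FAILURE_THRESHOLD)
            ),
            timeout_seconds=int(
                env.get("CIRCUIT_BREAKER_TIMEOUT_SECONDS", DEFAULT_TIMEOUT_SECONDS)
            ),
            success_threshold=int(
                env.get("CIRCUIT_BREAKER_SUCCESS_THRESHOLD", DEFAULT_SUCCESS_THRESHOLD)
            ),
        )


if __name__ == "__main__":
    print(BreakerConfig.from_env())
```

When an alert fires, comparing the live values of these variables against the table is a quick way to rule out a misconfigured threshold.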
| Service | Threshold | Cooldown | File |
|---|---|---|---|
| ERPNext | 5 failures | 30s | lib/utils/circuit-breaker.ts |
| Mattermost | 5 failures | 30s | same |
| Resend | 5 failures | 30s | same |
State held in-memory only. Restarting the service resets all breakers.
# Backend circuit breakers (payment providers)
curl -s https://api.gethempdash.com/health | jq '.circuit_breakers'
# Automation circuit breakers (ERPNext, Mattermost, Resend)
curl -s https://automation.gethempdash.com/api/health | jq '.circuitBreakers'
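If the health output is long, a small helper can pick out the breakers that are not closed. This is only a sketch: the top-level keys `circuit_breakers` and `circuitBreakers` come from the jq filters above, but the shape of each entry (a state string, or an object with a `state` field) is an assumption and should be checked against the real response.

```python
import json
from typing import Any


def open_breakers(payload: dict[str, Any], key: str = "circuit_breakers") -> list[str]:
    """Return the names of services whose breaker is not CLOSED."""
    breakers = payload.get(key) or {}
    flagged = []
    for service, info in breakers.items():
        # ASSUMPTION: each entry is either a state string or an object
        # carrying a "state" field; adjust to the actual payload.
        state = info.get("state") if isinstance(info, dict) else info
        if str(state).upper() != "CLOSED":
            flagged.append(service)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up sample payload for illustration, not real endpoint output.
    sample = json.loads(
        '{"circuit_breakers": {"stripe": {"state": "CLOSED"}, "sezzle": {"state": "OPEN"}}}'
    )
    print(open_breakers(sample))  # prints ['sezzle']
```

For the automation endpoint, pass `key="circuitBreakers"`.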
Check the health endpoints above. Note the service name and current state.
| If the service is... | Check |
|---|---|
| ERPNext | ERPNext server status, API creds (ERPNEXT_API_KEY), network connectivity |
| Mattermost | Mattermost server status, bot token (MATTERMOST_BOT_TOKEN), rate limits |
| Resend | Resend dashboard, API key, sending limits, bounce rates |
| PaymentCloud | Provider dashboard, merchant account status |
| Sezzle | Sezzle merchant portal, API creds |
| Stripe | Stripe dashboard, Connect account status |
# ERPNext
curl -s -H "Authorization: token $ERPNEXT_API_KEY:$ERPNEXT_API_SECRET" \
"$ERPNEXT_URL/api/method/frappe.auth.get_logged_user"
# Mattermost
curl -s -H "Authorization: Bearer $MATTERMOST_BOT_TOKEN" \
"$MATTERMOST_URL/api/v4/users/me"
# Resend
curl -s -H "Authorization: Bearer $RESEND_API_KEY" \
"https://api.resend.com/domains"
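Wait. After the cooldown (30s for the automation breakers, 60s for the payment breakers), the breaker moves to HALF_OPEN on its own, lets a test request through, and closes if it succeeds. The payment breakers need 2 successes before they close.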
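The recovery cycle described here can be sketched as a small state machine. This is not the code in `circuit_breaker.py` or `circuit-breaker.ts`; the class, method names, and injected clock are hypothetical. The defaults follow the payment-provider table (5 failures, 60s cooldown, 2 successes); setting `success_threshold=1` would match a breaker that closes on the first successful test request.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Illustrative CLOSED -> OPEN -> HALF_OPEN -> CLOSED breaker."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_seconds: float = 60.0,
        success_threshold: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        # While OPEN, reject calls until the cooldown has elapsed,
        # then move to HALF_OPEN and let a test request through.
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.cooldown_seconds:
                return False
            self.state = "HALF_OPEN"
            self._successes = 0
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0  # a success breaks the failure streak

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # failed test request: reopen and restart cooldown
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```

In this sketch a failed test request in HALF_OPEN reopens the breaker and restarts the cooldown, which is why an alert can persist across several cooldown periods while the upstream service is still unhealthy.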
This can happen if the service had a brief blip that triggered 5 failures. To reset:
# Automation (Node.js) — restart resets in-memory state
railway service restart --service hempdash-automation
# Backend (Python) — Redis state has 300s TTL, wait or restart
railway service restart --service hempdash-backend
| Time | Action |
|---|---|
| 0 min | On-call checks which breaker is open and tests service |
| 15 min | If service is down, check status page and contact support |
| 30 min | If credential issue, check Doppler rotation history |
| 60 min | Escalate to Jonathan if business-impacting and unresolved |
Source files: lib/utils/circuit-breaker.ts, app/services/payments/circuit_breaker.py