Owner: Robert Taylor (Eng) · Department: Engineering · Status: Live · Version: 1.0
Effective Date: 2026-05-13 · Last Reviewed: 2026-06-13 · Next Review Date: 2026-09-13
Source of Truth: this page · Maturity: 4 (Operational)
| Field | Value |
|---|---|
| Severity | P1 - Critical |
| PagerDuty Service | HempDash Payments |
| Escalation Policy | Payment Critical |
| Response SLA | 5 minutes |
| Owner | On-Call Engineer |
| Alert Name | CRITICAL-BIZ-PAYMENTS-FAILURE_SPIKE |
CRITICAL-BIZ-PAYMENTS-FAILURE_SPIKE alert fires (>3 failures in 10 min)

/api/v1/checkout/session

| Source | Check |
|---|---|
| PagerDuty | Incident auto-created on more than 3 failures in 10 min |
| Grafana | increase(hempdash_payment_failures_total[10m]) > 3 |
| Backend Health | GET /health — check payment_providers section |
| Circuit Breaker | GET /api/v1/system/health — check circuit breaker states |
| Sentry | Filter by tag service:payments |
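
The alert condition in the table (more than 3 failures within a 10-minute window) can be reproduced locally when replaying failure timestamps pulled from logs. The following is a minimal sketch, not the production alerting code: `failures_in_window` and `should_alert` are hypothetical helper names, and the input is assumed to be a plain list of `datetime` values.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # matches the [10m] range in the Grafana query
THRESHOLD = 3                   # alert fires only when the count exceeds this


def failures_in_window(timestamps, now, window=WINDOW):
    """Count failures whose timestamp falls inside (now - window, now]."""
    return sum(1 for ts in timestamps if now - window < ts <= now)


def should_alert(timestamps, now, threshold=THRESHOLD):
    """Return True when the failure count in the window is strictly above the threshold."""
    return failures_in_window(timestamps, now) > threshold
```

Note that the comparison is strict, so exactly 3 failures in the window would not fire under this reading of the rule.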
# Check circuit breaker states for all payment providers
curl -s https://api.gethempdash.com/api/v1/system/health | jq '.circuit_breakers'
Possible states:
-- Recent payment failures (last 30 min)
SELECT provider, status, error_message, created_at
FROM finance_ledger
WHERE status = 'FAILED'
AND created_at > NOW() - INTERVAL '30 minutes'
ORDER BY created_at DESC
LIMIT 20;
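
Once the recent-failures query has returned rows, tallying them by provider is a quick way to see whether the spike is isolated to one provider. This is a small sketch under the assumption that each row is available as a dict keyed by column name; `failures_by_provider` is a hypothetical helper, not something defined in the codebase.

```python
from collections import Counter


def failures_by_provider(rows):
    """Return (provider, failure_count) pairs, most frequent provider first."""
    return Counter(row["provider"] for row in rows).most_common()
```

If one provider accounts for nearly all rows, the provider dashboards are the natural next stop; an even spread points more toward the backend itself.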
| Provider | Dashboard | Status Page |
|---|---|---|
| PaymentCloud (PayEngine) | PayEngine merchant portal | Check with PaymentCloud support |
| Sezzle | Sezzle merchant dashboard | status.sezzle.com |
-- Stale pending payments (>24 hours)
SELECT id, provider, status, created_at
FROM payment_intent
WHERE status = 'pending'
AND created_at < NOW() - INTERVAL '24 hours';
The cleanup_stale_payments task runs at 6 AM UTC to catch these.
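
The selection rule behind the stale-payment query can be expressed in a few lines for ad-hoc checks. This is only a sketch of the filter, assuming intents are dicts with `status` and timezone-aware `created_at` fields; it does not reflect how `cleanup_stale_payments` is actually implemented, and `find_stale_intents` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)  # same cutoff as the SQL query


def find_stale_intents(intents, now, stale_after=STALE_AFTER):
    """Return intents still pending and created before (now - stale_after)."""
    cutoff = now - stale_after
    return [
        intent
        for intent in intents
        if intent["status"] == "pending" and intent["created_at"] < cutoff
    ]
```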
provider_fallback.py handles automatic failover

# Railway deployment history
railway logs --service hempdash-backend --limit 50
-- Check for constraint violations
SELECT * FROM finance_ledger
WHERE created_at > NOW() - INTERVAL '1 hour'
ORDER BY created_at DESC;
The unique constraint (order_id, provider) prevents duplicate payments. If legitimate orders are being rejected, check for duplicate payment attempts.
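
To illustrate how a unique constraint on (order_id, provider) behaves, the sketch below uses an in-memory SQLite table as a stand-in. The column set, `make_demo_db`, and `record_payment` are assumptions made for the demonstration; the real `finance_ledger` schema and database engine may differ.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with a UNIQUE (order_id, provider) constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE finance_ledger (
            id INTEGER PRIMARY KEY,
            order_id TEXT NOT NULL,
            provider TEXT NOT NULL,
            status TEXT NOT NULL,
            UNIQUE (order_id, provider)
        )
        """
    )
    return conn


def record_payment(conn, order_id, provider, status):
    """Insert a ledger row; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO finance_ledger (order_id, provider, status) VALUES (?, ?, ?)",
                (order_id, provider, status),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In this model a second attempt for the same order with the same provider is rejected, while the same order with a different provider is accepted, which is one way a customer retry could surface as a rejected legitimate order.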
If the provider has recovered but the circuit breaker hasn't cycled:
# Restart the backend service to reset in-memory state
# (Redis state has 300s TTL and will also expire)
railway service restart --service hempdash-backend
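
For context on why a breaker may lag behind a recovered provider, here is a generic circuit breaker sketch with a timed reset. It is not the code in circuit_breaker.py: the state names (`closed`, `open`, `half_open`), the `failure_threshold` of 3, and the use of 300 seconds as the reset timeout (borrowed from the Redis TTL mentioned above) are all assumptions for illustration.

```python
import time


class CircuitBreaker:
    """Generic breaker: opens after repeated failures, retries after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=300.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_timeout = reset_timeout          # assumed to mirror the 300s TTL
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # timeout elapsed: a trial request is allowed
        return "open"

    def allow_request(self):
        return self.state != "open"

    def record_failure(self):
        state = self.state
        if state == "open":
            return  # already blocking traffic
        if state == "half_open":
            self._opened_at = self._clock()  # trial failed: reopen
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self):
        # A success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
```

Under these assumptions the breaker only leaves the open state after the timeout elapses and a trial request succeeds, which is why a restart can be faster than waiting when the provider is known to be healthy.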
| Time | Action |
|---|---|
| 0 min | On-call engineer investigates |
| 5 min | If not identified, check all provider dashboards |
| 15 min | If provider-side, contact provider support |
| 30 min | If unresolved, escalate to Jonathan |
| 60 min | If revenue impact >$500, activate incident bridge |
cleanup_stale_payments handled orphaned intents

app/services/payments/unified_payment_service.py
app/services/payments/circuit_breaker.py
app/services/payments/provider_fallback.py