Owner: Robert Taylor (Eng) · Department: Engineering · Status: Live · Version: 1.0
Effective Date: 2026-05-13 · Last Reviewed: 2026-06-13 · Next Review Date: 2026-09-13
Source of Truth: this page · Maturity: 4 (Operational)
| Field | Value |
|---|---|
| Severity | P2 - High |
| PagerDuty Service | HempDash Platform |
| Escalation Policy | Platform Standard |
| Response SLA | 30 minutes |
| Owner | On-Call Engineer |
| Alert Name | WARNING-APP-WEBHOOK-PROCESSING_FAILURES |

`WARNING-APP-WEBHOOK-PROCESSING_FAILURES` alert fires (>10 failures in 5 min)

`service:webhooks`

| Provider | Endpoint | Purpose | Impact if Down |
|---|---|---|---|
| payments | /api/webhooks/payments | Payment confirmations | Orders not confirmed |
| sezzle | /api/webhooks/sezzle | BNPL payment events | Sezzle orders stuck |
| checkr | /api/webhooks/checkr | Background check results | Driver onboarding blocked |
| shippo | /api/webhooks/shippo | Shipping updates | No tracking updates |
| springbig | /api/webhooks/springbig | Loyalty events | VIP status not synced |
| mattermost | /api/webhooks/mattermost | Bot interactions | Standups broken |
| resend | /api/webhooks/resend | Email delivery events | No bounce tracking |
| github-deploy | /api/webhooks/github-deploy | Deploy notifications | No deploy tracking |
| plane-qa | /api/webhooks/plane-qa | QA ticket updates | QA workflow broken |
| erpnext-tasks | /api/webhooks/erpnext-tasks | Task sync | Task tracking gaps |

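The alert condition (>10 failures in 5 min) can be sketched as a simple window count, which may help when checking by hand whether a provider is still above the threshold. This is only an illustration: the real alert rule lives in the monitoring configuration, and `breachesThreshold` and its constants are hypothetical names, not part of the codebase.

```typescript
// Hypothetical helper, not part of the HempDash codebase.
// Values mirror the alert description: more than 10 failures within 5 minutes.
const WINDOW_MS = 5 * 60 * 1000;
const THRESHOLD = 10;

function breachesThreshold(failureTimes: Date[], now: Date): boolean {
  const windowStart = now.getTime() - WINDOW_MS;
  // Keep only failures that fall inside the trailing 5-minute window.
  const recent = failureTimes.filter((t) => {
    const ms = t.getTime();
    return ms > windowStart && ms <= now.getTime();
  });
  // The alert fires on strictly more than 10 failures.
  return recent.length > THRESHOLD;
}
```

Feeding it the `"createdAt"` values of recent `FAILED` rows gives a rough answer to "would the alert still be firing right now?".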
# Check webhook processing stats
curl -s https://automation.gethempdash.com/api/health | jq '.webhooks'
-- Recent webhook failures in the automation database
SELECT provider, status, error, "createdAt"
FROM "WebhookLog"
WHERE status = 'FAILED'
AND "createdAt" > NOW() - INTERVAL '1 hour'
ORDER BY "createdAt" DESC
LIMIT 20;
-- Failure count by provider (last hour)
SELECT provider, COUNT(*) as failures
FROM "WebhookLog"
WHERE status = 'FAILED'
AND "createdAt" > NOW() - INTERVAL '1 hour'
GROUP BY provider
ORDER BY failures DESC;
-- Get actual error messages
SELECT provider, error, payload->>'event_type' as event_type, "createdAt"
FROM "WebhookLog"
WHERE status = 'FAILED'
AND provider = '<provider_name>'
ORDER BY "createdAt" DESC
LIMIT 5;
Common error patterns:
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Webhook secret mismatch | Check/rotate secret in Doppler |
| HMAC verification failed | Payload tampered or secret wrong | Verify `<PROVIDER>_WEBHOOK_SECRET` |
| 429 Too Many Requests | Rate limit hit | Check rate limiter config |
| 500 Internal Server Error | Handler code bug | Check Sentry for stack trace |
| Timeout | Handler took too long | Check for slow DB queries or external calls |

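For the "HMAC verification failed" row, it can help to recompute a signature locally and compare it with what the provider sent. The sketch below assumes the provider signs the raw request body with HMAC-SHA256 and sends a hex digest. Algorithm, encoding, and header name differ between providers, so check the provider's documentation and `lib/utils/webhook-security.ts` before relying on it. `computeSignature` and `signatureMatches` are hypothetical helper names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HMAC-SHA256 over the raw body, hex-encoded.
// Real providers may use a different algorithm, encoding, or signed string.
function computeSignature(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function signatureMatches(secret: string, rawBody: string, receivedHex: string): boolean {
  const expected = Buffer.from(computeSignature(secret, rawBody), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws if lengths differ, so check length first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

The signature has to be computed over the exact raw bytes the provider sent. Re-serializing a parsed JSON payload usually changes whitespace or key order and makes a correct secret look wrong.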
# Test endpoint (should return 401 without valid signature)
curl -X POST https://automation.gethempdash.com/api/webhooks/<provider> \
-H "Content-Type: application/json" \
-d '{}'
Expected: 401 Unauthorized. This means the endpoint is alive and is only rejecting unsigned requests.
If you get 502 or 503: the automation service is down.
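To run the same unsigned probe against several providers at once, the status interpretation above can be wrapped in a small script. This is a sketch, assuming Node 18+ (global `fetch`); `interpretProbeStatus`, `probeWebhook`, and the `"unexpected"` bucket are hypothetical and not defined by this runbook.

```typescript
type EndpointState = "alive" | "service_down" | "unexpected";

// Provider names taken from the endpoint table in this playbook.
const PROVIDERS = [
  "payments", "sezzle", "checkr", "shippo", "springbig",
  "mattermost", "resend", "github-deploy", "plane-qa", "erpnext-tasks",
];

// 401 = endpoint alive, rejecting unsigned requests; 502/503 = automation service down.
// Anything else is flagged for manual inspection (an assumption of this sketch).
function interpretProbeStatus(status: number): EndpointState {
  if (status === 401) return "alive";
  if (status === 502 || status === 503) return "service_down";
  return "unexpected";
}

// Sends the same unsigned POST as the curl command above.
async function probeWebhook(provider: string): Promise<EndpointState> {
  const res = await fetch(`https://automation.gethempdash.com/api/webhooks/${provider}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: "{}",
  });
  return interpretProbeStatus(res.status);
}

// Usage (performs real network requests):
//   for (const p of PROVIDERS) console.log(p, await probeWebhook(p));
```

If every provider reports `service_down`, the problem is most likely the automation service itself rather than a single handler.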
The webhook-retry cron runs every 15 minutes and retries failed webhooks.
-- Pending retries
SELECT provider, COUNT(*) as pending_retries
FROM "WebhookLog"
WHERE status = 'PENDING_RETRY'
GROUP BY provider;
- Secret mismatch: `<PROVIDER>_WEBHOOK_SECRET`
- Rate limit: `lib/utils/rate-limiter.ts`
- Handler bug: `app/api/webhooks/<provider>/route.ts`

After fixing the webhook issue, you may need to backfill missed events:

- The `webhook-retry` cron will also retry any `PENDING_RETRY` entries

| Time | Action |
|---|---|
| 0 min | On-call identifies failing provider and error type |
| 15 min | Apply fix (secret rotation, code fix, rate limit adjustment) |
| 30 min | If data loss, begin backfill from provider |
| 60 min | Escalate to Jonathan if payment or order webhooks are affected |
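
The timeline above can also be expressed as data, for example if a bot or dashboard should show the current step of an incident. A minimal sketch: `TIMELINE` copies the table rows, and `currentStep` is a hypothetical helper name.

```typescript
interface TimelineStep {
  atMinute: number;
  action: string;
}

// Rows copied from the escalation timeline table.
const TIMELINE: TimelineStep[] = [
  { atMinute: 0, action: "On-call identifies failing provider and error type" },
  { atMinute: 15, action: "Apply fix (secret rotation, code fix, rate limit adjustment)" },
  { atMinute: 30, action: "If data loss, begin backfill from provider" },
  { atMinute: 60, action: "Escalate to Jonathan if payment or order webhooks are affected" },
];

// Returns the latest step whose start time has been reached.
function currentStep(elapsedMinutes: number): TimelineStep {
  let step = TIMELINE[0];
  for (const candidate of TIMELINE) {
    if (candidate.atMinute <= elapsedMinutes) step = candidate;
  }
  return step;
}
```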

Check `WebhookLog` for new successes.

Related files: `lib/utils/webhook-security.ts`, `app/api/webhooks/*/route.ts`