Error Handling & Monitoring
This document describes error scenarios, logging, monitoring, and recovery strategies for the donation platform.
Error Categories
1. User Input Errors
Cause: Invalid or missing data from the user
Examples:
- Empty email field
- Invalid email format
- Amount less than minimum (€1.00)
- Unsupported currency
Handling:
User Experience:
- Frontend validation before API call
- Server returns 422 Unprocessable Entity
- Error messages displayed in UI
- User can correct and retry
Response Example:
{
  "email": ["can't be blank", "is invalid"],
  "amount_cents": ["value must be at least €1.00"]
}
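The server-side checks behind this response can be sketched in plain Ruby. In the Rails app they would normally be ActiveModel validations; the currency list and the email pattern below are assumptions for illustration, not the platform's actual rules.

```ruby
# Hypothetical server-side checks that yield the 422 error hash shown above.
MIN_AMOUNT_CENTS = 100 # €1.00
SUPPORTED_CURRENCIES = %w[eur usd gbp].freeze # assumption: replace with the real list
EMAIL_FORMAT = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/ # deliberately loose format check

# Appends a message to a field's error list, creating the list on first use.
def add_error(errors, field, message)
  (errors[field] ||= []) << message
end

# Returns a hash of field => messages; an empty hash means the params are valid.
def validate_donation(params)
  errors = {}

  email = params[:email].to_s.strip
  add_error(errors, "email", "can't be blank") if email.empty?
  add_error(errors, "email", "is invalid") unless email.match?(EMAIL_FORMAT)

  if params[:amount_cents].to_i < MIN_AMOUNT_CENTS
    add_error(errors, "amount_cents", "value must be at least €1.00")
  end

  unless SUPPORTED_CURRENCIES.include?(params[:currency].to_s.downcase)
    add_error(errors, "currency", "is not supported")
  end

  errors
end
```

An empty email fails both the presence and the format check, which is why the example response lists two messages for `email`.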
2. Payment Errors
Cause: Payment processing failures
Card Declined
Common Card Error Codes:
- card_declined - General decline
- insufficient_funds - Not enough balance
- lost_card - Card reported lost
- stolen_card - Card reported stolen
- expired_card - Card expired
- incorrect_cvc - Wrong CVC
- incorrect_number - Invalid card number
- processing_error - Temporary issue
User Experience:
- Error message displayed with reason
- User can try different card
- Support email provided
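One way to turn these codes into donor-facing messages is a lookup table with a generic fallback. The sketch assumes the caller passes the Stripe error's `code` and, when present, its `decline_code` (Stripe reports values such as `insufficient_funds`, `lost_card` and `stolen_card` as decline codes under a `card_declined` error). The message wording is hypothetical. `lost_card` and `stolen_card` deliberately share the generic decline message, in line with Stripe's guidance not to reveal those reasons to the customer.

```ruby
# Hypothetical donor-facing messages keyed by the Stripe codes listed above.
GENERIC_DECLINE = "Your card was declined. Please try a different card.".freeze

CARD_ERROR_MESSAGES = {
  "card_declined" => GENERIC_DECLINE,
  "insufficient_funds" => "Your card has insufficient funds. Please try a different card.",
  "lost_card" => GENERIC_DECLINE,   # do not reveal the real reason
  "stolen_card" => GENERIC_DECLINE, # do not reveal the real reason
  "expired_card" => "Your card has expired. Please check the expiry date or use another card.",
  "incorrect_cvc" => "The security code (CVC) is incorrect.",
  "incorrect_number" => "The card number is incorrect.",
  "processing_error" => "A temporary error occurred. Please try again in a moment."
}.freeze

DEFAULT_CARD_MESSAGE = "Your payment could not be processed. Please try another card or contact support.".freeze

# Prefers the more specific decline code when we have a message for it.
def donor_message_for(code:, decline_code: nil)
  key = CARD_ERROR_MESSAGES.key?(decline_code.to_s) ? decline_code.to_s : code.to_s
  CARD_ERROR_MESSAGES.fetch(key, DEFAULT_CARD_MESSAGE)
end
```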
Authentication Required (3D Secure)
User Experience:
- Redirected to bank's authentication page
- Enter code or approve via app
- Return to donation page
- Success or failure message
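The flow above is driven by the PaymentIntent's `status`. A minimal sketch of how the donation page might decide what to show, assuming the PaymentIntent is available as a parsed hash; the returned actions and messages are hypothetical, while the status values are Stripe's own. Stripe.js normally performs the bank redirect on the client; after a failed authentication the PaymentIntent returns to `requires_payment_method`.

```ruby
# Hypothetical mapping from a PaymentIntent status to the donation page's next step.
def next_step_for(payment_intent)
  case payment_intent["status"]
  when "succeeded"
    { action: :show_success }
  when "processing"
    { action: :show_pending } # final result arrives via webhook
  when "requires_action"
    # 3D Secure: the donor must authenticate with their bank first
    { action: :authenticate, next_action: payment_intent.dig("next_action", "type") }
  when "requires_payment_method"
    { action: :show_error, message: "Authentication failed or the card was declined. Please try another card." }
  else
    { action: :show_error, message: "Unexpected payment state." }
  end
end
```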
Network Errors
Handling:
- Automatic retry with exponential backoff
- Max 25 retries (Sidekiq default)
- After all retries are exhausted: job moves to the dead queue and requires manual intervention
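To get a feel for how long 25 retries actually last, the sketch below reproduces the delay Sidekiq applies by default, roughly `count**4 + 15` seconds plus random jitter. Treat the exact formula as an assumption and check it against the Sidekiq version in use.

```ruby
# Approximation of Sidekiq's default retry delay: count**4 + 15 seconds plus jitter.
# Assumption: verify the formula against the deployed Sidekiq version.
MAX_RETRIES = 25 # Sidekiq default

def retry_delay_seconds(count, rng: Random.new)
  jitter = rng.rand(10) * (count + 1) # spreads out retries of many failing jobs
  (count**4) + 15 + jitter
end

# Lower bound on the total wait across all retries (jitter ignored).
def minimum_retry_window_seconds(max_retries = MAX_RETRIES)
  (0...max_retries).sum { |count| (count**4) + 15 }
end

days = minimum_retry_window_seconds / 86_400.0
puts format("%d retries span at least %.1f days", MAX_RETRIES, days)
```

Without jitter the 25 delays add up to 1,763,395 seconds, a little over 20 days, so a job that keeps failing reaches the dead queue roughly three weeks after its first failure.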
3. Webhook Processing Errors
Signature Verification Failed
Causes:
- Wrong signing secret configured
- Request body modified (middleware/proxy)
- Timestamp too old (>5 minutes)
- Replay attack attempt
Impact:
- Webhook rejected
- Stripe retries delivery with exponential backoff for up to 3 days (72 hours) in live mode
- Manual investigation may be needed
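Each cause above corresponds to a specific check. In the app, the stripe gem's `Stripe::Webhook.construct_event` performs them; the standard-library sketch below reimplements Stripe's documented scheme (HMAC-SHA256 over `"#{timestamp}.#{payload}"`, compared against the `v1` entries of the `Stripe-Signature` header) for illustration only, so it is clear which failure comes from which check.

```ruby
require "openssl"

# Illustration of Stripe's signature scheme; use Stripe::Webhook.construct_event in the app.
TOLERANCE_SECONDS = 300 # 5 minutes, Stripe's default tolerance

class SignatureError < StandardError; end

# Constant-time comparison so timing does not leak how many characters matched.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# payload must be the raw request body exactly as received (not re-serialized JSON).
def verify_stripe_signature!(payload, header, secret, now: Time.now.to_i)
  pairs = header.to_s.split(",").map { |part| part.strip.split("=", 2) }
  timestamp = pairs.find { |key, _| key == "t" }&.last
  signatures = pairs.select { |key, _| key == "v1" }.map(&:last)
  raise SignatureError, "malformed signature header" if timestamp.nil? || signatures.empty?

  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, "#{timestamp}.#{payload}")
  unless signatures.any? { |sig| secure_compare(sig.to_s, expected) }
    # Wrong signing secret or a body modified by middleware/proxy ends up here.
    raise SignatureError, "no matching signature"
  end
  # Old timestamps (including replayed requests) are rejected here.
  raise SignatureError, "timestamp outside tolerance" if now - timestamp.to_i > TOLERANCE_SECONDS

  true
end
```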
Event Processing Failed
Handling:
- Job retried automatically (Sidekiq)
- Exponential backoff between retries
- After 25 retries are exhausted: moved to the dead queue
- Staff notified via error tracking
4. Fraud Detection
Multiple Failed Attempts
Indicators:
- More than 3 failed attempts in 30 minutes
- Stripe fraud detection flag
- Unusual geographic location
- High-value transactions from new donors
Actions:
- PaymentIntent canceled immediately
- Staff alerted with high priority
- Further payments from the donor blocked
- Investigation required before a retry is allowed
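The first indicator (more than 3 failed attempts in 30 minutes) is a sliding-window count. A minimal in-memory sketch, assuming failures are keyed by something that identifies the donor (an email or card fingerprint, for example) and timestamps are epoch seconds. A multi-process deployment would need shared storage such as Redis instead of a Ruby hash.

```ruby
# Hypothetical sliding-window counter for "more than 3 failed attempts in 30 minutes".
# In-memory only: several app processes would need shared storage (e.g. Redis).
class FailedAttemptTracker
  WINDOW_SECONDS = 30 * 60
  MAX_FAILURES = 3

  def initialize
    @failures = Hash.new { |hash, key| hash[key] = [] }
  end

  # key identifies the donor (assumption: email or card fingerprint); at is epoch seconds.
  def record_failure(key, at:)
    @failures[key] << at
  end

  # True once the key has more than MAX_FAILURES failures in the window ending at now.
  def suspicious?(key, now:)
    @failures[key].select! { |t| now - t <= WINDOW_SECONDS } # drop expired entries
    @failures[key].size > MAX_FAILURES
  end
end
```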
Stripe Radar Flags
Stripe Radar automatically assesses each charge and flags or blocks suspicious transactions.
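Radar's assessment is exposed on the Charge object as `outcome.type` (for example `blocked` or `manual_review`) and `outcome.risk_level` (for example `normal`, `elevated` or `highest`). A sketch of triaging a charge from a webhook payload; the returned symbols and the choice of which outcomes alert staff are assumptions.

```ruby
# Hypothetical triage of the Radar outcome attached to a Charge (parsed webhook payload).
def radar_triage(charge)
  outcome = charge["outcome"] || {}
  if outcome["type"] == "blocked"
    :blocked # Radar blocked the payment, so nothing was charged
  elsif outcome["risk_level"] == "highest"
    :alert_high_priority
  elsif outcome["type"] == "manual_review" || outcome["risk_level"] == "elevated"
    :review
  else
    :ok
  end
end
```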
5. Subscription Errors
Failed Recurring Payment
Stripe's Automatic Retry:
- First retry: 3 days after failure
- Second retry: 5 days after first retry
- Third retry: 7 days after second retry
- If the third retry also fails: subscription canceled automatically
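The retries themselves are scheduled by Stripe Billing, but the app may still want the dates, for example to tell the donor when the final attempt will happen. A sketch that derives them from the gaps above; the helper name is hypothetical.

```ruby
require "date"

# Gaps between attempts as described above: 3, then 5, then 7 days.
RETRY_GAPS_DAYS = [3, 5, 7].freeze

# Returns the dates of the three automatic retries after the first failed charge.
def retry_dates(first_failure_on)
  date = first_failure_on
  RETRY_GAPS_DAYS.map { |gap| date += gap }
end
```

For a first failure on 1 January the retries fall on 4, 9 and 16 January, so the grace period before cancellation is about 15 days.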
User Experience:
- Email notification after each failure
- Link to update payment method
- Grace period before cancellation
- Can update payment method to prevent cancellation
Subscription Creation Failed
Handling:
- Error message displayed to user
- User can try different payment method
- No subscription created in Stripe or database
- Clean state, can retry from beginning
6. Database Errors
Duplicate Transaction
Scenario: Webhook replayed or delivered twice
Handling:
- Unique constraint on transaction_id
- Duplicate insert is rejected by the database and the error is rescued rather than raised
- No duplicate transaction created
- Idempotent webhook processing
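The pattern is "insert, and treat a unique-constraint violation as already processed". With ActiveRecord the violation surfaces as `ActiveRecord::RecordNotUnique`; the sketch below uses a hypothetical in-memory store and error class so it runs on its own.

```ruby
# Hypothetical in-memory stand-in for the transactions table and its unique index.
class DuplicateRecord < StandardError; end

class TransactionStore
  def initialize
    @rows = {}
  end

  # Mimics the unique constraint on transaction_id.
  def insert!(transaction_id, attributes)
    raise DuplicateRecord, transaction_id if @rows.key?(transaction_id)
    @rows[transaction_id] = attributes
  end

  def count
    @rows.size
  end
end

# Returns :created the first time and :duplicate on a replayed webhook, without raising.
def record_transaction(store, transaction_id, attributes)
  store.insert!(transaction_id, attributes)
  :created
rescue DuplicateRecord
  :duplicate
end
```

Relying on the database constraint rather than a "check, then insert" query also covers two deliveries of the same event being processed concurrently.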
Missing Donor
Scenario: Donor exists but Stripe customer ID not yet saved
Handling:
- Retrieve customer from Stripe API
- Find donor by email
- Update donor with customer ID
- Continue processing normally
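The four steps above can be sketched as a small function. `fetch_customer` stands in for the Stripe API call (with the stripe gem it would wrap `Stripe::Customer.retrieve`), and `donors` stands in for the donors table keyed by lowercase email; both, and the case-insensitive email match, are assumptions.

```ruby
# Hypothetical backfill of a donor's Stripe customer ID.
# donors: hash of lowercase email => donor attributes (stand-in for the donors table).
# fetch_customer: callable returning a customer hash (stand-in for the Stripe API).
def ensure_customer_id!(donors, customer_id, fetch_customer)
  existing = donors.values.find { |d| d[:stripe_customer_id] == customer_id }
  return existing if existing # already linked: no API call needed

  customer = fetch_customer.call(customer_id)
  email = customer["email"].to_s.downcase
  donor = donors[email] or raise KeyError, "no donor with email #{email}"
  donor[:stripe_customer_id] = customer_id # persist the link for later events
  donor
end
```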
7. Configuration Errors
Missing API Keys
Prevention:
- Secrets validation on boot
- Environment-specific configuration
- CI/CD checks
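Validation on boot can be as simple as checking a list of required variables and failing with all missing names at once, for example from a Rails initializer. The variable names below are hypothetical; the real list depends on the app's configuration.

```ruby
# Hypothetical list of required secrets; real names depend on the app's configuration.
REQUIRED_SECRETS = %w[STRIPE_SECRET_KEY STRIPE_WEBHOOK_SECRET ROLLBAR_ACCESS_TOKEN].freeze

class MissingSecretsError < StandardError; end

# Raises at boot, listing every missing or blank secret in one message.
def validate_secrets!(env = ENV)
  missing = REQUIRED_SECRETS.select { |name| env[name].to_s.strip.empty? }
  raise MissingSecretsError, "Missing secrets: #{missing.join(', ')}" unless missing.empty?
  true
end
```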
Missing Feature Flags
Handling:
- Feature disabled by default
- Safe fallback behavior
- Admin can enable when ready
Logging Strategy
Log Levels
DEBUG
Development-only details:
INFO
Normal operation events:
WARN
Recoverable issues:
ERROR
Serious problems:
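The level split works by raising the threshold in production so DEBUG output is dropped. The sketch uses Ruby's standard `Logger` only so it runs on its own (the app itself uses SemanticLogger); the example messages are hypothetical.

```ruby
require "logger"
require "stringio"

# Level filtering with the standard Logger; SemanticLogger follows the same idea.
def build_logger(io, production:)
  logger = Logger.new(io)
  logger.level = production ? Logger::INFO : Logger::DEBUG # DEBUG is development-only
  logger
end

io = StringIO.new
logger = build_logger(io, production: true)
logger.debug("Raw Stripe API response received")        # dropped in production
logger.info("Processing stripe event evt_123")          # normal operation
logger.warn("Stripe API timeout, job will be retried")  # recoverable issue
logger.error("Webhook signature verification failed")   # serious problem
```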
Structured Logging
Using SemanticLogger:
Log Output:
[2024-01-15T10:30:45.123Z] [INFO] [request_id=abc123] [ip=192.168.1.1] Processing stripe event evt_123 of type charge.succeeded
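In the app these named tags are typically attached per request through rails_semantic_logger's `config.log_tags`. The sketch below only reproduces the line format with a custom formatter for the standard `Logger`; the `StructuredFormatter` class is hypothetical and takes a fixed tag hash in place of real per-request context.

```ruby
require "logger"
require "stringio"

# Reproduces the log line format shown above with the standard Logger.
# tags stands in for per-request named tags such as request_id and ip.
class StructuredFormatter
  def initialize(tags = {})
    @tags = tags
  end

  def call(severity, time, _progname, msg)
    stamp = time.getutc.strftime("%Y-%m-%dT%H:%M:%S.%LZ") # ISO 8601, milliseconds, UTC
    parts = ["[#{stamp}]", "[#{severity}]"]
    parts.concat(@tags.map { |key, value| "[#{key}=#{value}]" })
    parts << msg.to_s
    parts.join(" ") + "\n"
  end
end

logger = Logger.new($stdout)
logger.formatter = StructuredFormatter.new({ request_id: "abc123", ip: "192.168.1.1" })
logger.info("Processing stripe event evt_123 of type charge.succeeded")
```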
Log Storage
Development:
- log/development.log
- Colorized console output
- Detailed SQL queries
Production:
- Stdout (captured by hosting platform)
- Aggregated by logging service (e.g., Papertrail, Logentries)
- Retained for 30+ days
- Searchable and filterable
Sensitive Data Filtering
Prevents sensitive data from appearing in logs.
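Rails apps usually configure this through `config.filter_parameters`, which masks values whose keys match a list of partial names and replaces them with `[FILTERED]`. The standalone sketch below applies the same idea recursively to nested parameters; the key list is a hypothetical example.

```ruby
# Hypothetical key fragments to mask (partial, case-insensitive match on the key name).
SENSITIVE_KEYS = %w[password secret token card_number cvc iban].freeze
FILTERED = "[FILTERED]".freeze

def sensitive_key?(key)
  name = key.to_s.downcase
  SENSITIVE_KEYS.any? { |fragment| name.include?(fragment) }
end

# Returns a filtered copy; the original params are left untouched.
def filter_sensitive(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[key] = sensitive_key?(key) ? FILTERED : filter_sensitive(val)
    end
  when Array
    value.map { |item| filter_sensitive(item) }
  else
    value
  end
end
```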
Error Tracking (Rollbar)
Configuration
Error Reporting
Automatic Reporting
- All unhandled exceptions automatically reported
- Includes full stack trace
- Request parameters (scrubbed)
- User information (if authenticated)
- Server environment details
Alert Rules
Configure Rollbar to alert on:
- New error types
- Error rate spikes
- Critical errors (e.g., fraud alerts)
- Repeated failures
Monitoring & Metrics
Application Performance Monitoring (APM)
Datadog Integration:
Metrics Tracked:
- Request latency (p50, p95, p99)
- Webhook processing time
- Background job duration
- Database query time
- Stripe API response time
Custom Metrics
Payment Success Rate
Webhook Event Processing
Subscription Metrics
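For the payment success rate, the main decision is what counts as an attempt. The sketch assumes rate = succeeded / (succeeded + failed) over a window of outcomes, ignoring pending or canceled attempts; the outcome symbols are hypothetical. Returning `nil` for an empty window avoids reporting (or alerting on) a meaningless 0%.

```ruby
# Hypothetical success-rate computation over a window of payment outcomes.
# Assumption: only :succeeded and :failed count as attempts.
def payment_success_rate(outcomes)
  succeeded = outcomes.count(:succeeded)
  failed = outcomes.count(:failed)
  attempts = succeeded + failed
  return nil if attempts.zero? # no data in this window
  succeeded.fdiv(attempts)
end
```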
Health Checks
Endpoint: /health
Monitoring:
- External service pings health check every minute
- Alert if returns non-200 status
- Alert if response time > threshold
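A health endpoint only needs to run a few dependency checks and return a non-200 status when any of them fails, so the external monitor alerts. A minimal Rack-compatible sketch; the `HealthCheck` class and the example checks are hypothetical, and each check is a callable that returns true when healthy.

```ruby
require "json"

# Hypothetical Rack-compatible /health endpoint.
# checks maps a name to a callable returning true when that dependency is healthy.
class HealthCheck
  def initialize(checks)
    @checks = checks
  end

  def call(_env)
    results = @checks.transform_values do |check|
      check.call ? "ok" : "failing"
    rescue StandardError
      "failing" # a check that raises counts as failing
    end
    healthy = results.values.all?("ok")
    body = JSON.generate(status: healthy ? "ok" : "degraded", checks: results)
    [healthy ? 200 : 503, { "content-type" => "application/json" }, [body]]
  end
end
```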
Uptime Monitoring
Services:
- Pingdom / UptimeRobot / StatusPage
- Check main donation page
- Check webhook endpoint (with valid auth)
- Check API endpoints
Alert Conditions:
- Site down (5xx errors)
- Slow response (>3 seconds)
- SSL certificate expiring
- DNS resolution failure
Recovery Procedures
Replaying Webhooks
From Stripe Dashboard:
- Go to Developers → Webhooks
- Find the webhook endpoint
- Click on failed event
- Click "Resend"
Programmatically:
Bulk Replay:
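A bulk replay feeds a list of events back through the app's own processing. In the sketch, `events` stands in for a Stripe API listing (with the stripe gem, something like `Stripe::Event.list(type: ..., created: ...).auto_paging_each`; Stripe only lists events from roughly the last 30 days), and `handler` stands in for the webhook processing job. Both, and the `processed_ids` skip list, are assumptions.

```ruby
require "set"

# Hypothetical bulk replay through the app's own handler, skipping known events
# and collecting failures instead of stopping at the first one.
def replay_events(events, handler, processed_ids: Set.new)
  summary = { replayed: 0, skipped: 0, failed: [] }
  events.each do |event|
    if processed_ids.include?(event["id"])
      summary[:skipped] += 1
      next
    end
    begin
      handler.call(event)
      processed_ids << event["id"]
      summary[:replayed] += 1
    rescue StandardError => e
      summary[:failed] << [event["id"], e.message]
    end
  end
  summary
end
```

Because webhook processing is idempotent, replaying an event that was in fact already handled is harmless; the skip list just saves work.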
Fixing Inconsistent Data
Missing Transaction:
Missing Donor:
Handling Failed Jobs
Sidekiq Dead Queue:
Job Status:
Database Rollback
If bad data was imported:
Always:
- Backup database before manual changes
- Test in development/staging first
- Document all manual interventions
- Update monitoring after recovery
Alerting Strategy
Critical Alerts (Immediate Response)
Page on-call engineer:
- Site completely down
- Webhook endpoint returning 500s
- Database connection lost
- Stripe API credentials invalid
- Multiple fraud attempts detected
High Priority (Respond within 1 hour)
Email/Slack notification:
- Payment success rate < 90%
- Webhook processing failures > 10/hour
- Background job queue depth > 1000
- Failed subscription charges spike
- Error rate spike (>2σ from baseline)
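Two of these rules can be evaluated mechanically: a success-rate floor and a spike test against a baseline. The sketch treats "2σ from baseline" as the current error count exceeding the baseline mean plus two population standard deviations; the window sizes and alert names are assumptions.

```ruby
# Hypothetical evaluation of two high-priority rules above.
SUCCESS_RATE_FLOOR = 0.90
SPIKE_SIGMAS = 2.0

def mean(values)
  values.sum.fdiv(values.size)
end

# Population standard deviation of the baseline window.
def std_dev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 }.fdiv(values.size))
end

def error_rate_spike?(baseline, current)
  return false if baseline.size < 2 # not enough history to judge
  current > mean(baseline) + SPIKE_SIGMAS * std_dev(baseline)
end

def high_priority_alerts(success_rate:, error_baseline:, current_errors:)
  alerts = []
  alerts << :low_payment_success_rate if success_rate && success_rate < SUCCESS_RATE_FLOOR
  alerts << :error_rate_spike if error_rate_spike?(error_baseline, current_errors)
  alerts
end
```

With a baseline of 10, 12, 11, 9 and 8 errors per hour (mean 10, standard deviation about 1.41), the spike threshold is about 12.8, so 14 errors alerts and 12 does not. A perfectly flat baseline has a standard deviation of zero, so any increase would alert; a minimum absolute threshold may be worth adding.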
Medium Priority (Review within 24 hours)
Daily summary email:
- Individual webhook failures
- Card declined notifications
- Unusual payment patterns
- Currency conversion rate outdated
Low Priority (Weekly review)
Weekly report:
- Total transaction volume
- Top donors
- Campaign performance
- Geographic distribution changes
Best Practices
Error Handling
- Fail Fast: Validate early, fail explicitly
- Idempotency: All operations should be safely retryable
- Graceful Degradation: Partial feature failures shouldn't break entire system
- User-Friendly Messages: Don't expose technical details to users
- Context: Always log enough context to debug
Monitoring
- Baseline Metrics: Establish normal values for all metrics
- Alert Fatigue: Too many alerts lead to all alerts being ignored
- Actionable Alerts: Every alert should require an action
- Escalation: Define clear escalation paths
- Post-Mortem: Document and learn from incidents
Security
- Least Privilege: Limit access to production systems
- Audit Logs: Log all administrative actions
- Secrets Management: Use secure secret storage
- Regular Reviews: Quarterly security audits
- Incident Response Plan: Documented procedures
Troubleshooting Guide
Payment Not Processing
Check:
- Is payment in Stripe Dashboard?
- Was webhook sent by Stripe?
- Did webhook arrive at application?
- Was webhook processed successfully?
- Was transaction created in database?
Debug:
Donor Not Receiving Email
Check:
- Was transaction created?
- Was email job enqueued?
- Did email job succeed?
- Did SendGrid accept email?
- Did email bounce?
Debug:
Webhook Signature Verification Failing
Check:
- Correct signing secret configured?
- Request body being modified?
- Using raw request body (not parsed)?
Fix:
Related Documentation
- Webhooks - Webhook integration details
- Technical Integration - API usage and patterns
- Admin Features - Administrative tools and reporting