
Error Handling & Monitoring

This document describes error scenarios, logging, monitoring, and recovery strategies for the donation platform.

Error Categories

1. User Input Errors

Cause: Invalid or missing data from the user

Examples:

  • Empty email field
  • Invalid email format
  • Amount less than minimum (€1.00)
  • Unsupported currency

Handling:

User Experience:

  • Frontend validation before API call
  • Server returns 422 Unprocessable Entity
  • Error messages displayed in UI
  • User can correct and retry

Response Example:

{
  "email": ["can't be blank", "is invalid"],
  "amount_cents": ["value must be at least €1.00"]
}
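A minimal sketch of server-side validation that produces the error format above, in plain Ruby without Rails. The supported-currency list and the email pattern are assumptions for illustration. The €1.00 minimum is expressed as 100 cents.

```ruby
# Minimal server-side validation sketch (plain Ruby, no Rails).
MIN_AMOUNT_CENTS = 100                           # €1.00, per the rule above
SUPPORTED_CURRENCIES = %w[eur usd gbp].freeze    # hypothetical list
EMAIL_FORMAT = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/    # deliberately loose format check

# Returns { field => [messages] }; an empty hash means the input is valid.
def validate_donation(params)
  errors = {}
  add = ->(field, message) { (errors[field] ||= []) << message }

  email = params[:email].to_s.strip
  add.call(:email, "can't be blank") if email.empty?
  add.call(:email, "is invalid") unless EMAIL_FORMAT.match?(email)

  amount = params[:amount_cents]
  unless amount.is_a?(Integer) && amount >= MIN_AMOUNT_CENTS
    add.call(:amount_cents, "value must be at least €1.00")
  end

  unless SUPPORTED_CURRENCIES.include?(params[:currency].to_s.downcase)
    add.call(:currency, "is not supported")
  end

  errors
end
```

An empty email fails both checks, which matches the two messages in the response example. In a Rails controller, a non-empty result would typically be rendered with `render json: errors, status: :unprocessable_entity` to produce the 422.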

2. Payment Errors

Cause: Payment processing failures

Card Declined

Common Card Error Codes:

  • card_declined - General decline
  • insufficient_funds - Not enough balance
  • lost_card - Card reported lost
  • stolen_card - Card reported stolen
  • expired_card - Card expired
  • incorrect_cvc - Wrong CVC
  • incorrect_number - Invalid card number
  • processing_error - Temporary issue

User Experience:

  • Error message displayed with reason
  • User can try different card
  • Support email provided
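One way to display the error with a reason without leaking technical details is a lookup table keyed by the codes listed above. The sketch below assumes that approach. The message wording and the support address are placeholders. Lost and stolen cards deliberately get the generic decline text so the UI does not reveal a fraud suspicion.

```ruby
# Hypothetical donor-facing messages keyed by the card error codes listed above.
GENERIC_DECLINE = "Your card was declined. Please try a different card."

DECLINE_MESSAGES = {
  "card_declined"      => GENERIC_DECLINE,
  "insufficient_funds" => "Your card has insufficient funds. Please try a different card.",
  "lost_card"          => GENERIC_DECLINE, # don't reveal why the card was flagged
  "stolen_card"        => GENERIC_DECLINE,
  "expired_card"       => "Your card has expired. Please use a different card.",
  "incorrect_cvc"      => "The security code (CVC) is incorrect. Please check it and try again.",
  "incorrect_number"   => "The card number is incorrect. Please check it and try again.",
  "processing_error"   => "A temporary error occurred while processing your card. Please try again."
}.freeze

SUPPORT_EMAIL = "support@example.org" # placeholder address

# Unknown codes fall back to a neutral message; every message ends with the
# support contact so the donor always has a next step.
def decline_message(code)
  base = DECLINE_MESSAGES.fetch(code, "Your payment could not be processed.")
  "#{base} If the problem persists, contact #{SUPPORT_EMAIL}."
end
```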

Authentication Required (3D Secure)

User Experience:

  • Redirected to bank's authentication page
  • Enter code or approve via app
  • Return to donation page
  • Success or failure message

Network Errors

Handling:

  • Automatic retry with exponential backoff
  • Max 25 retries (Sidekiq default)
  • After all retries: manual intervention required
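Sidekiq applies its own backoff schedule to whole jobs. The same retry idea can also be applied inline to a single outbound call. The sketch below assumes the listed exception classes represent transient network failures. They are illustrative only: the Stripe Ruby library raises its own error classes and has a built-in `Stripe.max_network_retries` setting. The `sleeper` parameter is injectable so tests do not actually sleep.

```ruby
# Exception classes treated as transient; this list is an assumption.
RETRYABLE_ERRORS = [IOError, Errno::ECONNRESET, Errno::ETIMEDOUT].freeze

# Runs the block, retrying transient failures with exponential backoff.
def with_backoff(max_retries: 5, base_delay: 0.5, max_delay: 30.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue *RETRYABLE_ERRORS
    raise if attempt >= max_retries # budget exhausted: surface the error

    # Delay doubles per attempt (0.5s, 1s, 2s, ...), capped at max_delay,
    # plus up to 10% random jitter so clients don't retry in lockstep.
    delay = [base_delay * (2**attempt), max_delay].min
    sleeper.call(delay + rand * delay * 0.1)
    attempt += 1
    retry
  end
end
```

Re-raising after the last attempt lets the caller, or Sidekiq's job-level retry, take over. This matches the "manual intervention" fallback above.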

3. Webhook Processing Errors

Signature Verification Failed

Causes:

  • Wrong signing secret configured
  • Request body modified (middleware/proxy)
  • Timestamp too old (>5 minutes)
  • Replay attack attempt

Impact:

  • Webhook rejected
  • Stripe will retry (up to 72 hours)
  • Manual investigation may be needed
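Each cause above corresponds to a specific check in Stripe's signature scheme. The `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<hex HMAC-SHA256>` values. The HMAC is computed over `"<timestamp>.<raw body>"` using the endpoint's signing secret. In production, `Stripe::Webhook.construct_event` performs this verification and raises `Stripe::SignatureVerificationError` on failure. The standard-library sketch below only illustrates what that check does. It is not a replacement for the library.

```ruby
require "openssl"

TOLERANCE_SECONDS = 300 # the 5-minute window described above

# True only if a v1 signature matches the raw payload and the timestamp is fresh.
def valid_stripe_signature?(payload, header, secret, now: Time.now.to_i)
  parts = header.to_s.split(",").map { |item| item.split("=", 2) }
  t_pair = parts.find { |key, _| key == "t" }
  timestamp = t_pair ? t_pair.last.to_i : 0
  signatures = parts.select { |key, _| key == "v1" }.map(&:last)
  return false if timestamp.zero? || signatures.empty?
  return false if (now - timestamp).abs > TOLERANCE_SECONDS # stale or replayed

  # Any change to the body (e.g. by middleware) changes this digest.
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, "#{timestamp}.#{payload}")
  signatures.any? { |sig| secure_compare(sig, expected) }
end

# Constant-time comparison so timing doesn't leak how many characters matched.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end
```

The payload must be the raw request body exactly as received. Re-serialized JSON will not match the digest.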

Event Processing Failed

Handling:

  • Job retried automatically (Sidekiq)
  • Exponential backoff between retries
  • After 25 retries: moved to dead queue
  • Staff notified via error tracking

4. Fraud Detection

Multiple Failed Attempts

Indicators:

  • More than 3 failed attempts in 30 minutes
  • Stripe fraud detection flag
  • Unusual geographic location
  • High-value transactions from new donors

Actions:

  • Payment intent canceled immediately
  • Staff alerted with high priority
  • Further payments from the donor blocked
  • Investigation required before allowing retry
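The "more than 3 failed attempts in 30 minutes" indicator can be implemented as a sliding-window counter. A minimal in-memory sketch follows. The donor key (an email here, though a card fingerprint would also work) is an assumption. A real deployment would need shared storage such as Redis so that every web and job process sees the same counts. This class is not thread-safe.

```ruby
# Sliding-window counter of failed payment attempts per donor key.
class FailedAttemptTracker
  WINDOW_SECONDS = 30 * 60 # 30 minutes
  MAX_FAILURES = 3         # block on the 4th failure inside the window

  def initialize
    @failures = Hash.new { |hash, key| hash[key] = [] }
  end

  # Records a failure at `at` (epoch seconds). Returns true when the donor has
  # exceeded MAX_FAILURES within the window and should be blocked.
  def record_failure(donor_key, at: Time.now.to_i)
    timestamps = @failures[donor_key]
    timestamps << at
    timestamps.reject! { |t| t <= at - WINDOW_SECONDS } # drop expired entries
    timestamps.size > MAX_FAILURES
  end
end
```

A `true` result would trigger the actions above: canceling the payment intent, alerting staff, and blocking further attempts.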

Stripe Radar Flags

Stripe automatically flags suspicious transactions.

5. Subscription Errors

Failed Recurring Payment

Stripe's Automatic Retry:

  • First retry: 3 days after failure
  • Second retry: 5 days after first retry
  • Third retry: 7 days after second retry
  • After the third retry fails: subscription canceled automatically
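Each gap in the schedule above is measured from the previous attempt. Cumulatively, the retries therefore fall 3, 8 and 15 days after the initial failure. The retries themselves are performed by Stripe. The sketch below only computes the dates, for example to show them in a failure email, and assumes the gaps stay at 3/5/7 days.

```ruby
require "date"

# Days between consecutive attempts, as documented above.
RETRY_GAPS_DAYS = [3, 5, 7].freeze

# Returns the dates of the retry attempts that follow an initial failure.
def retry_schedule(failed_on)
  RETRY_GAPS_DAYS.each_with_object([]) do |gap, dates|
    previous = dates.last || failed_on # first gap counts from the failure itself
    dates << previous + gap            # Date + Integer adds days
  end
end
```

For a failure on 2024-01-01, this yields 2024-01-04, 2024-01-09 and 2024-01-16.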

User Experience:

  • Email notification after each failure
  • Link to update payment method
  • Grace period before cancellation
  • Can update payment method to prevent cancellation

Subscription Creation Failed

Handling:

  • Error message displayed to user
  • User can try different payment method
  • No subscription created in Stripe or database
  • Clean state, can retry from beginning

6. Database Errors

Duplicate Transaction

Scenario: Webhook replayed or delivered twice

Handling:

  • Unique constraint on transaction_id
  • Insert fails silently
  • No duplicate transaction created
  • Idempotent webhook processing
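The behavior above depends on the unique constraint rejecting the second insert. The sketch below uses an in-memory `Set` as a stand-in for the unique index on `transaction_id`. In Rails, the equivalent is typically rescuing `ActiveRecord::RecordNotUnique` around the insert. The class and method names here are hypothetical.

```ruby
require "set"

# In-memory stand-in for a table with a unique index on transaction_id.
class TransactionStore
  def initialize
    @ids = Set.new
    @rows = []
  end

  # Returns :created on first insert and :duplicate if the id already exists,
  # so a replayed webhook is acknowledged without creating a second row.
  def insert(transaction_id, amount_cents)
    return :duplicate unless @ids.add?(transaction_id) # add? is nil if present

    @rows << { transaction_id: transaction_id, amount_cents: amount_cents }
    :created
  end

  def count
    @rows.size
  end
end
```

Returning a value instead of raising lets the webhook handler respond 2xx to a duplicate delivery, so Stripe stops retrying it.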

Missing Donor

Scenario: Donor exists but Stripe customer ID not yet saved

Handling:

  • Retrieve customer from Stripe API
  • Find donor by email
  • Update donor with customer ID
  • Continue processing normally
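The four steps above can be sketched as a single function. `fetch_customer` stands in for a Stripe API call such as `Stripe::Customer.retrieve`, adapted to return a plain hash. `donors_by_email` stands in for a database lookup. `Donor` is a hypothetical placeholder for the application's model. Matching emails case-insensitively is also an assumption.

```ruby
# Hypothetical donor record; the application's real model is not shown here.
Donor = Struct.new(:email, :stripe_customer_id, keyword_init: true)

# Links a Stripe customer to an existing donor, or returns nil if none matches.
def link_stripe_customer(customer_id, fetch_customer:, donors_by_email:)
  customer = fetch_customer.call(customer_id)        # 1. retrieve from Stripe
  email = customer.fetch("email").to_s.downcase
  donor = donors_by_email[email]                      # 2. find donor by email
  return nil unless donor # no match: leave for manual review rather than guess

  donor.stripe_customer_id ||= customer.fetch("id")   # 3. save customer ID
  donor                                               # 4. caller continues
end
```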

7. Configuration Errors

Missing API Keys

Prevention:

  • Secrets validation on boot
  • Environment-specific configuration
  • CI/CD checks
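"Secrets validation on boot" can be as simple as checking the required environment variables and failing with one message that names every missing key. The variable names below are hypothetical. The function would be called from an initializer so that a misconfigured deploy fails immediately instead of on the first payment.

```ruby
# Hypothetical list of required variables; adjust to the app's real keys.
REQUIRED_SECRETS = %w[STRIPE_SECRET_KEY STRIPE_WEBHOOK_SECRET ROLLBAR_ACCESS_TOKEN].freeze

class MissingSecretsError < StandardError; end

# Raises if any required secret is absent or blank; `env` defaults to ENV but
# accepts a plain Hash so it can be exercised in CI checks.
def validate_secrets!(env = ENV)
  missing = REQUIRED_SECRETS.select { |key| env[key].to_s.strip.empty? }
  return true if missing.empty?

  raise MissingSecretsError, "Missing required secrets: #{missing.join(', ')}"
end
```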

Missing Feature Flags

Handling:

  • Feature disabled by default
  • Safe fallback behavior
  • Admin can enable when ready

Logging Strategy

Log Levels

  • DEBUG: Development-only details
  • INFO: Normal operation events
  • WARN: Recoverable issues
  • ERROR: Serious problems
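The level threshold determines which of these messages are written. The sketch below uses Ruby's stdlib `Logger` as a stand-in. SemanticLogger orders these four levels the same way, so an INFO threshold drops DEBUG output in the same manner. The log messages are illustrative.

```ruby
require "logger"
require "stringio"

# Stdlib Logger as a stand-in; SemanticLogger orders these levels the same way
# (debug < info < warn < error), so the threshold behaves identically.
LOG_OUTPUT = StringIO.new
LOGGER = Logger.new(LOG_OUTPUT)
LOGGER.level = Logger::INFO # production-style threshold: DEBUG lines are dropped

LOGGER.debug("Stripe request params: amount_cents=500") # suppressed at INFO
LOGGER.info("Processing stripe event evt_123")            # normal operation
LOGGER.warn("Stripe API timeout, retrying (attempt 2)")    # recoverable issue
LOGGER.error("Webhook processing failed for evt_123")      # serious problem
```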

Structured Logging

Logs are written with SemanticLogger, tagging each line with the request ID and client IP.

Log Output:

[2024-01-15T10:30:45.123Z] [INFO] [request_id=abc123] [ip=192.168.1.1] Processing stripe event evt_123 of type charge.succeeded
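To make the line format above concrete, here is a stdlib-only formatter that reproduces it. This is a stand-in for illustration, not SemanticLogger's API. Tags are stored per thread so that concurrent requests do not mix. The `with_tags` helper is hypothetical.

```ruby
require "logger"

# Renders "[timestamp] [LEVEL] [key=value]... message" as in the example above.
class TaggedFormatter
  def call(severity, time, _progname, message)
    tags = (Thread.current[:log_tags] || {}).map { |key, value| "[#{key}=#{value}]" }
    stamp = time.getutc.strftime("%Y-%m-%dT%H:%M:%S.%LZ") # %L = milliseconds
    ["[#{stamp}]", "[#{severity}]", *tags, message].join(" ") + "\n"
  end

  # Applies tags for the duration of the block (e.g. one request), then restores
  # whatever was set before, even if the block raises.
  def self.with_tags(tags)
    previous = Thread.current[:log_tags]
    Thread.current[:log_tags] = (previous || {}).merge(tags)
    yield
  ensure
    Thread.current[:log_tags] = previous
  end
end
```

Usage: `logger = Logger.new($stdout)`, then `logger.formatter = TaggedFormatter.new`, and wrap each request in `TaggedFormatter.with_tags(request_id: ..., ip: ...) { ... }`.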

Log Storage

Development:

  • log/development.log
  • Colorized console output
  • Detailed SQL queries

Production:

  • Stdout (captured by hosting platform)
  • Aggregated by logging service (e.g., Papertrail, Logentries)
  • Retained for 30+ days
  • Searchable and filterable

Sensitive Data Filtering

Parameter filtering keeps sensitive values (e.g., API keys, tokens, passwords) out of the logs.
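Filtering of this kind works by replacing values whose keys match a sensitive pattern. Rails' `config.filter_parameters` follows the same principle and also substitutes `"[FILTERED]"`. The recursive sketch below is illustrative, and its pattern list is an assumption. The app's actual list would live in its configuration.

```ruby
# Keys matching this pattern (case-insensitive, partial match) are masked.
SENSITIVE_KEY = /card|cvc|cvv|secret|token|password|iban/i
FILTERED = "[FILTERED]"

# Recursively walks hashes and arrays, masking values under sensitive keys.
def scrub(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[key] = SENSITIVE_KEY.match?(key.to_s) ? FILTERED : scrub(val)
    end
  when Array
    value.map { |item| scrub(item) }
  else
    value
  end
end
```

Masking a whole nested hash (such as `card`) errs on the side of hiding too much. For logs, this is usually the right trade-off.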

Error Tracking (Rollbar)

Configuration

Error Reporting

Automatic Reporting

  • All unhandled exceptions automatically reported
  • Includes full stack trace
  • Request parameters (scrubbed)
  • User information (if authenticated)
  • Server environment details

Alert Rules

Configure Rollbar to alert on:

  • New error types
  • Error rate spikes
  • Critical errors (e.g., fraud alerts)
  • Repeated failures

Monitoring & Metrics

Application Performance Monitoring (APM)

Datadog Integration:

Metrics Tracked:

  • Request latency (p50, p95, p99)
  • Webhook processing time
  • Background job duration
  • Database query time
  • Stripe API response time

Custom Metrics

Payment Success Rate

Webhook Event Processing

Subscription Metrics
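Custom metrics such as payment success rate, webhook event processing and subscription counts are commonly shipped to the Datadog agent as DogStatsD datagrams in the form `name:value|type|#tag:value,...`. With the dogstatsd-ruby gem, `Datadog::Statsd#increment` and `#gauge` build these for you. The sketch below only shows the wire format and the success-rate arithmetic. The metric and tag names are hypothetical.

```ruby
# Builds a DogStatsD datagram: "c" = count, "g" = gauge, "ms" = timing.
def dogstatsd_line(name, value, type, tags = {})
  line = "#{name}:#{value}|#{type}"
  return line if tags.empty?

  line + "|#" + tags.map { |key, val| "#{key}:#{val}" }.join(",")
end

# Success rate over a window as a percentage; nil when there were no attempts,
# so an idle period doesn't trigger a "success rate below 90%" alert.
def payment_success_rate(succeeded, failed)
  total = succeeded + failed
  return nil if total.zero?

  (succeeded * 100.0 / total).round(2)
end
```

A datagram can be sent to a local agent with `UDPSocket.new.send(line, 0, "localhost", 8125)` (after `require "socket"`). Port 8125 is the DogStatsD default.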

Health Checks

Endpoint: /health

Monitoring:

  • External service pings health check every minute
  • Alert if returns non-200 status
  • Alert if response time > threshold
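A `/health` endpoint that the external monitor can poll might look like the Rack-compatible sketch below. The individual checks, such as a database or Redis ping, are hypothetical lambdas. Any exception is treated as a failure, so the endpoint returns 503 rather than raising.

```ruby
require "json"

# Rack-compatible app: 200 when every check passes, 503 otherwise.
class HealthCheck
  def initialize(checks)
    @checks = checks # { name => callable returning true/false }
  end

  def call(_env)
    results = @checks.transform_values do |check|
      check.call ? "ok" : "failing"
    rescue StandardError
      "failing" # a raising check is reported, not propagated
    end
    healthy = results.values.all?("ok")
    body = JSON.generate({ status: healthy ? "ok" : "degraded", checks: results })
    [healthy ? 200 : 503, { "content-type" => "application/json" }, [body]]
  end
end
```

In Rails, a Rack app like this can be mounted with `get "/health", to: HealthCheck.new(...)` in the routes file. Keep checks cheap, since the endpoint is hit every minute.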

Uptime Monitoring

Services:

  • Pingdom / UptimeRobot / StatusPage
  • Check main donation page
  • Check webhook endpoint (with valid auth)
  • Check API endpoints

Alert Conditions:

  • Site down (5xx errors)
  • Slow response (>3 seconds)
  • SSL certificate expiring
  • DNS resolution failure

Recovery Procedures

Replaying Webhooks

From Stripe Dashboard:

  1. Go to Developers → Webhooks
  2. Find the webhook endpoint
  3. Click on failed event
  4. Click "Resend"

Programmatically:

Bulk Replay:
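A common programmatic approach is to re-fetch events from Stripe (for example with `Stripe::Event.retrieve` or `Stripe::Event.list`) and run them through the application's normal handler. The sketch below shows only the batch loop. `handler` is a hypothetical stand-in for the webhook processor and must be idempotent, because a replay may include events that were already processed. Events are plain hashes here for simplicity.

```ruby
# Outcome of a replay run: ids that succeeded and id => error message for failures.
ReplayReport = Struct.new(:succeeded, :failed, keyword_init: true)

def replay_events(events, handler:)
  report = ReplayReport.new(succeeded: [], failed: {})
  events.each do |event|
    handler.call(event)
    report.succeeded << event.fetch("id")
  rescue StandardError => e
    # Keep going: one bad event should not block the rest of the batch.
    report.failed[event.fetch("id")] = e.message
  end
  report
end
```

The `failed` map gives a short list of events to investigate, instead of a half-finished run.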

Fixing Inconsistent Data

Missing Transaction:

Missing Donor:

Handling Failed Jobs

Sidekiq Dead Queue:

Job Status:

Database Rollback

If bad data was imported:

Always:

  • Backup database before manual changes
  • Test in development/staging first
  • Document all manual interventions
  • Update monitoring after recovery

Alerting Strategy

Critical Alerts (Immediate Response)

Page on-call engineer:

  • Site completely down
  • Webhook endpoint returning 500s
  • Database connection lost
  • Stripe API credentials invalid
  • Multiple fraud attempts detected

High Priority (Respond within 1 hour)

Email/Slack notification:

  • Payment success rate < 90%
  • Webhook processing failures > 10/hour
  • Background job queue depth > 1000
  • Failed subscription charges spike
  • Error rate spike (>2σ from baseline)
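Most of these thresholds are simple comparisons. The "2σ from baseline" rule needs a mean and standard deviation over recent history, flagging a spike when the current rate exceeds mean + 2σ. The sketch below evaluates four of the five conditions against a metrics snapshot. The snapshot keys are hypothetical, and it uses the population standard deviation.

```ruby
# Population mean and standard deviation of a non-empty numeric array.
def mean_and_stddev(values)
  mean = values.sum.to_f / values.size
  variance = values.sum { |v| (v - mean)**2 } / values.size
  [mean, Math.sqrt(variance)]
end

# Returns the high-priority alert messages triggered by `snapshot`.
def high_priority_alerts(snapshot, error_rate_history)
  alerts = []
  alerts << "payment success rate below 90%" if snapshot[:payment_success_rate] < 90.0
  alerts << "webhook failures above 10/hour" if snapshot[:webhook_failures_last_hour] > 10
  alerts << "job queue depth above 1000" if snapshot[:queue_depth] > 1000

  # Skip the spike check until a baseline exists.
  unless error_rate_history.empty?
    mean, stddev = mean_and_stddev(error_rate_history)
    alerts << "error rate spike (>2σ above baseline)" if snapshot[:error_rate] > mean + 2 * stddev
  end
  alerts
end
```

For a history of `[1, 2, 3, 2, 2]`, the mean is 2.0 and σ ≈ 0.63, so any error rate above roughly 3.26 counts as a spike.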

Medium Priority (Review within 24 hours)

Daily summary email:

  • Individual webhook failures
  • Card declined notifications
  • Unusual payment patterns
  • Currency conversion rate outdated

Low Priority (Weekly review)

Weekly report:

  • Total transaction volume
  • Top donors
  • Campaign performance
  • Geographic distribution changes

Best Practices

Error Handling

  1. Fail Fast: Validate early, fail explicitly
  2. Idempotency: All operations should be safely retryable
  3. Graceful Degradation: Partial feature failures shouldn't break entire system
  4. User-Friendly Messages: Don't expose technical details to users
  5. Context: Always log enough context to debug

Monitoring

  1. Baseline Metrics: Establish normal values for all metrics
  2. Alert Fatigue: Too many alerts lead to all alerts being ignored
  3. Actionable Alerts: Every alert should require an action
  4. Escalation: Define clear escalation paths
  5. Post-Mortem: Document and learn from incidents

Security

  1. Least Privilege: Limit access to production systems
  2. Audit Logs: Log all administrative actions
  3. Secrets Management: Use secure secret storage
  4. Regular Reviews: Quarterly security audits
  5. Incident Response Plan: Documented procedures

Troubleshooting Guide

Payment Not Processing

Check:

  1. Is the payment in the Stripe Dashboard?
  2. Was webhook sent by Stripe?
  3. Did webhook arrive at application?
  4. Was webhook processed successfully?
  5. Was transaction created in database?

Debug:

Donor Not Receiving Email

Check:

  1. Was transaction created?
  2. Was email job enqueued?
  3. Did email job succeed?
  4. Did SendGrid accept email?
  5. Did email bounce?

Debug:

Webhook Signature Verification Failing

Check:

  1. Is the correct signing secret configured?
  2. Is the request body being modified?
  3. Is the raw request body used (not the parsed one)?

Fix: