Runbook: Mail Flow Rollback

Safely restore mail flow by reverting risky changes and validating connectors and rules.

⚠️ Business Consequence: Why Fast Rollback Matters

  • Financial Impact: Mail flow outage = $5K–$10K per minute in lost productivity and communications
  • Compliance Exposure: Failed regulatory notifications = breach reporting requirements ($10K–$50K+ penalties)
  • Operational Risk: Order confirmations, invoices, customer communications blocked = revenue impact
  • Reputation Impact: Extended outage damages customer trust and sender reputation (future deliverability at risk)

Average rollback time: 15–25 minutes, short enough to keep an outage from escalating into extended downtime.

⚠️ Runbook Summary

  • Severity: P1/P2 - Mail delivery failure or delay
  • Total time: 15-25 min (rollback) plus 10-15 min (validation)
  • Risk level: Low (reverting to known-good configuration)
  • Requires: Exchange admin role, message trace access

Pre-Rollback Checklist (5-8 minutes)

Gather this information before starting the rollback procedure:

  • Identify last changes: Check change log for recent transport rule, connector, or mail flow policy updates
  • Run message trace: Use message trace to determine scope (inbound only, outbound only, or both directions)
  • Capture error codes: Note specific NDR codes from failed messages (5.1.1, 5.4.1, 5.7.1, etc.)
  • Check service health: Verify no active Microsoft 365 service incidents affecting Exchange Online
  • Document queue depth: Check mail queue length before rollback (baseline for validation)
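The scope and error-code items in the checklist above can be sketched as a small summary over exported message trace rows. This is a minimal sketch, not a definitive tool: the row shape (sender, recipient, status, detail) and the INTERNAL_DOMAINS placeholder are assumptions for illustration, so map them to whatever your trace export actually contains.

```python
import re
from collections import Counter

# Hypothetical placeholder: replace with your own accepted domains.
INTERNAL_DOMAINS = {"contoso.com"}

# Enhanced status codes look like 5.1.1 or 4.4.7. The lookarounds avoid
# matching fragments of longer dotted numbers such as IP addresses.
NDR_CODE = re.compile(r"(?<![\d.])([245]\.\d{1,3}\.\d{1,3})(?!\.?\d)")


def domain_of(address):
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def direction(row):
    """Classify a trace row by comparing sender and recipient domains."""
    sender_internal = domain_of(row["sender"]) in INTERNAL_DOMAINS
    recipient_internal = domain_of(row["recipient"]) in INTERNAL_DOMAINS
    if sender_internal and recipient_internal:
        return "internal"
    if recipient_internal:
        return "inbound"
    if sender_internal:
        return "outbound"
    return "external"


def summarize_failures(rows):
    """Count failed messages per direction and tally the NDR codes seen."""
    by_direction = Counter()
    codes = Counter()
    for row in rows:
        if row["status"].lower() != "failed":
            continue
        by_direction[direction(row)] += 1
        codes.update(NDR_CODE.findall(row.get("detail", "")))
    return by_direction, codes
```

If failures appear under only one direction, the later steps can concentrate on the rules and connectors that handle that direction.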

4-Step Rollback Procedure

Execute these steps in order. Stop if mail flow resumes after any step:

Step 1: Disable Suspect Transport Rules (5-7 min)

  1. Access Exchange Admin Center: Navigate to Mail flow → Rules
  2. Identify recent rules: Sort by "Modified" date; focus on rules added/changed in last 24-48 hours
  3. Disable rules one-by-one: Uncheck "Enable" for suspect rules (do NOT delete yet)
  4. Document each change: Note rule name, GUID, and exact modification made
  5. Wait 3-5 minutes: Allow time for the rule cache to clear before testing
  6. Test after each disable: Send a test email from an external sender to an internal recipient

✓ Tip: Focus on rules with "Block" or "Reject" actions, or rules targeting "All recipients"
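The selection logic in Step 1 (recently modified rules first, with block or reject actions given priority) can be sketched as below. The rule dictionaries and their keys (name, guid, enabled, modified, actions) are a hypothetical shape for an exported rule list, not an official schema, and the action strings are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Substrings treated as high risk, following the tip above. These are
# illustrative labels for this sketch, not an official enumeration.
RISKY_ACTION_HINTS = ("block", "reject")


def suspect_rules(rules, now, window_hours=48):
    """Return enabled rules changed inside the window, riskiest and newest first.

    `rules` is assumed to be a list of dicts with keys name, guid, enabled,
    modified (timezone-aware datetime), and actions (list of strings).
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [r for r in rules if r["enabled"] and r["modified"] >= cutoff]

    def is_risky(rule):
        return any(h in a.lower() for a in rule["actions"] for h in RISKY_ACTION_HINTS)

    # Sort key: risky rules first (False sorts before True), then newest first.
    return sorted(recent, key=lambda r: (not is_risky(r), -r["modified"].timestamp()))
```

The returned order is only a suggested disable order; each rule still needs to be disabled, documented, and tested one at a time.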

Step 2: Revert Connector Configuration (7-10 min)

  1. Review inbound connectors: Mail flow → Connectors → Check recently modified inbound connectors
  2. Compare to backup: If available, compare current config to last-known-good snapshot
  3. Revert TLS settings: Check "Require TLS" and certificate requirements; temporarily relax them only if they are confirmed to be blocking mail
  4. Validate IP ranges: Confirm sender IP addresses match connector scope (especially for hybrid)
  5. Check outbound connectors: Verify routing rules and smart host configurations
  6. Test immediately: Send inbound and outbound test messages
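Comparing the current connector configuration to a last-known-good snapshot amounts to a key-by-key diff. The sketch below assumes both configurations are available as flat dictionaries (for example, loaded from exported JSON); the setting names in the usage note are illustrative.

```python
def diff_config(known_good, current):
    """Return {setting: (known_good_value, current_value)} for every difference.

    Both arguments are assumed to be flat dicts of connector settings.
    A setting missing on one side is reported with None on that side.
    """
    changes = {}
    for key in sorted(set(known_good) | set(current)):
        before = known_good.get(key)
        after = current.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

For example, if the snapshot has "RequireTls" set to False and the current configuration has it set to True, the result contains the entry "RequireTls": (False, True), which points at the setting to revert first.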

⚠️ Caution: Changes to connectors affect all mail flow; validate thoroughly before proceeding
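As part of that validation, the IP range check from Step 2 can be done offline with Python's standard ipaddress module. This is a minimal sketch; the sender IP and the list of allowed ranges are assumed to come from your message trace and connector settings respectively.

```python
import ipaddress


def ip_in_scope(sender_ip, allowed_ranges):
    """Return True if sender_ip falls inside any allowed address or CIDR range."""
    ip = ipaddress.ip_address(sender_ip)
    # strict=False accepts entries such as "203.0.113.5/24" with host bits set.
    networks = (ipaddress.ip_network(r, strict=False) for r in allowed_ranges)
    return any(ip in net for net in networks)
```

A sender IP that returns False here is outside the connector scope, which would be consistent with inbound mail from that host being rejected or handled by a different connector.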

Step 3: Check & Adjust Mail Flow Policies (3-5 min)

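  1. Review spam filter: Open the anti-spam policies (in current tenants: Microsoft Defender portal → Email & collaboration → Policies & rules → Threat policies → Anti-spam) and check whether a policy is blocking legitimate mail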
  2. Check quarantine: Verify messages aren't quarantined instead of delivered
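  3. Validate accepted domains: Ensure recipient domains are listed as accepted domains with the expected type (normally "Authoritative"; "Internal relay" can be correct in hybrid or shared address space setups)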
  4. Review DLP policies: Temporarily disable DLP policies if they are blocking all mail
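The accepted-domain check in Step 3 can be sketched as a lookup of each affected recipient's domain against the tenant's accepted domains. The input shape (a dict of domain name to domain type) is an assumption for this sketch, and the set of expected types is a parameter so that it can be adjusted per environment.

```python
def domain_problems(recipients, accepted_domains, expected_types=("authoritative",)):
    """Map each problematic recipient domain to a short reason string.

    `accepted_domains` is assumed to be a dict of domain name -> domain type.
    `expected_types` lists the domain types considered correct (case-insensitive).
    """
    known = {name.lower(): dtype.lower() for name, dtype in accepted_domains.items()}
    expected = {t.lower() for t in expected_types}
    problems = {}
    for address in recipients:
        domain = address.rsplit("@", 1)[-1].lower()
        if domain not in known:
            problems[domain] = "not an accepted domain"
        elif known[domain] not in expected:
            problems[domain] = "unexpected type: " + known[domain]
    return problems
```

An empty result means every affected recipient domain is accepted with an expected type, so the remaining checks in this step are the more likely cause.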

Step 4: Monitor Mail Queue Processing (2-3 min)

  1. Check queue depth: Use message trace to see the queued message count
  2. Wait for natural processing: Queues typically clear within 10-15 minutes after the issue is resolved
  3. Monitor progress: Re-run message trace every 5 minutes to track delivery
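The 5-minute monitoring cadence in Step 4 can be sketched as a polling loop. The fetch_pending callable is hypothetical: it stands in for whatever you use to obtain the current count of undelivered messages (for example, counting rows in a fresh message trace export). The sleep function is injectable so the loop can be tested without waiting.

```python
import time


def monitor_queue(fetch_pending, interval_seconds=300, max_polls=4, sleep=time.sleep):
    """Poll a pending-message count until it reaches zero or polls run out.

    `fetch_pending` is a hypothetical operator-supplied callable returning
    the current number of undelivered messages.
    Returns the list of counts observed, oldest first.
    """
    history = []
    for poll in range(max_polls):
        count = fetch_pending()
        history.append(count)
        if count == 0:
            break
        if poll < max_polls - 1:
            sleep(interval_seconds)  # 300 s matches the 5-minute cadence above
    return history


def is_draining(history):
    """True if the final count is below the first and no sample increased."""
    steadily = all(later <= earlier for earlier, later in zip(history, history[1:]))
    return len(history) >= 2 and steadily and history[-1] < history[0]
```

With the defaults, four polls span 15 minutes, which covers the typical clearing time stated above. A history that is not draining after that window suggests the root cause has not been removed yet.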

Validation

  • Queue depth decreasing from the pre-rollback baseline
  • No new NDRs generated since the rollback
  • External and internal delivery confirmed
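The three validation criteria can be combined into a single pass/fail check. This is a sketch under stated assumptions: the field names are hypothetical, and the inputs are expected to be collected by hand or by your own tooling from the pre-rollback baseline, message trace, and test messages.

```python
from dataclasses import dataclass


@dataclass
class ValidationInputs:
    baseline_queue: int       # queue depth recorded before rollback
    current_queue: int        # queue depth now
    ndrs_since_rollback: int  # NDRs generated after the last change
    external_test_ok: bool    # external test message delivered
    internal_test_ok: bool    # internal test message delivered


def validate(inputs):
    """Return (passed, failures), where failures lists each unmet criterion."""
    failures = []
    # An empty queue always passes; otherwise it must be below the baseline.
    if inputs.current_queue > 0 and inputs.current_queue >= inputs.baseline_queue:
        failures.append("queue depth not decreasing")
    if inputs.ndrs_since_rollback > 0:
        failures.append("NDRs still being generated")
    if not inputs.external_test_ok:
        failures.append("external delivery not confirmed")
    if not inputs.internal_test_ok:
        failures.append("internal delivery not confirmed")
    return (not failures, failures)
```

Recording the returned failures list alongside the change log gives a simple record of why a rollback was or was not declared complete.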