Runbook: Mail Flow Rollback
Safely restore mail flow by reverting risky changes and validating connectors and rules.
⚠️ Business Consequence: Why Fast Rollback Matters
- Financial Impact: Mail flow outage = $5K–$10K per minute in lost productivity and communications
- Compliance Exposure: Failed regulatory notifications = breach reporting requirements ($10K–$50K+ penalties)
- Operational Risk: Order confirmations, invoices, customer communications blocked = revenue impact
- Reputation Impact: Extended outage damages customer trust and sender reputation (future deliverability at risk)
Average rollback time: 15–25 minutes, short enough to keep an incident from escalating into extended downtime.
⚠️ Runbook Summary
- Severity: P1/P2 - Mail delivery failure or delay
- Total time: 5-8 min (pre-rollback checks), 15-25 min (rollback), 10-15 min (validation)
- Risk level: Low (reverting to known-good configuration)
- Requires: Exchange admin role, message trace access
Pre-Rollback Checklist (5-8 minutes)
Gather this information before starting the rollback procedure:
- Identify last changes: Check change log for recent transport rule, connector, or mail flow policy updates
- Run message trace: Use message trace to determine scope (inbound only, outbound only, or both directions)
- Capture error codes: Note specific NDR codes from failed messages (5.1.1, 5.4.1, 5.7.1, etc.)
- Check service health: Verify no active Microsoft 365 service incidents affecting Exchange Online
- Document queue depth: Check mail queue length before rollback (baseline for validation)
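The scope and error-code items above can be tallied with a short script once trace results are exported. The following is a minimal sketch, assuming the rows have already been loaded into dictionaries; the field names (`direction`, `status`, `detail`) and the sample rows are hypothetical and do not reflect a real export schema.

```python
import re
from collections import Counter

# Hypothetical rows, e.g. copied from a message trace export.
# The field names are illustrative assumptions, not a real export schema.
TRACE_ROWS = [
    {"direction": "inbound", "status": "Failed", "detail": "550 5.7.1 Delivery not authorized"},
    {"direction": "inbound", "status": "Failed", "detail": "550 5.7.1 Delivery not authorized"},
    {"direction": "outbound", "status": "Delivered", "detail": ""},
    {"direction": "inbound", "status": "Failed", "detail": "550 5.1.1 Recipient not found"},
]

# Enhanced status codes look like 5.1.1 or 4.4.7 (class.subject.detail).
NDR_CODE = re.compile(r"\b([45]\.\d{1,3}\.\d{1,3})\b")


def summarize_failures(rows):
    """Return (scope, Counter of NDR codes) for the failed rows."""
    failed = [row for row in rows if row["status"] == "Failed"]
    directions = {row["direction"] for row in failed}
    if directions == {"inbound", "outbound"}:
        scope = "both directions"
    elif directions:
        scope = f"{next(iter(directions))} only"
    else:
        scope = "no failures found"

    codes = Counter()
    for row in failed:
        match = NDR_CODE.search(row["detail"])
        if match:
            codes[match.group(1)] += 1
    return scope, codes


scope, codes = summarize_failures(TRACE_ROWS)
print(scope)
for code, count in codes.most_common():
    print(code, count)
```

With the sample rows, the failures are all inbound and 5.7.1 is the most frequent code, which would point toward a rule or connector rejection rather than a recipient problem.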
4-Step Rollback Procedure
Execute these steps in order. Stop as soon as mail flow resumes after any step:
Step 1: Disable Suspect Transport Rules (5-7 min)
- Access Exchange Admin Center: Navigate to Mail flow → Rules
- Identify recent rules: Sort by "Modified" date; focus on rules added/changed in last 24-48 hours
- Disable rules one-by-one: Uncheck "Enable" for suspect rules (do NOT delete yet)
- Document each change: Note rule name, GUID, and exact modification made
- Wait 3-5 minutes: Allow time for the rule cache to clear before testing
- Test after each disable: Send test email from external sender to internal recipient
✓ Tip: Focus on rules with "Block" or "Reject" actions, or rules targeting "All recipients"
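The selection logic in this step (recently modified rules first, with blocking or broadly scoped rules at the top) can be sketched as follows. This is an illustration only: the rule records, their field names, and the 48-hour default are assumptions for the example, not an Exchange data format.

```python
from datetime import datetime, timedelta, timezone

RISKY_ACTIONS = {"block", "reject"}

# Hypothetical rule inventory; names and fields are illustrative only.
RULES = [
    {"name": "Add disclaimer", "action": "append", "all_recipients": False,
     "modified": datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)},
    {"name": "Reject external attachments", "action": "reject", "all_recipients": False,
     "modified": datetime(2024, 5, 1, 15, 0, tzinfo=timezone.utc)},
    {"name": "Legacy block list", "action": "block", "all_recipients": False,
     "modified": datetime(2024, 3, 10, 8, 0, tzinfo=timezone.utc)},
    {"name": "Redirect finance", "action": "redirect", "all_recipients": True,
     "modified": datetime(2024, 5, 2, 11, 0, tzinfo=timezone.utc)},
]


def is_risky(rule):
    """Block/reject actions or an all-recipients scope get reviewed first."""
    return rule["action"] in RISKY_ACTIONS or rule["all_recipients"]


def shortlist(rules, now, window_hours=48):
    """Rules modified inside the window, risky ones first, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [rule for rule in rules if rule["modified"] >= cutoff]
    recent.sort(key=lambda rule: rule["modified"], reverse=True)
    # Python's sort is stable, so the newest-first order is kept within each group.
    recent.sort(key=lambda rule: not is_risky(rule))
    return recent


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
for rule in shortlist(RULES, now):
    print(rule["name"], rule["modified"].isoformat())
```

The resulting order is the order in which to disable and test rules one at a time; rules outside the window (here "Legacy block list") are left alone.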
Step 2: Revert Connector Configuration (7-10 min)
- Review inbound connectors: Mail flow → Connectors → Check recently modified inbound connectors
- Compare to backup: If available, compare current config to last-known-good snapshot
- Revert TLS settings: Check "Require TLS" and certificate requirements; if they are blocking mail, temporarily relax them and record the original values so they can be restored
- Validate IP ranges: Confirm sender IP addresses match connector scope (especially for hybrid)
- Check outbound connectors: Verify routing rules and smart host configurations
- Test immediately: Send inbound and outbound test messages
⚠️ Caution: Changes to connectors affect all mail flow; validate thoroughly before proceeding
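Comparing the current connector against a last-known-good snapshot, and checking a sender IP against the connector scope, can be sketched as below. The setting names (`require_tls`, `sender_ips`, `smart_host`, `tls_cert_name`) and all values are hypothetical placeholders; a real snapshot would use whatever fields your backup captures.

```python
import ipaddress

ABSENT = "<absent>"  # marker for a setting that exists on only one side

# Hypothetical snapshots; keys and values are illustrative only.
KNOWN_GOOD = {
    "require_tls": False,
    "sender_ips": ["203.0.113.0/24"],
    "smart_host": "mail.example.com",
}
CURRENT = {
    "require_tls": True,
    "sender_ips": ["203.0.113.0/24"],
    "smart_host": "mail.example.com",
    "tls_cert_name": "*.example.net",
}


def diff_config(known_good, current):
    """Map each changed setting to a (known_good_value, current_value) pair."""
    changes = {}
    for key in sorted(set(known_good) | set(current)):
        before = known_good.get(key, ABSENT)
        after = current.get(key, ABSENT)
        if before != after:
            changes[key] = (before, after)
    return changes


def ip_in_scope(sender_ip, ranges):
    """True if sender_ip falls inside any address or CIDR range in ranges."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in ipaddress.ip_network(item) for item in ranges)


for setting, (before, after) in diff_config(KNOWN_GOOD, CURRENT).items():
    print(f"{setting}: {before!r} -> {after!r}")
print(ip_in_scope("203.0.113.25", CURRENT["sender_ips"]))
print(ip_in_scope("198.51.100.7", CURRENT["sender_ips"]))
```

Each entry in the diff is a candidate for reverting; an out-of-scope sender IP suggests the connector is not matching the traffic at all.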
Step 3: Check & Adjust Mail Flow Policies (3-5 min)
- Review spam filter: Microsoft Defender portal → Email & collaboration → Policies & rules → Threat policies → Anti-spam → Check whether a policy is blocking legitimate mail
- Check quarantine: Verify messages aren't quarantined instead of delivered
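- Validate accepted domains: Ensure recipient domains are listed with the expected type ("Authoritative", or "Internal relay" where a hybrid or shared address space requires it)
- Review DLP policies: Temporarily disable DLP policies if they are blocking all mail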
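The accepted-domain check can be run against the recipients of failed messages. The sketch below assumes the accepted domain list has been exported into a simple mapping of domain to type; the domain names, type strings, and recipient addresses are made up for the example.

```python
# Hypothetical accepted-domain export: domain -> configured type.
ACCEPTED = {
    "contoso.com": "Authoritative",
    "fabrikam.com": "InternalRelay",
}

# Hypothetical recipients taken from failed messages.
FAILED_RECIPIENTS = [
    "amy@contoso.com",
    "bob@Fabrikam.com",
    "cy@tailspin.example",
]


def unexpected_domains(recipients, accepted, expected_types=("Authoritative",)):
    """Map each recipient domain that is unlisted (None) or has an unexpected type."""
    problems = {}
    for address in recipients:
        domain = address.rsplit("@", 1)[-1].lower()
        domain_type = accepted.get(domain)
        if domain_type not in expected_types:
            problems[domain] = domain_type
    return problems


for domain, domain_type in unexpected_domains(FAILED_RECIPIENTS, ACCEPTED).items():
    print(domain, domain_type or "not listed")
```

Pass a wider `expected_types` tuple if some domains are intentionally configured with another type.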
Step 4: Monitor Mail Queue Processing (2-3 min)
- Check queue depth: Use message trace to count messages still pending delivery
- Wait for natural processing: Queues typically clear within 10-15 minutes after the issue is resolved
- Monitor progress: Re-run message trace every 5 minutes to track delivery
Validation
- Queue depth decreasing from the pre-rollback baseline
- No new NDRs generated since the rollback
- External and internal delivery confirmed with test messages
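The three validation criteria can be combined into a single pass/fail gate. This is a minimal sketch under stated assumptions: queue depth samples are collected manually (for example, one per 5-minute message trace re-run), and the function names and sample numbers are hypothetical.

```python
def queue_draining(samples):
    """True if depth never rises between samples and ends below where it started."""
    if len(samples) < 2:
        return False
    never_rises = all(later <= earlier for earlier, later in zip(samples, samples[1:]))
    return never_rises and samples[-1] < samples[0]


def validate(baseline, samples, ndrs_since_rollback, external_ok, internal_ok):
    """Return (overall_pass, per-check results) for the validation checklist."""
    checks = {
        "queue depth decreasing": queue_draining([baseline, *samples]),
        "no new NDRs": ndrs_since_rollback == 0,
        "external and internal delivery confirmed": external_ok and internal_ok,
    }
    return all(checks.values()), checks


# Example: baseline of 420 queued messages, then three later readings.
passed, checks = validate(420, [310, 150, 20], ndrs_since_rollback=0,
                          external_ok=True, internal_ok=True)
print("PASS" if passed else "FAIL")
for name, ok in checks.items():
    print(f"  {name}: {'ok' if ok else 'not met'}")
```

Requiring all three checks before closing the incident avoids declaring success on a draining queue while NDRs are still being generated.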