Exchange Health Check
Validate core signals: mail flow, Autodiscover, TLS, throttling, and security controls.
⚠️ Business Consequence: Why This Matters
- Financial Impact: Proactive health checks prevent incidents ($50K–$500K per prevented outage)
- Compliance Exposure: Monthly validation supports audit readiness by producing change control documentation
- Operational Risk: Early detection of configuration drift prevents cascading failures
- Prevention Value: Identify misconfigurations before they trigger P1 incidents
Health check time: 20–30 minutes, enough to catch problems before users are affected.
Health Check Overview
- Purpose: Validate core Exchange Online and hybrid functionality
- Time to complete: 20-30 minutes for full check
- Frequency: Monthly preventive or immediately after major changes
- Owner: Exchange administrator or SRE
🚀 Before You Start
- Time: 20-30 minutes to run checks and analyze health status
- Skill level: Exchange Online admin with PowerShell knowledge
- Risk: Read-only health checks. No changes to system configuration.
- Requirements: Exchange Online PowerShell module, Exchange admin role
⚠️ Red flags or failures detected? Request Exchange Security Assessment for remediation.
5-Component Health Assessment
Check each area in order. Stop if red flags found:
Component 1: Mail Flow Baseline (5 min)
Check inbound/outbound delivery health
- Run message trace for last 24 hours, filter by "Failed"
- Acceptable: <2% NDR rate
- Check queue depth: <500 messages
- ✓ PASS: <2% failures, no queue backlog
- ✗ FAIL: >5% failures or queue >1000 messages
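The thresholds above can be sketched as a small helper. This is a minimal illustration, not an official tool: `Get-MailFlowVerdict` is a hypothetical function name, and the WARN band for values that fall between the PASS and FAIL limits is an assumption, since the checklist does not define it. The commented usage assumes a connected Exchange Online PowerShell session in which `Get-MessageTrace` is still available in your tenant.

```powershell
function Get-MailFlowVerdict {
    param(
        [Parameter(Mandatory)][int]$TotalMessages,
        [Parameter(Mandatory)][int]$FailedMessages,
        [int]$QueueDepth = 0
    )

    # Avoid dividing by zero when the trace window is empty.
    $failureRate = if ($TotalMessages -gt 0) {
        [math]::Round([double](100 * $FailedMessages / $TotalMessages), 2)
    } else {
        0
    }

    # PASS and FAIL limits come from the checklist; WARN is an assumed middle band.
    $verdict = if ($failureRate -gt 5 -or $QueueDepth -gt 1000) {
        'FAIL'
    } elseif ($failureRate -lt 2 -and $QueueDepth -lt 500) {
        'PASS'
    } else {
        'WARN'
    }

    [pscustomobject]@{
        FailureRatePercent = $failureRate
        QueueDepth         = $QueueDepth
        Verdict            = $verdict
    }
}

# Example usage in a connected session (one page of up to 5000 results):
# $end = Get-Date
# $all = @(Get-MessageTrace -StartDate $end.AddDays(-1) -EndDate $end -PageSize 5000)
# $bad = @($all | Where-Object Status -eq 'Failed')
# Get-MailFlowVerdict -TotalMessages $all.Count -FailedMessages $bad.Count
```

Keeping the verdict logic separate from the data collection means the same function can be fed counts from a message trace export or a report if the cmdlet is not available.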
Component 2: Autodiscover & Connectivity (4 min)
Check that Outlook can locate the mailbox and connect
- Use Remote Connectivity Analyzer: Test Outlook Autodiscover
- Check DNS: nslookup autodiscover.yourdomain.com
- Verify certificate validity (hybrid setups)
- Test user sign-in: IT staff can authenticate to Outlook/OWA
- ✓ PASS: Autodiscover resolves, no certificate warnings
- ✗ FAIL: DNS fails or repeated MFA prompts
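The DNS step can also be scripted. The sketch below is one possible approach, assuming a Windows host where `Resolve-DnsName` (DnsClient module) is available. `Test-AutodiscoverDns` is a hypothetical helper name, and the default expected target `autodiscover.outlook.com` applies to Exchange Online; hybrid deployments that publish Autodiscover on-premises would pass a different value.

```powershell
function Test-AutodiscoverDns {
    param(
        [Parameter(Mandatory)][string]$Domain,
        # Expected CNAME target for Exchange Online; hybrid setups may differ.
        [string]$ExpectedTarget = 'autodiscover.outlook.com'
    )

    $name = "autodiscover.$Domain"
    try {
        # Any lookup error (or a missing cmdlet) is treated as "not resolved".
        $records = @(Resolve-DnsName -Name $name -ErrorAction Stop)
    } catch {
        return [pscustomobject]@{
            Name             = $name
            Resolved         = $false
            PointsToExpected = $false
        }
    }

    # Collect CNAME targets from the answer and compare (case-insensitive).
    $cnames = @($records | Where-Object { $_.Type -eq 'CNAME' } | ForEach-Object NameHost)

    [pscustomobject]@{
        Name             = $name
        Resolved         = $true
        PointsToExpected = ($cnames -contains $ExpectedTarget)
    }
}

# Example usage:
# Test-AutodiscoverDns -Domain 'yourdomain.com'
```

A result with `Resolved` set to false corresponds to the "DNS fails" condition above. Certificate and sign-in checks still need the Remote Connectivity Analyzer or a manual test.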
Component 3: Connector & TLS (4 min)
Check inbound/outbound connector health
- Verify inbound connector scope is correct
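- Check that certificates have not expired
- Test SMTP connectivity: Test-NetConnection -ComputerName mx.domain.com -Port 25 (confirms that TCP port 25 is reachable; it does not validate the TLS handshake)
- ✓ PASS: All connectors enabled, no certificate warnings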
- ✗ FAIL: Connectors disabled, or a certificate that has expired or expires within 30 days
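The connector and certificate checks can be sketched as two small helpers. Both function names are hypothetical, and the 30-day warning window is an assumption loosely based on the 30-day figure in the FAIL criterion. The commented usage assumes a connected Exchange Online session for `Get-InboundConnector` and `Get-OutboundConnector`, and an on-premises Exchange Management Shell for `Get-ExchangeCertificate` in hybrid setups.

```powershell
function Get-DisabledConnector {
    param(
        # Connector objects, e.g. from Get-InboundConnector or Get-OutboundConnector.
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Connector
    )
    $Connector | Where-Object { -not $_.Enabled } | Select-Object Name, Enabled
}

function Get-CertificateExpiryStatus {
    param(
        [Parameter(Mandatory)][datetime]$NotAfter,
        # Assumed warning window; adjust to your own policy.
        [int]$WarnDays = 30,
        [datetime]$Now = (Get-Date)
    )

    # Whole days remaining; negative values mean the certificate has expired.
    $daysLeft = [int][math]::Floor(($NotAfter - $Now).TotalDays)

    $status = if ($daysLeft -lt 0) {
        'Expired'
    } elseif ($daysLeft -le $WarnDays) {
        'ExpiringSoon'
    } else {
        'Valid'
    }

    [pscustomobject]@{ DaysLeft = $daysLeft; Status = $status }
}

# Example usage:
# $connectors = @(Get-InboundConnector) + @(Get-OutboundConnector)
# Get-DisabledConnector -Connector $connectors
# Hybrid, on the on-premises server:
# Get-ExchangeCertificate | ForEach-Object { Get-CertificateExpiryStatus -NotAfter $_.NotAfter }
```

An empty result from `Get-DisabledConnector` together with only `Valid` certificate statuses would match the PASS condition for this component.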
Component 4: Throttling & Rate Limits (4 min)
Check that no users are hitting throttling limits
- Check Service Health Dashboard for advisories
- Search message trace for throttling-related error codes (429 or 4.3.2)
- Identify high-volume senders (>5000 msg/day)
- ✓ PASS: No throttling alerts or 429 NDRs
- ✗ FAIL: Recent 429 errors or backpressure events
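Identifying high-volume senders is a simple grouping exercise over message trace results. The sketch below assumes each record exposes a `SenderAddress` property, as `Get-MessageTrace` output does. `Get-HighVolumeSender` is a hypothetical helper name, and the default threshold mirrors the 5000 messages per day figure above.

```powershell
function Get-HighVolumeSender {
    param(
        # Message trace records covering a 24-hour window.
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Message,
        [int]$Threshold = 5000
    )

    $Message |
        Group-Object -Property SenderAddress |
        Where-Object { $_.Count -gt $Threshold } |
        Sort-Object -Property Count -Descending |
        Select-Object @{ Name = 'Sender'; Expression = { $_.Name } }, Count
}

# Example usage in a connected session, paging through the last 24 hours:
# $end = Get-Date; $page = 1; $all = @()
# do {
#     $batch = @(Get-MessageTrace -StartDate $end.AddDays(-1) -EndDate $end -PageSize 5000 -Page $page)
#     $all += $batch; $page++
# } while ($batch.Count -eq 5000)
# Get-HighVolumeSender -Message $all
```

Senders returned here are candidates for review rather than confirmed problems; legitimate bulk mailers may need a dedicated sending path.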
Component 5: Security Controls (5 min)
Check that Conditional Access (CA) policies and MFA are enforced without blocking legitimate users
- Review CA policies: break-glass accounts excluded, <5% user block rate
- Check sign-in logs: CA blocks <10/day
- MFA adoption: 100% user registration, no backlog
- Test DLP: Policies blocking/quarantining as expected
- ✓ PASS: Policies enforced, <5% failures, MFA 100%
- ✗ FAIL: >20% user failures or DLP breaking mail
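The sign-in log check can be sketched as a summary over exported sign-in records. This assumes records shaped like the output of `Get-MgAuditLogSignIn` from the Microsoft Graph PowerShell SDK, with a `Status.ErrorCode` and a `UserPrincipalName`. `Get-CaBlockSummary` is a hypothetical helper name, and the default limit mirrors the "CA blocks <10/day" figure above.

```powershell
function Get-CaBlockSummary {
    param(
        # Sign-in records covering a 24-hour window.
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$SignIn,
        [int]$DailyBlockLimit = 10
    )

    # 53003 is the sign-in error code for "blocked by Conditional Access".
    $blocked = @($SignIn | Where-Object { $_.Status.ErrorCode -eq 53003 })

    # Distinct affected users (Sort-Object -Unique is case-insensitive by default).
    $users = @($blocked | ForEach-Object UserPrincipalName | Sort-Object -Unique)

    [pscustomobject]@{
        BlockedSignIns = $blocked.Count
        BlockedUsers   = $users.Count
        WithinLimit    = ($blocked.Count -lt $DailyBlockLimit)
    }
}

# Example usage (requires the Microsoft.Graph.Reports module and suitable permissions):
# Connect-MgGraph -Scopes 'AuditLog.Read.All'
# $since = (Get-Date).AddDays(-1).ToUniversalTime().ToString('yyyy-MM-ddTHH:mm:ssZ')
# $logs  = Get-MgAuditLogSignIn -Filter "createdDateTime ge $since" -All
# Get-CaBlockSummary -SignIn $logs
```

The summary covers only the block-count signal. MFA registration and DLP behavior still need to be reviewed separately.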
FAQs
How often should I run health checks?
Monthly for prevention, and after major changes to validate state.
What indicates mail flow is healthy?
<2% NDR and queue depth <500 messages.
How do I detect throttling?
Check Service Health advisories and message trace for 429 or 4.3.2 errors.
Which CA signals should be monitored?
Look for error code 53003 (blocked by Conditional Access) in sign-in logs, ensure break-glass accounts are excluded, and maintain 100% MFA adoption.