Exchange Health Check

Validate core signals: mail flow, Autodiscover, TLS, throttling, and security controls.

⚠️ Business Consequence: Why This Matters

  • Financial Impact: Proactive health checks help avoid outages, estimated at $50K–$500K per prevented outage
  • Compliance Exposure: Monthly validation supports audit readiness by providing change-control documentation
  • Operational Risk: Early detection of configuration drift prevents cascading failures
  • Prevention Value: Identify misconfigurations before they trigger P1 incidents

Health check time: 20–30 minutes. Catches problems before they affect users.

Health Check Overview

  • Purpose: Validate core Exchange Online and hybrid functionality
  • Time to complete: 20-30 minutes for full check
  • Frequency: Monthly preventive or immediately after major changes
  • Owner: Exchange administrator or SRE

🚀 Before You Start

⏱ Time Required

20-30 minutes to run checks and analyze health status

👤 Skill Level

Exchange Online admin with PowerShell knowledge

🛡️ Safety

Read-only health checks. No changes to system configuration.

📋 What You'll Need

Exchange Online PowerShell module, Exchange admin role

⚠️ Red flags or failures detected? Request an Exchange Security Assessment for remediation.

5-Component Health Assessment

Check each area in order. Stop and escalate if you find red flags:

Component 1: Mail Flow Baseline (5 min)

Check inbound/outbound delivery health

  • Run a message trace for the last 24 hours and compare messages with status "Failed" against the total
  • Acceptable: <2% NDR rate
  • Check queue depth (on-premises/hybrid servers: Get-Queue): <500 messages
  • ✓ PASS: <2% failures, no queue backlog
  • ✗ FAIL: >5% failures or queue >1000 messages
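The pass/fail rules above are easy to script once the trace is exported. Below is a minimal Python sketch. It assumes you exported the last 24 hours of message trace to CSV (for example, Get-MessageTrace piped to Export-Csv -NoTypeInformation) with a Status column, and that you read queue depth separately. The function names are hypothetical. Returning REVIEW for the 2-5% / 500-1000 band is also an assumption, because the checklist leaves that band undefined.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping

# Thresholds from the Component 1 checklist.
PASS_NDR_RATE = 0.02     # below 2% failed = healthy
FAIL_NDR_RATE = 0.05     # above 5% failed = fail
PASS_QUEUE_DEPTH = 500   # below 500 queued = no backlog
FAIL_QUEUE_DEPTH = 1000  # above 1000 queued = fail


def ndr_rate(trace_rows: Iterable[Mapping[str, str]]) -> float:
    """Fraction of message-trace rows whose Status is 'Failed'."""
    statuses = Counter(row["Status"] for row in trace_rows)
    total = sum(statuses.values())
    return statuses["Failed"] / total if total else 0.0


def classify_mail_flow(rate: float, queue_depth: int) -> str:
    """Apply the PASS/FAIL rules; anything in between is flagged for review."""
    if rate > FAIL_NDR_RATE or queue_depth > FAIL_QUEUE_DEPTH:
        return "FAIL"
    if rate < PASS_NDR_RATE and queue_depth < PASS_QUEUE_DEPTH:
        return "PASS"
    # Assumption: the checklist does not define this middle band.
    return "REVIEW"


def evaluate_trace_csv(path: str, queue_depth: int) -> str:
    """Hypothetical helper: read an exported trace CSV and return a verdict line."""
    # utf-8-sig also accepts files written with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        rate = ndr_rate(csv.DictReader(handle))
    return f"NDR rate {rate:.2%}, queue depth {queue_depth}: {classify_mail_flow(rate, queue_depth)}"
```

For example, evaluate_trace_csv("trace.csv", 120) returns a single line that you can paste into the monthly health record.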

Component 2: Autodiscover & Connectivity (4 min)

Check that Outlook can find the mailbox and connect

  • Use Remote Connectivity Analyzer: Test Outlook Autodiscover
  • Check DNS: nslookup autodiscover.yourdomain.com
  • Verify certificate validity (hybrid setups)
  • Test user sign-in: IT staff can authenticate to Outlook/OWA
  • ✓ PASS: Autodiscover resolves, no certificate warnings
  • ✗ FAIL: DNS lookup fails, certificate warnings appear, or users get repeated sign-in/MFA prompts
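You can script the DNS part of this check alongside nslookup. Below is a minimal Python sketch built on the standard library's socket.gethostbyname_ex, which returns the canonical name, an alias list, and the addresses. The expected_alias parameter is an assumption that you set to your own design. Exchange Online commonly uses autodiscover.outlook.com, while hybrid setups often point to an on-premises name. Whether intermediate CNAME names appear in the alias list depends on the platform resolver, so a mismatch is reported as REVIEW, not FAIL. The sketch checks DNS only and does not replace the Remote Connectivity Analyzer test.

```python
import socket
from typing import Callable, List, Optional, Tuple

# Same shape as socket.gethostbyname_ex: (canonical_name, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def check_autodiscover(
    domain: str,
    expected_alias: Optional[str] = None,
    resolver: Resolver = socket.gethostbyname_ex,
) -> Tuple[str, str]:
    """Resolve autodiscover.<domain> and return (verdict, message)."""
    host = f"autodiscover.{domain}"
    try:
        canonical, aliases, addresses = resolver(host)
    except (socket.gaierror, socket.herror) as exc:
        return "FAIL", f"{host} does not resolve: {exc}"
    if not addresses:
        return "FAIL", f"{host} resolved but returned no addresses"
    if expected_alias:
        # Alias reporting varies by resolver, so a miss is a prompt to look closer.
        names = {canonical.lower(), *(alias.lower() for alias in aliases)}
        if expected_alias.lower() not in names:
            return "REVIEW", f"{host} -> {canonical}; {expected_alias} not seen in CNAME chain"
    return "PASS", f"{host} -> {canonical} ({', '.join(addresses)})"
```

The resolver parameter exists so the logic can be exercised with a stub. In normal use, call check_autodiscover("yourdomain.com", "autodiscover.outlook.com").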

Component 3: Connector & TLS (4 min)

Check inbound/outbound connector health

  • Verify inbound connector scope is correct
  • Check that connector certificates have not expired
  • Test TCP reachability: Test-NetConnection -ComputerName mx.domain.com -Port 25 (confirms port 25 is open; it does not negotiate TLS)
  • ✓ PASS: All connectors enabled, no certificate warnings
  • ✗ FAIL: Connectors disabled, or certificate expired or expiring within 30 days
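Test-NetConnection only confirms that TCP port 25 accepts connections. To see the certificate a mail host actually presents, a client has to issue STARTTLS. Below is a minimal Python sketch using the standard smtplib and ssl modules. The 30-day warning window is an assumption taken from the checklist's 30-day figure. Outbound port 25 is often blocked from workstations and cloud VMs, so run it from a host that is allowed to reach the MX.

```python
import smtplib
import ssl
import time
from typing import Optional

WARN_DAYS = 30  # assumption: warn when a certificate is within 30 days of expiry
SECONDS_PER_DAY = 86400


def fetch_starttls_not_after(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Open SMTP, upgrade with STARTTLS, and return the server certificate's notAfter."""
    context = ssl.create_default_context()  # verifies the chain and the host name
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.starttls(context=context)  # raises if STARTTLS or verification fails
        cert = smtp.sock.getpeercert()
    return cert["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between now (epoch seconds) and a notAfter string like 'Jan 15 00:00:00 2030 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / SECONDS_PER_DAY


def classify_certificate(days_left: float) -> str:
    """Map remaining validity to a verdict."""
    if days_left < 0:
        return "FAIL: certificate expired"
    if days_left < WARN_DAYS:
        return f"WARN: certificate expires in {days_left:.0f} days"
    return "PASS"
```

Usage: classify_certificate(days_until_expiry(fetch_starttls_not_after("mx.domain.com"))). A certificate that fails chain or host-name validation raises an ssl error, which is itself a finding.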

Component 4: Throttling & Rate Limits (4 min)

Check that no users are hitting throttling limits

  • Check Service Health Dashboard for advisories
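  • Search message trace for deferrals with enhanced status code 4.3.2, and check client or API logs for HTTP 429 (Too Many Requests) responses
  • Identify high-volume senders (>5000 msg/day)
  • ✓ PASS: No throttling advisories, 4.3.2 deferrals, or HTTP 429 responses
  • ✗ FAIL: Recent 4.3.2 deferrals, HTTP 429 errors, or backpressure events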
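Message trace lists each recipient on its own row, so counting sender volume takes some care. Below is a minimal Python sketch over an exported trace. It assumes hypothetical columns SenderAddress, MessageId, and Detail (for example, detail text collected with Get-MessageTraceDetail) covering a single 24-hour window. The sketch only sees SMTP-level signals. HTTP 429 responses appear in client or API logs, not in message trace.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set

HIGH_VOLUME_PER_DAY = 5000  # from the checklist: >5000 messages/day
# Enhanced status code 4.3.2; \b keeps it from matching longer codes such as 4.3.20.
BACKPRESSURE = re.compile(r"\b4\.3\.2\b")


def high_volume_senders(
    rows: Iterable[Mapping[str, str]], threshold: int = HIGH_VOLUME_PER_DAY
) -> Dict[str, int]:
    """Distinct messages per sender above the threshold (rows repeat per recipient)."""
    messages: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        messages[row["SenderAddress"].lower()].add(row["MessageId"])
    return {sender: len(ids) for sender, ids in messages.items() if len(ids) > threshold}


def backpressure_rows(rows: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Rows whose detail text mentions the 4.3.2 deferral code."""
    return [row for row in rows if BACKPRESSURE.search(row.get("Detail") or "")]


def classify_throttling(rows: Iterable[Mapping[str, str]]) -> str:
    """FAIL on any 4.3.2 deferral; high-volume senders are informational only."""
    return "FAIL" if backpressure_rows(rows) else "PASS"
```

Counting distinct MessageId values, not rows, prevents a single message to many recipients from looking like a bulk sender.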

Component 5: Security Controls (5 min)

Check that Conditional Access (CA) policies and MFA are enforced without blocking legitimate users

  • Review CA policies: break-glass accounts excluded, <5% user block rate
  • Check sign-in logs: CA blocks (error code 53003) <10/day
  • MFA adoption: 100% user registration, no users pending registration
  • Test DLP: Policies blocking/quarantining as expected
  • ✓ PASS: Policies enforced, <5% failures, MFA 100%
  • ✗ FAIL: >20% user failures or DLP breaking mail
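The numeric CA and MFA thresholds above can be evaluated from exported data. Below is a minimal Python sketch built on several assumptions:

- Sign-in records have been normalized into hypothetical user, error_code, and date fields. Entra ID exports use different column names.
- Each CA policy's excluded users are available as a set.
- MFA registration counts are already known.

Error 53003 is the Entra ID code for a sign-in blocked by Conditional Access. The sketch treats a missing break-glass exclusion as REVIEW, not FAIL, because the checklist lists it as a review item. That choice is an assumption. DLP behavior is not covered.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Set

CA_BLOCK_CODE = 53003       # Entra ID sign-in error: blocked by Conditional Access
MAX_CA_BLOCKS_PER_DAY = 10  # checklist: CA blocks <10/day
PASS_BLOCK_RATE = 0.05      # checklist: <5% of users blocked
FAIL_BLOCK_RATE = 0.20      # checklist: >20% user failures

SignIn = Mapping[str, str]  # hypothetical keys: "user", "error_code", "date"


def _blocked(row: SignIn) -> bool:
    return int(row["error_code"]) == CA_BLOCK_CODE


def ca_blocks_per_day(rows: Iterable[SignIn]) -> Dict[str, int]:
    """Count CA-blocked sign-ins per date string."""
    return dict(Counter(row["date"] for row in rows if _blocked(row)))


def user_block_rate(rows: Iterable[SignIn]) -> float:
    """Share of distinct users with at least one CA-blocked sign-in."""
    rows = list(rows)
    users = {row["user"].lower() for row in rows}
    blocked = {row["user"].lower() for row in rows if _blocked(row)}
    return len(blocked) / len(users) if users else 0.0


def missing_break_glass(policies: Mapping[str, Set[str]], break_glass: Set[str]) -> Dict[str, Set[str]]:
    """Per policy, the break-glass accounts that are NOT in its exclusion list."""
    wanted = {account.lower() for account in break_glass}
    gaps: Dict[str, Set[str]] = {}
    for name, excluded in policies.items():
        missing = wanted - {user.lower() for user in excluded}
        if missing:
            gaps[name] = missing
    return gaps


def classify_security(rows: Iterable[SignIn], policies: Mapping[str, Set[str]],
                      break_glass: Set[str], mfa_registered: int, mfa_total: int) -> str:
    """PASS only when every checklist condition holds; FAIL above 20% blocked users."""
    rows = list(rows)
    rate = user_block_rate(rows)
    if rate > FAIL_BLOCK_RATE:
        return "FAIL"
    noisy_days = [d for d, n in ca_blocks_per_day(rows).items() if n >= MAX_CA_BLOCKS_PER_DAY]
    if (rate < PASS_BLOCK_RATE and not noisy_days
            and not missing_break_glass(policies, break_glass)
            and mfa_registered == mfa_total):
        return "PASS"
    return "REVIEW"
```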

FAQs

How often should I run health checks?

Monthly for prevention, and after major changes to validate state.

What indicates mail flow is healthy?

An NDR rate below 2% and a queue depth below 500 messages.

How do I detect throttling?

Check Service Health advisories, search message trace for 4.3.2 deferrals, and check client or API logs for HTTP 429 (Too Many Requests) responses.

Which CA signals should be monitored?

Look for error code 53003 (blocked by Conditional Access) in sign-in logs, confirm break-glass accounts are excluded from every policy, and maintain 100% MFA registration.