Microsoft 365 corporate inbox deliverability, 2026.
Eighty-three research files. Two hundred ten thousand words. Every claim source-tagged against Microsoft Learn, M3AAWG, USPTO patents, academic literature, and operator forums. This is what's actually verified.
Executive summary
Microsoft's email deliverability in 2026 is not the system you learned in 2024. The LLM content analysis layer codified in Defender for Office 365 in April 2026 is the dominant change — and it has zero header surface, no tenant-side bypass for high-confidence phishing verdicts, and a "Contact establishment" threat class that is a literal description of cold outreach.
The honest one-paragraph synthesis: authentic peer-level correspondence wins both the LLM filter and the human recipient simultaneously. There is no LLM-evasion writing style separate from authentic peer writing. The same signals that flag content as cold-outreach archetype also cause executives to delete the message. Engineer the writing, not the evasion.
- Plain text only. HTML triggers the SCL-9 MarkAsSpam*InHtml family + content fingerprinting.
- OAuth XOAUTH2, never SMTP basic auth. Basic auth has been dead at M365 since March 1, 2026.
- 3–5 sends/inbox/day equilibrium. 1/day starves Mailbox Intelligence; 30+/day burns reputation.
- The First-Contact Safety Tip is per-mailbox. A reply at Tenant Y's mailbox A does not pre-warm the banner at mailbox B.
- Cohort structure triggers snowshoe detection at any volume. Per-node volume is not the lever — registrar/NS/SPF/DKIM/timing diversity is.
- No tenant-side allow bypasses HPHSH. Not TABL. Not Safe Senders. Not transport rules. Not IPV:CAL. Only Advanced Delivery (SecOps phish-sim, unusable for senders).
- F5000 EU corporate is Microsoft-dominant (32–48% per country); zero of nine named targets sit behind Mimecast/Proofpoint. The defensive map is not the one most senders assume.
The EOP / Defender filtering pipeline
verified: microsoft learn, april 2026 refresh
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 1 — EDGE │
│ network throttle → IP reputation → domain reputation → DBEB │
│ ↓ │
│ backscatter filter → connection filter (IPV:CAL skip) │
└──────────────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 2 — SENDER INTELLIGENCE │
│ SPF → DKIM → DMARC → ARC → composite auth (compauth=X reason=N) │
│ ↓ │
│ spoof intelligence → BCL stamp → impersonation (UIMP/DIMP/GIMP) │
└──────────────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 3 — CONTENT │
│ transport rules → anti-malware → attachment reputation │
│ ↓ │
│ heuristic clustering → ML phish → URL reputation │
│ ↓ │
│ ★ LLM content analysis (2026 new — Phi-class fine-tune, 14 classes) ★ │
│ ↓ │
│ safe attachments sandbox → URL detonation │
└──────────────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 4 — POST-DELIVERY │
│ safe links → ZAP-phish → ZAP-malware → ZAP-spam (48h window) │
│ ↓ │
│ campaign views → AIR (P2 only) cluster purge │
└──────────────────────────────────────────────────────────────────────────┘
Verdict precedence (only one wins, in order): Malware → HC-Phish → Phish → HC-Spam → Spoof → UIMP → DIMP → GIMP → Spam → Bulk.
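A minimal sketch of that precedence rule, assuming the filter stack produces a set of candidate verdicts and the first match in the listed order wins. The function name and the list-of-strings input are illustrative, not a Microsoft API.

```python
# Precedence order as listed in the text above (highest priority first).
VERDICT_PRECEDENCE = [
    "Malware", "HC-Phish", "Phish", "HC-Spam", "Spoof",
    "UIMP", "DIMP", "GIMP", "Spam", "Bulk",
]


def winning_verdict(candidates):
    """Return the highest-precedence verdict among candidates, or None."""
    rank = {name: i for i, name in enumerate(VERDICT_PRECEDENCE)}
    known = [v for v in candidates if v in rank]
    if not known:
        return None
    # Lowest index in the precedence list wins.
    return min(known, key=rank.__getitem__)


print(winning_verdict(["Bulk", "Spoof", "Spam"]))  # Spoof
```

A message flagged as both Spoof and Bulk therefore surfaces only as Spoof; the Bulk signal is not visible in the final verdict.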
- Jan 6 2026 — MERR (2,000/mailbox/day) canceled. TERRL formula (500 × licenses^0.7 + 9500) is the only outbound limit. verified
- Jan 23 2026 — Mimecast MVTP (Multi-Vector Threat Protection) GA. New correlation gate for B2B senders. verified
- Feb 20–27 2026 — "Week Microsoft Broke Email" — S3150 permanent 550 rejections against SNDS-green IPs. PFA model misconfiguration. No retroactive remediation. verified
- Mar 1 2026 — SMTP basic auth final deprecation. OAuth-only. verified
- Apr 13 2026 — LLM content analysis codified as a first-class detection technology in Microsoft Learn. verified
- Apr 22 2026 — MC1279093 Promotions folder public preview. verified
- Jul 2026 (planned) — Promotions folder GA. "Bulk moves enabled" toggle confirmed DEFAULT OFF. verified · low risk
- Jul 1 2026 — *.h-v1.mx.microsoft DANE+DNSSEC infrastructure becomes default for new Accepted Domains. verified · recipient-side
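The TERRL formula in the Jan 6 entry can be evaluated directly. This is a sketch of the arithmetic only; rounding to the nearest whole recipient is an assumption, and the function name is illustrative.

```python
def terrl_limit(licenses: int) -> int:
    """Tenant external recipient rate limit: 500 * licenses^0.7 + 9500.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if licenses < 1:
        raise ValueError("licenses must be at least 1")
    return round(500 * licenses ** 0.7 + 9500)


for n in (1, 10, 100):
    print(n, terrl_limit(n))
```

Because the exponent is below 1, the limit grows sub-linearly: a single-license tenant gets 10,000, and adding licenses raises the ceiling more slowly than headcount.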
The verdict system
BCL · SCL · CAT · SFV · OFR · IPV
| BCL | signal |
|---|---|
| 0 | not bulk |
| 1–3 | bulk, few complaints |
| 4–7 | mixed complaints |
| 8–9 | high complaints |

| preset | threshold | action |
|---|---|---|
| default | 7 | Junk |
| standard | 6 | Junk |
| strict | 5 | Quarantine |

| SCL | meaning | action |
|---|---|---|
| −1 | skipped (internal/allowed) | deliver |
| 0 | not spam | deliver |
| 1 | not spam (scored clean) | deliver |
| 5 | suspected spam | Junk / Quarantine (strict) |
| 6 | spam | Junk / Quarantine (strict) |
| 7–9 | high-confidence spam | Junk / Quarantine (std) |
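The preset table can be read as a lookup: a message is treated as bulk once its BCL reaches the preset's threshold. The sketch below assumes a "greater than or equal" comparison and uses a placeholder string for the no-action case; both are assumptions, not taken from the tables.

```python
# (threshold, action) per preset, copied from the preset table above.
BULK_PRESETS = {
    "default": (7, "Junk"),
    "standard": (6, "Junk"),
    "strict": (5, "Quarantine"),
}


def bulk_action(bcl: int, preset: str = "default") -> str:
    """Return the bulk action for a BCL value under a given preset."""
    if not 0 <= bcl <= 9:
        raise ValueError("BCL ranges from 0 to 9")
    threshold, action = BULK_PRESETS[preset]
    # Assumption: the action applies at or above the threshold.
    return action if bcl >= threshold else "no bulk action"


print(bulk_action(6, "default"))   # no bulk action
print(bulk_action(6, "standard"))  # Junk
print(bulk_action(5, "strict"))    # Quarantine
```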
Compauth reason codes (documented, May 2026)
| code | compauth value | meaning | cold-sender context |
|---|---|---|---|
| 000 | fail | explicit auth fail (SPF + DKIM fail) | broken auth — fix immediately |
| 001 | fail | implicit fail (no DMARC published) | publish DMARC even at p=none |
| 002 | fail | TABL block by recipient admin | recipient tenant blocked you |
| 011 | fail | intra-org spoof via implicit auth | rare for cold senders |
| 100–102 | pass | aligned pass (SPF/DKIM/DMARC) | the goal |
| 109 | pass | bestguesspass (no DMARC, but SPF/DKIM aligned) | works, but p=quarantine upgrades |
| 130 | pass | ARC trusted-sealer override of DMARC fail | only with admin-configured trust |
| 6xx | fail | intra-org spoof scenarios | shouldn't occur for cold senders |
| 7xx | softpass | partial alignment | recipient-config dependent |
| 905 | none | composite auth not computed | rare edge case |
| 8xx | — | reserved · no operator sightings | likely future ML pass codes |
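For triage, the compauth value and reason code can be pulled out of an Authentication-Results header and mapped to the table above. This is a sketch: the sample header is hypothetical, and the regular expression assumes the `compauth=X reason=NNN` form shown in this document.

```python
import re

# Matches the compauth fragment of an Authentication-Results header.
COMPAUTH_RE = re.compile(r"compauth=(\w+)\s+reason=(\d{3})")

# Meanings copied from the reason-code table above.
EXACT_REASONS = {
    "000": "explicit auth fail (SPF + DKIM fail)",
    "001": "implicit fail (no DMARC published)",
    "002": "TABL block by recipient admin",
    "011": "intra-org spoof via implicit auth",
    "100": "aligned pass (SPF/DKIM/DMARC)",
    "101": "aligned pass (SPF/DKIM/DMARC)",
    "102": "aligned pass (SPF/DKIM/DMARC)",
    "109": "bestguesspass (no DMARC, but SPF/DKIM aligned)",
    "130": "ARC trusted-sealer override of DMARC fail",
    "905": "composite auth not computed",
}
# Code families identified by their first digit.
FAMILY_REASONS = {
    "6": "intra-org spoof scenarios",
    "7": "partial alignment (softpass)",
    "8": "reserved, no operator sightings",
}


def parse_compauth(header: str):
    """Return (value, reason, meaning), or None if no compauth is present."""
    match = COMPAUTH_RE.search(header)
    if match is None:
        return None
    value, reason = match.groups()
    meaning = EXACT_REASONS.get(reason) or FAMILY_REASONS.get(reason[0], "not in table")
    return value, reason, meaning


# Hypothetical header value for illustration only.
sample = ("spf=pass smtp.mailfrom=example.com; dkim=pass header.d=example.com; "
          "dmarc=pass action=none header.from=example.com; compauth=pass reason=100")
print(parse_compauth(sample))
```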
Authentication layer
SPF · DKIM · DMARC · ARC · BIMI · MTA-STS · DANE · TLS 1.3
SPF
10-lookup limit (RFC 7208). Flattening required at scale.
Microsoft evaluates SPF then computes alignment to 5322.From. SPF alone passes compauth if domain-aligned.
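A quick way to see how close a record sits to the 10-lookup limit is to count the terms that cause DNS lookups under RFC 7208 (include, a, mx, ptr, exists, and the redirect modifier). The sketch below counts only the terms in one record string; lookups inside nested includes also count toward the limit and would need real DNS resolution, which this example does not do. The record shown is hypothetical.

```python
import re

# Mechanisms that trigger a DNS lookup per RFC 7208 section 4.6.4.
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_spf_lookup_terms(record: str) -> int:
    """Count lookup-causing terms in a single SPF record (non-recursive)."""
    count = 0
    for term in record.split()[1:]:  # skip the leading v=spf1
        term = term.lstrip("+-~?").lower()
        if term.startswith("redirect="):
            count += 1
            continue
        # Strip any ":domain" or "/cidr" suffix to get the mechanism name.
        name = re.split(r"[:/]", term, maxsplit=1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


# Hypothetical record for illustration.
record = ("v=spf1 include:spf.protection.outlook.com include:_spf.example.net "
          "a mx ip4:192.0.2.0/24 -all")
print(count_spf_lookup_terms(record))  # 4
```

Flattening replaces include terms with literal ip4/ip6 ranges, which removes them from this count.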
DKIM
Default tenant: selector1 / selector2. CNAME pattern <tenant>.{a,r}-v1.dkim.mail.microsoft (regionally randomized).
2048-bit recommended. Rotation overlap = 4-day hard window.
Multiple signatures: any-passes model per RFC 6376. DMARC passes if any aligned DKIM signature passes.
DMARC
Microsoft treats p=none as a weaker signal: even aligned SPF+DKIM mail routinely gets compauth=fail reason=001.
p=quarantine measurably improves placement for cold senders.
M365 generates daily aggregate reports (rua) from dmarceng@microsoft.com. No forensic (ruf) ever sent. Reports only generated when recipient MX is direct-to-M365.
ARC
No Microsoft-curated global trusted-sealer list. No application process. Per-tenant configuration only via Set-ArcConfig -ArcTrustedSealers.
Override fires when M365 writes arc=pass oda=1 + compauth=pass reason=130. Defender retains spoof/impersonation veto on top.
MTA-STS
Microsoft publishes its own policy at mta-sts.microsoft.com:
version: STSv1 / mode: enforce / mx: *.mail.protection.outlook.com / max_age: 604800
New Set-OutboundConnector -MtaStsMode (Feb–Mar 2026 rollout): Opportunistic (default) or None. No Mandatory mode for outbound.
NDRs: 5.4.8 (MX mismatch), 5.7.5 (cert validation).
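The policy fields listed above follow the plain `key: value` layout defined in RFC 8461, one field per line, with `mx` allowed to repeat. A minimal parsing sketch, assuming the policy text has already been fetched; the sample string simply restates the four fields quoted above.

```python
def parse_mta_sts(text: str) -> dict:
    """Parse an MTA-STS policy body into a dict; 'mx' collects all patterns."""
    policy = {"mx": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "mx":
            policy["mx"].append(value)
        elif key == "max_age":
            policy[key] = int(value)  # lifetime in seconds
        else:
            policy[key] = value
    return policy


SAMPLE = (
    "version: STSv1\n"
    "mode: enforce\n"
    "mx: *.mail.protection.outlook.com\n"
    "max_age: 604800\n"
)
policy = parse_mta_sts(SAMPLE)
print(policy["mode"], policy["max_age"] // 86400, "days")
```

A max_age of 604800 seconds is seven days, so a sending MTA that has cached the policy keeps enforcing it for up to a week.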
DANE
TLSA record requirements: Certificate Usage 3 (DANE-EE), Selector 1 (SPKI), Matching Type 1 (SHA-256). Usage 0/1 explicitly unusable.
DNSSEC mandatory prerequisite.
NDR codes: 5.7.321 no STARTTLS · 322 cert expired · 323 TLSA mismatch · 324 DNSSEC invalid.
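The "3 1 1" TLSA form described above is the SHA-256 digest of the certificate's DER-encoded SubjectPublicKeyInfo. The sketch below shows only the hashing step; it assumes the SPKI bytes have already been extracted from the certificate with a separate tool, and the placeholder bytes are not a real key.

```python
import hashlib


def tlsa_3_1_1(spki_der: bytes) -> str:
    """Build TLSA record data: usage 3, selector 1, matching type 1."""
    digest = hashlib.sha256(spki_der).hexdigest()
    return f"3 1 1 {digest}"


# Placeholder bytes standing in for a real DER-encoded SPKI.
placeholder_spki = b"not-a-real-public-key"
print(tlsa_3_1_1(placeholder_spki))
```

Because the hash covers only the public key, the record survives certificate renewal as long as the key pair is reused, and must be republished before any key rollover.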
TLS 1.3
EXO supports TLS 1.3 for all email submissions and server-to-server (per Purview encryption ref, May 14 2026 update). Supported, not mandatory.
POP3/IMAP4 TLS 1.0/1.1 cutoff: July 2026 (MC1293480). SMTP AUTH TLS 1.0/1.1 phasing out separately Sep 2025 → 2026.
BIMI
Strict DMARC enforcement required (p=quarantine or p=reject).
Verified Mark Certificate (VMC) ~$1.5K/domain/year.
Misaligned for cold outreach — VMC logo signals "broadcast marketing," destroys the 1:1 illusion. Phase 1 verdict: skip.
LLM content analysis — 2026's biggest variable
codified in microsoft learn 2026-04-13 · zero header surface
The single most important deliverability change in 2026. Microsoft has deployed a purpose-built large language model as a first-class detection technology inside the EOP/Defender content-filter stack. It runs pre-delivery, synchronously, in parallel with content heuristics.
| attribute | value |
|---|---|
| family | Phi (Microsoft SLM) |
| size | ~3.8B params |
| candidate | Phi-4-mini-flash-reasoning |
| architecture | decoder-hybrid + SSM + GMU |
| fine-tune | SFT + DPO on MS BEC corpus |
| quantization | FP8/INT8 |
| languages | 23 (Phi-3.5/4-mini standard) |
| claimed accuracy | 99.995% (MS internal benchmark) |

Threat classes (14):
- Payroll fraud
- Invoice fraud
- Gift cards
- Contact establishment ★
- Business intelligence
- PII gathering
- Task fraud
- Phishing (generic)
- Brand impersonation
- BEC generic
- Lateral movement
- Credential phishing
- Malware delivery
- Reconnaissance
Header surface: zero. No CAT:LLM, no CAT:BEC, no ContentAnalysis-Verdict header. LLM hits collapse to legacy CAT:HPHSH / PHSH / SPM / HSPM.
Verdict is portal-only:
- Email Entity Page → "Detection technology: LLM content analysis"
- Threat Explorer
- Advanced Hunting: EmailEvents.DetectionMethods
Bounce-header forensics is useless for diagnosing LLM-driven blocks.
Tenant-side bypass: none.
- TABL Allow — does not bypass HPHSH
- IPV:CAL Connection Filter — still scans malware + HPHSH
- Safe Senders (SFV:SFE) — not HPHSH
- SCL=-1 transport rule (SFV:SKN) — not HPHSH
- Anti-spam policy allow — not HPHSH
- Advanced Delivery Policy — only path, locked to SecOps + phish-sim
Strategy must be "score clean," never "dodge."
Authentic vs promotional is structural, not stylistic. The same surface signals trigger Microsoft's Contact-establishment classifier AND cause human recipients to recognize template outreach. The defensive copy that scores LLM-clean is the same copy that gets peer replies. There is no LLM-evasion writing style separate from authentic peer writing.
Snowshoe + cohort detection
structure binds, not per-node volume
Spamhaus CSS exists specifically to list senders whose volume is too low for volume-based detection. Verbatim from Spamhaus: "modest volumes of email that do not trigger automated spam blocking filters." Lower volume per mailbox does not bypass snowshoe detection; shared cohort structure is what triggers it.
Detector evidence (academic + vendor)
| source | year | features used | min volume needed |
|---|---|---|---|
| van der Toorn (NOMS Best Paper) | 2018 | active DNS only — NS, SOA, MX, TXT | 0 — pre-send |
| Hao / Feamster (NDSS) | 2010 | BGP/ASN/AS cluster features | 0 |
| PREDATOR (Hao et al. CCS) | 2016 | registration-time info only | 0 |
| Spamhaus CSS | 2009→ | infrastructure correlation | "low volume" |
| Spooren et al. | 2022 | single DNS query | 0 |
| Cisco Talos | 2018→ | registrant-email reuse | 0 |
The 19-variable cohort-defense matrix
- registrar — mix 5+ (Porkbun, Namecheap, Dynadot, Hetzner, OVH)
- registration date — stagger 4–8 months, never bulk-buy
- registrant email — unique per domain
- WHOIS privacy — mixed (some private, some not)
- NS provider — mixed (Cloudflare, Hetzner, Route 53, DNSimple)
- DKIM key — unique per tenant, 2048-bit
- DKIM rotation timing — staggered
- SPF include chain — structurally different
- TLS cert issuer — mixed (Let's Encrypt, ZeroSSL, Sectigo)
- TLS cert SAN — one cert per domain (no bundles)
- Azure subscription parent — different per tenant
- tenant region — varied where business reasons allow
- admin email pattern — unique per tenant
- tenant naming — no shared lexical prefix/suffix
- mailbox password — unique per tenant (not single shared value)
- sender display names — non-overlapping across tenants
- deploy step timing — paced, not batched
- HELO / EHLO banner — Microsoft-managed (uniform)
- SMTP submission patterns — randomized intervals
| detection outcome | estimate |
|---|---|
| CSS listing of at least one cohort IP within 90 days | ~0.95 |
| CSS-driven cohort-wide degradation | ~0.85 |
| M365 EOP cohort SCL elevation | ~0.80 |
| Structure-only detection independent of any volume signal | ~0.60 |
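One way to audit a cohort against the matrix above is to measure, per variable, how much of the cohort shares the single most common value. The sketch below is illustrative only: the inventory records, the attribute names, and the 0.5 share threshold are all hypothetical, and real detectors are not known to use this exact measure.

```python
from collections import Counter


def shared_value_report(nodes, attributes):
    """For each attribute, return (most common value, share of cohort using it)."""
    report = {}
    for attr in attributes:
        counts = Counter(node[attr] for node in nodes)
        value, n = counts.most_common(1)[0]
        report[attr] = (value, n / len(nodes))
    return report


def flag_correlated(nodes, attributes, max_share=0.5):
    """List attributes where one value covers more than max_share of the cohort."""
    report = shared_value_report(nodes, attributes)
    return sorted(a for a, (_, share) in report.items() if share > max_share)


# Hypothetical inventory of four sending domains.
nodes = [
    {"registrar": "Porkbun", "ns_provider": "Cloudflare", "cert_issuer": "Let's Encrypt"},
    {"registrar": "Namecheap", "ns_provider": "Cloudflare", "cert_issuer": "ZeroSSL"},
    {"registrar": "Dynadot", "ns_provider": "Cloudflare", "cert_issuer": "Sectigo"},
    {"registrar": "OVH", "ns_provider": "Route 53", "cert_issuer": "Let's Encrypt"},
]
print(flag_correlated(nodes, ["registrar", "ns_provider", "cert_issuer"]))
```

In this sample the registrars are fully diverse, but three of four domains share one NS provider, which is the kind of single-variable overlap the matrix is meant to remove.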
The operator playbook
cadence · copy · monitoring · what to measure
Cadence equilibrium (Phase 2 P2-3R)
| per-inbox/day | regime | reputation outcome | verdict |
|---|---|---|---|
| 1/day | under-floor | starves Mailbox Intelligence pair-graph; mailbox stays "unknown sender" indefinitely | avoid |
| 3–5/day | equilibrium | reputation maturation + TAM utilization balance | recommended |
| 8–12/day | warm steady-state | burns TAM in months, durable when paired with quality | scaling |
| 30/day | BHW 2026 ceiling | at the operator-community max | aggressive |
| 50+/day | vendor max | reputation damage territory per multiple vendor sources | burn risk |
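The cadence table can be encoded as a simple guard in a sending scheduler. This sketch covers only the bands the table names; volumes that fall between bands (for example 6 or 20 per day) are not classified by the source, so the function says so rather than guessing. The function name is illustrative.

```python
def cadence_regime(sends_per_day: int) -> str:
    """Map a per-inbox daily send count to the regime named in the table."""
    if sends_per_day < 1:
        raise ValueError("sends_per_day must be at least 1")
    if sends_per_day == 1:
        return "under-floor"
    if 3 <= sends_per_day <= 5:
        return "equilibrium"
    if 8 <= sends_per_day <= 12:
        return "warm steady-state"
    if sends_per_day == 30:
        return "BHW 2026 ceiling"
    if sends_per_day >= 50:
        return "vendor max"
    # The table does not define these volumes.
    return "between documented bands"


for n in (1, 4, 10, 30, 60, 20):
    print(n, cadence_regime(n))
```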
English peer-tone copy frameworks (Phase 2 P2-9)
Common requirement: every framework opens with a verifiable named-by-name specific in sentence 1–2.
"[Firstname] — Saw the Q1 [filing/report] — congrats on [specific metric]. Curious how [observation tied to filing] is playing into [their stated priority]. Worth a conversation? — [your firstname]"
"[Firstname] — [Specific regulation] hits [date]. Most teams are figuring out [angle]. Thinking [specific reasoning about their org's exposure]. How are you approaching it? — [your firstname]"
"[Firstname] — [Specific public number] at [their company] vs [sector comparison] is interesting. The gap usually means [specific operational reasoning]. Is that solved already, or on the roadmap? — [your firstname]"
"[Firstname] — Saw [new VP/Director hire on LinkedIn]. Usually means [specific operational shift]. Curious if that's the read internally. Worth a brief conversation? — [your firstname]"
Word count by persona
| persona | words | tone | CTA archetype |
|---|---|---|---|
| CEO / Founder | 50–80 | peer, specific, no fluff | reasoning question |
| CMO | 60–100 | peer, business outcome anchored | reasoning question or unilateral give |
| VP Sales / CRO | 70–120 | peer, ramp/pipeline metric anchored | "Worth a conversation?" |
| CTO / CISO | 50–90 | POV-first, NOT pitch | "Curious how your team's handling X" — perspective ask |
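The word ranges above are easy to enforce before a draft leaves review. A minimal sketch, assuming whitespace-separated tokens are an acceptable word count; the dictionary keys mirror the persona column and the draft text is a made-up placeholder.

```python
# (min_words, max_words) per persona, copied from the table above.
WORD_RANGES = {
    "CEO / Founder": (50, 80),
    "CMO": (60, 100),
    "VP Sales / CRO": (70, 120),
    "CTO / CISO": (50, 90),
}


def within_word_range(persona: str, body: str) -> bool:
    """True if the body's word count falls inside the persona's range."""
    low, high = WORD_RANGES[persona]
    words = len(body.split())
    return low <= words <= high


draft = " ".join(["word"] * 65)  # placeholder 65-word body
print(within_word_range("CEO / Founder", draft))   # True
print(within_word_range("VP Sales / CRO", draft))  # False
```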
Anti-pattern catalog (avoid)
2-touch sequence shape
| touch | day | shape | constraints |
|---|---|---|---|
| T1 | day 0 | opening anchored to specific verifiable referent | plain text · <120 words · reasoning question |
| T2 | day 7 | standalone — substantively new — NOT "following up" | new subject · no In-Reply-To threading · references DIFFERENT public signal |
| — | day 365 | no further touches for 12 months on no-reply | suppression list discipline |
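The sequence shape above reduces to three dates per contact. A sketch using calendar days counted from the first send; the field names are illustrative, and the 365-day suppression is taken from the table's "day 365" row.

```python
from datetime import date, timedelta


def two_touch_schedule(first_send: date) -> dict:
    """Return the T1 and T2 send dates and the end of the no-reply suppression."""
    return {
        "T1": first_send,
        "T2": first_send + timedelta(days=7),
        "suppressed_until": first_send + timedelta(days=365),
    }


schedule = two_touch_schedule(date(2026, 5, 4))
for name, day in schedule.items():
    print(name, day.isoformat())
```

If the contact replies at any point, the schedule no longer applies; the suppression date matters only for the no-reply case.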
Monitoring stack
| tool | cost | what it shows |
|---|---|---|
| Microsoft SNDS | free | per-IP reputation in Microsoft's outbound pool — opaque on shared pool but register IPs anyway |
| Microsoft Postmaster Tools | free | per-domain reputation (Outlook.com side) |
| Google Postmaster Tools | free | per-domain at Gmail — complementary even if M365-focused |
| DMARC aggregator | free–$200/mo | Dmarcian / EasyDMARC / PowerDMARC — parses rua XML |
| mxtoolbox.com Standard | $129/mo | DNS drift + blocklist monitoring across ~80 RBLs |
| Owned seed mailboxes | $6/seat/mo × 5–10 seats | actual placement at real industry tenants — the ONLY ground truth |
F5000 EU target stack — MX forensics
verified · may 2026
Pulled MX records + cross-referenced AppsRunTheWorld / 6sense / Microsoft customer stories / vendor case studies. Zero of nine named targets use Mimecast, Proofpoint, or Barracuda; the SEG distribution at this specific Fortune-5000-EU subset differs sharply from market averages.
| target | stack | filter layer | confidence |
|---|---|---|---|
| Mercedes-Benz Group | self-hosted MTA (corpinter.net) → M365 + Defender | EOP behind on-prem bridge · no third-party SEG visible | HIGH |
| BMW Group | Cisco IronPort (hc324-48.eu.iphmx.com) | SBRS + CASE + AsyncOS 16+ | VERY HIGH |
| DHL / Deutsche Post | Fortinet FortiMail Cloud | FortiGuard Antispam + SRR scoring | VERY HIGH |
| Siemens AG | M365 + Defender P2 + DANE (h-v1.mx.microsoft) | full Defender E5 | HIGH |
| Siemens Energy | M365 + Defender P1 (vanilla EOP) | standard EOP | MED-HIGH |
| Roche Holding | Google Workspace (gene.com MTAs front Gmail) | SpamBrain · not Microsoft at all | HIGH |
| thyssenkrupp | M365 + Defender P1/P2 mix | standard EOP, P2 at HQ | MED-HIGH |
| UCB SA | M365 + Defender P2 | full Defender E5 | HIGH |
| Ferring Pharmaceuticals | M365 + Defender P1 | standard EOP, sp=none soft spot | MED |

| country | Defender | Mimecast | Proofpoint | local anchor |
|---|---|---|---|---|
| UK | high-end | 22–28% | mid | Egress/KnowBe4, Clearswift |
| France | mid | low | low | Vade + Hornetsecurity 18–24% |
| Germany | high | med | pharma heavy | Hornetsecurity, Retarus, NoSpamProxy, SEPPmail |
| Austria | mid | low | low | Hornetsecurity 15–22% |
| Switzerland | mid | low | mid (pharma) | SEPPmail (FINMA banking) |
| Italy | mid | low | banking | Libraesva |
| Nordics | high | low | low | WithSecure, Heimdal |
| Spain | high | low | low | local channel |
Operator claims — verification table
cross-referenced against verified research
Extracted from 3,181 tweets across four operators (Liam Sheridan, OutboundBandit, Termsheetinator, Gat0rtheskater). 627 deliverability-substantive. 176 with quantitative claims. Verdict tags used in the table: verified-match, verified, contradicted, controversial, unverifiable, plausible math.
| claim | source | verdict | cross-check |
|---|---|---|---|
| "$4.50/mo per tenant, 10 tenants = $40.50 = 5K/day send" | @termsheetinator | verified-match | matches user's config.yaml exactly |
| "<2% bounce + 1–2% OOO = healthy" | @OutboundBandit | verified | Phase 1 P1-32 confirms |
| "cold email 1% / LinkedIn 10% / cold calls 5–10%" | @iamliamsheridan | verified | Salesloft + Phase 2 P2-6 lower band |
| "Question-open lift: 2.7% → 6.2%" | @OutboundBandit | verified | P2-9 CTA hierarchy confirms reasoning-question > pitch |
| "Conversion: 2% × 30% × 50% = 0.3% booked" | @termsheetinator | verified | math + Instantly 2026 benchmark |
| "Google = ~3 mailboxes/domain" | @termsheetinator | verified | Phase 1 + P2-6 consensus |
| "Microsoft = 99 mailboxes/domain" | @termsheetinator | contradicted | BHW 2026 consensus: 3–5/domain max even on M365 |
| "50 emails/day per inbox, still great results" | @OutboundBandit | controversial | above BHW 30 ceiling; vendor max; P2-3R recommends 3–5 |
| "2-week warmup is enough" | @termsheetinator | contradicted | P1-W: 21–42d to neutral, 60–90d to durable |
| "1,000,000 cold emails in 30 days" | @OutboundBandit | unverifiable | operator self-report, no triangulation |
| "$250K from cold email this year" | @OutboundBandit | unverifiable | operator revenue claim |
| "5M emails, $25M pipeline, 1000+ booked" | @iamliamsheridan | plausible math | 0.02% booked rate fits low-end of 1-4% reply × 5-20% pos × 5-30% positive-to-booked |
Full verification table (verbatim quotes, all 176 quant claims) at MSFT_Deliverability/08_PHASE_2_RESEARCH/p2_twitter_operator_claims_verification.md.
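The funnel claims in the table above can be checked with plain multiplication. The sketch below reproduces two of them: the 2% × 30% × 50% chain, and the "5M emails, 1000+ booked" claim compared against the low and high ends of the stage ranges quoted in its cross-check. Only numbers that appear in the table are used.

```python
def booked_rate(*stage_rates: float) -> float:
    """Multiply per-stage conversion rates into an end-to-end rate."""
    rate = 1.0
    for r in stage_rates:
        rate *= r
    return rate


# "2% x 30% x 50% = 0.3% booked"
chain = booked_rate(0.02, 0.30, 0.50)

# "5M emails ... 1000+ booked" against the quoted stage ranges.
observed = 1000 / 5_000_000
low = booked_rate(0.01, 0.05, 0.05)
high = booked_rate(0.04, 0.20, 0.30)

print(f"chain: {chain:.4%}")
print(f"observed: {observed:.4%}, range: {low:.4%} to {high:.4%}")
```

The observed 0.02% sits inside the computed range, closer to the low end, which is consistent with the "plausible math" tag.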
Glossary
compauth=X reason=NNN.