MCP in production: what's about to hurt with the 2026-07-28 spec

We spent the last three months migrating five client MCP servers to the new spec. I'm writing this because the 2026-07-28 release candidate dropped a few days ago, and most teams we talk to haven't realized what's coming.

Here's the spoiler: if your MCP server still keeps state in memory — and 80% of the servers we've audited do — you have roughly two months to get your house in order before your clients start complaining.

The big one: the protocol is now stateless

I'll cut to the chase. The 2026-07-28 release made a radical call: MCP is now stateless at the protocol layer. Six SEPs (Specification Enhancement Proposals) work together to get there. According to the official announcement, it's the biggest change since the protocol launched in 2024.

What does that mean in practice? Your server can now sit behind a vanilla round-robin load balancer. Any request can land on any replica. The client caches tools/list as long as your ttlMs permits. No more sticky sessions. No more "why isn't the client landing on the same node".
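To make the caching point concrete: on the client, honoring ttlMs is just a TTL cache in front of tools/list. The sketch below assumes tools/list returns the tools alongside a ttlMs value. The `ToolsListResult` shape and the `ToolsListCache` class are hypothetical, and the fetch function is whatever your client uses to reach any replica.

```typescript
// Assumed shapes: we take tools/list to return the tools plus a ttlMs hint.
interface ToolDef {
  name: string;
  inputSchema: Record<string, unknown>;
}

interface ToolsListResult {
  tools: ToolDef[];
  ttlMs: number;
}

// Keeps the last tools/list result until its ttlMs runs out, then refetches.
// Any replica can serve the refetch, so there is no session to preserve.
class ToolsListCache {
  private cached: ToolsListResult | null = null;
  private expiresAt = 0;

  constructor(
    private readonly fetchToolsList: () => Promise<ToolsListResult>,
    // Injectable clock so expiry can be tested without sleeping.
    private readonly now: () => number = Date.now,
  ) {}

  async get(): Promise<ToolDef[]> {
    const t = this.now();
    if (this.cached !== null && t < this.expiresAt) {
      return this.cached.tools; // still fresh: no network call
    }
    const fresh = await this.fetchToolsList();
    this.cached = fresh;
    // A negative hint is treated as "don't cache": the entry expires immediately.
    this.expiresAt = t + Math.max(0, fresh.ttlMs);
    return fresh.tools;
  }
}
```

Two misses arriving at the same moment will both fetch; if that matters, cache the in-flight promise instead of the result.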

The old world looked like this:

Client ─── persistent SSE session ───► MCP Server (sticky session)
                                              │
                                              ├─ in-memory state
                                              └─ subscriptions

If you wanted to scale horizontally, you either duplicated state across nodes (nightmare) or forced session affinity at the infra layer (complicated and brittle). We've seen teams burn three sprints just to make that work properly with Redis and custom serialization.

The new world looks like this:

Client ──HTTP + Mcp-Method header──► [Round-robin LB]
                                          │
                                          ▼
                              Replica 1 / 2 / N
                                          │
                                          ▼
                              Shared store (Redis / SQL)

We flipped one client server in early April. Ops cost dropped by a third because the Kubernetes autoscaler could finally do its job. Pods spin up and down without us worrying about who's talking to whom.

If your code contains Map<sessionId, ...> anywhere, you know what's on your plate until July.
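For the Map<sessionId, ...> case, the refactor usually looks like this: move the state behind a store interface, inject it, and back it with Redis or SQL in production. A minimal sketch with hypothetical names (`SessionStore`, `InMemorySessionStore`, `makeReplica`). The in-memory store only stands in for the shared store from the diagram, so that several "replicas" sharing it can be exercised round-robin locally.

```typescript
// Before: module-level state pinned to one process.
// const sessions = new Map<string, { counter: number }>();

interface SessionState {
  counter: number;
}

// Every replica talks to the same store; production backs this with Redis or SQL.
interface SessionStore {
  get(sessionId: string): Promise<SessionState | undefined>;
  set(sessionId: string, state: SessionState): Promise<void>;
}

// Local stand-in only: all "replicas" below share this single instance.
class InMemorySessionStore implements SessionStore {
  private readonly data = new Map<string, SessionState>();

  async get(sessionId: string) {
    return this.data.get(sessionId);
  }

  async set(sessionId: string, state: SessionState) {
    this.data.set(sessionId, { ...state });
  }
}

// A replica holds no state of its own: everything goes through the injected store.
function makeReplica(store: SessionStore) {
  return async function handleIncrement(sessionId: string): Promise<number> {
    const state = (await store.get(sessionId)) ?? { counter: 0 };
    const next = { counter: state.counter + 1 };
    await store.set(sessionId, next);
    return next.counter;
  };
}
```

Called round-robin across several replicas, the counter keeps climbing because no replica holds it. One caveat the sketch skips: the read-modify-write isn't atomic, so two replicas handling the same session at once can lose an update. With Redis you'd reach for an atomic command such as INCR, or a transaction.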

Auth, finally serious

The old MCP auth model was fine for a PoC. For an enterprise deployment, it was borderline embarrassing. The new spec aligns everything on OAuth 2.1 + OpenID Connect, with two changes you need to grok:

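First, mandatory validation of the iss parameter on authorization responses, per RFC 9207 (SEP-2468 for the number nerds). It's a low-cost mitigation against the "mix-up" attack class, which specifically targets the MCP single-client → many-servers pattern. Picture Claude Desktop talking to your GitHub server, your Slack server, and your custom CRM server, each backed by its own authorization server. Without iss validation, the client can't tell which authorization server actually produced a given response, so a malicious or compromised one can trick it into leaking an authorization code or token obtained from another. (The server-side cousin of this bug, a token meant for GitHub being presented to your CRM and accepted, is what audience validation catches.) Exactly the kind of bug you don't catch in dev and that blows up at 3 AM.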
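RFC 9207's check happens on the client, when the authorization server redirects back. A minimal sketch, assuming you kept the metadata of the authorization server you sent the request to. `checkAuthorizationResponse` is a hypothetical helper, while `authorization_response_iss_parameter_supported` is the metadata flag RFC 9207 defines. The state check is standard OAuth hygiene, included because it belongs at the same step.

```typescript
// Field names as in RFC 8414 authorization server metadata and RFC 9207.
interface AuthServerMetadata {
  issuer: string;
  authorization_response_iss_parameter_supported?: boolean;
}

type CallbackCheck = { ok: true; code: string } | { ok: false; reason: string };

// Validates the redirect back from the authorization server before exchanging the code.
// `expected` describes the authorization server this particular request was sent to.
function checkAuthorizationResponse(
  callbackUrl: string,
  expected: AuthServerMetadata,
  expectedState: string,
): CallbackCheck {
  const params = new URL(callbackUrl).searchParams; // values come back form-decoded
  if (params.get("state") !== expectedState) {
    return { ok: false, reason: "state mismatch" };
  }
  const iss = params.get("iss");
  if (iss === null) {
    // RFC 9207: a missing iss is fatal if the server advertises that it sends one.
    if (expected.authorization_response_iss_parameter_supported === true) {
      return { ok: false, reason: "missing iss" };
    }
  } else if (iss !== expected.issuer) {
    // Simple string comparison, no normalization: this is the mix-up defense.
    return { ok: false, reason: `unexpected issuer ${iss}` };
  }
  const code = params.get("code");
  if (code === null) {
    return { ok: false, reason: "missing code" };
  }
  return { ok: true, code };
}
```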

Second, mandatory PKCE for public clients, and OIDC-aligned scopes. If your MCP server has no scopes defined today, now's the time to add mcp:tools:execute, mcp:resources:read, and friends.
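PKCE itself is a few lines. A sketch of the S256 method from RFC 7636 using Node's built-in crypto; the function names are ours.

```typescript
import { createHash, randomBytes } from "node:crypto";

// RFC 7636: the code_verifier is 43 to 128 characters from the unreserved set.
// 32 random bytes encode to 43 base64url characters, the minimum allowed length.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

You can check `codeChallengeS256` against the test vector in RFC 7636 Appendix B. The challenge goes on the authorization request with `code_challenge_method=S256`; the verifier goes on the token request.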

Here's the baseline Node/TypeScript middleware we now copy-paste on new projects:

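import { validateIssuer } from "@modelcontextprotocol/auth";

// Fail fast at startup rather than passing undefined to the validator.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`${name} must be set`);
  return value;
}

const EXPECTED_ISSUER = requireEnv("MCP_OAUTH_ISSUER");
const RESOURCE_ID = requireEnv("MCP_RESOURCE_ID");

// Returns a Response to reject the request, or null to let it through.
export async function mcpAuthMiddleware(req: Request): Promise<Response | null> {
  // Checks the token's issuer and audience against this server's configuration.
  const tokenInfo = await validateIssuer(req, {
    expectedIssuer: EXPECTED_ISSUER,
    audience: RESOURCE_ID,
  });

  if (!tokenInfo.valid) {
    // RFC 6750: a 401 must carry a WWW-Authenticate challenge.
    return new Response("Unauthorized", { status: 401, headers: { "WWW-Authenticate": "Bearer" } });
  }

  if (!tokenInfo.scopes.includes("mcp:tools:execute")) {
    return new Response("Insufficient scope", {
      status: 403,
      headers: { "WWW-Authenticate": 'Bearer error="insufficient_scope", scope="mcp:tools:execute"' },
    });
  }

  return null;
}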

Nothing magic. But it's now the non-negotiable minimum.
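The middleware above requires mcp:tools:execute on every request. Since the new transport carries the method in an Mcp-Method header, you can require the scope each method actually needs. A sketch under our own conventions: the method-to-scope table and `authorizeMethod` are hypothetical, and only the two scope names come from the text above.

```typescript
// Our own table: which scope each JSON-RPC method requires.
// Methods not listed (tools/list, for instance) only need a valid token.
const SCOPE_BY_METHOD = new Map<string, string>([
  ["tools/call", "mcp:tools:execute"],
  ["resources/read", "mcp:resources:read"],
]);

type AuthDecision =
  | { status: 200 }
  | { status: 400 | 403; message: string };

function authorizeMethod(
  headerMethod: string | null, // value of the Mcp-Method header
  bodyMethod: string, // "method" field of the JSON-RPC body
  grantedScopes: readonly string[],
): AuthDecision {
  // The header is client-supplied: refuse requests where it disagrees with the body,
  // otherwise a client could label a tools/call as something cheaper.
  if (headerMethod === null || headerMethod !== bodyMethod) {
    return { status: 400, message: "Mcp-Method header missing or inconsistent with body" };
  }
  const needed = SCOPE_BY_METHOD.get(bodyMethod);
  if (needed !== undefined && !grantedScopes.includes(needed)) {
    return { status: 403, message: `Insufficient scope: ${needed} required` };
  }
  return { status: 200 };
}
```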

Discovery via `.well-known`, or how to be found by agents

This one's the most underrated part of the release, I think. The official MCP registry already lists nearly 2,000 public servers, just a few months after its launch. The 2026 spec formalizes MCP Server Cards: a metadata file you expose at https://your-server.com/.well-known/mcp-server.json.

{
  "name": "sifo-crm-mcp",
  "version": "2.4.1",
  "protocolVersion": "2026-07-28",
  "capabilities": {
    "tools": true,
    "resources": true,
    "prompts": false,
    "apps": true
  },
  "transport": ["http", "websocket"],
  "auth": {
    "type": "oauth2",
    "issuer": "https://auth.sifo-consulting.com",
    "scopes": ["mcp:tools:execute", "mcp:resources:read"]
  },
  "rateLimits": { "perMinute": 600 },
  "contact": "mcp-support@sifo-consulting.com"
}

Why does this matter? Because a crawler (a registry, a browser, an autonomous agent) can discover your capabilities without opening a connection. It's the agent-era equivalent of an enriched robots.txt. If you care about your company's GEO (generative engine optimization) visibility, and if you're reading this blog you should, it's a must.

We rolled it out on the MCP servers we operate for clients. Submission to the official registry: 20 minutes. First discovery hits arrived within the week.
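Serving the card takes a few lines in any HTTP framework. A framework-agnostic sketch: `serverCardRoute` is a hypothetical helper, the card is a trimmed copy of the example above, and both the one-hour Cache-Control and the permissive CORS header are our own choices, not spec requirements.

```typescript
// Trimmed copy of the card above; in practice, generate it from build metadata.
const SERVER_CARD = {
  name: "sifo-crm-mcp",
  version: "2.4.1",
  protocolVersion: "2026-07-28",
  capabilities: { tools: true, resources: true, prompts: false, apps: true },
};

const CARD_PATH = "/.well-known/mcp-server.json";

interface PlainResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Answers GET/HEAD on the well-known path; returns null so other routes can handle the rest.
function serverCardRoute(method: string, pathname: string): PlainResponse | null {
  if (pathname !== CARD_PATH) return null;
  if (method !== "GET" && method !== "HEAD") {
    return { status: 405, headers: { Allow: "GET, HEAD" }, body: "" };
  }
  return {
    status: 200,
    headers: {
      "Content-Type": "application/json",
      // Crawlers may cache it; the card only changes on deploy.
      "Cache-Control": "public, max-age=3600",
      // Lets browser-based agents fetch it cross-origin.
      "Access-Control-Allow-Origin": "*",
    },
    body: method === "HEAD" ? "" : JSON.stringify(SERVER_CARD),
  };
}
```

Plug it into node:http, Express or Fastify by passing the request method and URL pathname, and fall through to your normal routes on null.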

Two extensions worth knowing

There are two new extensions in this spec I want to call out quickly because they unlock use cases we'd been waiting for.

MCP Apps, first. A server can now return not just JSON, but UI fragments (constrained HTML or JSON-spec UI) that the client displays in a dedicated area. Concretely: your "billing" MCP server can return a "pending invoice, click to approve" widget, instead of text that the agent has to reformat and the user then has to act on awkwardly.
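We won't guess the extension's exact wire format here. The sketch below uses a hypothetical result shape (the `ui` field is our invention; the real MCP Apps schema may differ) to show two habits that hold regardless: always ship a text fallback for clients that can't render the fragment, and escape every data-derived value you interpolate into HTML.

```typescript
// Hypothetical result shape: text content plus an optional UI fragment.
interface ToolResultWithUi {
  content: { type: "text"; text: string }[];
  ui?: { mimeType: "text/html"; html: string };
}

// Escape anything user- or data-derived before putting it into HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

interface Invoice {
  id: string;
  customer: string;
  amountCents: number;
}

function pendingInvoiceResult(inv: Invoice): ToolResultWithUi {
  const amount = (inv.amountCents / 100).toFixed(2);
  return {
    // Clients without Apps support still get something readable.
    content: [{ type: "text", text: `Invoice ${inv.id} for ${inv.customer}: ${amount}, pending approval.` }],
    ui: {
      mimeType: "text/html",
      html:
        `<div class="invoice"><p>${escapeHtml(inv.customer)}: ${amount}</p>` +
        `<button data-invoice="${escapeHtml(inv.id)}">Approve</button></div>`,
    },
  };
}
```

How the Approve click travels back to the server is defined by the extension and isn't shown.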

Tasks Extension, second. For operations longer than a few seconds — video rendering, batch ETL, CI/CD deploys — there's finally a clean enqueue → poll → result pattern with proper resumability semantics on the client side. No more hand-rolled SSE callbacks. We'd built three or four custom ones for clients. It's frankly nice to be able to drop the duct tape.
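The same caveat applies to tasks: the sketch below isn't the extension's wire format, just the server-side shape of enqueue → poll → result under hypothetical names (`TaskStore`, `enqueue`, `poll`, `complete`). The point is where the task record lives: in the shared store rather than in a connection, so a client that reconnects, or lands on another replica, resumes polling with nothing but the task id.

```typescript
import { randomUUID } from "node:crypto";

type TaskStatus =
  | { state: "pending" }
  | { state: "done"; result: unknown }
  | { state: "failed"; error: string };

// In-memory stand-in for the shared store (Redis/SQL) every replica reads from.
class TaskStore {
  private readonly tasks = new Map<string, TaskStatus>();

  // enqueue: record the task and hand back an id immediately.
  enqueue(): string {
    const id = randomUUID();
    this.tasks.set(id, { state: "pending" });
    return id;
  }

  // poll: any replica can answer, since it only reads the shared record.
  poll(id: string): TaskStatus | undefined {
    return this.tasks.get(id);
  }

  // Called by the worker when the long-running job finishes or fails.
  complete(id: string, outcome: TaskStatus): void {
    if (!this.tasks.has(id)) throw new Error(`unknown task ${id}`);
    this.tasks.set(id, outcome);
  }
}
```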

And a deprecation policy

Before this release, breaking changes shipped with no structured warning. You had to track the GitHub repo, skim release notes, and hope nothing slipped. Now: announce in spec N → mark deprecated in spec N+1 → possible removal in spec N+2 at earliest. That's what you expect from a serious protocol. For platform teams planning 12-month roadmaps, this changes the game.

The migration path we follow

Bottom line from the five migrations we've done since February: expect two to three weeks for a medium-complexity server. Here's the breakdown that works:

You start with an audit (two solid days), listing every piece of in-memory state, your current OAuth scopes, and your exposed capabilities. This is the most important step, because you'll uncover things that exist for reasons nobody remembers.
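Part of the audit can be scripted. A rough heuristic, not a parser: the hypothetical `findInMemoryState` flags top-level `let`/`var` bindings and top-level constants initialized with a `Map`, `Set`, `WeakMap`, empty object or empty array, the usual suspects for per-process state. It misses state tucked inside classes and closures, so treat its output as a starting list for the manual review.

```typescript
interface Finding {
  line: number;
  text: string;
}

// Unindented let/var: module-level mutable bindings.
const MUTABLE_BINDING = /^(?:export\s+)?(?:let|var)\s+\w+/;
// Unindented const initialized with a mutable container (optional type annotation allowed).
const CONTAINER = /^(?:export\s+)?const\s+\w+(?:\s*:[^=]+)?\s*=\s*(?:new\s+(?:Map|Set|WeakMap)\b|\{\s*\}|\[\s*\])/;

function findInMemoryState(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    if (MUTABLE_BINDING.test(text) || CONTAINER.test(text)) {
      findings.push({ line: i + 1, text: text.trim() });
    }
  });
  return findings;
}
```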

Then you externalize the state — typically a week of work, because you often have to rewrite code written under the "same process" assumption. We usually move to Redis for short-lived sessions and Postgres for durable state.

Three days for auth hardening: iss validation, PKCE, aligned scopes. One day for the Server Card and registry submission. Two days for load testing with k6 or Vegeta to validate that the round-robin LB holds up under load.

That's two to three weeks to go from an MCP server that "works" to one that "scales". The ROI is immediate: OAuth auditability, horizontal scaling, and discoverability.

Three traps that cost us time

I'll close with three things that cost us time so you can skip them.

Overly aggressive tools/list cache on the client side. The server advertises a long ttlMs for performance but forgets to invalidate when a deploy changes the tool schema. Result: for 10 minutes, the agent tries to call tools with an outdated signature. Our fix: also expose an ETag for tools/list and have the client send If-None-Match on its requests. The client revalidates cheaply, you keep the cache, and everyone's happy.
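Here's what the ETag half of that fix can look like on the server, as a sketch with hypothetical names. It stays transport-agnostic and returns a `changed` flag rather than an HTTP status: over plain HTTP, 304 is only defined for GET and HEAD, so how "not modified" is signalled for a tools/list request is a choice for your transport.

```typescript
import { createHash } from "node:crypto";

interface Tool {
  name: string;
  inputSchema: unknown;
}

// Tag derived from the serialized tool list: any schema change yields a new tag.
// Assumes the list is built in a deterministic order (JSON.stringify keeps insertion order).
function toolsEtag(tools: readonly Tool[]): string {
  return `"${createHash("sha256").update(JSON.stringify(tools)).digest("base64url")}"`;
}

// If-None-Match uses weak comparison (RFC 9110): W/"x" matches "x", and "*" matches anything.
function matchesIfNoneMatch(header: string | null, etag: string): boolean {
  if (header === null) return false;
  return header.split(",").some((raw) => {
    const tag = raw.trim();
    return tag === "*" || tag.replace(/^W\//, "") === etag;
  });
}

type ToolsListReply =
  | { changed: false; etag: string }
  | { changed: true; etag: string; tools: readonly Tool[] };

function toolsListWithEtag(tools: readonly Tool[], ifNoneMatch: string | null): ToolsListReply {
  const etag = toolsEtag(tools);
  if (matchesIfNoneMatch(ifNoneMatch, etag)) {
    return { changed: false, etag }; // the client's cached copy is still current
  }
  return { changed: true, etag, tools };
}
```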

Misconfigured auth issuer for multi-tenant servers. The iss must reflect the tenant, not the global app. Otherwise, two distinct tenants might see their tokens accepted on the wrong resource. We caught this in staging at a client. Chilling.
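The fix boils down to resolving the expected issuer per tenant. A minimal sketch, assuming the tenant is derived from the Host subdomain and that the token's signature has already been verified upstream; the tenant registry, URLs and `checkTenantToken` are all hypothetical.

```typescript
// Hypothetical tenant registry: each tenant has its own issuer and resource identifier.
interface TenantAuthConfig {
  issuer: string;
  audience: string;
}

const TENANTS = new Map<string, TenantAuthConfig>([
  ["acme", { issuer: "https://auth.example.com/tenants/acme", audience: "https://acme.mcp.example.com" }],
  ["globex", { issuer: "https://auth.example.com/tenants/globex", audience: "https://globex.mcp.example.com" }],
]);

// Claims from an access token whose signature was verified elsewhere.
interface VerifiedClaims {
  iss: string;
  aud: string | string[];
}

// Resolve the tenant from the Host subdomain, then check iss/aud against *that* tenant,
// never against a single global issuer.
function checkTenantToken(host: string, claims: VerifiedClaims): boolean {
  const tenantId = host.split(".")[0];
  const cfg = TENANTS.get(tenantId);
  if (cfg === undefined) return false;
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  return claims.iss === cfg.issuer && audiences.includes(cfg.audience);
}
```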

Tasks extension used as an excuse for sloppy design. If an operation can run in 1 second, don't put it behind async tasks: you're adding a round-trip and a potential timeout for nothing. We saw a team asyncify their entire CRUD layer "because it's more modern". It tripled the latency on simple ops. Don't.