Austin Spires

April 23, 2026

Compute, Network, Storage -- What Changes in the AI Era


Cloud computing gave us a useful mental model: compute, network, and storage as programmable primitives, available on demand. That model still works.

What changed is the stress profile under AI.

AI did not delete the three pillars. It bent them hard enough that old assumptions now fail in subtle and expensive ways.

From where I sit, this is not an abstract architecture debate. It is the same pattern we have seen repeatedly in infrastructure transitions. Demand shifts first, bottlenecks surface second, and operating models catch up last. At Fastly, we lived versions of this every time traffic patterns changed faster than planning cycles. The names of the services evolve, but the discipline required to serve real traffic at scale does not.

Compute: from centralized heavy lifting to network-adjacent execution

In the cloud era, most workloads could rely on general-purpose CPU fleets. You scaled by adding instances, containers, or functions. If your architecture was reasonably decoupled and your autoscaling policy was sane, you could usually recover from mistakes with more replicas and better queue handling.

In the AI era, a lot of practical value does not come from training workloads. It comes from what happens before and after model calls: request shaping, auth and policy checks, routing decisions, feature flags, cache key normalization, prompt guardrails, response transformation, and graceful fallbacks when upstream systems wobble.

That is lightweight compute, but it is not low impact. Done close to the user and fused with the network path, it removes unnecessary origin trips, reduces tail latency, protects expensive model backends, and turns raw traffic into intentional traffic.

For most teams, this is the compute shift that matters most right now:

  • Less "where do we train?"
  • More "where do we make each request smarter before it gets expensive?"

This is exactly where compute starts to look like a fusion of network and logic. The best place to run this class of work is often the same place you already terminate TLS, enforce security controls, and manage cache behavior: the edge.
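
To make that concrete, here is a minimal sketch of what this class of work can look like, assuming a Fetch-style edge runtime. The upstream URL, header names, and policy check are illustrative placeholders, not any specific product's API.

```typescript
// Request shaping at the edge: a minimal sketch, assuming a Fetch-style runtime.
// MODEL_ORIGIN, the header names, and isAuthorized are illustrative placeholders.

const MODEL_ORIGIN = "https://model-backend.example.com"; // hypothetical upstream

function isAuthorized(req: Request): boolean {
  // Stand-in policy check: require an API key before spending money upstream.
  return req.headers.has("x-api-key");
}

function normalizedCacheKey(req: Request): string {
  // Drop volatile query params and sort the rest so equivalent requests
  // collapse onto one cache entry.
  const url = new URL(req.url);
  url.searchParams.delete("trace_id");
  url.searchParams.sort();
  return url.pathname + "?" + url.searchParams.toString();
}

async function handleRequest(req: Request): Promise<Response> {
  // 1. Reject unauthorized traffic before it reaches the expensive backend.
  if (!isAuthorized(req)) {
    return new Response("unauthorized", { status: 401 });
  }

  // 2. Shape the request on its way upstream.
  const body = req.method === "GET" ? undefined : await req.text();
  const headers = new Headers(req.headers);
  headers.set("x-cache-key", normalizedCacheKey(req));

  // 3. Bound the upstream call and degrade gracefully instead of hanging.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 2_000); // 2s budget
  try {
    return await fetch(MODEL_ORIGIN + new URL(req.url).pathname, {
      method: req.method,
      headers,
      body,
      signal: controller.signal,
    });
  } catch {
    return new Response(JSON.stringify({ degraded: true }), {
      status: 503,
      headers: { "content-type": "application/json" },
    });
  } finally {
    clearTimeout(timer);
  }
}
```

None of this is exotic. The point is that it runs before the request ever reaches a GPU-backed service.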

Network: from traditional north-south flow to east-west as the norm

Most web architecture was optimized around north-south traffic: users to applications, applications to APIs, APIs back to users. Latency mattered, but jitter was usually manageable and packet loss was often recoverable at the application layer.

AI traffic patterns change that balance: east-west traffic between services becomes the norm, not the exception.

As assistants and agents fan out across APIs, the request graph gets wider and more bursty. One slow hop can still degrade the user experience, but now the fix is often upstream traffic discipline: better admission control, smarter retries, tighter timeout budgets, and early response shaping before requests explode downstream.
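
Here is a rough sketch of what that upstream discipline can look like in code, again assuming a Fetch-style runtime. The budget, retry count, and concurrency cap are illustrative numbers, not recommendations.

```typescript
// Upstream traffic discipline: a shared deadline across retries plus a simple
// concurrency cap. The thresholds and fetchWithDiscipline name are illustrative.

const MAX_IN_FLIGHT = 200; // admission control: cap concurrent upstream calls
let inFlight = 0;

async function fetchWithDiscipline(
  url: string,
  totalBudgetMs = 1_500, // one deadline shared by every attempt
  maxAttempts = 3,
): Promise<Response> {
  if (inFlight >= MAX_IN_FLIGHT) {
    // Shed load early instead of queueing work the upstream cannot absorb.
    return new Response("try again later", { status: 429 });
  }

  inFlight++;
  const deadline = Date.now() + totalBudgetMs;
  try {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      const remaining = deadline - Date.now();
      if (remaining <= 0) break; // budget exhausted: stop retrying

      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), remaining);
      try {
        const res = await fetch(url, { signal: controller.signal });
        if (res.ok || attempt === maxAttempts) return res;
      } catch {
        // Swallow the failure; retry only if budget remains.
      } finally {
        clearTimeout(timer);
      }

      // Small jittered backoff so clients do not retry in lockstep.
      await new Promise((r) => setTimeout(r, Math.random() * 100));
    }
    return new Response("upstream unavailable", { status: 504 });
  } finally {
    inFlight--;
  }
}
```

The key property is that every retry draws from the same deadline, so one slow hop cannot quietly multiply into several slow hops downstream.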

This is the part many teams underestimate. They still think in terms of "throughput" and "regional failover." Those are still critical, but the network now also has to act as an active control plane for request quality, not just a passive transport layer.

At the same time, user-facing traffic keeps growing in complexity: more API calls, more multimodal payloads, more automated clients, more machine-to-machine request chains. The global internet did not pause for AI. If anything, AI multiplies the same internet behaviors edge platforms were built to manage.

That evolution is exactly where edge architecture becomes strategically useful. If model-heavy systems increase backend concentration, then distributing intelligence for routing, caching, security enforcement, and request shaping closer to users is one of the few ways to keep end-to-end behavior predictable under load.

I'm biased here -- Fastly is well suited for this paradigm shift. If your architecture assumes all intelligence belongs in a handful of giant regions, you are accepting a blast radius and latency profile that modern products do not need to accept. Push lightweight decisioning to the edge, keep heavy model execution where it belongs, and treat the network path as part of your compute strategy.

Storage: from durable capacity to throughput and locality strategy

Storage used to be the "easy" pillar to explain: choose object, block, or file; decide durability and lifecycle; tune cost. Most developers just went with whatever was easy on AWS.

AI turns storage into a real performance concern.

Training pipelines need high-throughput access to massive datasets. Inference systems need low-latency access to model artifacts, embeddings, and retrieval corpora that may change frequently. Governance teams need lineage and retention controls because model behavior now depends on data quality and recency in visible ways.

The result is that storage architecture decisions increasingly look like application performance decisions, not just data management decisions.

Three shifts stand out:

  • Locality pressure: where data sits relative to compute and users matters more than before, and that is before you factor in data sovereignty concerns.
  • Freshness pressure: stale context can degrade output quality faster than stale web content degraded user experience.
  • Cost pressure: egress and replication policies become product decisions, not back-office defaults.

For operators, this should feel familiar. We already learned that caching policy is product policy. AI simply extends that lesson to a broader data plane.
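
One way to make that concrete is to express freshness as explicit policy per data class, using standard Cache-Control directives. The classes and TTL values below are illustrative assumptions, not tuned recommendations for any particular system.

```typescript
// Freshness policy as code: different classes of AI-adjacent data get
// different cache lifetimes. Classes and TTLs here are illustrative only.

type DataClass = "model-artifact" | "embedding" | "retrieval-doc";

// Model artifacts change rarely; retrieval documents may change hourly or faster.
const POLICY: Record<DataClass, { ttlSeconds: number; staleWhileRevalidate: number }> = {
  "model-artifact": { ttlSeconds: 86_400, staleWhileRevalidate: 3_600 },
  "embedding":      { ttlSeconds: 3_600,  staleWhileRevalidate: 600 },
  "retrieval-doc":  { ttlSeconds: 300,    staleWhileRevalidate: 60 },
};

// Attach standard Cache-Control directives so any HTTP cache in the path
// (edge, CDN, client) enforces the same freshness decision.
function withCachePolicy(res: Response, cls: DataClass): Response {
  const { ttlSeconds, staleWhileRevalidate } = POLICY[cls];
  const headers = new Headers(res.headers);
  headers.set(
    "Cache-Control",
    `public, max-age=${ttlSeconds}, stale-while-revalidate=${staleWhileRevalidate}`,
  );
  return new Response(res.body, { status: res.status, headers });
}
```

The exact values matter less than the fact that they are written down, versioned, and enforced at the same layer that serves the traffic.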

What this means for the next two years

I am optimistic here, and not in an "everything will magically work itself out" way.

We have solved hard transitions before. We know how to build abstractions around messy infrastructure. We know how to instrument systems, isolate failure domains, and improve resilience while demand scales.

But optimism without operational honesty is just hope, and "hope is not a strategy."

If you are leading platform, security, or architecture decisions, I think four questions matter right now:

  1. Are we modeling AI as a new product surface, or as a multiplier on existing traffic and API paths?
  2. Which parts of our system assume human request pacing and break under automated fan-out?
  3. Where are we over-centralized, and what should move closer to users for latency and resilience reasons?
  4. Do our cost models reflect east-west traffic, multimodal payload growth, and egress reality?

The teams that do well in this cycle will not be the teams with the loudest AI announcements. They will be the teams that connect model ambition to infrastructure reality faster than their peers. That was true in the early cloud era. It is true again now.

The three pillars still stand. But in practice, compute and network are converging at the edge into one operational surface: inspect, decide, transform, protect, and route in real time before origin and model spend begin.

Keep the heavy lifting centralized when it should be, and move the high-leverage logic outward to where requests actually enter your system.
