Game Backend Deep Dive - Overwatch

A deep dive into Overwatch's (2016) multiplayer game design and netcode architecture, based on Tim Ford's GDC 2017 talk.

Key Insights


  • ECS as a Netcode Enabler: Overwatch's Entity Component System architecture reduced the entire gameplay netcode surface to just three systems out of hundreds, making an inherently complex problem tractable.

  • Command Frames and Adaptive Tickrate: The simulation runs on fixed 16ms command frames (dropping to 7ms in tournament mode) — a deliberate tradeoff between responsiveness and server cost that any multiplayer studio should understand.

  • Rollback Requires a Server Worth Trusting: Overwatch's reconciliation system works because a dedicated, authoritative server holds the ground truth. Without it, the entire rollback-and-replay model simply cannot exist.

  • Time Dilation as a Packet Loss Shield: When the server detects input starvation, it signals the client to simulate slightly faster, flooding the input buffer before loss can cause mispredictions. This feedback loop runs constantly, invisibly.

  • Predict Everything, Including Rockets: Overwatch predicts all abilities and projectiles by default, opting out only when necessary, an unusual approach in 2017 that made the game feel dramatically more responsive.

Tim Ford, now Studio Head at Kintsugiyama and previously Lead Gameplay Programmer on Overwatch since the game's inception in summer 2013, presented Overwatch's gameplay architecture and netcode at GDC 2017. The talk covers how Blizzard built a server-authoritative, prediction-heavy multiplayer system on top of a strict Entity Component System architecture, and why that architectural choice turned out to be inseparable from the netcode's success.

Indirectly, the talk highlights best practices that any game studio, big or small, can apply to improve its own online architecture. Let's peek inside.

What Is ECS and Why Does It Matter for Netcode?

Entity Component System architecture separates a game world into three distinct concepts. Entities are just IDs; nothing on their own. Components are pure data containers attached to entities; they store game state and have no behaviors. Systems hold all the logic and run against any entity that has the specific combination of components the system cares about.
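The separation described above can be sketched in a few dozen lines. This is an illustrative toy, not Overwatch's engine; the component and system names are invented:

```python
# Minimal ECS sketch: entities are bare IDs, components are pure data,
# systems hold all logic and run over every entity that carries the
# required component combination.
from dataclasses import dataclass

@dataclass
class Position:       # component: pure data, no behavior
    x: float
    y: float

@dataclass
class Velocity:       # component: pure data, no behavior
    dx: float
    dy: float

class World:
    def __init__(self):
        self.next_id = 0
        self.components = {}          # entity id -> {component type: instance}

    def create_entity(self, *comps):
        eid = self.next_id
        self.next_id += 1
        self.components[eid] = {type(c): c for c in comps}
        return eid

    def query(self, *types):
        """Yield (entity id, components) for entities holding all given types."""
        for eid, comps in self.components.items():
            if all(t in comps for t in types):
                yield eid, [comps[t] for t in types]

def movement_system(world, dt):
    # A system: logic only, operating on the (Position, Velocity) tuple.
    for _, (pos, vel) in world.query(Position, Velocity):
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt

world = World()
player = world.create_entity(Position(0.0, 0.0), Velocity(1.0, 2.0))
movement_system(world, dt=0.016)      # one 16 ms command frame
```

Because the movement system declares exactly which component tuple it reads and writes, it is trivially isolated from the rest of the engine, which is the property the netcode exploits.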

This strict separation forces every behavior to live in a clearly defined, isolated place. As Ford put it, the goal is a "pit of success", i.e., an architecture so constrained that you are pushed toward writing consistent, maintainable, decoupled code almost by default. In Overwatch's production codebase, with around 46 client-side systems and 103 component types, only three systems are responsible for gameplay netcode: movement, weapon, and state script. The rest of the engine doesn't touch it. That isolation is what makes the netcode maintainable as the roster of heroes grows.

Worth noting: ECS is a pattern worth exploring in depth on its own. We may cover it in a dedicated article, as its implications extend well beyond netcode.

Command Frames, Tickrate, and the Cost of Responsiveness

Overwatch's simulation runs on fixed command frames of 16 milliseconds each, roughly 60Hz. In tournament configuration, that drops to 7ms, roughly 140Hz. Each frame, the client consumes player input as close to the present moment as possible and sends it to the server. The client's clock is always ahead of the server by half round-trip time plus one buffered command frame. At 160ms RTT, that's about 96ms of lead time.
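The lead-time arithmetic works out as a one-liner. A sketch of the calculation using the talk's figures (the function name is ours):

```python
# How far ahead of the server the client's clock runs:
# half the round-trip time (one-way travel) plus buffered command frames.
def client_lead_ms(rtt_ms, command_frame_ms=16.0, buffered_frames=1):
    return rtt_ms / 2 + buffered_frames * command_frame_ms

print(client_lead_ms(160))   # 96.0 ms of lead at 160 ms RTT
```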

This matters for cost. A higher tickrate (i.e., shorter command frames) requires more server CPU cycles per second and more network bandwidth per match. Dropping from 16ms to 7ms in tournament mode is not free. It's a deliberate choice reserved for competitive play, where the marginal responsiveness improvement justifies the additional infrastructure spend. For general matchmaking, 16ms strikes the right balance.

That extra cost scales linearly with tickrate. Infrastructure that deploys servers closest to players helps keep latency low without needing to compensate through tickrate increases alone. Edgegap's game server orchestration platform reduces average latency by 58% by placing servers at the network edge nearest each player population.

Rollback Only Truly Works with a Dedicated Server

"This is the most important netcode problem for gameplay engineers we had to solve," as Ford described it. The objective is a responsive networked action game, which means predicting player actions locally. But we can't trust the client with simulation authority, because, as Ford noted plainly, "some clients are jerks." As highlighted in our articles, peer-to-peer networking architecture is a haven for cheaters who can manipulate player-hosted game servers, and even relays have security risks.

The solution is a trusted, authoritative dedicated server.

When a client mispredicts, say, it thought the player was running but the server determined they were stunned, the client rolls back to the server's authoritative movement snapshot and replays every buffered input forward to the present moment. Because the character movement simulation is highly deterministic, this replay reliably reproduces the correct state. The correction is smooth and, in the vast majority of cases, invisible.
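The roll-back-and-replay step can be sketched as follows, with a trivial stand-in for the deterministic movement update (`step` and the integer state are purely illustrative, not Overwatch's simulation):

```python
# Deterministic stand-in for one frame of movement simulation:
# same state + same input must always yield the same result.
def step(state, inp):
    return state + inp

def reconcile(server_state, server_frame, input_buffer, current_frame):
    """Roll back to the server's authoritative state at server_frame,
    then replay every buffered input forward to the present frame."""
    state = server_state
    for frame in range(server_frame + 1, current_frame + 1):
        state = step(state, input_buffer[frame])
    return state

# The client predicted onward from state 12, but the server says frame 0
# actually ended at 10 (say, the player was stunned, not running).
inputs = {1: 2, 2: 3, 3: 1}                 # buffered inputs per frame
corrected = reconcile(10, 0, inputs, 3)     # replay 3 frames -> 16
```

Because `step` is deterministic, replaying the same buffered inputs from the corrected snapshot reliably reproduces the state the server will compute, which is what makes the correction invisible in most cases.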

This system simply cannot exist in a peer-to-peer or relay-based architecture. There is no ground truth to roll back to. The dedicated game server running in the cloud is not just a convenience; it is the structural prerequisite for the entire prediction and reconciliation model. Without it, you choose between trusting clients or making every player wait for server confirmation before seeing any response to their input.

For more information on rollback, make sure to read our article about rollback netcode here, alongside insights on how to mitigate latency where we compare rollback netcode vs input delay.

Time Dilation: Fighting Packet Loss Proactively

When packets are dropped and the server runs out of input to simulate, it duplicates the player's last known input and hopes for the best. This creates mispredictions. Overwatch's answer is not to absorb the damage; it's to prevent it.

When the server detects input starvation, it notifies the client, which begins dilating time. Instead of a 16ms fixed timestep, the client treats it as approximately 15.2ms, simulating slightly faster and pouring more inputs into the network pipe to build up a buffer on the server's side. Once conditions stabilize, the client dilates back in the other direction, gradually draining that buffer. This feedback loop runs constantly.
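A sketch of the feedback loop's decision, assuming the server reports its current input-buffer depth each frame. The 16ms and ~15.2ms figures are from the talk; the buffer target and the 16.8ms drain value are our illustrative guesses:

```python
# Time dilation as a feedback loop: starve -> speed up, over-buffer ->
# slow down, healthy -> nominal 60 Hz command frame.
def next_timestep_ms(server_buffer_depth, target_depth=3):
    if server_buffer_depth < target_depth:
        return 15.2   # starving: simulate faster, pour extra inputs downstream
    if server_buffer_depth > target_depth:
        return 16.8   # over-buffered: simulate slower, drain the excess
    return 16.0       # healthy: nominal fixed timestep
```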

Paired with this is a technique Ford traces back to Quake World: a sliding window of inputs. Rather than sending only the current frame's input, the client bundles every input since the last server-acknowledged movement state into a single packet. Because players typically hold keys rather than tapping at 60Hz, this compresses very efficiently. If a packet is lost, the next one still carries all the missing inputs, filling holes before simulation runs. Two complementary systems. Neither novel on its own, but together they make packet loss largely invisible to the player.
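A sketch of the sliding-window bundling, with an invented input encoding:

```python
# Quake World-style sliding input window: each packet carries every input
# since the last server-acknowledged frame, so one lost packet leaves no
# hole -- the next packet still contains the missing inputs.
def build_input_packet(input_history, last_acked_frame):
    """Bundle all unacknowledged (frame, input) pairs into one packet."""
    return [(frame, inp) for frame, inp in input_history
            if frame > last_acked_frame]

history = [(10, "fwd"), (11, "fwd"), (12, "fwd+jump"), (13, "fwd")]
# The server has acknowledged up to frame 11, so frames 12-13 are re-sent.
packet = build_input_packet(history, last_acked_frame=11)
```

Since players mostly hold keys rather than toggling them at 60Hz, consecutive entries are highly repetitive and compress well, which is why the redundancy is cheap.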

Predicting Everything, Including Rockets

Most multiplayer games predict player movement and stop there. Overwatch predicts everything (i.e., movement, all abilities, and weapons) by default. Teams have to explicitly opt out of prediction for a specific ability. This philosophy extends to large, visible projectiles like Pharah's rockets.
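As a sketch, the opt-out default might look like a simple exclusion set; the ability names here are hypothetical placeholders, not Overwatch's:

```python
# Predict-by-default policy: prediction is assumed for every ability, and
# designers must explicitly add an ability here to opt it out.
PREDICTION_OPT_OUT = {"team_teleport"}   # rare, case-by-case exclusions

def is_predicted(ability_name):
    # Opt-out, not opt-in: absence from the set means "predict it".
    return ability_name not in PREDICTION_OPT_OUT
```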

At the time, GDC talks from studios Blizzard respected explicitly advised against it. Rockets are big, physical objects in the world, not tracers, and a misprediction makes them visually vanish. Blizzard figured out a way to do it anyway. "Predicting rockets is rad," as Ford put it. "It feels really good." Occasionally a rocket disappears. There is the odd forum post. Completely worth it.

The lesson for any shooter developer: default to prediction. Opt out selectively. The responsiveness gains are real, and players feel them even if they cannot articulate why.

Hit Registration: Rewind Only What You Need

When a player fires, the server rewinds the world back to the shooter's frame of reference before computing whether the shot connected. This backwards reconciliation is a well-established technique. What's less common is how Overwatch manages its cost.

Rather than rewinding every entity in the scene, a spatial bounding volume representing approximately the last half-second of each entity's movement is checked against the bullet ray first. Only entities whose bounding volume intersects get fully rewound for precise hit calculation. This culls the vast majority of candidates before any expensive computation runs.
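The broad-phase cull can be sketched as a cheap ray-versus-bounding-sphere test ahead of any rewind. The 2D geometry here is a simplified illustration, not Overwatch's actual volumes:

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """Cheap test: does a ray pass within `radius` of `center`?
    `direction` must be unit length."""
    ox, oy = center[0] - origin[0], center[1] - origin[1]
    t = ox * direction[0] + oy * direction[1]   # projection onto the ray
    if t < 0:
        t = 0                                   # closest point is the origin
    cx = origin[0] + direction[0] * t - center[0]
    cy = origin[1] + direction[1] * t - center[1]
    return math.hypot(cx, cy) <= radius

def rewind_candidates(entities, origin, direction):
    """Keep only entities whose recent-movement bounds the bullet crosses;
    only these get the expensive full rewind."""
    return [e for e in entities
            if ray_hits_sphere(origin, direction,
                               e["bounds_center"], e["bounds_radius"])]
```

Each entity's bound covers roughly its last half-second of movement, so the cheap test discards almost everyone before any per-entity rewind runs.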

Above approximately 220ms round-trip time, hit impact prediction is disabled entirely. The system switches to extrapolation (e.g., dead reckoning the target's position based on last known trajectory) rather than rewinding a target so far back that a victim who successfully dodged behind cover could still die. It's a deliberate fairness clamp.
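A sketch of the clamp's branch, with the ~220ms threshold from the talk and a simplified dead-reckoning fallback (everything but the threshold is an illustrative stand-in):

```python
REWIND_RTT_LIMIT_MS = 220   # past this, rewinding becomes unfair to victims

def resolve_target_position(rtt_ms, rewind_history, last_pos, last_vel, lag_s):
    if rtt_ms <= REWIND_RTT_LIMIT_MS:
        # Normal path: rewind to where the target was in the shooter's past.
        return rewind_history[-1]
    # High-latency path: never rewind this far back; extrapolate the target
    # forward from its last known position and velocity instead.
    return (last_pos[0] + last_vel[0] * lag_s,
            last_pos[1] + last_vel[1] * lag_s)
```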

Deferment: One Side Effect, One Call Site

Overwatch defers large side effects rather than invoking them wherever they're needed. The principle is simple: when a behavior triggers a big chunk of work, ask whether that work has to happen right now. Usually, it doesn't.

The clearest example is impact effects. Multiple systems across the engine need to spawn surface impacts, from hitscan bullets to explosive projectiles to beam weapons. Each creation touches entity lifetime, scene management, and resource management. Rather than scattering this logic across a dozen call sites, Overwatch queues pending contact records into a singleton component. Once per frame, before render prep, a single system processes the entire batch.
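A minimal sketch of the queue-and-drain pattern, with hypothetical names; the per-frame budget illustrates how a single call site can also smooth spikes:

```python
# Deferred side effects: many systems append cheap contact records to a
# singleton queue; one system drains the whole batch once per frame.
class ContactQueue:
    def __init__(self):
        self.pending = []

    def request_impact(self, position, surface):
        # Cheap: just record what was asked for; do no real work yet.
        self.pending.append((position, surface))

def resolve_contacts_system(queue, budget=64):
    """The single call site: create at most `budget` effects this frame,
    leaving any overflow queued for the next frame."""
    batch, queue.pending = queue.pending[:budget], queue.pending[budget:]
    return [f"impact@{pos} on {surface}" for pos, surface in batch]

# Hitscan, projectile, and beam systems all enqueue; none spawn directly.
queue = ContactQueue()
queue.request_impact((1, 2), "metal")
queue.request_impact((3, 4), "dirt")
effects = resolve_contacts_system(queue)
```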

As Ford summarized the lesson: "behaviors are much less complex if they are expressed in a single call site in which all major behavioral side effects are localized to that call site." The benefits stack up beyond clarity. Better cache locality, the ability to impose per-frame budgets on effect creation, and the ability to smooth spikes when many effects are requested simultaneously.

Additional Insights for Multiplayer Game Developers

  • Explicit component access enables multi-threading. If systems declare upfront which components they read versus write, the scheduler can safely run non-conflicting systems in parallel. Ford noted retrospectively that Overwatch's ad hoc approach obscured this opportunity. Enforcing strict tuple definitions from the start would have surfaced complexity earlier and made safe parallelism much easier.

  • The simulation clock must be fixed, even if the renderer isn't. Overwatch's simulation runs at a mandatory 60Hz. If the renderer drops to 30fps, the engine runs two simulation ticks that frame. The simulation is far cheaper than rendering, so this is manageable, but the fixed simulation rate is non-negotiable for determinism. And if the server can't keep up, Ford warned it enters "a death spiral" — each late frame forcing more ticks, compounding until the server falls apart entirely. Optimization isn't optional.

  • Architecture rules take time to discover. It took Overwatch's team about a year and a half to settle on their ECS rules. Code that predated or violated those rules remained the ongoing source of the most bugs and maintenance burden. Define and enforce your rules of engagement as early as possible.
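The fixed-step-versus-variable-renderer relationship in the second bullet is the classic accumulator loop. A sketch of the pattern (not Overwatch's code):

```python
SIM_DT_MS = 16.0   # mandatory fixed simulation step

def ticks_this_frame(accumulator_ms, render_frame_ms):
    """Return (number of simulation ticks, leftover time) for one render
    frame. If rendering took ~33 ms, two 16 ms ticks run that frame."""
    accumulator_ms += render_frame_ms
    ticks = 0
    while accumulator_ms >= SIM_DT_MS:
        accumulator_ms -= SIM_DT_MS
        ticks += 1        # run one deterministic 16 ms simulation tick here
    return ticks, accumulator_ms
```

The death spiral Ford warned about is visible in this loop: if each tick itself takes longer than 16ms of wall time, the accumulator grows faster than it drains, demanding ever more ticks per frame.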

---

This article is based on and cites the original GDC 2017 presentation by Tim Ford, published on YouTube. All rights in the original content are owned by their respective owners.

Written by

the Edgegap Team

Get your Game Online Easily & in Minutes

Start Integrating Now!
