Netcode Concepts Part 3: Lockstep and Rollback

A series on general concepts in Netcode

Index

Part 1: Introduction
Part 2: Topology
Part 3: Lockstep and Rollback

Synchronising game state using Lockstep

In Part 2, we covered different topologies, showing how clients connect to each other and/or to servers to share game state and events. In this part, we look at one method used to keep games synchronised while minimising the effects of latency: Lockstep.

Lockstep methods revolve around ensuring that every game client processes the same game step (or “frame”, or “tick”) at the same time, so that no client sees a different or older version of the game state than the others. Each client advances the game state locked in step with the others, hence the name “lockstep”.

Basic Lockstep

In its most basic form, Lockstep deals with latency by…not doing anything. Literally just waiting for the data to finish transmitting. Do nothing else until all the data has gone where it needs to go. This is Lockstep.

An example of lockstep running on one game client. The client sends out its own character inputs, and then waits for inputs from other players, before processing a game tick. It waits as long as necessary to receive that data.
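To make the waiting concrete, here is a minimal Python sketch of a basic lockstep client. The class and method names (`BasicLockstepClient`, `receive_input`, `try_advance`) are hypothetical, inputs are plain strings, and the actual game simulation is replaced by simply recording the inputs used for each tick.

```python
from typing import Dict, List


class BasicLockstepClient:
    """Hypothetical basic lockstep client: a tick only runs once every player's input for it has arrived."""

    def __init__(self, player_ids: List[int]) -> None:
        self.player_ids = player_ids
        self.tick = 0
        self.inputs: Dict[int, Dict[int, str]] = {}  # inputs[tick][player_id] -> input
        self.history: List[Dict[int, str]] = []       # inputs used for each processed tick

    def receive_input(self, player_id: int, tick: int, value: str) -> None:
        # Called for our own input and for inputs arriving from the network
        self.inputs.setdefault(tick, {})[player_id] = value

    def try_advance(self) -> bool:
        """Process the current tick if all inputs are present; otherwise keep waiting."""
        received = self.inputs.get(self.tick, {})
        if any(p not in received for p in self.player_ids):
            return False  # still waiting: the whole game is stalled on this tick
        self.history.append(self.inputs.pop(self.tick))  # stand-in for simulating one tick
        self.tick += 1
        return True


client = BasicLockstepClient([1, 2])
client.receive_input(1, 0, "right")   # our own input for tick 0
print(client.try_advance())           # False: player 2's input hasn't arrived yet
client.receive_input(2, 0, "jump")
print(client.try_advance())           # True: tick 0 is processed
```

The client never guesses. It simply refuses to advance until the data is there, which is exactly what makes basic lockstep sensitive to the slowest connection.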

However, because it waits for data transmission, the basic version of lockstep has a significant limitation: every client must fully complete its game tick and its transmission before the next tick can happen. If even one client holds up the tick due to latency or a slow machine, all other clients (and the server, if there is one) must wait. The game has to slow its tick rate to a speed set by the slowest (or laggiest) client, and if there’s a lag spike in the network, every client freezes (lags) momentarily.

An example of Player 2 (orange) lagging. Although there’s some variation in how long each tick lasts, Player 1’s (teal) client must wait for the data to arrive. Both clients stay in step.

Basic lockstep tends to work acceptably on a local network where latencies are low, but on the internet, latency may cause lockstep games to run slowly or lag often. In the example from Part 1, the connection between Sydney and New York has a round-trip latency of about 200ms. A lockstep game in this scenario would have to run at 5 game steps a second or slower, which is sufficient for a turn-based game but insufficient for anything faster-paced.
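The arithmetic above can be written as a one-line bound. This sketch assumes, as in the text, that each tick must wait one full round trip on the slowest link. The function name is illustrative.

```python
from typing import List


def max_lockstep_tick_rate(round_trip_ms: List[float]) -> float:
    """Upper bound on ticks per second if every tick waits one round trip on the slowest link."""
    return 1000.0 / max(round_trip_ms)


print(max_lockstep_tick_rate([200.0]))              # Sydney <-> New York: 5.0
print(max_lockstep_tick_rate([2.0, 5.0]))           # low-latency LAN: 200.0
print(max_lockstep_tick_rate([20.0, 60.0, 200.0]))  # one distant player caps everyone at 5.0
```

The `max` is the important part. Adding one distant player lowers the tick rate for everyone in the session.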

In order to address the issues with lockstep, there are several variations and improvements that make the method more usable for games that need faster tick rates:

Asynchronous lockstep

Asynchronous lockstep is where games are happy to run out of lockstep for as long as players do not interact. As soon as players encounter each other in the game, the game enters lockstep mode.

This method does not really solve lockstep’s problems, as players inevitably interact at some point, but it can keep network traffic low until then.
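Here is a sketch of the mode switch, assuming the game decides "interaction" purely by the distance between player positions. `INTERACTION_RADIUS` and `choose_mode` are hypothetical. A real game might use line of sight, shared zones, or other rules instead.

```python
import math
from itertools import combinations
from typing import List, Tuple

INTERACTION_RADIUS = 10.0  # hypothetical distance at which players can affect each other


def choose_mode(positions: List[Tuple[float, float]], radius: float = INTERACTION_RADIUS) -> str:
    """Return "lockstep" if any two players are close enough to interact, else "async"."""
    for a, b in combinations(positions, 2):
        if math.dist(a, b) <= radius:
            return "lockstep"
    return "async"


print(choose_mode([(0.0, 0.0), (50.0, 0.0)]))                 # async: players far apart
print(choose_mode([(0.0, 0.0), (5.0, 0.0), (100.0, 100.0)]))  # lockstep: first two are close
```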

Deterministic lockstep

Deterministic lockstep is where, rather than sending game state updates such as the position and speed of the player character, clients send the player’s control inputs instead, and the other clients process those inputs to work out how they affect the game state.

For example, if a player shoots an arrow, a client that sends game state would need to send the speed and position of not only the player but also the arrow to the other clients. In deterministic lockstep, the client simply sends the fact that the player has pressed the “fire arrow” button, and the other clients deal with spawning the arrow and processing its motion.

This method reduces the amount of data that needs to be sent every tick (as player input data is usually significantly smaller than game state updates).
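To illustrate the size difference, this sketch packs one tick's input as a button bitmask and compares it with a state update that carries positions and velocities for the player and the arrow. The message layouts (`Buttons`, `encode_input`, `encode_state`) are assumptions made for illustration, not any particular game's wire format.

```python
import struct
from enum import IntFlag


class Buttons(IntFlag):
    # Hypothetical button layout: one bit per button
    LEFT = 1
    RIGHT = 2
    JUMP = 4
    FIRE_ARROW = 8


def encode_input(tick: int, buttons: Buttons) -> bytes:
    # Deterministic lockstep: tick number (uint32) plus a one-byte button bitmask
    return struct.pack("<IB", tick, int(buttons))


def encode_state(tick, player_pos, player_vel, arrow_pos, arrow_vel) -> bytes:
    # State sync: tick number plus 2D position and velocity for player and arrow (8 float32s)
    return struct.pack("<I8f", tick, *player_pos, *player_vel, *arrow_pos, *arrow_vel)


inp = encode_input(42, Buttons.FIRE_ARROW)
state = encode_state(42, (1.0, 2.0), (0.5, 0.0), (1.5, 2.5), (8.0, 3.0))
print(len(inp), len(state))  # 5 36
```

The input message stays the same size however many arrows are in flight. The state message grows with every object that has to be synchronised.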

The “deterministic” part of the name comes from the fact that, for the games to remain synchronised using just input data, each client must be able to reliably reproduce the same game state. If the input says “jump”, each client must make the character jump and move in exactly the same arc. If the input says “fire arrow”, the arrow needs to fly the same path on every client. The smallest difference in result would cause the game to desynchronise. This need for the simulation to be fully deterministic turns out to be a challenge for cross-platform games, where differences in the OS or underlying processor architecture can cause small differences in calculations that, over time, accumulate enough to desynchronise the games.
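One common way to sidestep floating-point differences is to run the simulation with integer (fixed-point) maths only. The sketch below models a 2D arrow using a hypothetical `GRAVITY` constant in fixed-point units. Two "clients" simulate the same input and produce identical paths, which is checked by hashing the results. All names here are illustrative.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GRAVITY = -98  # hypothetical per-tick change in vertical velocity, in fixed-point units (1/1000 of a game unit)


@dataclass(frozen=True)
class Arrow:
    x: int   # all fields are integers in fixed-point units
    y: int
    vx: int
    vy: int


def step(arrow: Arrow) -> Arrow:
    # Integer-only maths gives bit-identical results on every platform
    vy = arrow.vy + GRAVITY
    return Arrow(arrow.x + arrow.vx, arrow.y + vy, arrow.vx, vy)


def simulate(start: Arrow, ticks: int) -> List[Arrow]:
    path = [start]
    for _ in range(ticks):
        path.append(step(path[-1]))
    return path


def checksum(path: List[Arrow]) -> str:
    # Clients can exchange a hash like this to detect a desync cheaply
    data = ";".join(f"{a.x},{a.y},{a.vx},{a.vy}" for a in path).encode()
    return hashlib.sha256(data).hexdigest()


fired = Arrow(x=0, y=0, vx=2000, vy=5000)  # "fire arrow" input processed on both clients
client_a = simulate(fired, 60)
client_b = simulate(fired, 60)
print(checksum(client_a) == checksum(client_b))  # True: same inputs, same path
```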

Deterministic lockstep with input delay

While deterministic lockstep provides some marginal benefits in reducing data transmission size, the real benefit comes in allowing for input delay. In regular Deterministic Lockstep, all clients have to wait to receive input data from each other before running a game step. However, if that input data were scheduled to be executed in some future game tick, the game would be able to process other ticks in the meantime, without having to wait.

An example of a three-tick input delay. Inputs are transmitted and buffered for 3 ticks before being used, which gives the data time to arrive.

Since the game no longer has to wait, and instead expects inputs to arrive by the time they’re needed, it can run at as fast a tick rate as necessary. As long as the input data arrives before its scheduled tick, the game never needs to enter a wait state.

If Player 2 (orange) starts lagging after INPUT 6 is sent, as long as that data arrives at Player 1’s client before it is needed, there is no impact on gameplay from the lag.
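The scheduling described above can be sketched as a buffer keyed by tick. This is an illustrative sketch with hypothetical names. Local inputs are stamped `delay_ticks` into the future, remote inputs arrive already stamped with their scheduled tick, and the first `delay_ticks` ticks are pre-filled with empty inputs so the game can start straight away. If an input is still missing when its tick comes round, `try_advance` returns `None`, meaning the game falls back to waiting as in basic lockstep.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class InputDelayBuffer:
    """Hypothetical input-delay scheduler for deterministic lockstep."""

    def __init__(self, player_ids: List[int], delay_ticks: int) -> None:
        self.player_ids = list(player_ids)
        self.delay = delay_ticks
        self.current_tick = 0
        self.scheduled: Dict[int, Dict[int, str]] = defaultdict(dict)
        # Pre-fill the first `delay_ticks` ticks with "no input" so play can start immediately
        for t in range(delay_ticks):
            for p in self.player_ids:
                self.scheduled[t][p] = ""

    def local_input(self, player_id: int, value: str) -> int:
        """Schedule a local input `delay` ticks ahead; the returned tick is sent to peers with it."""
        target = self.current_tick + self.delay
        self.scheduled[target][player_id] = value
        return target

    def remote_input(self, player_id: int, tick: int, value: str) -> None:
        self.scheduled[tick][player_id] = value

    def try_advance(self) -> Optional[Dict[int, str]]:
        """Return the inputs for the current tick, or None if one is late (the game must wait)."""
        inputs = self.scheduled.get(self.current_tick, {})
        if any(p not in inputs for p in self.player_ids):
            return None
        del self.scheduled[self.current_tick]
        self.current_tick += 1
        return inputs


buf = InputDelayBuffer([1, 2], delay_ticks=3)
tick = buf.local_input(1, "fire")  # pressed on tick 0, runs on tick 3
buf.remote_input(2, tick, "jump")  # peer's input for tick 3 arrives early
for _ in range(3):
    buf.try_advance()              # ticks 0-2 run without waiting
print(buf.try_advance())           # {1: 'fire', 2: 'jump'}
```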

In the Sydney-New York example, where the one-way latency was about 100ms, basic lockstep would have to wait at least a full 200ms round trip between steps, to allow for the data to arrive and a confirmation to be sent back. The maximum tick rate would be 5 ticks a second. With input delay, however, the game could run as fast as needed, as long as the delay was long enough for the input data to arrive.
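Choosing the delay comes down to converting latency into ticks. This sketch assumes a hypothetical tick rate of 60 ticks per second and rounds up with integer arithmetic so the delay always covers the latency. A real game would likely add some margin for jitter as well.

```python
def input_delay_ticks(one_way_latency_ms: int, tick_rate_hz: int) -> int:
    """Smallest whole number of ticks that covers the one-way latency (integer ceiling division)."""
    return -(-one_way_latency_ms * tick_rate_hz // 1000)


TICK_RATE = 60  # hypothetical tick rate, far above the 5 ticks/s basic lockstep allows here
delay = input_delay_ticks(100, TICK_RATE)
print(delay, delay * 1000 / TICK_RATE)  # 6 100.0 -> 6 ticks, i.e. 100ms of input delay
```

The tick rate is no longer tied to the round-trip time. However, those 6 ticks of delay apply to every input, including the local player's own.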

The downside, however, is that every player input is delayed, even on the local client, which delays any feedback the player sees from their input. In an FPS game, for example, if the player started running forward, it would take 100ms for that input to be scheduled on the server, and a further 100ms for the result of that input to get back to the player. It would feel like an input lag of 200ms. What’s worse, players with good low-latency connections must use the same input delay. They too experience an input lag of 200ms, even if their own internet connection is better than that.

Can Lockstep be improved further? As a matter of fact, yes. The next logical step for lockstep is…

Prediction and Rollback

So, what if instead of waiting for the inputs from other players to arrive, we anticipate what those inputs will be, and display a “predicted” game state? (Bear in mind that in the simplest case, the anticipated input can be “no input”.)

The game processes ticks using predictions of what the inputs might be. Then, when the real inputs arrive later, the game state is rolled back to the tick where the prediction started and re-calculated forward with the real input data, and the current game state is swapped out for the new, reprocessed one.

An example of rollback: When no data is received, ticks are made using a predicted input. When the actual input arrives, the state is rolled back to that point, and reprocessed forward with the input.
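Here is a compact sketch of the prediction-and-rollback loop. It makes heavy simplifying assumptions: the "game state" is a single integer, `simulate_tick` is a toy deterministic function, every tick's state is snapshotted, and missing remote inputs are predicted as "no input" (0). All names are hypothetical.

```python
from typing import Dict, List

State = int  # toy game state; a real game would snapshot a full state object


def simulate_tick(state: State, inputs: Dict[int, int]) -> State:
    # Toy deterministic simulation: add every player's input to the state
    return state + sum(inputs.values())


class RollbackSession:
    """Hypothetical prediction/rollback loop for one client."""

    def __init__(self, player_ids: List[int], local_id: int) -> None:
        self.player_ids = list(player_ids)
        self.local_id = local_id
        self.tick = 0
        self.snapshots: List[State] = [0]                # snapshots[t] = state at start of tick t
        self.confirmed: Dict[int, Dict[int, int]] = {}   # real inputs received, per tick
        self.used: Dict[int, Dict[int, int]] = {}        # inputs each tick was simulated with

    def inputs_for(self, tick: int) -> Dict[int, int]:
        real = self.confirmed.get(tick, {})
        # Simplest prediction: a missing input is assumed to be "no input" (0)
        return {p: real.get(p, 0) for p in self.player_ids}

    def advance(self, local_input: int) -> None:
        self.confirmed.setdefault(self.tick, {})[self.local_id] = local_input
        inputs = self.inputs_for(self.tick)
        self.used[self.tick] = inputs
        self.snapshots.append(simulate_tick(self.snapshots[self.tick], inputs))
        self.tick += 1

    def receive_remote(self, player_id: int, tick: int, value: int) -> None:
        self.confirmed.setdefault(tick, {})[player_id] = value
        if tick < self.tick and self.used[tick][player_id] != value:
            # Misprediction: restore the snapshot at `tick` and re-simulate forward to now
            for t in range(tick, self.tick):
                self.used[t] = self.inputs_for(t)
                self.snapshots[t + 1] = simulate_tick(self.snapshots[t], self.used[t])

    @property
    def state(self) -> State:
        return self.snapshots[self.tick]


session = RollbackSession([1, 2], local_id=1)
session.advance(1)                 # tick 0: player 2's input predicted as 0
session.advance(1)                 # tick 1: predicted again
print(session.state)               # 2 (predicted)
session.receive_remote(2, 0, 5)    # real input for tick 0 arrives late
print(session.state)               # 7 (rolled back and re-simulated)
```

When a prediction turns out to be correct, no re-simulation happens at all. The cost of rollback is only paid on mispredictions.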

As a result, input delay for the local player can be reduced, at the cost of the player seeing a predicted game state (which might be entirely wrong) until the other players’ inputs for that tick arrive and the game state is rolled back and updated. This can result in artefacts like rubber-banding, teleporting, and other graphical “glitches”, which occur when the predicted game state differs from the actual one.

To reduce how often players see these momentarily erroneous predicted states, Rollback can be combined with Input Delay to split the difference: inputs are delayed by a small amount, and prediction/rollback covers the rest of the latency. This results in a compromise between input lag and the number of predicted ticks.

Rollback combined with a 2-tick input delay: this reduces how often the game state must be rolled back, so rollback is only used to deal with momentary lag. In this diagram, the packet carrying Player 2’s (orange) input 5 is delayed. This forces Player 1 to predict tick 5, and later roll back when the input is received.
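The trade-off can be put in numbers: with a given input delay, only the latency the delay doesn't cover has to be predicted. This sketch assumes a steady one-way latency and a hypothetical 60 ticks per second, and rounds latency up to whole ticks.

```python
def predicted_ticks(one_way_latency_ms: int, tick_rate_hz: int, input_delay_ticks: int) -> int:
    """How many of the newest ticks must run on predicted remote inputs."""
    latency_ticks = -(-one_way_latency_ms * tick_rate_hz // 1000)  # round latency up to whole ticks
    return max(0, latency_ticks - input_delay_ticks)


for delay in (0, 2, 6):
    print(delay, predicted_ticks(100, 60, delay))
# 0 6  -> pure rollback: the last 6 ticks are always predictions
# 2 4  -> a 2-tick delay covers part of the latency
# 6 0  -> delay covers it all: back to plain input-delay lockstep
```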

An additional benefit of this rollback system is that the game is now tolerant of unexpected network latency. If the inputs don’t arrive by the time they’re needed, the game does not freeze until the data is received, as with pure Lockstep with Input Delay. Instead, it simply continues to make predicted game states until the data arrives. Of course, the longer this goes on, the bigger the difference between the predicted state and the actual state, and the more discontinuity (such as teleporting and rubber-banding) the player will see when the correction happens.

To make the game feel immediate and responsive while using rollback, games will often play only cosmetic effects for events that happen in the predicted game state. In a shooting game, a successful shot against a predicted state might show a hit marker but not subtract health until the actual state is received; in an MMO, a unit taking attacks may play a flinch animation; in a fighting game, the character taking attacks may show a hit flash or recoil. This makes the game feel like it responded immediately to a local input, even though the data confirming it hasn’t been received yet. The predicted state might well have been wrong and no damage actually taken, but as long as this doesn’t happen too often, it can go unnoticed by the player.
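The split between cosmetic and confirmed effects might look like the following sketch. The `Target` class and `on_hit` handler are hypothetical. Predicted hits only queue feedback, and only confirmed hits change health.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    health: int = 100
    effects: List[str] = field(default_factory=list)  # cosmetic feedback queued for display


def on_hit(target: Target, damage: int, confirmed: bool) -> None:
    """Hypothetical hit handler: predicted hits are cosmetic only; confirmed hits change state."""
    if confirmed:
        target.health -= damage
    else:
        target.effects.append("hit_marker")  # harmless if the prediction turns out wrong


enemy = Target()
on_hit(enemy, 25, confirmed=False)  # local prediction: instant feedback
print(enemy.health, enemy.effects)  # 100 ['hit_marker']
on_hit(enemy, 25, confirmed=True)   # actual state received later
print(enemy.health)                 # 75
```

Because the predicted branch never touches game state, a wrong prediction leaves nothing to undo. The player simply saw a hit marker that didn't count.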

Limitations

At this point, Lockstep augmented with input delay and/or rollback performs well enough in real-world scenarios to satisfy the netcode requirements of a wide variety of game styles. It works well for P2P topologies without authoritative servers, as each client connects directly to every other client, so data always takes the most direct route possible. But lockstep’s core benefit is also its weakness: every game must be locked in step with every other.

While most of the impacts of latency are now taken care of, Lockstep still requires every client to make exactly the same game ticks as every other. Any missing tick would cause the games to desynchronise. As a result, every client connected to the game session can only make game ticks as fast as the slowest client. If the slowest client lags (due to processing limitations), all other clients must also wait.

This means that for games that push the limits of a computer’s processing power, or MMO games where hundreds of players can be connected simultaneously, the chance of a single player dragging down the performance of the whole session is high. In these cases, a different model is needed. In Part 4, we talk about asynchronous client-server models that are better suited to these tasks.

Demo

A demonstration project is available for GameMaker Studio 2:

Download and more information from: https://meseta.itch.io/lockstep

🤖 Build robots, code in python. Former Electrical Engineer 👨‍💻 Programmer, Chief Technology Officer 🏆 Forbes 30 Under 30 in Enterprise Technology
