Friday Facts #76 - MP inside out

Posted by Blue Cube on 2015-03-06

Today's edition of the Friday Facts has been written by Blue Cube, enjoy!

Hello fellow Factorians!

I'm breaking away from our magnificent testing / team building session here at our office to bring you more babbling about the development of your favourite game.

This time there will be less of the regular "fixing bugs, fixing multiplayer, designing spaceships" theme from the past weeks and the post will be a little more technical, focusing on the workings of our magical multiplayer code.

Lock step

As you have probably noticed, since the last major release (0.11.0) the game has been playable over the network. There have been a lot of discussions on the forums about the lock step architecture, so let's start with that.

In the lock step architecture, each networked peer runs the simulation of everything that happens in the world, so there doesn't need to be any central server. When a player performs an action, only that action is transferred to all the other players.
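To make the idea concrete, here is a rough toy sketch (not Factorio's actual code). Two "peers" each hold their own copy of a hypothetical `World`, receive only the per-tick action lists, and run the full simulation themselves. Because every peer applies the same actions in the same agreed order, they end up with identical state without ever exchanging it. The `World` class and `run_lockstep` function are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Hypothetical toy world state: item counts per player."""
    tick: int = 0
    items: dict = field(default_factory=dict)

    def step(self, actions):
        # Apply the tick's actions in a fixed, agreed order so every peer
        # computes exactly the same result.
        for player, amount in sorted(actions):
            self.items[player] = self.items.get(player, 0) + amount
        self.tick += 1


def run_lockstep(worlds, actions_per_tick):
    # Only the small per-tick action lists are "sent"; every peer then runs
    # the complete simulation locally. No central server holds the state.
    for actions in actions_per_tick:
        for world in worlds:
            world.step(actions)


peer_a, peer_b = World(), World()
run_lockstep([peer_a, peer_b], [[("alice", 3)], [("bob", 5), ("alice", -1)]])
print(peer_a == peer_b, peer_a.items)  # True {'alice': 2, 'bob': 5}
```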

The biggest advantage of lock step is the small amount of data sent over the network. Because people with keyboards can only generate a few hundred bytes per second, this approach scales really well for large maps: the game plays just the same whether the map has a hundred objects or a million, which makes this method very attractive for strategy games (AoE, Starcraft and others have used this approach).
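A back-of-the-envelope comparison shows why the entity count stops mattering. All the constants below are illustrative assumptions, not measurements from Factorio, and the "snapshot" side is deliberately naive (real client-server games send deltas and compress them), so treat it only as a picture of how the two approaches scale.

```python
# All constants are illustrative assumptions, not measurements from Factorio.
INPUT_BYTES_PER_SEC = 300   # "a few hundred bytes per second" of player input
ENTITY_STATE_BYTES = 16     # hypothetical serialized size of one entity
SNAPSHOTS_PER_SEC = 60      # hypothetical snapshot rate


def lockstep_bytes_per_sec(players, entities):
    # Only player input crosses the network; `entities` does not matter.
    return players * INPUT_BYTES_PER_SEC


def naive_snapshot_bytes_per_sec(players, entities):
    # Sending the full world state to every player grows with map size.
    return players * entities * ENTITY_STATE_BYTES * SNAPSHOTS_PER_SEC


for entities in (100, 1_000_000):
    print(entities,
          lockstep_bytes_per_sec(4, entities),
          naive_snapshot_bytes_per_sec(4, entities))
# 100 1200 384000
# 1000000 1200 3840000000
```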

And because nothing is perfect, there is obviously a price to pay for the low traffic. In regular games you don't care much if an enemy's health is 0.0001% off, or if a rocket exploded a tiny bit more to the left than it should have. Computers generally don't do things at random, but if the programmer is not careful enough, unpredictable behaviour can leak into the game and cause exactly these kinds of small differences. With the lock step architecture you never directly see the other player's game state, so there is no way to correct these small errors; they can accumulate until the players see completely different games. This is what Factorio players have come to know as a desync.
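One classic way such unpredictability leaks in is floating-point arithmetic: addition is not associative, so two peers that iterate over the same objects in a different order can end up with slightly different numbers. The sketch below shows this, plus one common way to *detect* (not prevent) the divergence: peers periodically compare a checksum of their state. The function names are hypothetical, and this is not a description of how Factorio itself finds desyncs.

```python
import struct
import zlib


def total_damage(hits, order):
    # Floating-point addition is not associative, so summation order matters.
    total = 0.0
    for i in order:
        total += hits[i]
    return total


def state_checksum(values):
    # Hash the exact bit pattern of every float, so even a 1-ulp difference
    # changes the checksum.
    return zlib.crc32(b"".join(struct.pack("<d", v) for v in values))


hits = [0.1, 0.2, 0.3]
peer_a = total_damage(hits, [0, 1, 2])  # e.g. one iteration order
peer_b = total_damage(hits, [2, 1, 0])  # e.g. a different, "unpredictable" order
print(peer_a, peer_b)                   # 0.6000000000000001 0.6
print(state_checksum([peer_a]) == state_checksum([peer_b]))  # False
```

Neither result is "wrong"; they are just different, and in lock step that difference never gets corrected.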

There are of course many other ways to make a game work over the network, one of the most common being the client-server model.

Client server

In the simplest form of the client-server architecture, the game runs only on the server, which periodically sends a snapshot of the game state to every client; the clients serve as little more than remote controls. The main problem here is that every action has to travel to the server and back to the client before any results become visible.
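The round trip can be spelled out as simple arithmetic. This sketch assumes a fixed one-way network delay and a server that sends snapshots at regular intervals (both simplifications, and the function name is hypothetical); it just adds up how long a client waits before it sees the effect of its own keypress.

```python
def action_visible_at(press_ms, one_way_ms, snapshot_every_ms):
    """When does a client first see the result of its own action?"""
    # 1. The input travels from the client to the server.
    arrives_ms = press_ms + one_way_ms
    # 2. The server applies it and includes the result in its next snapshot
    #    (here assumed to be sent at multiples of snapshot_every_ms).
    snapshot_ms = -(-arrives_ms // snapshot_every_ms) * snapshot_every_ms
    # 3. The snapshot travels back to the client.
    return snapshot_ms + one_way_ms


print(action_visible_at(0, 50, 50))   # 100
print(action_visible_at(10, 50, 50))  # 150
```

Even in the best case the delay is a full round trip, which is what client-side prediction tries to hide.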

To work around this, most modern FPS games since Duke Nukem 3D use something called client-side prediction. Client-side prediction basically brings the game simulation back to every client: every time an action is made, the client both sends it to the server and applies it locally right away, without knowing what the other players did. When the server later sends a new game state, the client adjusts its local state to smoothly merge it with the received one. Rinse and repeat.
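A minimal sketch of one well-known variant of this, often called server reconciliation, might look like the code below. The client numbers its inputs, applies them immediately, and when an authoritative state arrives it adopts that state and replays the inputs the server has not processed yet. The class and field names are hypothetical, and real games usually add visual smoothing on top so the correction doesn't look like a jump.

```python
from dataclasses import dataclass


@dataclass
class Input:
    seq: int  # client-assigned sequence number
    dx: int   # movement requested by this input


class PredictingClient:
    def __init__(self):
        self.x = 0          # locally predicted position
        self.next_seq = 0
        self.pending = []   # inputs sent but not yet confirmed by the server

    def press(self, dx):
        inp = Input(self.next_seq, dx)
        self.next_seq += 1
        self.pending.append(inp)
        self.x += dx        # apply immediately instead of waiting a round trip
        return inp          # would be sent to the server

    def on_server_state(self, server_x, last_processed_seq):
        # Adopt the authoritative state, forget inputs the server has already
        # applied, then replay the ones still in flight on top of it.
        self.pending = [i for i in self.pending if i.seq > last_processed_seq]
        self.x = server_x
        for i in self.pending:
            self.x += i.dx


client = PredictingClient()
for _ in range(3):
    client.press(1)         # predicted x == 3
# The server applied inputs 0 and 1, but another player also pushed us back by 1.
client.on_server_state(server_x=1, last_processed_seq=1)
print(client.x)             # 2
```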

Implications for us

As I said before, Factorio uses lock step simulation. This allowed us to make the game playable over the internet with hundreds of thousands of active entities without resorting to any major hacks or optimizations. We also decided to make the game completely peer-to-peer, which has some interesting consequences.

One of the downsides is that every player needs an open connection to every other player and has to send data to each of them. This becomes a problem when playing over the internet and not all of the players have a public IP address (although we also have NAT punching, which lets you play even in this case and works almost every time). The biggest issue with pure P2P arises when one group of players wants to play over LAN and another group wants to connect to them over NAT. In these cases Factorio gets confused and refuses to connect at all.

Most of these problems, however, can be mitigated by partially moving away from pure P2P later. For example, if two peers cannot connect directly, one of the other peers can serve as a proxy for them.
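The proxy idea can be sketched as a tiny routing problem. This is a hypothetical illustration of the possible future approach, not existing Factorio code. Given which pairs of peers can reach each other directly (`links`, an assumed input), find either a direct path or a single relay peer that can reach both ends.

```python
def route(peers, links, a, b):
    """Return a path from a to b, using one relay peer if needed (or None)."""
    def linked(x, y):
        return frozenset((x, y)) in links

    if linked(a, b):
        return [a, b]
    for relay in sorted(peers):
        if relay not in (a, b) and linked(a, relay) and linked(relay, b):
            return [a, relay, b]  # the relay forwards messages between a and b
    return None


peers = {"alice", "bob", "carol"}
links = {frozenset(("alice", "carol")), frozenset(("bob", "carol"))}
print(route(peers, links, "alice", "bob"))    # ['alice', 'carol', 'bob']
print(route(peers, links, "alice", "carol"))  # ['alice', 'carol']
```

Relaying costs extra latency and bandwidth on the relay peer, which is why it makes sense only as a fallback when a direct connection fails.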

The most fundamental limitation of the lock step architecture is that the game speed is bounded by the slowest player. Because a peer cannot finish a frame until it has processed the input from all other peers, a peer who can't run the game fast enough slows the game down for everyone. In a client-server model the server can simply choose to ignore a slow client; in Factorio, ignoring them would break the game for everyone.
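Put as numbers, and ignoring network delay and any speed cap the game may have, the effect looks like the sketch below. "UPS" here means updates (simulation ticks) per second, and the per-peer tick costs are made-up example values. In lock step the tick rate is set by the maximum tick cost across all peers, whereas an authoritative server only depends on its own speed.

```python
def lockstep_ups(tick_cost_ms):
    # Nobody can start tick N+1 until input from every peer for tick N has
    # been processed, so the game advances at the pace of the slowest peer.
    return 1000 / max(tick_cost_ms.values())


def authoritative_server_ups(server_tick_cost_ms):
    # With a client-server model, only the server's own speed sets the pace;
    # a slow client just falls behind (or gets dropped).
    return 1000 / server_tick_cost_ms


tick_costs = {"fast": 10, "average": 12, "slow": 25}  # ms per tick, hypothetical
print(lockstep_ups(tick_costs))        # 40.0
print(authoritative_server_ups(10))    # 100.0
```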

To help with this, Factorio implements a sort of buffer interval (called "latency" when starting the game). It determines the amount of time a peer can wait for anyone's messages without the game lagging. Unfortunately, it also means that all local actions are delayed by this time.
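A minimal sketch of how such a buffer can work, assuming the latency is measured in ticks (an assumption for this example; the class and method names are hypothetical too). A local action is not executed on the tick it was issued but scheduled a fixed number of ticks ahead, so its message has that long to reach every other peer. The same delay is exactly why your own actions feel slightly late.

```python
from collections import defaultdict


class InputBuffer:
    def __init__(self, latency_ticks):
        self.latency_ticks = latency_ticks
        self.scheduled = defaultdict(list)  # tick -> actions to run on that tick

    def submit(self, current_tick, action):
        # A local action is not executed right away. It is scheduled
        # `latency_ticks` ahead, which gives the message that long to reach
        # every other peer before anyone needs it.
        run_at = current_tick + self.latency_ticks
        self.scheduled[run_at].append(action)
        return run_at

    def actions_for(self, tick):
        # Every peer executes exactly the actions scheduled for this tick.
        return self.scheduled.pop(tick, [])


buffer = InputBuffer(latency_ticks=3)
print(buffer.submit(current_tick=10, action="build inserter"))  # 13
print(buffer.actions_for(12))  # []
print(buffer.actions_for(13))  # ['build inserter']
```

A larger buffer tolerates slower connections but makes every local action feel more delayed; a smaller one feels snappier but stalls the game whenever a message arrives late.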

That is it

So I hope this post did not bore you to death (it was both shorter and longer than expected at the same time). There might be more technical posts in the future if there is demand for them. Next week you can look forward to some of Kovarex's or Slpwnd's wisdom.

... and of course, we are still fixing bugs, fixing multiplayer and designing spaceships, don't worry.

The comment thread is on the forums as usual.