Live Multiplayer Games - P2P & Host Migration, a Technical & Cost Analysis of Backend Infrastructures - Presentation by Michal Buras (Lead Network Engineer at Highwire Games)
The following is the transcript from Michal Buras' presentation at Live Service Game Summit (2024.09.10).
The intended takeaway is a better understanding of the different networking technologies available and the benefits and challenges of different game infrastructures: specifically, peer-to-peer as a technology, its challenges (host migration above all), and how those impact your development resources.
Presentation - Full Transcript
Introductions
Hi everyone,
I'm Michal, and today I wanted to show you the journey we went through to create the right multiplayer solution for Six Days in Fallujah, specifically our analysis of peer-to-peer solutions and how to manage host migration. We will also talk about dedicated server game hosting.
The takeaway you'll have, hopefully, is a better understanding of different technologies available and the benefits along with challenges of different game infrastructure.
Breaking it down, we'll go over peer to peer as a technology and its challenges, specifically host migration, and how that impacts your development resources.
Then we'll look at different dedicated servers hosting infrastructure options, including bare metal, compute virtual machines, and container instances.
About the Speaker
But first of all, a bit about myself. I'm Michal Buras, Lead Networking / Online Engineer for Six Days in Fallujah. The game had a very successful launch and we're continuing to develop it over time.
Personally, I've been in game development for six years, and in software development for ten. I started in VR, but for industry rather than games, and I wanted to make games. So I joined the [Multiplayer Games Group] (MPG), where I did some fascinating work on networking, amongst other things. After that I worked as a generalist for Breach Studios, but I really wanted to do networking, so I joined Highwire Games.
Objective of the Presentation
Why am I here, right?
After the initial launch, my team and I were tasked with adding host migration to Six Days, as it's key to our experience and to keeping the game live. What I realized is that there was a lack of information and analysis [available] comparing the different technologies, namely peer-to-peer versus dedicated hosting.
Also, being curious, I wanted to evaluate whether peer-to-peer [networking] really is the cost-effective solution we keep saying it is in game dev.
Finally, I hope this exploration will help inform others and grow the knowledge available to the game dev community at large.
Peer-To-Peer Networking in Multiplayer Games - The Basics
I realize not everyone in the room might be engineers, so here's a quick overview on the basics of peer to peer and how to connect players over the network.
Peer to peer, as they say, is supposed to be super simple: the game creates a listen server, and with the public IP you connect all the players to it. Unfortunately, the truth isn't as easy as it seems, because your public IP is not really your own.
This means the IP address is shared. Your internet provider groups multiple users under a single IP with a process called network address translation (NAT).
The only difference between them is the port number, which is temporarily assigned to you when you request a webpage or some other resource. For non-engineers, think of the public IP as an apartment block with multiple flats in each building: every time you order a pizza, you're given a unique flat number, and that number changes over time.
And this causes a lot of confusion.
We need to find a mechanism to get the number assigned and share this information somehow with other players.
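The apartment-block analogy above can be sketched in code. This is a minimal, simplified model of a NAT translation table, with hypothetical private and public addresses; real NATs have timeouts, multiple mapping behaviors, and per-destination rules that are omitted here.

```python
# Minimal sketch of a NAT translation table (hypothetical addresses).
# Several home devices share one public IP; the router tells them apart
# only by the temporary public port it assigns to each outbound flow.

import itertools

class Nat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.ports = itertools.count(50000)  # next free public port
        self.table = {}                      # (private_ip, private_port) -> public_port

    def outbound(self, private_ip, private_port):
        """Map an outgoing flow to a shared-public-IP endpoint (the 'flat number')."""
        key = (private_ip, private_port)
        if key not in self.table:
            self.table[key] = next(self.ports)
        return (self.public_ip, self.table[key])

nat = Nat("203.0.113.7")
print(nat.outbound("192.168.1.10", 7777))  # ('203.0.113.7', 50000)
print(nat.outbound("192.168.1.11", 7777))  # same public IP, different port
```

Two players behind the same provider end up with the same public IP but different ports, which is exactly the information we somehow have to discover and share.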
Peer-To-Peer Networking - STUN Servers
This solution is called a STUN server. Think of it as the Yellow Pages: if you want people to be able to find your business, you have to keep the Yellow Pages updated with your current number.
Unfortunately, this solution is not secure, because our flat number can be shared further, and then anyone on the internet can send anything to us. Most service providers consider this insecure and block this kind of traffic.
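For the engineers in the room, the "Yellow Pages" answer a STUN server gives back is your public IP and port, encoded as an XOR-MAPPED-ADDRESS attribute (RFC 5389). A sketch of the IPv4 decoding, with a local round-trip instead of a live server so it is self-contained:

```python
# A sketch of decoding a STUN XOR-MAPPED-ADDRESS value (RFC 5389, IPv4 only).
# This is the attribute a STUN server returns to tell a peer its own
# public IP:port as seen from the internet. Addresses are hypothetical.

MAGIC_COOKIE = 0x2112A442  # fixed STUN magic cookie

def decode_xor_mapped_address(value: bytes):
    family = value[1]
    assert family == 0x01, "IPv4 only in this sketch"
    port = int.from_bytes(value[2:4], "big") ^ (MAGIC_COOKIE >> 16)
    raw_ip = int.from_bytes(value[4:8], "big") ^ MAGIC_COOKIE
    ip = ".".join(str((raw_ip >> s) & 0xFF) for s in (24, 16, 8, 0))
    return ip, port

# Encode an address the way a server would, so we can round-trip locally.
def encode(ip: str, port: int) -> bytes:
    raw_ip = 0
    for part in ip.split("."):
        raw_ip = (raw_ip << 8) | int(part)
    return (bytes([0, 0x01])
            + (port ^ (MAGIC_COOKIE >> 16)).to_bytes(2, "big")
            + (raw_ip ^ MAGIC_COOKIE).to_bytes(4, "big"))

print(decode_xor_mapped_address(encode("203.0.113.7", 50000)))  # ('203.0.113.7', 50000)
```

The XOR with the magic cookie exists precisely because naive NATs rewrite anything that looks like an IP address inside a packet.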
Peer-To-Peer Networking - Relays Servers
The alternative, the only alternative we have, is relay servers. Your private address is not shared with anyone except the server, which relays the information between you and the other player.
These are almost compute-less servers, but because they still require a certain amount of compute power, you'll have to host them, which means you need a service provider.
There are a lot of services available which deliver relay server solutions which we evaluated, but for this presentation I will talk about only three that I find the most interesting.
Relays: Epic Online Service - Overview
First of all, Epic Online Services (EOS) give away free relays, which means zero upfront costs and zero maintenance costs.
This solution requires the least amount of development: the relays are cross-platform, there's a lobby system, and the only thing you're missing for a full game is a matchmaking system.
However, you have absolutely no control over how those players are routed.
A relay server adds an extra hop that every game packet has to pass through, which introduces latency. Given the limited number of relays, players might be routed to distant ones based on availability at the moment. High latency is unacceptable in a commercial game, as it leads to poor player experience and, consequently, churn. This directly translates into frustrated players on Discord and lost revenue.
Relays: Epic Online Service - Vulnerabilities
Epic Online Services can also be vulnerable.
This is an example from a AAA title on Steam. We have a group of players in a lobby, and they can exchange information, which is handled over WebSocket. When you're making a peer-to-peer game, you usually gather the relevant information on the player hosting the game, meaning on their PC, right?
Using a simple application called Fiddler, you can access this lobby information store and expose all the other players' private information in plain text. The app takes literally two clicks to install and could be used by a ten-year-old kid. So someone could basically scrape the data [from your] database, right?
And this is a massive amount of risk for a commercial game to bear. So be careful what you implement using the lobby system.
Relays: WebRTC
[An] alternative peer-to-peer solution which I found interesting is WebRTC relays. I discovered this infrastructure option through People Can Fly’s Outriders, which they covered in an Unreal deep-dive presentation a few years ago.
In short, it's a good implementation because they reuse relay servers, but they had to modify the Unreal code a lot to cope with it. They also had to host the relays themselves. It's custom infrastructure that you need to build, test, optimize, and maintain, which means monthly costs that could have been spent elsewhere.
However, it's unclear if they are really that much better in terms of performance than EOS relays.
Relays: Self-Developed / Self-Hosted
Which brings me to self-developed relay servers. On one of our projects, the idea was that we could make a relay server in Unreal Engine that would serialize packets, and maybe analyze them to prevent cheating. This didn't make sense even at the conceptual level, because of the additional computational latency and cost.
If you're computing, it's not a relay server anymore; it becomes a form of dedicated server. So my recommendation, if you want your own relay tech: just grab a proxy from GitHub and modify it to be runtime configurable. You don't need to implement any kind of packet retransmission, ordering, et cetera, as your game's netcode should handle that.
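To make the "dumb relay" point concrete, here is a sketch of the only logic such a proxy actually needs: pair two endpoints per session and forward payloads untouched. The endpoints and the RelaySession name are illustrative assumptions, and real relay code would sit on top of UDP sockets; this shows just the routing rule.

```python
# Minimal sketch of the forwarding logic of a "dumb" UDP relay: it pairs
# two peer endpoints per session and forwards payloads untouched -- no
# parsing, no ordering, no retransmission (the game's netcode owns those).
# Endpoints here are hypothetical (ip, port) tuples.

class RelaySession:
    def __init__(self, peer_a, peer_b):
        self.route = {peer_a: peer_b, peer_b: peer_a}

    def forward(self, src, payload):
        """Return (destination, payload); the packet body is never touched."""
        dst = self.route.get(src)
        if dst is None:
            return None  # unknown sender: drop, never reflect to the internet
        return dst, payload

session = RelaySession(("198.51.100.4", 7777), ("203.0.113.7", 50000))
print(session.forward(("198.51.100.4", 7777), b"input-frame"))
# (('203.0.113.7', 50000), b'input-frame')
```

The moment you start inspecting or transforming the payload, you've crossed into dedicated-server territory, with the compute cost that implies.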
But if you take a step back right now and think about all of this. At this point, you're not making a game, you're building infrastructure. Burning money on stuff that isn't helping your game grow with content, better tech, or features.
What's the point?
Host Migration - The Fundamental Challenge of Relays
Now, all of these services have a fundamental challenge.
You need a mechanism called host migration, because if the player who is hosting the game exits, everyone gets disconnected.
It's a fundamental issue you need to figure out because it literally kills the game session end to end. A massive problem for any commercial game where multiplayer is the key.
We had a massive problem in Six Days with people just randomly quitting, because ours is a difficult game. You get one shot, one kill, right? And maybe they aren't aware that they're the host, or maybe they don't care. So it was a party stopper.
[In] Six Days I implemented basic host migration using only the Unreal Engine net driver, and it worked well in its very basic form. I don't want to get too much into detail here since this is for Unreal geeks, so you can ask me in a private chat afterwards if you're curious.
But the whole project required a full refactor for all the features to support it. Every small feature, every designer-made Blueprint, had to be reworked, and there is no silver bullet for implementing host migration unless your engine natively supports it, and even then it comes with a network bandwidth cost.
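One small piece of the problem can at least be sketched: deciding who the new host is. A common approach (and only an illustration, not how Six Days does it) is a deterministic election that every client computes identically over the same replicated player list, so no extra negotiation round-trip is needed. The Player fields and tie-break rule below are assumptions.

```python
# A sketch of one piece of host migration: deterministic host election.
# Every client runs the same function over the same replicated player
# list, so all clients agree on the new host without extra negotiation.
# Fields and the tie-break rule are illustrative, not Six Days' actual code.

from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    player_id: int
    avg_ping_ms: float

def elect_new_host(players, old_host_id):
    candidates = [p for p in players if p.player_id != old_host_id]
    if not candidates:
        return None  # session is over: the last player was the host
    # Lowest ping wins; player_id breaks ties so the result is deterministic.
    return min(candidates, key=lambda p: (p.avg_ping_ms, p.player_id))

lobby = [Player(1, 80.0), Player(2, 35.0), Player(3, 35.0)]
print(elect_new_host(lobby, old_host_id=2))  # Player(player_id=3, avg_ping_ms=35.0)
```

Electing the host is the easy part; the expensive part, as described above, is making every gameplay feature survive the handover of authoritative state.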
There was too much of a project management risk for too little to gain in comparison to other alternatives.
The takeaway being: while the upfront cost of relays was cheap, the host migration solution you have to build on top becomes extremely expensive. Not so cheap after all.
All this investigation of different technologies and services led us to realize that with the "cheap" solution that was peer-to-peer, we'd need to build an entire host migration system, which still wouldn't be seamless and would give a poor user experience, and then we'd need to maintain it.
So either you hire additional staff or you move your project release date. In both cases, you burn money, all because you're increasing the complexity of literally every part of the game. Our evaluation was that we would need one additional tester and one developer assigned purely to reworking all the features to proceed with this.
That would cost us around $15,000 monthly, and it would only grow later. I want you to remember this question: what else can we get for $15K? What else can we get for those development resources?
Dedicated Servers - Infrastructure & Cost Breakdown
The next step was to evaluate other options, namely dedicated servers. Let's compare each option and do a cost breakdown.
Before I go further, let's take a look at this graph. On the left side you can see Six Days’ game sessions over time. This is real historical data over a 24-hour period.
How to read this: for example, the red plot is North America East, and at UTC 0 it has a peak of thirty-three games. That means thirty-three Six Days servers running in parallel.
Since our game requires two vCPUs, or logical threads, per server, sustaining this region would require sixty-six logical CPUs. On the right side, you have the monthly cost of such a game distribution. And this is only one region, so let's multiply this a few times and see how many games we could fit on the graph if we exchanged the development resources for infrastructure.
And the assumed budget was $15K.
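The capacity math from the graph is simple enough to reproduce. The peak session count and vCPUs per server come from the talk; the hourly vCPU price below is a made-up illustrative rate, not a quoted AWS or vendor figure.

```python
# Reproducing the back-of-the-envelope capacity math from the talk:
# 33 parallel game sessions at peak x 2 vCPUs per server = 66 logical
# CPUs for one region, billed around the clock. The vCPU price is a
# hypothetical assumption for illustration only.

SESSIONS_AT_PEAK = 33
VCPUS_PER_SERVER = 2
HOURS_PER_MONTH = 730
PRICE_PER_VCPU_HOUR = 0.04  # hypothetical on-demand rate, USD

vcpus_needed = SESSIONS_AT_PEAK * VCPUS_PER_SERVER
monthly_cost = vcpus_needed * HOURS_PER_MONTH * PRICE_PER_VCPU_HOUR

print(vcpus_needed)            # 66
print(round(monthly_cost, 2))  # 1927.2
```

Multiply that by each region's own peak (plus the idle headroom discussed later) and you can see how quickly a $15K budget gets consumed.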
Public Cloud (AWS)
First the most obvious, everyone goes to AWS.
Let's see what we can get on the most recognizable public cloud. I have calculated the cost of the peak for each region separately to make this as accurate as possible.
[Six Days has] procedural map generation and very sophisticated AI, so our game requires a bit more steam, but that still gives 2,200 daily player game sessions. Not bad, right? And if our game were Valorant-like, or Counter-Strike, it would be almost 10K player sessions.
Bare Metal
Good old bare metal.
I know some of you might say that bare metal has better performance and could fit 20 percent more players. In the case of Six Days, even my gaming laptop can't run the server on a single logical thread, so the calculation, unfortunately, remains the same.
Maybe something simpler, like Valorant, could do 8K, like on Amazon.
Anyway, in our case it wasn't an option. I should also note that there may be some more flexible bare metal options out there where you could fit a few more players, but I couldn't cover them in this presentation.
Container Instances - Overview
So I started Googling container instances, as I was wondering if there was any progress in the containerized game server tech, and I found those guys.
Since container instances are fully managed, they don't require any engineering costs: I just upload the game image, and I don't care about regions, overhead, or any of the configuration I had to do with the other infrastructure technologies.
That is how I discovered Edgegap. They were the first to develop the tech in such a way that it was usable for multiplayer games.
Container Instances - Image Caching
Companies like Azure offer container instances too. So why did I land on them? The idea didn't come from thin air.
The first time I was doing infrastructure for a VR company, I was looking for alternatives to scaling cloud compute. I found container instances in Azure, but they were super, super slow to deploy.
That's because Azure doesn't do image caching, which Edgegap does.
I was very happy to find that someone had finally implemented the missing piece for container instances. This lets us handle twice as many players as on public cloud with the same budget. And again, if this game were Valorant-like, $15K USD would cover the monthly server infrastructure budget to host the whole game.
Container Instances - Distributed Orchestration
Another cool thing to note here is that with containers, we are not bound to any specific data center.
For example, we can host one game a day for our friend in Cape Town without managing a local Kubernetes cluster that would sit unused the rest of the time. This makes our game more inclusive.
Usually when you make a game, you go for the lowest-hanging fruit in terms of game regions, right? But if you add up all those small locations where people are unable to play because of high latency to the data center, those small berries become a really good meal.
Since there's a lot of talk here about game monetization, you can think of this as an opportunity to open up new markets that were unavailable before.
Public Cloud - Wasted Capacity vs Container Instances
Why are those container instances so good?
Let's have a look at this chart I made. The green line is non-averaged real player data. As you can see, it has spikes and fluctuations and is very hard to predict.
When we use public cloud solutions, increasing and decreasing the available server pool is very slow, so we need to keep an overhead of empty, unused game servers to accommodate those players and any potential deviations.
So, in public cloud, the whole area below the red line is what you pay for, while the area below the green line is what you should be paying for. The red space in between is where the money escapes.
And that's only for a single region; it scales and multiplies by the number of regions you have.
Container instances solve these issues because they are super fast to deploy, thanks to the image caching I mentioned previously. There is no delay in starting or shutting down, so we can precisely adjust to demand in real time.
As a result, the blue line aligns almost perfectly with the player traffic, and you end up only paying for what you use.
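The "money escaping" in the red area can be quantified with a toy calculation. The hourly demand samples and the 25 percent headroom below are made-up illustrative numbers, not Six Days data; the point is the shape of the math, not the values.

```python
# A sketch of the "red area" from the chart: with slow-scaling public
# cloud you provision a fixed pool sized to peak demand plus headroom,
# so the gap between what you pay for (provisioned capacity) and what
# players actually use is wasted spend. All numbers are illustrative.

demand = [10, 12, 30, 33, 28, 14, 9, 8]  # concurrent sessions per hour
headroom = 1.25                          # 25% overhead for spikes
provisioned = max(demand) * headroom     # fixed pool sized to peak

paid_for = provisioned * len(demand)     # area under the red line
actually_used = sum(demand)              # area under the green line
wasted = paid_for - actually_used

print(round(wasted / paid_for, 3))  # 0.564 -- over half the bill is waste
```

With per-session deployment tracking the green line instead, `paid_for` collapses toward `actually_used`, which is the whole argument for fast-starting container instances.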
Final Comparison
All of this brings me to the last slide of the presentation, on how we manage networking infrastructure in Six Days. We found that when you're playing in a private party with your friends, peer-to-peer without host migration is sufficient, and that is 40 percent of our traffic.
Conclusion
When I was a kid, a lot of games were peer-to-peer, like Counter-Strike with its listen servers. And we never had issues with the host leaving, because we knew each other and didn't want to break the fun. Unless, of course, Mom came into the room and told us to quit the game and do homework, because playing games won't get you a science degree.
Mom was partially right. But for matchmade games with random people, peer-to-peer mode was a super disaster, especially for a difficult game like ours.
So we use dedicated servers on Edgegap for matchmade games, so your player experience doesn't depend on some random guy rage quitting.
So, I recommend with all my heart going hybrid [with Edgegap] in your game if you want real quality along with quantity.
That's it. Thank you for your time and don't hesitate to add me on LinkedIn. Thank you, guys!
Sources and/or content collaboration with
Michal Buras, Lead Networking Engineer at Highwire Games