Creating a third-person zombie shooter with DOTS - Unite Copenhagen

Video Statistics and Information

Captions
Hello everyone. I will start off by saying that I have never done anything even resembling this before, so I'm quite nervous, but I'm also very excited that all of you are here today, because at Far North we are super hyped about DOTS, and obviously we are not alone. Welcome to this talk: creating a third-person zombie horde shooter using DOTS. My name is Simon and I am the game director at Far North Entertainment, a small game development studio from Gothenburg in Sweden. For the past year we have been developing a post-apocalyptic, cyberpunk-themed zombie survival game, which is not just about the core survival but also about facing off against hordes of thousands and thousands of enemies. In order to realize this vision, we had to use DOTS in our game.

We have two goals with this presentation. The first is that someone who might never have used DOTS before will come away with a good general understanding of DOTS, its general concepts, and why it is so powerful. The second goal is to show you a concrete example of how we took DOTS and implemented it in our product. The presentation is divided into three parts. In the first part I will show you some gameplay footage so you get better context for what we are doing. Then we will go into the more technical part, where I will first talk a little about the problems that we faced and then how we solved those problems using DOTS. Finally, I will wrap up the talk in a quick conclusion section.

So, Project F and Z, because we don't have a name for the game yet. Let's take a quick look at some gameplay. [Music]

Okay, so our game is a survival game at its core. It features a lot of the classic stuff you would see in a survival game: you have to fix food, you have to stay warm in a shelter, and so on. You start off small and you craft simple weapons like rifles, pistols and other guns, and then as you progress through the game and get access to more exclusive loot, you will have to build stuff
like semi-automatic rocket launchers, tanks, armored vehicles, etc. The world is inhabited by massive hordes of zombies; in this case, thousands and thousands of them will try to kill you. We really want this to be a good cooperative experience, so we have also focused a lot on the multiplayer. We developed a demo last year and tested it on a bunch of players, and what we found out quite early on was that these horde mechanics can turn into pretty interesting situations if the players do not cooperate, plan and communicate correctly. Our goal is for the players to still feel immensely powerful. We want them to be able to slaughter hundreds and hundreds of zombies in a matter of seconds, but sometimes that still won't be enough. Sometimes, when you are looting an urban area, for example, and you trigger one of these hordes, your only goal will be to just get out of there alive so that you can get your loot back to your base and progress in the game.

All of this takes place in a procedurally generated world. The game is tile based, so the whole world is built from tiles, which can have objects standing on them and walls surrounding them. The players are spawned randomly into this world and are left to their own devices to explore, find stuff, loot urban areas and sites, and then upgrade their base.

Everything in the game is also moddable. We have exposed all of the content in the game in XML files that the players have access to, and this is not just an extra layer that we have added on top of the game; it's something that we use ourselves to create the content. So, for example, items in the game, all of the objects you see standing around in the world, weapons, vehicles, everything will be moddable, and also the algorithms defining how the biomes are generated in the world, etc.

So let's head into the actual DOTS part of this session. In order to better understand what problems we faced, I have to briefly talk about the architecture of our game engine. So basically
we use what is called an MVC pattern, a model-view-controller pattern. What this essentially means is that the view is completely separated from the controller. You can also see the view as the game client and the controller as the game server. So in our case, everything that happens in the game (something explodes, projectiles are fired, things collide, the pathfinding for the zombies, for example) happens on the controller thread. The view is then updated by the server over the network with a subset of the game model, and this is just the subset that the view really needs in order to accurately represent to the player what happens in the game.

One of the things that we had problems with in the beginning was of course the pathfinding and collision avoidance of all of these thousands of zombies, because each zombie is actually one individual entity. There is no abstraction in our game of a zombie horde; it's the algorithms for the individual pathfinding and collision avoidance that make them act in a horde-like manner. With the controller thread, we can actually run the server at just 5 ticks per second, and the view then interpolates all of the data so that it flows smoothly for the players. The important thing to note here is that it was in the view that we had the major bottlenecks before we started using DOTS, and it's where DOTS has really shined for us. This was specifically in two particular implementations, which I will get to later.

So first off, before we actually headed into DOTS, we had to think about whether it makes sense for us in our project. And actually, in our view, the entities that we simulate, all of these thousands of zombies, are really simple, so they are quite easy to model in a data-oriented manner. It fits our problem description very well. We decided to do this for two specific implementations, which I will talk about here today. The first one is the world objects, and the world objects are simply the static objects that stand
around in the world. It can be trees, it can be rocks, it's all of the walls in the game, etc. If we could transform these world objects into ECS first and that worked very well, we decided that we would go on and look at the enemies as well.

So when you think in a data-oriented manner, what we have done is take a look at an object, in this case a tree, and try to find out what the minimal required data is that we need in order to represent this object in our game. For the trees in our game, the data looks something like this. We have two floats for the position, an x and a y component, and also one float for the rotation, which is only around the z axis, because in essence our game is a 2D game with 3D graphics. Then we need one float for the scale, and we need some 3D data, because of course we want to be able to see the trees in the game.

When we have made up this definition of the minimal data we need, we can turn it into what is called an archetype. An archetype is basically the ECS class, you could almost say. You use it to define what the tree is, and when we instantiate the tree, we will do that using the tree archetype. In this case you can see we have two sets of components. The first is the unique components, and here that is the LocalToWorld component. This is a component that is shipped with DOTS, and it basically replaces the Transform you would have on a MonoBehaviour or a GameObject. It's a unique component because obviously all trees need their own position, their own rotation and their own scale. We can also have shared components, and in this case that is the RenderMesh. The RenderMesh basically contains the 3D model, the material and that kind of data, and since in this example we assume that all trees have the same model and the same material, they will also have the same RenderMesh, so they all share a reference to this one.

I will show you some code examples today, but I will really just
blast through them because I won't have time to go into the details. I should also mention that the details are not the important part in this presentation; it's more the general concept. But in this example we can see that we create the archetype by basically specifying what components a tree has, what data it contains.

When we instantiate trees using this archetype, the system will create what is called an archetype chunk. An archetype chunk is basically a pre-allocated space in memory where we will store trees. So even if we just instantiate, let's say, seven trees in this example, the system will create an archetype chunk in memory which can store, for example, a thousand trees, and just leave the rest of the spaces blank. An archetype chunk basically contains a set of arrays which hold all of the data for the trees in our game. So here you can see, for example, that we have one array with the LocalToWorld component, which means that each tree is represented by an index in that array. We can also see that there is an entity array, and the entity array basically contains a unique ID for each entity in our game and some metadata for deciding whether the tree is living or whether the space in the archetype chunk is free. So as you can see here, when we instantiate these seven trees, each is just an index into one set of those arrays. The third tree will be index two in the entity array, the LocalToWorld array, etc.

Creating an entity can look something like this. In this example, we first just fetch the model and the material from a prefab and set that in the shared component. We also decide what position, rotation and scale the tree should have, and then we set the components of the instantiated tree. At the bottom here we do the set component for the unique data and the set shared component for the RenderMesh. This is what it can look like, quite simplified.

When we want to instantiate a lot of objects, so in this example we can say that the
players have moved around in the world and they have triggered the loading of three new map chunks. In this case, these new map chunks have 500 new trees standing on them. When we instantiate those, what the system will do is find the first archetype chunk in our game which has free space available and then simply assign the new tree values to those two arrays I talked about earlier, the entity array and the LocalToWorld array. This is of course far cheaper than the traditional instantiation of GameObjects in traditional Unity programming. It's basically equivalent to assigning a couple of values into a couple of arrays, and you have a new tree standing in your world.

Destroying a tree is almost as simple. Let's say that the player has just shot down a tree in order to gather some wood. This tree happens to be somewhere in the beginning of the second archetype chunk, which we just added trees to. It's important that the free space is preserved at the end of the archetype chunk arrays, so what the system will do is find the last living tree in the same archetype chunk as the tree we want to destroy. Then it will simply swap the positions of those trees and flag the tree we want to destroy as dead, effectively preserving the free space at the end of the archetype chunk and keeping the important data that we actually use tight and compact.

So to recap: when using ECS and DOTS, we want to think about objects as data, the minimal amount of data required in order to represent them. When we have this definition, we can create what is called an archetype, and the archetype is basically almost a class. It's a definition of what an object is, so that we can use it to instantiate objects. The archetype chunks are basically a set of arrays which contain the data for our objects in memory. Instantiating an object is basically equivalent to just assigning a couple of values to these arrays in these archetype chunks, and destroying a tree
is equivalent to swapping the position of two objects in these arrays and flagging one as dead. This completely eradicated all of the problems we had previously with lag spikes. Whenever the players moved around in the world, you would get these massive lag spikes, because we had to instantiate thousands of objects and destroy thousands of objects in one go.

So that's pretty good, we thought. Let's go ahead and try to do the same thing for the zombies. If we can simply get ECS to work for zombies, we will not have to worry about spawning them and destroying them in large quantities. I thought it would be a good idea to do this part as a table. Here we can see the pre-DOTS test: we would have 2,000 enemies, and it would take about nine milliseconds to update them each frame, which is of course a huge chunk of the 16-millisecond time budget we have. Then we will look at the ECS implementation, then ECS plus the job system, and finally also inject the Burst compiler, and for each step we will see what the performance gains were.

As with the world objects, when we create the enemies we also have to define the minimum data we need in order to represent an enemy in our game. In our case we need a position and a rotation, just like the world objects: a position-rotation component, three floats. Then we also need a target position, because, as I mentioned earlier, we use an MVC pattern and the actual position of the zombies is on the controller thread, on the server. This means that the view has to be updated with this actual position and then linearly interpolates the positions towards this target position. We also need 3D data, a model and a material. And then there's a fourth thing, called an enemy tag. We've got two different enemy types in our game right now, small enemies and big enemies, and the only thing that differentiates their archetypes is this enemy tag. It's basically an empty component: our small
enemies have a small enemy component, the big enemies have a big enemy component, and the only reason it's there really is for us to be able to distinguish between these two kinds of entities.

The most important thing, however, that the zombies have and the trees do not is the system, and this is basically what replaces the Update methods of MonoBehaviours in ECS. As you can see, we have also marked the animation data and animation systems in orange, because I won't have time to talk about those today. But there is a great example project you can look up online, made by Nordeus and Unity in collaboration, featuring a very efficient way to do animation instancing with ECS. I would really check that out if you're interested in making your animations in ECS more efficient.

In traditional Unity, when we updated zombies before DOTS, it could look something like this. The main thread would loop over a big list of perhaps 2,000 zombies. It would access each zombie and then call an Update method in one of the MonoBehaviours, the zombie script, so to say. What this effectively means is that each zombie moves itself around in the world from within. The biggest difference between this and the ECS system is that the ECS system is completely separated from the entities themselves.

It can look something like this. To the left here we have all of the archetype chunks in our game right now. We have a couple of archetype chunks containing trees, there are some rocks, and then there are three archetype chunks containing enemies. In the OnUpdate function of the system we have the code that takes the zombie's position and rotation and linearly interpolates it towards the target position and rotation that it has on the server. In order for the system to know which entities it will execute this logic on, we have to construct what is called a query, and a query is basically a description of the entities you want to fetch. In this case we have decided that we
want to fetch all of the entities in the game right now which have both a position and a target position component. This will of course filter out all of the world objects, because they do not have a target position, but it will fetch all of the enemies, because we do not care about the enemy tag and things like that. When we have decided what enemies to fetch, the system will fetch those using the query, put them into the system and execute the logic that we have specified. The query can look something like this. In this example we specify that we want all the archetype chunks containing entities which have all of these components: a transform component and a target position component.

That the system is not part of the entities themselves is not even the main point. What makes ECS so powerful is that it is designed with the hardware in mind, and in order to understand why, I will briefly explain how the cache works in the CPU. This is a very simplified sketch of the cache memories, the RAM and the CPU. Closest to each CPU core we have what is called a level 1 cache, and that's the smallest but fastest memory, from which we can fetch data whenever the core needs to work with it. In this case we fetch the position and rotation for one zombie, for example. If that position-rotation component is within the level 1 cache when we fetch it, it will be an extremely fast operation. However, if it happens to reside in the level 2 cache, that might be about five times slower. If it resides in the level 3 cache, it might be about twenty times slower. And if we're really unlucky, we have to go all the way back to the random access memory, and that can be over a hundred times slower. The important part here is not the details. The important part is to understand that if we can use the level 1 cache as much as possible, our application might potentially run a hundred times faster. And in order to understand why ECS in DOTS is so effective,
we have to look at the cache lines. Whenever the CPU wants to fetch data that it has to work with, it will do this in what is called a cache line, and a cache line is usually 64 bytes. This is what it would look like with a GameObject, before we started using DOTS. When we fetch Transform data for the position and the rotation, we get a Vector3 for the position, which has three floats, and a quaternion for the rotation, which has four floats, which in total is 28 bytes. As you can see here, if we loop over a thousand enemies, we load a thousand cache lines into the CPU cache, our most valuable memory, and more than half of it would be trash data that we don't need. This is because in the Transform class we have the position and rotation declared first, and then a bunch of declarations for variables that we don't really need. In fact, it's even worse than this, because according to our minimal definition of the data that we need, we only need three floats for both the position and the rotation. That means that when we loop over five thousand zombies, for example, we basically flush the entire CPU cache, and 80% of it is trash data.

If we look at the archetype chunk memory instead, we can see that since all of the data is placed in arrays, it will align perfectly in memory. After the first position-rotation component there will be another position-rotation component, then another, etc. As we can see here, when we fetch a cache line using ECS, when we fetch the first position-rotation we get four more position-rotations loaded into the level 1 cache for free. This means that when we continue looping over these arrays, fetching the data will be much faster.

The CPU can also engage in what's called prefetching. When you loop over a thousand enemies and have done the first five, it's not hard to predict what happens next: you will have five more enemies and five more
enemies and even five more. So the CPU will start to recognize this pattern and start fetching two cache lines instead of one each time, which would of course be impossible in the top example here of the GameObject, where we have a little valuable information at the beginning of the memory and then just a bunch of stuff that we don't really need.

So if we compare these results: before we used DOTS, each time the CPU fetched a position and a rotation from a zombie we would get about 12 useful bytes out of 64 possible. When we use ECS and prefetching kicks in, every time the CPU fetches a position and a rotation we get a hundred and twenty useful bytes per fetch. Just based on this very simplified theory, our game should be able to do this ten times faster now, and if we look at the result, it's actually about nine times faster than what we had before. So now we can have two thousand enemies on the screen at once, and we can update them in one millisecond, which is of course really good.

But even if we could stop here and be pretty satisfied with our goals, we still wanted to look at the job system and the Burst compiler and see how far DOTS can really push this. So, the job system. A job system is very much like an ordinary system. Here we have an example. To the left we have all of the archetype chunks in our game right now. The players might have played for a while, and they have triggered a horde encounter, so when we do our query to fetch all the enemies we get sixteen archetype chunks. What the job system does is take this full workload of all of these enemies, divide it into smaller jobs and then distribute them across all of the CPU cores. It can look something like this. In this case it created sixteen jobs, one for each archetype chunk, and created queues for all four cores with four jobs each. Now if, for example, the operating system has kept one of the cores busy during
this time, it might not be able to do the work as fast as the other cores that had nothing else to do. So what the system can do is engage in what's called work stealing, which is basically letting cores that have finished their jobs faster than other cores steal jobs from the other cores. In theory, this means that all the cores will work at maximum capacity until the full update of all the zombies is done.

This is what it looks like in code. Basically, we do it in two steps. The first one is scheduling. In this example we have defined which arrays from the archetype chunk we want to use when we update the positions of the zombies, and we also prepare some other data, for example Time.deltaTime, some constant values for speeds, etc. Then we basically tell the system: for this group of archetype chunks, schedule these jobs and distribute them over the cores. And this is the Execute function. This is basically the code that we run for each of these jobs, and instead of being for the whole lot of zombies, this time it is on a chunk-by-chunk basis. So this code will loop over the arrays in one archetype chunk and update the positions of our enemies.

The job system improves the performance by an additional factor of five, so compared to before DOTS it is now forty-five times faster for us to simulate enemies in our game, which at this point gives us two thousand zombies in 0.2 milliseconds. That's also really good, but the thing is that we can also inject the Burst compiler. What's so interesting about the Burst compiler is that if you have done your ECS and your job system correctly, it's extremely easy to get it to work. The Burst compiler is the closest thing I've been to magic since I started coding 15 years ago. I have a very limited understanding of exactly how it works and what it does, but in general it allows us to tell Unity to compile certain code snippets in
to extremely optimized machine code. Just to give an example of this, here we have a classic assembly example of adding two floats together. It will take one float and another float, add them together, and you get the result. With the Burst compiler we get access to what are called single instruction, multiple data (SIMD) instructions, and they look something like this: in one go, one can take eight floats, add them to eight other floats and get eight results. Of course, if all of these floats had to be fetched from the random access memory, for example, the gain you would get from this kind of instruction is not that good, but since we have done our work in ECS and most of our data is in the level 1 cache when we fetch it, this becomes extremely fast.

My favorite thing about the Burst compiler is how we implement it. Basically, all we have to do is declare to Unity that we want to compile this particular code using the Burst compiler, and it looks something like this. It's basically just an injection.

And what about performance gains? When our CTO, Anders, showed me this the first time, it completely blew my mind, because the Burst compiler takes this 0.2 milliseconds and turns it into this. We couldn't use 2,000 enemies anymore to make a good measurement of the time it took, so we had to bump it up to 20,000. And with 20,000 enemies we ended up at 0.04 milliseconds, which is a performance gain of 2,250 times compared to before we started using DOTS, which is just insane.

So to recap: archetype chunks store data in arrays, and what this allows is that when the systems loop over these arrays in order to execute logic on our entities, most of the data will reside in the level 1 cache in the CPU, which is far faster than if we did it in an object-oriented manner, which would lead to more level 3 cache hits and random access memory fetches. The job system makes sure that all of these updates are executed on all of the
CPU cores and hyperthreads that we have available in our hardware, and in theory it should keep the processor at its maximum capacity throughout the whole job execution. The Burst compiler further improves this by compiling our code into extremely optimized machine code, and it also targets the CPU architecture of the machine we are running on at the moment. So if you run on different machines, the Burst compiler will produce different results, and it will make sure that you have access to all of the instructions available on that particular CPU.

So let's take a look at a stress test. In the video that you will see here, I'm not sure about the exact number of zombies, but it's about five to seven thousand, and in general we run at a fairly stable 60 frames per second, which was our goal. There are still some problems. You might notice a couple of frame drops here and there, but that's mostly due to garbage collection spikes and other things that we have to work on going forward. The 60 frames per second goal can be said to be pretty much met. If we had run this before we implemented DOTS, it would have completely thrashed even a fairly high-end computer.

So, are GameObjects dead? This was one of the questions we asked ourselves before we started using DOTS: do we have to rewrite our entire code now and make it into DOTS code? And that's simply not the case. A lot of the stuff in our game is still GameObjects. The player is a GameObject, the vehicles are GameObjects, even world objects that have light sources or particle systems attached to them are GameObjects, and a DOTS hybrid is viable. We have implemented an infrastructure in our game that allows us to create objects that are either ECS entities or GameObjects without even thinking about it. So when our content creators and artists want to create entities, they don't have to think about DOTS. They can basically just put in their 3D mesh, create an entity in the XML I showed you earlier, and there's a boolean
there that you set to true or false depending on whether it's an entity or a GameObject.

In general, we really want to encourage everyone to think in a more data-oriented manner: not to think about objects as human intuition perhaps would, where an object is a set of components that all have their own logic executing, etc., but to actually design our code with the hardware in mind. So basically I would really encourage all of you to try DOTS if you've been thinking about it and have been a bit hesitant, because if you are successful, almost all of your performance issues could quite literally be gone.

And that's pretty much it for this presentation. If you have any questions or want to discuss this, the best way to do that is to communicate directly with us on our Discord. You can find us on Twitter, and there you will find a link to our Discord server. You can also just send me a mail and I will try to answer the questions as well as I can, or I will simply forward them to our CTO, who will happily help you with that. Thank you. [Applause] [Music]
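
The archetype chunk described in the talk (parallel arrays, instantiation as a write into the first free slot, destruction by swapping with the last living entry) can be sketched in plain C# without Unity. This is only an illustration under assumptions: `TreeData` and `TreeChunk` are hypothetical names, not Unity types, and the sketch shrinks a live count instead of writing a "dead" flag as the speaker describes.

```csharp
using System;

// Hypothetical minimal tree data, following the talk: x, y, one rotation, one scale.
public struct TreeData
{
    public float X, Y;
    public float Rotation;
    public float Scale;
}

// Hypothetical stand-in for one archetype chunk; not Unity's real implementation.
public sealed class TreeChunk
{
    private readonly int[] _entityIds;  // parallel array: one id per slot
    private readonly TreeData[] _data;  // parallel array: one component per slot

    // Living trees occupy slots [0, Count); free space is always at the end.
    public int Count { get; private set; }
    public int Capacity => _data.Length;

    public TreeChunk(int capacity)
    {
        _entityIds = new int[capacity];
        _data = new TreeData[capacity];
    }

    // Instantiating is just assigning values at the first free index.
    public int Add(int entityId, TreeData data)
    {
        if (Count == Capacity) throw new InvalidOperationException("chunk is full");
        _entityIds[Count] = entityId;
        _data[Count] = data;
        return Count++;
    }

    // Destroying moves the last living tree into the hole so the data stays compact.
    public void RemoveAt(int index)
    {
        if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
        int last = Count - 1;
        _entityIds[index] = _entityIds[last];
        _data[index] = _data[last];
        Count = last;
    }

    public int EntityIdAt(int index) => _entityIds[index];
    public TreeData DataAt(int index) => _data[index];
}
```

With a capacity of 1,000 and seven trees added, removing the tree at index 2 copies the seventh tree into that slot and leaves six living trees packed at the front, so neither operation allocates or shifts the rest of the array.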
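
The query and system idea from the talk (fetch every chunk that has both a position and a target position, ignore the enemy tags, then interpolate) can be illustrated like this. It is a simplified sketch, not Unity's EntityQuery API: `ComponentMask`, `DemoChunk` and `InterpolationSystem` are hypothetical names, and positions are reduced to a single float per entity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bit flags describing which components an archetype has.
[Flags]
public enum ComponentMask
{
    None = 0,
    Position = 1,
    TargetPosition = 2,
    RenderData = 4,
    SmallEnemyTag = 8,
    BigEnemyTag = 16,
}

// Hypothetical chunk: one array per component, indexed by entity slot.
public sealed class DemoChunk
{
    public ComponentMask Components;
    public float[] X = Array.Empty<float>();
    public float[] TargetX = Array.Empty<float>();
    public int Count;
}

public static class InterpolationSystem
{
    // A query is a description of required components; tags are simply not asked for.
    public static List<DemoChunk> Query(IEnumerable<DemoChunk> chunks, ComponentMask required)
    {
        var result = new List<DemoChunk>();
        foreach (var chunk in chunks)
        {
            if ((chunk.Components & required) == required) result.Add(chunk);
        }
        return result;
    }

    // The system lives outside the entities and moves each position toward its target.
    public static void Update(IEnumerable<DemoChunk> chunks, float t)
    {
        var required = ComponentMask.Position | ComponentMask.TargetPosition;
        foreach (var chunk in Query(chunks, required))
        {
            for (int i = 0; i < chunk.Count; i++)
            {
                chunk.X[i] += (chunk.TargetX[i] - chunk.X[i]) * t;
            }
        }
    }
}
```

A tree chunk carries `Position | RenderData` only, so the query skips it, while small and big enemy chunks both match because the tag bits are not part of the requirement.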
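
The cache-line figures quoted in the talk follow from simple arithmetic, reproduced below as a sketch. The assumptions are the ones stated by the speaker: 64-byte cache lines, 4-byte floats, three floats of useful data per enemy, and two lines fetched at a time once prefetching applies. `CacheLineMath` is a hypothetical helper name.

```csharp
public static class CacheLineMath
{
    public const int CacheLineBytes = 64;
    public const int FloatBytes = 4;

    // Vector3 position (3 floats) plus quaternion rotation (4 floats).
    public static int TransformBytes => (3 + 4) * FloatBytes;

    // The talk's minimal position-rotation data: three floats.
    public static int MinimalBytes => 3 * FloatBytes;

    // Whole minimal components that fit in one cache line when packed in an array.
    public static int ComponentsPerLine => CacheLineBytes / MinimalBytes;

    // Useful bytes obtained when `lines` packed cache lines are fetched together.
    public static int UsefulBytesPerFetch(int lines) => lines * ComponentsPerLine * MinimalBytes;

    // Share of a single cache line that is not useful data.
    public static double WastedFraction(int usefulBytes) => 1.0 - (double)usefulBytes / CacheLineBytes;
}
```

This gives 28 bytes for position plus quaternion, 5 packed components per line (one requested plus four "for free"), 120 useful bytes per two-line fetch, and a wasted share of 0.8125 when only 12 of 64 bytes are used, which matches the roughly 80% and the tenfold estimate given in the talk.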
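
The "one job per archetype chunk, spread over the cores" idea can be approximated outside Unity with the .NET thread pool. Note the substitution: this sketch uses `Parallel.For` from `System.Threading.Tasks`, not Unity's C# Job System, and `ChunkJobs` is a hypothetical name. Each chunk is a separate pair of arrays, so the per-chunk work never touches another chunk's data.

```csharp
using System.Threading.Tasks;

public static class ChunkJobs
{
    // Per-chunk work, analogous to the Execute function the talk describes:
    // loop over the arrays of a single chunk and move each value toward its target.
    public static void Execute(float[] x, float[] targetX, float t)
    {
        for (int i = 0; i < x.Length; i++)
        {
            x[i] += (targetX[i] - x[i]) * t;
        }
    }

    // Scheduling step: one unit of work per chunk. Parallel.For distributes the
    // units over worker threads and returns once every chunk has been processed.
    public static void ScheduleAll(float[][] xChunks, float[][] targetChunks, float t)
    {
        Parallel.For(0, xChunks.Length, c => Execute(xChunks[c], targetChunks[c], t));
    }
}
```

Splitting by chunk keeps each worker on contiguous memory, which is the same reason the talk gives for the chunk-by-chunk Execute function.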
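
The speedup factors can be checked from the timings quoted in the talk (9 ms, 1 ms and 0.2 ms for 2,000 enemies, then 0.04 ms for 20,000 enemies) by comparing time per enemy. `SpeedupMath` is a hypothetical helper; the numbers are the speaker's, not new measurements.

```csharp
public static class SpeedupMath
{
    // Frame time divided by the number of enemies updated in that time.
    public static double MsPerEnemy(double frameMs, int enemies) => frameMs / enemies;

    // How many times cheaper one enemy is in the new setup than in the baseline.
    public static double Speedup(double baseMs, int baseEnemies, double newMs, int newEnemies)
        => MsPerEnemy(baseMs, baseEnemies) / MsPerEnemy(newMs, newEnemies);
}
```

Against the 9 ms baseline this works out to a factor of 9 for ECS alone, 45 with the job system, and 2,250 with Burst once the tenfold larger enemy count is taken into account.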
Info
Channel: Unity
Views: 38,355
Keywords: Unity3d, Unity, Unity Technologies, Games, Game Development, Game Dev, Game Engine, DOTS, Entity Component System, Burst Compiler, C#, C# Job System
Id: yTGhg905SCs
Length: 32min 35sec (1955 seconds)
Published: Mon Oct 07 2019