Unleash the power of Unity DOTS

Captions
Hello everyone, welcome to XR Bootcamp's open lecture and workshop on a very exciting topic. Thank you so much to Holonautic for joining in, of course, and to Unity for bringing Brian; thank you, Brian, for being with us today and for answering all the questions about DOTS that will come up. The agenda is pretty clear: a bit about us and XR Bootcamp first. You've probably heard about our open lectures before; we always bring great speakers, like today's, to talk about new and upcoming topics that may be interesting for us as XR, VR, and AI developers. Most usefully for everyone here, you can ask the questions you have about the topics being discussed, so please make use of that: use the Q&A tool you have here in Zoom and submit your questions as soon as possible, so that we have a chance to prepare the answers.

About XR Bootcamp: what is special about us is that we have a really great community, and thank you all for showing up today. Right now I'm seeing over 200 people in this room, which is awesome. Feel free to also connect on our Discord server (maybe someone can post the link) and exchange and share your knowledge with each other, so that we can all advance the industry together. Our courses and master classes are very project-based with a strong technical focus, so that you can use what you learn directly on the job, and we draw on a lot of industry-level advice to create our curriculum, so what you're learning really makes sense. Our alumni are always very happy, and some of them are attending today. Even if you've already learned a lot of Unity and work with it on an everyday basis, I think you can still learn new things in our master classes and be very happy with the results. Feel free to also check our YouTube channel
for more reviews. We have all these companies visiting our classes every time, so this is also a good opportunity for you to network with more people in the XR industry. Ferhan, do you want to say a few words about our upcoming courses?

Yes, thank you, Rahel. Every time we organize these open lectures, it is great to see more and more people joining even such sophisticated and quite specific classes. We see that there is interest, and we would like to continue and commit to bringing this knowledge to you as much as we can. As Rahel mentioned, we have various classes; we are known for advanced and intermediate Unity-based classes, but we also have beginner classes coming up this summer, and for more details you can always visit xrbootcamp.com. I don't want to take too much of your time, but regarding the advanced classes, it might be interesting to tell you about our approach this year and next year. We believe that standalone VR and AR devices are becoming more and more important for the VR industry, so we have created three main pillars, which I can share on the next slide. The first is lifelike interactions; the second is performant software architecture, which is what we will talk a lot about today; and the third is smoothly running, high-quality experiences. For these three main pillars, which we have defined as goals to upskill the industry, we have designated different classes. Most of these classes are for people who already work in Unity and VR/AR development. The first is an advanced VR interactions class, which we also call "Viral" after its project; we teach reactive programming there, and we are really proud to have a great, diverse lineup in the cohort. This cohort is quite
interesting, with people from companies like Autodesk, HP, and Deloitte, and from lots of universities like Harvard and Carnegie Mellon joining. We are quite happy to see them all in one place; the discussions are really wonderful, and we just finished one of the live sessions today. On the next slide I can also show what we are doing in the Viral project: it is basically a virtual robotic arm controlled by hand gestures. Rahel, can you go to the next slide so we can show it? Hand gestures, grabbing objects, and inverse kinematics and physics-based interactions: at the end of this eight-week advanced interactions program, you create this fully physics-based virtual robotic arm.

Let's go back and talk a little about the rendering optimization class, which is actually starting this week. This one is quite interesting, because here we teach rendering optimization for standalone XR devices. It is a six-week program, and the last two weeks are a nightmare scenario: we give you a high-quality, good-looking scene running on a standalone device, but at five frames per second, and we expect you to bring it to 72 frames per second using the techniques you have learned. This class officially starts tomorrow; it is almost full, but if anyone is interested at the last minute, we are happy to open maybe one or two more slots.

Today we will talk about DOTS. As you can probably see, we have plans to really upskill the industry on DOTS, but first we want to focus on the informative part: what is the current state of DOTS, how can you benefit from it in your current or future projects, and what are the challenges? We would like to thank the Unity team for supporting us, because they are the
ones who know the current state of DOTS better than anyone else. I know there are several questions coming up; we will try to answer most of them, but in case we cannot answer all of them, we will make sure they are stored in our Discord server and in our database. We would like to create a few more DOTS-related workshops in the following months, based on interest, of course. I'd like to thank Brian and Fabrice for joining us today. Brian, maybe you can take over from here with your slides, share a little about yourself, and tell us about the current state of DOTS.

Okay, hi everyone, I'm Brian, and I'm here with Fabrice; we are both from the DOTS education team. I'm going to walk through some slides giving an overview of what DOTS is about, and at the end we'll have Q&A. We're going to focus mostly on the parts of DOTS that are already production-ready: that's Collections, Jobs, Burst, and Mathematics. We'll say a little at the end about Entities and the packages related to Entities, but I definitely don't want to speak for the company in terms of making announcements. If you have specific questions, go ahead and ask, but understand that, particularly when it comes to timelines of when things come out of preview, I can't even hint at that. We can maybe say a little about where things stand, what may need to improve, and what is being worked on, at least for things that have already been discussed on the forums to some extent. Let me get into my slides; oh, I need to share my screen, excuse me.

I'll say a little about myself. I joined Unity a year ago; before that, my background is in web development, really, for most of my programming career, and then I made the jump over into games. I do have a
couple of years' experience teaching in coding boot camps, and then I started picking up Unity, and that eventually led to this. So you, too, can successfully make the jump from web development to games.

So, what is DOTS? Concretely, DOTS is a set of packages, and that's the best way to think of it. You shouldn't necessarily think of it as one unified whole: in some cases you can take some packages and use those and not use others. In particular, you can very effectively use the first four packages, Collections, Jobs, Burst, and Mathematics, together in the context of a game that otherwise is not using Entities. You can take a conventional Unity game using GameObjects and MonoBehaviours and find opportunities, as Roger will demonstrate concretely, to utilize these packages and often solve some important performance problems you might have. These five packages of DOTS define what I would call the core programming model: a way of writing code that leads to much more efficient CPU code. Then we have other packages in DOTS, primarily five others, that are about implementing standard gameplay functionality, the things you need from a game engine, like rendering, physics, animation, audio, and netcode. These are implemented in terms of Entities and the other core packages. That is the basic division, and because these depend upon Entities, they are also not yet out of preview, but we'll say a little about them at the end.

So the first question is: why is your CPU code slow? What is it that slows down code? And let me apologize in advance: I have a habit of sometimes talking too quickly, so someone kick me if I'm going way too fast; I know there are probably a lot of non-native English speakers in the audience, so I'll try to slow down. A major source of CPU inefficiency is often garbage collection, which is going
to effectively pause your game at random times: whenever it wants to run and do its business, it scans through a bunch of memory looking for your garbage, and while it does that, your code has to pause for some amount of time. In better cases that leads to little jitters; at worst it can lead to outright pauses that are very noticeable to users. That's generally not acceptable in games, particularly action games. There are ways to mitigate the problem, which is what Unity developers currently do and have done for a long time, but ideally you want to get rid of it entirely.

Another big problem is that in classic Unity, most code by default runs on one thread, the main thread, and that leaves a lot of CPU cores going to waste. In a typical Unity game written with MonoBehaviours, all of your code runs on the main thread, and maybe you're not utilizing the other cores at all. Unity itself will, in some of its systems such as rendering and audio, utilize other cores in some ways, but your own code is all sitting on the main thread, and that's an obvious problem. Then there's the code running on those individual cores: the machine code generated by the compiler is generally not all that great, because the standard C# compiler, like standard compilers for most languages, doesn't typically generate the best possible code. That's a major problem too. The last two bullet points, concerning cache friendliness, relate to Entities, so I'll save those for the second part of my talk, after Roger speaks.

So, the Collections package: what is it about? It simply provides you with standard data structures: lists, hash maps, queues, and so on, basic classic computer-science-101 data structures, but they are unmanaged. The NativeList and NativeHashMap that you create are unmanaged, meaning that in C# terms they are not known to the garbage collector. Managed objects are things which the
garbage collector knows about, is aware of, and is responsible for disposing when you no longer need them. With unmanaged objects, when you create them, you yourself are responsible for eventually deallocating them: when you create a NativeList, at some point you are expected to call its Dispose method to deallocate it, and if you fail to do that, you can end up with a memory leak of some degree. Having to manually manage memory in this way, being responsible for deallocation, can seem kind of scary; programmers of all kinds, not just in games, have been used to garbage collection for a long, long time, so it's scary to think you might suddenly introduce a bunch of memory leaks into your code. For a number of reasons, though, DOTS makes this not really a big concern at all, and one of the major reasons is what we call safety checks. When you run your code in the editor and enter play mode, safety checks are enabled by default, and one thing they do is monitor the collections you've allocated and look for ones you haven't deallocated in time. When you allocate a NativeList, for example, you specify how long you want it to live, and if it lives too long, you'll get an error, an exception in the console. That typically tells you exactly when you have one of these problems; they're not going to hide in your code, and in most cases you can solve the problem very simply. These errors are a really important part of DOTS.

By using unmanaged objects we avoid garbage collection, which is great, but just as importantly, we want unmanaged data because the next two parts, the job system and Burst, can only work with unmanaged objects; they can't touch managed data at all. That's another key reason we need Collections. So, the job system: what is this about?
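[Editor's note] Before diving into the job system, the allocate/use/Dispose lifecycle described above can be sketched roughly like this (a minimal, illustrative example; allocator choice depends on your use case):

```csharp
using Unity.Collections;
using UnityEngine;

public class CollectionsLifecycle : MonoBehaviour
{
    void Start()
    {
        // Unmanaged: the garbage collector does not track this memory.
        // The Allocator tells the safety checks how long it is meant to live:
        // Temp (one frame), TempJob (a few frames), Persistent (until disposed).
        var numbers = new NativeList<int>(16, Allocator.Persistent);
        numbers.Add(42);
        Debug.Log(numbers[0]);

        // We are responsible for deallocation; forgetting this is exactly
        // the leak the editor's safety checks report as an error.
        numbers.Dispose();
    }
}
```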
The idea of the job system is that we want to be able to write multi-threaded code without doing it the conventional way: spawning threads yourself, managing those threads, farming work out to them, and then synchronizing data with locks and so forth. You don't want to do all that, because it's infamously difficult to get right. It's difficult to make that code correct at all, without race conditions and behavior bugs, and even if you get all that correct, you often still don't get the full power of all those cores, because all those locks can lead to a lot of contention and threads blocking other threads. It's hard to get the performance story right when you do it all manually. The job system is a way of writing multi-threaded code that is far simpler, less error-prone, and much more likely to deliver efficient use of those cores.

A job is a self-contained unit of work: it has its own private data, it does some kind of in-memory computation on that data, and when it's done, some of that data can be consumed, in a sense, as the output of the job. You're not allowed to do I/O in jobs; you can't read and write files or access network connections; you can only work with this in-memory data. So jobs are self-contained, with the caveat that you need to get data out: you provide a job with at least one collection, so that when it does its computation, it can store the results in that collection, and when the job is done, you can read the data from that collection. That is basically how you get data out of a job.

Concretely, as Roger will show, you define a job type, which is a struct implementing the interface called IJob or one of its variants, and the actual
code of the job is its Execute method; that is what actually runs when the job runs. The only data the Execute method is allowed to access is the fields of that struct, so everything the job needs has to be a field of that struct. You instantiate the job and call the Schedule method, which puts it on the queue of all the jobs, and the job system itself decides when to pull jobs off the queue and run them on one of its threads. The job system maintains a pool of worker threads, and when one of those threads goes idle, when it has nothing to do, the job system can grab something else off the queue and throw it on that worker thread. The job then runs on that thread, and only that thread, until it's done, and then that thread is available for something else. That is the basic pattern.

Now, very importantly: in concurrent programming, in writing multi-threaded code, the thing that makes it hard is synchronizing access to data. Your jobs are self-contained except for the part where they access collections, so you might have a job A and a job B that both touch the same NativeArray. What you almost always want is for one of those jobs to run and finish execution before the other one. You don't want it to be indeterminate which runs first, and you don't want their execution to overlap, because that can lead to all sorts of race conditions and weird behavior. So you want to be able to schedule these two jobs and tell the job system: I know they touch the same data, but you should run this one before that one. That is what a dependency allows you to do. When you schedule a job, you can tell the job system: for this new job I'm scheduling, there's this other job that's already been scheduled, and I want that to be the dependency, so this new job has to wait for that other job to finish before it is pulled off the queue.
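[Editor's note] The job pattern described above, a struct implementing IJob with all inputs and outputs as fields and the work in Execute, might look like this sketch (the field and type names are illustrative):

```csharp
using Unity.Collections;
using Unity.Jobs;

// A self-contained unit of work: Execute may only touch the struct's fields.
public struct SumJob : IJob
{
    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Result; // length-1 array used as the job's output

    public void Execute()
    {
        float sum = 0f;
        for (int i = 0; i < Input.Length; i++)
            sum += Input[i];
        Result[0] = sum;
    }
}
```

Scheduling then looks roughly like `new SumJob { Input = input, Result = result }.Schedule()`, which returns a JobHandle that the main thread can later `Complete()` before reading `result`.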
A job can in fact have many dependencies, and be the dependency of many other jobs, so you can form long chains of jobs if you need to. You can create all those jobs with the appropriate dependencies, throw them on the queue, and the job system handles the details of precisely which ones run in what order, but it will respect those dependencies: it makes sure that all of a job's dependencies have finished before that job runs.

Another key feature, again, comes back to the safety checks. You don't want to forget to add those dependencies, so when you run in the editor with safety checks turned on (which is the default), they will catch cases where you try to schedule a job that touches a collection that some other job, already sitting on the queue or already running, also touches. If there's a conflict between them, if they both touch the same native collection, it throws an error when you try to schedule the second job, and that's a really good thing, because you don't want those mistakes hiding in your code. Do be clear, though, that you don't want to create unnecessary dependencies: if two jobs A and B share no data, then there's generally no reason they shouldn't run in parallel, and you wouldn't want a dependency between them, because then one would have to finish before the other could start, which of course would not be great.

The last thing to say about the job system: you might be wondering about cycles. What if I have two jobs that depend upon each other in a cyclic way? That would obviously be bad, because job A would have to wait for job B, but meanwhile job B would be waiting for job A, and they'd both be deadlocked; neither would ever start running. That would obviously be bad.
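[Editor's note] The dependency mechanism described above can be sketched like this fragment (`jobA`, `jobB`, and `jobC` are hypothetical jobs that touch shared collections):

```csharp
using Unity.Jobs;

// jobA writes some NativeArray; jobB reads it, so jobB must run after jobA.
JobHandle handleA = jobA.Schedule();
JobHandle handleB = jobB.Schedule(handleA); // handleA is jobB's dependency

// A job may also wait on several earlier jobs at once:
JobHandle handleC = jobC.Schedule(JobHandle.CombineDependencies(handleA, handleB));

handleC.Complete(); // block the main thread until the whole chain has finished
```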
You don't want that, but happily, you don't really have to think about it in the job system, because the API simply doesn't make it possible; you can't even schedule such a thing in the first place, so you don't really have to worry about it. So that's the job system.

The Burst compiler is simply a C# compiler that does a bunch of aggressive optimizations that a regular C# compiler cannot and will not do. The reason Burst can do these optimizations is that it compiles not just any C# code, but only code conforming to a subset of C# that we call HPC#, standing for high-performance C#. If you're writing HPC# code, you have to stick to a number of restrictions. There are a lot of finicky rules, most of which you don't have to worry about because they're really small edge cases, but the big thing you must not do is use managed objects. Happily, that turns out to be basically the same requirement as for your jobs, so basically any job you write, you should be able to Burst-compile. In fact, all you really have to do to Burst-compile your jobs is put the BurstCompile attribute on the job; that signals that it should be Burst-compiled, and that's basically all you have to do. So Burst is pretty much free performance; it's the closest thing in DOTS to magic, the part where you don't have to do the work. For jobs you have to express your code in the form of jobs, and that can be a little difficult, an extra burden on you; with Entities, as we'll see later, there's a whole new way of writing code; but with Burst, as long as you already have jobs, you pretty much don't have to do anything. The only reason you would probably want to turn off Burst
in most cases is just during development: you want to debug, and while you can step-debug Burst code, it can be a little flaky sometimes; it's not perfect. So sometimes you turn it off so you can debug your code better, but only temporarily, for development purposes.

You're probably wondering what kinds of optimizations Burst does. It's a number of things, but the main one, the biggest in most cases, is that a regular compiler for C#, or for C++ for that matter, is not really going to utilize SIMD, probably not at all in most cases. SIMD instructions are CPU instructions that perform arithmetic and bit-logic operations en masse. A regular add instruction, for example, takes a single pair of numbers and gives you back the result, but a SIMD add instruction adds together multiple pairs of numbers at once, say eight of them, depending upon the processor (on Intel it's typically eight). So you're getting much more work done in a single instruction compared to regular arithmetic, and while it doesn't necessarily run in exactly the same number of CPU cycles, it's close, so it's close to an 8x speedup for that work: roughly eight times the amount of work done in about the same amount of time. If you can utilize these effectively, particularly in code that does a lot of computation in big loops, if your compiler can generate SIMD code for that in a smart way, you can get really big performance benefits. With Burst it is not uncommon to see performance gains of 10x; that is really not uncommon. The average is maybe more like 4x to 6x, but that's still huge: we're talking about 400% to 600% in the average case, which is way better than what you typically expect from compiler optimizations. Usually we're excited to get 3% gains out of compiler optimizations, and this is way, way better.
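[Editor's note] As described above, enabling Burst on a job is usually just one attribute. A sketch of the kind of big, simple loop over unmanaged data where Burst's auto-vectorization (SIMD) tends to pay off most:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile] // the only change needed: asks Burst to compile this job
public struct ScaleJob : IJob
{
    public NativeArray<float> Values;
    public float Factor;

    public void Execute()
    {
        // A tight loop over a flat array: Burst can typically emit SIMD
        // instructions here that process several floats per operation.
        for (int i = 0; i < Values.Length; i++)
            Values[i] *= Factor;
    }
}
```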
To be sure, it's not total magic, and it depends on what your code does precisely. There will be cases where Burst can't do all that much optimization, so you get only minor gains, or potentially even a small regression, but those are rare cases. For most of the important performance-sensitive code that does a lot of math, you're going to get big wins out of Burst.

So, the last of the first four packages: Mathematics. Very simply, it is a mathematics library, but what's special about it is that it has hooks into Burst. When a Mathematics function is compiled by Burst, it carries hints telling Burst to utilize such-and-such CPU instruction that is optimal for that particular math operation, because in many cases Burst can't necessarily figure that out on its own for particular kinds of math operations. That's what Mathematics is about: you use it pretty much like any other math library, and there's not too much more to know. It's just another math library to learn, very similar to existing ones, so it's pretty straightforward.

So now, about these first four packages and using DOTS without Entities: be clear that all this stuff is production-ready. Well, Collections is actually not quite out of preview, but it will be very soon; it's imminently going to be released out of preview. You can use all of these today, and that can be very effective in a lot of cases in the context of a game that you don't otherwise think of as a DOTS game. A lot of people think of DOTS as being synonymous with ECS, but that's not really the case. You can use just these four things: say you have some hard computation problem that is slowing down your frames; if you can isolate that problem, that one feature, you can in many cases find an opportunity to express it in terms of Burst-compiled jobs. There is one trick, though.
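[Editor's note] A quick sketch of what using the Mathematics package mentioned above looks like; it mirrors shader-style types such as float3 and quaternion (the method here is illustrative):

```csharp
using Unity.Mathematics;

public static class MathExample
{
    // Rotate the direction from `from` to `to` by 90 degrees around world up.
    public static float3 TiltedDirection(float3 from, float3 to)
    {
        // math.* functions carry hints that let Burst pick the optimal
        // CPU instructions for these operations.
        float3 dir = math.normalize(to - from);
        quaternion tilt = quaternion.AxisAngle(math.up(), math.radians(90f));
        return math.mul(tilt, dir);
    }
}
```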
In the context of a GameObject-based game, a typical Unity game, basically everything you have is managed objects, so the problem is that you need to get your data into an unmanaged form so you can pass it into jobs, and then, when you get the results back, use them in the context of your managed objects. That does imply that in some cases you'll end up copying data from the managed world into the unmanaged world, feeding it into your jobs, getting the results back, and then copying them back to the managed world. Of course, if you're doing a lot of copying every frame for that purpose, it could eat away the benefits of using jobs, so you have to watch out for that. In a lot of cases, though, you may not need to do all that much data copying, and then you're fine. And in some cases the solution is that the data doesn't need to be in a managed form at all to begin with: you can just have it always stay in an unmanaged form, so you don't have to do all this copying back and forth. Be clear that regular managed C# Unity code can deal with unmanaged objects just fine; there's no restriction there. So that is often a viable solution, which I think is actually what Roger's example is going to look like. I have no idea how long that took; I apologize if it went too fast.

Yeah, that was more like 20 minutes, so when we get to Entities I might have to rush pretty quickly, but we'll stay past the 90 minutes to answer questions, so don't worry about that part. So, handing things over to Roger.

Yes, so we will actually show the example as a recording, so we're sure that nothing crashes, nothing goes wrong, and the profilers work properly, and we don't lose time, because it's very dense. We know that it's very dense.
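[Editor's note] To make the managed/unmanaged copying pattern Brian described concrete, a rough sketch (the managed `float[]` source and method names are illustrative):

```csharp
using Unity.Collections;

public static class CopyPattern
{
    public static void RunFrame(float[] managedData)
    {
        // Managed -> unmanaged: jobs can only see unmanaged containers.
        var buffer = new NativeArray<float>(managedData.Length, Allocator.TempJob);
        buffer.CopyFrom(managedData);

        // ... schedule and Complete() jobs that read or write `buffer` here ...

        // Unmanaged -> managed: bring the results back for MonoBehaviour code.
        buffer.CopyTo(managedData);
        buffer.Dispose();
    }
}
```

If this copying happens every frame over large arrays, its cost can offset the job-system gains, which is why keeping the data unmanaged end to end is often preferable, as the talk notes.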
We have already seen some people feel a little bit lost or overwhelmed by all these new things, and yes, with the DOTS stack you basically learn a new engine in some ways, but what you get for it is so powerful and so novel that it's definitely worth looking into. While the video plays, don't be afraid to ask questions; we will try to answer them in the background, or we will select some of them to discuss live. Some of them are a little broader, in terms of mindset change, like "do I need to lose everything I've learned up to now to be able to use DOTS?", and some of course need a bit more discussion. So continue to ask questions; we love those questions, we think many of them have a lot of relevance, and we'll try our best to answer them. Also, no one is currently an expert in DOTS, except maybe the Unity team themselves, because it's all so new: data-oriented programming in Unity has not existed for that long, Burst is new, everything is new. So it's the perfect time to ask questions and to be concerned about certain things, because that's exactly how we learn what is still needed, what we can improve, and how we can best teach these concepts so that you're completely comfortable with them. So let's jump into the videos. There's also a poll, mainly for us to know where you stand: have you already worked with DOTS, or even with ECS? Have you maybe already shipped a product that uses DOTS? It would be very interesting for us to know where you currently stand, so that we know the audience a little and can tone things down or up for the next iteration where we talk about DOTS again.

Perfect. So yes, we are running the hands-on part right now. Please continue the discussion in the chat, but don't submit your questions in the chat, as it is very difficult for us to find them there. We will also run a few polls, just to understand your expectations for this workshop and for DOTS, and the level of the audience, so we can share more details according to the
audience. So, welcome to Unleashing the Power of DOTS. We will walk you through a small and slightly contrived example of how you can use the production-ready parts of the DOTS stack to enhance your current applications, even when they're built fully in MonoBehaviour land.

A little bit about the speakers today, or rather about who helped prepare this presentation. First we have DreamingImLatios: if you have ever been on the Unity forums since basically ECS or DOTS existed and you had any questions, chances are you got a very friendly and helpful answer from this person. He has been very helpful within the whole community and has given advice on the most complicated problems we had. He also created his own framework, the Latios Framework, which is built on top of the DOTS stack. If you ever build an ECS-focused application in Unity, I would heavily recommend that you at least check out his framework: he provides solutions to many things that often crop up on the forums and that people have problems with, like shared containers and singletons, all solved in a very interesting way. If you want to see very advanced DOTS implementations, please check out this framework; there is a lot of example code in it, which is very helpful.

About myself: I started more as an economist, moved to being a bioengineer, and then finally became a computer scientist. I've worked in a few companies, and together with Dennis I have now started Holonautic, where we are basically exploring the cutting-edge technologies in spatial computing, as well as inside Unity, to try to squeeze out the most interesting things possible in the current domain. But enough about how that all got created; now about the challenge.

We have a certain task to do, and one thing has happened: your boss actually fell in love just a few days ago, it has completely altered his brain chemistry, and now he
wants to spread that information to everyone in the world, and of course he wants to use the currently published experience to have hearts everywhere so he can showcase his love to everyone. Evidently, we have to target a certain frame rate so users can still play while the hearts promotion is running, and the main directive is: do it fast, don't ask too many questions, just get it done. So you go back to your desk and think: okay, this hearts promotion will probably not last very long; it is very likely his brain chemistry will go back to normal and we will all realize the hearts promotion was not the best idea. So we don't want to spend too many days figuring out a good strategy, although we do want to please our superiors so they think it is perfectly implemented. Our first strategy is simply to spawn little hearts, make them pulsate and animate in the level, and spawn as many as it takes for the whole level to feel filled with hearts. In this first implementation we do not care about frame rate; we only care about getting the right number of hearts so it feels like it works well. We also don't try to be sophisticated; we really do it in the simplest way possible. Here in Unity you can see we used the 3D Game Kit, which most of you have probably already played around with, as a complete game, and to that complete implementation we added the hearts promotion as an additional feature. If I press play here, you will see the second level of the 3D Game Kit; there is basically nothing else in there currently, and we also added a small FPS counter on the side so we can see the frame rate. As you can see, the normal game runs at a stable 60 frames per second on average, which is a perfectly playable experience. Now we start to add the hearts promotion, the additional feature we were asked to add in a
very short amount of time. To do that we added a hearts manager, and as you can see here it is just a MonoBehaviour component called the HeartPromoManager. It holds the hearts prefab, a spawn zone indicating where the hearts will be spawned, and a list of all the heart objects. We also have a total-number-of-hearts variable that you can adjust in the editor to move the heart count up and down. There is some boilerplate code in addition, but the main interesting part is that in Awake we instantiate the number of hearts defined in the editor, each at a random location within the spawning volume, and each heart carries its own logic component, the HeartPromoLogic, which makes it oscillate a little, bob up and down, and rotate; that is basically the animation we added to the hearts. Once we have this, we go into Unity and set it initially to a thousand hearts. The red volume you see here is the spawning volume in which all those hearts are spawned, and we try to make it big enough that it covers basically the whole level. So let's press play and have a look at what it's like when we spawn a thousand hearts across the full level. As you can see, we don't see any of them; it actually looks like the thousand hearts were never spawned. Oh yes, here we do see one small heart being displayed, but it definitely doesn't look like there are many hearts in this level; it's just one tiny one. Now the question is, of course: did we maybe not spawn the thousand at all, is there some error in the logic? But as you can see here, plenty of them are definitely there. The main problem is simply that a thousand hearts in a level of this size are just not enough.
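As a rough sketch, the naive first iteration described here might look something like the following. This is a reconstruction from the description in the talk, not the actual project code; the class and field names (`HeartPromoManager`, `HeartPromoLogic`, `spawnZone`, and so on) are guesses based on what is shown on screen.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical reconstruction of the naive manager: spawn N hearts at
// random positions inside a box-shaped spawning volume, once, in Awake.
public class HeartPromoManager : MonoBehaviour
{
    public GameObject heartPrefab;
    public Bounds spawnZone;            // the red spawning volume in the video
    public int totalHearts = 1000;      // adjustable in the inspector

    private readonly List<GameObject> hearts = new List<GameObject>();

    void Awake()
    {
        for (int i = 0; i < totalHearts; i++)
        {
            var pos = new Vector3(
                Random.Range(spawnZone.min.x, spawnZone.max.x),
                Random.Range(spawnZone.min.y, spawnZone.max.y),
                Random.Range(spawnZone.min.z, spawnZone.max.z));
            hearts.Add(Instantiate(heartPrefab, pos, Quaternion.identity));
        }
    }
}

// Per-heart animation: bob up and down and rotate. One instance of this
// runs on every spawned heart, which is exactly what becomes expensive
// at 100,000 instances.
public class HeartPromoLogic : MonoBehaviour
{
    public float bobAmplitude = 0.25f;
    public float bobSpeed = 2f;
    public float spinSpeed = 90f;

    private Vector3 basePosition;

    void Start() => basePosition = transform.position;

    void Update()
    {
        transform.position = basePosition +
            Vector3.up * (Mathf.Sin(Time.time * bobSpeed) * bobAmplitude);
        transform.Rotate(0f, spinSpeed * Time.deltaTime, 0f);
    }
}
```

The point of the sketch is only that every heart is a full GameObject with its own per-frame `Update` call, which is what the later iterations progressively remove.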
So of course the next step is to find out what number of hearts we actually need to get to something acceptable, and we do it in increments of one order of magnitude. Instead of a thousand we now spawn ten thousand, and we take another look to see whether that is enough for a convincing hearts promotion. Looking around now, it is easier to spot a few hearts; there are a few, but it still doesn't feel like there's a lot of love going on in this level. So let's bump it up by another order of magnitude and see what happens when we spawn 100,000 hearts in the level. Of course we don't worry about performance at this stage; we just ask how many hearts are needed with the simplest possible logic, spawned all over the place. And we can see that with a hundred thousand hearts, yes, that's about the number where it feels like there are hearts everywhere and it looks more or less good. The frame rate is probably nowhere near acceptable, but at least we have determined that with 100,000 hearts we can actually launch the promotion and it will feel good enough. Now let me go back. At the end of this first iteration we get to the results, and we have a slight problem. The hearts are there, and we are really happy about that, but three frames per second, down from the initial 60, is truly terrible. We found the required number of hearts, but when we look into the profiler we see that the hearts promotion logic, the Update methods of those hundred thousand heart instances, takes 83 milliseconds in total, which is nowhere near acceptable. So we cannot have a hundred thousand GameObjects in the world and expect the hearts promotion to be playable in any feasible manner. Now for the second iteration: as we saw, a hundred thousand hearts as GameObjects displayed at the same
time, even without any additional logic, is just not feasible. So we go for a different strategy. We know that 100,000 hearts are necessary to make it feel like there is really a hearts promotion going on, but those hearts don't need to be represented as GameObjects all the time, because we cannot see most of them. So we go for a pooling strategy: we represent the 100,000 hearts only virtually, and from those 100,000 virtual hearts, which are randomly distributed, we focus on the 1,000 that are closest to the camera and inside the view frustum. With this pooling technique, as the camera moves around, we only ever display the 1,000 hearts out of the hundred thousand that are currently closest and inside the view frustum. That should give us better performance, because we no longer have a hundred thousand GameObjects being animated and running logic at the same time; we only have a thousand. But the player should barely notice the difference, because they only see the hearts close to them; the ones behind them, as well as those far away, are not visible anyway. So let's go into the code and see how that was implemented. Again, do not focus too much on the implementation details, because that is not the important part; the important part is the strategy of how you can get better performance using data abstractions and virtualization. For example, here we have the HeartData type, which implements IComparable. It has a position, a distance to the camera, an index we use to decide which pooled object displays it, and a boolean that records whether the heart is currently visible based on the view frustum. We also have a compare method that lets us sort all those 100,000 virtual hearts, giving priority to the ones that are currently visible: if one heart is visible and the other is not, the visible one wins, and if both are visible, the one closer to the camera wins.
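A minimal sketch of that comparable record, reconstructed from the description above (the exact field names in the project may differ):

```csharp
using System;
using UnityEngine;

// Hypothetical sketch of the virtual-heart record: visible hearts sort
// before invisible ones, and among equals the nearer heart sorts first.
public struct HeartData : IComparable<HeartData>
{
    public Vector3 position;
    public float distanceToCamera;  // squared distance orders just as well
    public int index;               // which pooled GameObject shows it, if any
    public bool isVisible;          // inside the view frustum this frame?

    public int CompareTo(HeartData other)
    {
        // A visible heart always outranks an invisible one.
        if (isVisible != other.isVisible)
            return isVisible ? -1 : 1;
        // Both visible (or both hidden): closer to the camera wins.
        return distanceToCamera.CompareTo(other.distanceToCamera);
    }
}
```

After sorting an array of these, the first 1,000 entries are exactly the hearts the pool should display.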
This is basically a custom sort function based on our requirements for which hearts should come first when we sort an array of those virtual hearts. The rest is very similar to before: we have the camera planes and so on, we instantiate the hearts exactly as before, and we now have a heart pool count limiting the number of real hearts instead of the 100,000. Then, in the Update method, we go through all the hearts and test their bounds against the camera's view, using Unity's built-in GeometryUtility to check which ones are inside the view frustum, and we also calculate each heart's distance from the camera. From that we can use our custom sorting function to get the first 1,000 hearts that are closest and inside the view frustum. We then do some swapping logic; you don't need to worry about the details. We just swap the hearts so that those currently visible and closest to the camera sit at the front of the array, and those are the ones the pooled GameObjects get their positions set from. So we end up with only 1,000 heart GameObjects, prioritizing the ones currently visible. If we go back to Unity and press play, we still have the impression that there are a hundred thousand hearts, because they are visible all over the level, and we currently get around 15, maybe at most 20 frames per second. That's not quite there yet; it's already much better than before, but not really playable. The good part is that we only have a thousand hearts actually represented as GameObjects, yet it still looks like a real hearts promotion is going on in the game. And when we look at the profiler of the currently running process, we can see that the HeartPromoManager's update logic, where we sort and handle all the hearts, is the thing that takes
up the most time: nearly 50 milliseconds. So although we are on the right path, and we clearly have better performance than before, we still need to find a way to speed up that sorting and prioritizing logic to reach an acceptable frame rate. To summarize the second iteration: we reach around 15 to maybe 20 frames per second, we know the bottleneck is the Update function, and the sorting step is very expensive. We also have many threads sitting idle: going back to the profiler's jobs view, we can see that nearly all the worker threads, 1 through 14 on this particular machine, are just sitting idle while the main thread is extremely busy. So if we can do some parallelization there, it could probably help us reach a higher frame rate. We also see that the main bottleneck is the HeartPromoManager's Update function, and in the next iteration we will look at how to improve that. Version three: in the third iteration we try a different strategy to improve on what we did before. We noticed that several threads are just sitting there doing nothing, so it would probably be smart to move some of the work currently done on the main thread onto those worker threads, so the main thread can do something else. To do that, we use one feature of the DOTS stack: the C# Job System. Jobs allow you to schedule certain tasks to run not just on the main thread but on worker threads, and, further down the line, even on multiple worker threads at the same time; we will come to that. But to use jobs, certain restrictions apply: the jobs you execute need to use certain data types, and certain functions are not yet fully available. For example,
we need to implement our own sort function, because Array.Sort, for example, is not directly usable inside the DOTS job stack; it has a few restrictions I won't go into now. Just be aware that we had to implement a custom sort function as well as a custom frustum-check function, because the one from GeometryUtility cannot be run directly inside jobs. Then we go straight to the code, because nothing changed in the actual scene setup; it is in the code that a lot of things changed. We have the hearts pool job, which implements the IJob interface, and the only method you need to implement is Execute. It simply means: this is where you do the work, and you have to pass in all the data the Execute method will operate on. Again, if you are new to the DOTS stack and this is a little overwhelming, don't worry about it; just focus on the higher-level picture of the strategy: we can run certain things on worker threads instead of the main thread by implementing that IJob interface. Here we go through all the pool records, which is a new data type we added. It's just an optimization: each record holds the position of the heart plus a flag deciding whether we should write that heart this frame, i.e., whether we need to update the position of this heart this frame or whether it was already visible in the last frame. That is a small optimization so we don't need to update the positions of hearts that were already visible during the last iteration. We then go through all the hearts with a similar strategy as before: we calculate the distance to the camera, which we actually approximate with the squared distance, because it is good enough for our purposes and faster than calculating the actual distance, and then we have a custom TestPlanes function that lets us check whether something is inside the view frustum. We use the
dot product for this; it's just an approximation, but it works well enough in this example. Then we have our custom sort function for the hearts array, where we sort all those hearts so that the first 1,000 are the ones inside the view frustum and close to the camera, and then we go through all the hearts, swap them around again, and set the positions of the actual heart GameObjects to those of the thousand closest to the camera and inside the view frustum. For the TestPlanes function we just use a dot product to check whether a point is on the inner side of each of the six planes, combining the results with a logical AND to decide whether it is inside the view frustum. Don't worry too much about the implementation details here; you can take a look at them afterwards. The sort function uses a typical quicksort implementation, which you can find online; it's a very well-known sorting algorithm, not necessarily always the fastest, but it works very well for this type of problem, as you will see later. The main new part after that is creating all the data structures, which we evidently also need to allocate. One thing about using jobs and the DOTS stack: those data structures and arrays are not taken care of for you the way managed C# normally handles things. When you use native containers like NativeArray, you actually need to allocate them, and in OnDestroy you need to dispose of them. As you can see here, in OnDestroy we dispose of all the arrays we allocated with the Persistent allocator: we create those arrays, and we are then responsible for disposing of them once the work is done; it is not automatic. But in exchange you get essentially zero garbage collection and better overall performance. So we have the heart data, the pool records, and the six planes we need for the frustum calculations.
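Condensed into code, the culling part of that job might look roughly like this. This is a hedged sketch, not the project's actual source: it assumes a blittable `HeartData` struct (position, squared distance, an `int` visibility flag so the struct stays job-friendly), and the plane test uses Unity's `Plane.GetDistanceToPoint` rather than whatever hand-rolled dot products the original used.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical sketch of the single-threaded pool job: a plain dot-product
// frustum test replaces GeometryUtility, which cannot be used inside jobs.
public struct HeartsPoolJob : IJob
{
    public NativeArray<HeartData> hearts;
    [ReadOnly] public NativeArray<Plane> frustumPlanes; // the six camera planes
    public Vector3 cameraPosition;

    public void Execute()
    {
        for (int i = 0; i < hearts.Length; i++)
        {
            var h = hearts[i];
            h.isVisible = InsideFrustum(h.position) ? 1 : 0;
            // Squared distance orders hearts just as well and avoids a sqrt.
            h.distanceToCamera = (h.position - cameraPosition).sqrMagnitude;
            hearts[i] = h;   // structs are copied, so write the result back
        }
        // ...the custom quicksort and the swapping pass would follow here...
    }

    bool InsideFrustum(Vector3 p)
    {
        for (int i = 0; i < frustumPlanes.Length; i++)
            if (frustumPlanes[i].GetDistanceToPoint(p) < 0f)
                return false;   // behind any one plane means outside
        return true;
    }
}
```

The containers themselves would be created once with `new NativeArray<HeartData>(100_000, Allocator.Persistent)` and released with `Dispose()` in `OnDestroy`, matching the allocation pattern described above.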
I walked through this implementation very quickly, and again, it should just give you the higher-level picture of what we did; the important part is the strategy and the tools you have in the DOTS stack rather than the actual implementation. Here we do exactly the same thing as before, then we set the main camera, and in the Update method we added a little toggle between using the worker threads and running on the main thread, so we can see what performance we actually gain by running something on a worker thread. Then we calculate all the frustum planes and update them every frame, since the planes change whenever the user moves the camera. We create the job and pass in the data it needs to execute. If we use the worker threads, we call Schedule; otherwise we call Run. When you call Schedule, Unity decides for you whether the job runs on the main thread or on a worker thread; it is not guaranteed to run on a worker thread. With Run it is guaranteed to run synchronously, so you can apply the records directly, because once Run returns, the job is guaranteed to be finished. If you schedule the job instead, you have to call Complete on it before you can use its results. That is why we give the worker thread a little time and only complete the job in LateUpdate when worker threads are used, and then we apply the records, which just means setting the actual positions of the pooled hearts to the ones stored in the pool records. The pool records are there so the thousand hearts we actually need are clearly identified, and they also remember whether they were visible in the last frame, as a small optimization.
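The Schedule/Run toggle and the LateUpdate completion described here can be sketched as follows. This is an illustrative reconstruction: the driver class name and the key-toggle are assumptions, and the job fields are elided; `HeartsPoolJob` is the job struct the talk refers to.

```csharp
using Unity.Jobs;
using UnityEngine;

// Hypothetical driver showing the Schedule-vs-Run pattern from the talk.
public class HeartsPoolDriver : MonoBehaviour
{
    public bool useWorkerThreads;   // toggled with a key in the demo
    JobHandle handle;
    bool jobScheduled;

    void Update()
    {
        var job = new HeartsPoolJob { /* ...fill in the native arrays... */ };

        if (useWorkerThreads)
        {
            handle = job.Schedule();   // asynchronous; MAY run on a worker thread
            jobScheduled = true;       // results applied later, in LateUpdate
        }
        else
        {
            job.Run();                 // synchronous: finished when this returns
            ApplyRecords();
        }
    }

    void LateUpdate()
    {
        if (jobScheduled)
        {
            handle.Complete();         // block until the job has finished
            ApplyRecords();
            jobScheduled = false;
        }
    }

    void ApplyRecords() { /* copy pooled heart positions back to transforms */ }
}
```

Scheduling in Update and completing in LateUpdate gives the job the span of the frame's remaining main-thread work to finish on a worker thread before anyone blocks on it.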
Anyway, that is a lot of code, but the important part is that we now have the ability to execute work not just on the main thread but on a worker thread, with the caveat that we had to implement a few functions, like the sorting and the frustum culling, on our own. So let's go back to Unity, click play, and see what the performance benefits are when we actually run something on a worker thread. We can see that nothing has changed yet, because we are still running everything on the main thread, so we quickly press J. Now we start to use worker threads, and we can see that the performance is exactly the same, or even slightly worse. Let's have a look at the profiler, because something is off. We should see the job being executed on a worker thread instead of the main thread, if Unity decided to actually run it there. And we can see that the hearts pool update job does run on a worker thread and takes up a lot of time, while the LateUpdate function has to wait for that job to complete. So although we moved the work away from the main thread, because LateUpdate has to wait for it, we actually didn't gain anything. As you can see, the performance is basically equal or worse: the work is no longer executed on the main thread, but it takes the same amount of time or even more, because we now also pay for the scheduling needed to move the work to a different thread. So it looks like all the work we did to use the job system wasn't worth it; it just made things worse. The hearts promo logic, although it runs on a worker thread rather than the main thread, takes basically the same amount of time, we gained nothing, and the frame rate stayed exactly the same. It seems the job system isn't worth much if you have a long-running operation that you need completed within the
same frame, which is most of the time the case. But wait, there is one more thing the DOTS stack offers apart from jobs, which let you run asynchronous computation on worker threads in a smart way: you also have Burst. Burst is probably one of the most exciting pieces of technology available to Unity developers right now. Of course it is "relatively complicated": you need to add the BurstCompile attribute to a job so that it gets Burst-compiled. And if you are wondering whether that is really everything we need to do, it is actually true: once you have a job written properly, with blittable types and so on, as we did here, the only code change you need in order to use Burst is to add that attribute on top of the job. Let me first show you what Burst can actually do. Let's re-run the whole simulation; it should have recompiled, and now let's see what happens when we run exactly the same implementation, but this time with Burst. And look at this: we are suddenly running at 30 to 40 frames per second, and everything has become a lot smoother. Let's have a closer look at what actually happened in the profiler, in particular at the hearts pool job update, the method that previously ran on the worker thread. Oh, we have to actually switch it back onto the worker thread; let's continue the analysis and run it there, so we can easily see how much time it takes. Here we are well above the frame rate we had before, so let's take a quick look at that job executing on one of the worker threads. Just by adding BurstCompile to it, the job runs a lot faster than before: earlier we saw it run in about 11 milliseconds, and now, looking at the profiler on that worker thread, we can see that the time the hearts job takes was vastly reduced.
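Concretely, the entire change being described is one attribute on the job struct; everything else stays as it was. A minimal sketch (the job body is elided, and `HeartsPoolJob` is the name used for the job in this talk):

```csharp
using Unity.Burst;
using Unity.Jobs;

[BurstCompile]                 // <- the single added line
public struct HeartsPoolJob : IJob
{
    // ...same fields as before (NativeArrays, camera data)...

    public void Execute()
    {
        // ...same culling, sorting, and swapping code as before...
    }
}
```

The precondition is exactly what the talk mentions: the job must already use only blittable, job-safe types, which is why the earlier restrictions pay off here.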
Compared to before, we can see that the job gets scheduled here and completes in LateUpdate, and the time it took is vastly smaller. Looking closer, the update function went from around 40 milliseconds to 4 milliseconds, so we have a roughly 10x improvement just by adding that small BurstCompile attribute on top of the job. And that is the fascinating part about Burst: Burst essentially understands the mathematical code and the other code you write down to its intrinsic parts and optimizes it so the processor executes it in the most optimal way possible. It gives you the possibility to run code as fast as if you had hand-optimized it in the C++ world, aligning everything correctly in memory so that the fewest machine instructions are used; that is basically what Burst does for you in general. It is incredible how much of a performance boost you can get just by adding a little BurstCompile on top of a job, and that is what makes the whole thing so powerful; it is one of the main reasons the DOTS stack becomes so interesting. Yes, you are restricted in the data types you can use and in how you have to write those jobs, but once you write them correctly you can use Burst, and it catapults the performance by orders of magnitude; depending on what you do, it can be even more than 10x. But of course we are not finished yet, so now we move on to the fourth iteration. For the fourth iteration, we notice that we basically have one job doing everything. What if we split up that single job, which does all the work, into multiple jobs? Because with multiple jobs we can distribute them over multiple worker threads, as long as they don't depend on each other; and if they do
depend on each other, we can use the dependency management, which handles all of that for us when multiple things run concurrently, so we don't need to worry as much about read and write access to the data, because Unity provides an easy way to handle it. So we decided to split the work as follows: the pool-records reset, the resetting we do at the beginning, becomes a separate job; the culling, i.e., testing whether something should be visible and how close it is to the camera, becomes a separate job; the sorting is now a separate job as well; and finally there is the swapping job, where we exchange the hearts so that the visible ones sit at the front and the real heart GameObjects get the positions of those currently visible. Those are the four jobs we want to separate everything into. We can also see the job dependencies here: the resetting of all the arrays from before doesn't need to happen first; it can actually run while other jobs are working, it just needs to be finished before the swapping, because in the swapping we set whether a heart should be written this frame or not. Then we have the culling job, which needs to happen before the sorting job, as the culling information tells the sorting job which hearts have higher priority than others. Let's quickly jump into the actual implementation. Here we have the fourth version of the code, and the main thing is that we have the same heart data and the same pool records as before. The reset job is now separated out, and it is a very simple job: it uses the IJobFor interface, which is a job that takes an index into account, so the Execute method receives that index. IJobFor jobs usually operate on arrays and use the index to address an array element. Here we get the pool record out, set writeThisFrame to false, and
then assign it back to the array so that the data is actually written; as we all know, structs are passed by value, not by reference, so we have to assign the record back into the native array. That is the only thing this job does, and the interesting part is that you don't loop over the array yourself, because you are given the index to operate on; there are a few other benefits you get from implementing the IJobFor interface as well. Then we have the heart culling job, which implements the same interface. Here we just step through the array of hearts one by one and test whether each is visible and how far it is from the camera. The important part is that IJobFor jobs can run on multiple worker threads; they can be executed concurrently, which is extremely powerful, but it comes with a restriction: you cannot simply write to every array you pass in. You can only write to the array at the index you are given, so Unity knows you are operating on that element and that nothing else writes to the same array while the job executes. That is why the planes array, which we don't modify in this function, needs to be marked as read-only; that way the job can be scheduled not just on one thread but on multiple threads, which lets it run much more efficiently when worker threads are available. So those are the jobs. The sorting job is exactly the same as before; in Execute we just do the sorting, and that should lead to the same performance, except that we can now schedule it after certain jobs while others run in parallel if they don't depend on each other. And then we have the swapping job, which is just the same code as before, now as a separate job. That is all that is needed.
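The reset job described here, with the struct copy-out/copy-back pattern, can be sketched as follows. This is a reconstruction under assumptions: the struct and field names (`PoolRecord`, `writeThisFrame`) come from the description in the talk, not from the actual source.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical sketch of the reset job: IJobFor hands Execute one index,
// and because structs are copied by value, the record must be written back.
[BurstCompile]
public struct PoolRecordsResetJob : IJobFor
{
    public NativeArray<PoolRecord> records;

    public void Execute(int index)
    {
        var record = records[index];    // copy out (pass-by-value)
        record.writeThisFrame = false;  // reset the per-frame flag
        records[index] = record;        // copy back, or the change is lost
    }
}
```

In the parallel-scheduled culling job, any container the job only reads, like the six frustum planes, would additionally be marked `[ReadOnly]`, as described above, so the safety system allows concurrent execution.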
Then we schedule all those jobs. For the reset job we just schedule it, and we can pass in a dependency; as we saw, this job has no prior dependencies. The hearts culling job can also start at any point in time, but the sorting job needs to depend on the culling job, and the hearts swapping job needs both the hearts sorting job and the pool records job to be finished beforehand. We can see here that the reset job gets a default handle, meaning it doesn't depend on anything; the same is true for the hearts culling job. But the sorting job depends on the culling job we defined here, so that handle gets passed in as a dependency, which means the sorting job will only run once the culling job is finished. At the end we combine two handles: the resetting job and the sorting job are combined into a new dependency, and that one gets passed into the hearts swapping job, because, as we saw before, the swapping job needs the culling and sorting jobs as well as the resetting job to be finished. By doing that, Unity can guarantee, by analyzing the dependencies, that this job will not run before the other two have finished. Then we just schedule all those jobs, and in LateUpdate, to give them a bit more time, we complete them and apply the records as before. Let's have a quick look in Unity at whether that actually improves or decreases performance, and what the main use of it is. As we can see, the separation into jobs sometimes helps when we keep the same camera position, but as soon as we move around or move the camera, it is still better than before but not perfect. So let's look at the main difference in the profiler.
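Schematically, the dependency chain described above might be wired up like this. This is an illustrative fragment, not the project's code: the job variables and their field setup are elided, and the job names are assumptions matching the four jobs from the talk.

```csharp
using Unity.Jobs;

// Reset and culling have no prerequisites and can start immediately;
// sorting must wait on culling; swapping must wait on sorting AND reset.
JobHandle resetHandle = resetJob.Schedule(heartCount, default);
JobHandle cullHandle  = cullingJob.Schedule(heartCount, default);
JobHandle sortHandle  = sortingJob.Schedule(cullHandle);
JobHandle combined    = JobHandle.CombineDependencies(resetHandle, sortHandle);
JobHandle swapHandle  = swappingJob.Schedule(combined);

// ...later, in LateUpdate:
swapHandle.Complete();   // transitively completes the whole chain
```

Completing the final handle is enough, because the job system guarantees every handle it depends on has also finished.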
What actually happened with those jobs? Are they scheduled the way we hoped, on multiple worker threads, distributing the work more evenly? We have the hearts sorting job, which takes up a lot of time, nine milliseconds actually, and the hearts culling job, which takes up nearly one millisecond, and because they depend on each other, even running them on worker threads doesn't help much, as they basically have to run one after the other. We also see that the little swapping job that happens after those two is very short and doesn't need much optimization. Looking at these results: it is already better, but splitting the work into separate jobs doesn't bring that much in this specific scenario, because most of them depend on each other, so we cannot parallelize the work well. Evidently the culling job and the sorting job are the most expensive. So let's look at our current problem and see what we can do to improve it: let's start being smart about the sorting, and let's also distribute the culling job across more than one thread, because it can easily be multi-threaded, since each data point is operated on individually; and if the culling job gets faster, our bottleneck gets a lot faster. We know that the hearts sorting job takes up quite a bit of time, and the culling job also takes a significant portion of the frame. So let's go back into Unity and see what happens when we implement those improvements. Back in the editor: for the sorting job, what can we do? We know we need the 1,000 best hearts to display, but the order within those 1,000 hearts doesn't actually matter to us; whether the first thousand hearts are in the correct order by distance to the camera, or
completely jumbled up, we don't actually care. We also know that the hearts after the first 1,000, the ones beyond what we care about, can be in any order. So we can do an early-out in the quicksort algorithm if we take those two conditions into account: if we know that the index range we are currently looking at lies entirely inside the pool of hearts we care about, we don't need to keep sorting it further; that is a particularity of the quicksort algorithm. The same is true if we are entirely beyond the first thousand hearts: whether that part gets sorted correctly, we don't care, so that is another place where we can early-out. The interesting part here is that, thanks to Burst, you can implement a custom sorting function and run it at the same performance you would get as an engine developer with access to C++ bindings and some magic there, because Burst gives you raw-metal performance that is basically close to the absolute optimum you can get. So you can implement your own custom functions and don't need to rely on libraries that had privileged access to the underlying architecture of Unity and could therefore be faster. For example, the NavMesh agents definitely had an advantage because they run inside the engine code and could do optimizations that other nav-mesh frameworks couldn't. But now, with the Burst compiler and the job system, you can actually reach the same performance as the Unity engine itself. You are no longer limited, you no longer live in two different worlds where you can only do so much on your side; you can now get to the same level of performance Unity achieved in their own implementations, which opens up a huge number of possibilities for third parties to offer solutions for the Unity engine.
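The early-out idea described above amounts to a partial quicksort (essentially a quickselect, like C++'s `nth_element`): once a partition lies entirely inside the first `poolSize` slots, or entirely beyond them, its internal order is irrelevant and recursion can stop. A minimal sketch, assuming a `HeartData` type with a `CompareTo` method as described earlier in the talk:

```csharp
// Hypothetical sketch of the early-out partial sort: only the boundary at
// poolSize has to be correct; order within either side does not matter.
static void PartialSort(HeartData[] a, int lo, int hi, int poolSize)
{
    while (lo < hi)
    {
        // Early out: range fully inside the displayed pool, or fully past it.
        if (hi < poolSize || lo >= poolSize)
            return;

        int p = Partition(a, lo, hi);   // pivot lands at its final position p
        // Recurse only into the side that still straddles the boundary.
        if (p >= poolSize) hi = p - 1;
        else               lo = p + 1;
    }
}

// Standard Lomuto partition around the last element.
static int Partition(HeartData[] a, int lo, int hi)
{
    HeartData pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; j++)
    {
        if (a[j].CompareTo(pivot) < 0)
        {
            (a[i], a[j]) = (a[j], a[i]);
            i++;
        }
    }
    (a[i], a[hi]) = (a[hi], a[i]);
    return i;
}
```

After `PartialSort(hearts, 0, hearts.Length - 1, 1000)`, the first 1,000 entries are the best 1,000 hearts in some order, which is all the pool needs; average cost drops from O(n log n) to roughly O(n).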
as fast or uh which are basically on par and have no restrictions compared to the unity engine directly having that implemented which is extremely exciting and you already can see the limit on the asset store of those developers which have taken up the job system and the burst compiler to actually optimize a lot of their implementations especially when they were very computational heavy and the second part of the optimization apart from being smart about the sorting job is that we instead of running it on one thread we just run it on multiple threads the calling job can easily be paralyzed parallelized because we don't care about the previous or the next data point on that cooling entities it's just like one which needs to be looked at and the only thing we need to change to actually be able to do that is you have to implement the ijob4 interface and we have to do a schedule parallel then we have to give the length of the array and just uh we define inner loop patch count of 16. and the number here is a little bit difficult to explain what what is the optimal here often you use the profiler to figure out what is the best but usually you go by a multi multiple of two so 2 4 8 or 16 32 etc and this job doesn't depend on anything so by just doing those small changes let's see if we see any performance improvements when we run this the application again so here we should now see especially when we go into the profiler that the calling job now is distributed over multiple frames and the sorting should be a little bit better so as we can see here we are now averaging around 47 frames a second sometimes even higher depending on how warmed up that is and how the recording is currently going on um stay here yes we can see it's actually running on a quite better frame rate but let's have a look at the actual profiler and let's see if we can find out what is happening so the first question of course we have is is the calling job distributed over multiple frames and we can 
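As a sketch, a Burst-compiled parallel culling job along these lines might look as follows. This is not the talk's actual code — the type name, field names, and the distance-based culling rule are assumptions for illustration:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
struct CullingJob : IJobFor
{
    [ReadOnly] public NativeArray<float3> Positions; // one entry per heart
    public float3 CameraPosition;
    public NativeArray<float> DistancesSq;           // result slot per heart

    // Each index is independent of every other index, which is exactly
    // what makes this job safe to run in parallel.
    public void Execute(int index)
    {
        DistancesSq[index] = math.distancesq(Positions[index], CameraPosition);
    }
}

// Scheduling, e.g. from a manager MonoBehaviour's Update:
//   var job = new CullingJob { Positions = positions, CameraPosition = camPos, DistancesSq = distances };
//   JobHandle handle = job.ScheduleParallel(positions.Length, 16, default); // 16 = inner loop batch count, no dependency
//   handle.Complete();
```

ScheduleParallel splits the array into batches that are handed out to worker threads; as mentioned, the batch count is usually tuned with the profiler.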
Looking at the profiler, we can actually see that even though it's not perfectly evenly distributed, the culling work is definitely spread over multiple threads. Each job instance takes 0.3 milliseconds, and the total across those 16 instances is 2.52 milliseconds. So the distribution definitely helped: even though the combined time of the 16 instances is longer, the time until they all finish is shorter because they run on multiple threads — the work is done after 0.3 milliseconds of frame time even though it cost 2.5 milliseconds in total. And the sorting job got the big improvement: instead of O(n log n) we're closer to 2n complexity, just because we figured out we could early out — we don't care about the exact order of the hearts once they're inside the pool or beyond it. So with two small, clever changes — one using Unity's job system, the easy way to multithread something and get all the concurrency machinery they implemented, the other just being smart about the custom algorithm — we early out, we distribute the culling job over multiple threads, and that gets us very close to 60 frames. The sorting at its worst went from 2.7 seconds down to 3 milliseconds, measured without the screen recording running, and the culling job's total is nearly 2 milliseconds, but distributed across threads it only occupies about 0.1 milliseconds of frame time. So having Burst, and being smart with the Burst-compiled implementation, can give you a huge amount of benefit.
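The early out is easiest to see in code. Here is a minimal plain-C# sketch of the idea — partition as quicksort does, but only recurse into the side that still contains the pool boundary k; subranges that end up entirely inside or entirely beyond the first k slots are left unsorted. This illustrates the technique, it is not the project's actual implementation:

```csharp
// Lomuto-style partition: returns the pivot's final index.
static int Partition(float[] a, int lo, int hi)
{
    float pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; j++)
    {
        if (a[j] < pivot)
        {
            (a[i], a[j]) = (a[j], a[i]);
            i++;
        }
    }
    (a[i], a[hi]) = (a[hi], a[i]);
    return i;
}

// After SelectSmallest(a, 0, a.Length - 1, k), the slots a[0..k-1] hold the
// k smallest values, in no particular order - which is all the heart pool
// needs. Expected cost is ~2n comparisons instead of the full n log n sort.
static void SelectSmallest(float[] a, int lo, int hi, int k)
{
    if (lo >= hi) return;
    int p = Partition(a, lo, hi);
    if (p > k) SelectSmallest(a, lo, p - 1, k);      // boundary is left of the pivot
    else if (p < k) SelectSmallest(a, p + 1, hi, k); // boundary is right of the pivot
    // p == k: both sides sit entirely inside or beyond the pool - early out
}
```

Inside a Burst-compiled job, the same logic would operate on a NativeArray instead of a managed array.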
We're very close to the 60 frames a second we definitely want to target with the hearts promotion logic, but it's not the end — let's have a look at what else we can do. Fifth iteration: we saw that the MonoBehaviour Update of the hearts is quite expensive. The HeartPromoLogic update function takes almost no time individually, but because we have a thousand of them, it adds up to nearly one millisecond of frame time. We already know how to use jobs, and the update logic doesn't depend on any other heart — it's completely isolated — so we can run it multithreaded with an IJobFor: we compute how each heart should be positioned for the frame and distribute that over multiple threads to further speed things up. In the implementation, all the other improvements from before are of course kept; the only change is that instead of each heart updating itself individually, the hearts manager now takes care of the logic for all of them. We need a new struct, the HeartsPromoLogicTransformValue, which holds a position and a rotation, and next to the swapping, sorting and culling jobs from before there's a new job, the HeartsPromoLogicJob. It's again an IJobFor that walks through an array of data points, and it does exactly what the HeartPromoLogic update function used to do — the same code, just moved into the job. That's the only thing we do on top of all the other optimizations, so let's have a look at what it actually buys us.
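For reference, such a job might look roughly like this. The struct mirrors the HeartsPromoLogicTransformValue described above, but the bob-and-spin math is a made-up stand-in for the actual heart animation, which isn't shown in the talk:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

public struct HeartsPromoLogicTransformValue
{
    public float3 Position;
    public quaternion Rotation;
}

[BurstCompile]
struct HeartsPromoLogicJob : IJobFor
{
    public float Time;
    [ReadOnly] public NativeArray<float3> BasePositions;
    public NativeArray<HeartsPromoLogicTransformValue> Results;

    public void Execute(int index)
    {
        // Hypothetical per-heart animation: bob up and down and spin.
        // Each heart only reads and writes its own slot, so the job
        // parallelizes with no dependencies between hearts.
        float phase = Time + index * 0.1f;
        Results[index] = new HeartsPromoLogicTransformValue
        {
            Position = BasePositions[index] + new float3(0f, 0.25f * math.sin(phase), 0f),
            Rotation = quaternion.RotateY(phase)
        };
    }
}
```

The results still have to be copied onto the actual Transforms on the main thread, which is exactly the cost the next iteration removes.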
So what did we discover in the results? We're still at more or less the same overall performance, but the hearts promo logic went from roughly one millisecond down to 0.337 milliseconds. Writing to the actual transforms is still quite expensive, but just by moving the logic from Update into a job we already cut it by a factor of three, which is quite good. Now let's push it one step further. The bottleneck is now primarily updating the transforms themselves, and Unity has implemented a way of modifying transform values from jobs: the so-called TransformAccessArray. In version six, the last iteration we'll look at today — all of it Burst-compiled, of course — we use IJobParallelForTransform, which together with a TransformAccessArray allows us to modify the values of MonoBehaviour transforms directly inside the job while it's Burst-compiled, which means we can speed it up heavily. We have exactly the same oscillation code as before, but now we modify the local position and rotation of each transform directly through the TransformAccess. To set this up we have to do nearly nothing: we create the TransformAccessArray, which we call childTransforms, and add each heart's child transform — the part that actually rotates, which previously carried the HeartPromoLogic component. We leave that component on the object but disable it so the computation isn't done twice. Then in the Update method we can simply schedule the job.
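Sketched out, the setup might look like this — again with assumed names, and illustrative animation math rather than the project's actual code:

```csharp
using Unity.Burst;
using UnityEngine;
using UnityEngine.Jobs; // TransformAccessArray, IJobParallelForTransform

[BurstCompile]
struct HeartsTransformJob : IJobParallelForTransform
{
    public float Time;

    // TransformAccess lets the job write straight to the MonoBehaviour
    // transform - no copy back on the main thread afterwards.
    public void Execute(int index, TransformAccess transform)
    {
        float phase = Time + index * 0.1f;
        transform.localPosition = new Vector3(0f, 0.25f * Mathf.Sin(phase), 0f);
        transform.localRotation = Quaternion.AngleAxis(phase * Mathf.Rad2Deg, Vector3.up);
    }
}

// Setup, once:
//   var childTransforms = new TransformAccessArray(heartCount);
//   foreach (var heart in hearts) childTransforms.Add(heart.childTransform);
//
// Per frame, after the culling/sorting jobs are complete:
//   var handle = new HeartsTransformJob { Time = Time.time }.Schedule(childTransforms);
```

Schedule(TransformAccessArray) runs Execute once per registered transform across worker threads, which is what makes this so much cheaper than a thousand individual MonoBehaviour updates.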
Once the whole logic has run and we've identified which hearts actually need to be updated, we run the hearts promo logic job, passing in the child transforms of those hearts. This job doesn't depend on anything, because we know all the other jobs have completed beforehand. Let it compile, and let's see in the profiler how well this actually works — what the performance of the game is in this last iteration after we added the 100,000 hearts to it. Even with the profiler running we're very close to 60 frames. It dips sometimes, and especially because I'm doing a screen recording we're at an average of around 45 frames here; without the recording we easily manage around 60 frames a second, even while running the hearts job, the culling job and the sorting job over that number of hearts, with hearts clearly visible everywhere. We could push it a lot higher if we wanted to, if we had a lower target of maybe only 30 frames, but for the moment we'll leave it at that — we're averaging around 55, definitely above 30 frames a second. Through the TransformAccessArray we managed to reduce the heart logic even further, from 0.337 milliseconds down to 0.06 milliseconds. It was already very little time, but any further improvement is still welcome, and this cut it by another large factor — not exactly a 10x reduction, but very close.

Now to the conclusion. I know that was a lot of information, and if you've never heard of DOTS or the job system I don't expect you to be able to implement all of those methods right after hearing this talk — that was not the goal. The goal is to show that even if you use MonoBehaviours, and only want to use the production-ready parts of the DOTS technology stack, you can still use it in your current game to achieve incredible performance. With the TransformAccessArray we also saw that Unity has a tight integration between the job system and transforms, so you can use all of this with MonoBehaviours as well, not just in ECS land. The heart logic that was deliberately naively implemented at the beginning — just spawning a lot of hearts across the whole volume — was completely impossible to run, and it went from basically three frames a second up to close to 55. So it's evident that the DOTS stack can give you incredible performance: you can write engine-level implementations of sorting and other custom functions, and with the Burst compiler do things that were completely impossible before. I hope this little overview got you at least excited to learn more about the DOTS stack — we hope you have a few questions and we can't wait to answer them. Thank you.

Okay, I hope that was at least to some extent understandable for many of you. It's definitely difficult to wrap your head around the unfamiliar types, how to implement the job functions and so on, and there were some algorithms that aren't easy to grasp if you've never heard of quicksort and don't know how to get an early out. But we'll make the repository available for you to investigate after this webinar — we'll share the links, it will be on GitHub, and you can browse through the code. And when you join the XR Bootcamp Discord you can also ask questions there if certain things are still unclear. But we have an additional section prepared: Brian will talk about the ECS part, which is highly exciting, especially because job
dependencies become a little complicated once you start modifying things and need to think about it, and ECS makes a lot of things a lot simpler. In particular, pooling of game objects is no longer needed, because you can spawn thousands of entities in one frame and it doesn't cost much. It's a highly exciting technology stack — there's also Unity Physics, a stateless physics engine, and so many cool things coming down the pipeline. But I don't want to take anything away, so Brian, please go ahead with the ECS part.

Okay, thanks very much. In terms of official clock time we have about five minutes, but I think we can go beyond that — it's not an issue, right? And then we can do the Q&A after I speak for maybe ten minutes or so. Okay, cool. A few notes about Roger's example project first. You should understand that the only way to truly accurately measure performance is to not do it in the editor, and not while you're recording either — both will mess with your results, which explains some of the inconsistencies in what he showed. That's an important point. Also, I wonder if you looked at the Collections package, Roger — I believe there's a sort provided for you there, in the form of a job, so you probably didn't have to implement your own. And one more comment: my first guess is you could probably solve that problem most effectively by spatially partitioning the hearts — break your world up into a bunch of grid cells and bucket the hearts into separate lists per cell. I'm sure you considered this, and for educational purposes I understand why you did what you did, but that would probably solve most of the performance problem in that particular case. Otherwise it was nice to see that progression — here's a single-threaded job, now parallel and Burst-compiled — that was cool. One last thing: you just said entities allow you to create a bunch of game objects without worrying about pooling. That's not accurate — instead of game objects you create entities, and with those you typically don't have to worry about pooling at all; in fact I can't think of a case where you would, honestly. Sorry for correcting you so much.

So, now to entities. The performance problem we're trying to solve with the Entities package is that you want your data access to be as cache-friendly to the CPU as possible, and you want the code itself to be cache-friendly. What does this mean? I'll try to slow down a bit. In a modern system the CPU tends to be way, way faster than the system RAM — orders of magnitude faster — so any time the CPU actually has to read or write main memory, it's going to sit and stall, waiting. That's why we have a cache sitting in between: any time you access an address in memory, it gets copied from main RAM into the cache, and the CPU reads it from the cache. The next time the CPU has to read that same address, if it's still sitting in the cache, great — the CPU doesn't have to wait on main memory. Memory moves in units called cache lines, which are 64 bytes on most platforms: when you access a single byte, you're actually grabbing the whole cache line containing it and copying it into cache, so if you access that byte again, or any of the other bytes in that line, it's probably still sitting in the cache. So ideally, when the CPU accesses data, everything it needs is already sitting in the cache. Now, obviously that's not always possible, but the
question is how we can bias things — arrange things — so that it's usually the case, at least for our most important workloads. And very helpfully, memory systems have a feature called prefetching: if you access a certain byte in memory, and then the next byte, and the next, and the next — if you start reading through memory sequentially — the prefetching logic kicks in and says, aha, I'm going to guess you'll need the next cache line after this one, so I'll go ahead and grab it while you're still working on the current line, and by the time you get there it will probably already be sitting in cache for you. It's very much like laying the train tracks down right before the train gets there — that's what you want from prefetching. So if you loop through some big array of bytes, you'll probably get a cache miss on the first byte — if it's not in cache, the CPU stalls while that cache line is brought in — but for all the other cache lines of that array, thanks to prefetching, the CPU usually doesn't have to wait. So as much as possible, we want to arrange our key data — the things we need to process at scale — in big, nice, contiguous arrays. That's what we're aiming for. The story with code caching is similar: code itself has to be loaded from memory into cache before the CPU can execute those instructions, so ideally, when code executes, all of its instructions are already sitting in cache most of the time.
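The data side of this is easy to demonstrate outside Unity. In this plain C# sketch, both loops do identical arithmetic, but the first walks memory in order — so the prefetcher can stay ahead of it — while the second jumps around through a shuffled index list, defeating prefetching and causing far more cache misses on large arrays:

```csharp
// Identical work, very different memory access patterns.
static long SumSequential(int[] data)
{
    long sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i]; // linear walk: the prefetcher lays the tracks ahead
    return sum;
}

static long SumShuffled(int[] data, int[] shuffledIndices)
{
    long sum = 0;
    for (int i = 0; i < shuffledIndices.Length; i++)
        sum += data[shuffledIndices[i]]; // random jumps: frequent cache misses
    return sum;
}
```

Timed with System.Diagnostics.Stopwatch on an array much larger than the CPU cache, the shuffled version is typically several times slower, even though both return the same sum.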
Now, an anti-pattern — what you don't want to happen — is this: say you have a thousand monster objects, each with an update method that needs to run every frame. What you'd want is for that monster update method to be loaded into cache once for the frame, and then you loop through all the monster data and run the update on each monster. That's the sensible, obvious thing to do. But if you update one monster, then go off and run some other code, then update another monster, go off to something else, then the next monster — if you interleave the monster updates with other stuff — that other code also has to be copied from memory into cache, and it may temporarily overwrite the monster update code there. So the next time you update a monster in the same frame, the code has to be loaded again. In the worst case, updating a thousand monsters might reload that one update method's code a thousand times, which is obviously very stupid — we don't want that. You might think: surely Unity, with MonoBehaviours, doesn't do that — if I have a thousand monsters of the same type, surely it groups their updates together. You would think so, but for good reasons it doesn't, so you do get the bad case, the anti-pattern, in the context of MonoBehaviours. That is one of the things Entities is meant to fix.

So the Entities package — again, the performance problem being cache-friendly data and code — is an implementation of an architectural pattern known as ECS, standing for entities, components and systems. To be very clear, it is not a "system of entity components"; it's a thing with entities, components and systems — that's the proper way to read it. The entities and the components are the data side of things. I don't have time in this talk to go into precisely how everything is laid out, but the gist is that we want the component data and the entities laid out in nice big arrays, so that for most of our purposes — the critical parts of our code — we can just loop through them in big linear fashion, get the benefits of prefetching, and avoid a lot of CPU stalls waiting on data access. On the code side, that's the systems. Systems are extremely simple, really: a system is a type you define with an update method, and that update method runs exactly once per frame, always. Handily, compared to MonoBehaviour updates, when you care you can also specify that one system should run before another in the frame. You can't really do that with MonoBehaviour updates — you can put things in pre-update, update and late update, but that's not as flexible; you don't have as much control. And that's really all there is to say about systems. Going back to entities: entities are basically analogous to game objects — they serve pretty much the same purpose — but in terms of data they're very different. A game object is an actual managed object, a container holding a list of its components, whereas an entity is just an ID number, a tiny int basically. The components, meanwhile, are unmanaged objects — in most cases struct values — and again, I can't go into details here, but they end up arrayed in nice big arrays: if you have a bunch of Foo components and Bar components, your Foos will for the most part live in a set of big arrays, and all your Bars in a different set of arrays.
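A minimal sketch of what that looks like in code — two hypothetical components and a system that visits every entity carrying both. Exact API details vary between Entities preview versions; this is roughly the SystemBase style from the 0.x packages:

```csharp
using Unity.Entities;

public struct Foo : IComponentData { public float Value; }
public struct Bar : IComponentData { public float Value; }

// A system's OnUpdate runs once per frame. Entities.ForEach builds a query
// for all entities that have both a Foo and a Bar and visits them chunk by
// chunk, in linear array order - the cache-friendly case the whole design
// is optimized for.
public class FooBarSystem : SystemBase
{
    protected override void OnUpdate()
    {
        Entities.ForEach((ref Foo foo, in Bar bar) =>
        {
            foo.Value += bar.Value;
        }).ScheduleParallel(); // Burst-compiled, multithreaded job under the hood
    }
}
```

Note that the components are plain unmanaged structs, which is what allows the loop body to run inside a Burst-compiled job.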
So when it comes time to say "I want to loop over all the entities that have a Foo and a Bar component, and process each one — visit each entity's Foo and Bar one by one", you can do that with nice big linear array access. That's the gist of what we're shooting for; that's the particular case everything in Entities is optimized for. Now, to be clear, in some kinds of code you naturally do have to skip around — jump from this entity to the other entity it links to — and for some use cases that's just necessary. But the idea is we're optimizing for the important cases, the big cases where we need scale, and in those cases you can usually go through the components in nice sequential order. Very importantly, you can access the entities and the component data in jobs — it's unmanaged data for the most part — so you can jobify and Burst-compile the code that processes your components, and you get all the benefits of everything Roger was just talking about. If you're interested in looking at Entities, I strongly encourage you — I'll find some links to post later — there are already a lot of talks that cover in more detail precisely how the data is laid out, and these things called chunks. Sorry, that's my cat. It's not really complicated, but it really helps to understand that data structure. One second — I'm going to have to eject my cat from the room in a minute. So, the question then is: okay, you're going to make your game out of entities instead of game objects, or perhaps have them live side by side, but you want to be able to construct a scene of entities and have it loaded as a scene. You would think that in the editor you'd have a kind of scene where you can just directly say: this scene has this entity, that entity, that
entity. But actually, for a number of reasons that are kind of complicated, we instead have you create a scene out of ordinary game objects with components — I'll get to what that means in a second. The idea is that at build time we convert that list of game objects and their components into a set of entities that can be serialized into a file, and that file is what gets loaded when the scene runs. I'll be right back, sorry.

— Maybe we take one of the questions while Brian deals with his little kitty. One question was: if you've studied computer science and lived a few years inside Unity, do you basically need to unlearn everything and start again from scratch, because DOTS and data-oriented programming are so different from what you're used to? Brian, now that you're back, can you help someone who feels everything they knew is useless and everything is new?

— Well, when we do our internal DOTS training — training up our own people on using DOTS — part of it is emphasizing data-oriented design principles, which DOTS is structured around, hence the name. In some sense, yes, you are relearning habits. But particularly with the non-entities parts of DOTS — jobs and Burst, mainly — I'd recommend finding selective use cases for them. You do have to get familiar with the API, but that's probably the best entry point into DOTS, because when you do use entities you'll want to be using that stuff as well. So a good entry point at this stage is: I have this one particular hard computation problem in the context of my normal Unity game — let's experiment with using jobs and Burst for that part. I bet my co-presenter has some thoughts here too.

— Yeah, I think overall your computer science experience will help you appreciate how amazing the job system is. If you've ever had to implement something by hand with locks and semaphores and all of that, you know how hard it is to get right and working properly in all circumstances, so that background will help you appreciate how much easier it is with all the tools Unity offers. And it's not useless that you've learned about databases — ECS is very similar to a database, so you can benefit a lot from that previous experience. You just might need to unlearn certain things, question a bit more of what you did in the past, and be open to new ideas.

— With entities in particular, if you're going to make a game that's all entities, that is a head-spin in many ways, and it takes some adjustment. But once you get used to it, I think in many ways it pays off and is simpler in the end. I have a video on YouTube called "Object-Oriented Programming is Bad" where I make an argument about how object-oriented thinking leads you astray in all sorts of ways and makes it very easy to create messes. I've found in particular that once you get past the entry point — the learning curve — you end up creating code that's a lot simpler and easier to reason about, not just for performance's sake. It's a hard, abstract argument to make — how it's actually easier to write code ultimately with DOD in this way — and a long one we probably don't have time for here, but I certainly think it's the case, certainly in the long run. There's a trajectory in a project where in the early phases it might seem easier to work with what you're familiar with, but very quickly — after a few months of development — it's "oh, I've already created a mess", and I think DOD, and DOTS and ECS once you're familiar with them, can really avoid a lot of
those kinds of anti-patterns. — Sorry, should I pick up here? — Yeah, let's continue with the presentation; we'll do more questions afterwards. — Almost done here. So, we call this the conversion workflow, as you may have heard it called: the idea is that you construct your scenes in so-called authoring data — game objects that in most cases won't exist at runtime. You want your runtime data to be efficient entities; the game objects are there just for authoring purposes, for putting together a scene and describing the initial state of how things are supposed to be. Usually one game object corresponds to one generated entity, and each component on that game object usually corresponds to at least one component that's put on that entity — that's usually how it works out.

Okay, so the remaining packages — hybrid rendering, physics, audio, netcode — I admittedly don't have too much time for, but go ahead and ask questions. The general idea is: if you're going to create a game around entities instead of game objects, you want the standard game engine functionality — rendering, audio, physics and so forth — and that's what these packages provide. They have the benefit of not only letting you do those things with entities, but of themselves being implemented in terms of entities and the other DOTS packages, so to varying degrees they inherit those performance benefits. That in itself can lead to significant performance gains — parts of the game engine you rely on are in effect being rewritten in this fashion. The hybrid renderer — I won't say much about it other than, to be clear, it's called the hybrid rendering package, not just the rendering package, because it itself is not actually a renderer. What it does is take your entity data — entities with components like Translation and Rotation and scale, plus a little component called RenderMesh — and convert it into a form that can be passed on to the scriptable render pipeline, the normal Unity SRP — HDRP or URP — which does the actual rendering. Hybrid is basically just transforming data; it also does the culling part, and it does LOD, but that's the gist of it. They've made great strides in the past year — it's come quite a long way in getting closer to feature parity with HDRP and URP, with a lot of performance improvements — so it's making good progress. That's most of what I can say about it. The animation package — I know very little about animation in general, including this package; I honestly probably can't answer any questions about it. I don't even know offhand if there's a public version available. — There is a package available, and a few examples, but it's very rough: you actually have to write code against it; they don't yet have an API that's meant to be convenient to use — it's low-level and inconvenient. They're focusing on features and the implementation and just haven't gotten to that part yet. No visual editor yet and all that, but you can see what they're working towards in the examples — they have really nice example projects showing what they're currently working on. — Yeah, and I know they're working on tooling aspects related to animation as well. The physics package is one of the more mature packages, I think. There are maybe some features left — I know they're working on additional motors and joints and things — and probably some performance problems and such, but it's relatively
mature at least compared to animation i suppose um and the interesting thing about it is that there's two back ends so in a physics engine there's the solver part that does all the the math and the physics simulation and with the physics package you can write code to the api of this package and either use the the unity physics backend as it's called or the havoc backend a back-end that's been written for us by havoc and the main difference between these two things is that the unity one is stateless it doesn't do any kind of caching from frame to frame it doesn't carry over any state information and the advantage of that is mainly for network games in network games that can lead if you have state that has to be carried around that can lead to cases where it has to be transmitted over the network you know putting a lot of burden on bandwidth and that often is not really an option at all so you'd probably lean towards to my understanding you i'm not an expert on this i think you would lean towards the stateless option the unity physics option for a network game whereas the havoc uh back end the the main thing it does different is it does caching and what that caching is is information about like guesses about how much air there was in the prior frame and you carry that over and you can factor it into the physics calculation and thereby get more accurate results and that the degree of accuracy is going to the practical effect is that the primary use case is you want to stack a bunch of objects and not have them start jittering and fly apart so you want stable stacking for like large stacks of things that is the main use case to my understanding um it's debatable how many games really need that perhaps but that that is what you would reach for that's one reason to reach for the havoc back end instead um i i guess if you want to make a network game that needs both that also needs stable stacking i don't know if you have an option there but uh that is my question they 
quite improve the stacking. Yeah, there was a talk on that, about how they did stateless stacking, and of course it's a compromise, it's not as good as Havok, evidently. Sorry, am I cutting you off? Yes, there was a Microsoft Game Stack event where the primary person on our Havok back end, from Havok (sorry, I forget his name), gave a really interesting talk about how, in a stateless implementation, there are some tricks to get greater accuracy. It was pretty cool; search for "Game Stack Unity Physics", I think, or "Microsoft Unity Game Stack". Interesting video.

The media package, like animation, I know almost nothing about it, so I almost can't really say anything about it; if you have questions, go ahead and ask.

And then lastly, netcode. This I actually do know quite a bit about, because we just put together an internal training session to train our people, like "hey, you need to understand netcode". It's a really confusing topic, netcode itself, not this package per se. The gist of it is that in netcode in general there are two main models. There's the authoritative-server model, where you have the clients do prediction, and there's a model with no authoritative server at all, where everything is peer-to-peer. The hard part with peer-to-peer is that your game logic has to be deterministic, which can be hard to do, and you might want to do rollback, which is what fighting games do. Most games use an authoritative server, but games with small player counts, like fighting games and RTSes (those are the primary cases), typically use peer-to-peer. Right now, what we call the NetCode package is authoritative server only. The plan is that there will be a separate package at some point for peer-to-peer, but right now it's authoritative
server, which is what most people want. And I mean, you could do a fighting game or an RTS with an authoritative server; in practice, the main reason you want to do peer-to-peer is to save on bandwidth, so you don't have to pay for hosting, for CPU time on a server in a data center somewhere.

Okay, so as I mentioned at the top, I really can't, and don't want to, say anything that even hints at timelines for when Entities and these other packages are going to come out of preview. That's not really mine to say, even if I could tell you. But I can sort of give you a sense of where things stand; you could piece this all together if you just go on the forums, where these things have been talked about, so none of this is really new. In general, the usability, performance, and feature completeness of Entities itself and these packages definitely need to improve in various areas, to one degree or another. Better documentation and samples: that's our department, a colleague and myself. You haven't seen us produce anything yet, but you might see us release some public stuff, probably this year; I can't promise anything, but you'll probably see some stuff from us not too far in the future. As for the rest of these bullet points, I don't want to go through them one by one, but if you have questions, go ahead and take a screenshot, and if you're interested in what I mean by any of them, go ahead and ask. This is me throwing you a bone, trying to give a little hint of where things stand; there are definitely things being worked on and improved. I can't say anything about timelines, but do understand: people speculate "oh, DOTS must be dead" because we haven't necessarily been
messaging much about it of late, for various reasons, but more people are working on it than ever, so it's not dead by any means; I can at least say that. And definitely, as we emphasized, the four packages that are out of preview, those are all blessed to use right now. As for Entities, people are asking, "should you even adopt it now?" It really depends on you, I think, is the real answer: are you willing to live with something in its current form, with no guarantees? It's a hard question, and it's kind of above my pay grade to answer. Also, to be clear, it's going to be the most invasive thing to adopt: if you want to really embrace Entities, your whole project is probably going to want to revolve around it, in a way that isn't the case with just Jobs and Burst, which you can use selectively in the context of a larger project. So again, that's probably the more effective place to start. And I have no idea how long... I talked much longer than I expected, that was 20 minutes again, wasn't it? Okay, so, thank you, and I
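The Hybrid Renderer flow described earlier (entity transform data plus a RenderMesh component, handed off to the SRP for drawing) can be sketched roughly like this. This is a hypothetical example written against the Entities 0.x preview API as it existed around the time of the talk; component names like Translation, RenderMesh, and RenderBounds changed in later versions, so treat it as an illustration, not current API reference:

```csharp
// Hypothetical sketch against the Entities 0.x / Hybrid Renderer preview API.
using Unity.Entities;
using Unity.Mathematics;
using Unity.Rendering;
using Unity.Transforms;
using UnityEngine;

public static class RenderableEntityExample
{
    // Creates an entity that the Hybrid Renderer will pick up, cull,
    // and hand off to the active SRP (URP or HDRP) for actual rendering.
    public static Entity CreateRenderable(EntityManager em, Mesh mesh, Material material)
    {
        var entity = em.CreateEntity(
            typeof(Translation),   // position
            typeof(Rotation),      // orientation
            typeof(LocalToWorld),  // matrix the render systems consume
            typeof(RenderMesh),    // which mesh/material to draw
            typeof(RenderBounds)); // bounds used for culling

        em.SetComponentData(entity, new Translation { Value = new float3(0f, 1f, 0f) });
        em.SetSharedComponentData(entity, new RenderMesh { mesh = mesh, material = material });
        return entity;
    }
}
```

The point of the sketch is that rendering is driven purely by component data on the entity; the Hybrid Renderer systems read that data each frame and translate it for the SRP.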
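On the physics side, the stateless-versus-Havok choice described earlier is a back-end switch rather than a different API: the same game code runs on either solver. A rough sketch, assuming the Unity Physics preview package where the back end is selected via the PhysicsStep singleton component (exact names may differ across versions, and the Havok option requires the separate com.havok.physics package):

```csharp
// Hypothetical sketch against the Unity Physics preview package.
using Unity.Entities;
using Unity.Physics;

public static class PhysicsBackendExample
{
    public static void ConfigureBackend(EntityManager em)
    {
        // PhysicsStep is a singleton component read by the physics systems.
        var stepEntity = em.CreateEntity(typeof(PhysicsStep));
        var step = PhysicsStep.Default;

        // Stateless Unity Physics: nothing cached frame to frame, so there is
        // no solver state to sync, which suits networked games.
        step.SimulationType = SimulationType.UnityPhysics;

        // Stateful Havok back end (needs com.havok.physics installed): caches
        // contact information across frames for stable stacking of large piles.
        // step.SimulationType = SimulationType.HavokPhysics;

        em.SetComponentData(stepEntity, step);
    }
}
```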
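Finally, the peer-to-peer netcode model mentioned above (deterministic simulation plus rollback, as used by fighting games) can be illustrated with a toy, engine-agnostic sketch. This is not the NetCode package API, just the idea: every peer runs the same deterministic step function, keeps a history of inputs and state snapshots, and when a remote input arrives late for an old frame, it rolls back to that frame and re-simulates forward:

```csharp
// Toy illustration of deterministic rollback; all names here are hypothetical.
using System.Collections.Generic;

public struct GameState { public int PlayerX; }   // toy deterministic state
public struct PlayerInput { public int MoveX; }

public class RollbackSession
{
    readonly List<GameState> snapshots = new List<GameState>();
    readonly List<PlayerInput> inputs = new List<PlayerInput>();

    public RollbackSession(GameState initial) { snapshots.Add(initial); }

    // Must be deterministic: integer math only, no wall-clock time, no randomness.
    static GameState Step(GameState s, PlayerInput i) =>
        new GameState { PlayerX = s.PlayerX + i.MoveX };

    // Advance one frame using a predicted (or local) input.
    public void Advance(PlayerInput predicted)
    {
        inputs.Add(predicted);
        snapshots.Add(Step(snapshots[snapshots.Count - 1], predicted));
    }

    // A remote peer's actual input for an old frame arrived: roll back and replay.
    public void CorrectInput(int frame, PlayerInput actual)
    {
        inputs[frame] = actual;
        snapshots.RemoveRange(frame + 1, snapshots.Count - frame - 1);
        for (int f = frame; f < inputs.Count; f++)
            snapshots.Add(Step(snapshots[f], inputs[f]));
    }

    public GameState Current => snapshots[snapshots.Count - 1];
}
```

Because every peer only ever exchanges inputs, bandwidth stays tiny, which is exactly the trade-off the talk describes: determinism is hard to achieve, but you avoid paying for an authoritative server.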
Info
Channel: XR Bootcamp
Views: 1,627
Keywords: VR, virtual reality, Virtual Reality video games, virtualreality, VR for PC, PC VR, virtual reality experience, games VR, Virtual Reality VR, Augmented reality, AR app, VR/AR, VR platform, VR headset, AR headset, mixed reality, MR, industry, VR ecosystem, VR developer, VR development, AR development, VR ready, VR ready PC, VR tool, VR environment, technology, handtracking, mrtk, vrtk
Id: SoNnyPpE2Ok
Length: 104min 16sec (6256 seconds)
Published: Mon May 31 2021