Building a Data-Oriented Future - Mike Acton

All right, so today I'm going to talk about building a data-oriented future. I want to give you a little background about me to start with, and why I'm here. I've been working in games for about 25 years, mostly in the console triple-A space: PlayStation 1, 2, 3, and 4 games, Xbox 360, Xbox One. The last game I worked on, after about 11 years as the engine director at Insomniac Games, was Spider-Man. From a career working in the engine space in game development I moved to Unity, where I no longer directly work on games. So the question is: why, for me and my journey?

Look at Unity's reach. Over the last 12 months, about 3 billion devices have run "Made with Unity" experiences, with about 29 billion installs of "Made with Unity" experiences over that same period. There are developers working in Unity in about 192 countries, and in the last 12 months there have been over 2 million active students learning Unity. About 45% of the top 1,000 mobile games are made with Unity, 50% of all new mobile games are made with Unity, and about 60% of all AR and VR content was made with Unity. About 60% of the content at the Tribeca Film Festival and 65% of the interactive content at Sundance was made with Unity, and of the most innovative AR/VR companies as defined by Fast Company, 90% are powered by Unity. Unity supports 25 platforms, or a little more. So for me and my journey, Unity meant impact.

So what's my mission, and why do I want to reach that group? There are two things I want to accomplish. One: I want to see a future where players get the maximum value from their experience when they're playing. There's an app for this conference, right? And there's a stream of complaints about that app: "my battery is being drained," "it's using too much data," and all the things. That's you not feeling like you're getting the maximum value from your experience, and that's precisely the kind of thing I want to personally address. Two: I want enough game developers, or just developers in general, capable of fulfilling that promise. I want the world to be full of people who can do a good job of that, so that every player who picks up their phone to play a game or use an app feels like they're getting the maximum value from that experience.

What I do at Unity is lead development of our Data-Oriented Tech Stack, which we call DOTS. We're focused on high-performance, multithreaded, scalable, optimized code, and on delivering it in a way that lets our users, our developers, achieve the same thing.

So what problem are we trying to solve? I like to say the problem we're trying to solve is to get the clowns out of the car. If you're trying to race a car, this car for instance, the problem is not how fast the engine is; the problem is not how streamlined the car is; the problem is that you have a dozen clowns inside holding it down. Before you do anything else, you need to get those clowns out. That's what I'm focused on: the most obvious, biggest problems we have, the lowest-hanging fruit that's just holding everybody back.

The problem, as I see it, is that the default approach that we teach, just in general, is terrible for performance. The default approach is also not optimizable: you hit a wall and there's nothing you can do, no fixing it. The default approach introduces problems, complexities, and additional machinery, more complexity than you actually need to solve the problem. And on top of that, the default approach rarely solves the actual problem.

I like to say there are three big lies in software development. Most of the time I talk about this in terms of what's wrong with object-oriented programming in particular, but these are three lies that are prevalent in the engineering industry as a whole, and certainly also in game development. The first lie is that software is a platform: we teach people that you can reason about software independent of the hardware, and that's simply not true. There's no world in which that's true. Software is not a platform. The second lie is that code should be designed around a model of the world: that you should have this model of how the world works, ask who's telling the story inside your code, and ignore the realities of both the hardware and the data. That's a lie because there's no world in which that produces good engineering results. The third, and sort of the most insidious thing we teach our students, is that code is more important than data. We teach them how to code, we teach them the syntax of code; what we don't teach them is what's important about the data they're transforming, which is the most important thing they should be looking at. These three insidious lies have crept over decades into the industry as a whole, and they've dug us into a giant hole that we now have to get out of.

So what do we want? We want performance by default: we want the default thing you do to approximate the right thing, to approximate something that's performant and will get you there. We want optimizable by default: if it's not optimized right out of the gate, if it's not the fastest thing it could possibly be right out of the gate, that's fine, so long as there's a path for you to get there. Too often you're put in a position where you write something and you've painted yourself into a corner: there's in fact no way for you to get to an optimized version of it. You might hear somebody say, "well, just measure this at the end and I'll fix it." In real production, in real life, in real engineering, you can't just do that. There's no fixing it right at the end. For one, the scope of those fixes is much, much too large for the cost and risk at the end of a production, and half the time you need to throw everything away anyway because you designed it wrong.

There are two things in terms of optimizability that I want to solve for. One: I want no pre-production and production wall. There's a common theme, certainly in game development but in other development as well, where people go through pre-production and build something to see how it's going to work: is this going to be fun, in terms of the game, or how are these mechanics going to work? Pre-production exists to answer those questions. Then they want to transfer from that pre-production work, that prototype work, to production, and the only real answer they have at that point is to throw it all away and start over again, keeping whatever lessons they learned. That's the wall between pre-production and production. What I want is an iterative experience: you get to the end of experimentation and you can take that work and actually use it to continue on your development path. Two: I want no paratrooper optimizers. Certainly in games, what we see is that there's a group of engine programmers, generally speaking, at any given studio, who right at the end of development will try to go in and fix everything up, to optimize it down to under 16 milliseconds or under 33 milliseconds depending on whether you're running at 60 or 30 frames a second. They try to squeeze all that stuff in at the very end, and it's very difficult to do, because they can't make massive high-risk changes, and there's a limited number of people who can actually do this work.

I want scalability by default. One of the things we're focused on is looking at games as small as a hundred kilobytes up to games as large as a hundred terabytes. I want those choices to be available to developers, and I want to make sure we've covered that range. And I want the fewest number of moving parts that will solve the problem well. One of my goals is to help developers focus on their actual problems and help them solve them well; I want people, as they work with what we're doing, to understand their own problem better. And I want to coexist with experts. What I don't want is a world in which we provide a solution and you don't have to think about that thing anymore. It's not a black box; it's a utility. It's a thing you can use, and you can focus on which parts of it you want to use, and if you are an expert who can do some part of it much better, we want to coexist with you: we want you to be able to replace that part and do it much better.

So what is the core tech we're working on? One piece is a job scheduler, a job system: how we go multithreaded by default. Everything we do should be multithreaded by default. One of the things we've seen in games in particular is that they're blocked on the main thread: you have most of the work happening on the main thread, and only occasionally can you push things off onto other threads or other cores. That's a systemic problem in design: you've designed something to be sequential from the start, and there's no way out of that. We want it, by default, to be able to go wide. Another part of our core tech stack is the Burst compiler, which I'll get into a little bit: a new compiler around a new language called HPC#. Then there are our memory containers, and ECS. I want to talk a little about what I mean by ECS in our context, but first let's talk about the big picture of how all these pieces fit together.

At the top there's your game or application, and under that you can use a set of packages. Some of those may be ours, some may be from other developers, some may be your own. One of those packages is ECS. Those all, by default, run wide across all the cores available on whatever machine you have, in jobs; we have a job scheduler that works with those. There are native containers, which are the thing that allows you to reason about memory, and our DOTS runtime. The Burst compiler is the thing that knows about everything across the stack and compiles the code so that it all works together.

How does the job scheduler work? First, the critical thing is that you have to fully declare the data used in any job. A critical part of being able to reason about what data is being accessed is that you have to tell us what data is actually being accessed: there's no arbitrary memory access in a job, because you cannot optimize that. Second, you have to declare read/write permissions on that data. With those two things we can verify correctness: we can tell you if there is a race condition, and we can have you fix it. In actual practice this makes a massive difference. If we look at how game developers, and the industry in general, work, multithreaded coding in particular tends to be an expert-level practice; when you want to optimize for that, you tend to need experts in that space. Having the ability to give you feedback on correctness means that anyone can now start to experiment in that space: if you do it wrong, we'll tell you you're doing it wrong. So you can trust, say, a junior programmer, or somebody without that specific experience, to start working in a multithreaded space by default, get feedback where they're doing it wrong, iterate, and start to learn how to do it correctly. That enables just about anybody to write good, solid multithreaded code.

We also have overrides for special cases, for those cases where you as an expert developer know better. We do, for instance, type-based alias analysis, and if you happen to know that these things cannot alias, you can basically cut through it and say, "just let me do this." For that reason I've heard it described as a sort of baby Rust: in the way that you need to declare all these things, declare these connections so that we can do verification, it's the same; however, because we let you cut through that when you absolutely do know what's going on, it's a little bit different.

So let me talk about our Burst compiler. Why did we choose HPC#, which is a subset of C#? One reason is that C# is already familiar to Unity developers; about half of all the C# on Earth is in Unity, so it's a significant part of the ecosystem. HPC# is a reasonable subset of C#. This is not an issue of C# versus native code, because that's not what we're doing: HPC# is native code. We have a compiler, a full, actual ahead-of-time compiler, compiling it into native code that's specific to your platform. So what is that subset of C#? It's constrained by having no class types; anything that would garbage-collect is out the window. No class types, no boxing, no garbage-collected allocation, no exceptions for control flow. What these tighter constraints give us is a highly optimizable language that we compile ahead of time, and we can statically analyze your code for safety issues. It's still C#-ish: we have all the basic types, structs, enums, generics, properties. It's a safe sandbox, even for parallel code, so you're writing C#-ish code in parallel. Why didn't we just fix C#? That's an intractable problem; it's poorly designed for this case, by design. Garbage collection is not a tractable problem.
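Going back to the job scheduler for a moment: the "declare your data and its read/write permissions" idea can be sketched in a few lines. This is a hypothetical C++ sketch, not the actual Unity C# Job System API; the `Job` struct and `jobs_conflict` function are names invented for illustration. The point is that once access is declared up front, detecting a potential race between two jobs is a simple set-intersection check.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A job declares, up front, exactly which data it reads and which it writes.
// (Hypothetical sketch; not the Unity Job System API.)
struct Job {
    std::vector<std::string> reads;   // data types this job only reads
    std::vector<std::string> writes;  // data types this job writes
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// Two jobs may race only if one writes something the other touches.
// This is the check that lets a scheduler report race conditions to the user.
bool jobs_conflict(const Job& a, const Job& b) {
    for (const auto& w : a.writes)
        if (contains(b.reads, w) || contains(b.writes, w)) return true;
    for (const auto& w : b.writes)
        if (contains(a.reads, w)) return true;
    return false;
}
```

A scheduler built on this can either order conflicting jobs with a dependency or reject the schedule outright, which is exactly the feedback that lets a non-expert iterate safely.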
Managed objects mean cache misses, end of story; there's no way around this. Boxing is bad, and there are just tons and tons of object-oriented legacy code patterns which are not optimizable. So instead we focused on the subset of C# that we can highly optimize, and that we can then make guarantees about for our developers.

How does Burst actually work? We compile assemblies: as you're working, as you're iterating, the assemblies get compiled normally using the normal C# compiler, since HPC# is a subset of C#. Then, asynchronously, Burst compiles the job kernels in the editor. So your code will immediately start running in the regular C# environment, and then Burst will compile it asynchronously. Burst consumes .NET IL; it's an LLVM front end. Because we can do alias analysis on our memory, we can run much more aggressive LLVM optimizations and get actually reasonable results at the end. And the ahead-of-time workflow integrates with IL2CPP, which is our C#-to-C++ conversion utility; all these job kernels can be precomputed and prebuilt ahead of time. Our in-editor inspector looks like this: on the left are all the job kernels that exist in your game or project, and you can examine the assembly output, the IR output, the unoptimized IR, any of those stages. You can also change the compile options, and you can examine it per target: if you want to see what the assembly output looks like for ARM versus x64, you can do that.

In our view, Burst plus Unity is a better combination. Right now we have some context-aware alias analysis, and by context-aware I mean we know specific things about what your code is doing, because we have an integrated top-to-bottom environment. We know how you got that memory, because you used our memory containers; we know those memory containers cannot alias, so we can in fact insert that aliasing information and push it down to the compiler. We also want to introduce a lot more static analysis, offline analysis; that's still in development, but we're working on it. We're also adding precision and determinism controls. You can say, "I want much-reduced precision in these math libraries, because I can afford to be sloppy," or "I want much greater precision." We also want determinism: not only determinism on the same platform, where if I'm running on x64 and another machine is running on x64 I expect exactly the same results, but cross-platform. If I'm running on an ARM device, say my iPhone, and I have a server running, and I want that server to produce exactly the same results as my ARM device, then we're going to guarantee determinism across those two devices. We'll also introduce a higher-level SoA-to-AoS conversion: structure-of-arrays to array-of-structures conversions. This is normally a super manual process for programmers, redesigning the data and transposing it based on your specific case; what we're trying to do is introduce the idea that you could just transpose that data with a switch and experiment with what the best possible layout is for you.

And with all this we get good code iteration times in Unity. As you're editing your HPC# file, it gets picked up by the editor, built, and run; it's available inside the editor, as part of your iteration loop. Our target is under five hundred milliseconds, so there are no long build wait times, no big link times before I can see the results in my editor.

Our memory containers, as I mentioned before: no garbage collection. We support custom allocators for common use cases, like Temp and TempJob, with lifetime and access rules. The key to our memory containers is that we have good aliasing rules by default. By default we're saying these containers cannot alias; that's understood by the compiler, and therefore we can emit that information to LLVM, and therefore it can give us actually good results at the end. That's something that's not possible if we compare to, say, C++ in the scenario where you have an unknown source: the compiler doesn't know where that memory came from, and it cannot do aliasing analysis on two separate pointers; it doesn't know that they could possibly not point to the same thing. That's why the restrict keyword was introduced; however, there's no practical way for you to introduce restrict across your entire application, everywhere that could possibly poison that data. We can do that.

So what is ECS bringing us? For ECS we want a good default data layout, a good default transformation pipeline, a good data iteration and experimentation workflow, and we want it to be optimizable. A quick background on what ECS is. In a typical object-oriented space you have game objects as containers: a class that has a bunch of stuff in it. Each one is individually heap-allocated: you can imagine each of these as an individually heap-allocated game object with various pieces of data, and they're just all over memory. ECS is different in that an entity is just a key into data, where the data is homogeneous, of a specific single type. So it looks more like a structure of arrays of the individual component types. An archetype is the definition of the combination of specific components that you might have: if you have a Position, a Rotation, a Render type, and a RigidBody type, that combination is what we call an archetype. Chunks are currently a 16-kilobyte block of memory: a structure of arrays of all the types in that particular archetype.

We have a concept of a query, so you can request a specific set of types. Say I'm requesting these two colors, and these are all the archetypes that match those two colors: it can gather them up, gather up all the chunks that belong to them, bring them together, line them up, stitch them together, so that you can process a work kernel across the specific types you've requested. That work kernel runs inside of a system. So basically the two concepts we have are the components, and how the components are laid out in memory, and the systems, which run across those queries. We also have worlds, for isolation. Conceptually, this is how a job component system works: you have a management thread, the equivalent of your main thread, which launches a bunch of systems, and each of those, by default, goes wide across as many cores and worker threads as you have on your machine, and those all get stacked up.

So that's the background of what we're doing. I want to talk about the principles behind it, and a little bit about what data-oriented means. What are the principles of data-oriented design? The first is that the global energy required to transform some data should be proportional to the amount of surprise. What does that mean? If you have an event that's not surprising, the work should reflect that. For instance, take a frame in a game, this frame and the next frame: the next frame is guaranteed, 99% of the time, to be very, very similar to the previous frame. Most of the work you do in the next frame is unsurprising, so the energy should be proportional to that fact: you should in fact be doing less work to provide that second frame. By comparison, if the camera had completely moved across the world, then the second frame would be completely unrelated to the previous frame, and you should be doing more work in that case. It should be relative to what's surprising.

As an example, think about where you keep your toothbrush. It's unlikely that you keep it in a drawer in your kitchen, right? The most likely scenario is that you keep your toothbrush next to your bathroom sink. Why? Because it's unsurprising that you're going to have to brush your teeth: you know for a fact that you're going to have to brush your teeth tomorrow, tonight, whenever. So you prepare, and you put it very close to the place where you know you're going to need it. The amount of energy you need to put in to get your toothbrush is proportional to how surprising it is that you're going to need to brush your teeth. The opposite would be storing your toothbrush under your bed, and every time you want to brush your teeth, going to get it. But the fact is, that's how we program. By default, people put their data in a place that's far away: it's not in cache, it's out in memory, or it's on disk. They put their data very far away from where they actually need it, even though they know, they can guarantee, that they'll need it.

When I talk about global energy, I'm in fact looking at it globally: our work at Unity, times the number of developers using Unity, times the players that are playing those developers' games and applications, times the time those play sessions take. All of that is actual, practical energy usage, whether it's energy being pulled from the power cable in my laptop or from a battery on a device; everything impacts that energy usage, and I would like to reduce it.

The next principle is that the purpose of all programs, and all parts of those programs, is to transform data from one form to another.
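Stepping back to the ECS description for a moment, the archetype-and-query model can be sketched in miniature. This is a hypothetical C++ sketch, not Unity's actual C# ECS API (no 16 KB chunks, no entity keys; `Archetype`, `matches`, and `integrate` are names invented for illustration). It shows the essential shape: each component type lives in its own contiguous array, and a system runs a kernel over every archetype that matches its query.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the archetype idea: one contiguous array (SoA column)
// per component type, grouped by the combination of types an entity has.
struct Archetype {
    std::vector<std::string> types;  // e.g. {"Position", "Velocity"}
    std::unordered_map<std::string, std::vector<float>> columns;  // one array per type

    // A query matches if the archetype has every requested component type.
    bool matches(const std::vector<std::string>& query) const {
        for (const auto& t : query)
            if (std::find(types.begin(), types.end(), t) == types.end()) return false;
        return true;
    }
};

// A "system": run a kernel over every archetype matching {Position, Velocity}.
// The kernel here just integrates Position += Velocity, as a tight linear loop
// over contiguous arrays. Returns how many entities were processed.
int integrate(std::vector<Archetype>& world) {
    int processed = 0;
    for (auto& a : world) {
        if (!a.matches({"Position", "Velocity"})) continue;
        auto& pos = a.columns["Position"];
        auto& vel = a.columns["Velocity"];
        for (std::size_t i = 0; i < pos.size(); ++i) pos[i] += vel[i];
        processed += static_cast<int>(pos.size());
    }
    return processed;
}
```

Archetypes that lack Velocity are simply skipped, and matching archetypes are processed as straight runs over contiguous memory, which is the cache-friendly access pattern the whole design is after.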
this should be intuitively obvious a computer can't do literally cannot do anything else right it gets inputs and it gets outputs a game is basically an input and output machine there's controller inputs and a half the output is the screen and the audio and it's a hierarchical from there some piece of that is inputs and outputs and does something some piece of that is inputs and outputs and does something that's literally all a computer can do and yet somehow we're taught to do something different we're not taught to transform data and understand the transformation of the data what we're taught is to tell a story with code abstraction and that's not what we need to accomplish what we need to accomplish is transforming their data with the least amount of energy that we possibly can well let's talk about abstraction a little bit so there's two kinds of abstraction when I think about abstraction one is utility abstraction so there's a scaffolding so this is I'm just I'm giving you a tool to do something and you could use that tool or not so for instance a light switch in my houses utility abstraction I could not use that live switch I can take the light switch off and I can wire those two wires together that's fine it's not actually hiding any information it's just making it more convenient for me to do is do a specific activity then there's a second kind of abstraction which is storytelling abstraction where in fact I am trying to hide what's happening with the data where I am trying not to specifically not to inform you about what's happening behind the scenes and between those two things one of those is one of those is useful and good and the other one is really really not the next principle is that if you don't understand the data you do not understand the problem so in terms of understanding the data imagine that you are you have if I say I'm gonna give you a deck of cards with integers on them and I want you to sort them or I want you to add up the numbers on 
these cards and you can think about how you might do that and I say okay well what if there's only ten cards how would you approach that problem well if there's a million cards how would you approach that problem what if there's 10 billion cards how would you approach a problem what if 99% of those cards were a zero how would you approach that problem almost all those questions all the answers to all those questions would be different completely different in fact in each one of those cases the the end you're talking about essentially unrelated problems that the solutions would have would have very little to do with each other even though the mechanics of adding or sorting or are similar the actual problem solving is very very different the solution that's very different in those cases so you need to understand the data you simply cannot solve the problem and converse thing you need to understand the problem you can you can inspect and understand the data and this is this is trivially demonstrable at any code base pick any function in your code base and capture the data in it dump it out look at it actually examine it there's inevitably something that you can learn from that inevitably something that's going to surprise you different problems require different solutions programmers love three things in my experience generic frameworks platform-independent and future proofing no let's say no to these things so generic frameworks is implying that I don't know I don't want to know anything about this problem and I want to hide all the things about this problem and I in fact not only do I want to do that I want to kick it down the road for somebody else to solve and you're not actually going to solve the problem so what I don't want a generic framework so what I want is utilities and standards things that people can actually use platform independence does is not a thing that actually exists it could not be independent of the platform that make no practical engineering 
sense at all what you have though is platform commonalities you have some platforms that are very similar that share similarities and you can you can combine those similarities that that part is fine future proofing is another example of something that doesn't actually exist you cannot future-proof you cannot solve a problem that you can't possibly know how what's there what what anything at all about that problem so you need to solve problems they actually have or that you have the experience to anticipate there are problems that you can anticipate those are fine I'll add this tweet so the only future-proof systems I've seen are systems that are easy to delete which I heartily agree with if you have different data you have a different problem so if you catch yourself thinking you know I'm looking at a problem well what's the best way to do this generically abstractly you're thinking about that the answer is inevitably there isn't one and what are you measuring and you need to stop and you need to if you catch yourself saying that you need to stop and look at your data because that's going to help you answer your question there isn't a best way abstractly there's no abstract version of this problem you need to actually look at your data actually examine it and that will help drive what the actual the actual solution is if you don't understand the cost of solving a problem you don't understand the problem so if I'm sitting down I'm saying okay I have a problem in front of me and I'm trying to solve and I'm trying to create a system I'm trying to create a feature and I'm trying to reason about what the you know how this feature is going to work but if I haven't thought about what the cost of doing it is then I can't I can't possibly understand it well enough to solve for it and there are four things in particular that we reason about that I think every problem needs to reason about and that we want to build tools to help people reason about one is performance 
constraints if you're not reasoning about what the performance constraints of any problem you're working on are you're not solving the problem you simply cannot solve the problem well so whether or not it has to fit in one millisecond or 10 milliseconds or 10 minutes it doesn't matter the fact is that it has constraints so whatever you're thinking about whatever problem you make you may be working on if you think well I haven't thought about the performance constraints of this problem I think well then you think well it doesn't really matter for whatever I'm working on the performance doesn't really matter that's in fact never true because if we took whatever you're doing and we said we multiply by it's going to take ten thousand times longer to do it be like well no well obviously that's not good enough like obviously it has to be faster than that okay well then there is a constraint what actually is it find that number so that you can stay under it second is determinism constraints you know do you want to make sure that this thing is repeatable scalability constraints how big how small how many and workflow constraints UX as part is part of designing a solution for any problem if you don't understand the hardware you can't reason about the cost of solving the problem so this is what I mentioned there's no such thing as a software platform that makes no sense everything that you have runs on some hardware somewhere and you have to be able to reason about what hard words what's what hardware it's running on at least at some level and so somebody said well it's obvious it's obvious it's going to work it's obvious it'll be fine well is it true is it truly obvious like is whatever you're working on going to run on and video GPU will run on an arm device will run on a pico eight will it run on ten thousand distributed x86 64 cores will it support as a c 4.1 will run on the iPhone on the Android these are all platform and hardware questions that must be answered as part 
of just designing any piece of software. So, can't the compiler figure this out? Can't I just do whatever? The answer is no, but let's talk about some specific issues around a common x86-64 core. If we look at what people normally consider expensive, a square root instruction, it's somewhere in the range of, say, 20 cycles on a very fast machine. But if we look at the actual memory access times, what we see is that it's basically 3 cycles or so to get data from L1, up to more than 200 cycles to get stuff from main RAM if you have an L2 miss. So what does that look like? If what I need is in L1, great, fast; in L2, not as fast; and if the memory I need is not in L1 and not in L2, I need to go to RAM. So L2 cache misses per frame are very likely to be your most significant component, the most significant thing that you need to concern yourself with as an engineering issue. Let's take a really quick example. Imagine a game object with a method that is just multiplying a couple of things, and imagine that this is heap-allocated in a very traditional object-oriented way. So what does that look like? Here's the code of that function. What we see is a 2 x 32-bit read, which, assuming both values are on the same cache line, takes about 200 cycles. We do our floating-point mul and add, which is about 10 cycles. Let's assume that the square root is inlined; that's about 30 cycles. Then we do our multiply and write back to the same address, which is now in L1, so let's say it takes about 3 cycles. And then we read and write from the next line, which takes about another 200 cycles. The point of looking at that is that the ratio of time spent waiting for L2 versus doing actual work in this function is about 10 to 1, and that 1 is the space the compiler
can fix. So if we look at it a different way: on that first line, we wasted 56 of the 64 bytes. You're reading in cache lines; there's no way around it, that's how the hardware works. We wasted 56 of those 64 bytes in that read. And if we look here, we wasted 60 of our 64 bytes in the next cache line that we need to read, which amounts to about 90% waste. This is waste of actual energy, of actual things that are actually happening on the hardware that we're dealing with, which means everything is by nature about 10 times slower than it needs to be, just for doing nothing. You can also think of it as 10% capacity used, and that doesn't actually mean 10% used well; on top of that, there are additional optimizations that can be done. But these are the clowns in the car: this is pure 90% waste for no value whatsoever. And this particular thing is why we have organized our data in this format, in structures of arrays, in chunks, so that by default things are getting loaded in that form. Everything is a data problem, including usability, maintenance, and debugging. Everything is a data problem: you measure it, and you figure out what you want to improve. So what we want to do is give everyone the tools to understand what they're doing and fix mistakes. We want to help build experts, not replace experts. And solving problems that you probably don't have creates more problems you definitely do. This speaks to building abstractions. Every abstraction you make is another set of problems for somebody else. You're going to invent something that now somebody else has to understand, and they have to understand all the edge cases and all the weirdness that you've created as part of your abstraction. On top of that, they have to understand the thing that the abstraction sits on top of, and on top of that, they have to understand the hardware. Every abstraction you create creates problems, so you need to minimize the actual abstractions. The more context that you have, the better
you can make a solution. So, for example, Burst has context about where memory is; that's the advantage of Burst. We want to help developers build tools to take advantage of what they know and what we can't know. So overall, what I mean by data-oriented design is: it's not magic, it's engineering. We want performance by default, and we want it optimizable by default. Thank you. [Applause]
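The heap-allocated, object-oriented example described in the talk can be sketched roughly as follows. The type, field names, and padding here are illustrative assumptions, not the actual slide code; the cycle figures in the comments are the talk's approximate numbers for a common x86-64 core.

```cpp
#include <cmath>

// Hypothetical traditional game object, heap-allocated. The method below
// only touches x, y, and len, but the hardware still loads the full
// 64-byte cache line they sit on, wasting most of the bytes read.
struct GameObject {
    float pad[14];  // other members sharing the cache line (unused here)
    float x;        // reading x and y: ~200 cycles on an L2 miss
    float y;
    float len;      // write back lands in the now-hot line: ~3 cycles
};

// Mul, add, square root (~30 cycles if inlined), write back: roughly
// 40 cycles of real work against ~400 cycles of waiting on memory,
// which is the ~10:1 ratio from the talk.
void ComputeLength(GameObject* o) {
    o->len = std::sqrt(o->x * o->x + o->y * o->y);
}
```

Stepping to the next heap-allocated object then costs another cache line read, which is where the second ~200-cycle wait in the walkthrough comes from.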
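The structure-of-arrays, chunked layout the talk credits for performance by default might look like the sketch below. The chunk size and type names are assumptions for illustration, not Unity's actual ECS types.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative structure-of-arrays chunk: each field is packed
// contiguously, so a 64-byte cache line read delivers 16 useful floats
// instead of 8 useful bytes plus 56 bytes of waste.
constexpr std::size_t kChunkSize = 16;

struct PositionChunk {
    float x[kChunkSize];    // all x values packed together
    float y[kChunkSize];    // all y values packed together
    float len[kChunkSize];  // results, also packed
};

// Streaming over the chunk touches fully-used, sequential cache lines,
// the access pattern the hardware prefetcher rewards.
void ComputeLengths(PositionChunk& c) {
    for (std::size_t i = 0; i < kChunkSize; ++i)
        c.len[i] = std::sqrt(c.x[i] * c.x[i] + c.y[i] * c.y[i]);
}
```

The computation per element is identical to the object-oriented version; only the data layout changes, which is the point of "everything is a data problem."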
Info
Channel: WeAreDevelopers
Views: 4,645
Rating: 4.9006209 out of 5
Keywords: conference, congress, Europe, tech, technology, IT, people, code, future, coding, programming, programmer, software, engineer, developer, developing, WeAreDevs, WeAreDevelopers, unity, data, data-oriented engineering
Id: u8B3j8rqYMw
Length: 36min 54sec (2214 seconds)
Published: Thu Jul 11 2019