Data-Oriented Design

So this is going to be our talk on data-oriented design. I'm Sean Middleditch; most of you here probably know me. I'm a DigiPen senior in the RTIS program, president of the Game Engine Architecture Club, and a total nerd for engine architecture. This talk is about data-oriented design. We'll specifically go over what exactly DOD, or data-oriented design, is, why it's important, what it's used for, and ways to apply it to your own game projects. Then we'll look at some examples of existing game engines that have made use of it, and other ways you can make use of data-oriented design.

So what is data-oriented design? There are a lot of different programming architectures and a lot of different programming methodologies. Some of these include object-oriented programming, which most of us are familiar with. There's functional programming, which is popular in certain segments. It's not so popular in the games industry, but it's really popular in a lot of communications and other algorithmic situations, and machine learning and AI are a particularly common place to see it. Procedural programming is your classic C-style programming. There are a couple of other popular approaches to designing software, and data-oriented design is one of these approaches.

To give you an idea of what data-oriented design is good for, let's contrast it with object-oriented programming and go over what object-oriented programming is really all about. Object-oriented programming focuses on interfaces and the way that different objects in the system interact with each other. It's about abstraction and encapsulation, ensuring that you have components that can work together and work independently. It's really a great way of doing flexible design for your software. Data-oriented design, on the other hand, focuses on the data itself
rather than on the interface, meaning how data is going to be used or how it's going to interact with other parts of the system. It's about how you actually design your data structures for the maximum amount of efficiency and, essentially, the easiest reuse. Some of the key concepts of data-oriented design are controlling your memory access patterns, taking a simple approach to your algorithms, and removing data dependencies. We'll go into what all of those mean in a little more detail.

Data-oriented design focuses on making tightly packed, contiguous chunks of memory for data structures. Say you've got game objects in memory. Normally you'd create tons of little objects to represent each game object: all your different components and all these little bits and pieces of data, wrapped behind a generic interface of virtual function calls and interface classes and all that kind of stuff. Instead, we're going to use a nice, simple struct for each piece of data. We're going to have a data structure just for physics and a data structure just for graphics, and each of these data structures does nothing but hold the actual data. If this sounds a lot like components, it actually plays very, very well into a general component-based design.

Data-oriented design then focuses on the algorithms and how you manipulate that data, rather than on the interfaces for interacting with it. Take your physics engine, for instance. You know the particular pieces of data you need for your physics to work: velocity, acceleration, mass, and position. You can pack all of these together in one simple structure and then have an algorithm that operates on physics and nothing else, so you're not mixing and matching different pieces of your engine at the same time. We'll go a bit more
into why that's important shortly, but the key point is that data-oriented design maximizes efficiency, and it also helps keep your code simpler in certain respects.

Data-oriented design focuses on what we call plain old data, and this goes back to just a simple C struct. It doesn't have methods, it doesn't have logic, and it has no behavior attached to it. All it is is a container of nothing but data. Specifically, this is a struct without smart pointers or other classes inside it that are themselves complex. You've got a struct with integers, floats, and basic pointers, though we tend to avoid pointers where possible in data-oriented programming. It's just the raw stuff that the CPU can work with very, very efficiently. A lot of C programming actually ends up almost natively data-oriented without intentionally going that route.

All of your methods and logic are external to the types. If we have a physics component, we then have a totally separate physics system that actually operates on these components. If we want to make our own custom sorted priority queue, we would have the data structure that holds the data, and then a completely separate set of functions somewhere else in the program that actually manipulates it. They're not combined together in a single class.

As a quick example, take the traditional object-oriented programming that most of you are probably using. We create a class, for example Foo, with some data members and a bunch of methods to interact with them. In some cases these are simple accessor methods to get and set data that don't really do anything besides setting and retrieving it. If we're going with a data-oriented style, we're using almost a straight-up C-style approach: we've got just a plain struct. It has its data, and that's it; we are
done with the data. There are no methods, no logic, nothing special here. Then we have some functions sitting around somewhere else that actually manipulate this data. So in terms of how your code looks, data-oriented design is really not hugely different. It's just a slight reorganization, moving a little bit back toward C. The actual strength of data-oriented design comes from how you use these structures and what you do with them.

Data-oriented design also stresses the removal of data dependencies. This is another key part of data-oriented design, and it goes back to our component systems. Consider a lot of modern component-based engines, and especially a lot of the games I've seen built with Unity, which makes this really easy to do. We have individual components that all reference each other. You've got a logic component that's bound tightly to your physics component, which is bound tightly to a collision component, which is bound tightly to your transform component. To actually use any one component, you have to pull in all the others.

This kind of data dependency creates a lot of problems when it comes to efficiency. If you want to update your physics component, you first have to dereference the pointer to your collision component and pull the data out of that. That in turn requires you to dereference a pointer to a transform component that lives somewhere else and grab the data out of it. So to update just your physics, you've now dereferenced and accessed three, possibly four, different objects, and this can cause a huge problem for performance. In data-oriented design, we instead try to put as much of the data a component needs as possible into that component itself. If physics needs its position and velocity, the physics component itself contains those; they're not split off somewhere else. And by nature, we then avoid
pointers and references between our data structures as much as we possibly can, though we don't completely eliminate them. Besides speed, another advantage of this is that it makes multithreading your program a lot easier. Say you know that none of your physics structures reference other objects. Then you can write a thread that operates on just your physics structures while another thread operates on your graphics data structures, and there's no chance they'll step on each other's toes, because they're completely independent of each other. In practice, and particularly in games, this is kind of hard to pull off. A lot of components need the transform or position, and actually keeping that separate is not necessarily trivial. There are some tricks you can use, though. For example, you can copy your position into physics and then copy it back out when physics is done updating.

I've said a few times now that data-oriented design is really helpful for performance. Multithreading is one particular way it helps. But there are also some fundamental aspects of how modern CPUs work, and how code interacts with them, that make data-oriented design really, really useful if you're trying to squeeze the last bit of performance out of your game.

So let's step back to the object-oriented approach that most of us are using. Take your average naive game object system, which for students is going to describe most of ours; that's what we start out with. We create a game object and a bunch of components, and we allocate all of these objects with new. We say new GameObject, then call new on each component in the list of components we want to create, and we put all these objects together. Because we're calling new, each of these objects gets allocated somewhere in memory by the default memory allocator. In short, this means your objects are
kind of spread out all over memory. You created your game object here, and its physics component might be over there in memory. Then another game object's physics component ended up somewhere else entirely, in a totally separate place. They're not close to each other at all.

Furthermore, the update logic is also kind of strewn about. In the average component-based game engine, your game object has an update method. Every run through the main loop, you iterate over all your game objects and call update on each one, and each game object then iterates over all of its components and calls update on those. The result is that you ping-pong back and forth between the different systems in your engine. Your first game object is updated, and it updates physics, then AI, then graphics. Then the loop moves on to the next game object, goes back to physics and updates that, and goes back to AI and updates that, instead of doing all of the physics at once or all of the AI at once. So we're bouncing all over the place. Our objects are spread throughout memory, we're constantly jumping between different kinds of updates, and we keep bouncing between different functions whose code is itself spread out in memory. All of this causes performance problems.

These performance issues come down to the way modern CPUs work. It's actually kind of funny, because every now and again I still see people talk about performance tricks that don't actually work on a modern CPU; they're actually performance pessimizations. Take precalculating large buffers of data. In some cases it's actually slower to access that buffer than to recalculate the data on the fly. This ties into why having our objects spread all over memory is bad.

Modern CPUs have a deep pipeline. It's kind of like an automobile factory or, if you're familiar with the graphics pipeline, the same general
concept. A pipeline stall is whenever the pipeline is partially empty. For example, say we have a five-stage pipeline and we're waiting for one thing to finish. We can't put the next item in until that first one finishes, so the pipeline sits partly empty before we can continue. This is called a pipeline stall, and it hurts performance.

Pipeline stalls occur when one instruction has to wait for a previous one to complete. This is mostly dealt with by the compiler, but not in the case of branches. Branches happen when you have an if condition, and also when you have a virtual function call. Consider all the components we talked about. We're doing an update loop over all of them, and each one sits behind a virtual component interface with an update method. So we call update on, say, an IComponent interface, and it gets dispatched to the proper update method for the specific component we're calling into. The CPU can't know for certain where that call is going to bounce to, so it can't reliably fetch ahead for that jump, and there's going to be a pipeline stall. The CPU is saying: well, I don't know which function we're going to start running, so I have to wait for that jump, that function call, to come out of the pipeline. Only then can I jump to the function and start reading its instructions. So you pay a huge overhead for virtual function calls, and it can really, really hurt if you're calling a lot of them in a row.

Here's roughly what a pipeline looks like. This isn't a real CPU, just a made-up one, but these are the broad steps. Your program is compiled into a list of instructions. The CPU fetches an instruction and decodes it to figure out what it is. It then schedules the instruction, figuring out when it can execute it, and then it's going to
dispatch it to the actual execution units, read its operands, execute it, and finally write back the result. To give you an idea of real CPUs: current Intel cores have about fourteen stages, AMD's Bulldozer has 16 to 19, and the Pentium 4 had 20. If you remember, the Pentium 4 was not known for being a speed demon, and that's one of the reasons.

A pipeline stall is essentially when large portions of this pipeline are empty. In the ideal case, every single stage holds part of an instruction, like an efficient factory pushing as many parts through as it possibly can. So let's walk through a few example instructions. We've got a blue instruction currently in the dispatch stage and a green instruction over in scheduling. Let's say the green instruction depends on the blue one in some way, so it can't actually run until the blue one is done. As we step through, the blue instruction keeps moving through the pipeline, but the green one is stalled, and everything behind it is stalled too. Nothing moves while the blue one is still executing. When the blue instruction finally makes it through, we're left with a big gap: three stages in our pipeline that haven't had any work done in them. That's three CPU cycles completely wasted. Once that instruction is done, the other instructions can move forward again.

This happens a lot in math code, such as the tight loops you might run over your physics bodies. It also happens a lot when you do a lot of branching, when virtual interfaces add conditions to deal with, or when you bounce back and forth between physics and graphics. You end up with a lot of these stalls, they start adding up, and a good portion of the time your CPU is
executing, it's not actually doing anything. Certain modern CPUs, like the i7, have all kinds of tricks in their bag to try to alleviate this problem, because it's so common in the average application. Hyper-threading is one. Another is that the CPU will actually reorder your instructions. The order you see in the assembler output of your program is not necessarily the order the instructions actually run in. The CPU has a ton of silicon that tries to figure out, well, maybe this instruction that came in here doesn't depend on any of the other ones. If so, it can move that instruction ahead of the others, let it run through, and fill up the pipeline a bit more. That exists on your modern i7s. If you're coding for, say, a phone, these features aren't necessarily there, because this kind of intelligence in the CPU eats up too much power and too much silicon space. So it's important to make sure you have very efficient algorithms. Data-oriented design helps here by giving you a nice, tight, very compact loop that runs a single algorithm at a time. It avoids bouncing back and forth between algorithm A and algorithm B and introducing all these pipeline stalls.

The other big thing that data-oriented design can help alleviate is the wait state. The wait state is what happens when your CPU needs to access some memory that isn't already in its registers or cache. We like to think that RAM is fast; we buy all these super-high-speed RAM sticks these days. It's actually not that quick at all. There are faster levels of memory, but they're smaller. Objects spread all over memory because we've been using standard new will make things run slower. So will really large objects with all kinds of extraneous data in them. Either way, the problem is that actually accessing RAM is not cheap. Your modern CPU runs way, way faster
than your RAM does. An access to a random location in RAM can potentially take dozens or even hundreds of CPU cycles to service. In the worst case, I believe, with the average DDR3 stick these days on an i7, it's somewhere around 180 cycles. So if you try to access something in memory that isn't already on your CPU, your CPU sits there doing nothing for 180 cycles. Again, that ignores the tricks the i7 uses to alleviate this to a certain degree. But in the worst-case scenario, you've just eaten up a ton of cycles in which your processor isn't doing any work, and your performance starts suffering pretty noticeably. In larger triple-A games, this can actually be one of the largest bottlenecks aside from graphics.

So, a couple of things to understand about how memory works and why this slowness exists. RAM, the main sticks of memory in our computers these days, is dynamic random-access memory, or DRAM. It uses capacitors internally, so there's a delay every time it reads or writes data while it waits for a capacitor to charge or empty out. The memory also has to pause parts of itself occasionally, because the capacitors slowly leak and have to be refilled; otherwise it will start losing data. So all kinds of slowdowns happen in DRAM. We use it anyway because we can make really, really large banks of memory out of DRAM. It's very compact, and because it stores bits in capacitors it doesn't use a lot of power. A lot of the time, when your RAM is just sitting there, it only uses power for the parts being accessed or refreshed. Long story short, DRAM is what makes the multi-gigabyte machines we use today possible without drawing too much power. It's worth noting that RAM organizes itself into banks, rows, and what's called a row buffer, and you can generally think of these as the different access levels of the RAM itself. There's a lot of latency with DRAM.
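Pulling the last few points together (components scattered around memory by new, a virtual update call per component, and the cost of every trip out to RAM), here is a minimal C++ sketch that contrasts the two layouts. All of the names (IComponent, PhysicsComponent, GameObject, PhysicsData, update_physics, simulate_oo, simulate_dod) are hypothetical, made up for illustration rather than taken from any engine or library mentioned in the talk. The "physics" is reduced to a single float position and velocity. It's a sketch of the idea, not a benchmark.

```cpp
#include <cassert>   // assert, handy for quick sanity checks on the results
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// ---- Object-oriented style (hypothetical names) ----
// Each component is its own heap allocation, reached through a pointer
// and updated through a virtual call.
struct IComponent {
    virtual ~IComponent() = default;
    virtual void update(float dt) = 0;
};

struct PhysicsComponent : IComponent {
    float position = 0.0f;
    float velocity = 1.0f;
    void update(float dt) override { position += velocity * dt; }
};

struct GameObject {
    std::vector<std::unique_ptr<IComponent>> components;
    void update(float dt) {
        for (auto& c : components) c->update(dt);  // one indirect call per component
    }
};

float simulate_oo(std::size_t count, int steps, float dt) {
    std::vector<std::unique_ptr<GameObject>> objects;
    for (std::size_t i = 0; i < count; ++i) {
        // Separate allocations, placed wherever the default allocator puts them.
        auto obj = std::make_unique<GameObject>();
        obj->components.push_back(std::make_unique<PhysicsComponent>());
        objects.push_back(std::move(obj));
    }
    for (int s = 0; s < steps; ++s)
        for (auto& obj : objects) obj->update(dt);  // object by object, component by component
    float sum = 0.0f;
    for (auto& obj : objects)
        sum += static_cast<PhysicsComponent&>(*obj->components[0]).position;
    return sum;
}

// ---- Data-oriented style ----
// Plain old data: no methods, no vtable, no pointers to other objects.
struct PhysicsData {
    float position;
    float velocity;
};

// A "system": one free function that walks one contiguous array front to back,
// the access pattern that caches and hardware prefetchers handle best.
void update_physics(std::vector<PhysicsData>& bodies, float dt) {
    for (PhysicsData& b : bodies) b.position += b.velocity * dt;
}

float simulate_dod(std::size_t count, int steps, float dt) {
    std::vector<PhysicsData> bodies(count, PhysicsData{0.0f, 1.0f});  // a single allocation
    for (int s = 0; s < steps; ++s) update_physics(bodies, dt);
    float sum = 0.0f;
    for (const PhysicsData& b : bodies) sum += b.position;
    return sum;
}
```

Both functions compute the same result. With 100 bodies, 10 steps and dt = 0.5, every position ends at 5.0, so simulate_oo(100, 10, 0.5f) and simulate_dod(100, 10, 0.5f) both return 500.0f. What differs is how they get there. The first follows pointers to separately allocated objects and makes an indirect call per component. The second streams through one array. How much faster the second one runs depends on object count, allocator and hardware, so time it on your target before drawing conclusions.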
So we want to avoid hitting RAM as often as possible. There's also SRAM. On some older computers, SRAM was the main system memory; these days you'll generally only find it inside your CPU cache. The thing with SRAM is that it is extremely fast compared to DRAM. However, it is very power-hungry, because a single bit needs a constant supply of power to stay set. It's also more complicated to build, so you can't fit as much of it in a given area. You can't get a 4-gigabyte SRAM stick. You can get maybe six or eight megabytes, which is what you see in a CPU cache, and still stay within power envelopes. So you only see SRAM in the CPU cache, but it's much faster than DRAM, and that matters.

The CPU cache can be used to prefetch data. We've got all this data in RAM, which we said is slow, and we've got the CPU cache, which is much faster to access than RAM. If the piece of data we want is in the cache, the access is fast; if it's in RAM, not so fast. So we can prefetch data into the cache. Say we've got a bunch of objects that we know we're going to update all in one go. We can tell the CPU to start reading those objects into cache, and meanwhile start working on the first objects that have arrived. We're doing work on objects while the CPU streams in more data, and this gets rid of the wait state. Instead of processing an object, waiting for the next one, and then processing that one, we do it all in tandem. However, for prefetching to work well, all of our data needs to be laid out in contiguous memory. If our objects are spread out in different places, we need to prefetch each of those objects individually. Worse, because of the way the data structures for
these systems are usually laid out, as linked lists or some kind of tree, we can't just read the objects in order. To know where the third object is, we first have to read the first object to get its pointer to the next object. Then we read that object in and grab its pointer to the next one. Only then can we finally read in the third object in the list. So instead of one nice prefetch that keeps running, we're doing prefetches over and over, and each one has to wait for the previous one to be serviced. That system is full of inefficiencies. Now suppose all of our objects are in a single array, so they're all together in memory. Then we can say: hey CPU, read that whole array in and prefetch the whole thing while I'm processing it, or at least prefetch it in chunks that fit in cache. It's going to be way, way faster, noticeably so in some cases.

To give a quick idea of what this looks like, the memory hierarchy starts at the bottom with the CPU registers. They're essentially instantaneous, about one cycle to access. Above them are up to three levels of CPU cache with different levels of performance. Generally, level 1 cache is very fast but very small, maybe 64 KB on some CPUs, whereas level 3 is maybe 8 megabytes on certain CPUs these days. Then comes our system memory, which includes a row buffer. The row buffer is kind of like a CPU cache, but for the RAM, and it helps with sequential reads. If we have everything in an array and we're iterating over that array, the row buffer makes it really, really fast to keep reading from a sequential area of memory. If we're bouncing all over memory, the row buffer isn't doing anything for us. Lastly, at the very top, are networking and file system access. I'm sure you all know that accessing your disk is a lot slower than accessing RAM; that's why we try to preload and stream data. But when you're thinking about the design of an algorithm
or a data structure, or the performance of your game, you need to keep this hierarchy in mind. You want to keep as much of the data you're actively working with as possible down at the bottom of it. The higher up the hierarchy your data sits, the slower your program runs, and the more often the CPU sits in that wait state, just spinning its wheels. You may have heard people talk about the cache friendliness of an algorithm or data structure (sometimes loosely called cache coherency). This is what they mean: how easily can you keep that whole data structure in cache, iterate over it, and update it all in one go without bouncing back to system memory all the time? It's a really important thing to keep in mind.

So allocating with new spreads objects all over memory. That means that when you iterate over them, they're not in cache. You have to regrab them over and over again, and you pay a significant performance penalty for that, so try to avoid it. If you're using the standard naive model for game objects, you might have something like this, where each one of these boxes represents an individual memory allocation. You can see they're kind of all over the place; the graphics components are nowhere near each other in memory. If I want to update all the graphics components in one go, I have to access this block of memory and then that block of memory. I'm iterating over the game objects just to get to the components. I visit game object 1, go through its components, and so on, bouncing all over the place. We lose all the benefits we just talked about, so we really want to avoid this wherever performance matters.

With data-oriented design, we think about how to put our data together in memory, and how to structure things to work efficiently with the way CPUs work. So we're going to put all of
our graphics components in a single block together, and all of our physics components in a single block together. Now when we say, hey, it's time to update graphics and do all of our drawing, we access this one block of memory and everything we need is there. We can iterate over it and prefetch it, so there are far fewer wait states. It's going to be a lot quicker, noticeably so if you have a lot of graphics objects.

There's a fantastic paper that goes over everything I just covered, in excruciatingly greater detail. Every last one of you, as game programmers, should know this material; I would argue you must know it. The article is called "What Every Programmer Should Know About Memory," and it's written by Ulrich Drepper. You can search for it and find the PDF pretty easily. You absolutely need to read it. As soon as this talk is over, look it up and read through it.

Data-oriented design is not just about performance; it's also about simplifying certain aspects of your engine design. With object-oriented programming, we add interfaces, abstractions, and indirections to make things more flexible and to build reusable components. There's a very famous quote that comes up quite a bit: all problems in computer science can be solved by another level of indirection. However, those of us who care at all about performance, or who work on large projects, are very, very aware of the second half: you can't really solve the problem of too many layers of indirection by adding more indirection. That's an actual problem, and it's what I'm getting at here: too many layers of indirection make your program more complicated. I'm sure you've all looked at a project and wondered, well, what does this function do? You look at it, and it's calling into some data structure. You look at that, and it's got a pointer to another object, and you look at
that, and that's calling some method somewhere. You follow that, and it goes through some virtual interface, and you have no idea where the implementation is. You're thinking: I have no idea what's going on. This system is entirely too complex. I can't fix bugs, I can't reason about its performance, this is no good. The system might very well be extremely flexible, but nobody except the person who wrote it can understand it, and even he won't four years later. Data-oriented design's focus on plain old data helps alleviate that. It gets you thinking again about simpler layers of abstraction, or about removing layers of abstraction that don't strictly need to be there. Encapsulation and abstract data types can lead to horrible, incomprehensible stacks of layers, so focus on plain old data instead.

It's worth noting that we can still have abstract data types. By that I mean things like std::vector in C++. It has behavior: you can push and pop items, it grows and resizes, and the contained item can be anything. You can still do that with data-oriented design. Nothing says you can't have a std::vector or a std::map of a plain struct or any plain piece of data. In fact, it's a very useful thing, and good data-oriented design makes heavy use of it for performance reasons. It's a lot better to have one nice, well-written, well-tested resizable array implementation than to write ad hoc versions all over the place. That way you get better performance and more finely tuned algorithms.

It's also worth noting that data-oriented design can be cleaner, by some people's definition of clean, because it removes a lot of the cruft that builds up. But it does tend to be less flexible. If you don't have these polymorphic interfaces, these virtual functions, and these abstract layers of indirection, that means
every time you want to make a change, you have to go in and actually change what the code does. So if you're focused on iterating on your game concept, data-oriented design can get in your way. Say you want to make it easy for a designer to just pop in, create a new component, and attach it to a game object. Data-oriented design can potentially make that a little bit harder. That said, keep in mind that you don't have to choose only data-oriented design or only flexibility. You can mix and match in different parts of the engine where it makes sense.

And again, when I say data-oriented design can create a cleaner engine, be aware that "clean" is a contentious term: one man's clean is another man's horribly ugly. So use the word appropriately. Don't get in the habit of being the guy who runs around saying data-oriented design is the cleaner way of doing things; that's going to get you in some trouble here and there. Likewise, don't be the guy who runs around saying object-oriented is the only way to go. Neither of these claims is true. The real world is not binary; only computers do that. Mix and match as appropriate.

To give you an idea of the difference: I can't possibly put the actual code samples on slides, because even the short one is too long. But the guys who make Bitsquid recently released their Foundation library on Bitbucket. It's really instructive if you want to understand how a data-oriented resizable array differs from a more traditional object-oriented one. The Bitsquid header, which includes the implementation, is 162 lines long. Compare that to EASTL, Electronic Arts' game-oriented STL, where the vector implementation is 1,623 lines long. That's a pretty large difference in the amount of code it took to write, the amount of code they have to test, and the amount of code you have to understand if you want to go in and modify things. Performance-wise, these should be more or less identical. They're both resizable arrays; they're
doing more or less the exact same thing but in terms of simplicity the bit squid foundation array it only has some of the basic ops it does not have all of the stl functions like the ei STL implementation does it only supports plain old data so for example when you resize the array in the bit squid library it just uses memmove and resizes things it doesn't call constructors and destructors on the objects in your array so if you have a smart pointer or a class it needs its constructor or destructor it's copy constructor run it won't handle that on the other hand if you're using plain old data it's gonna be a lot faster in theory a well written STL can actually optimize that the same but for some of your more advanced data structures it can help and then it's also worth noting that because it is a data oriented approach it is not going to be compatible with a lot of the same STL patterns there are beginning and support you can use some of the basic iterator algorithms and the bits cool library but it is different it's not going to be super compatible to STL so that said it might seem in some cases like the bit squared approach is simpler it's much smaller code it's a lot easier to work with it's very easy to understand it's it's very very simple the other hand it is missing a lot of features it doesn't have safety for types that need the constructor destructor so there is a strong case we made that the EAS DL approach will actually be more beneficial in certain use cases it's really up to you to figure out which one is going to work better for your specific needs of your specific project I encourage you or discourage you from going either either approach so I would definitely recommend googling both of these checking out the code and you know coming to your own opinion particularly implement something like this yourself and kind of get an idea as to which you feel is better for your particular style and for the people you're working with in the project you're on so 
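The talk doesn't put the array code on slides, so here is a minimal C++ sketch of a POD-only resizable array in the style described above. It is not the Bitsquid foundation code: the class name PodArray and its member functions are hypothetical. Like the Bitsquid array as described in the talk, it moves elements as raw bytes (realloc to grow, memmove to erase) and never runs constructors or destructors, so a static_assert limits it to trivially copyable types.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical POD-only resizable array (illustrative, not the Bitsquid API).
// Elements are copied as raw bytes, so no constructors, destructors,
// or copy constructors ever run; only trivially copyable types are allowed.
template <typename T>
class PodArray {
    static_assert(std::is_trivially_copyable<T>::value,
                  "PodArray only supports plain old data");

public:
    PodArray() = default;
    ~PodArray() { std::free(data_); }

    // Copying is disabled to keep the sketch short.
    PodArray(const PodArray&) = delete;
    PodArray& operator=(const PodArray&) = delete;

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    // Raw-pointer begin/end, so basic iterator algorithms still work.
    T* begin() { return data_; }
    T* end() { return data_ + size_; }

    // Grow the buffer; realloc carries the existing bytes over for us.
    void reserve(std::size_t capacity) {
        if (capacity <= capacity_) return;
        void* p = std::realloc(data_, capacity * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        data_ = static_cast<T*>(p);
        capacity_ = capacity;
    }

    void push_back(const T& value) {
        const T copy = value;  // value may alias our buffer, which reserve() can move
        if (size_ == capacity_) reserve(capacity_ == 0 ? 8 : capacity_ * 2);
        std::memcpy(data_ + size_, &copy, sizeof(T));
        ++size_;
    }

    // Remove element i by shifting the tail down with memmove (keeps order).
    void erase(std::size_t i) {
        std::memmove(data_ + i, data_ + i + 1, (size_ - i - 1) * sizeof(T));
        --size_;
    }

private:
    T* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

Compared with a full std::vector, growing and erasing never touch per-element constructors or destructors, which is the simplification the talk describes; the trade-off is that a type like std::string is rejected at compile time instead of being handled.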
Now, using data-oriented design in a game engine specifically, rather than just in the abstract. The first thing I want to note is that, especially in student or hobby projects, your performance problems are not going to be solved by data-oriented design in most cases. The vast majority of performance issues I see come from people using the GPU APIs incorrectly; you're going to get most of your bottlenecks in places like that. Optimizing how your game objects run will get you very little net result until you start dealing with very large games with very large numbers of game objects, or you're running really complex systems. Even a lot of triple-A games don't use data-oriented design for their core game objects.
There are, however, certain places where data-oriented design is extremely important, and you want to make sure you are using it. You've probably already been using it without realizing it. If you're writing a hardcore physics engine that has constraints, know that you are almost certainly using a form of data-oriented design; trying to make it run efficiently without that is not going to work very well. So even in a triple-A engine that isn't using components at all, the physics typically comes from an external third-party library that internally organizes all of its objects in a very efficient manner, and the components are just a veneer over the internals of that library.
For graphics, and particularly particle engines, you're dealing with possibly tens or hundreds of thousands of objects. If you're creating an individual object for each of these, calling new on each one, and using virtual methods to update each particle individually, that's not going to work very well. Any of you who have tried that have probably learned it already; if you haven't, I don't recommend trying it, it will just be a waste of time. Go ahead and use a data-oriented approach: you're going to have a nice, tightly packed array of your particles, you're going to iterate over them in a single loop, and that's going to be way faster. The same goes for culling and batching: if you keep your data structures tightly packed, without all the layers of indirection, you can very efficiently handle thousands of objects that you might otherwise have a lot of trouble handling.
There are also parts where data-oriented design is not necessarily the best approach, because it removes the flexibility you want for game iteration. First, all of your game logic. Anyone who thinks there's an architecture for game logic hasn't written many games; it will always be a hairy spaghetti mess of nastiness. You want to focus on making it easy to change that nastiness rather than making it optimal; it will never, ever get there. Then there's AI: some of the lower-level AI, like your pathfinding system, can actually make a lot of use of data-oriented programming, but the higher levels of AI not so much. And when you're building your tools, don't worry about their performance for the most part; throw them together in C# and just make it as easy as possible to build things. The one place where you probably do care about optimizing your data access is if you're writing, say, a lightmap generator. You probably want that to be as fast as possible; your artists will appreciate being able to generate lightmaps in a minute versus four hours. If you're optimizing those things, certainly go back to data-oriented design and think about how you can pack things tightly in memory, how you can make it very efficient to iterate over your data structures, and how you can make the algorithm efficient. Same thing for large-scale batch operations.
So there are some existing game engine designs out there that focus entirely on data-oriented design. One that has come to popularity fairly recently is called entity systems. There are a couple of articles here; one of the original ones, which I believe
coined the term "entity system," is "Entity Systems are the future of MMOG development." There's also a very popular framework, in Java I believe, called Artemis that a lot of people use to figure out how entity systems work.
The key idea of an entity system is that a component is purely data. It's plain old data, so it fits entirely within data-oriented design; there's no logic in it at all. And the entities, your game objects, are just identifiers. They're not objects floating around holding a bunch of pointers to their components. If you create a game object, you give it a unique identifier, like three, and now you can find all the components that belong to game object three by looking up ID three in the individual component managers.
The system part of an entity system is that the code that updates a component lives in what's called a system. For example, if you have a physics component, you have a physics system. The physics system also manages all of those components: internally it just has an array with all of them, plus some mapping that, given a particular entity ID, finds the corresponding component. That's pretty much it. The systems are designed to be completely separable from each other. Again using physics as the example: whenever a game object needs a new physics component, it says, "Hey, physics system, create a component for game object number four." The system adds it to the array, fills in the data, and it's there. Then every time through your game loop you say, "Hey, systems, update yourselves": physics updates, graphics updates, all in one go, possibly in parallel if you're using that particular threading model. It's really simple; there's not a whole lot to it. If you look up the Artemis code, which is in Java and nice and easy to read, there's almost nothing there. It is
a super simple design. Most of the work they put into it goes into the different ways of managing systems rather than managing components, because the components are so simple, and it can also provide quite a bit of speed in that regard.
However, I dislike the entity system approach, because it forces you to make all of your components in this data-oriented fashion, without any logic in them. There's no way of saying, "I want this component that is nothing but logic; its whole purpose is to implement some virtual functions so we can differentiate between the behavior of this object versus that object." So a strict entity system, to me, focuses too much on performance in places where performance doesn't matter, and removes your flexibility in the areas where flexibility and iteration time are really key.
What you'll find is that most high-end component-based engines take a more conservative approach. They don't use the naive approach where all components are allocated individually; they use some data-oriented design in the components where it actually matters. One of the key things is that you do want to focus on data-only components. For any component that doesn't need logic, that doesn't need methods or virtual interfaces implemented in it, don't put them there. As soon as you have them, you're tied into using this extra layer of abstraction, with all the problems we talked about before. With any component system, try to use data components as much as possible.
Each component type has an associated factory, which is a very similar concept to the system in an entity system. The key thing here is that the factory for a particular type of component can be overridden or changed. So we can say the factory for physics components allocates everything in a memory pool, so they're all tightly packed together and we can iterate over them in a data-oriented way. For some of our other components, say ones we only have one or two of in the system, the factory just uses the standard new or malloc allocator; it doesn't really matter where they end up, since there aren't many of them. For certain other components, maybe we allocate them with a different allocator that tries to put all of the logic components close together in memory, so that we can iterate over all the logic components for one game object in one go, and essentially have one loop over all the logic for the game. That kind of allocator is a little more complex, but not overly so. So the advantage here is that we have a semi-data-oriented design for logic that still keeps memory access a little better, we've got the flexibility to override things when we need it, and we don't even have to worry about performance in cases where it flat out doesn't matter.
So systems manage updating their own components, and the key part here is that only the logic components have an update. Your data components don't have an update; there's no loop that runs through every component calling update, which would be totally unnecessary. Only components that have logic get this, which you can implement in C++ by having two types of components, logic components and data components, and keeping a separate list of just the logic components. So we're using data-oriented design where it makes sense, and more traditional, logic-oriented programming where that makes sense, and that way we get the benefits of both without having our hands tied, locked into one method or the other.
As I said before, certain aspects of your game are probably already using data-oriented design. Even games that don't use a component-based approach whatsoever can still make use of it in certain ways. Take the physics system: if you're using Havok or something like that, it's managing all of its own objects internally. All your component or game object does is let Havok know that it needs to create or update certain objects, and then you tell the physics library to update the state of the world, and it does everything internally, as efficiently as it possibly can. That's kind of an object-oriented approach on the outside: the system is abstracted away behind an interface, but internally it uses that data-oriented approach.
So, a couple of notes to finish things up. First, and most importantly, profile your code. I've seen more than a few entity systems in particular that people wrote without really understanding what data-oriented design is, what its purpose is, or what an entity system was supposed to be for. They made really bad entity systems that still had objects strewn all over memory. They had the managed part, where the system manages all the components of its type, and that makes things easier to update, but it has none of the performance advantages at all; they're giving up flexibility without really gaining anything. So if performance is the reason you're writing code a certain way, actually profile it, and make sure what you're writing is actually faster. There are tools that check cache misses and make sure your data is being accessed in a nice, cache-friendly way; they can give you warnings about data structures or pieces of code that violate some of these concerns. Actually use them. Don't just assume that because you're using a system somebody said was performant, you're actually getting a good one.
Second, and just as importantly, keep it simple. Don't think you now have to go through and rewrite a bunch of your game engine to
be data-oriented because I said the way you're doing your game objects is the naive approach and super slow. If it's not actually causing a performance problem for your game, don't worry about it. Focus on actually making your game, and on more flexible code that makes it faster for you to build up your game, especially with the time constraints we have here at DigiPen. Use the topics we talked about today when you notice a performance problem, or in the systems where it clearly matters, like particle engines: if you haven't written yours yet, take this into consideration when you write it. For the rest of your engine, keep it simple, with the least amount of code possible. Get things up and running, move forward, and make a great game. And that is it.
Any questions? The question is, where do you start with component-based design? We've actually had a talk on that here at the club earlier; the video is online, and I do have some slides up on it. There's also a great article called "Evolve Your Hierarchy," which was one of the original blog posts that went over what components are, why they matter, and what they're all about. There's also a great presentation by an ex-Gas Powered Games developer, Scott Bilas, I believe is his name, though I might be wrong. Dungeon Siege was one of the first commercial games to actually use a component-based architecture, and he wrote a fantastic set of slides and a paper on how it works and why it works. It goes over some of the same topics as this talk, plus how to make components work with multithreading when you're worried about performance, and some ideas on how to build something like that into your game engine. I definitely recommend checking that one out. I don't remember its name off the top of my head, but if you search for Gas Powered Games component-based design, you'll probably find the paper. Any other questions? All right, thank you.
[Applause]
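The talk describes the entity-system layout and the hybrid design it prefers only in words, so here is a minimal C++ sketch that puts those pieces together under stated assumptions: entities are bare integer IDs, a data-only component type (a hypothetical PhysicsComponent) lives in one packed array owned by its system along with an entity-ID-to-index map, and logic components with a virtual update sit in a separate list, so data components never receive an update call. All names are made up for illustration; this is not Artemis's API or any engine's. Removing a component by swapping the last element into its slot is a choice made for this sketch to keep the array packed; the talk doesn't specify a removal strategy. The packed update loop is also the shape of the particle loop the talk recommends.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout (not Artemis or any engine's API).
// An entity is nothing but an identifier.
using EntityId = std::uint32_t;

// Data-only component: plain old data, no methods, no virtuals.
struct PhysicsComponent {
    float x, y;    // position
    float vx, vy;  // velocity
};

// The system owns every component of its type in one packed array,
// plus a map from entity ID to array index.
class PhysicsSystem {
public:
    // Assumes each entity has at most one PhysicsComponent.
    // The returned reference is invalidated by any later create() or destroy().
    PhysicsComponent& create(EntityId e) {
        index_[e] = components_.size();
        owners_.push_back(e);
        components_.push_back(PhysicsComponent{});  // value-initialized to zeros
        return components_.back();
    }

    PhysicsComponent* find(EntityId e) {
        auto it = index_.find(e);
        return it == index_.end() ? nullptr : &components_[it->second];
    }

    // Swap-and-pop: move the last component into the hole so the array
    // stays tightly packed (element order is not preserved).
    void destroy(EntityId e) {
        auto it = index_.find(e);
        if (it == index_.end()) return;
        const std::size_t hole = it->second;
        const std::size_t last = components_.size() - 1;
        components_[hole] = components_[last];
        owners_[hole] = owners_[last];
        index_[owners_[hole]] = hole;  // re-point the moved entity
        components_.pop_back();
        owners_.pop_back();
        index_.erase(e);
    }

    // One tight loop over contiguous data; no per-object virtual call.
    void update(float dt) {
        for (PhysicsComponent& c : components_) {
            c.x += c.vx * dt;
            c.y += c.vy * dt;
        }
    }

    std::size_t size() const { return components_.size(); }

private:
    std::vector<PhysicsComponent> components_;
    std::vector<EntityId> owners_;  // owners_[i] owns components_[i]
    std::unordered_map<EntityId, std::size_t> index_;
};

// Logic component: behavior through a virtual interface, kept in its own
// list so that only components with logic are ever updated.
class LogicComponent {
public:
    virtual ~LogicComponent() = default;
    virtual void update(float dt) = 0;
};

struct World {
    PhysicsSystem physics;
    std::vector<std::unique_ptr<LogicComponent>> logic;

    void update(float dt) {
        for (auto& component : logic) component->update(dt);  // flexible game code
        physics.update(dt);                                     // data-oriented bulk update
    }
};
```

The pooled factory from the talk is left out here; in this sketch std::vector plays the role of the pool that keeps one component type contiguous.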
Info
Channel: DigiPen Game Engine Architecture Club
Views: 42,559
Keywords: DigiPen, Game Engine, Game Development, Sean Middleditch, Data-Oriented Design
Id: 16ZF9XqkfRY
Length: 41min 45sec (2505 seconds)
Published: Mon Jan 07 2013