OOP Is Dead, Long Live Data-Oriented Design

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
good morning my name is Tian and first of all thank you for waking up early to come to this talk today we'll be chatting about data oriented design and I decided to do this talk because when looking at all the resources all the talks all the blogs on data oriented design I was unable to find a good example of people showing a real-world production system made with data oriented design comparing it with object-oriented programming and outside of games so I've been in the video games industry for around 10 years now and I've been doing games I've been doing mostly however game technology I work at a company called coherent labs where we create the technology for games so it's very likely that if you're playing and if you're a gamer your machine has executed some of the code we'll be looking at for the past six and a half years we've been working on products based on chromium WebKit and now our in-house proprietary game UI and the browser engine if you're not familiar with the terms chromium is the open source project that is at the heart of the Chrome browser of slack Skype and many other applications that are probably running on your machines and WebKit is the original project that the core of chromium is forked from and works in in Safari on iOS on Mac so if you're an Apple user you're using it as well so we had this successful project and around two and a half years ago when I was working on optimizing and improving the software based on these technologies the whole time I felt like I was in a cage and that cage was the architecture of WebKit which is also the architecture in many ways of chromium of that of those layout engines so we gather up with my co-founders and did the only sensible thing of course which is to start from scratch leave all that behind and build our own in-house browser engine for games and there was a very big question for us can a browser engine be successful with date oriented design we didn't know that we had successful projects with date oriented design our graphics engine was built with that we knew about a lot of games that have been successful by employing the design but we didn't know of anything with with the scale we were looking at and we didn't know if it scaled to a software that is very object oriented in its core if you look at all those open source projects they're all object oriented and this has many reasons it's part legacy it's part the way that the standard of HTML rendering requires you to do some things so to see how that went let's do a very quick demo ok let's run a test page that I did with chromium this is a build that I have here okay it's a very simple example these are three thousand rectangles all that I do is they move left and right and they changed al-qura their color and at the center there's a big red rectangle whose only idea is for us to gain an insight on the performance so if you're a keen gamer or somebody that works a lot on performance you already see that the frame rate is not so great and the exact number is it's a bit difficult to see it's around 13 15 ish frames per second so to zoom that a bit I'll use a very old-school method of doing this I'll paste it in paint and zoom it and the reason is that the windows magnifier actually reduces the performance of the windows presenters so we will skew our test if we did it with the magnifier so we get fifteen point seven frames per second it's very far from the sixty frames that we would like to have for a smooth animation okay let's see what we did okay if you look down on the on the right we got around 50 frames actually in my hotel room it was 70 but who knows what's happening so we have a significant improvement on the performance and the answer to that question happens to be yes we're successful and we'll see how how that how that was implemented so today's agenda is we'll be looking at the basic issues of object-oriented programming now just glance at the things that we believe can be done better we look at the basics of data oriented design there are a lot of resources I have a final slide which references that you can look up later the idea for today will be to design a specific system to look at a design made with object-oriented programming then to redo it with date oriented design and see the difference between those two what's so wrong with object-oriented programming why people have a problem with it I believe that a picture speaks a thousand words so this is an inheritance hierarchy it's a bit difficult to follow right so you get diamond inheritance and whatever so object-oriented programming at its gist marries the operations with the data they're done on and it's not a really happy marriage because in the end you end up with those giant objects with a lot of unrelated data in them that you poked in different parts of your software if we imagine a game and you have an enemy you might have a data about its physics some data about its AI some data about its rendering and so on but every time you access that object for instance to draw it you're actually polluting your cache and getting a lot of data from the other stuff that you're not using right now and object-oriented programming also has the feature of hiding State in many places and by hiding I mean that you have all these boolean all these hidden ifs in your in your methods that in the end increase substantially the complexity of your code I know those things have a real impact on performance scalability modify ability and testability and this will be the for quality attributes will be analyzing data oriented design takes a different approach it puts the data first you can you can imagine it sits in the title so if we look at the lower side of our slide in object-oriented programming you you usually have this logical entity with all its fields all the stuff that logically belongs to that object in data oriented design we do a bit of a different approach we separate the data according to its usage so we might have data a to see data a with fields a to see that it used that it that is used in system alpha we get data B from fields with from D to F we use that in alpha and in beta and we transform that data to a new collection that it you that is used down the pipeline so the gist of data oriented design is to separate the data from the logic we try to use the data where necessary without polluting our caches and trying to keep our logic simpler so we embrace the data we don't try to hide it we avoid hidden usage whose how it's an interesting technique we try to avoid virtual calls because very often we just don't need them and date oriented design is one of those things that at its core are simple but are very difficult to master a bit like the game of Go it promotes domain knowledge you cannot separate well your data and build a good system if you don't know that data if you don't know the problem that you're solving and the system you're designing and as I mentioned there references at the end of the talk for for additional information so let's design a system and the system I have selected is a system for animations from the browsers browser engines that I showed earlier and to give you an idea about what is possible with with such a system this is a very small example from from one of our samples but it gives you the idea that with animations you can do a lot of things in CSS so the example that we showed earlier for the performance was just moving rectangles and changing their color but you can do more complex things like transformations moving text opacity and so on so you can modify different properties the details of the of the system are not that important it's more important for us to to gain an insight on how to create it the definition of the animation is again very simple we have textual definition we have our key frames everybody who's doing animations know what the key frame is it's I want the animation to start here and then here and has this property here and end here and we have the duration of the animation in a real world system that there are other properties obviously but these are the basics and the others don't change to design that much we already saw that we can modify different types of properties and this will have a real impact on our design this means that when we implement this in C++ we'll have to interpolate different types of C++ structures or objects or whatever we can have numbers we can have colors we can have transformations and so on and another important design is that this definition is not static we want to allow the user at runtime to modify the animations so let's try to implement this with object-oriented programming and for the example today I've selected chromium and I don't know how many people are familiar with the chromium code base but I really really encourage you to go and study it it's an incredible piece of software done by one of the best engineer one of the best engineers in the world so there's a lot to learn of course it's a giant code base many millions lines of code but when you get the gist of it is very cool so chromium actually has two animation systems we'll be looking at just one of them the other is similar in its in its design so all the things that I'm talking apply to that as well and it very closely follows the HTML standard so that's why probably they have this object-oriented design in the first place as its object oriented and we are creating animations what do we expect to have a class called animation right that's that's exactly what I looked up in the code there is a class called animation that's great but wow it inherits six other non-trivial classes well at least it's final so we we know that there are no others down the line and you remember the the picture that I showed at the beginning about the inheritance hierarchy it actually it's actually the inheritance hierarchy of this class so hope you get a lot of a lot of inheritance it's already difficult to follow right ok so to gain a better understanding of the system we'll be looking at the most common operation which is ticking or updating an animation that is from the time and the definition to generate the new value for this frame of the property that you're that we are changing the new color or the new position of our elements and we'll delve into the code of chromium it starts very as we would expect there is a method called service animations it takes time at the second line it takes the current time and it calls update on every animation pointer that is in our collection there is a collection of animations needing update so clear so far however we already see some some strange things happening they are not walking over the animations needing update vector they are copying all the pointers and walking over and iterating over another temporary vector and the reason for that is that in this call stack but also probably in other call stacks they are deleting from the original vector so we have some unclear lifetime semantics it's very difficult for us now to know when those animations get created and most importantly when they get deleted if they're writing this code there is a reason for that so let's look at the update method what what's happening right there fairly simple it takes the reason but the very first line is the hidden state I was talking you about if there is no time line which is some point or whatever they return false so the animation will do nothing in some condition but we're already paying a big cost just for this if because the animations are allocated on the heap so we're just touching that field just to see if it if it's set or not and we are paying for a cache miss and we might be having a lot of branch mispredictions as well because if we have a good mix of active and inactive animations the branch predictor will probably get confused so we are paying some formance cost and we're exponentially increasing the states and the complexity of our system okay so if we have this content which is another condition what do we do we delegate actually the animation to another class which is held in the Declaration of our animation as a member so chromium this garbage collection in C++ but you can think of this member type as a shared pointer so they are we are again paying the cost of a possible cache miss and this animation effect read-only is actually the definition of our animation all right so I'll be skipping some of the cost acts I had a lot more slides but had to have to remove them for time constraints will be skipping the cost acts and looking just the interesting parts next up we're going to update our value so we skipped a bit and we see some interesting stuff that happens a lot in object-oriented programming in some cases the animations have to call an event happens so if there is an event delegate will call an event condition and here we have brought coupling between our animation system and the event system moreover we have no idea what the on event condition will actually do so it might call JavaScript which will destroy all the hot data that we have in our caches it will have to load all the cold data for the JavaScript execute whatever it's doing and then you have to reload all the data that we just threw away so a possible very heavy performance hit there are probably also some instruction cache misses because we don't know where this code is going in our giant source so we have calculated all this and comes an interesting part that I mentioned at the beginning we want to be able to interpolate different C++ types because our animation could be on a color be on on a number whatever so however we still want to keep that in our system somehow so how do we do that in in classic C++ classic object-oriented programming we want a method that is different depending on on on what is on the type that we want to interpolate and they have different sizes anyone yep virtual functions an abstract class that's exactly how they do it so there is a class interpolation which is abstract and it is inherited for every possible type that can be interpolated and the virtual interpolate method does the magic unfortunately this type of dynamic type erasure in C++ is very costly to do you have the cost of allocating all those interpolations and in the in our example earlier that's six thousands animation so six thousands of those and you have the very likely cache miss when you call the call the virtual and a very likely instruction cache miss depending on the types of properties you are animating finally we have calculated the new value the interpolation that is magic we have to apply that new value to the properties to the objects that are being animated and again we do it in a very straightforward manner we have a member a member variable target which is the element that will receive the new value and again we bring some coupling between the system's because now our animation system whose only job was to calculate new values according to time knows about the document object model about the styling system so that it can call the appropriate method of the element in our case this method is called set needs animation style recalc and if we look at the implementation we'll see something very very scary this is again a symptom of Deut oriented programming I believe where you come a method but as it is a black box you have no idea what's going on so you might be you might end paying a very big cost on that and we are indeed paying a because because set neat style recalc which is called there actually walks up the document object model tree for every element and sets a boolean and on each call up this tree will very likely be getting a cache miss so we were doing simple animations now we're walking up the Dom tree we're again changing the context of our program so to recap very quickly to do our animation we've used more than six non-trivial classes the objects contain smart pointers to to the other systems they are coupled with the other systems from the interpolation we use abstract classes and we're basically drinking the poison of object-oriented programming all the way all right now let's try do it in with data oriented design let's clear our mind forget what we saw and return to the drawing board so let's design our system around the most common operation and this is the operation of ticking of updating the animation it happens like 99% of the time we have other operations obviously like adding removing posing an animation whatever but the most important one the one that defines our performance is the pic for this tick we have two very simple data as pieces of data is input we have the definition obviously of the animation and we have time and as output if you think about it we also have some very simple piece of data we need the properties that have changed we need the new property values and we need a mapping the elements that we receive those properties and again as a cornerstone of data oriented design we we'll work on this system like there are many animations with object-oriented programming and and the thing that we saw earlier it works as if there is just one animation so you call update on the animation it does what it does but this is not what's actually happening you don't have one animation you have dozens hundreds even thousands so how will our tick work oh it can be done very simply let's create a class called animation controller or whatever name we find interesting it will contain an array of our animation States it will take time as input those states are generated from definitions and it will output exactly what we need an array a vector of the changed properties an array of the elements that have changed with the links back to to the new properties and here the element pointers are just dumb pointers for our system it doesn't know what they are it doesn't directly call to the document object model with that data as output we can leave it to the next systems in our pipeline to decide what to do so the events system can walk these changed properties these changed elements and decide to call the events who are necessary the styling system can walk over these elements and decide what to do knowing their new properties so we've reached a lot of decoupling a lot of separation of concerns how do we build those animation States well it's very easy again go flat forget about methods just have plain old data style struct we told the data that we need the details are not important this is the runtime data that we need so the start time the post time of the animation and here's a bit of a twist we also have all the animation definition copied for each animation state and we did this for two reasons the first one is that empirically we found out that the is better this way and second it makes some operations very easy imagine you want to change the duration of a certain animation if the definitions were shared pointers you would have to implement some sort of copy-on-write or other idiom - - to do that but now that they are separate you change just the property for that animation all the others are completely unaffected so think about how that might help us for instance doing this in multiple threads okay so we have these properties however we haven't solved the problem of having to interpolate different types of different C++ types remember they were doing it with the abstract class but we don't want to pay the cost for that so to avoid type erasure C++ has a very good way to do it statically we can use a template we can use a template because we know exactly all the types of properties that we'll be modifying there are no new ones so we now have different different types templated by the animation keyframe and we would like to iterate over them to run the animation however obviously they have different sizes in memory so we cannot put them in a nice neat vector and walk walk it over but as I said we know each one of them so what we can do is very simple let's have a vector for each type now they are the same size so we have as many vectors as types as needed and when we want to interpolate them it's very simple we go over all the types a all of the vectors of that type then we go to the second one then we go to the third one so we are now having an order of magnitude less cache misses than we had with the object-oriented case the reason is that we'll be having a cache miss when we change the type of the vector we are iterating on however and this is again a bit of the main knowledge but we know that we usually have a handful handful of properties that are really animated so we'll end up with a bunch of very large vectors that we can linearly iterate and the prefecture of the CPU will do its magic and will will be will not be paying the cost of cache misses and we'll end up with a lot of vectors that are simply antion and we'll skip over them so instead of having one cache miss for each animation as it was in the object-oriented case so when we're actually having one cache miss for each type of animation which is an order of magnitude less actually in the performance example I showed earlier there were 6,000 animations because there were 3,000 rectangles but they were changing the color and the position so these are different animations so in the object-oriented case we got 6,000 cash misses each frame in this case we're just walking over the color and the position so we'll be getting just 2 cache misses probably and on the implementation level it's very easy we can template our tick animation function according to the property and just have the same code running I mentioned that object oriented programming tends to hide a lot of state we saw that with the activeness of the animations so you can have some that are enabled some that are disabled and in the easiest way of implementation you can have a boolean is active or in the case of chromium it was if there is a timeline which is basically the same thing and do if this do that and whatever and in our system the activeness in activeness is the most important state the most important boolean that we have so we can employ something that we call existence based predication we just have two arrays one array is for the active animations one is for the inactive Nations the act of enabling an animation means just moving data from one to the other or vice versa in this way when we iterate and have to apply an operation on our active animations we know that there there is no hidden state we know we can apply the exact same code the exact same transformation on all of them this is a bit difficult to do for every possible state that you have so you can prioritize by the most important by the one that has the most volatility or you can try to reduce the number of states so to look a bit at the code that that's final the details are not that important but if you look at it it's very simple and and in many ways beautiful because there are no ifs there are no branches and you can read it like like a book so I mean we interpolate the key frames we interpolate the values everything works as we would expect and this get interpolated value function that I have highlighted is just a template that will do the right thing and call the right function for the property we're changing and though this code can be safely now in our C++ file it won't pollute with templates all over the place and will also keep the code together very likely in our final executable so branch so instruction cache misses are very unlikely okay however our system is now working it's very high performance we're very happy with it but we want to add an API to the user and we want it to be a convenient API and here object-oriented style of classes with with methods and so on is really really convenient you can give the user just an animation object and she can call play pause and and whatever so we want to give that to the user so let's create our animation object it's now is the time it has all the map that we would like but contains only one simple piece of data and that's animation handle just an ID and the first all the operations to other functions the animation controller so our animation class is very done but it gives this convenient API to the user while we can still rework our internal data and make our system as performant as possible and we're using the simple handle to simplify the lifetime of those objects because now we know that the lifetime is completely controlled by our controller there is no shared ownership over this data and to implement the Dom API in the controller is very simple you just have all the methods that take the ID so to quickly recap the concepts between our P and data oriented design that we saw in the chromium case we were using an inheritance of six classes and all the logic was cramped in this animation object and it's it's links in DoD we did a very simple flat animation state struct templated and we had other methods that worked on that we used a list of dynamically allocated interpolation so abstract classes but in C++ you can do better you can use templates and if you know all the types that you're having you can just have an array for each one of them will be used boolean flags for activeness and again by Flags it's not just a boolean it can be in the presence or not of a pointer whatever and in data oriented design we used different arrays for the different states and to give an API to the user with object-oriented programming we directly gives the the API of the class however our user just wants to play on animation and immediately gets like 100 that doesn't know what to do we solve that by by giving a very clear API only with the stuff that is needed and finally with object orientation you reach out to the other systems in the program directly with data oriented design we just output tables with the new data and let the other guys in the in the pipeline decide what to do with them so the key points of data oriented design try to keep your data flat try to use existence based predication to get rid of this of this state all over the place try to use ID based handles and to reduce the pointers and to simplify the lifetime of your object and try to use table based output because this table based output will actually be the input for the next thing down the pipeline let's analyze how we did okay the first thing we'll be looking at is performance and I'm I'll be looking at the performance of this system it's six times faster so this is this is a hidden treasure for us we did not change the algorithm we did not invent some very clever way to do animations we're executing and doing pretty much the same thing only by reorganizing how the data is used how the code is structured we've gained six times better performance on this system and think about it it's very likely that you have such systems in your code bases as well and the treasure is waiting to be found next up let's let's talk about scalability how could we hypothetically scale those systems and here by scaling I mean a very simple thing making them work on multiple threads well in the object oriented case it will be very difficult because we'll have to account for all those dependencies for the data races that are very probable in the system so we don't know when we say to the target to to change its style if some other tread is doing that we don't know if all the methods we're calling our thread safe and we have to make sure they are another problem with the hidden state is that if we hypothetically solve these issues and run let's say half the animation on core a half the animation core B if they have a very different runtime profile so if core a is lucky and gets a lot of inactive animations then it will be idle for a lot of time we won't have an even distribution of work with data oriented design things look simpler our animation states are completely independent of each other we duplicated some data to achieve that so what we can do is just let's cut our our vectors in half half doors here have those on the other core every one of them outputs its data and in the end in a classic for conjoined fashion we can merge that data together and leave it to the next system what about testability Wow again with the object-oriented case it looks so well difficult I wouldn't be the one that writes unit tests for all this because I would have to mock a lot of things for instance the events the document object model just to test my my animation system moreover I have a lot of hidden state so a lot of ifs and else's and so on so if I want to cover all of them I have this combinatorial explosion of tests that I would have to do in the data oriented case looks again a bit more simpler just because we have really isolated the animation system from everything else we have our very clear input our very clear output so as long as that contract is not breached and the right input gives you the right output you're fine the system works and the other systems down the line have the responsibility to use that output in the correct way for the modify ability such large and complex object oriented hierarchies tend to harden and by that I mean that they become so complicated that the you that the programmer is very much inclined not to modify them so it's very difficult for us to for instance to experiment with changing the layout of the object or moving some data from one class to the other just because everything will break and we have to fix a lot of things so those things are usually left to big refactorings that in the end never happen in the data oriented case as we are completely owning our data and it's very clear we have a lot more wiggle room we could separate our animation stating in to erase our animation could continue to use both of those but we can send part of that data to another system and we can try and reorder some of the functions it's a bit clearer however there is one thing that object oriented programming excels at and this is what I think what I call quick modifications imagine that we add a new state like if it's if it's Tuesday the animation runs twice as fast with object-oriented programming it's very easy to do if choose the do whatever else do the other thing so we add a new state and it just works if we are to keep the pure data oriented design this is a lot more difficult to do because we would have to reanalyze our data we would have to look for ways to keep this state out of our main kernel of the function that is that is running the animation and it will probably take us a lot more time but in the end it will probably pay off obviously the it oriented design is a tool like object-oriented programming is like everything we do in programming is and as every - it has it has its it has its place but it also has some downsides and the biggest downside I believe is that the correct data separation is actually very hard to do I mentioned that earlier but data oriented design is easy to explain but very difficult to master and very difficult in the end to to really achieve what you're looking for and here comes a bit the domain knowledge that is good for you to have if you don't know very well the problem that you're solving it's very easy it's very difficult for you to design the data in in the way that it will be optimal so no your problem is existence based predication is also a bit a bit difficult to achieve if you have a lot of those states because you start having this multi-dimensional arrays with this and this and whatever and my advice there is try to reduce the state if you've done GPU programming there are great examples there you can try and reduce the state by executing the same operation on multiple data or or other techniques just get rid of those ifs and else's quick modifications can be tough that that I mentioned already and we as programmers might have to unlearn a thing of two or two so we have been wired with this object oriented type of thinking since since forever actually so in the beginning especially it's difficult to start and rethink everything in this new framework and unfortunately the language C++ is is a bit difficult to to use in in some cases is not always your friend so should we get rid of object-oriented programming altogether I don't think so it is a two as well it has its its place in in our code bases and yeah it's it's the antithesis of my title but sometimes we have no choice you might be using third-party libraries that use object-oriented programming so you're out of luck you might have API requirements that make you to use object-oriented programming you might have seen that but object-oriented programming does not mean is not a synonym for classes and methods and so on with bate oriented design I still had classes I still had structs it was that I put them first I put the data first and not the operations not the code polymorphism and interfaces have to be kept under control I don't believe that their their role is in a very low-level system like interpolating a couple of numbers in the animations but they have their place in high-level systems in client-facing api's they are amazing if you do a type of plug-in and you real really need some dynamic dynamic polymorphism and remember that C++ has amazing facilities for static polymorphism you can go a long way with templates or just when you know all the types that you have you can you can generate everything actually it works surprisingly well finally are there some changes in C++ that will make make it easier for us to write data oriented design code I have a dream which will probably never come true but there are languages that allow you to change the memory layout of objects very easily and to experiment for instance with array of structures structure of arrays and not having to rewrite like half of your code for that I don't see it coming but we can always hope I didn't show that but in games very often date oriented design goes hand in hand with entity component systems and even if you don't need the whole flexibility of an entity component system it would be great if it was possible for us without paying a big cost to reorder and to split our classes into and this is actually doable in C++ and I'm talking about doing it without a pointer in direction because if you do it with pointer in Direction it's very easy but it requires a lot of custom code that it's it's a bit of a mess for me the ranges proposal looks very exciting I believe it will be a great addition and it will help a lot with some of the constructs especially to make the code more clear more readable and this is a bit of a pet peeve of mine but unordered maps and adored unordered sets are not that great because they do allocations I think that we need in the standard I knew another let's say another hash table that has some relaxed requirements and would allow us to use another scheme of addressing and reduce those allocations so obviously there are a lot of open source solutions we're our own hash table and actually reduced the allocation count in one of our version versions by 10% just by changing the unordered maps and sets with a scheme that usually uses open addressing and this again brings you back to the know your machine and design software for the machine that will run it don't design for a hypothetical abstract machine like it's in the standard because it doesn't exist so to conclude object-oriented programming is not a silver bullet but date oriented design is not one either the thing is that you must use your best judgment they are just tools in your toolbox so they have their places you it's your job to find them thank you we have around 10 minutes for questions so please yeah you're educating sorry you're advocating compile-time polymorphism and I liked it as well and then there are always people who say to me Abbott and the compiler will stamp out a lot of codes and from your experience in also knowing the machine when do you hit problems in that area like there's too much codes so that you get instruction cache yes sir and do you know if - so how I can see whether those problems occur yes this is this is a concern so what we try to do is keep the templates inside implementation files in the animation system this is doable if we see an explosion of templates this is this is a problem that we try to cope to case-by-case so you have two problems that you might explode your build time which is obviously very bad for productivity and the other thing is what you mentioned and we have seen in some cases the compiler producing a lot of binary and increasing our instruction cache misses there is no formula to solve that to be perfectly honest either you have to reduce the templates or you can rely on some black magic and and this is what we have done in some places honestly so if you have the fortune of working on a platform that very easily shows you where the instruction cache misses happen which which we have access to so on game consoles for instance it's very easy to see you can go there and try to reorder your functions or you can try to use link time optimization actually in our experience link time optimization helps a lot with some of those cases but sometimes you're out of luck and you have version one where everything is fine and then you change like a line somewhere in code and you see like 5% slower just because clonk has reordered some stuff and when you're when you're working really close to to the maximum possible formas it's a thing of life I don't have a formula to solve this all the time unfortunately thank you thank you right you've described an architecture for this system that includes data and operations on that data and I'm not familiar with this domain very much but I'll take your word for that it's a good choice but with those structures you may want to maintain certain invariants in the data for example and you could do that with constructors or methods yes and you've also described that for certain parts of your system like the user facing part or the client facing part you may want to have those those abstractions yes and so I guess my biggest question is why what exactly do you consider to be object-oriented programming and why is why is what you've described and not that you guys I think really who's talking about choosing different abstractions for different parts of your system and I don't really think that that contradicts this idea of opie unless you have a different idea of what that is got it yes it's a different way of of solving the same problem so it's the same data it's the same invariants that's true however we have turned a bit around the way it is done because with object-oriented programming what I believe it means and what usually people end up doing is having this object that we saw animation that contains all the data all the invariants and all the operations for for one animation right that's what we saw however this does not map well to the hardware it also does not map well to the complexity of our system because we end up with these giant giant class that knows everything and can do everything let me try to refine my question sure so the title of your talk is Opie is dead yeah and I know you yourself sort of weakened that stance at the end a little bit but what exactly is that is what I'm asking what precisely is is a PITA you want to die the notion of having these giant illogical classes that contain unrelated data and I believe they have to be reworked in different collections of data used by the different systems did that answer your question that's just sort of sense to me like bad engineering versus good engineering well a lot of codebase unfortunately end up with with the thing that I showed because when you have the class animation every developer usually goes and adds a new boolean or adds a new data or adds a new method and you can call it better engineering you can call it however you want but it's it's a fact of life no matter how much you try and definitely the chromium and WebKit are done by some of the best engineers but in the end you end up with this giant inheritance hierarchies and so on hi so if I understand correctly there was a difference of about 15 to 20 milliseconds per frame in your in your initial demo versus to chromium demo yeah do you have a sense of where those 15 to 20 milliseconds for going like we talked a lot about virtual and directions and cache misses but we didn't really talk about how many milliseconds per frame those were taking up just that they were occurring so like for example Chrome's GPU processes heavily sandboxed after it has to actually communicate via IPC just to even read the frame buffer and so that's a potentially dominating source of cycles that obviously your program doesn't have absolutely so my demo ran a bit slower than I was expecting it to be honest but it's probably because the PC is not plugged what I have an idea and this is definitely it has some effect on the performance that's why when I do the final comparison my comparison just I compared just this system the first one was just a way to to engage and and to see that the overall system works and there are a lot of reasons for it to work better the GPU process honestly is probably not the biggest one the biggest one is usually the last cache misses the simpler call stacks I don't know if you're familiar with with the code with the code bases of chromium etc but they have very long call stacks and there are reasons for them to do that III get that but for instance the animation system could be made if it does it just does the same thing and it's a lot faster and the milliseconds are here so six eight and this is on these PC versus demo is simple and yeah do you think that the data oriented design scales well with actual required design complexity like for example you've got probably twenty other subsystems that need to hook into the animation system and modify it or be notified when things happen or run JavaScript code and like I mean maybe your stuff said I don't know but yeah yeah it does that yeah it does that and one one other point on the only performance here we usually see on PC around four to five times better performance and this was an artificial test but I'm talking about real you eyes and so on but this is because this PC has eight megabytes of l3 cache I don't know what that is okay okay if you if you go to a platform that has a lot less cash like a mobile phone or a console the difference in performance goes way up so yeah continuing on the line of what the previous person on this microphone was saying I don't think you're being entirely fair to object-oriented programming and not all implementations of object-oriented programming feature all of those features and also yeah this is a case where obviously the classic approach with object or programming is not applicable actually I want to ask you with about this 6.12 type of CPU is this on this one yeah because honestly I think that it would be even more on an arm CPU oh yeah yeah yeah I mentioned that but well the problem was that I had to build the chromium for arm and I didn't want to do that but definitely if you have less cash on your processor this number goes up and the transfer you're the first part yes I'm sure that there are in the world well written object-oriented systems that they have their place that's that's what I say at the at the end I can see some challenges in debugging [Music] did you manage to solve them yes there are some challenges the biggest challenges we are accustomed to while we we see a bug we go into the burger we have this object we can inspect all the data but when you're doing the Italian design you might know which data belongs to which logical entity and what we usually do is that in the bug we annotate our data so it's easy for us to relate it to other stuff in the systems usually I don't understand this sure I haven't haven't seen seen generic tools I have seen a lot of tools not hoc like the ones that we have just to solve this so we annotate the data we have some some extensions for the debugger to be a bit easier for us to find it I've seen a lot of game companies especially when when things go multi-threaded it's even more difficult to find it a lot of companies have some type of graph that they visualize or or try to do it that way but I'm not aware and I cannot really think of a generic tool that that would really help it's very domain-specific thank you thank you what is the experience is a long term support and maintenance using this approach my biggest concern is that time goes you had more data to the structures uh-huh and well yeah you keep multiple copies so the data there is it'll be like data blowed so in a few years you pretty much has to reorganize all the layers and to adjust them to the architecture of the CPU is using to the caches and pretty much redoing the whole thing yes well we pay attention to what we do in the first place and second I believe that these goes a bit on the on the slide that I said about the things that the systems that tend to harden I believe that it's easier to rework and eventually split the data if it is done in this way then it if it is done in a giant hierarchy of classes so as time goes by requirements change and the system has to change we've had so far very good experience so we've worked for years on chromium's base and for some years on on this stuff and actually the maintenance time and solving a bug and implementing a new feature time is an order of magnitude faster and of course it might be related to the fact that chromium is millions of lines of code ours is very big but it's not that big so it depends I believe that in time this is easier to maintain thank you thank you you talked about that there was read-only common state between all of these classes that you needed to duplicate yes for us all of this yes could you talk about the challenge as to why that couldn't be simply pulled out and read-only States separated from mutable State sure because it's not redone the fact is that the user can modify that definition at any time so one way to do it is to do a copy on right because you might have multiple animation that refer that the other is to copy it so as we have little data it's an empiric finding that we did that it's easier and faster for us to copy the data as it changes that fact of life might change and and you saw in the chromium base you might have noticed that the class was called animation effect read-only it's not read-only they have cost methods that do cost casts remove the cost and change them so that's not true so it's mutable in their state as well unfortunately we are out of time I'll be here so everybody who has questions is welcome and thank you again for your time
Info
Channel: Coding Tech
Views: 130,124
Rating: 4.6943998 out of 5
Keywords: oop, object oriented programming, data driven design, data oriented design, software development
Id: g1TsP60z2OQ
Channel Id: undefined
Length: 60min 46sec (3646 seconds)
Published: Tue Oct 30 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.