Unite 2016 - Let's Talk (Content) Optimization

Captions

Okay, hello, welcome to the talk. I'm glad you're all here; glad to see you all made it through to day three of Unite. This is going to be a fairly dense talk, so I'm going to have to speed right through it. I apologize if you're arriving late. So what are we going to talk about today? Obviously, in big letters, it says optimization, but that's a pretty broad area. This is going to end up being something of a grab-bag talk on a number of different topics, but it will focus primarily on content optimization and the code that generates, loads and displays your content. For those of you who have seen the talk that I gave in Amsterdam with my colleague Mark: this is not the same talk, don't worry. If you haven't seen that one, it covers profiling in depth, memory optimization and memory profiling, as well as CPU optimization. It's on our YouTube channel, and I encourage you to go watch it later at your own leisure. Some of the things that I mentioned in that talk are relevant to understanding the stuff in this talk. I've tried to make this one as self-contained as possible, but if you have questions, we'll take them later. All right, quick introduction: who am I? Hello, my name is Ian. I am an Enterprise Support engineer and I cover the European territories. This means that I go to our enterprise customers and help them solve technical problems. Most often I get called in towards the end of a project, and I have to help people get their game running on a low-end device: try to get those last few milliseconds of performance off their frame time, try to bring their memory budget down so they can run on an iPhone 4S or something even smaller. What this means is that I see a lot of problems, and I'm going to talk to you about those problems today. And let me head off one thing that a lot of people say at these talks: yes, profile things before optimizing them. I'm going to speak stridently during this talk, but there are plenty of micro-optimizations that
you should not be applying before you profile. Don't fix problems that you don't have. Make your game fun, make your game work, then make your game fast. But since we're talking about content optimization today, there are some things you need to think about early in your development lifecycle. If you change the way you build or structure your content midway through your project, you're going to piss off your game designers and your artists, because they're going to have to go recreate all the stuff they've already built, and that's not good for anyone's budget. Okay, so what are we actually going to talk about? Four major topics. We'll start by talking about Unity's serialization system: what it is, what data it generates, what gets serialized, where it gets serialized, and how that can affect the instantiation and loading times of your content. From there we're going to talk about the humble Transform component, a very simple and basic part of Unity. Many people don't think too much about it, but the underlying details of how it's implemented are important to the performance of your game. That will actually lead us into talking about director and the Animator component; we're going to talk about one neat trick for making it more performant. From there we're going to start talking about Unity UI. Now, there are a lot of things I could say about Unity UI that would be a talk of their own, so I've selected two specific things that I want to talk about: one specific type of Unity component and pooling that one type of component, and then the performance fundamentals that are key to making your UIs performant in Unity. All right, serialization: let's start there. What is it? What is the serialized data that I am talking about? Well, what I'm talking about is the data that Unity derives when it's trying to make copies or save copies of things that are derived from UnityEngine.Object, or of your serializable classes. Most commonly, if
you open your Assets folder, you will find this stuff in your scenes, in your prefabs, and in any assets you've derived from ScriptableObject. Now again, these are most commonly saved on disk, but they're actually used in memory as well. You see, when you instantiate a copy of a GameObject, Unity serializes the original, recreates the GameObject and component structure of that original object, and then feeds in the serialized data that it temporarily cached in order to create that cloned copy. This is also important because if you are running your game in the editor and you change some C# scripts, you've probably seen that your game breaks. This is because when you change your scripts in the editor, we actually serialize the contents of the currently running scene, cache that off into a temporary data buffer, finish compiling your scripts, reload the Mono scripting domain so we have all of your new code, and then feed that new code all the cached serialized data in an attempt to restore your running scene the way it originally was. But really, this is all kind of background information. What actually gets serialized? Everything. I do mean everything. If you have 100 transforms inside of a prefab, every single one of those transforms is going to be serialized out onto disk. There is no duplicate-data checking in Unity's serialization system; there is no fun, interesting compression algorithm for me to describe. It doesn't exist. And to prove this to you, we're going to look inside of a prefab. So I created a UI prefab. This is a simple GameObject: one GameObject with three components on it. Obviously there's going to be a Transform, because every GameObject has a Transform; there'll be a CanvasRenderer, and there'll be a UI Text component. You can see all that in that magenta box there. Those components are also in the same file; you can see one of them in the yellow box. It's identified by a unique identifier, which you can find at the top of its declaration.
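The clone-by-serialization behavior described above can be pictured with a small conceptual model. This is a Python sketch, not Unity's code: ToyComponent, ToyGameObject, serialize, instantiate and make_parent are hypothetical names, and JSON stands in for Unity's own serialized formats. It shows the two points made in the talk: cloning means serializing the original and rebuilding a fresh object graph from that data, and identical children are each written out in full, so nothing is deduplicated.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyComponent:
    """Hypothetical stand-in for a serialized component such as a UI Text."""
    type_name: str
    fields: dict = field(default_factory=dict)


@dataclass
class ToyGameObject:
    """Hypothetical stand-in for a GameObject: components plus child objects."""
    name: str
    components: list = field(default_factory=list)
    children: list = field(default_factory=list)


def serialize(obj: ToyGameObject) -> str:
    # Every field of every component of every object is written out verbatim;
    # nothing checks whether two children hold identical data.
    return json.dumps(asdict(obj))


def _rebuild(data: dict) -> ToyGameObject:
    # Recreate the object and component structure, then fill in the fields.
    return ToyGameObject(
        name=data["name"],
        components=[ToyComponent(**c) for c in data["components"]],
        children=[_rebuild(child) for child in data["children"]],
    )


def instantiate(original: ToyGameObject) -> ToyGameObject:
    # Conceptual clone: serialize the original into a temporary buffer,
    # then build a brand-new object graph from that buffer.
    buffer = serialize(original)
    return _rebuild(json.loads(buffer))


def make_parent(child_count: int) -> ToyGameObject:
    # One parent with `child_count` identical children, one component each.
    return ToyGameObject(
        "Root",
        children=[
            ToyGameObject("Child", [ToyComponent("Transform")])
            for _ in range(child_count)
        ],
    )


# Usage: a label whose text field holds a large string.
label = ToyGameObject(
    "Label",
    components=[
        ToyComponent("Transform"),
        ToyComponent("Text", {"text": "lorem ipsum " * 1000, "rich_text": True}),
    ],
)
clone = instantiate(label)
print(clone == label, clone is label)  # True False
print(len(serialize(make_parent(10))), len(serialize(make_parent(100))))
```

In this model the hierarchy with ten times as many identical children produces roughly ten times as much serialized data, and the long text string is copied in full on every clone.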
That identifier is the number after the ampersand; it's how we uniquely identify this specific asset inside this specific serialized file. We can also see that it's of type MonoBehaviour; it's right there in the text. But if I scroll down and actually look at what gets serialized to save that MonoBehaviour, well, the first thing is: we all know that MonoBehaviours run C# scripts, so we have a reference at the very top, in magenta, to the C# script file in your Assets folder. Because the C# script file is separate from the prefab itself, we have to refer to another file inside your Assets folder; that's what the GUID up there is, and it's the same GUID that you'll find if you open up the .meta file in a text editor. So between that GUID and the file ID, that unique number, we can refer to any asset robustly on disk, regardless of whether you change it, as long as you keep the .meta file in the right place. Beyond that, we can see all of the data serialized by the C# script itself, all the stuff we need to restore the public fields and the serializable fields in this MonoBehaviour; it's in that yellow box there. Because this is a UI Text object, we have a reference to a font object, and we also have some of the other font data: the size of the font, the state of the Rich Text checkbox and so on. You can also see at the bottom there that giant chunk of lorem ipsum that I pasted into my inspector. Actually, I pasted in five paragraphs of lorem ipsum, but I wanted this slide to be readable. The important thing to take away from this is that all of that data is inside that serialized file; all five paragraphs are in there. Maybe that's not so bad when you're only typing in a few characters, but if I pasted War and Peace into my UI Text object, that's going to make a very large prefab or a very large scene indeed. And this is important because all the data that is serialized, even
into these files in your Assets folder, will also be present when we build those into binary and save them into AssetBundles, into the Resources system, or into the scene files. All of that data is included. It is packed into what Unity internally calls a serialized file. You may not have heard that term before, because it's not a public API; it is entirely internal. But it is the core system that scenes, Resources and AssetBundles are built atop. So what is a serialized file? Effectively, it's a binary pack file. We take each of the assets that we're going to place into this AssetBundle or into the Resources folder, serialize them as binary, concatenate all those binary data blobs together and just slap them verbatim into the file. We may apply some compression if you're using compressed AssetBundles, of course. Because there are multiple assets inside of this serialized file, we have to include some meta information. Obviously there's an index that says, okay, this asset is located at such-and-such byte offset inside my binary data blob. But there's also a type tree. A type tree is a data structure that we use to figure out where the different properties are inside a specific type's serialized data. So if I'm serializing a UI Text object, I can use the type tree to ask where the Rich Text property is, and the type tree will say: one, the Rich Text property is a boolean, and two, if you're loading a UI Text object, it will be located at byte offset XYZ from the start. This is important because while you're developing your game, you probably have plenty of serialized assets on disk: plenty of scenes, plenty of ScriptableObjects. But you're also iterating on your code. If you change the code backing a ScriptableObject, you don't want your game designers to have to recreate all that content; we still want to be able to decode that content, and Unity uses the type tree structure to do that. Now, this is
actually not done at runtime: so long as the type tree that was used to build a serialized file is the same as the type tree that is being used to load that serialized file, Unity will just feed the binary byte stream directly to the serialization system, and we won't have to do any property-by-property decoding. That is actually the only reason this is performant. Now, this is all very nice, but what's important here is that everything that you put into one of those serialized files needs to eventually get loaded if you want to restore its assets into memory. Every single one of those bytes that I serialized out in that UI Text object is going to be in that serialized file, and when I want to restore that UI Text object, I have to read every single one of those bytes back out. Unity just opens a file handle, jumps to the appropriate offset, creates a byte stream and feeds that to our serialization system. I'm seeing a lot of thousand-yard stares right now. A lot of you are probably wondering: okay, this is all nice background information, but why is it relevant to me? I'm going to tell you a story, a story of three little code paths. We're going to talk about how to populate your scene with content. There are several different ways of doing this, but we're going to create a very simple GameObject structure: one parent GameObject and 10,000 children, and each of those 10,000 children is going to have one MonoBehaviour attached to it. One way you could do that is manually: you could create each GameObject with new GameObject, attach the MonoBehaviour with AddComponent, and reparent it. I don't think anyone in this room would opt for that approach. Instead you might say, okay, I'm creating the same structure the entire time, so I'll create it once, save it off as a prefab or as a scene or put it on disk somewhere, and then I'll just load it from Resources and instantiate it. Well, that's a lot less
code; that's only two lines of code, and it seems to accomplish everything I need to do. This is what I see when I visit customers: I often see people putting large chunks of data into a prefab and then loading that up out of Resources or out of an AssetBundle; it doesn't really matter which. But there's a third way. I think quite a number of you noticed something when I was creating those GameObjects: every single one of those child GameObjects was identical. So you might think, okay, I'm going to create a single small template prefab and clone that template prefab 10,000 times. Now, on this slide I've actually just constructed the template manually once and then cloned it. The performance characteristics really don't differ whether you're loading this template off disk or creating it, because this is such a small example. Okay, let's take a little poll, show of hands. Who thinks the first way, where I'm creating 10,000 GameObjects, is going to be the fastest? One guy. Thank you, sir, for your courage. Okay, who thinks that loading that 10,000-transform prefab off of disk is going to be fastest? Also one or two people. And who thinks cloning is going to be the fastest way? I can tell this is an American election year, because less than half of you are voting. It was a trick question, I'm sorry: the loading-time characteristics actually change depending on the platform you're using. In the editor, it appears to be faster to clone the one large GameObject than to clone 10,000 small ones, and of course both are much faster than creating them from scratch. But whether I am on a powerful 64-bit Windows machine with an SSD or a crappy old iPad Air 2, cloning those 10,000 template objects suddenly becomes a heck of a lot faster. Okay, let's dig into this: why is this happening? So I took the iOS case and I pulled out each of the
three operations that we're performing: the spawning of GameObjects, the adding of components, and the reparenting. You can see the results of this performance test here. It's formatted a bit strangely, so let's dig into this a little further. Okay, what's going on in the creation case? We can see immediately that all of our time is going to adding the component. This is because when you call AddComponent, we have to do a whole bunch of lookups to figure out exactly which class you want us to create and add to your GameObject. We also have to spawn a backing class for that, a C++ backing class. So most of that time is going into looking through those lookup tables, trying to find out which component you want. Now, the MonoBehaviours I used for this example were empty; they did not have an Awake callback. If they did have an Awake callback, and if that callback had a significant amount of code in it, you would also see all the Awake time showing up in that AddComponent column. You've probably also noticed that the reparenting column is about ten times slower than in the cloning case, but I'm going to come back to that, so hold that thought. Okay, what's going on with cloning? Why is cloning faster, and why is it formatted so strangely? Obviously, when you clone GameObjects and call GameObject.Instantiate, at first blush all the time seems to end up inside GameObject.Instantiate itself, because it's cloning the GameObject. But I took this into Instruments, pulled out all the different lines of code and figured out what they were doing, and it turns out that each operation actually becomes faster. Creating the GameObjects is about 10% faster. Creating the components, that is, cloning the components without that lookup, is nearly ten times faster than doing it manually. Similarly, reparenting is also about an order of magnitude faster; I'm gonna go into that in a
little bit. All right, so what's going on with the loading case? Why is there such a giant chunk of time in the remainder block there? Well, the remainder block includes all of the disk I/O. Remember, I have 10,000 copies of the same Transform and the same MonoBehaviour serialized into that file. That's going to create many hundreds of kilobytes of prefab on disk, and I have to read every single byte of that prefab off of really, really slow storage. So in this case, when you have slow storage, which is the case on most devices, your loading time drowns out the costs of all the other operations that you're performing. You're probably also wondering: what was up with the editor? Why was it faster in the editor to load and instantiate all that content? Well, the Unity editor cheats: it keeps many of your assets loaded while it's running. Since we don't have to read anything off of disk, we just make a copy of those 10,000 GameObjects, and the difference there is simply in the number of method calls. As I talked about extensively in the Amsterdam talk, each method call you make in C# has a small amount of overhead associated with it. Further, GameObject.Instantiate is not implemented in C#; it's just a C# front-end method that passes down into native code. That transfer of control from managed code to native code is called trampolining, and it imposes an additional small amount of overhead. On a call-by-call basis that overhead is almost insignificant; however, when you scale up to cloning 10,000 objects, suddenly performance degrades by 30 percent. Okay, you could also think this is not a fair comparison: why did you load that thing synchronously? You could've just pushed all the disk I/O onto a worker thread. That's true, but if I was loading a scene, if I was loading a level of my game and the user was stuck on a loading screen, they're still going to be waiting. This is not to
discount the importance of asynchronous loading, of course. If you're streaming content in, if you're streaming in parts of your level or parts of an open world, then obviously moving all the disk I/O onto a worker thread is very, very important for reducing the stuttering you will see while streaming that content in. But perhaps you're also on a mobile device without a whole lot of memory, and you've got a lot of full-screen UIs for things like a store page that your users aren't going to access all the time, so maybe you want to stream-load those in as well. Even if you load them asynchronously, that could take quite a long time, especially because you don't know the user wants to load this screen until they hit that button. If it takes a millisecond to load your UI hierarchy off of disk, users are still going to perceive your game as being slow. Also, think about those UI prefabs for a second. Unity UI encourages game designers and artists to use a lot of GameObjects and Transforms to achieve different layouts and to structure their UIs. In any busy UI you've probably got dozens if not hundreds of GameObjects, and therefore their Transform components. Those hundreds of Transform components, as well as the UI components, are going to produce thousands of serialized properties, and every single one of these is going to exist in that file on disk. This is often why UI prefab loading is slow: you have so many objects inside of it, and many of them, especially if you've got a scroll list or something, are repeated template objects. Okay, so what can you do? Obviously I would love to be able to tell you that nested prefabs are available, but they're not. However, I have seen many, many people create their own solutions for this, their own templating and cloning systems, where they take pieces of their game, pieces of their UI, create them as template
game objects, and just load those on demand and clone them repeatedly. And if you're on a memory-generous platform, obviously you can load plenty of other things ahead of time. There's another thing you can do, though. When I see people structuring their game content, they often put game design parameters for enemies and for level pieces on MonoBehaviours. Then, if you have multiple copies of that enemy in your scene, you're going to have multiple copies of that enemy's data in the scene file, and we're going to load all those identical copies once again. So if you didn't want to just template those and clone them, another thing you could do to reduce the size of your scene files or your prefabs is to pull that configuration data off onto a ScriptableObject. Now, instead of serializing 100 enemies with 100 properties each, creating 10,000 properties to load, you have one asset with 100 properties and 100 references to it: we've gone from 10,000 to 200 properties to serialize. This can be quite important if your levels are large or if you have large repeated portions of your game content, and of course it's also good for your designers. If you're interested in ScriptableObjects, my colleague Richard Fine gave a talk on them this morning. I understand many people may not have been able to make that; however, that talk should already be on YouTube. He gave a similar talk at Unite Amsterdam, so I encourage you to go watch it. All right, let's back up for a second. Remember this reparenting column: what's going on there? Why is it ten times slower to reparent things when I'm creating GameObjects than when I'm cloning them? Take a look at the code that I used to create the two transform hierarchies. In one case I created a new GameObject, which spawns a new GameObject in the root of the scene with its own Transform component. I did some other work there too, sure.
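The reparenting cost being contrasted here comes from how Unity 5.4 lays out Transforms in memory, which the talk goes on to describe as one contiguous, depth-first-sorted buffer per root. The Python sketch below is a conceptual model of that description, not Unity's implementation: ToyHierarchyBuffer, its doubling growth policy, and the move and reallocation counters are all assumptions made for illustration. It contrasts creating an object at the root and then reparenting it (an extra, short-lived root buffer) with instantiating directly under the parent, and shows how reserving capacity up front, the role the talk gives to hierarchyCapacity, avoids repeated grow-and-copy steps.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHierarchyBuffer:
    """Conceptual model of one root's contiguous, depth-first-sorted buffer."""
    capacity: int = 4
    nodes: list = field(default_factory=list)  # (name, depth), depth-first order
    moves: int = 0          # elements copied or shifted so far
    reallocations: int = 0  # times the buffer grew and was copied

    allocated = 0  # class-level count of buffers ever created (not a field)

    def __post_init__(self) -> None:
        ToyHierarchyBuffer.allocated += 1

    def _ensure_capacity(self, extra: int) -> None:
        needed = len(self.nodes) + extra
        if needed > self.capacity:
            while self.capacity < needed:
                self.capacity = max(1, self.capacity * 2)  # assumed: doubling
            self.reallocations += 1
            self.moves += len(self.nodes)  # copy old contents into new buffer

    def insert_subtree(self, index: int, subtree: list) -> None:
        # Keeping depth-first order means every node after `index`
        # shifts down to make room for the inserted subtree.
        self._ensure_capacity(len(subtree))
        self.moves += len(self.nodes) - index
        self.nodes[index:index] = subtree


def create_at_root_then_reparent(parent, index, name, depth):
    # A new object lands at the scene root, so it gets its own buffer...
    temporary = ToyHierarchyBuffer(capacity=1)
    temporary.insert_subtree(0, [(name, 0)])
    # ...then reparenting copies it into the parent's buffer, and the
    # temporary root buffer is thrown away: wasted work.
    parent.insert_subtree(index, [(name, depth)])


def instantiate_under_parent(parent, index, name, depth):
    # Passing the parent up front: the node goes straight into the target buffer.
    parent.insert_subtree(index, [(name, depth)])


# Usage: append 10,000 children with and without reserving capacity first.
grown = ToyHierarchyBuffer(capacity=1, nodes=[("root", 0)])
reserved = ToyHierarchyBuffer(capacity=10_001, nodes=[("root", 0)])
for i in range(10_000):
    instantiate_under_parent(grown, len(grown.nodes), f"child{i}", 1)
    instantiate_under_parent(reserved, len(reserved.nodes), f"child{i}", 1)
print(grown.reallocations, reserved.reallocations)
```

Appending at the end shifts nothing in this model, while inserting into the middle of a hierarchy shifts every later node; that is the same effect the talk attributes to moving pooled objects under "folder" Transforms.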
I think we can ignore that part, but then I reparented it into its intended place in the hierarchy. When I was cloning things, I used an overload of GameObject.Instantiate that accepts the parent Transform. Now, some of you are probably thinking: oh, this is method-call overhead again, you already mentioned that. No: this actually has to do with a low-level implementation detail of how Transforms are laid out in memory. In Unity 5.3 and older versions, there is no defined contiguous structure for placing Transforms inside the native memory heap, so they could be anywhere. Regardless of how your Transforms are structured, we would have to load data from all over the heap in order to iterate over them. In Unity 5.4 and onward, that changed: each root Transform allocates a memory buffer, and that buffer contains all of its children. That memory buffer is fully contiguous; it is one giant blob, one giant array of memory. It's also compact: there are no gaps in between the Transforms. And most importantly, those Transforms are kept in sorted order. They're sorted depth-first, which means that if you iterate over those Transforms in depth-first order, you are essentially iterating over that memory buffer as if it were a linear array. You're not jumping around inside of it, and you're not jumping around all over the heap. This is the key change, the key to improving the performance of the Transform, at least in 5.4. Why? It has to do with a low-level detail of how processors load data from main memory. When a processor loads a value from main memory, it doesn't just load one value; it loads an entire cache line: 32 or 64 bytes, or some amount depending on the structure of the processor itself. A single Transform component's data takes up less than one cache line, but if all your Transforms are contiguous and you're using them one after another, then you're actually going to use that entire cache line and
reduce the amount of time that your processor spends stalled, waiting for that data to be fetched from main memory. And many processors these days do hardware prefetching, where they load the cache lines following the one you're currently using into their cache, so those lines are already available by the time your code gets to them. This is extremely important for large GameObject hierarchies especially. You can see this yourself with a simple experiment. Create two GameObjects; give one of them ten children and the other one 10,000 children. Put the same MonoBehaviour on both of the root transforms, and only the root transforms, and have it just move the GameObject around in space or rotate it; it doesn't matter which. What you will see is that the GameObject with 10,000 children will be significantly more expensive to move or to rotate, even though you're only changing the data for one Transform. Now, there is a downside to maintaining this sort order as well. If you're inserting things into the middle of a transform hierarchy, since we want to maintain a depth-first sort order, we have to move data down in memory and insert the new stuff in the middle. Also, if that buffer runs out of space, we have to allocate a new one and copy all the old data into the new buffer. You can take advantage of these behaviors, actually. In 5.4 we added this new GameObject.Instantiate overload that I used; kind of cheating, I know. The main difference here is that I don't create a root GameObject: when I spawn a GameObject and pass the parent Transform into Instantiate, the GameObject is instantiated directly into the target hierarchy. Because we never create a new root Transform, we don't allocate a new memory buffer for it. If you just create a new GameObject at the root, yes, there's a small temporary allocation just for that Transform component, since it's a root Transform, but when you
reparent it one line later, that memory gets released again. That's a waste; don't do that. The other thing you can do: if you can guess at or calculate the size of the hierarchy into which you're going to load your content, you can set that hierarchy's capacity manually. Then we no longer have to waste any time expanding that memory buffer each time you exceed its capacity. Just use the hierarchyCapacity property here. I also want you to think about your object pooling code, because a lot of people tend to use Transforms as folders. They like to organize their scenes; it's nice for you and for your game designers to have a clean hierarchy. When you put objects into a pool, people often move them under a separate Transform to keep them apart from everything else and keep things tidy. But since those Transforms are now going to be in the middle of other transform hierarchies, we're going to have to copy things around in memory, and that reparenting has become much more expensive. So don't do it in release builds of your game. In release builds, no one sees your hierarchy; your players don't see your hierarchy, so you don't have to worry about whether it's clean or dirty. Just #ifdef out the code that does the reparenting in release mode with a preprocessor define, or of course you can always have it only do that in the editor. You can also use the Conditional attribute; I'm going to leave that one up to you for homework. You can google it; it's on MSDN. Now, why was this change so important? I think most of the people here saw the keynote, so you can guess exactly what the next slide is: the OnTransformChanged message. Good, I see a lot of nodding. This is a message that Unity presently fires every time a Transform's position, rotation or scale changes. This message propagates both to that Transform and all the components on its GameObject, as well as all of its
children. Now, when I say any time a position, rotation or scale changes, I mean it: any time you assign to one of those properties, or use a method that assigns one of those properties, this message is fired. If you haven't seen the keynote, you may not have heard of this message before, because it's an internal message used by Unity's internal systems. There are plenty of systems inside of Unity that need to be notified when something moves around in world space. Consider physics, for example. Physics maintains its own separate copy of the scene; that's where your rigidbodies and colliders actually live, and that's where physics actually does its work. But in the visible hierarchy of your scene you have a Collider on a Transform, and when you move that Transform around, you want that change to propagate over to the physics scene. So that Collider has to know when the Transform has changed: hence the message. So a micro-optimization you can currently do is this: if you have multiple changes to position, rotation or scale within a single frame, batch them up. Just do the arithmetic: add up the vectors, combine the quaternions by multiplying them, and apply the result once. Because you reduce the number of OnTransformChanged messages being propagated, you considerably improve the time it takes for us to update our scene. You don't have to worry about no-op changes: if you add up a bunch of stuff and it ends up coming back around to where it started, we will catch that and will not send an OnTransformChanged message. But as they mentioned in the keynote, you could also just wait, because we're going to fix this. The transform system is going to change again in about Unity 5.6, which is currently the latest date that I have on it. We're going to change to a dirty-bit system, where things like colliders and the rendering bounds system are simply notified that their position has become dirty, and when they actually
need to know that their position has changed, when they actually need to do their own calculations, they can just load the data at that one point in time. This makes changing large transform hierarchies considerably cheaper. But now think about Animators. I took this model from our wonderful R&D content team, and when I pulled it into my scene it created 80 Transforms. I think most people in this room are aware that every time you pull in a model with a skinned mesh renderer, by default it creates one Transform per bone in the animated skeleton. Those Transforms will be updated once per frame as long as there's an Animator running on that model; this shows up under the apply on animator move heading in the Unity profiler. Now, don't worry, the cost of this update only scales linearly with the number of Transforms that you've got: we don't send one message for every single bone individually, we just send one to the root and it propagates down after we've applied all the bone changes. But each one of these Transforms is also going to be serialized into your content. So if you have a model with 100 Transforms and you have 100 copies of that model inside your scene, suddenly your scene has 10,000 more Transforms, and we've already seen what that does to your loading times. It's easy to fix with one checkbox; do this early on. On the Rig tab of the model importer there's this little Optimize Game Objects checkbox. It's very easy to miss. If you check it, the big thing it does is disable this behavior: it no longer creates Transforms for each bone in the skeleton, only one for the root bone. It also allows Unity to reorder the Mecanim animation data to make it more multithreaded; we'll see that in a moment. The other thing is, you probably can't just ignore your entire model hierarchy. If you're making a fighting game, maybe you need your hands, so I can shoot lasers out of my hands, or I need to expose the eyes
so you can put $5 DLC sunglasses on your character's eyes. In that case there's an Extra Transforms to Expose box that will appear; just whitelist the transforms that you need. This exposes only those transforms and not their parents, so it considerably cuts down the number of transforms in your scene that have to be updated and serialized. But I've been talking a lot, so let's go see some more charts. I'm sure everyone is excited for more charts. So, 100 of these models marching: how much frame time is going to be allocated to the animator? This is an empty project, so the animator is most of the frame, but this chart shows just the frame time used by the animator itself on three different platforms. One is a fairly old iPad, one is the MacBook this presentation is running on, and the other is my old PC at home; as you can see, I need to upgrade. What you can see here is that on all platforms, switching to the optimized version considerably improves the performance of Mecanim, by 50% or more depending on the platform, and exposing bones does not seem to have much of a performance impact on how Mecanim runs. Now let's break this down further into the two major operations that Mecanim does. There's one big heading, Animator update, that includes most of the cost of running the animation system itself: evaluating the curves, interpolating them, blending them together, performing any inverse kinematics you might need, evaluating the Mecanim finite state machine if you're using it, and also copying all the bone changes to the transforms in the scene. The other is mesh skinning, regular old mesh skinning, where you calculate where the triangles are based on the positions of the bones. As you can see, the animator update time improves massively, by roughly 50%, on standalone platforms, but not quite as much on an iPad, only about 20 percent,
whereas mesh skinning improves massively: it takes about 1/8 of the time on an iPad, a bit less than a 90% reduction, and we cut more than 90 percent of the time when running on desktop. Why is that? Well, I can visualize that for you. If you go into the Unity profiler, use the CPU profiler and click on the Hierarchy drop-down; one of the items is the Timeline view, and that lets you see what Unity's worker threads are doing. Here I'm showing you what happens in one frame with an unoptimized model. You can see that Mecanim is doing a bunch of work on its worker threads, but then it hits the Dirty Scene Objects heading, which accounts for nearly half of the time; this is a desktop platform, by the way. That Dirty Scene Objects block occurs entirely on the main thread, and what it's doing is copying the bone data to the transforms in the scene. You can also see that all the mesh skinning is happening on the main thread. What happens when I tick the checkbox? Oh, our worker threads are happy now. First, we've gotten rid of most of the transforms that we need to update, so the Dirty Scene Objects heading all but disappears. The rest of Mecanim is still chugging along on the worker threads, but mesh skinning has moved off the main thread to the graphics thread, which is possible because we've optimized the layout of the mesh skinning data, and it's now multithreaded as well. So we didn't actually get rid of all the time we were spending on mesh skinning; we just freed up the main thread to do other things. Okay, let's take a hard left turn and talk about your UI Text components. Who here has pooled large pieces of UI that have lots of text on them? You've probably seen this when you're activating and deactivating your game objects. This is 100 UI Text objects; their content has not changed, and all I'm doing is hammering the game object activate/deactivate
button repeatedly. Every time I do that, there's a 5 millisecond spike on a fairly powerful machine. That's not going to be friendly to your frame rate, and there are only 30 characters in each object. Why does it take so much time? This actually comes down to a problem with UI Text itself. The way Unity UI works, the way canvases work, is that when one of their constituent objects changes, it informs the canvas that the object has become dirty: that sprite needs to be redrawn, that text needs to regenerate its mesh. Normally that only happens when you actually change something, when you change the sprite or the text you're displaying, except that Unity UI does this in the OnEnable callback as well. Oops. So how can you work around this? It's not pretty, not pretty at all, but you can disable the Canvas component. This prevents that UI from being drawn without sending OnDisable and OnEnable messages the way enabling and disabling game objects does. So if you just disable the Canvas component, you will no longer be drawing that UI, and you will no longer be sending those recalculate-your-geometry callbacks to UI Text and sprites that change in the background. The problem, of course, is that all of your MonoBehaviours and all the other behaviours are still going to be running, so you may need to disable those manually. That has some very ugly side effects: it's very easy to have a performance vampire running in a hidden UI, so be careful about that. All right. But the cost of calculating those text objects doesn't just scale with the amount of text. Here I took the same text and put it into one giant UI Text object, and then I progressively split it up: into 100 game objects in the middle and 1,000 game objects on the right. As you can see, performance begins to degrade quite rapidly. Why? Let's look at some deep profiling data. Here is the one game object displaying 3,000
characters. As we can see, most of the time is spent running Text.OnPopulateMesh. That's actually doing the layout, all the mathematics to lay out the mesh pieces for your UI, and that's what we want it to be doing, so this is pretty good. What happens when we have 1,000 objects? Suddenly other pieces of the UI get pretty ugly. You can see that Graphic.UpdateMaterial is now taking 22 percent of my frame time, and we're spending only about half of our time updating the text meshes themselves, so only about half of the time is real, productive work. The rest starts to show up in that yellow box, in the Self column: method call overhead. Every time we have to update one of these text meshes, we're calling not just one method but tens or possibly hundreds of methods, so by adding thousands of objects I've added about a million method calls to a single frame. It's purely the method call overhead that is degrading our performance here, and suddenly that overhead accounts for nearly 20% of our frame time. That's pretty ugly. Now let's take one step deeper and look at how Unity UI actually works and what it was meant to do. This is quite important, because if you understand this, you can understand how to make your Unity UIs performant. I see a lot of people creating Unity UIs in one specific way that guarantees they will have bad frame rates. Guarantees it. All right, so what was the UI canvas meant to do? The UI canvas was meant to allow game designers and artists to create content in their game without caring how it's going to be drawn. The UI system is supposed to do those calculations for you: it takes the set of objects you want to put on the screen, figures out the fewest batches it can draw them in, and then sends those to the graphics card. Okay, but if any one thing on a canvas
changes, if any one drawable component changes (and by drawable component I mean anything visible on the canvas, so any sprite or text object, etc.), the canvas must recalculate its draw calls, because two things could have become interposed, or you might have changed the material a sprite is using, or the font material a text object is using. To calculate those batches, a canvas takes the set of all its drawable objects and analyzes them, and to analyze them it has to know how they're going to be drawn by the GPU, so it sorts them by depth. Every programmer in the room just had a light bulb go off in their head, because they heard the word sort, and you know that scales worse than linearly: it's an N log N operation in the number of things in the input set. So as we add more and more drawable components to our canvases, the time it takes to rebuild those canvases increases. But also, as you add more and more things to your canvases, the likelihood that at least one of them changes in any given frame approaches one. So suddenly you are rebuilding a massive set of game objects every single frame, and that's going to take 5, 10, 20, 30 milliseconds or more. Okay, so why is it doing that sorting operation? Fun fact: all Unity UI is transparent. All of it, always. I know someone's saying, no, no, I only used opaque sprites, the alpha channel is fine, maybe I didn't even have an alpha channel. It doesn't matter; we still submit it to the transparent queue of the GPU, so we have to sort it back to front and draw it back to front. This also means that if your designers like to take lots and lots of sprites and layer them on top of one another to create a complex layout, we're going to be sampling many of the pixels on your screen multiple times. I have seen games that have four or
five fullscreen textures laid one atop the next. Well, there goes five milliseconds of your GPU time; occluded quads are not culled at all. So what can we do? The first thing is, if you do have a lot of stacked objects, you could try merging those layers down. This is usually a bit of a fight, because it makes your UI less maintainable: it's no longer a bunch of components that you've placed on the screen, it's now just a bunch of big textures, and of course that may also blow out your texture memory budget. You can also try merging text objects, but that has the same problem, and sometimes it's harder to contain the updates or achieve the layout you want. So often you can't reduce the pure number of drawables in the canvas. What do you do instead? You split up the canvas. Canvases can be nested one inside the other, and this isolates the children of the nested canvas from its sibling and parent canvases. This does two things. First, it simply reduces the number of things the nested canvas has to sort, so your sorting operation, the analysis that creates the draw calls, becomes a lot cheaper. But also, if you take the things that change frequently, the animated sprites, the text that's being scrolled, and put them onto a nested canvas of their own, then when they change you only have to analyze that small set of objects that always change; all the things that are unchanged will not be re-evaluated. So you've achieved a double saving: you've reduced the cost of building each batch, and you've reduced the number of batches that you need to rebuild. Let's see an example: a stamina bar. Ugly programmer art, I'm sorry. This is the sort of thing you'll find in pretty much any mobile game these days. What's on it? Five components: two text objects and three images. We've got a label, we've got a
clock, we've got a progress bar fill, a progress bar background, and the background of the window itself. When are they going to change? Two of them are going to change pretty much every frame, or at least every second: the fill and the clock. Meanwhile, the background elements are never going to change; once we've built the batches for them, we could in theory just ignore them for the rest of the application's life cycle. So if we split those two groups onto two canvases, we've reduced the set of drawable objects we need to evaluate every frame by 60%, and because the sort scales worse than linearly, that's going to cut the canvas rebuild cost by more than 60 percent. I have gone really fast through this talk, and for that I apologize. There's a lot more information out there: the Enterprise Support team produces best practice guides, which are available on this website here. We have two available right now. One of them is a UI guide, which goes over some of the material I talked about today in much more depth, and it also has a lot more component-specific data, patterns for structuring your UI, and more specifics on how performance changes and how the batch build process works. We also have an AssetBundle guide, which covers the serialization system as well as the Resources system, so if you're interested in those, I encourage you to go find it. We have one more guide coming very soon; it's actually sitting on my computer over here, I just need to copy-edit it, and it will be out in the next few weeks. That is the performance best practices guide, which covers the information I spoke about in this talk as well as in the Amsterdam talk: profiling, transforms, memory optimization, and CPU optimization. Thank you for coming. I hope you've enjoyed this, and I hope it has been at least marginally useful for you.
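The canvas-splitting argument for the stamina bar can be checked with a toy cost model. This is a minimal Python sketch under loudly stated assumptions, not Unity's batching code: it assumes a canvas rebuild costs roughly n·log2(n) work units for n drawables (the depth sort described above), assumes each drawable changes independently with probability p per frame, and uses hypothetical helper names (`rebuild_cost`, `prob_canvas_dirty`, `frame_cost`). It shows why the chance that a canvas is dirty, 1 - (1 - p)^n, approaches one as canvases grow, and why moving the clock and fill onto their own nested canvas saves more than the 60% drop in per-frame drawables.

```python
import math
from typing import List


def rebuild_cost(n: int) -> float:
    """Toy cost of rebuilding one canvas holding n drawables.

    Assumption: the rebuild is dominated by an n*log2(n) depth sort.
    Canvases with zero or one drawable are charged n units.
    """
    if n <= 1:
        return float(n)
    return n * math.log2(n)


def prob_canvas_dirty(n: int, p: float) -> float:
    """Chance that at least one of n drawables changes in a frame,
    assuming each changes independently with probability p."""
    return 1.0 - (1.0 - p) ** n


def frame_cost(canvases: List[List[bool]]) -> float:
    """Total rebuild cost for one frame.

    Each canvas is a list of flags, True meaning that drawable changed.
    Only canvases with at least one change are rebuilt, and a rebuild
    re-sorts every drawable on that canvas, changed or not.
    """
    return sum(rebuild_cost(len(c)) for c in canvases if any(c))


# Stamina bar: label, clock, fill, bar background, window background.
# Assumption: clock and fill change every frame; the other three never do.
label, clock, fill, bar_bg, window_bg = False, True, True, False, False

single_canvas = [[label, clock, fill, bar_bg, window_bg]]
split_canvases = [[label, bar_bg, window_bg], [clock, fill]]  # static + nested dynamic

before = frame_cost(single_canvas)
after = frame_cost(split_canvases)
print(f"one canvas:   {before:.2f} units per frame")
print(f"split canvas: {after:.2f} units per frame")
print(f"reduction:    {1 - after / before:.0%}")

# Even rarely changing elements make a large canvas almost always dirty.
for n in (10, 100, 1000):
    print(f"{n} drawables, p=0.01: {prob_canvas_dirty(n, 0.01):.3f}")
```

Under these assumptions the split layout does about 2 units of work per frame instead of about 11.6, roughly an 83% reduction, because the static canvas is never rebuilt and the dynamic canvas sorts only two elements. Real rebuild costs also include layout and mesh generation, so treat the numbers as illustrative and confirm them in the profiler.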
We have about 10 to 15 minutes, so we can take a couple of questions here. If you don't want to ask me a question on the big stage, my Enterprise Support colleagues and I are going to be in the Ask the Experts room for the rest of the day, so you don't have to stick around for questions. Thank you for coming, thank you for attending Unite, and I hope you enjoy the last few hours here. The mic's here. Please, sir.

Beautiful talk, brother. Performance in Unity, making it run smoothly, making it fast, is just a pleasure, and this knowledge right here is forever. My name is Lex, I run [unclear], and my question is: in your opinion, what is the most optimized VR game or app, and how would you optimize it further?

Actually, I haven't dealt with a lot of VR games, so I'm not the best one to answer that.

All this still applies tremendously. Thank you.

I hope it's going to be useful to you. Once I've tried more VR experiences, I'll get back to you; I'd like to help you out then. Thank you.

Yeah, I'm just curious about the transform-changed callback. A lot of times, when you have a lot of different systems interacting, like IK and animators and physics and NavMesh agents, it can be very difficult to track down why something is moving. Is there a particular performance reason why that isn't exposed to scripts, or could there be some option to have it exposed to scripts, maybe editor-only?

I don't know the architectural reason, but I can speculate. Essentially, I would expect that if we exposed the transform-changed message to script, you could also change the transform in that script, and you could create an infinite loop.

Mm, okay.

An infinite recursive loop. So I would expect that's why; we try to make sure that many of these APIs are as bulletproof as possible.

Okay. I have two questions. First, you mentioned that when you disable and enable
UI elements that are children of the canvas, rather than the canvas itself, it has to do some extra mesh recalculation work. Would there still be an optimization in making nested canvases and enabling and disabling those?

Yes.

Excellent. Do you mind if I ask a question related to render loops, scriptable render loops?

I know only a little bit about that, but I can try to answer the question.

Okay. I was wondering if perhaps in the future we'd be able to change some of that UI logic so it doesn't have to do all of that transparency sorting.

Whoo. I don't think you'll be able to change Unity UI itself, because you'd have to change the Canvas component, and that's not exposed in scriptable render loops. Canvases themselves always submit their geometry to the transparent queue, so I don't think that'll be possible. You'd end up having to build your own UI system, from something like Unity 2D, and I've seen people do that.

Okay, cool, thank you.

You're welcome.

Hi, I have a question about canvas redrawing. Let's say I've got a text chat going on that's scrollable, so that canvas is rebuilding as I'm scrolling. If the scroll were actually moving the camera instead of the text itself, would that be a better optimization? Would that prevent a redraw?

I believe that as long as that camera was just moving, yeah, that would probably work. It may be better, of course, just to isolate the scroll rect onto its own canvas and instead pool the text objects you're using to display the text, because that will actually also use less memory.

Okay, great, thanks.

There's actually a section on that in the UI best practices guide online.

Awesome, I'll check that out. Thank you.

So on the game I'm working on, we're working on our animations right now, and we're trying to leverage both
the Mecanim animation system and tweens, for the stuff we need to handle more programmatically. Generally, what is better practice, what is more performant: using Mecanim, using tweens, or a combination of the two?

That heavily depends on your content. If you're using Mecanim with the Optimize Game Objects option, then usually Mecanim is going to be more performant, because it can multithread all the calculations, all the evaluation of those curves. Whereas when you're using tweens, unless you've specially built your tweening library to do all of its evaluation on your own threads, those are all going to be occupying main-thread time.

Okay, thank you.

Hi. Regarding canvas performance: does it matter whether it's a root object in the hierarchy or a child of other transforms?

The canvas, you mean?

The canvas, yes.

No, it does.

All right, thank you.

Anyone else?

Hey, when it comes to world-space canvases, does a lot of the stuff you covered today change because of, well, it being world space and the dynamism of it? And also, does a world-space canvas become dirty when its transform has changed, and have to go through that whole canvas rebuild?

I don't think so, but I'd have to go check.

Okay, thank you.

Hi, excuse me. You mentioned that if you have a thousand children under a transform and you move that transform, it becomes a lot slower. Is there another way of optimizing that besides just reducing the number of transforms you have?

Not until Unity 5.6.

Thank you.

Sorry, okay. If you add a transform to the end of a hierarchy, is it faster because of the memory partitioning?

Yes, basically you don't have to do any data copies; that's why it's faster.

You spoke about serialization at the beginning. If, for example, you load something from Resources and then instantiate it, in essence isn't that a deserialization cost and then
a serialization and a deserialization cost if you're instantiating it into the scene? Are there plans to have some kind of direct dump into your scene while loading? I think that's a common use case, at least for me and my team.

The interesting thing is, there wasn't until about yesterday. I was talking with Joe and some customers yesterday about this very thing, and they suggested, yes, basically an instantiate-from-disk method. I don't know if it's actually going to get coded, so don't wait for it, but yes, maybe.

Kind of a feature request to add on to that: I would love to be able to instantiate N copies of something, to avoid some of that.

That's a good idea. Can you open a bug on that or something?

Okay. So there are some interesting things, like Optimize Game Objects on the rig, but one of the problems with that is that you may have a pre-existing prefab with the rig baked in, and I don't know if that would update it, right? So the way to get around that would be to cleanly, dynamically instantiate your rig, but then we're getting into the serialization issues. Is there any kind of happy medium between having your data cleaned out, so you can do all these great things and update existing content, and not having slow load times? Or do you just have to keep it all clean and separate at the beginning, and then when you get close to the end, jam everything together and make it unmaintainable but fast?

Generally, you want to use this at the start. I could conceive of some editor script code you could use to update your prefabs. In the editor you have access to the SerializedProperty system, so you could basically recreate your prefabs by checking the types, instantiating them, rebuilding a GameObject hierarchy via script, then using the SerializedProperty system to copy the data values over, and then attaching the updated
optimized model. That would probably be the way to do it. The reason I mentioned this at all is that if you don't use this option early and you begin putting those models into your scenes and prefabs, then yes, you basically have to go reauthor them. I've seen games slip because they didn't use that option: suddenly 50 to 75% of their frame time was going to Mecanim, because they had many, many characters on the screen on a low-end device. When I showed them the checkbox, it would basically have solved all their problems, but they'd structured their game in a certain way: they needed to attach colliders all over the skeleton. Even when you do recreate the prefab, you get a significant performance improvement, so it would have gotten them to 30 or 60 frames per second, but they would have had to recreate literally thousands of pieces of content. So we had to go look for other ways to optimize.

And as a follow-up to that: there's also the thing where you can right-click on an Animator component and flatten the hierarchy. Is that basically the same as optimizing it? I know there are some downsides we encountered doing that. Why are they not equivalent?

It's not equivalent. It is a little bit better; it's the Animator version rather than the rig version. But it's much better to use Optimize Game Objects. Flattening the hierarchy just prevents the transform-changed message from propagating down the entire thing, so we just send one to each of those transforms, but it doesn't get rid of those other transforms, and it doesn't get rid of that giant Dirty Scene Objects block that we saw in the timeline profiler.

Thank you.

This may be covered in your doc, but if you're doing things in the UI animation world, like moving panels, where's the right place to do the movement to incur the lowest cost? Say there's a panel sliding across the
screen: what am I moving? Am I moving the whole canvas from the root, or a drawable object inside the canvas? Which of those is the best approach?

It depends on how you structure your canvas. If you're just sliding a window across the screen, it really doesn't matter. Actually, I think it might be faster to move the root transform of the canvas, but I'd have to check. I don't know whether moving the root canvas itself, in world space or in screen space, actually dirties the canvas; I'd have to go look. But I suspect it does not, so that would be faster if that's the case.

Okay, we'll run a test on that. The other thing: nested canvases and the event system. There's some work you have to do there. Does the guide address best practices on that specifically?

Not really. What you should be doing is turning off Raycast Target on anything that isn't clickable. The way the GraphicRaycaster works is that any time you enable a drawable, clickable object inside a canvas hierarchy, it walks up to the top, to the canvas, and says: okay, add this graphic to your set of clickable objects. It's literally just a list, and when I say list, I mean List<T>, a generic list. Then when the GraphicRaycaster does its raycast, it just walks through that list and does a box intersection check to see whether the point you clicked or are interacting with is inside the RectTransform of each of those graphics. So reduce the number of clickable objects in your scene.

Thank you very much.
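The raycast behaviour described in that last answer can be modeled in a few lines. This is a hypothetical Python illustration, not Unity's GraphicRaycaster source: following the answer's description, it assumes each canvas keeps a plain list of registered raycast targets, that enabling a graphic with raycast target on appends it to that list, and that every raycast does a rectangle-containment test against each entry in order. The names `Rect`, `Graphic`, `Canvas`, `register`, `raycast`, and `build_ui` are invented for the example, and actual Unity versions may filter non-target graphics differently. It shows that the per-click cost grows with the number of registered targets, which is why turning off raycast target on decorative graphics helps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rect:
    """Axis-aligned screen rectangle (hypothetical stand-in for a RectTransform)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass
class Graphic:
    name: str
    rect: Rect
    raycast_target: bool = True


@dataclass
class Canvas:
    # Assumption: a flat list of targets, walked linearly on every raycast.
    targets: List[Graphic] = field(default_factory=list)
    checks_performed: int = 0

    def register(self, graphic: Graphic) -> None:
        """Called when a graphic is enabled; only raycast targets are kept."""
        if graphic.raycast_target:
            self.targets.append(graphic)

    def raycast(self, px: float, py: float) -> List[str]:
        """Return the names of all targets whose rect contains the point."""
        hits = []
        for g in self.targets:
            self.checks_performed += 1  # one box test per registered target
            if g.rect.contains(px, py):
                hits.append(g.name)
        return hits


def build_ui(decorations_are_targets: bool) -> Canvas:
    """One clickable button plus 200 purely decorative images."""
    canvas = Canvas()
    canvas.register(Graphic("button", Rect(10, 10, 100, 40)))
    for i in range(200):
        canvas.register(Graphic(f"decor{i}", Rect(300 + i, 300, 5, 5),
                                raycast_target=decorations_are_targets))
    return canvas


default_ui = build_ui(decorations_are_targets=True)
trimmed_ui = build_ui(decorations_are_targets=False)
print(default_ui.raycast(50, 30), default_ui.checks_performed)
print(trimmed_ui.raycast(50, 30), trimmed_ui.checks_performed)
```

Both layouts report the same hit for a click on the button, but the default layout performs 201 box tests per click while the trimmed one performs a single test, since the decorative images never enter the list.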

Info

Channel: Unity

Views: 42,937

Rating: 4.9917183 out of 5

Keywords: Unity performance, Unity programming, Unity scripting, Unite 2016 Los Angeles, Unite LA, Unite 16

Id: n-oZa4Fb12U

Length: 53min 0sec (3180 seconds)

Published: Thu Dec 01 2016