Performance Profiling | .NET Object Allocation Tracking Tool

Captions
>> In C and C++ you probably spent a lot of time thinking about memory allocation, but there might be cases where you want to think about those same things in .NET. If you want to learn more, there is a really cool profiling tool called the .NET Allocation tool, which we're going to learn more about on this profiling episode of Visual Studio Toolbox. [MUSIC]
>> Hey, everyone. Welcome to Visual Studio Toolbox. I'm your host, Leslie Richardson. Once again, I am joined by Sagar Shetty, who's a PM on the Visual Studio Diagnostics team. Hi, Sagar. How's it going?
>> Good, Leslie. Thanks for having me.
>> Absolutely. Once again we are back for another episode of our profiling series. What are we going to talk about today?
>> We're going to go through another deep dive for another tool within the Performance Profiler in Visual Studio. Today, we're going to talk about the .NET Object Allocation Tracking tool. A bit of a mouthful, but it's one of two memory profilers in the Performance Profiler suite. It's basically designed to show you the top functions, the code pathways, and the different types that are allocating the most memory in your code.
>> Cool. I don't know if you ever simplify that down to NOAPPS. No apps.
>> Yeah. Normally, we just call it the .NET Alloc tool internally, and that's short enough.
>> Yeah, that sounds cool. Usually, when I think about memory allocation or just allocating anything, I typically think of C or C++, where you have to do all that manually. But in .NET, usually the garbage collector takes care of all that stuff for you, so why have the .NET Allocation tool?
>> Garbage collection, that's definitely a great point, Leslie. Especially with .NET, it's gotten to a point where the garbage collector, like you said, does a lot of that memory management automatically. That being said, profiling at a high level is about getting the optimizations that you want to the highest degree. Even though garbage collection can help you with a little bit of that memory cleanup, there are still optimizations to be had based on the way that you write your code, and hopefully, using our tool, we can help surface some of those optimizations that you can do on top of garbage collection.
>> Can you describe a little bit about what those specific use cases would be for a .NET user who might want to use this tool?
>> Yeah, absolutely. I think the easiest way to explain is to jump right in. Let's go into VS and start looking at some of the views, and I think some of those scenarios will become more apparent.
>> Awesome.
>> Going into VS: again, just like all the other tools, getting into the Performance Profiler is the same workflow. You can go to Debug, then Performance Profiler, or just use the keyboard shortcut Alt+F2. Then we get to this page. To give a little more background on this particular tool: today we're talking about the .NET Alloc tool, so I have this box checked. This tool is going to be good for really any managed scenario or managed application. It's good for all flavors of .NET: Framework, .NET Core, ASP.NET, etc.
>> Works with Framework too?
>> Yeah.
>> Wow.
>> It really is a pretty comprehensive tool for managed code in that regard. For native code, you're not going to use this tool; based on the way it's architected, it uses ICorProfiler, the profiling interface for the .NET runtime, so it really is a managed experience. Go ahead.
>> Is there a similar tool for C++ users that they can use?
>> Yes. On the C++ side, for memory analysis of native code, you're going to end up using the Memory Usage tool. Now, these two tools aren't exactly alike, but that will give you some insights in terms of where your code is spending a lot of memory. Going back to the .NET Alloc tool, another thing I'd like to call out is in this particular settings window: if I click on this gear icon, we come to this window. In the past, we've talked about a few different data collection methods. When Esteban was talking about the CPU Usage tool, for example, something we talked about was sampling, which is essentially a data collection technique where you're taking snapshots of the performance data in your code and stitching them together. With the .NET Alloc tool, you can use sampling if you want; you can switch it over to that. But by default, it uses a slightly different data collection technique called instrumentation. If sampling is like taking pictures and stitching them together, instrumentation is like a video. It really is much more detailed, giving you exact call counts; it's very fine-tuned for precise and accurate data. That's cool because you can get exact call counts, and we'll see some of those values in the example today. One thing I will caution users of this tool about is that, as a result, this data collection technique has a pretty high overhead. To counteract that, one recommendation I have is to keep traces as short as possible. I will reiterate this at the end of the video, but performance is definitely something we're working on with this tool in terms of improving and speeding it up; I can assure you this is something our engineering team is hard at work on. But in general, as far as best practices go, keeping traces short is something I'd recommend with this tool if you want to use instrumentation.
>> What is the default set to, frame-wise, or however you measure how much or how little gets tracked?
>> With this tool, it's going to literally track every single object allocated. As we'll see once we run the tool and go into the views, it will look at the different object types, and for each object type in your code, it will show you how many objects were created and how much memory that total is taking up. So by default, the instrumentation is tracking and grabbing all of those objects, and it's measured at the object level.
>> Sweet. Well, I'd like to see it in action.
>> Yeah. I'm going to close out that window. I have the app loaded up, and I'm going to go ahead and click "Start" and start the profiling session. The app we have today is actually a WinForms app. What this app is essentially doing is helping us visualize prime numbers. It's a pretty simple app: you have a minimum value and a maximum value, and for each value in that range an ellipse is created; each ellipse corresponds to a specific number. Numbers that are prime are indicated by a green icon, and numbers that aren't are yellow. We'll exercise this a few times. Click "Stop". This app does quite a lot of memory allocation, so it's a good way to show off this particular tool.
>> In this case, is an object being created for each number, each dot that was on the screen there?
>> Yes.
We're going to dig into the source code a bit more, but for each ellipse that's created, there's an Ellipse object; we're using an Ellipse class, and an Ellipse object is created for each one. So yeah, a lot of memory is being allocated.
>> Yeah, I can imagine. [inaudible].
>> Ramp up, max up to 10 million.
>> Exactly. Awesome.
>> Now we're looking at the report generated by the .NET Alloc tool, and there are a couple of views here. The first thing I want to point out are the graphs at the top of the screen. The top graph is live objects. This is just a total count of the live objects allocated in your code, across all the different types. In some of the views, when we go down to the tables, we can categorize objects by specific type, but this graph is just showing an all-up total count. That's live objects. The second graph is the object delta, so the change. This shows any time you have large spikes in objects, and then, as we alluded to earlier, Leslie, with garbage collection cleaning up and reclaiming some memory, you have these red bars every once in a while. The red bars indicate where garbage collection is occurring in your code. In general, you can think of the green bars as just adding on more and more objects. At the beginning, you'll see large deltas because that's just the way the math works out, right?
>> Exactly.
>> Yeah, exactly. When you're initializing and you don't have many objects to begin with, even adding a few more objects is a lot percentage-wise, because this is on a percentage basis; over time that tends to dip a little bit. But the really important thing with these graphs, or swimlanes as we call them within the Performance Profiler, is that you can select a time range and filter down by it. For example, I can select a range on this graph, and what that will do is filter down the data in my tables by the time range I selected. So if I was really interested in a bit of garbage collection, or some area where there was a lot of app activity, I could filter down and really dig deeper into that specific time range. Another thing I'll point out, and this isn't as applicable for this tool, but just to reiterate: when you're running multiple of our tools in conjunction with each other, we will stack the swimlanes at the top if you're using a tool such as the CPU Usage tool that has a swimlane. If you're running multiple tools together and you want to filter by the same time range across all of them, doing the filtering at the top will filter all the reports by that time range too. With this tool, you're generally running it by itself because of the high overhead, but I just wanted to call that out again. Those are the graphs, and I'm just going to clear the time selection for now so we're looking at everything all up. Now we dig into the tables, where there are, I think, a lot of insights. Generally, there are a few different ways you can start your investigation. For today, I'll start with the Allocations view. The Allocations view essentially shows you a bunch of different object types, classes, or structures within your code. There's a very long list here, and types like Ellipse have bubbled up to the top. For each one of these types, we're showing you the number of allocations. This is the number of objects of that type, in this case ellipses, created within your code.
Furthermore, in addition to just the number of allocations of that type, we're showing you the actual amount of memory being taken up; that's what the Bytes column shows, across all of those allocations. Then there's also Average Size, which is just bytes divided by allocations.
>> It looks like a lot of allocations are going on for pretty much all of these.
>> Yeah, and we'll dig into Ellipse a little more in the source code in a second. But one more thing I want to talk about as far as the types: generally, they fall into two categories and two subcategories. The two main categories are value types and reference types. Actually, if you notice, this is something we've added over the last few years: we've put icons into this particular view. This blue icon over here indicates a value type, and this yellow one indicates a reference type. So, what are those? Value types are things like, as we see here, a double or an integer or even a Boolean. Whenever you create a variable of a value type such as an integer, a specific memory address is allotted, and that's where the variable is initialized and stored. In the case of a value type, the variable and its value exist at the same memory location. In the case of a reference type, things work a little differently. Reference types are things like strings; a class is a reference type, an array is a reference type. In the case of a reference type, where the variable exists and where its value exists are actually two separate spots. A string might be initialized at a specific memory address, and then its actual value, the contents of the string, is at a different memory address. The memory address where the variable is stored actually contains a pointer to the place where the value is stored. The reason this is important, Leslie, is because of how these two kinds of types are stored in memory within the .NET runtime. If you think about memory, it's essentially like a physical block; it's a finite resource. There's a limited amount of space to store data and allocate memory. The way that memory is managed in .NET is, at a high level, two partitions: the stack and the heap. The stack is generally a partition where more short-term things are stored, local variables, things of that nature. The heap is a partition where longer-lived things are stored in general, so objects and things of that nature. This is oversimplified and high level, but it gives you a sense of where those two things are stored. The reason this matters is that even though value types are stored on the stack, if you have an integer and you cast it to an object, it actually becomes a reference type, so it ends up being stored on the heap as well. Now you have a value type that's taking up memory on the stack and on the heap; in other words, it's taking up twice as much memory. You have to be on the lookout for that with value types, and that's why we surface them with these icons here. So yeah, we have value types and we have reference types. Also, if we go back and look at this table, you may notice, as I mentioned, that there are two sub-types: there are blue icons with buckets under them, and yellow icons with buckets under them as well.
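To make that boxing point concrete, here's a minimal sketch (illustrative only, not code from the demo app):

```csharp
using System;

int n = 42;                 // value type: the variable and its value live together
object boxed = n;           // boxing: a copy of the value is allocated on the heap,
                            // so this number now takes up memory in both places
int unboxed = (int)boxed;   // unboxing copies the value back out of the box
Console.WriteLine(unboxed); // 42
```

Tools like the .NET Alloc tool can surface unexpected heap allocations like the `boxed` copy above, which is exactly why the value-type icons are worth watching for.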
Coming back to those bucket icons: those are value type collections and reference type collections. Again, the blue icon is the value type, so the blue icon with a bucket shows a collection of a value type, and the yellow one with a bucket shows a collection of a reference type. In this case, this type is EffectiveValueEntry, and it's a list of that type. Then here we have System.Object, and it's an array or collection of objects. That's what those icons are showing. Anyway, that's a bit of a tangent on the different types; I just thought I'd go into that. But getting back into the code a little bit. Right now, we have this sorted by bytes, so whatever has the highest number of bytes bubbles up to the top. This is taking up a lot of memory. What if we wanted to investigate this a little more and see what's happening in our code here? If you double-click on this line, what we show you in this right panel (let me resize this window a bit) is a backtrace. After we click on this Ellipse type, we want to look through the backtrace and see where in the code it's being allocated a lot. We see this GeneratePrimes function allocating a lot of memory, a lot of bytes. Now what I want to do is ultimately go back to source code and see if there are any modifications that can be done to optimize this. I can right-click and hit "Go To Source File", and in this case I have the code up, so we come to this GeneratePrimes function.
>> Cool. I'm a little surprised, because some of the previous tools that we talked about in past episodes, like the CPU Usage tool and I think the memory tool and the database tool, all had hot path function tables with little fire icons that indicated, here are the functions you should consider honing in on, because they're hotspots for all the CPU usage or memory issues, that sort of thing.
>> Yeah, exactly. We'll actually touch on that again with this tool. In fact, just a sneak peek: we have that same functionality in the Call Tree window, and we'll talk about that a little later. But yeah, we'll come back to that; for now, let's keep going through the Allocations view. The Expand Hot Path feature is definitely useful, and we do preserve it in this tool as well.
>> Cool.
>> Now, let's see. We're looking back at the source code and trying to see where we can optimize, and see why Ellipse is allocating so much memory. We have this GeneratePrimes function, and we have these long values, the min and max that we saw in the application before. If we scroll through here, we see a for loop going from min to max. Within this for loop, we have an Ellipse, which in this case is a class, so we're creating a new Ellipse object for each iteration of the for loop. That's why, when we ran the application before, there were a lot of ellipses.
>> Yeah.
>> One thing I will point out here is that, based on the nature of this particular application and visualization, even though we are allocating a lot of memory for ellipses, we actually do want each of those ellipses, because we want an ellipse drawn for each number. So yes, it is taking up a lot of memory, but you have to think about: is this an actual bottleneck, or are you willing to live with this?
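The loop being described is roughly of this shape. This is a reconstruction from the transcript, not the actual sample source; the `canvas` field and `IsPrime` helper are assumed, and although the episode calls it a WinForms app, Ellipse and SolidColorBrush are WPF-style types, so that's what the sketch uses:

```csharp
using System.Windows.Controls;
using System.Windows.Media;
using System.Windows.Shapes;

class PrimeVisualizer
{
    Canvas canvas = new Canvas();   // assumed host for the ellipses

    void GeneratePrimes(long min, long max)
    {
        for (long i = min; i <= max; i++)
        {
            var ellipse = new Ellipse        // one heap-allocated Ellipse per number
            {
                Width = 10,
                Height = 10,
                // a fresh brush is also allocated on every iteration,
                // which the Functions view will surface later
                Fill = new SolidColorBrush(IsPrime(i) ? Colors.Green : Colors.Yellow)
            };
            canvas.Children.Add(ellipse);    // keeps every ellipse reachable (live)
        }
    }

    static bool IsPrime(long n)
    {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }
}
```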
If we wanted to improve this, there's only so much we could do to get around this particular issue, based on the way we've currently implemented this code. Right now, yes, even though we are creating a new Ellipse for each iteration of the for loop, we actually want to do that, and paint it a specific color based on whether it's a prime number or not. If we wanted to not have to use an Ellipse, we'd have to think about how to restructure this code very differently, and that might not be very easy. We probably wouldn't use an Ellipse class; seeing as the height and the width are the same, maybe we'd use a circle or something different. Even though we could fix or optimize that, it's not what I want to focus on for this particular demo, because ultimately it might be a bit more involved. A question I might have instead, though, is: this function seems to be doing a lot of work. Yes, a lot of the Ellipse type is being allocated and created, but are there other types within this exact same function that are also being allocated a lot, and is there another optimization to be had? To answer that question, I want to go over to another one of our views within the diag session. The first thing I want to do is copy this function name, because I want to investigate it more. The question I want to ask now is not so much which types are being allocated all up, but, for that function I was just looking at, what are the top types that it is allocating? We actually have a Functions view that can help you do just that. The Functions view shows you similar data to what we were looking at before, just grouped and visualized a bit differently. Something I want to emphasize about the Allocations view, the Call Tree view, which we'll look at shortly, and the Functions view is that you're looking at similar data, just grouped a little differently. It's like a slightly different pivot table, if you will. In the case of the Functions view, we have the process ID up top, then within that we have different modules, and within modules we have specific functions. Now, I had a function in mind that I was interested in. We have a search bar here, so I'm going to paste GeneratePrimes in and hit "Enter", and it brings me straight to that function of interest. When I come to this function, if I expand this particular node, I see all of the top allocation types for this specific function. We have the total allocations for this function, and we have the self allocations, which is the number of allocations that just this one function is doing; total includes what this function allocates plus all of its children. Then we also have the self size in bytes, the actual amount of memory. If we dig into the types, once again we see that Ellipse is at the top; there are a lot of ellipses being generated within this function. But if we look a little more closely, Leslie, there are other types that are also being allocated in quite high quantities, even more so than the Ellipse. We have 30,000 Colors being allocated and 30,000 SolidColorBrushes being allocated. They don't take up quite as much memory as the ellipses, but they still take up a sizable amount: over a million bytes for each of these.
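To make the self-versus-total distinction concrete, here's a hypothetical sketch (these functions are invented for illustration, not taken from the sample):

```csharp
class AllocationCounts
{
    // "Self" allocations are the ones a function performs directly;
    // "total" also counts everything allocated by its callees.
    static object[] MakeOuter()
    {
        string inner = MakeInner();    // counts toward MakeOuter's total, not its self
        return new object[] { inner }; // the array is a self allocation of MakeOuter
    }

    static string MakeInner()
    {
        return new string('x', 100);   // a self allocation of MakeInner
    }
}
```

In the Functions view, MakeOuter's total would include the string built by MakeInner, while its self count would cover only the array it news up itself.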
With this information in mind, I want to go back to the source code and say, hey, let's look at that function again, but not so much focused on the Ellipse; let's look at these other types, because maybe there are other optimizations to be had. If we optimize those, even though they weren't the top type, it's still memory we're saving. I can go back to the source code by right-clicking and hitting "Go To Source File", and I have the code up, so let's go back here. As we look at this for loop more closely: yes, we're creating an Ellipse for each iteration, but we're also filling it with a specific color, and that color is actually not changing. Based on the way we have this implemented, though, we're creating another SolidColorBrush object each time. In this case, this is the color yellow, and in the case of the prime fill color, we had green. So we're creating a new brush object on every iteration of this for loop too.
>> Yeah, that's a lot.
>> In the case of the Ellipse, like I was saying before, yes, there are ways we could optimize if we wanted to, but it would be a little more involved; we probably wouldn't use the Ellipse class. But in the case of the fill color, we can actually do something pretty quick here. What we can do instead (I have the code commented out, but as an example) is pull out this new SolidColorBrush object and bring it up to a static member right here. We can assign it to a variable like fillColor, and primeFillColor in the case of the green. Then, instead of doing Fill equals this new instance of an object, we just assign it to that static member fillColor up here. That's what I have commented out here. I won't rerun the code, because it might take a little while, but basically, if you do that, what you'll see is that all of those allocations go away. Because we're no longer creating a new SolidColorBrush object for every iteration of the for loop; we just have that static member we declared at the top, which isn't changing, and we reference it each time within the for loop, and the ellipse gets painted that particular color. That will save us a lot of allocations, because, if you come back to the Functions view, this function alone had 30,000-plus allocations of both Color and SolidColorBrush, at over a million bytes each. All of that will essentially go away. Something I want to point out here is the nature of the investigation: it's up to you to figure out what you want to optimize. Maybe you do want to go after the top type, which was Ellipse in this case; but as we pointed out, that code change might be more expensive or more involved. So it's ultimately up to you to figure out what's worth your time and what the trade-off is. But in the case of Color and SolidColorBrush, that's a quick win, and it's still only going to help your application. So I just wanted to point that out.
>> Yeah. I think that's a theme for a lot of profiling investigations. Ultimately, it's like, "Okay, how badly do you want to fix this path, if it means having to modify your code in a way that maybe doesn't make sense depending on the context?"
>> Totally. In software development you have to deal with these trade-offs all the time. Ultimately, on a profiling team, what we're trying to show you is just data and insights into how your application is actually performing.
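To make the fix Sagar describes concrete, here's a hedged sketch of hoisting the brushes into static members (field and method names are assumed, not taken from the actual sample):

```csharp
using System.Windows.Media;
using System.Windows.Shapes;

class PrimeVisualizer
{
    // Created exactly once, instead of once per loop iteration:
    static readonly SolidColorBrush FillColor = new SolidColorBrush(Colors.Yellow);
    static readonly SolidColorBrush PrimeFillColor = new SolidColorBrush(Colors.Green);

    static void Paint(Ellipse ellipse, bool isPrime)
    {
        // Reuse the shared brush; no new Color/SolidColorBrush per call,
        // which removes the 30,000+ allocations of each type from the trace.
        ellipse.Fill = isPrime ? PrimeFillColor : FillColor;
    }
}
```

Since the colors never change during the run, sharing two immutable-in-practice brushes is safe here; the trade-off discussion above is exactly about spotting wins like this one.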
What you want to do with that data is ultimately up to you. You have to decide where you want to spend the time and what's worth it.
>> Not that it's been quoted a bazillion times already, but with great power comes great responsibility.
>> Great responsibility. Exactly. I talked a little bit before about how we're showing you different pivots on similar data between the Allocations, Functions, and Call Tree views. Now I want to dig into that third view, the Call Tree view, which we alluded to earlier, Leslie, with the Expand Hot Path feature. What the Call Tree view shows you is which code pathways are allocating the most memory. With the Allocations view, you're filtering by a specific data type or object type. With the Functions view, maybe you have a specific function you want to really drill down into, saying, "Okay, I'm interested in this function across the entire time span," or, if you've used the swimlane to filter down to a time span, what are all the allocations happening there? The Call Tree view is saying, "Okay, maybe I'm not focused on a specific object type yet, or on a specific function yet; just show me the paths where a lot of allocations are happening." One thing you can do with the Call Tree view is expand nodes individually, but as we alluded to earlier, what I'd recommend people do is start at a node of interest and use the Expand Hot Path feature. What this does is essentially show you where most of your allocations are happening along a given path. To walk through some of the metrics in this view: at any given node, we have the total allocations; that's all the allocations at this particular frame plus all of its children. We have self allocations, which is all the allocations at just this particular level. This frame is native, but if we wanted to look at Main, for example, Main allocates different things. Then we have the bytes, which is in terms of memory rather than the number of objects. Then we have the Module Name, which shows what module that function is associated with; sometimes functions will be associated with multiple modules. What the Expand Hot Path algorithm is essentially doing is saying, "Hey, as we walk down the tree, if a lot of the total allocations are concentrated below a node, you should go into the next function and dig into it a little more, because that function is contributing a lot of the allocations." So I start up here and hit Expand Hot Path, and what it brings us down to is essentially two things. One is this generate button click method, which is certainly allocating a lot of memory, because that's the button that triggers the visualization. The other is this allocations frame. Walking through each of these individually: the allocations frame is saying, hey, for the particular node right above it, in this case something like Application.Run in System.Windows, what are all the top allocations happening for this particular method? It's similar to the Functions view, but specific to this particular call tree and call path. That's important to note, because there are functions in the Call Tree view as well as in the Functions view, and someone might ask, what's the difference?
The difference is that a function within the Functions view is looking at the data all up: it's combining that function across all the times it's called and adding up the allocations across all types within it. The Call Tree view is looking at a specific call stack. Any given function might be called in many ways, and if we drill down into these different nodes, you'll see a lot of the same functions being called multiple times, but it's showing you each different occurrence within a specific call stack. Something you can see in the hot path is the specific allocations for the node of interest, and it will sometimes end with another function to look at. I started the hot path from the highest node, but something to note is that you can really start at any level you want. Let's say I wanted to look at the GeneratePrimes function; I can start the hot path here too, and it'll show me the allocations and other UI or external calls that are happening as well. This is just another view, another pivot on the data, and it lets you go through the call trees and ultimately see which code pathways are allocating the most memory.
>> That can be really useful, it seems, for a lot of .NET peeps out there, especially if you're dealing with graphics-intensive things like that application with the prime numbers.
>> Yeah. Something I want to call out again, especially in this view: this is a time when you're probably going to engage some of the time filtering. If you're really interested in garbage collection and you want to see what the calls were, you don't necessarily care about all the functions in the world, but about the functions being called at a particular time period; that's when you combine the graphs with this view. Again, we're improving the performance, it's a bit slow, but it will show up eventually. You can combine those two views together.
>> Awesome. You mentioned that [inaudible] is still a work in progress for this tool, so is there anything else on the roadmap for .NET allocation?
>> Yeah, and this actually segues perfectly into our last view, which is the Collections view. Let me clear that selection and go to the Collections view. Admittedly, this view is pretty young right now, and we want to work on it. But essentially, as we alluded to earlier, there's a limited amount of memory you have to work with, and it's a question of how to allocate and manage it best. Luckily, as we mentioned previously, .NET does a good job of having the garbage collector come through and automatically scan memory, the heap portion in particular, looking for objects that are taking up memory but are no longer being used, and cleaning them up. What the Collections view shows you is the instances where garbage collection occurred. If I click on a specific row within this table, I see, first of all, the number of objects that were collected and how many survived. Then we also get our pie charts over here, which show you the top types within each of those garbage collections: which types went away, and which survived? Like I said before, this view is more in its infancy. Of course, you can still time filter, and you can still see on the graph where the red bars are occurring. But what we want to do here, and we're still working on designs and the best way to bring this out, is surface the actionable insights from this data.
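As a rough illustration of the collected-versus-survived idea, here's a minimal sketch; the tool derives this from runtime profiling events rather than code like this, and forcing a collection is purely for demonstration:

```csharp
using System;

class GcSurvivalDemo
{
    static void Main()
    {
        var survivor = new byte[1000];  // stays referenced, so it will survive
        MakeGarbage();                  // allocations left with no references

        Console.WriteLine(GC.GetGeneration(survivor)); // 0: freshly allocated
        GC.Collect();                                  // force a collection
        // The unreachable arrays from MakeGarbage are reclaimed ("collected");
        // survivor lives on and is typically promoted to an older generation.
        Console.WriteLine(GC.GetGeneration(survivor)); // usually 1 now ("survived")
        GC.KeepAlive(survivor);
    }

    static void MakeGarbage()
    {
        for (int i = 0; i < 10_000; i++)
        {
            _ = new byte[100];  // becomes unreachable as soon as the loop iterates
        }
    }
}
```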
So yes, this shows me a little bit about where garbage collection is taking place and which top objects are surviving or being collected. But in the future, we want to show you more insights all around: how do you go back to source, and what optimizations can you make in your code so that maybe garbage collection doesn't have to happen as often, or happens more efficiently? That's something we want to look at, improving this view as well as the performance of the tool.
>> Awesome, so many tables to choose from. So many options. Great.
>> A lot of different [inaudible] on the same data.
>> I like options personally. The more customization the better.
>> Absolutely.
>> Thank you so much, Sagar, for sharing the .NET Allocation tool. If people want to try this out or learn more about this particular tool, where can they go?
>> We've got docs, as always, with all of our tools; the docs for this tool are updated as well. We'll point you to that documentation, and you can get more of the details there. If you have any questions, of course, always reach out to us.
>> Awesome. This is not the end of our profiling series, so what are we going to talk about next time?
>> Next time, Esteban is going to cover the .NET Counters tool. Really excited for that one; that is actually our newest tool, if I'm remembering correctly. So that one should be really fun.
>> Great. Well, thanks once again for coming. We'll probably be seeing you in the near future.
>> Absolutely. Pleasure as always, Leslie. Thanks for having me.
>> Likewise. Until next time, happy coding. [MUSIC]
Info
Channel: Microsoft Visual Studio
Views: 5,260
Id: 60euKwSqT-U
Length: 32min 27sec (1947 seconds)
Published: Thu Nov 19 2020