Diagnosing .NET memory dumps in Visual Studio 2022

Video Statistics and Information

Captions
[MUSIC] >> Hi there. My name is Mark Downie. I'm a Program Manager on the Visual Studio Production Diagnostics team. Today, I'd like to talk to you about debugging memory dumps with Visual Studio. For problems that do not manifest in logs or that you cannot investigate by debugging locally, you might attempt to capture a diagnostic artifact like a memory dump. Capturing a memory dump is, in essence, like taking a high-fidelity photo of your application. It represents a single moment in time. It's the equivalent of stopping your application at a breakpoint. A memory dump is typically taken when an app running in production is exhibiting some behavior that you need to mitigate. Unfortunately, for most scenarios, attaching a debugger like Visual Studio in production environments is not practical or even possible. To work around this limitation, we can capture a memory dump, copy that file to our local PC, open it in Visual Studio, and use the same set of first-class live debugging tools we've grown accustomed to. Today, I'm going to show you how easy it is to get important insights from a variety of memory dumps.
The first dumps I want to look at today are referred to as crash dumps. A crash is simply when your app unexpectedly terminates, and we usually capture a crash dump right at that critical moment. There are many reasons an app might crash. The most common is an unhandled exception. These occur when an exception is raised as a first-chance exception, but your code does not handle it. The exception goes up the stack, becomes what we refer to as a second-chance exception, and crashes your process.
To capture a memory dump of any kind, I tend to rely on tools like ProcDump, which is from Sysinternals. There's lots of documentation on it, and it's a command-line tool that allows you to capture dumps under a variety of circumstances.
In fact, it has a nice set of parameters that allow you to capture a dump based on, say, CPU thresholds or high memory usage. Towards the bottom of the documentation there's a nice list of examples to pull from: I can capture a full dump of a particular process ID, I can capture a memory dump if the process is exceeding 20 percent CPU for an extended amount of time, and I can capture a memory dump if I see a particular type of exception. All of these are great ways of capturing a variety of memory dumps, and we'll look at a few of those over the course of this video.
I'm going to go ahead and get started and open a crash dump that I've collected by hitting Ctrl+O in Visual Studio, which opens the Open File window. I've got a dump here with the .dmp extension, CrashStackOverflow. It's a managed dump, one of the ones I use for examples, so I'm going to go ahead and open that. That lands me immediately on the Minidump File Summary page. It has some really important information: obviously the name of the file, but also the last write time, which is essentially when this dump was taken. It gives me the process name, that is, the application that was running. It gives me the process architecture and things like the exception code. The exception code especially is something you can search for to find out more details. It gives you the OS version and the CLR version. Really importantly, it gives you a list of the modules that were loaded by this particular process, along with their versions. I can use that to identify whether there are version mismatches against my assumed environment. On the right-hand side are the important actions you can take against this memory dump, starting with things like setting symbols.
As part of your build process, you'll probably produce a bunch of program database (PDB) files, which essentially help you marry up your code to events in memory that are occurring in the process. Essentially, if there's an issue, I can get right to the particular line of code that's associated with it by setting the symbol path. The most important action for managed applications is Debug with Managed Only. Some of you will have scenarios with either native or mixed applications, but for the sake of this particular demonstration, we're going to focus on Debug with Managed Only. I'm going to go ahead and hit "Debug with Managed Only" and start my debugging session. What this is going to do is make it as if I'm at a breakpoint, right at the moment this particular exception occurred and this crash dump was captured. That gives me a perfect opportunity to review this with all the tools I'm used to when I'm live debugging.
Take, for example, this exception helper. This is the typical exception helper that we would see during live debugging if we were to hit an unhandled exception. It's telling me quite explicitly that we have an exception of type System.StackOverflowException. Again, this is just like any live debugging except that, for obvious reasons, I cannot go forward. I cannot use F5 or anything like that. All I can do is look at the details of this particular moment in time. I'm going to click on "View Details," and that will pop up the QuickWatch window. Here I can get far more detail about the exception: if there's additional information in the inner exception, I can use that, and if there's additional information in the exception message, I can use that too. This is just a great way of gathering much more data about this particular scenario.
Given that this is a StackOverflowException, I'm thinking about the fact that the thread has essentially run out of stack space, that is, out of room for frames on the call stack. Because of that, I want to go to the Call Stack window, which I've got down at the bottom here, so I click on the "Call Stack" window. What you'll notice is that we've literally run out of frames, and Visual Studio has intelligently pared down the number of frames on view here so I can see what the root of this StackOverflowException is. Obviously it starts in Main. I've got a method called infinite recursive, so it's obviously quite deliberate. Then it tells me how many times that frame has been repeated, and you can see it's been repeated almost 20,000 times. At this point, I know the origin of this StackOverflowException. It would be great at this moment if I could look at the code, just to double-check my assumptions. I can double-click on the frame in the call stack. At this moment I don't have my symbols, my PDB files, lined up. What I could do here is go get my symbols, maybe from my build machine, and that's the best way, or if I don't have those handy, I can go ahead and decompile the source code. That takes me directly to the line of code. It looks just like a live session now. I'm sitting here at what is essentially a recursive function calling itself. This is the issue that I have, and now I can go ahead and resolve it, or maybe open a case with my developers and tell them how to resolve it.
Another scenario that I really love using memory dump analysis for revolves around growth in the memory footprint of your process. When there is unaccounted-for, unchecked growth over time, I like the idea of using memory dumps to analyze where that growth might be coming from and whether it's really healthy or not.
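Editor's sketch, not from the video: the way the Call Stack window collapses a run of identical frames into one entry with a repeat count can be imitated in a few lines. This is a Python analogy under stated assumptions. The function names are hypothetical stand-ins, and unlike .NET, where a StackOverflowException terminates the process, Python raises a catchable RecursionError once its recursion limit is reached.

```python
import traceback
from itertools import groupby


def infinite_recursive(n):
    # Hypothetical stand-in for the recursive method in the dump: no base case.
    return infinite_recursive(n + 1)


def collapsed_stack(exc):
    """Collapse runs of identical frames into (function name, repeat count)."""
    names = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return [(name, len(list(run))) for name, run in groupby(names)]


def main():
    try:
        infinite_recursive(0)
    except RecursionError as exc:
        # Python lets us catch this; a .NET stack overflow would end the process.
        return collapsed_stack(exc)


stack = main()
for name, count in stack:
    print(f"{name} (repeated {count} times)" if count > 1 else name)
```

The outermost frame appears once and the recursive frame appears as a single entry with a large count, which is the same shape of summary described for the dump.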
I may see growth in the memory footprint of an application over the course of hours or days. If I don't see that memory recovering, I may decide that this is a great opportunity to use two memory dumps, one taken at the beginning and one taken at the end, and compare them to see where the growth is coming from and whether it's possible that this memory won't be reclaimed. If it won't be reclaimed, that is something that I then need to resolve. Let's have a look at that process. To collect the memory dump, I typically start with a tool like dotnet-gcdump. This is a great command-line tool. It creates a dump that's really compact and just concerns itself with the heap. You don't get to see threads, and you don't get to see the values of particular objects. You simply get a list of objects on the heap and their sizes. This is a great compact way of comparing two memory snapshots.
I'm going to go ahead and hit Ctrl+O in Visual Studio and open up the memory dump. This is actually the second of two memory dumps that I've taken. I took one at the start, after restarting the application, because I want to understand the nature of the growth. Then I took a second memory dump once that growth had reached a point where I think it's obvious that we have unreclaimed growth. I'm going to open the second of the two dumps so that I know what the situation is once I've leaked my memory. This opens up the managed memory viewer in Visual Studio. It shows me a list of object types and the count of each of those types. The first size column, Size (Bytes), is the size of the object itself. Then Inclusive Size includes anything that particular object references. For example, I have a list object as my first one. It has a size of 8,000 bytes. However, the inclusive size, which includes the things the list is referencing, is close to 16 million bytes, or roughly 16 MB.
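Editor's sketch, not from the video: the gap between an object's own size and its inclusive size can be reproduced with a small container. This is a Python analogy, assuming a list of hypothetical 1 KB payloads. The helper `inclusive_size` is illustrative only and is not how Visual Studio computes its column.

```python
import sys


def inclusive_size(obj, seen=None):
    """Size of obj plus everything reachable through built-in containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        children = list(obj.keys()) + list(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = list(obj)
    else:
        children = []
    return total + sum(inclusive_size(child, seen) for child in children)


# Hypothetical payloads: 1000 distinct 1 KB byte strings held by one list.
records = [bytes(1024) for _ in range(1000)]
shallow = sys.getsizeof(records)     # the list object itself (its pointer array)
inclusive = inclusive_size(records)  # the list plus everything it references
print(shallow, inclusive)
```

The list itself is only a few kilobytes of pointers, while the inclusive figure is over a megabyte, which mirrors the small size versus large inclusive size seen for the list in the dump.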
In my mind's eye, I'm immediately thinking to myself, these are the likely culprits for my growth, so this is the obvious place to start. Now, in a normal application it may not be this obvious, which is why I think the ability to compare against your original memory dump, the one you captured first as the baseline, is incredibly important. I'm going to go ahead and click on "Compare With" and open the baseline, the first memory dump I took. This allows me to compare the first memory dump to the second. Now I have conclusively seen where the growth is. It's definitely in data records. I'm seeing this increase by an incredible amount, by 1000 or so. But the question really is, why wasn't this memory reclaimed? This looks like a regular list. Why hasn't this list gone away? I've taken these two memory dumps at two separate points, and if I go and look at the Paths to Root view, it makes it quite obvious that this list is in fact held by a static variable. If you know about static variables, you know that they hang around for the entire lifetime of the process. This won't be cleaned up unless I am deliberate about cleaning it up. Here I've found what is essentially a leak in a static list.
In addition to things like crashes and memory leaks, I also like to think about how we can use memory dumps when your app isn't responding correctly, when it's slow or completely unresponsive. For that, I often like to use the Parallel Stacks window in Visual Studio. The Parallel Stacks window is a great way to get the big-picture view of the application you're debugging. My friends on our debugging and diagnostics team like to think of the approach to debugging as thinking about the big picture first. We want to think about what condition our threads are in.
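Editor's sketch, not from the video: the baseline-versus-second-snapshot comparison and the static-list leak described above can be imitated in miniature. This is a Python analogy. `DataRecord`, `Cache`, and `handle_request` are hypothetical names, and counting live objects with the `gc` module stands in for the two dump files.

```python
import gc
from collections import Counter


class DataRecord:
    """Hypothetical record type standing in for the leaking objects."""

    def __init__(self, payload):
        self.payload = payload


class Cache:
    # Class-level (static-like) list: it stays reachable for the whole process.
    records = []


def handle_request(i):
    # Each call adds a record that nothing ever removes.
    Cache.records.append(DataRecord(f"request-{i}"))


def snapshot():
    """Count live, GC-tracked objects by type name."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(baseline, current):
    """Types whose instance count increased between the two snapshots."""
    return {name: current[name] - baseline[name]
            for name in current if current[name] > baseline[name]}


baseline = snapshot()
for i in range(1000):
    handle_request(i)
current = snapshot()

delta = growth(baseline, current)
print("DataRecord growth:", delta.get("DataRecord"))

# Being deliberate about cleanup: clearing the static list releases the records.
Cache.records.clear()
after_cleanup = snapshot()
print("DataRecord after cleanup:", after_cleanup["DataRecord"])
```

The diff isolates the type that grew, and the records only become collectable once the long-lived list lets go of them, which is the point made about static variables.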
Then, once we've figured out what the threads are doing, maybe we identify a particular call stack on a particular thread, and from there we might dive deeper, get to code and objects, and look at the analysis from that perspective. The way we use the Parallel Stacks window is to start from the big picture and slowly get closer and closer to the problem at hand. Let's dive in. I have some great friends in the open source community who have created this application. It is designed to mimic the stock market in some way, but they've deliberately built it so that it becomes unresponsive after a few moments. I'd like to use Visual Studio to help us understand why it's doing that. I have started my debug action, again with Debug with Managed Only. I'm going to go ahead and navigate to the Parallel Stacks window, because I want to see the big picture here. We get to it by going to Debug, Windows, and Parallel Stacks. What it does is give me a graphical overview of all the threads currently running in this application. I'm going to zoom out just a little bit so I can see a little more of the threads. You'll see it shows the relationships between threads, and it shows the unique call stacks that each of these threads is running. For example, I'm seeing 90 threads over here on the right-hand side, with 89 of those threads going here and one going in this direction. Similarly, over here I have three threads: this is the common portion right here, and then they split off and go in their different directions up here. What's interesting to me is this little icon. You'll notice this icon here, which is representative of a deadlock.
Now, the fact that we have an unresponsive app and Parallel Stacks is immediately telling us that we have a deadlock scenario is really important, because now I'm starting to think about the ways in which we get deadlocks. From a theoretical standpoint, you would think about a deadlock as thread A holding a lock and waiting for a lock that another thread, thread B, owns. What usually happens is that thread B is in turn waiting on the first thread to release a lock that it owns. We have this deadlock situation where both threads are waiting for the other to release something. Now my job is to find out where that deadlock is originating from. I'm going to use this window to review the threads. Looking right over here, obviously I've got 90 threads, and these are all going to continue to wait indefinitely. If I look here at thread 7048, I notice that it's waiting on a lock that is owned by thread 28964. I'm going to go find thread 28964 and see what it's waiting for. If I go over to thread 28964, I notice that it's waiting on a lock owned by thread 7048, so this is exactly the deadlock scenario. We have one thread waiting on a lock that is owned by another thread, and simultaneously that other thread is waiting on a lock owned by the first thread. This is where the deadlock is occurring, and the Parallel Stacks window is shouting that out to me. What would be interesting right now is to see the code that both this Worker Thread and this Cat Grumpy thread are running. What I can do here is double-click on the frame that is essentially doing the waiting. Monitor.Enter is actually core code from System.Threading. But this frame looks like user code, and is in fact user code, so I'm going to go ahead and double-click there.
If I had symbols from my build process, I could go find them, associate them, and load them here. But I don't, so I'm going to go ahead and decompile the source code. Now I see the locks I've been mentioning. I have a lock on buyer and a lock on orders. Actually a lock on seller, buyer, and orders. Interesting. Let's go back to the Parallel Stacks window and do the same exercise for the other thread. We go and look at this thread, double-click on that frame, and this time we have orders and sellers. That's interesting. If you remember, the other thread was sellers and orders; this one is orders and sellers. Essentially, this is a typical deadlock scenario. If we want to eliminate this problem, we have to ensure that our locks are taken in the same order. Both threads should lock orders then sellers, or sellers then orders, to avoid the deadlock scenario. Thank you for joining today. I hope this helped your understanding of how Visual Studio can be used to debug managed memory dumps. Good luck hunting those bugs.
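Editor's sketch, not from the video: the fix described at the end of the transcript, taking the locks in one agreed order on every thread, can be shown in a few lines. This is a Python analogy, not the decompiled C# from the demo. The lock names, the `acquire_in_order` helper, and the thread labels are all hypothetical.

```python
import threading

orders_lock = threading.Lock()
sellers_lock = threading.Lock()

# One agreed global order (an assumption for this sketch): orders, then sellers.
LOCK_ORDER = {id(orders_lock): 0, id(sellers_lock): 1}


def acquire_in_order(*locks):
    """Acquire locks sorted by the agreed rank, regardless of caller order."""
    ordered = sorted(locks, key=lambda lock: LOCK_ORDER[id(lock)])
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release in reverse acquisition order."""
    for lock in reversed(ordered):
        lock.release()


completed = []


def worker(name, first, second, rounds=1000):
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            pass  # critical section that would touch both orders and sellers
        finally:
            release_all(held)
    completed.append(name)


# The two threads request the locks in opposite orders, as in the dump,
# but acquire_in_order normalizes the order so no wait cycle can form.
t1 = threading.Thread(target=worker, args=("buyer", sellers_lock, orders_lock))
t2 = threading.Thread(target=worker, args=("seller", orders_lock, sellers_lock))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
print(sorted(completed))
```

If each worker instead acquired `first` and then `second` directly, the two threads could each end up holding one lock while waiting on the other, which is the cycle the Parallel Stacks window flagged.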
Info
Channel: Microsoft Visual Studio
Views: 7,103
Id: JBRKZyf7Db4
Length: 18min 12sec (1092 seconds)
Published: Mon Nov 08 2021