[MUSIC] >> Hi there. My name is Mark Downie. I'm a Program Manager on the Visual Studio Production
Diagnostics team. Today, I'd like to talk
to you about debugging memory dumps with Visual Studio. For problems that do not
manifest in logs or that you cannot investigate by
debugging locally, you might attempt to capture a diagnostic artifact
like a memory dump. Capturing a memory dump
in essence is like taking a high-fidelity
photo of your application. It represents a single
moment in time. It's kind of the
equivalent of stopping your application at a breakpoint. A memory dump is typically
taken when an app running in production is exhibiting some
behavior that you need to mitigate. Unfortunately, attaching a debugger like Visual Studio in a production environment is typically not practical or even possible. To work around this limitation, we can capture a memory
dump and copy that file to our local PCs and then
open that file in Visual Studio and
use the same set of first-class live debugging tools
we've grown accustomed to. Today, I'm going to show
you how easy it is to get important insights from a
variety of memory dumps. The dumps I want to look at today are referred to as crash dumps. A crash is simply when
your app unexpectedly terminates and we usually capture a crash dump right at
that critical moment. There are many reasons an app might crash. The most common are typically unhandled exceptions. These occur when an exception is raised as a first-chance exception, but your code does not handle it. The exception goes up the stack, becomes what we refer to as a second-chance exception, and crashes your process.
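To make that flow concrete, here is a minimal, hypothetical sketch (the names are mine, not from the demo) of an exception that is raised, never handled, and therefore crashes the process:

using System;

class Program
{
    static void Main()
    {
        // The exception below is raised inside ParseConfig as a first-chance exception.
        // Nothing further up the call stack catches it, so it becomes a second-chance
        // (unhandled) exception and the process terminates.
        ParseConfig(null);
    }

    // Hypothetical helper used only for illustration.
    static void ParseConfig(string path)
    {
        if (path == null)
        {
            throw new ArgumentNullException(nameof(path));
        }
    }
}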
In order to capture a memory dump of any kind, I tend to rely on tools like ProcDump from Sysinternals. There's lots of documentation on it, and it's a command-line tool that allows you to capture dumps under a variety of circumstances. In fact, it has a nice set of parameters that let you capture a dump based on, say, a CPU threshold, or because your process is using too much memory. Towards the bottom of the documentation there's a nice list of examples to pull from: I can capture a full dump of a particular process ID, I can capture a memory dump if the process exceeds 20 percent CPU for an extended amount of time, and I can capture a memory dump if I see a particular type of exception. All of these are great ways of capturing a variety of memory dumps, and we'll look at a few of them over the course of this video.
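For reference, the ProcDump invocations for those scenarios look roughly like this (the process name, PID, and exception filter are placeholders; double-check the switches against the ProcDump documentation):

REM Full dump of a process by its PID
procdump -ma 4572

REM Full dump if the process exceeds 20 percent CPU for 10 consecutive seconds
procdump -ma -c 20 -s 10 MyApp.exe

REM Full dump when a particular exception type is thrown
procdump -ma -e 1 -f "StackOverflowException" MyApp.exe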
I'm going to go ahead and get started by opening a crash dump that I've collected. Hitting Control+O in Visual Studio opens up the file open window. I've got a dump here with the .dmp extension, CrashStackOverflow; it's a managed dump, one of the ones I use for examples, so I'm going to go ahead and open that. That lands me immediately on the Minidump File Summary page. It has really important information: obviously the name of the file, but also, essentially, when this dump was taken, the last write to this file. It gives me the process name the application was running as. It gives me the process architecture and things like the exception code. The exception code especially is something that you can search for to find out more details about what happened. It gives you the OS version and the CLR version. Really importantly, it gives you a list of the modules that were loaded by this particular process, along with their versions. I can use that to identify whether there are version mismatches in my assumed environment. On the right-hand side are the important actions you can
take against this memory dump, certainly starting with things like setting symbols. As part of your build process, you'll probably produce a bunch of program database (PDB) files, which essentially help you marry up your code to events in memory that are occurring in the process. Essentially, if there's an issue, I can get right to the particular line of code that's associated with it by setting the symbol path. The most important action
for managed applications is to debug with managed only. Some of you will have scenarios with either native or mixed applications, but for the sake of this particular demonstration, we're going to focus on Debug with Managed Only. I'm going to go ahead and hit "Debug with Managed Only" and
start my debugging session. What this is going to do is make it as if I'm at a breakpoint, right at the moment this particular exception occurred, or this crash was captured. That gives me a perfect opportunity to review the situation using all the tools I'm used to when I'm live debugging. Let's take, for example, this Exception Helper. This is the typical Exception Helper we would see during live debugging if we hit an unhandled exception. It's telling me quite explicitly that we have an exception of type System.StackOverflowException. Again, this is just like any live debugging session, except for one obvious difference: of course, I cannot go forward. I cannot use F5 or
anything like that. All I can do is look at the details of this particular moment in time. I'm going to click on "View Details," and that will pop up the QuickWatch window. Again, I can get way more details about the exception here: if there's additional information in the InnerException, I can use that, or if there's additional information in the exception message, I can use that too. This is just a great way of gathering much more data about this particular scenario. Given that this is a
StackOverflowException, I'm thinking about the fact that we're essentially out of stack space: the process has run out of frames on the call stack. Because it's run out of frames on the call stack, I want to go to the Call Stack window, which I've got down at the bottom here, and click on it. What you'll notice is that we've literally run out of frames, and Visual Studio has intelligently pared down the number of frames shown here so I can see what the root of this StackOverflowException is. Obviously it starts in Main. I've got a method called infinite recursive, which is obviously quite deliberate. Then it tells me how many times this set of frames has essentially repeated, and you can see that these frames have repeated almost 20,000 times. At this point, I know the origins
of this StackOverflowException. It would be great at this moment if I could look at the code, just to double-check my assumptions. I can actually double-click on a frame in the call stack. At this moment I don't have my symbols, my PDB files, lined up. What I could do here is go and get my symbols, maybe from my build machine, and that's the best approach; or, if I don't have those handy, I can go ahead and decompile the source code. That pulls me directly to the line of code, and it looks just like a live session now. I'm sitting here at what is essentially a recursive function calling itself. This is the issue I have, and now I can go ahead and resolve it, maybe open a case with my developers and tell them how to resolve this particular issue.
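For context, the offending code has roughly this shape (the method and type names here are hypothetical stand-ins, not the exact code from the dump):

using System;

class Program
{
    static void Main()
    {
        // Each call pushes another frame onto the call stack and never returns,
        // so the stack eventually overflows and the process crashes.
        InfiniteRecursive(0);
    }

    static void InfiniteRecursive(int depth)
    {
        // No base case: this is the pattern the Call Stack window exposed.
        InfiniteRecursive(depth + 1);
    }
}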
One scenario I really love using memory dump analysis for revolves around growth in the memory footprint of your process. When there is unaccounted-for, unchecked growth over time, I like the idea of using a memory dump to analyze where that growth might be coming from and whether it's really healthy or not. I may see growth in the memory footprint of an application over the course of hours or days. If I don't see that
memory recovering, I may decide that this is a great opportunity to
use two memory dumps, one taken at the beginning
and one taken at the end, and compare where the growth is coming from and to see
if it's possible that this memory won't be reclaimed. If it won't be reclaimed,
that is something that I then need to resolve. Let's have a look at that process. If you want to collect
a memory dump for this purpose, I typically start with a tool like dotnet-gcdump. This is a great command-line tool: it creates a dump that's really compact and concerns itself only with the heap. You don't get to see threads, and you don't get to see the values of particular objects; you simply get a list of the objects on the heap and their sizes. This is just a great, compact way of comparing two memory snapshots.
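For reference, collecting one of these heap-only snapshots looks roughly like this (the PID is a placeholder; see the dotnet-gcdump documentation for the full set of options):

REM Install the tool once as a global .NET tool
dotnet tool install --global dotnet-gcdump

REM List the .NET processes you can target
dotnet-gcdump ps

REM Collect a GC heap snapshot (.gcdump) from process 4572
dotnet-gcdump collect -p 4572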
I'm going to go ahead and hit Control+O in Visual Studio and open up the memory dump. This is actually the second of two memory dumps that I've taken. I took one at the start, right after restarting the application, because I want to understand the nature of the growth. Then I took a second memory dump once that growth reached a point where I think it's obvious that we have unreclaimed growth. I'm going to open up the second of the two dumps so that I know what the situation is once I've leaked my memory. This opens up the Managed Memory viewer in Visual Studio. It shows me a list of object types, and it tells me the count
of those object types. The size of the actual object itself comes first, Size (Bytes); then Inclusive Size includes anything that particular object references directly. For example, I have a list object as my first one. It has a size of about 8,000 bytes. However, the inclusive size, which includes the things it's referencing inside the list, is close to 16 million bytes, roughly 16 megabytes. In my mind's eye, I'm
immediately thinking to myself that these are likely culprits for my growth, so this is an obvious place to start. Now, in a normal application it may not be this obvious, which is why I think the ability to compare to your original memory dump, the one you captured first as the baseline, is incredibly important. I'm going to go ahead and click on "Compare With" and open what is the baseline, the first memory dump I took. This allows me to compare the first memory dump to
the second memory dump. Now I have conclusively seen where the growth is: it's definitely in the data records. I'm seeing this increase by an incredible amount, by a thousand or so. But the question really is, why wasn't this memory reclaimed? This looked like a regular list. Why hasn't this list gone away? I've taken these two memory
dumps at two separate points, and if I go and look at the Path to Root view, it makes it quite obvious to me that this list is in fact a static variable. If you know about static variables, you know that they hang around for the entire lifetime of the process. This won't be cleaned up unless I am deliberate about cleaning it up. Here I've found what is essentially a leak in a static list.
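As a rough illustration of the pattern we just uncovered (the names here are hypothetical, not the actual demo code), a static collection that is only ever appended to keeps every entry rooted for the life of the process:

using System.Collections.Generic;

static class OrderCache
{
    // Rooted by the type itself, so the GC can never reclaim the entries.
    private static readonly List<DataRecord> Records = new List<DataRecord>();

    public static void Add(DataRecord record)
    {
        // Entries are added over time but never removed or trimmed,
        // which is exactly the unreclaimed growth the dump comparison showed.
        Records.Add(record);
    }
}

class DataRecord
{
    public byte[] Payload = new byte[16 * 1024];
}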
In addition to crashes and memory leaks, I also like to think about how we can use memory dumps when your app isn't responding correctly, when it's slow or completely unresponsive. For that, I often like to use the Parallel Stacks
window in Visual Studio. The Parallel Stacks window is a great way to get the big-picture view of the application you're debugging. My friends on our debugging and diagnostics team like to approach debugging by thinking about the big picture first: we want to understand what condition our threads are in. Then, once we've figured out what the threads are doing, maybe we identify a particular call stack on a particular thread, and from there we might dive deeper, get to code and objects, and look at the analysis from that perspective. The way we use the Parallel Stacks window is to start from the big picture and slowly get closer and closer to the problem at hand. Let's dive in. I have some great friends in
the open source community who have created this application. It is designed to mimic the
stock market in some way, but they've deliberately
created it so that essentially it becomes
unresponsive after a few moments. I'd like to use Visual Studio to help us understand
why it's doing that. I have started my debugging session with Debug with Managed Only. Because I want to see the big picture here, I'm going to go ahead and navigate to the Parallel Stacks window; we can get to it by going to Debug, Windows, Parallel Stacks. What it does is give
me a graphical overview of all the threads currently
running in this application. I'm going to scroll here. I'm actually going to zoom out
just a little bit so I can see a little bit more
of the threads here. You'll see it'll show the
relationships between threads, and it'll show the unique call stacks that each of these threads are running. For example, one thing that jumps out at me when I see this: I'm seeing 90 threads over here on the right-hand side, with 89 of those threads going here and one going in this direction. Similarly over here, I have three threads, and this is the unique portion for them. For these three threads, this is the common portion right here, and then they split off and go in their different directions up here. What's interesting to
me is this little icon. You'll notice this icon here that
is representative of a deadlock. Now we have an unresponsive app, and Parallel Stacks is immediately telling us that we have a deadlock scenario. That's really important, because now I'm starting to think about the ways in which we get deadlocks. From a purely theoretical standpoint, you would think about a deadlock as thread A holding a lock and waiting for a lock that another thread owns, while thread B, in turn, is waiting on the first thread to release the lock that it owns. We have a deadlock situation where both threads are waiting for the other thread to release something.
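As a minimal sketch of that theory (the lock names are illustrative, not the app's actual objects), two threads that take the same two locks in opposite order can each end up waiting on the other forever:

using System.Threading;

class DeadlockSketch
{
    private static readonly object Orders = new object();
    private static readonly object Sellers = new object();

    static void Main()
    {
        // Thread A takes Sellers first, then wants Orders.
        var threadA = new Thread(() =>
        {
            lock (Sellers)
            {
                Thread.Sleep(100); // give the other thread time to grab Orders
                lock (Orders) { /* work */ }
            }
        });

        // Thread B takes Orders first, then wants Sellers: the opposite order.
        var threadB = new Thread(() =>
        {
            lock (Orders)
            {
                Thread.Sleep(100);
                lock (Sellers) { /* work */ }
            }
        });

        threadA.Start();
        threadB.Start();
        threadA.Join(); // with the sleeps above, both threads block on each other forever
        threadB.Join();
    }
}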
Now my job is to find out where that deadlock is originating from. I'm going to use this window here to review the threads. Just looking at what I'm seeing, especially right over here, obviously I've got 90 threads, and these are all going to wait; they're going to continue to wait indefinitely. If I look here at thread 7048, what I've noticed is that it's
waiting on a lock that is owned by thread 28964. I'm going to go find thread 28964
and see what it's waiting for. If I go over to thread
28964 over here, I notice that it's waiting on a lock owned by thread 7048, so this is exactly the deadlock scenario: we have one thread waiting on a lock that is owned by another thread while, simultaneously, that other thread is waiting on a lock owned by the first thread. This is where the deadlock is occurring, and the Parallel Stacks window is shouting that out to me. What would be interesting
right now is that I'd like to see the code that both this Worker Thread and this Cat Grumpy thread are running. What I can do here is double-click on the frame that is essentially doing the waiting here. Monitor.Enter is actually core code from System.Threading, but this looks like, and is in fact, user code, so I'm going to go ahead and double-click there. If I had symbols from my build process, I could go find them, associate them, and load them here. But I don't, so I'm going to go ahead and decompile the source code. Now I see the locks I've been mentioning: I have a lock on buyer and a lock on orders. Actually, it's a lock on seller, buyer, and orders. Interesting. Let's go back to the
Parallel Stacks window and do that same exercise
for the other thread. We're going to go and look at this thread and we "Double-click" on that
frame and this time we have orders and sellers. That's interesting. If you remember, the other thread was
sellers and orders, and this one is orders and sellers. Essentially, this is the typical deadlock scenario. If we want to eliminate this problem, we have to ensure that our locks are taken in the same order: both threads should lock orders and then sellers, or sellers and then orders, to avoid the deadlock scenario.
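Continuing that earlier sketch (again with illustrative names), the fix is to make every code path acquire the locks in one agreed order:

class ConsistentOrdering
{
    private static readonly object Orders = new object();
    private static readonly object Sellers = new object();

    // Both paths take Orders first, then Sellers, so neither thread
    // can hold one lock while waiting for the other.
    static void ProcessBuy()
    {
        lock (Orders)
        {
            lock (Sellers) { /* buy-side work */ }
        }
    }

    static void ProcessSell()
    {
        lock (Orders)
        {
            lock (Sellers) { /* sell-side work */ }
        }
    }
}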
Thank you for joining today. I hope this helped your understanding of how Visual Studio can be used to help you debug managed memory dumps. Good luck hunting those bugs.