- This talk is Liberating
the Debugging Experience with the GDB Python API, and I'm Jeff Trull. I have to stand near my computer I guess. Alright, so first a little bit about me. I was a hardware person for a long time doing microprocessors. I went from there into electronic CAD, and from that into C++ programming, and I'm now doing independent consulting. I'm the organizer of the San
Francisco Emacs Meetup groups. If any of you like
Emacs, tomorrow at 8 am, we're gonna have a Birds
of a Feather session. So I hope you can all make it
to that if you're interested. And finally, I'm available
for your projects. General theme of the talk. Actually, two fold. First of all, tools
are a force multiplier. It's worth it to take
a portion of your team, a portion of your time and
build tools that help the rest of the team accelerate their work. Secondly, gdb is powerful and Python has a rich ecosystem of modules, and when you put the two of them together, you can make some amazing tools, and I hope to show you
some of those today. General outline in my talk. First of all, I'm going to go over some very basic Python stuff. Including, just the beginnings
of the gdb Python API. Then we're gonna dive into
four different applications. For each one of those applications,
I'm going to introduce a problem area, I'm going to
talk about some gdb Python API area that might be able to help it. Then most of the time,
I'm also going to bring in an external Python module, that can, in combination with
the gdb API, solve the problem. Then we'll do a little demo, and then we'll wrap up. So first of all, just enough Python. Gotta stand closer to this computer. Basic stuff, first of all. Python is whitespace-sensitive; I think that's probably the
most famous thing about Python. Indentation is used to indicate blocks, so this is your basic "if then" statement. There are no curly braces. Classes in Python. I put the same class here as both a C++ class and a Python class, to make it a bit easier to understand. In this example I've got a member function, some member data, and the equivalent of a constructor. Also, both of these
classes are derived from a base class called Base. You're gonna find that
this is a common pattern when you're using the gdb Python API. You write a class derived from one of the API's classes, and it overrides a member function to accomplish your goal. Python in gdb. So, to access Python,
you just type python, and then you can put a
lot of things after it. You can do one line of code. Here we reverse a list. You can do multiple lines, where you just
type python, hit enter, then enter all the lines
you want, then end. Then it runs all of those lines
together as a single script. You can also load and
import your own scripts, if you specify the python path right. Very basic gdb API usage. Everything that we can normally
do on the command line, can also be done with a
function called gdb.execute, and you just give, as a string, the same stuff you would've
typed on the command line. You can capture the output as well, and then parse it, if you want, and that's one way to
interact with the API. But it's not really the best way. A better way, is to use
the richer stuff in the API that gives you strongly typed... Well maybe that's the
wrong word for Python. But it gives you objects back, with methods and stuff like that. For example, the
gdb.parse_and_eval function. If you supply an expression, it gives you something
called a gdb.Value back, and you can do things like get the address of the value, or the type of the value. You can convert it to a
plain Python type as well. So let's dive into a few applications. The first one I want to talk about is: Improving Stack Traces. Backtraces in C++ can be pretty confusing. They tend to expose library internals. Often the function signatures,
because we have all these types expanded
in the template arguments, can be very, very verbose. They can also show you the internal calls within the library, which may be too much
information for someone, who's just trying to figure out what's wrong with their code. So, this is a goal I have, for this application, is to take the normal things
that you see in a backtrace, and shrink some of the verbose names down, and then to eliminate
stack frames that are internal to libraries. Does anybody know what this type is? std::string, that's right. I know, we know it now. But only for libstdc++. But the first time you see it, it's like, oh my god, and if you have a vector of these things, with allocators, it just becomes very messy. The tools we can use from the API to help solve this problem are, first of all, Frame Decorators, which can change how
each frame is displayed. Secondly, Frame Filters
which we can use to remove frames that are not interesting to us. Like, frames that are
inside of library calls. First of all, Decorators. You can change the
appearance of any frame. In this case, we are making a decorator that inherits... This is that pattern I described earlier. We're inheriting from gdb's FrameDecorator, and we're overriding a member function. In this case, the one that
prints the name of the function. This particular decorator transforms the original
name of the function by running ROT13 on it. This is a good trick to
play on your co-workers. Frame Filtering, you can remove frames that
you don't want to see. For example, if anyone didn't
want to see anything from Boost in their backtraces, this code would work. So, to
solve the original problem, we're gonna take decorators, and we're gonna build a
decorator that uses regexes, to simplify, the complex expanded types, in the backtrace, to make a more concise function name. We're also gonna use the
filter to eliminate everything except the original call
to the standard library. So here's our demo. The example program I'm going to use is, a broken sort. We're trying here, to sort
vectors of vectors of strings, based on a lexicographic compare, on the first two strings in each vector. So, we're using std::sort, and supplying our own comparator. It's just that it's wrong. So if we're using the
debugger to debug this, we may not have the best time, the way things are right now. So, here's our std::sort call. We want to get down inside this lambda, so that's line 33. So let's place a breakpoint
on line 33 and continue. Alright, great we're at our lambda, we're finally back into our code, we passed through the standard library. Now let's see what that
backtrace looks like. Uh-oh. So many allocators. Oh that's so painful. Okay now, let's apply our code, our frame decorator, and our frame filter. And see what that looks like now. Ah yeah. We've taken four frames out of here that were internal to std::sort. We've also, changed the names a little bit, so that we just have std
vector of std vector of string, instead of that giant thing that expanded. So I think that's a big improvement. Okay, next application. Better Stepping. Often we supply, just like we saw, we supply our own code to a library, in order to... For it to use. So in order to see our code, in order to get to our code, and see what might be wrong with it, we have to step through a whole lot of library code to get there. We can do, like I just did, and set a breakpoint in the middle of our lambda or our visitor, or whatever we're
supplying to the algorithm. But it's really painful, and we might miss stuff and so on. So we want to have some way
of doing that automatically. The tools we can use
from the API for this... Breakpoint is basically, the main function that we're going to use. It's going to help us by
creating temporary breakpoints, just like I did manually. To make a breakpoint through the API, the main interface is gdb.Breakpoint. You can use the same kind
of strings that you used on the command line to
indicate breakpoints. You can also take those breakpoints, and then, because they're real objects, you can then manipulate them, you can enable or disable them, you can add a condition. You can even put commands
on that breakpoint that will execute when the breakpoint is hit. For example, in this case, when breakpoint is hit, you'll get a popup YouTube window, showing a video of Steve Ballmer. We also have finish breakpoints, which is one of my favorite features. When you type finish on
the command line in gdb, and you're inside some function, it jumps to the end of the function, and it does that, behind the scenes, by creating breakpoints at the exit points of the function. But you're not normally
able to add commands and stuff like that to it. But you can do it through the python API, and we're going to do that later. The python module that
we're going to apply to this is actually libclang. This is a way to insert
semantic information about your running program, which gdb normally doesn't
have any idea about. We're going to put these together, and use that to figure out where we should be inserting breakpoints. So, libclang's Python bindings allow you to find the current statement, given a filename and a line number, and to identify calls, objects with methods, and even lambdas that
you supplied within them. What we're gonna do then, is figure out which of
those are library calls, and which of those are calls to our code. Actually, it'll be anything
that's not a library call. Then we'll use gdb to
set temporary breakpoints on that user code, so that we can continue
straight through to it. Putting it all together. Connecting gdb to libclang, we can access the current frame from gdb. We can then ask the frame what the current file and line number are. With that information, we can go over to libclang, give that to libclang, and
have it give us what is called a cursor into the AST, that's the Abstract Syntax Tree, the semantic information
describing your running program. From there, from the cursor,
we can ask questions about everything kind of downstream in the tree, including things like calls
to our code and so forth. And finally, we're going
to fake single step, by creating and removing breakpoints. So, having all this information
about the downstream stuff from libclang, we can then go make these breakpoints, run continue, and then delete all the breakpoints, and it'll be as though we just, sort of magically, single-stepped through
all the library code. We're going to use the previous example again with the lambda and the sort. Let's see here. Alright, let's see what
stepping into this looks like without our functionality. Not good. Okay. Let's run again. Okay, now we're at sort, and I will import... Our special command. This is the special command I defined. Let's see what happens. Oh yeah, there we are, right in the middle of the lambda. So... It worked. Next example, finding leaks. I don't know if any of you
have had this experience, where, you've been trying
to persuade other people, your company, to adopt some
more modern techniques, and they said "No raw pointers!" And one day they came back to
you and said "I've done it, "I've replaced all the raw
pointers with shared pointer, "and the program doesn't
crash anymore but it does run "out of memory." And so you think, yeah. I kind of think I know
maybe what's going on there, there's probably some kind of
circular reference problem. This is common enough
that I think it's helpful to make a tool, to help us figure out what's
going on in cases like that. Now the tool, interestingly... Well, one tool we can
use, an external tool, is Valgrind, and I don't know... Did any of you go to
Fred's talk yesterday? I'm sorry, I'm thinking about
Greg Law's talk on Tuesday. We'll bring up Fred later. Valgrind actually can mimic a gdb server. So, you can actually connect
to it with a gdb client, and then it adds features. So, I'll show you how this works. We start valgrind in
server mode, like this. And then, we start gdb and
we connect to that server. Once we do that, we now
have, a regular looking gdb, except, we now have the extra commands, we have commands
available through monitor, We've got leak_check,
block_list and who_points_at. Unfortunately, there's
no python API for these. So we're going to do what
I talked about earlier: run gdb.execute, parse the output. So now we have, from the monitor commands, information on
blocks of allocated memory, and the pointers that
they have to other blocks. This can be visualized
as a directed graph. We already know, from graph algorithms, that if we have a directed graph, we can find loops within it with some well-known algorithms. I'm going to use a Python
module called graph_tool, which is actually a bound
version of Boost Graph, bound into Python and
with some extra features. So the way it's gonna work is, I'm gonna start with a block
that we know that has leaked. We're going to then, ask through the monitor commands, what other pointers and
what other blocks are pointing to that block? And then, incrementally
add on to the graph. So we get this big directed graph, which gives us all of the pointers and references that are going
on in our leaked blocks. Then we'll run the depth
first search algorithm on it, and when we re-encounter
the same vertex again, we will know that we
have a reference loop, and because we recorded
the vertex we came from each time, we can just read
out what the loop was. So let's see how that might work. Here's an example where
you've got six blocks and some pointers. We're going to turn each of
the blocks into vertices, and each of those pointers
is going to become an edge in the directed graph. We're going to start
searching at one block that we know has leaked, and then move through the graph, in a depth first manner. Now, as soon as we encounter, a vertex that we've already seen, we know that there's a loop there, and we can simply report them. So let's do a demo. This is my test case for this demo. It's the world's dumbest tasking system. There's just a queue, and on the queue I store, functions or functors
that take no arguments, and return void. And then we sort of go and
execute them, one at a time. The code that uses this queue adds just one task to the list, which does some work, and then stages up another task. And in order to stage the other task, it keeps a reference to the task list. Unfortunately, as you can see, we forgot to actually go and execute any of the tasks in the task list. And as a result, when we exit main, we're going to have a task list with a reference to a task, which has a reference to the task list. So, let's give this a try, let's see if we can
find this with our code. Okay, first I start the valgrind server, on my leaky code, and then I start the client, we'll break on main, and continue. Alright, here we are in main. Let's try running one of
those monitor commands. Alright, this is just what you
normally see from valgrind, it says there aren't any problems yet. So let's go forward here, let's try it again. Still nothing. Okay, here's the end. Now let's try. Oh yeah, we have leaked. Alright, let's see whether it
can find any reference loops. This command is called print
commander loop, I'm sorry I should've thought of a
better name, but anyway. Oh yeah. So we have three blocks out there. It's probably the task list, the block of memory that it allocates, and then the original task itself. So, this doesn't give you
enough information though, because, it doesn't-- It's hard to figure out where
those blocks came from right. So, I added another parameter. You can add custom parameters. We'll turn this on, and now it's going to show
us for each one of these blocks, where it was
allocated with a backtrace. Alright, there's the backtrace. So this should be enough
information to debug and to figure out where
the reference loop is. Alright. Final example, visualizing algorithms. I think that most peoples'
code bases contain, some critical, super
important, central piece, and then a whole bunch of other stuff. And what I found is that, when we're debugging our code, we have a bug report,
or something like that. As long as there's nothing
erroneous in the input, we find ourselves always
asking the question, what is that core central piece of functionality doing? And so often we go in and we have to turn on logging, or some #ifdef; we have to set the special macro and recompile so it has all the extra logging. Then we sit there with
these pages and pages of log reports and we're like,
okay, what's going on here? And you start reading and decrypting, and you're drawing a little
picture on your desk, or something like that. Wouldn't it be nice if
we had some visualization tooling for our critical
pieces of code and data in the middle of the application? So, the goal now, is to
build a graphical display of an algorithm in action. A simple one. So, we're going to use
std::sort on a vector for this purpose. From the API, we're just gonna
use breakpoints basically. We're going to use them
to drive display updates, showing what's happening
with the algorithm. As a module, we'll use the PyQt5 module. This is, surprise, Qt bound into Python. It's really easy to use, actually. I'm a long-time user of Qt, and I found the Python
version much easier. The general approach is, I'm going to take the value type, which is, surprise, an integer, and make a special wrapper for it. Then I'm going to instrument that wrapper, so that when something
interesting happens, we'll have a breakpoint, basically. And then we can go and update the display with what just happened. We're going to use separate threads, and then a thread safe
queue to communicate. So, instrumenting the value class. Basically, what I did was, I made it move-only; that makes two fewer
things I have to write, I guess, or instrument. So we have a move assignment operator and a move constructor that I'm going to instrument
with breakpoints, and then I also need swap, and this is where the
finish breakpoint comes in. When you enter swap, we're going to call std::swap, and that is implemented with move semantics. So, if I didn't then go and
disable the move constructor, and move assignment
operator's instrumentation, then we would get a
confusing result there, so, I'm disabling those, performing the swap and at the end I re-enable, with a finish breakpoint. Because otherwise, there's
no way to breakpoint after that thing, after swap runs. This is sort of the system diagram here. We've got the running program, it's running under gdb,
it's emitting breakpoints, we put them through the thread-safe queue, and we update the event loop in PyQt. Alright. The demo, and this is the code. We're just randomly shuffling it, and then the instrumentation all starts when we call sort. So, let's give this a try. So, I brought up Fred. If you saw Fred's talk
on sorting yesterday, you would've learned that std::sort works first
by running introsort, which does a bunch of exchanges, it's recursive partitioning, and then it does insertion sort to finish up. And that's just what
you're going to see here. I kind of thought that when
we got to this point it would seem surprisingly long, and indeed it is. (laughing) It actually runs much faster than this. Almost done. Alright, there we go. So, back to the slides. Alright. So, wrapping up. Investing in debug tooling pays off. I truly believe, that for teams
of more than a few people, reserving some portion of one
engineer, or several engineers, for tool development
makes a lot of sense. Focusing on your key data
structures and algorithms, or focusing on categories of bugs that seem to come up all the time. Like we did with the leak. Once you develop a body of code, that you want to use to make
things easier for debugging. You can put them in a
specially named file, gdb will automatically load them, whenever you run gdb, so, you can make this
part of your debug build. Python, generally speaking,
it's a game changer, because of its vast ecosystem. You can take just about any measurement, really, in the program. Imagine, like, tracking every memory allocation, the lifetime of every block, and then doing statistical analysis. I mean, you can do just about anything. There are endless possibilities. And so in conclusion, let's go make some tools. (audience clapping) (laughs) - [Man] Just a short question? - Yeah. - [Man] You mentioned
with Clang in python. - Yeah. - [Man] Do I need to compile
my project with Clang in order to use that. - Yeah, you kind of do. Yeah, that's true. It's helpful anyway. You also need a compilation database. I can tell you how they do it. It's not too bad. - [Man] I have a question about the
visualization tool. So, for the visualization, can we make it much faster? Because you have to wait to watch all the other business happening. - Oh my goodness, yes. I throttled it, to make
it look interesting. Otherwise, it would just be like this. Like you would literally
see a sorted array and nothing else. Yeah, this is throttled to like, five or six hundred
milliseconds per operation. Just so that you can see it happening. - [Man] Thank you. - [Spectator] I missed this because it went by so fast, but couldn't you coordinate
the smart stepping and the filtering of the
backtrace together? That is, can I coordinate so that
I step over something, that will skip the same
things that are being filtered out of the backtrace. - Oh. Filtered out of the backtrace. Yeah, I think so because there actually, I made parameters for those
and I didn't show them. But there's a regex that you can set right there in gdb: set blah, blah, blah regex, some expression. You can set them both to the same thing. Yeah? - [Man] So, is there some kind of global store where I can
co-ordinate with the various, things that I'm writing for gdb? - I'm not sure what you
mean. Like, a repo or... - [Man] No, I mean, would I have to copy
and paste code around, or can I actually synthesize it. - You know there's a whole... Yeah, gdb has all these
rules about where it looks for things, and yeah there's, there's
like global directories, there's local ones. You know it's like
infinitely configurable, as you would imagine. - [Man] I think it was very useful to see that shorter output for
the templated string type, and I wonder if it's
going to be possible to add the same kind of
post-processing to Clang. Because, let's say you
compile a template, and there's, like, 4000 lines of output
for one single thing which is wrong. So, I think it would be
a useful feature to add, but I'm not sure if that's possible, if Clang has this kind of Python interface to write an extension. - Do you mean to improve the error output from Clang? - [Man] Yeah, I mean, what
Clang is outputting. We could call it improved. But just to make it shorter, to make the debugging, not debugging, but compiling at least, like, faster, because the process
of reading all these messages, it
takes a lot of time. - Yeah, I don't know. Clang is pretty hackable, though. - [Man] Good demo. - Well thank you. Anybody else? Okay, I guess that's it then. Thanks for coming. (audience clapping)
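
The reference-loop search described in the leak-finding part of the talk can be sketched roughly as follows. This is a minimal illustration in plain Python, not the speaker's code: the talk uses the graph_tool module, while this sketch swaps in a hand-written depth-first search so that it runs without gdb or Valgrind. The function name `find_reference_loop` and the block names are hypothetical, and the graph is hard-coded where the real tool would build it from the monitor command output. One detail differs slightly from the spoken description: in a directed graph, only re-encountering a vertex that is still on the current search path proves a loop, so the sketch tracks that state explicitly.

```python
def find_reference_loop(edges, start):
    """Return one reference loop reachable from `start`, or None.

    `edges` maps each block to the list of blocks it points to.
    All names here are hypothetical stand-ins for parsed monitor output.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {}   # block -> one of the three states above
    pred = {}    # block -> the block we arrived from during the search

    def visit(block):
        state[block] = ON_PATH
        for target in edges.get(block, ()):
            target_state = state.get(target, UNSEEN)
            if target_state == ON_PATH:
                # Back edge: walk the recorded predecessors from `block`
                # back to `target` to read out the loop.
                loop = [block]
                while loop[-1] != target:
                    loop.append(pred[loop[-1]])
                loop.reverse()
                return loop
            if target_state == UNSEEN:
                pred[target] = block
                found = visit(target)
                if found:
                    return found
        state[block] = DONE
        return None

    return visit(start)


# Hypothetical blocks mirroring the demo: a task list whose storage holds
# a task, which in turn keeps a reference back to the task list.
blocks = {
    "task_list": ["queue_storage"],
    "queue_storage": ["task"],
    "task": ["task_list"],
    "unrelated": ["task"],
}

loop = find_reference_loop(blocks, "task_list")
print(" -> ".join(loop + loop[:1]))
```

The predecessor map is what lets the loop be read out once the search re-encounters a block, which is the step the talk describes as recording the vertex we came from.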