CppCon 2018: Jeff Trull “Liberating the Debugging Experience with the GDB Python API”

Captions

- This talk is Liberating the Debugging Experience with the GDB Python API, and I'm Jeff Trull. I have to stand near my computer, I guess.
Alright, so first a little bit about me. I was a hardware person for a long time, doing microprocessors. I went from there into electronic CAD, and from that into C++ programming, and I'm now doing independent consulting. I'm the organizer of the San Francisco Emacs Meetup group. If any of you like Emacs, tomorrow at 8 am we're gonna have a Birds of a Feather session, so I hope you can all make it to that if you're interested. And finally, I'm available for your projects.
General theme of the talk. Actually, twofold. First of all, tools are a force multiplier. It's worth it to take a portion of your team, a portion of your time, and build tools that help the rest of the team accelerate their work. Secondly, gdb is powerful and Python has a rich ecosystem of modules, and when you put the two of them together, you can make some amazing tools, and I hope to show you some of those today.
General outline of my talk. First of all, I'm going to go over some very basic Python stuff, including just the beginnings of the gdb Python API. Then we're gonna dive into four different applications. For each one of those applications, I'm going to introduce a problem area, and I'm going to talk about some gdb Python API area that might be able to help with it. Then, most of the time, I'm also going to bring in an external Python module that can, in combination with the gdb API, solve the problem. Then we'll do a little demo, and then we'll wrap up.
So first of all, just enough Python. Gotta stand closer to this computer. Basic stuff, first of all. Python is whitespace-sensitive; I think that's probably the most famous thing about Python. Indentation is used to indicate blocks, so this is your basic "if then" statement. There's no curly braces. Classes in Python.
I put the same class as a C++ class and a Python class here to make it a bit easier to understand. In this example I've got a member function, some member data, and the equivalent of a constructor. Also, both of these classes are derived from a base class called Base. You're gonna find that this is a common pattern when you're using the gdb Python API: you take a class, derive from it, and override a member function to accomplish your goal.
Python in gdb. So, to access Python, you just type python, and then you can put a lot of things after it. You can do one line of code; here we reverse a list. You can do multiple lines, where you just type python, then enter all the lines you want, then end. Then it runs all of those lines together as a single script. You can also load and import your own scripts, if you specify the Python path right.
Very basic gdb API usage. Everything that we can normally do on the command line can also be done with a function called gdb.execute, and you just give, as a string, the same stuff you would've typed on the command line. You can capture the output as well, and then parse it if you want, and that's one way to interact with the API. But it's not really the best way. A better way is to use the richer stuff in the API that gives you strongly typed... Well, maybe that's the wrong word for Python. But it gives you objects back, with methods and stuff like that. For example, the gdb.parse_and_eval function. If you supply an expression, it gives you something called a gdb.Value back, and you can do things like get the address of the value, or the type of the value. You can convert it to a plain Python type as well.
So let's dive into a few applications. The first one I want to talk about is improving stack traces. Backtraces in C++ can be pretty confusing. They tend to expose library internals. Often the function signatures have all these types expanded in the template arguments.
They can be very, very verbose. They can also show you the internal calls within the library, which may be too much information for someone who's just trying to figure out what's wrong with their code. So the goal I have for this application is to take the normal things that you see in a backtrace and shrink some of the verbose names down, and then to eliminate stack frames that are internal to libraries.
Does anybody know what this type is? std::string, that's right. I know, we know it now. But only for libstdc++. But the first time you see it, it's like, oh my god, and if you have a vector of these things, with allocators, it just becomes very messy.
The tools we can use from the API to help solve this problem are, first of all, frame decorators, which can change how each frame is displayed. Secondly, frame filters, which we can use to remove frames that are not interesting to us, like frames that are inside of library calls.
First of all, decorators. You can change the appearance of any frame. In this case, we are making a decorator that inherits-- This is that pattern I described earlier. We're inheriting from gdb's FrameDecorator, and we're overriding a member function; in this case, the one that prints the name of the function. This particular decorator transforms the original name of the function by running ROT13 on it. This is a good trick to play on your co-workers.
Frame filtering: you can remove frames that you don't want to see. For example, if anyone didn't want to see anything from Boost in their backtraces, this code would work.
So, to solve the original problem, we're gonna build a decorator that uses regexes to simplify the complex expanded types in the backtrace, to make a more concise function name. We're also gonna use the filter to eliminate everything except the original call to the standard library. So here's our demo.
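A minimal sketch of the decorator and filter described above; it is not the code shown in the talk. The class names, the filter name, the `std::` and `__gnu_cxx::` prefix test, and the single libstdc++ string pattern are assumptions made for illustration. The talk describes regexes for the name shortening; this sketch uses one regex for std::string and a bracket counter for allocator arguments, because nested templates are awkward to match with a plain regex. The gdb classes are only defined when the file is loaded inside gdb.

```python
import re

try:
    import gdb
    from gdb.FrameDecorator import FrameDecorator
except ImportError:  # lets the plain-Python helpers run outside gdb
    gdb = None

# One libstdc++ spelling of std::string (an assumption; other libraries differ).
_STRING = re.compile(
    r"std::(?:__cxx11::)?basic_string<char, std::char_traits<char>, "
    r"std::allocator<char> ?>")
_ALLOC = ", std::allocator<"


def simplify(name):
    """Shorten a name: collapse std::string, then drop allocator arguments."""
    name = _STRING.sub("std::string", name)
    while True:
        start = name.find(_ALLOC)
        if start < 0:
            return name
        depth, end = 1, start + len(_ALLOC)
        while end < len(name) and depth:      # scan to the matching '>'
            depth += {"<": 1, ">": -1}.get(name[end], 0)
            end += 1
        while end < len(name) and name[end] == " ":
            end += 1                          # drop the blank before the next '>'
        name = name[:start] + name[end:]


def is_library(name):
    """Crude test for standard-library code (assumed prefixes)."""
    return name.startswith(("std::", "__gnu_cxx::"))


def drop_library_internals(frames, name_of=str):
    """Keep a library frame only when its caller is not library code."""
    frames = list(frames)
    callers = frames[1:] + [None]             # backtraces list callees first
    for frame, caller in zip(frames, callers):
        internal = (caller is not None and is_library(name_of(frame))
                    and is_library(name_of(caller)))
        if not internal:
            yield frame


if gdb is not None:
    class ShortNameDecorator(FrameDecorator):
        """Decorator that overrides only the printed function name."""
        def function(self):
            return simplify(str(super().function()))

    class StdLibFilter:
        """Frame filter; gdb calls filter() with an iterator of frames."""
        def __init__(self):
            self.name = "stdlib-internals"    # hypothetical filter name
            self.priority = 100
            self.enabled = True
            gdb.frame_filters[self.name] = self

        def filter(self, frame_iter):
            decorated = map(ShortNameDecorator, frame_iter)
            return drop_library_internals(
                decorated, lambda d: str(d.function()))

    StdLibFilter()
```

Sourced inside gdb, the filter would apply to the next `bt`. Outside gdb, `simplify` and `drop_library_internals` can be tried on plain strings.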
The example program I'm going to use is a broken sort. We're trying here to sort vectors of vectors of strings, based on a lexicographic compare on the first two strings in each vector. So we're using std::sort and supplying our own comparator. It's just that it's wrong. So if we're using the debugger to debug this, we may not have the best time, the way things are right now.
So, here's our std::sort call. We want to get down inside this lambda, so that's line 33. So let's place a breakpoint on line 33 and continue. Alright, great, we're at our lambda. We're finally back in our code; we passed through the standard library. Now let's see what that backtrace looks like. Uh-oh. So many allocators. Oh, that's so painful. Okay, now let's apply our code, our frame decorator and our frame filter, and see what that looks like now. Ah yeah. We've taken four frames out of here that were internal to std::sort. We've also changed the names a little bit, so that we just have std::vector of std::vector of string, instead of that giant thing that expanded. So I think that's a big improvement.
Okay, next application: better stepping. Often, just like we saw, we supply our own code to a library for it to use. So in order to get to our code and see what might be wrong with it, we have to step through a whole lot of library code to get there. We can do like I just did, and set a breakpoint in the middle of our lambda or our visitor, or whatever we're supplying to the algorithm. But it's really painful, and we might miss stuff and so on. So we want to have some way of doing that automatically.
The tools we can use from the API for this... Breakpoint is basically the main thing that we're going to use. It's going to help us by creating temporary breakpoints, just like I did manually. To make a breakpoint through the API, the main interface is gdb.Breakpoint.
You can use the same kind of strings that you use on the command line to indicate breakpoints. You can also take those breakpoints and then, because they're real objects, you can manipulate them: you can enable or disable them, you can add a condition. You can even put commands on that breakpoint that will execute when the breakpoint is hit. For example, in this case, when the breakpoint is hit, you'll get a popup YouTube window showing a video of Steve Ballmer.
We also have finish breakpoints, which is one of my favorite features. When you type finish on the command line in gdb, and you're inside some function, it jumps to the end of the function, and it does that, behind the scenes, by creating breakpoints at the exit points of the function. But you're not normally able to add commands and stuff like that to it. But you can do it through the Python API, and we're going to do that later.
The Python module that we're going to apply to this is actually libclang. This is a way to insert semantic information about your running program, which gdb normally doesn't have any idea about. We're going to put these together and use that to figure out where we should be inserting breakpoints. So, libclang's Python bindings allow you to find the current statement, given a filename and a line number, and to identify calls, objects with methods, and even lambdas that you supplied within them. What we're gonna do then is figure out which of those are library calls and which of those are calls to our code. Actually, it'll be anything that's not a library call. Then we'll use gdb to set temporary breakpoints on that user code, so that we can continue straight through to it.
Putting it all together. Connecting gdb to libclang: we can access the current frame from gdb. We can then ask the frame what the current file and line number are.
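The gdb side of this can be sketched roughly as follows, under some assumptions. The function names and the `targets` list are hypothetical; in the talk the locations come from libclang, which is left out here. The gdb module is passed in as a parameter only so the logic can be exercised outside gdb; inside gdb one would pass the real module and `gdb.selected_frame()`.

```python
def location_of(frame):
    """Return (filename, line) for a gdb.Frame, or None without debug info."""
    sal = frame.find_sal()          # symtab-and-line for the frame's pc
    if sal.symtab is None:
        return None
    return sal.symtab.fullname(), sal.line


def continue_to_user_code(gdb, targets):
    """Fake a 'smart step': break at every target, continue, then clean up.

    gdb     -- the gdb module (a parameter so it can be replaced in tests)
    targets -- (filename, line) pairs for user code reachable from the
               current statement; hypothetical input, e.g. from libclang
    """
    breakpoints = [
        gdb.Breakpoint("%s:%d" % (path, line), temporary=True)
        for path, line in targets
    ]
    try:
        gdb.execute("continue")
    finally:
        # The temporary breakpoint that was hit has already been deleted
        # by gdb; remove the ones that were never reached.
        for bp in breakpoints:
            if bp.is_valid():
                bp.delete()
```

Because the results are real objects, the same breakpoints could also be given a `condition` or toggled through `enabled` before continuing.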
With that information, then, we can go over to libclang, give that to libclang, and have it give us what is called a cursor into the AST, that's the Abstract Syntax Tree, the semantic information describing your running program. From the cursor, we can ask questions about everything kind of downstream in the tree, including things like calls to our code and so forth. And finally, we're going to fake a single step by creating and removing breakpoints. So, having all this information about the downstream stuff from libclang, we can then go make these breakpoints, run continue, and then delete all the breakpoints. It'll be as though we just, sort of magically, single-stepped through all the library code.
We're going to use the previous example again, with the lambda and the sort. Let's see here. Alright, let's see what stepping into this looks like without our functionality. Not good. Okay. Let's run again. Okay, now we're at sort, and I will import... our special command. This is the special command I defined. Let's see what happens. Oh yeah, there we are, right in the middle of the lambda. So... it worked.
Next example: finding leaks. I don't know if any of you have had this experience, where you've been trying to persuade other people at your company to adopt some more modern techniques, and they said "No raw pointers!" And one day they came back to you and said, "I've done it, I've replaced all the raw pointers with shared pointers, and the program doesn't crash anymore, but it does run out of memory." And so you think, yeah, I kind of think I know maybe what's going on there; there's probably some kind of circular reference problem. This is common enough that I think it's helpful to make a tool to help us figure out what's going on in cases like that. Now the tool, interestingly... Well, one external tool we can use is valgrind, and I don't know, unless... Did any of you go to Fred's talk yesterday?
I'm sorry, I'm thinking about Greg Law's talk on Tuesday. We'll bring up Fred later. Valgrind actually can mimic a gdb server. So you can actually connect to it with a gdb client, and then it adds features. So, I'll show you how this works. We start valgrind in server mode, like this. And then we start gdb and we connect to that server. Once we do that, we now have a regular-looking gdb, except we now have extra commands available through monitor. We've got leak_check, block_list and who_points_at. Unfortunately, there's no Python API for these. So we're going to do what I talked about earlier: run gdb.execute, parse the output.
So now we have, from the monitor commands, information on blocks of allocated memory, and the pointers that they have to other blocks. This can be visualized as a directed graph. We already know, from graph algorithms, that if we have a directed graph, we can find loops within it with some well-known algorithms. I'm going to use a Python module called graph_tool, which is actually a bound version of Boost Graph, bound into Python and with some extra features.
So the way it's gonna work is, I'm gonna start with a block that we know has leaked. We're going to then ask, through the monitor commands, what other pointers and what other blocks are pointing to that block, and then incrementally add on to the graph. So we get this big directed graph, which gives us all of the pointers and references that are going on in our leaked blocks. Then we'll run the depth-first search algorithm on it, and when we re-encounter the same vertex again, we will know that we have a reference loop, and since we recorded which vertex we came from each time, we can just read out what the loop was. So let's see how that might work. Here's an example where you've got six blocks and some pointers.
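The search described above can be sketched in plain Python. This is an illustration under assumptions, not the talk's code: it uses a dictionary instead of graph_tool, the six block names and their pointers are made up, and the edges run from a block to the blocks it points at (the monitor commands answer the reverse question, but a loop is a loop in either direction).

```python
def find_reference_loop(points_to, start):
    """Depth-first search from `start`; return one reference loop or None.

    points_to maps each block to the blocks it holds pointers to.
    """
    came_from = {start: None}   # block -> block we reached it from
    on_path = set()             # blocks on the current search path

    def visit(block):
        on_path.add(block)
        for target in points_to.get(block, ()):
            if target in on_path:
                # Back edge: walk the recorded predecessors to read
                # the loop out, then put it in pointer order.
                loop = [block]
                while loop[-1] != target:
                    loop.append(came_from[loop[-1]])
                return loop[::-1]
            if target not in came_from:
                came_from[target] = block
                found = visit(target)
                if found:
                    return found
        on_path.discard(block)
        return None

    return visit(start)


# Six hypothetical blocks; A, B and C form a reference loop.
blocks = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": ["E"],
    "E": [],
    "F": ["A"],
}

print(find_reference_loop(blocks, "A"))   # ['A', 'B', 'C']
print(find_reference_loop(blocks, "D"))   # None
```

One design choice in this sketch: a loop is only reported when the revisited block is still on the current search path, because a block that is merely reached a second time by a separate route does not by itself form a cycle.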
We're going to turn each of the blocks into vertices, and each of those pointers is going to become an edge in the directed graph. We're going to start searching at one block that we know has leaked, and then move through the graph in a depth-first manner. Now, as soon as we encounter a vertex that we've already seen, we know that there's a loop there, and we can simply report it.
So let's do a demo. This is my test case for this demo. It's the world's dumbest tasking system. There's just a queue, and on the queue I store functions or functors that take no arguments and return void. And then we sort of go and execute them, one at a time. The code that uses this queue adds just one task to the list, which does some work and then stages up another task. And in order to stage the other task, it keeps a reference to the task list. Unfortunately, as you can see, we forgot to actually go and execute any of the tasks in the task list. And as a result, when we exit main, we're going to have a task list with references to a task, which has a reference to the task list.
So, let's give this a try; let's see if we can find this with our code. Okay, first I start the valgrind server on my leaky code, and then I start the client. We'll break on main, and continue. Alright, here we are in main. Let's try running one of those monitor commands. Alright, this is just what you normally see from valgrind; it says there aren't any problems yet. So let's go forward here, let's try it again. Still nothing. Okay, here's the end. Now let's try. Oh yeah, we have leaked. Alright, let's see whether it can find any reference loops. This command is called print commander loop, I'm sorry, I should've thought of a better name, but anyway. Oh yeah. So we have three blocks out there. It's probably the task list, the block of memory that it allocates, and then the original task itself.
So, this doesn't give you enough information though, because it's hard to figure out where those blocks came from, right? So I added another parameter. You can add custom parameters. We'll turn this on, and now it's going to show us, for each one of these blocks, where it was allocated, with a backtrace. Alright, there's the backtrace. So this should be enough information to debug and to figure out where the reference loop is.
Alright. Final example: visualizing algorithms. I think that most people's code bases contain some critical, super important, central piece, and then a whole bunch of other stuff. And what I found is that when we're debugging our code, we have a bug report or something like that, as long as there's nothing erroneous in the input, we find ourselves always asking the question: what is that core central piece of functionality doing? And so often we go in and we have to turn on logging, or some #ifdef; we have to set the special macro and recompile so it has all the extra logging. Then we sit there with these pages and pages of log reports and be like, okay, what's going on here? And you start reading and decrypting and you're drawing a little picture on your desk, or something like that. Wouldn't it be nice if we had some visualization tooling for our critical pieces of code and data in the middle of the application?
So the goal now is to build a graphical display of an algorithm in action. A simple one. So we're going to use std::sort on a vector for this purpose. From the API, we're just gonna use breakpoints, basically. We're going to use them to drive display updates, showing what's happening with the algorithm. As a module, we'll use the PyQt5 module. This is, surprise, Qt bound into Python. It's really easy to use, actually. I'm a long-time user of Qt, and I found the Python version much easier.
The general approach is, I'm going to take the value type, which is, surprise, an integer, and make a special wrapper for it. Then I'm going to instrument that wrapper, so that when something interesting happens, we'll have a breakpoint, basically. And then we can go and update the display with what just happened. We're going to use separate threads, and then a thread-safe queue to communicate.
So, instrumenting the value class. Basically, what I did was, I made it move-only; that makes two fewer things I have to write, I guess, or instrument. So we have a move assignment operator and a move constructor that I'm going to instrument with breakpoints, and then I also need swap, and this is where the finish breakpoint comes in. When you enter swap, it's going to call std::swap, and that is implemented with move semantics. So if I didn't then go and disable the move constructor and move assignment operator's instrumentation, we would get a confusing result there. So I'm disabling those, performing the swap, and at the end I re-enable, with a finish breakpoint. Because otherwise there's no way to breakpoint after swap runs.
This is sort of the system diagram here. We've got the running program, it's running under gdb, it's emitting breakpoints, we put them through the thread-safe queue, and we update in the event loop in PyQt.
Alright. The demo, and this is the code. We're just randomly shuffling it, and then the instrumentation all starts when we call sort. So, let's give this a try. So, I brought up Fred. If you saw Fred's talk on sorting yesterday, you would've learned that std::sort works first by running introsort, which does a bunch of exchanges, it's this recursive partitioning, and then it does insertion sort to finish up. And that's just what you're going to see here. I kind of thought that when we got to this point it would seem surprisingly long, and indeed it is. (laughing) It actually runs much faster than this. Almost done.
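The hand-off between the debugger thread and the display can be sketched with the standard library alone. Everything here is a stand-in chosen for illustration: a simple exchange sort plays the instrumented program (the talk instruments std::sort through breakpoints), a plain list plays the Qt display, and the event tuples and function names are hypothetical.

```python
import queue
import threading


def debugger_side(events, values):
    """Stand-in for the breakpoint handlers: post one event per swap."""
    data = list(values)
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if data[j] < data[i]:
                data[i], data[j] = data[j], data[i]
                events.put(("swap", i, j))
    events.put(None)                    # sentinel: no more events


def display_side(events, model):
    """Stand-in for the GUI: replay each event on the displayed copy."""
    while True:
        event = events.get()            # blocks until an event arrives
        if event is None:
            break
        _, i, j = event
        model[i], model[j] = model[j], model[i]


def run(values):
    events = queue.Queue()              # thread-safe by design
    model = list(values)                # what the display would show
    producer = threading.Thread(target=debugger_side, args=(events, values))
    producer.start()
    display_side(events, model)
    producer.join()
    return model


print(run([3, 1, 2]))   # [1, 2, 3]
```

In a real Qt program the display side would not block in a loop like this; one plausible arrangement is to drain the queue from a timer callback so the event loop stays responsive.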
Alright, there we go. So, back to the slides.
Alright. So, wrapping up. Investing in debug tooling pays off. I truly believe that for teams of more than a few people, reserving some portion of one engineer, or several engineers, for tool development makes a lot of sense. Focus on your key data structures and algorithms, or on categories of bugs that seem to come up all the time, like we did with the leak. Once you develop a body of code that you want to use to make things easier for debugging, you can put it in a specially named file, and gdb will automatically load it whenever you run gdb, so you can make this part of your debug build.
Python, generally speaking, is a game changer because of its vast ecosystem. You can take just about anything, really, measuring in the program. Imagine, like, tracking every memory allocation, the lifetime of every block, and then doing statistical analysis. I mean, you can do just about anything. There are endless possibilities. And so, in conclusion, let's go make some tools. (audience clapping) (laughs)
- [Man] Just a short question?
- Yeah.
- [Man] You mentioned libclang in Python.
- Yeah.
- [Man] Do I need to compile my project with Clang in order to use that?
- Yeah, you kind of do. Yeah, that's true. It's helpful anyway. You also need a compilation database. I can tell you how they do it. It's not too bad.
- [Man] I have a question on the visualization tool. So, for the visualization, can we make it much faster? Because you have to wait to watch all the other business happening.
- Oh my goodness, yes. I throttled it, to make it look interesting. Otherwise, it would just be like this. Like, you would literally see a sorted array and nothing else. Yeah, this is throttled to like five or six hundred milliseconds per operation, just so that you can see it happening.
- [Man] Thank you.
- [Spectator] I may have missed this because we went by it so fast, but couldn't you coordinate the smart stepping and the filtering of the backtrace together? That is, can I coordinate so that when I step over something, it will skip the same things that are being filtered out of the backtrace?
- Oh. Filtered out of the backtrace. Yeah, I think so, because actually I made parameters for those and I didn't show them. But there's a regex that you can set right there in gdb: set blah, blah, blah regex, some expression. You can set them both to the same thing. Yeah?
- [Man] So, is there some kind of global store where I can coordinate the various things that I'm writing for gdb?
- I'm not sure what you mean. Like, a repo or...
- [Man] No, I mean, would I have to copy and paste code around, or can I actually synthesize it?
- You know there's a whole... Yeah, gdb has all these rules about where it looks for things, and yeah, there's like global directories, there's local ones. You know, it's like infinitely configurable, as you would imagine.
- [Man] I think it was very useful to see that shorter output of the templated string type, and I wonder if it's going to be possible to add the same kind of post-processing to Clang. Because, let's say you compiled a template, and there's like 4000 lines, and it outputs them in one single thing, and which is wrong. So I think it would be a useful feature to add, but I'm not sure if that's possible, if Clang has this kind of Python interface to write an extension.
- Do you mean to improve the error output from Clang?
- [Man] Yeah, I mean, Clang is outputting. We could call it improved. But just to make it shorter, to make the debugging, not debugging, but compiling at least, like faster, does the process become as gone with reading all these messages, it takes a lot of time.
- Yeah, I don't know. Clang is pretty hackable, though.
- [Man] Good demo.
- Well, thank you. Anybody else?
Okay, I guess that's it then. Thanks for coming. (audience clapping)

Info

Channel: CppCon

Views: 4,148

Id: ck_jCH_G7pA

Length: 29min 9sec (1749 seconds)

Published: Mon Nov 12 2018