Extending GDB with Python - Lisa Roach

Video Statistics and Information

Captions
[Music]

Cool. So I'm actually really excited today: I get to talk about one of my favorite things with Python that I think is not super well known. Has anybody here tried extending GDB with Python before, using the GDB Python extensions? OK, we have like two people. Awesome. So yeah, this is not a super popular thing to do, but I think it's really interesting and really fun to play with.

This is an advanced talk. Hopefully you already know Python, hopefully you maybe know GDB even, and you might know some of the Python internals; that would help as well.

Quick about me: I am a CPython core developer. I work at Facebook as a production engineer on the Python Foundation team, so you could say I dabble in Python sometimes. I actually started playing with this as part of my job. I wanted to be able to access running Python programs and debug them without having to stop them, so I started looking into GDB, because that's what GDB is really, really good at.

So, our high-level goal today: we want to access a running Python process, perform live debugging on that process, and do a little bit extra beyond GDB's typical capabilities.

Before we get into that, though, just in case people don't know a lot about GDB, we'll do a quick GDB primer. This will just be the building blocks, the most important things you need to know for this talk. We could have a lot of much longer talks on GDB and all the capabilities it has, so I'm just going to talk about the basics you need to understand this talk today.

First things first: what is GDB? GDB is probably the most popular Unix debugger of all time, I dare say. It allows you to see what's going on inside a process, so while that process is running we can use GDB to attach to it and see what's happening inside. If you've ever programmed on Linux using C or C++, you've probably used GDB before.

What can it do? Well, if you go to the GDB docs, you will see a lot of things that it can do. To sum it up, though, in
one sentence: we can read and write data from the program we're attempting to debug. We can read data, so we can see what variables are storing, and we can write data: we can send function calls, and we can change what variables store. It's really similar to pdb. If you're just a Python developer and you've used pdb to debug stuff, with pdb you can look inside variables and see what they're doing, and you can send code to the process. It's pretty much the same.

Out of the box, GDB supports a pretty good number of languages, but it does not by default support Python. You can see it supports C, C++, Go, Rust, and assembly, but out of the box I cannot debug a running Python process. And that's pretty much my main goal for today: debug a Python process. So we'll have to go over how you add the Python extensions to allow GDB to debug Python.

Now, GDB commands. This is almost so basic it's hard to explain. If you've used pdb, you've used pdb commands. Commands are the kind of one-word, one-letter things you type into the GDB CLI that do stuff. So break will set a breakpoint for us, and next will move us to the next line. If you have no idea what you're doing, help is a GDB command that will tell you all the commands you can use. So that's a GDB command; just know what that is.

The one command I want you to be aware of is the call command. It's pretty much the same as the print command. We'll go into it in a little more detail, but the call command is going to allow us to call code that the debuggee, the program being debugged, will execute itself. So not the GDB process: we can use this to inject code that gets run by the debugged process.

Then there are command files. Again, this is actually really simple. A command file is essentially just a script: you take the GDB commands that we just saw and write them in a text file, and GDB can execute it the same way you would normally execute a script. It just runs all the commands sequentially, unless you use flow
control. To run a GDB command file, you can source it from within GDB, or you can launch it on the command line with -x.

OK, whew, we're past all of the boring stuff that is GDB, and that's everything you need to know to get this done. Now we're going to talk about extending GDB with Python.

First things first: out of the box, GDB cannot debug Python. That's because it doesn't understand the high-level CPython interpreter information about Python. It can see you have an object and that it's a pointer, but it doesn't know it's a PyObject; it doesn't know anything Pythonic about that object. So we need to configure GDB with the --with-python flag. This gives us Python extensions with GDB, so it's adding a bunch of capabilities to GDB. What it actually does is embed a CPython interpreter inside of GDB, which is kind of wild. Don't ask me how it works, but GDB now has a CPython interpreter inside of it.

Now there's another step that's actually a bit harder: you need to run the process that you want to debug with Python debug symbols. I have done this on CentOS, Ubuntu, and macOS with varying degrees of success and difficulty. This seems like it should be super easy, and sometimes it is: sometimes it's just downloading a debug package or compiling Python with the debug flag on. Other times you need to move stuff around and add things to your .gdbinit. I will include on the very last slide some documentation that will point you to how to set this up. Hopefully it's easy for you; sometimes it was for me. But you have to run your debugged program with Python debug symbols so that GDB has access to the symbol table.

So what do we get when we add the GDB extensions? Well, we get a few things. First, of course, we have access to high-level CPython interpreter information from GDB, so we no longer just see that that PyObject is a generic pointer; we now know it is a Python object. We also get a bunch of new commands that are Python specific, like py-list and py-bt, which I'll show you on the next slide.
In addition to that, we can access GDB commands from Python. GDB gives us this new module. It's like a secret module: you can only find it when you launch GDB. The gdb module gives you a whole API that lets you do GDB stuff using Python code. And then on top of that, kind of bundled in too, we can subclass the gdb.Command class and create our own custom GDB commands in Python. So this is how we're going to take what GDB can normally do and make it do more: we can extend it using Python, because we can build our own custom Python commands.

All right, so some of the basic commands that we get now that GDB has Python support. We have py-list, which is just like pdb's list. py-bt is like GDB's bt, except it's the backtrace for Python. And then the python command, which will actually drop us into that CPython interpreter that's embedded into GDB.

So let me show you what that looks like. On the left-hand side I will launch a super simple example process. It's just a loop that prints out its own PID over and over again, so that we don't have to look for the PID every time. On the right-hand side we'll use GDB to attach to that PID, and I'll show you what these commands look like.

So py-list, first off, will show us the listing. Yeah, whoo: it shows us the Python code, and it shows us where we are in that Python code, where we're executing. With py-bt we can see the backtrace, so it's really useful if you're debugging a tricky bug in a C extension, maybe. And then python: python drops us into an interpreter. It's not a REPL like you're used to with a system Python. It doesn't execute right away; we have to type end, and then GDB will execute all of our code. But this is the CPython interpreter that is embedded into GDB, so pretty neat: now we're able to run Python code within GDB directly.

Then the second thing that the GDB Python extensions gave us was the ability to call GDB commands from Python. We can do that with import gdb, the gdb module. Now, the gdb module is only available to you in
that CPython interpreter in GDB. So if you're on your laptop right now in a Python interpreter trying to import gdb, it will not work. You cannot pip install gdb. This is like a secret module that you can only get to if you pull all the right levers and configure everything correctly; do it all right and you get the magical gdb module. Just to be super clear, I drew you a diagram (I'm not a designer): GDB's CPython is one thing, inside GDB, and then there's the system Python. The system Python will not import gdb for you; you have to be within GDB. So, kind of cool.

Let's try it. We're in GDB, so let's get into our little Python interpreter. I decided to just print the dir() of gdb to show you what all is within this module. As you can imagine, it's a lot of stuff. There are a lot of attributes and a bunch of methods. If you know GDB, on Unix systems it's essentially ptrace under the hood, so you have a lot of low-level kernel stuff. You can select a frame, you can unwind things, you can get the stack, look up symbols. It's all super low level, all really powerful, pretty cool.

But programming within that CPython interpreter is not the best experience. I don't want to be typing in that little interpreter and hitting end every single time. I want to write stuff in a script and execute things in a script like I normally do as a developer. It's actually pretty easy to do: I can create a new file, a new script, do this exact same command, and treat it like a command file. That's another thing the GDB extensions give us: GDB now understands the .py extension and can execute the file the same way you would execute a GDB command file. So all we have to do is pass this command file with -x or with source, and it'll execute the code for us. I won't show it again, because it looks exactly the same as what you saw on the last screen, but that's how you would do it in a command file.

So we saw we have a whole bunch of gdb methods available to us,
but there is one method to rule them all, and that is gdb.execute. gdb.execute allows us to execute any GDB command in string form. So if some command isn't built into the API, we can just pass it into gdb.execute and GDB will execute it. For example, we talked before about how the call command was going to be important: I can pass call, in quotes, into gdb.execute, and GDB will execute this function call.

So let's take a minute to talk about call. call is actually a fairly complicated thing. What it will do is look up in the symbol table the function that you've passed in. Let's say you call the function pow: it will look up pow in the symbol table, find the memory address of pow, and then force the debugged process to jump to that memory address and start running from there. So basically it pushes the function you've passed in onto the top of the stack of the debugged process. This is also why we need to have symbols loaded, and why you need the Python debug symbols: we need symbols available so that call can look up functions and execute them.

In the Python debug symbol case, we get debug symbols for the CPython interpreter, so that means I can make CPython calls here. That might not be something you're super familiar with, because you're a Python developer: you don't know C, and you don't necessarily understand the whole back end of Python. But Python has given you a wonderful little tool called the C API. The C API allows us to make CPython calls fairly simply, and there are a few calls you'll need to know, and they will be all the C code you ever need to know for the rest of your life.

First thing: PyRun_SimpleString. This is C code, and it will execute any Python string that you pass into it. So take all those textbooks you bought in college that taught you C, throw them on a bonfire, learn this one C call, and teach yourself Python, because now you can run
Python code, your own Python code, directly in the CPython interpreter. So this is a pretty powerful call, and it's going to get us really far without knowing C.

So we start piecing these things together. We have gdb.execute. gdb.execute can execute call commands. call can execute C code, because we've loaded the symbols for CPython into our GDB process. Put it all together: we call PyRun_SimpleString with a simple print of hello world, or whatever Python string you want, and we put it in gdb.execute. What this will do is push print hello world onto the top of the stack of our debugged process, so the process that's currently running and that we want to debug will now say hello world.

So let's see how amazingly this works. No worries, this isn't dangerous to do at all, this is totally fine. gdb.execute... you wouldn't believe how many times I had to type this before I got it just right, especially because you have to escape the quotes. Doo-dee-doo, you can see where it's going, this is going to be super great. I missed a quote. Oh, that doesn't look good. Oh no. The program we're debugging has segfaulted, and GDB gave us a bunch of errors. Oh no! We just willy-nilly did whatever we wanted. We didn't have any safety checks at all. You can't just inject code into a running process. You're a maverick! What are you doing?

So what happened? Someone might have just said it. It's something everyone loves to hate, but in our case it's actually going to work out really well for us. It's a little something called the GIL, the global interpreter lock. The process that we are debugging currently has the lock. We can't just start pushing code, or it's going to give us a segmentation fault, as you obviously saw. We need to acquire the GIL, so that the process that has it can't do anything for a little bit, but we can do whatever we want. It's actually really easy to acquire the GIL if you know two more C calls from the C API.
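[Editor's note] The slide code is not captured in these captions, so what follows is only a sketch of how the gdb.execute, call, and PyRun_SimpleString pieces described above might fit together, with the quote escaping the speaker struggles with moved into a helper. The helper names (c_string_literal, call_command, run_in_debuggee) and the (int) cast on the call are assumptions made for illustration, not the speaker's code. As the demo shows, running this against a live process without holding the GIL can crash that process.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None  # under a normal Python we can still build the command strings


def c_string_literal(source):
    """Quote Python source as a C string literal for GDB's expression parser."""
    escaped = (
        source.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')          # then double quotes
        .replace("\n", "\\n")         # keep multi-line source on one GDB line
    )
    return '"' + escaped + '"'


def call_command(source):
    """Build the GDB `call` command that hands `source` to PyRun_SimpleString.

    The (int) cast is an assumption: it states the return type explicitly in
    case GDB has no type information for the function.
    """
    return "call (int) PyRun_SimpleString(%s)" % c_string_literal(source)


def run_in_debuggee(source):
    """Ask GDB to make the attached process execute `source`.

    Caution: in the talk's demo, doing this without holding the GIL made the
    debugged process segfault.
    """
    if gdb is None:
        raise RuntimeError("this only works inside GDB's Python interpreter")
    gdb.execute(call_command(source))
```

Outside GDB, call_command('print("hello world")') can be printed to check that the quoting came out right before anything is sent to a real process.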
First one: PyGILState_Ensure. This gets us the GIL. We take the GIL from the process that's currently holding it and say: this is now our GIL, we have the lock, you can't do anything. Then, when we're done doing whatever tomfoolery we want, we release the GIL back to the process that's being debugged with PyGILState_Release. This is all the C code you need to know: PyRun_SimpleString, PyGILState_Ensure, and PyGILState_Release. That is it.

So let's try that whole rigmarole again. This time I just copied and pasted it, because it took like five minutes to type it all out. So, copy and paste: gdb.execute, call PyGILState_Ensure, call PyRun_SimpleString with hello world, then release it back. End it, and we can see in the program that has been running, the one we're debugging, that hello world actually got printed out in the middle of all the stuff it was doing. So we grabbed the GIL, we ran hello world, we released the GIL, and it continued on. We did this all without breaking the program.

All right, so, little building blocks: we're working our way up to being able to do some more interesting debugging. Next: that was a lot of code. I copied and pasted it because it took me so long to write it in that dinky little interpreter. What I would rather do is have this all in a nice single-line command, just like help, break, and all of those: just one word. We can do that by building our own custom commands.

This looks like a lot of code, but it's actually really easy. We import gdb, then we subclass the gdb.Command class. You need to initialize it with the name that you want your command to be called. We did hello_world, so we should be able to type hello_world from the GDB CLI, and what it'll do is call the invoke function. In our case we've done the exact same code we just saw: we lock the GIL, we run hello world, we release the GIL. So this is exactly what you just saw, now in a repeatable, reusable format as a command called hello
command. So, do it all over again: launch our little Python script on the left and try to debug it on the right. This should look the same as what we just saw, with no segmentation faults. All I have to do now is type hello_world, the command that I just registered with GDB. I registered it by passing in the command file with -x, so GDB loaded our command file, everything that was in there is now within GDB, and it understands it. I can just run hello_world and it pushes that onto the stack of my debugged program. Pretty cool.

That leads us to our final solution. We've got all of our pieces; we can build them up together into one workable script that can do something pretty neat. In our case, we want to debug memory usage in a running Python process without hurting the process, without killing it.

First step is to write a script that analyzes memory. This is not important to this talk, so I pretty much just took a third-party library called Pympler that already does a bunch of memory analysis for us and printed out a summary. You can see I did this in a whole different file. You can have a whole program written in separate files doing a lot of complicated stuff, whatever you want. In this case it's just a simple couple of lines to print out the memory of our program.

The second step, and you've already seen this, is that we want to acquire the GIL. In this case we wanted to make the GIL handling a decorator, so that for any new command we want to write, we don't have to keep writing those same two lines over and over again around all the code. We just write it as a reusable decorator.

Then we create a new, slightly more complicated custom GDB command, but it's still using the same essential stuff. We still use gdb.execute and we still call PyRun_SimpleString. The only thing we do differently is that now we open the file that we've written and execute it. I'm not saying this is the safest thing to do in the world, but it's very effective for this presentation. So we can open the
file name, execute it with PyRun_SimpleString, and we will get all of that memory analysis that we wrote in another file pushed to our debugged code.

Last step within the GDB commands file: we need to execute that command. We can execute any command we want, and since we've registered this command as file_command, it's fairly easy to just call file_command and pass in an argument, in this case the file name.

And last thing: we don't want to be running gdb -x on the command line over and over again. We want this to all be in a nice Python script. So we make a main script with a main class and put a subprocess call in it that will do the launching of GDB for us. You can see that we pass in, with -x, the GDB commands file that we were just working on for the last few slides. And that's it.

So let's run it and see what it looks like. Same super simple example; it actually doesn't have very much interesting going on in its memory, but that's OK. Then, instead of GDB (hopefully the curtain's not blocking it), you can do sudo python3 main.py, pass in the PID, and boom: we get this nice table in the debugged process telling us the memory of the debugged process. We can see all the objects, their sizes, and what their types are, using GDB, and then the process continues on. So we didn't kill it, and we didn't affect it for very long. You do affect it for a period of time, but hopefully not an aggressively large period of time. And that's it. [Laughter]

So if you guys want to see the full code a little bit more fleshed out (I know you just saw a bunch of pieces on slides), I recently open-sourced the memory analyzer tool, which is a bit bigger. It's in the Facebook Incubator GitHub, so please go check it out and make pull requests if you're interested. Then there's the GDB documentation and the Python GDB documentation. Here you can find all the information about the GDB API, how to set up GDB, and how to set up Python with the correct debug
symbols, all that jazz. So please go ahead and check that out if you have questions around that. And now that's actually it. [Applause]

[Audience] OK, so I just want to recap exactly what happened in the very beginning, just from the part that I totally understood. So you start GDB, right?

[Lisa] Yes.

[Audience] And you attach to a process?

[Lisa] Yes.

[Audience] And then you start a CPython interpreter within GDB?

[Lisa] Well, it's built into GDB. In the beginning, when you saw me type python and it dropped into a little interpreter, then yeah, that's like going into the CPython interpreter. It's technically already all built into GDB for you, so when we pass a GDB command file or a Python script to it, it accesses the CPython stuff that it knows about inside of itself to run that.

[Audience] Wow, OK. The last thing I noticed that I actually saw happen was that you then use the CPython interpreter to tell GDB to add a function call to the top of the stack of the process that GDB is attached to?

[Lisa] Yes.

[Audience] And then the process runs that function call?

[Lisa] Yes. The call command is very complicated and magical. I spent a while understanding what it does, but at the highest level that's how it works, yes.

[Audience] OK, thank you.

[Audience] Cool. I was wondering: you don't use pdb anywhere, so why did you go this route instead of trying to extend pdb? I always dreamed of pdb being attachable to a running process like GDB. But I guess if you went this route there's a good reason, right, that it's not feasible?

[Lisa] I don't think that out of the box pdb can attach to a running process, at least not that I'm aware of, so GDB seemed like the simplest solution in order to attach to a running process. I think if I had wanted to use ptrace or something like that, it maybe could have been a bit more powerful, because it would be accessing ptrace directly instead of through GDB. But GDB gives you the extra benefit of being compatible with macOS and Windows as well
as Linux. But yeah, I basically picked GDB instead of pdb because I don't think pdb can inject into a running process the same way.

[Audience] Thanks.

[Audience] Hi. I'm curious: can you debug other languages with Python using this method?

[Lisa] You should be able to create GDB extensions, like GDB commands and stuff like that, in Python and have them work on other languages. I think it should work; I don't see why not. It's just that for analyzing Python you need the extra CPython symbols loaded in order to see Python. But I don't see why you couldn't do it with some Python code. It would depend, I guess, on what you're trying to do, but there's no reason it couldn't run. Maybe just the Python code would fail. I think it's worth a try. I don't know, I've never tried it.

[Audience] Thank you for the talk, I thought it was a lot of fun. I have one question: have you explored making a REPL for this, so that instead of writing a script I can just write Python commands and then see the impact of that on my process? I don't quite like writing GDB commands, so the fact that you write Python would be a lot more fun. I think a REPL for doing this might be something really cool for interfacing with my existing Python processes.

[Lisa] Yeah, I haven't thought about it, no, but it would be pretty interesting. I don't think it'd be terribly hard to do; it just needs some of the fundamentals built into it. So yeah, that could be fun.

[Audience] Yeah, I think your talk had a great foundation for that, especially with the decorators.

[Lisa] Awesome, thank you.

[Audience] Hi Lisa, my name is Paul.

[Lisa] Hi Paul.

[Audience] I'm curious if the implementation is similar in Python 2.

[Lisa] So yes, it is similar. When you configure GDB with Python, you can select which version of Python you want it configured for, so the GDB extensions work for Python 2. As far as I'm aware, you have to pick either/or: GDB with either Python 2 or Python 3. But in terms of subclassing the commands and stuff, the only thing that would be different on my
slides would probably be the f-strings. Otherwise it's pretty much the same.

[Audience] OK, all right, thank you.

[Audience] Hi. On one of the early slides there was a ./configure --with-python, and I didn't see that in the demo. Where does that fit in?

[Lisa] So that's configuring GDB. If you download GDB from source and you want to build it yourself, you can configure it with the --with-python flag. I have seen some distros where you're able to install GDB with Python already turned on for you, but it totally depends on what system you're on. So if you download GDB from source, you can configure it yourself with the Python flag.

[Audience] OK, one more question. Is there anything special that needs to be taken into account if you debug something that's using a C extension in Python, let's say NumPy or a more complicated library? Let's say the C extension releases the GIL.

[Lisa] Oh, then you can grab it, and it should be fine. Yeah, that could happen, where the C extension is doing funky stuff with the GIL. I haven't attempted to play with it too much, but it might be interesting. You might want to try to acquire the GIL only at certain specific points instead of just acquiring it whenever. The way I'm doing it here is kind of sloppy: you just run the code whenever you execute GDB, and it starts right away. You could definitely be a little bit smarter about it and figure out where you are within the code before you start executing this. That would probably be significantly safer for C extensions.

[Audience] Yeah, that would make a lot more sense. I wonder, let's say you have a network-intensive piece of Python code, and it's somewhere down in the lower levels. Is it safe?

[Lisa] I mean, it's not the safest thing. It could go bad.

All right, cool. Well, I'll hang out for a little bit if anyone has any more questions. I just think this is
really fun to nerd out about, so I'm happy to chat. Thank you.

[Music] [Applause]
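[Editor's note] The talk describes a GIL decorator and a file_command custom command, but the slide code is not in the captions. The block below is a rough sketch of what such a GDB command file could look like, not the speaker's implementation. The names with_gil, exec_file_source, and c_string_literal, the $gil_state convenience variable, and the casts on the calls are assumptions for illustration; only gdb.execute, gdb.Command, PyRun_SimpleString, PyGILState_Ensure, PyGILState_Release, and the command name file_command come from the talk. It has not been run against a live process here.

```python
import functools

try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None  # lets the pure string helpers be checked outside GDB


def c_string_literal(source):
    """Quote Python source as a C string literal for GDB's expression parser."""
    escaped = source.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def exec_file_source(path):
    """Python source that makes the debuggee read and exec the file at `path`."""
    return "exec(open(%r).read())" % path


def with_gil(func):
    """Hold the debuggee's GIL while `func` injects code into it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Keep the returned state in a GDB convenience variable (assumed name).
        gdb.execute("call $gil_state = (int) PyGILState_Ensure()")
        try:
            return func(*args, **kwargs)
        finally:
            # Always hand the GIL back, even if the injected call failed.
            gdb.execute("call (void) PyGILState_Release($gil_state)")
    return wrapper


if gdb is not None:
    class FileCommand(gdb.Command):
        """file_command PATH: run the Python file at PATH inside the debuggee."""

        def __init__(self):
            super().__init__("file_command", gdb.COMMAND_USER)

        @with_gil
        def invoke(self, arg, from_tty):
            source = exec_file_source(arg.strip())
            gdb.execute(
                "call (int) PyRun_SimpleString(%s)" % c_string_literal(source)
            )

    FileCommand()  # instantiating the class registers the command with GDB
```

Saved under a hypothetical name such as commands.py, this would be loaded the way the talk describes, by attaching GDB to the target PID and passing the file with -x, after which file_command followed by the path of an analysis script becomes available at the GDB prompt. The try/finally in the decorator is a design choice so that the GIL is released even when the injected call raises inside GDB.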
Info
Channel: SF Python
Views: 3,676
Rating: 4.818182 out of 5
Keywords: pybay, pybay2019, python, gdb, coding
Id: xt9v5t4_zvE
Length: 30min 33sec (1833 seconds)
Published: Tue Oct 01 2019