Ned Batchelder - Machete-mode debugging: Hacking your way out of a tight spot - PyCon 2016

Captions
(presenter) Hello, this is machete-mode debugging with Ned Batchelder. Ned has been programming Python since 1999. He is the maintainer of coverage.py, and he works for edx.org. Please give a warm welcome to Ned Batchelder. [applause]

(Ned Batchelder) Hi everyone, thank you. Oops, hi everyone, thank you. As Jesse mentioned, my name is Ned Batchelder. You can find me on Twitter or IRC or GitHub as nedbat. If you want to follow along online, the slides for this talk and another version of the text are at that bit.ly URL. A very important announcement: today at 7 o'clock, we're going to be having a juggling Open Space, and I welcome any of you to come and juggle with us. [applause]

OK, machete-mode debugging. Whoops, let's get that clicker working. OK, so I've been programming, as Jesse mentioned, in Python, for a very long time. Not as long as some, but a very long time, and it can be a bit chaotic. I love Python for its dynamic nature. One of the things that fascinates me about Python is that in its deepest structure, it's really very unstructured, and you can build lots of different structures for your program out of it. But through conventions and agreement, we tend to build programs in a way similar to more strict languages, which lets us build very large systems that work together. We can reason about our code. But in its nature, Python can be chaotic. It has dynamic typing, which means that names can take on values of different types at different times and sometimes unpredictably. There's no access control on your objects, no protected, private, or final, which means that things can change from far away in your program that you didn't expect. All of the objects are on the heap. There is no stack allocation, so things can live for much longer than you expect. Fundamentally, nothing is off limits in Python, right? We get questions about, "How can I ensure that someone doesn't do blah?" And generally Python doesn't do "You can't do" very well, right? Whatever you want to do in Python, you can do.

This can cause problems. If you build large systems, you'll get yourself into trouble where you have to debug situations because the chaos got a little farther out of hand than you had intended. So, let's use that to our advantage. The chaos got us into this mess. We can use the chaos to get us out of this mess. That's the fundamental thesis of this talk, which is: Python is very dynamic, but we can -- in a karate-like move, we can use that against our opponent (our own program), and we can take the upper hand by using that chaos to get the information we need to get ourselves out of a sticky situation.

The bulk of this talk is going to be a discussion of actual problems from a real project. And I won't tell you what project it is, because I want you to like Open edX. [laughter] The point is that these are actual problems that happened at work. I wrote up blog posts about them. People tended to like those blog posts, and so I've collected together the experiences here in this talk. The other thing is, I really want to emphasize this: the things I'm going to show you, you shouldn't use in real code. Most of the code that I'm going to show you is meant to be in your code base for about 10 minutes. You write this code, you get the information you need out of it, you fix the problem, and then you get rid of that awful thing that I'm about to show you. If I hear that any of you are using any of this in production later, I'm going to feel really, really bad, and I'm going to not like you personally anymore.
[laughter] So, don't do it. All right, Case 1. We've got four cases to cover. Case 1: Double importing. The problem was that we had modules in our system that were being imported more than once. And if you know about importing modules, you know that one of the fundamental ideas is that when you import a module you always get the same object no matter how many times you import it -- but that wasn't the case for us. And the classes in those modules were then defined twice, which means we had two classes floating around which had the same code and the same name. And usually that's not a problem, although sometimes it is, but modern -- recent versions of Django actually complain about this. They will detect that this is happening and head off the eventual problems by complaining about it and preventing your program from running. And when we upgraded our code from Django 1.4 to Django 1.8, we started to see those complaints and we had to fix them. Now, how can it be that modules are imported more than once?

So, here's a quick refresher. Oh, by the way, as I'm going through this, what I'm going to show you is what the problems were, what mechanisms in Python made those problems possible, and then what mechanisms helped us debug those problems and fix them. So, I'm hoping along the way, in addition to showing you versions of code that you're not supposed to use in production, that you'll come away with a deeper understanding of some of the mechanisms underlying Python that got us into the mess and got us out.

So, here's a quick refresher of how modules work. When you import a module, you ask for a module name. The first thing that happens is there's a dictionary in the sys module called sys.modules, which has, as its keys, the names of all the modules that have been imported and, as its values, the actual module objects. So, when you import a module, the first thing that happens is it looks in that dictionary to see if the module has already been imported. And if it has been, it just returns it. These two lines of code are what make it so that when you import modules, first of all, it goes very fast the second time, and you get the same object back. And if it's not found, then for every directory in the thing called sys.path, which is a list of directory names, it looks to see if it can make a file name in that directory from the module name that exists, and if it does, then it's going to actually execute that file to get a module object, stuff that object back into sys.modules under the key, and return it to you. That's how import works the first time. And if, after going through all of that looping, it doesn't find anything, it raises an ImportError. So, this is a wildly simplified version of how importing modules actually works, but this is good enough to get you about 10 years into your career with Python. So, this is pretty much -- I mean, nothing bad about Brett and all the good work he's done, but this is enough for you to understand how importing works.

So, with all this machinery in place, how did we have modules being imported more than once, right? We've somehow broken this fundamental promise that Python gives us. And how are we gonna find it, most importantly? So, this is the code that I actually put into an actual file of Python. And it's a little bit dense for you to read right now, but the idea is to get across a couple of points. One is, I actually went to the models file that Django was complaining about, and I actually put real code right into the top of the module. Right?
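For readers following along, here is a minimal sketch of the simplified import lookup described above. It is illustrative only, not how CPython's importer really works: it ignores packages, bytecode, finders and loaders, and every other real-world detail.

```python
import os
import sys
import types

def simplified_import(name):
    # Already imported?  Then hand back the very same module object.
    if name in sys.modules:
        return sys.modules[name]
    # Otherwise, look for a matching .py file along sys.path.
    for dirname in sys.path:
        filename = os.path.join(dirname, name.replace(".", os.sep) + ".py")
        if os.path.exists(filename):
            module = types.ModuleType(name)
            with open(filename) as f:
                exec(f.read(), module.__dict__)   # "importing" really means running the code
            sys.modules[name] = module            # cache it under the requested key
            return module
    raise ImportError("No module named " + name)
```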
Which is exactly what you're not supposed to do. And the code I put in was gonna use a module in the standard library called inspect. Inspect is a really useful tool for understanding how your program is structured. It can tell you about the contents of modules and classes and methods. In this case what we're going to use is -- we're going to use a function in inspect called stack, which gives you a list of tuples, every tuple representing one call frame in your stack, showing you who called you, and who called them, and who called them, and so on. And in those tuples is information about the file name, the function name, and the line number, so that you can essentially create a traceback of your current position. So, here, what I did is, right there in the module, when it gets imported, I'm going to open a file, and I'm going to append to it. And what I'm going to append to it is that I'm importing the file. And then for all of those tuples in the stack, I'm going to write out a nicely formatted line, and I'm going to write that line. And then I actually have the models, right? Because I've just dumped this code straight into a file that has nothing to do with what I'm trying to find out, right? I'm not -- this isn't a file about stack traces. It's a file about Django models. But like I said, I'm doing things the wrong way, because I just need to get the information I need.

And when I ran it, I got results like this. It told me that it was importing first/models.py and that it was being imported from this place, and it also told me that it was importing that file again and that it was coming from this place. And so, now I had the two locations where the file was being imported. And both of these locations were importing it and somehow executing the file. And when I looked at those locations, I could see what the problem was. One of them said "import thing.apps.first.models," and the other place said "import first.models." And the reason that's a problem is because of our directory tree -- I've got a map of the directory tree here, twice, and the stars are the directories that are on sys.path. So, the first import found, in the project directory, a thing directory, with an apps directory, with a first directory, with the models.py, so it could import thing.apps.first.models, which put thing.apps.first.models into sys.modules. The second import, because apps was also on the system path, could find a first directory with a models in it. And because the keys are different, thing.apps.first.models versus first.models, the uniqueness check didn't kick in, right? So, I had that little bit of code that printed out a stack trace that told me exactly what I needed to know: where were the two places the module was being imported? From there, I could get the clues that I needed to fix it.

But the reason, by the way, that sys.path is like this is because in our code we literally have sys.path.append to append extra directories onto the system path. And this is one of the reasons you shouldn't go around appending things onto the system path, right? So, that's Case 1 solved. And by the way, the solution to the double import: the best solution, frankly, would be to get rid of the sys.path.append. I'm looking forward to that in the future. That's going to be awesome. The way we actually fixed it was to at least make all the imports have the same form, so that everyone who was talking about the module talked about it in the same way and the uniqueness check would work. So, what have we learned from Case 1?
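Before the lessons, here is a rough reconstruction of that throwaway stack-dumping code. The log path and the line formatting are invented for illustration; the exact code isn't shown in the transcript.

```python
# Pasted straight at the top of the models file that was being imported twice.
import inspect

with open("/tmp/import-trace.txt", "a") as log:
    log.write("\nImporting first/models.py\n")
    # inspect.stack() gives one record per call frame, innermost first.
    for frame, filename, lineno, function, code_context, index in inspect.stack():
        log.write("    {}:{} in {}\n".format(filename, lineno, function))

# ...the real Django model definitions follow, untouched...
```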
First, we learned that import really runs code. Now, if you're coming from another language, perhaps with static typing, you may think of an import as being, "There are classes and functions defined somewhere. Go and find those definitions, and let me use them." And in a way that's true, but the way Python does that is, it really executes all the code in that .py file. Now, if you happen to write your .py file to have nothing but imports, and class, and def statements, then all that's going to happen when you execute the code is to define classes and functions. But if you put in a "with" statement and "print" statements, and if you put in global mutation statements, they're all going to run. Importing doesn't have a special mode where it just looks for definitions. All it does is execute all the code. And we used that to our advantage in this case, because we wanted to print out a stack trace when we imported the code, right? The file was being imported twice; we wanted to get two stack traces. It worked great to just dump the stack trace at the top level of the module as part of the import. But you shouldn't do that in real code, because it makes it very difficult to reason about the code: you have code that's executed one time, when you import it, but not all the other times that you import it. So, don't put code at the top level of the module, but understand that that's how Python does imports.

The second lesson we learned about machete-mode debugging is: we just hardcoded a bunch of stuff in there, right? I just said, "with open('/tmp/my-information.txt')," you know. You'd never put that in real code. But the code is only going to live for 10 minutes -- who cares? Just write straight to the file, and be done with it. In this case, wrong is OK, because we just need to get the information. And in terms of a positive lesson: don't append to sys.path, right? Don't fiddle with your system path to try to make your imports convenient or something. Choose a disciplined way to do it. Keep everything straight, and you won't run into this kind of chaos.

Case 2: Finding temp file creators. The problem was that we had tests -- that's not the problem. [laughter] The problem was that we had tests that would make temp files like this, using tempfile.mkdtemp -- in this case a temp directory. And some of them would add a cleanup so that the temp directory would be sure to get cleaned up at the end of the test, but some tests would make a temp directory and didn't clean it up. And so you'd run your whole test suite and you'd end up with 20 temp files and directories left behind, which isn't really a problem, but you know, my OCD kicks in: "That seems kind of messy; we should clean that up." But how do we find them, right? There are lots of tests. I think in our test suite we have about 8,000 tests. I'm not gonna be able to grep the whole test suite and find the places where it gets created but not cleaned up. Sometimes the cleanup is far away, sometimes it's a helper function that's called from lots of places. It's just too hard. That's another undercurrent here, which is: other languages have really great static analysis tools, and that's something that Python has difficulty with because of its dynamic nature. So, we'll just skip the static analysis. And notice here I'm upgrading grep to "static analysis," which sounds fancy... [laughter] But that's fundamentally what it is. It's a tool for looking at your source code without running it and trying to understand it.
That's what static analysis is about. What I'm doing here is all dynamic analysis. Let's put something in the program that, when you run it, will tell you what you need to know. So, the temp files aren't getting cleaned up, and there are too many to eyeball. What I wanted to do is -- I wanted to put some information in the file itself, right? After all, the whole problem here is that there's something left behind when something goes wrong. What if I could just use that thing left behind to give me the information I need, right? Unfortunately, I can't write into the temp file itself. The contents of the file are important to the test. They'll fail if I just start writing random junk into it. But the interesting thing about temp files is that no one cares what they're called. So, we're going to put the information into the file name.

And the way we're going to do that is we're going to monkeypatch the standard library. So, monkeypatching is a technique where you write a function and you stuff it in place of some preexisting function. So, in this case, we're going to import tempfile. We're going to write a function called "my sneaky function," and we're just going to assign it to tempfile.mkdtemp. And what that means is that the unsuspecting product code is going to import tempfile and call tempfile.mkdtemp, but now that's referring to my function. So, when the product code tries to make a temp directory, it's actually going to be calling my function. This is called monkeypatching. And the key idea from Python that makes this possible is that any name can be reassigned. It feels a little bit weird, you know? The standard library is this thing that's been handed down to us on engraved tablets, right? It's the foundation upon which we build our programs. It's something we've come to count on. But it's just a Python module with attributes like anything else, and they can all be reassigned, so we can just go ahead and reassign it when we want to. Of all the things I'm telling you not to do in production, definitely don't do this one. [laughter]

Now, what are we supposed to monkeypatch? Well, here's where we can just read the source, right? tempfile.py is a file on your disk in the standard library. You can go and find it and you can open it in your editor and you can read it, right? If you look in the tempfile module, there are actually a half dozen or so different functions for making temporary things in different ways. We had some directories and some files, so we actually needed to deal with a number of those. And by the way, we only wanted to tweak the file names. There's a bunch of machinery in creating temp files that we didn't want to interfere with. We just wanted to give them new names. It turns out that there is a helper function inside tempfile called get_candidate_names. And the way the temporary functions work is, they use get_candidate_names to produce a series of those classic tempfile junky randomy names, and then they use those names to find a file that doesn't exist yet, and then they go ahead and make their file. And so this is perfect. get_candidate_names solves both of our problems: it's used by all of the temporary-making things, and it's the only place the name comes from. So, if we monkeypatch get_candidate_names, it will do exactly what we want. But the other trick with monkeypatching is that you have to do it before the function gets called, right? If the function gets called before you monkeypatch, then your code is way too late. It's not gonna work.
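As a reference point, the basic shape of a monkeypatch described a moment ago might look like this. It is a sketch only; the final version, shown after the .pth discussion below, patches the name-generating helper rather than mkdtemp itself.

```python
import tempfile

# Keep a reference to the real function so the real work still happens.
real_mkdtemp = tempfile.mkdtemp

def my_sneaky_mkdtemp(*args, **kwargs):
    print("Someone is making a temp directory!")
    return real_mkdtemp(*args, **kwargs)

# From now on, unsuspecting product code that calls tempfile.mkdtemp() gets ours.
tempfile.mkdtemp = my_sneaky_mkdtemp
```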
What we'd like to have is a feature in Python that says, "Before you run the program, run this little piece of code so I can monkeypatch first." Python doesn't have a switch like that. Perl has a switch that says, "Use this prologue before the main program." Python doesn't have that feature, but it has a thing called .pth files. Now, .pth files are essentially symbolic links in your site-packages directory. And you can go and look; you probably have a few of them. And they do this very odd thing, which is: when Python starts up, it finds all the .pth files, and it looks at every line in the .pth file. And literally, if the line starts with "import" and a space, it executes the line. [audience chuckles] I'm not, I didn't -- OK, look, I'm showing you lots of weird code. I didn't write this, OK? [laughter] This is really in there, and every time you run Python, this is happening. And if it doesn't start with "import," then it just appends the line to sys.path. So, this is how sys.path gets really, really long and points to all of your installed modules.

So, if you create, sorry -- if you create a 000.pth file in your site-packages directory that just imports first_thing, then you can write a first_thing.py and it will run before any other code in your Python process. And what we're going to do here in first_thing.py, again, is, I'm going to use inspect.stack to get information. First, I'm going to save off the original value of get_candidate_names, because I actually like that randomy stuff. That's still important to keep, so I'm going to keep that function as real_get_candidate_names. And here again, functions -- Python's functions as first-order objects, first-class objects -- let us just hold that function with a new name, and we can use it later. Then I'm going to make my own get_candidate_names, and I'm gonna take inspect.stack and join it together in such a way that I get a really long string that's still kind of readable, so that I can see who's been calling me. And then I'm going to get the actual randomness from real_get_candidate_names, and I'm going to yield my own sequence, right? And then I'm going to do the real monkeypatch. Again, I know this code is really dense. It's all online; you can go and study it later. But I'm trying to get the point across that we're monkeypatching the standard library, and as a result, we get tempfile names that now look like this. And in this file name, you can see that case.py line 53 called case.py line 78, which called test_import_export line 289. So, I can go into test_import_export.py line 289 and see there's a mkdtemp right there. And that's one that's not getting cleaned up. So, I can fix that line and then go on to the next one, where test_video 143 is calling tempfile line 455, and etc., etc., etc.

So, what did we learn? One, this is often overlooked: forget monkeypatching for a second -- you can just go and read the standard library, and sometimes that's all you need, right? The very fact that Python is open source -- and forget the contribution, and the license, and all that stuff -- the source is on your system. You don't even have to go to hg.cpython.org to dig it up. The standard library is all on your disk as Python source code, and you can read it to figure out what it does. It's also patchable, so we can go in there and affect its behavior where we need to, to get information. And for this kind of debugging, you should use whatever you can. Whatever you can touch and change, use it; it's fine. That code is only going to live for 10 minutes.
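Putting those pieces together, a rough approximation of the .pth trick and the patched name generator might look like the following. The file names, the frame formatting, and the slice of the stack are all guesses rather than Ned's exact code, and note that in CPython's tempfile.py the helper is actually the private _get_candidate_names.

```python
# 000.pth, dropped into site-packages (a single line starting with "import"):
#     import first_thing

# first_thing.py -- runs before anything else in the process.
import inspect
import os
import tempfile

# Keep the real randomness around under a new name.
real_get_candidate_names = tempfile._get_candidate_names

def get_candidate_names():
    # Encode a slice of the current call stack into the candidate names,
    # outermost caller first, short enough to fit in a file name.
    stack = inspect.stack()[1:8]    # [0] is this generator's own frame
    callers = "-".join(
        "{}@{}".format(os.path.basename(info[1]), info[2])
        for info in reversed(stack)
    )
    for name in real_get_candidate_names():
        yield callers + "-" + name

tempfile._get_candidate_names = get_candidate_names
```

Any leftover temp file or directory now carries the chain of callers that created it right in its name.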
And you only have to feel really, really bad about yourself for those 10 minutes, and then you'll have the solution and everyone will think you're a hero, and you don't have to explain to them how dirty your hands got in the process. [laughter] And by the way, do use addCleanup. So, if you're using the unittest library and you're used to setUps and tearDowns, addCleanup is a much nicer way to clean up the behavior of your setUp function than a tearDown is, so look into that.

OK, Case 3: Who is changing sys.path? The problem we had was that sys.path had an extra directory in it that we didn't expect, and in this case, it actually caused a problem because of some naming collisions: when we tried to import a certain block.py, it was finding the wrong one, and we couldn't understand why that was. And again, grep couldn't find the sys.path change. And here, of course, I mean, as you remember from Case 1, we were doing some really ugly things to sys.path. My first thought was, "Well, I guess there were some more sys.path shenanigans in there that we should look for." But no, it wasn't our fault this time. We weren't doing a sys.path append. So, we needed to find who was adding that directory to sys.path. So, we figured it had to be in third-party code, right? Because we can grep all of our own code. Now, we're not going to go and grep all of the third-party code, right? Open edX has a requirements.txt that includes about a hundred packages, including NumPy, and SciPy, and SymPy, and you're not going to go and grep all that code, so you need dynamic analysis to get at it.

What we wanted was a data breakpoint. It would be really awesome if we could go into pdb and say, "Not break when you get to this line in this file, but break whenever that piece of data changes in a certain way," right? What we wanted to know was: when does sys.path get a new entry at element 0 that ends with /lib? That's what we wanted to know. Who is adding that thing to sys.path? pdb doesn't have that as a feature. There's no way to implement that directly in the debugger, so we write a trace function. Trace functions -- if you haven't encountered them before, CPython has a very simple-sounding feature, which is that you can write a function and you can register it with the interpreter, and it will call your function for every line of your program that gets executed. And this is actually how debuggers are implemented, and profile tools, and coverage.py. The way a lot of these dynamic analysis tools understand the running of your program is that they write a trace function, and then CPython calls them over and over again for every line of your program that gets executed. This makes it go very slow, but you're only going to need it for a little while. Here is an example of a trace function. In fact, this is the entire trace function that I wrote. So, a trace function gets the frame that you're running in, it gets an event, which is "call" or "return" or "line," and it gets an arg, which, in this case, isn't interesting to us. In fact, none of the arguments are interesting to us, because we don't care where in the program we are and we don't care what's happening. What we want to know is: if the first element of sys.path ends with /lib, we want to stop right there and see what's going on. To make the trace function work, you call sys.settrace and you give it your trace function, and from then on it gets called on every line.
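Reconstructed from that description, the whole trace function is roughly this:

```python
import pdb
import sys

def trace(frame, event, arg):
    # We don't care where we are (frame) or what happened (event, arg) --
    # only whether sys.path has grown the suspicious entry.
    if sys.path[0].endswith("/lib"):
        pdb.set_trace()     # drop into the debugger right at the culprit
    return trace            # keep tracing, line by line

sys.settrace(trace)
```

Returning the function from itself is what keeps line-by-line tracing active as execution descends into new frames.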
Now, what we did here -- if you've seen this before, pdb.set_trace -- that's the horribly-named API for getting pdb to break, right? It should be called break_into_debugger, but it's called set_trace because literally this is where pdb sets its trace function as the trace function, right? This is a great example of an API being named for the internal concerns rather than for the external use, but this isn't a talk about API usability. And I apologize that pdb.set_trace has an underscore and sys.settrace does not. Again, see some other talk about API usability. [audience chuckles] But in this case, the trace function is incredibly simple, right? In fact, what I'm doing here is using what sounds like a really, really advanced feature, a trace function, but the amount of code and the complexity of code I had to write to use it was much simpler than the previous examples I've shown. And frankly, when I wrote it, I wasn't quite sure: am I allowed to call pdb.set_trace while I'm actually inside a trace function that is already being invoked by CPython? I figured there was about a 50-50 chance that this just wouldn't work at all, right? But it took me about a minute to write that function, so what have I got to lose? And in fact it worked great. I ran this, and it broke into the debugger, and it was nose, the test runner. So, nose has a helpful feature where if it sees that you have a directory called lib, it figures you probably want to import from it, and it adds it to sys.path for you. Luckily, it also has a switch where you can just say, "Don't do that," and we set the switch, and the problem was fixed. So, here's a trace function. It's a very advanced feature, but sometimes it's exactly what you need.

So, what did we learn from this? One, it's not just your code, right? It's a classic beginner mistake to think that it's a compiler bug or, you know, the standard library has a bug. Sometimes, it is other tools that do have bugs, right? You have to be open to that possibility. And because of Cases 1 through 2 or 3, whatever we're up to here, you know, I was very willing to believe that it was our own code that was at fault, but it wasn't, and we needed to figure out a way to get at the behavior of these other third-party tools. Again, dynamic analysis is very, very powerful. This was an expensive thing to do, running an 8,000-test test suite with a Python implementation of a trace function -- you can imagine how much slower it would run. Luckily, it was very early on in that test suite that it hit that breakpoint, because it was the test runner setting it. But even if it took eight hours, that's probably faster than finding it some other way. And sometimes, you have to use big hammers, right? This, frankly, is kind of overkill to find that, but it was actually less time on my part and more time on the computer's part, and it worked out really well.

All right, Case 4: Why is random different? The problem: so, Open edX presents problems to students, and we have a massive number of students. What we wanted to do is we wanted to present problems that were randomized, so the problem I saw was different than the problem you saw. But we wanted them to be repeatable, so that the next time I came back to look at a problem, I'd see the same problem I'd seen before. And so we do that by seeding the random number generation with a seed that's particular to the student.
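In outline, the per-student seeding works something like this (hypothetical names; the real Open edX code is more involved):

```python
import random

def present_problem(student_seed):
    # Seeding with the same per-student value makes the "random" problem repeatable.
    random.seed(student_seed)
    return random.randint(1, 1000)   # e.g. always 420 for a given student's seed
```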
So, each student has a seed, we seed the random number generator, and then when it comes time to run the problem code that's going to present the problem, when the random number is generated, it comes out predictably. So, what I've shown here is the problem code generating a random number from 1 to 1,000, and it should be 420. The problem we had was that the first time that code ran, it came out different -- it came out as 284. And then the second, third, fourth, all the rest of the times, it came out as 420. So, there's something weird about how the random number seed was being used to produce the random number sequence. And the fact that we had that first time different from the other times made us think, maybe it's about that import thing, right? Remember, code gets run on import and then not the next time you import it, right? Different the first time than times 2 through n.

So, how were we gonna find it? Well, we're gonna monkeypatch again, but we're going to use a new technique. And this is one of my favorite techniques. Well, this looks like, maybe, an esoteric thing. No, it's actually just 1 divided by 0. This is a really easy piece of code you can drop in anywhere. It generates an exception, because you're not allowed to divide by 0. It's really fast to type, because it's only three characters, and this is an exception that your real code probably never generates, right? So, if you put this code in the middle of anywhere and then you see an actual ZeroDivisionError come out on your console, it's that code that's making it happen. So, it's really easy to spot, right? These are -- this has got to be my favorite three-character Python expression. And I'd be glad to hear other candidates for great three-character Python expressions. I don't think you're going to be able to top 1/0.

So, what we're gonna do is we're going to monkeypatch again. We're gonna monkeypatch random with a booby trap, right? We're going to import random, and we're going to say random.random = lambda: 1/0, right? And now, notice how reckless we're being here. We don't care what the arguments are, we're not trying to reproduce the behavior, we're not returning anything. It's just an exception... but it worked great. So, we've got a booby-trapped random, and what actually happened is, we got a ZeroDivisionError, and we could see that in one of our third-party packages there was a default value for a function that was random.random, right? There was actually a class in this package for its tests. And one of the arguments to the dunder init had random.random() as its default value. And remember that all the code in your modules is executed when it's imported, and when you define a function, the default values are evaluated so the value can be stored with the function. And so it was actually calling random.random once during import, but only the first time. So, that was taking one of the numbers out of the sequence, which put us off by one number, which is why we got a different number the first time than all the other times, right? And I see some of you scrunching up your eyebrows, like, "Why would someone do that?" Just for an extra bonus: they never actually used that default value -- [laughter] -- because the only places this function was ever called actually supplied their own value for that. So, it was kind of a comedy of errors. The good news is we reported the bug, and they were very, very, uh, understanding, and fixed it.

So, what'd we learn here? One: exceptions are a really good way to get information, right?
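Here is the whole booby trap from Case 4: two lines, deliberately broken, so the very next call to random.random() fails loudly with a traceback pointing straight at the caller.

```python
import random

# Any call to random.random() now raises ZeroDivisionError; the traceback
# shows exactly who was pulling a number off the shared sequence.
random.random = lambda: 1/0
```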
The great thing about exceptions is that if no one catches it, it will come all the way back up to the top, right? So, you can have an exception way deep down in your program, and unless it's something that might get caught somewhere else, like AttributeError -- ZeroDivisionError is very unlikely to be caught unless you have an "except Exception" someplace, or, God forbid, an "except:" someplace. But it's very likely that it will come all the way out to the top of your program. And by the way, another good technique is that you can -- oop, sorry. You -- no, we're not? Sorry. So, exceptions are a good way to get information. And you can actually put information in the exception, right? You can put a string in your exception. That doesn't have to be a hard-coded string, right? Whatever value, deep down there, you want to see, format it into the message and let the exception bring it all the way up to the top.

And don't be afraid to blow things up, right? This monkeypatch was horrible; the program wasn't gonna run, right? But I didn't care. I just needed to find out where the random.random was, and it told me that. And sometimes you get lucky, because, of course, there's an obvious flaw in this monkeypatch, which is: maybe it wasn't the first value that was going wrong. Maybe there were three values getting taken legitimately, and it was an extra fourth value that was being pulled off the sequence. And if that had been true, then I wouldn't have found out anything interesting, because the booby trap would have blown up on the first, legitimate call. Well, then I'd have to try something else, right? In this case, what I did is I tried the simplest thing I could think of. Maybe it'll work, maybe it won't. It worked great; now we can move on. If it hadn't worked, well, then I'd have to come up with a different way to see maybe more of random. It'd be a trickier monkeypatch, but I could still get in there and see where all the randoms were going. But sometimes you get lucky, and it works out that way. So, don't over-engineer these things, just hack away at it, right? That's what machete mode's all about. You're in the jungle, you need to get out. You're not planning a whole paved road with road signs and traffic lights and everything. You just use the machete to cut your way straight through.

Now, the real problem here was that we were sharing global state, right? There was one global random number sequence that we were using and this package was using for its random numbers, right? The real solution was that we started creating our own random object to get our own random numbers from. Shared mutable state is a very, very difficult thing, because it means that anywhere in your program could be fiddling with it, and it's very hard to reason at that kind of distance. So, do use your own random object. And do suspect third-party code. Again, you know, this is kind of a messed-up piece of code that we got from a big, well-known project that we trusted to do a lot of other stuff, and it was kind of in a weird part of their code. By the way, the other weird thing is that just importing their main code was importing their test helpers, which is where this code was. You know, people get sloppy, it's all right, you know? We're all in this together. But you have to be prepared for that kind of thing to happen.

All right, the big lessons from the whole talk. One: break conventions to get what you need, right? This code doesn't even have to be checked in, right? It's all on your machine.
You can use the full dynamic nature of Python to get the information you need, but only for debugging, right? So, the nefarious among you may be jotting down notes about how you're going to do that thing on that server somewhere. And I don't know who you are, so I can't take any blame, but I really recommend you don't do that, or I'll be here next year debugging what you put on your server. So, and again, dynamic analysis is something that Python's introspectability and malleability really lend themselves to, so use it. And understand the mechanisms that underlie Python, right? If you understand how import really works, or what .pth files are, or the global and shared state of random, it will help you reason about the problems that you're seeing and get you the answers sooner. Any questions? Thank you. [applause] Do we have time for questions? Jesse, do we have time for questions? He's got no mic.

(Jesse) Kind of. I think we can take just one or two.

(Ned Batchelder) One or two.

(audience member) So, Ned, you told us -- great talk, by the way, Ned. You told us when you have a bad third-party library, you submitted a patch, or you told them what their problem was, but in the meantime, before the patch is accepted, how do you fix the problem? Do you actually patch the code and run it locally yourself, or do you change your own code?

(Ned Batchelder) No, in this case, it's the second bullet from the bottom: we used our own random object to avoid the global mutable state completely.

(audience member) OK, so, all right, so, you changed your code.

(Ned Batchelder) In this case, we had an option. It could have been worse, and it could have been that we would have had to fork the project -- not an aggressive fork, a fork in the GitHub sense -- and have our own copy of the code. And we've had to do that in a few places, too, just to keep things working.

(audience member) OK, thank you.

(Ned Batchelder) Sure, thanks. I don't know. Are we still --

(audience member) Thanks for the talk, Ned. So, we've seen the answers to these debugging situations. Can you talk a bit about the thought process of, kind of, coming up with these? Like, were these your first suggestions and they just kind of worked out, or did you have a few that kind of didn't work out? How did you come up with these pretty, kind of, clever --

(Ned Batchelder) That's a good question. I'm not, I'm not sure I've got any good answers for how to come up with these ideas, other than to think outside the box and understand that it's all possible and you can -- you can play around with that malleability. You can break outside of, sort of, the strict style of coding and treat it more like the touchable thing that Python is. I don't know how else to say it than that. I think we have to go, unfortunately, but thank you for coming. I'd be glad to talk about it with anyone else, outside.

(audience member) Thank you.

(presenter) Thank you very much.
Info
Channel: PyCon 2016
Views: 11,544
Id: bAcfPzxB3dk
Length: 35min 13sec (2113 seconds)
Published: Thu Jun 09 2016