Stop Writing Classes

Video Statistics and Information

Reddit Comments

The basic tenets of the talk are good but not convincing or terribly compelling. I wanted this to resonate, and I already write Python in a nearly purely functional style.

He picked a straw man by attacking an already weak codebase. He could have used it as an example and then spent more time showing how one would code in a reduced-class style.

👍︎︎ 6 👤︎︎ u/fullouterjoin 📅︎︎ Apr 03 2012 🗫︎ replies
Captions
Host: ...introduce Jack Diederich, a core developer, who is going to talk to us about how to stop writing classes. Jack Diederich.

Jack Diederich: I don't mean to disappoint, but I'm actually going to agree with everything Raymond just said.

So you've all read the Zen of Python, probably many times. A couple of items: simple is better than complex; flat is better than nested; readability counts; if the implementation is hard to explain, it's a bad idea; if the implementation is easy to explain, it may be a good idea. This was written by Tim Peters. He's smarter than you are. He's smarter than I am. How many people do you know with a sorting algorithm named after them? That's the guy who wrote this. And all of these points say: don't do hard things, do easy things. That's what this talk is about. Don't do hard things in the first place when you can avoid it.

Classes, as Raymond knows well, can be very complicated. He laid out some principles for how you should figure out what you shouldn't be doing. This talk is mostly about all the places where people go wrong, and how you can just not do that at all. You end up doing that anyway, even when you're trying to avoid it, so this talk is actually going to be rehashing some code: how to notice when you've gone down the wrong path, and then how to work backwards from there.

This is what I tell the guys at work all the time: I hate code, and I want as little of it as possible in our product. We ship features; we do not ship code. We don't have customers because we have lots of code; we have customers because we have lots of features. So anytime you can, excise code from the product. There's four of us, and over the last year we've actually dropped our line count in our shipping product while shipping new features. When you make it a goal, it's possible.

So if you remember one slide from this talk, it's this one. This is the biggest overuse of classes you see out in the wild. This is not a class. It looks like a class: the name is a noun, Greeting. OK, that's kind of class-like. It takes arguments and stores data internally. All right, that's kind of class-like. It has a method that uses that state to do something else. All right, that feels like a class. At the bottom you can see it being used: it instantiates a Greeting, and then it uses that greeting to do something else. This is not a class, or it shouldn't be a class.

So the signature of "this shouldn't be a class" is that it has two methods, one of which is __init__. Anytime you see that, you should probably think, "Hey, maybe I just need the one method." Anytime you see this, you'll know that you should have just written a function. And anytime you start aliasing your classes just to instantiate them once, use them once, and then throw them away, in your brain you should be thinking, "Oh, I should refactor that. It can be simpler, much simpler."

The other slide was ten lines. This does the exact same thing, and it's two. If you find that you're always passing the same first argument to that function, hey, the standard library has a thing for that: functools.partial. Add an argument, and then you can reuse it.

I don't know how many of you have CS degrees. I have one. I learned about separation of concerns, decoupling, encapsulation, implementation hiding. I haven't used those words in 15 years, since I graduated. Anytime you hear someone using those words, they're trying to pull a fast one on you. It just doesn't come up. And even if it does come up, people mean different things when they use them; it's not useful for furthering conversation.

So, the first of the examples from out in the wild. Lots of you use third-party APIs in your day-to-day job. Anytime you have to use someone else's code, the first thing you have to do is read it. You don't know what's in there, you don't know what the quality is, you don't know if they have tests, so you just have to check it before you include it. Sometimes this is going to be kind of heavy. A third-party API, MuffinMail we'll call it, shipped with one package, 22 modules, 20 classes, and 660 source lines of code. I had to read all this before we included it in the product, but it was their official API, so we're using it. Anytime they ship an update to the API, you have to diff it, because you don't know what they're doing. Maybe you sent them some upstream patches; did they bring them back down? So 660 lines of code and 20 classes is kind of heavy when all we do is give them a list of email addresses and an email to send, and all we ask back is which emails bounced and which people unsubscribed.

So, the overuse of classes. Lots of times people think they might need something later. You don't, or you can just do it later: if it comes up, you know, do it. The MuffinMail library has a module, MuffinHash, that has two lines. Someone thought they were going to have to specialize a dictionary later. They didn't. But everywhere in the code is that first line. So I was reading the code, and MuffinMail.MuffinHash.MuffinHash is used everywhere there might be a dict. The second two lines, no one in this room has to look twice at, and the fact that they're on a slide, you're like, "Why are these on a slide?" It's because that first thing was all over the code. Some other signs that you don't need this: you have to type "muffin" three times. You're not helping anyone; you're making users angry when they have to type "muffin" three times. That's not the actual name of the company, by the way.

So they fired that guy at some point, and they brought in a guy who knew what he was doing. This is version two of that API. That 20-class, 660-line API: this is 15 lines. People sitting near the front have probably read through the whole thing. All it does is use standard library methods. It's readable; you can tell what it does. Hey, it even ships with a test suite that's 20 lines. This is the kind of thing you want. If they do another version of the API, I can read the diff in like two seconds. But you'll notice there's a problem: this has two methods, one of which is __init__. And they weren't even really trying to hide it; the other method is named __call__. So I sent them version 3 of the API.

Oops. OK, [unclear]. So this is how you end up using that thing that isn't really a class: you instantiate it, or you have long lines, or you alias it. If you ever see these things, that's when you know you shouldn't have been using a class.

So this is version 3 of the API that I sent them. It takes up zero modules in our code base, because I just put it at the top of the module that I was going to use it in. This does everything that their 15-line API did, which does everything that their 660-line API did. This is infinitely readable, it just uses standard library parts, and it works. So we started out with one package and 20 modules, got it down to one module, and eventually zero modules. Started out with 20 classes, went down to one class, realized that that wasn't really a class, and now we have zero. 130 methods in that original thing, down to two methods, down to one function. 660 lines, 15 lines, 5 lines. It's easier to use, it's easier to write, and no one has to wonder what's going on.

So, to echo what Raymond said: namespaces are not for creating taxonomies. If you come from Java, you probably think they are. Namespaces are for preventing name collisions. If you have a deep hierarchy, you're not doing anyone any favors. The MuffinMail.MuffinHash.MuffinHash is just extra things for people to remember and type. The standard library has a very flat namespace, because you either remember what the module is called or you have to look it up. It doesn't help if you have to look up the package that it's in, the package that that's in, the package that that's in, and the module name. You just want to know the module name.

Embarrassingly, this one is from our own code base, and you can see some of the same sins in here: services, crawler, crawler exceptions, ArticleNotFoundException. Besides the two-method class, one of which is __init__, exceptions are overused. Anytime you type "class," you should be thinking, "What am I doing this for?" This was a package with a module that had two lines in it, which was an exception. You have to type "crawler" twice to use it and "exception" twice to use it. The name itself, ArticleNotFoundException, is repeating itself. It's unnecessary. When you're naming exceptions, you can call it EmptyBeer or BeerNotFound, but BeerNotFoundError is excessive.

You can just use standard library exceptions. People understand them. Unless you want to catch a very specific condition, LookupError is just as good as anything else. If you get an email with a traceback in it, you're going to have to go read it anyway, and it doesn't really matter what the exception was named. Another reason you don't have to complicate the names of your exceptions is that anyone who is reading the code sees that the thing right after a raise or an except has got to be an exception. So adding the word "exception" to the name of the class doesn't help.

The Python standard library: it's got some rusty corners, but it's a pretty good example of how you should do things. 200,000 source lines of code, 200 top-level modules, and it averages 10 files per package, and that's only because there are a couple of third-party projects that were added to the standard library that were already packages, that have like two files. It only defines 165 exceptions in 200,000 lines of code. So once again, anytime you think you need to write an exception, you probably don't, because the Python standard library gets along quite fine with just 165.

So I'm not totally anti-class. Classes are great for things that classes are great for. When you have a bundle of mutable data and a bunch of related functions that you want to use with that data, then yes, classes are the right thing to do. You don't have to do this much in your day-to-day, because if you're pulling something from the standard library, someone else has already done it for you.

There is one case in the standard library where this isn't true, which is the heapq module. A heap is just basically an array that always stays sorted, and there's about ten methods in the heapq module, and they all act on the same heap. So their first argument is always the same; if there's a second argument it may differ, but they're always operating on the same thing, which kind of implies: yes, this actually is a class. Every time I actually need to use a heap, I end up implementing this. I keep it in my toolkit. It's all those functions plus a little helper, which is the key, similar to sort. So when you're adding things to a heap, say you're doing an A* and you're trying to score how close this thing is for pathfinding, the key is your scoring function. Anytime you push something onto the heap, you run the score; anytime you pop something off the heap, you throw it away.

Danny Greenfeld is going to be doing an OAuth open space summit. The state of OAuth in Python is rusty. So again, with using third-party libraries: before you use them and import them, you have to go and read their code base. I was trying to use the Google URL shortener. I just wanted to take URLs and make them small. There's one massive project that Google exports: 10,000 source lines of code, 115 modules, 207 classes. I did a rant on this on G+, which almost no one saw, and Guido said, "I decline any responsibility for Google API code."

In 10,000 lines there are bound to be some stinkers. One, like the MuffinHash: there's a class called Flow, which is subclassed by other classes. Its body is pass, but it gets its own module, and anytime you're reading any class that derives from it, you have to go back and check and assure yourself that it is indeed an empty class. So someone thinking towards the future said, "I can write three lines of code now so I don't have to change three lines of code in the future," and cost everyone who's reading the library some time.
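The heap wrapper mentioned in the talk (the heapq functions plus a sort-style key) is not reproduced in the captions. A minimal sketch of the idea follows, assuming a class named Heap with push and pop methods; those names, the tie-breaking counter, and the example goal point are illustrative choices, not the speaker's code.

```python
import heapq
import itertools


class Heap:
    """Hypothetical wrapper bundling the heapq functions with a sort-style key."""

    def __init__(self, items=(), key=lambda item: item):
        self._key = key
        # Running counter breaks ties so the items themselves are never compared.
        self._tiebreak = itertools.count()
        self._data = [(key(item), next(self._tiebreak), item) for item in items]
        heapq.heapify(self._data)

    def push(self, item):
        # Run the scoring function once, when the item goes in.
        heapq.heappush(self._data, (self._key(item), next(self._tiebreak), item))

    def pop(self):
        # Throw the score away on the way out and hand back only the item.
        _score, _order, item = heapq.heappop(self._data)
        return item

    def __len__(self):
        return len(self._data)


# Usage: order grid points by Manhattan distance to an assumed goal at (5, 5).
goal = (5, 5)
frontier = Heap(key=lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1]))
for point in [(0, 0), (4, 5), (2, 2)]:
    frontier.push(point)
closest = frontier.pop()  # the lowest-scoring point comes out first
```

Storing the score next to each item means the key runs once per push, which matches the "run the score on push, throw it away on pop" description; whether the speaker's toolkit version does the same is not stated.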
There's also a class called Storage, which doesn't do much. It almost does the right thing with raising errors: it reuses a standard library one, but it aliases it, so you again have to go out and read their code to figure out what it's actually doing.

So I spent a week doing OAuth 2 stuff. It took a couple of days to read through that 10,000 lines of code. Then I went and looked for other libraries. There's python-oauth2. It's the second version of python-oauth; it doesn't actually do OAuth 2, but that took a while to figure out. That one was a little bit better: it's only 540 lines of code and 15 classes. But I wasn't going to put that in either, because it didn't actually do what I needed it to do. So I rewrote that to just do bearer-token OAuth. It's python-foauth, or if you prefer, "F-OAuth," and it's just 135 lines. It's got three classes, because I didn't refactor enough. One of them is "class Error(Exception): pass." It's an embarrassment.

Finally, most of you have seen Conway's Game of Life. You'd recognize it even if you don't know it by name. You have a grid that has cells in it, and each turn you count the number of cells that are nearby, and depending on the number of neighbors you have, you either turn on or turn off. And you can end up with cool patterns like this, the glider: the cells in front are coming alive just as the cells in the back are dying, so you get this thing running across the grid. We ask this in interviews as a kind of FizzBuzz question, because it's just a couple of rules and a board. So hey, if you can't do this, we don't really want to talk to you.

Cell is a noun. So most people, first thing off, say, "Cell's a noun, I'm going to make a class. And what are the properties of the class? Well, it has a position. It's either alive or it's not. It's going to have a different state next time, maybe. And what else? Oh, it has neighbors." And then they get around to defining the board. They say, "Well, a board's just a bunch of cells." Yeah, it's a grid, and so it's a bunch of cells, and then every turn you click it and it turns into the next thing. And it's at this point where you should say, "Hey, I have a class that has two methods, one of which is __init__, and the only thing it has is a dictionary, so I should just use a dictionary." And then, since you're using a dictionary, you're like, "Well, I don't really need to keep track of the positions of the cells, because those are in the dictionary. And alive is just a boolean, so instead I can say it's in the dictionary or it's not in the dictionary, so I don't need that one either." And then you say, "Well, I don't really need a next state; I can just create a new dictionary."

So then you end up with Conway's really simple Game of Life. This is your neighbors function; it does almost nothing, but very succinctly. And this is an entire implementation of Conway's Game of Life. That's it. So you don't need two classes. Instead of a dictionary we're using a set, because we don't really care about the values; they're either in there or not. And it just recounts the board. The bottom is the coordinates for the glider you saw earlier; you pass it in, and the glider goes around. That's the whole thing. That's all you need to implement Conway's Game of Life.

So, in conclusion: if you see a class with two methods, including __init__, it's not a class. Don't make new exceptions when you don't need to, and you don't need to. And then refactor relentlessly. And that's it.
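The Game of Life slide is not part of the captions, so the following is a reconstruction from the spoken description rather than the speaker's code: a neighbors helper, a board stored as a set of live (x, y) coordinates, and one function that recounts the board each turn. The function names, the glider coordinates, and the number of generations run are assumptions chosen for illustration.

```python
from itertools import product


def neighbors(cell):
    """Yield the eight cells surrounding (x, y)."""
    x, y = cell
    for dx, dy in product((-1, 0, 1), repeat=2):
        if (dx, dy) != (0, 0):
            yield x + dx, y + dy


def advance(board):
    """Return the next generation; board is a set of live (x, y) cells."""
    # Only live cells and their neighbors can be alive next turn.
    candidates = board | {n for cell in board for n in neighbors(cell)}
    new_board = set()
    for cell in candidates:
        count = sum(n in board for n in neighbors(cell))
        # Birth on exactly 3 live neighbors; survival on 2 or 3.
        if count == 3 or (count == 2 and cell in board):
            new_board.add(cell)
    return new_board


# An assumed glider; membership in the set is the only "alive" flag needed.
glider = {(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)}
board = glider
for _ in range(4):
    board = advance(board)
```

A glider repeats its shape every four generations, displaced by one cell diagonally, which gives a quick sanity check for any implementation along these lines.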
Audience: I'm sorry, what causes people to do things like the MuffinHash, to immediately think that, you know, "I might need this later"?

Jack Diederich: It's ex-Java people. It's what they teach in the CS curriculum. It took years to beat it out of me. And so yes, it's a form of premature optimization.

Audience: One of the problems I've run into when library providers use functools or these closures, like you suggested, for some of these parameters, like in your greet method, is that the greeting is no longer accessible. So as the user of that library, I can't introspect that greeting. I can call greet as many times as I want, but I can never know what the greeting is going to be, and that's where I find value in there being a class. So I see what you're saying in terms of, if you have control of the code and you're writing your application, make it as simple as possible. But as a library writer, I want to expose a little bit more information. What would you say to that?

Jack Diederich: The question was about using functools.partial, and that with partial in particular you can't get at what was passed in, what was curried. The answer is you can, but it's a dunder and not documented. I wrote it. So yeah, there's a danger that you can oversimplify or do this too much.

Audience: I want to pick the point with you about the class you showed which declares an empty dict. I think it was the Flow class, and then everything else subclasses from that. So is your point that this should never have been written, or that after it was written and became obviously unnecessary, it should have been refactored out? Because you talked about implementation hiding earlier.

Jack Diederich: So, OK, this is...

Audience: That's not the empty dict one. I'm talking about the empty dict one. With the empty dict, they might have come back later on and said, make it a named tuple, or make it an ordered dict, or it must be a dict such that we can iterate over it deterministically, or it must be a dict with a random method. Those are all legitimate reasons for declaring the base class. And if sometime later on you find out it wasn't necessary, are you saying it should have been refactored out?

Jack Diederich: Well, there's two things. One, you shouldn't plan for the future like that in the first place, preferably. But if you think you had really good reasons for it, and then you notice later that they weren't good reasons, you can go back and refactor it out. So anytime you write class and then pass on the next line, you should really look into your soul and figure out whether you're sure.

Audience: But let's give them the presumption of sanity, that they didn't just write a blank class with pass. They iterated a few times, and they had rational reasons why they thought it was going to be non-empty, and later on maybe the API was too set in concrete, and there were sort of social reasons why they weren't able to refactor it. I think your criticism is more of the design process; it's not of the guy who wrote the three lines of code.

Jack Diederich: So the question was, they might have had good reason. And they might have.

Audience: What I'm saying is, you're assuming they were in charge, and they were total owners of the design process, and they didn't have people pushing back and saying, "No, you've written the API, you can't refactor it."

Jack Diederich: Yeah, I mean, that's a process thing. It may be true.

Audience: I've actually done a little bit of work where we would reuse the built-in exceptions. The two big ones are always KeyError and AttributeError. And one of the things I found is that sometimes you end up actually catching an error that wasn't the one you intended to, which is a time you'd make your own error class, right? I just want to point out that what we've been doing a lot is subclassing the built-in Python types, so that way you can still catch your error, but everything else that expects it knows what it looks like. The other thing is, when you mentioned with the classes, you know, "just use a dict": we do have a few times when you have a lot of data, maybe 10,000 or 20,000 dicts, and you're trying to find out where your memory leak is, or you're hitting an exception because some data is going somewhere that you don't expect it. It's helpful to be able to tell what you're looking at. In other words, a dict with five things that have numbers in it is not all that useful. So that was one of my comments. But as far as the question goes: is subclassing the built-in exceptions still overkill, or is that more acceptable?

Jack Diederich: So the question was, is it OK to subclass built-in exceptions? And yeah...

Audience: From the angle that, if you just catch all KeyErrors, you're going to catch errors you don't intend to catch, right? And in general, can you add information to things even if you're going to use them just the same as the built-ins? Can you give them a name? Does that fit with your philosophy, or is that still overkill for exceptions?

Jack Diederich: I don't use custom exceptions until I need to. So it might be tough in the case where you're raising an AttributeError, or they're raising AttributeError, and it takes you a while to figure it out. So define your own exceptions only when you have to.

Audience: My question is: I've seen a lot of the pattern of classes as mere containers for constants. They don't actually do anything; they just hold a bunch of constants. How do you feel about those?

Jack Diederich: It's sometimes useful to use classes as tiny namespaces, when you don't want to add a module and you just need like four lines. It's even useful to hold functions in. In Python 3, unbound methods go away; in 20 years no one ever found a practical use for them. So now when you pull a function out of a class, as opposed to, say, an instance, you just get the function. So if you want a small namespace with some constants in it and maybe a small bag of functions, then classes are fine.

Host: OK. All right, thank you.
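The first exchanges in the Q&A concern the talk's central refactor: a two-method class replaced by a function, with functools.partial binding the repeated argument, and whether the bound value can still be inspected. The slide code is not in the captions, so the sketch below assumes a function named greet taking greeting and target arguments (names inferred from the spoken description). In current Python releases a partial object exposes func, args, and keywords as ordinary documented attributes, which is what the example relies on.

```python
from functools import partial


def greet(greeting, target):
    """Plain-function stand-in for a class with only __init__ and one method."""
    return "%s, %s!" % (greeting, target)


# Bind the argument that would otherwise be repeated on every call.
hello = partial(greet, "Hello")
message = hello("world")

# The partial object remembers what it was built from, so a library user
# can still find out which greeting was bound.
bound_func = hello.func
bound_args = hello.args
bound_keywords = hello.keywords
```

Here message is "Hello, world!" and bound_args is ("Hello",), so the introspection concern raised by the questioner can be addressed without reintroducing a class.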
Info
Channel: Next Day Video
Views: 886,895
Keywords: psf, python pycon pycon2012, pycon_2012, JackDiederich
Id: o9pEzgHorH0
Length: 27min 29sec (1649 seconds)
Published: Thu Mar 15 2012