James Powell: Objectionable Content | PyData Austin 2019

Captions
We're at PyData Austin. It's Saturday, December 7th, 2019. This is Objectionable Content, a talk about the Python object model. My name is James Powell. If you like this talk, you can follow me on Twitter at @dontusethiscode. In fact, you might want to open that up right now, because the screen may not be large enough for you to see from the audience, so I've put up a tmate link on my Twitter feed so that you can follow along on your computer and watch everything I'm doing on my screen without having to squint at the poor resolution in front of you here.

So before I get started with this talk, or tutorial rather, I want to briefly touch upon why this is a topic that anybody at a PyData conference should care about. What we're going to talk about here is almost entirely going to be about how Python works, how the object model in Python works, and how design of systems with the Python object model works. We're going to touch upon almost no scientific computing, absolutely no machine learning, and probably very little data science or anything similar to data science.

As an anecdote, I was very recently working with a company that does silicon photonics, and the work that they do is very deeply scientific and very deeply mathematical. However, that work is situated within a production context. They're not doing research on silicon photonics; they're actually producing physical pieces of hardware. And so while, as part of that work, they may employ machine learning in order to figure out yield predictions, or to look at images of the chips and try to figure out whether those chips are going to fail in production, fundamentally they have some kind of production process whereby their design needs to be incorporated into some system which can go and produce these chips. There's a lot of production-level programming beyond just the analytical side. And as part of my work with them, trying to train some of their scientists to
be able to use Python more effectively, we looked at a couple of sample problems that were relevant to them. It turns out, and this was quite surprising to me, that there's very little work in layout and placement for silicon photonics. While that's not exactly the example that we'll use here, we'll use a much simplified example that might be within spitting distance of that. Fundamentally, in order to do a layout system, you're going to have to figure out how to write a business rule system, and in order to write a business rule system, you're going to have to figure out how to use Python more effectively than just writing functions.

For many of you, in your use of Python as a data scientist or as a machine learning expert, you may have really only gone so far as to write a couple of functions. You might have played around with keyword arguments in Python; you might have written a couple of classes; but you've never had to do large system design. What I'll try to convey to you in this talk is that that kind of large system design is necessary the moment your analytical work gets situated in some kind of business context and somebody needs to do something real with it, for example to drive some automated testing process or some yield optimization process. Your analytical code can't stand alone, and so you have to branch away from just using your numpy ndarray directly or your pandas DataFrames directly to do some analysis. You have to start building systems and mechanisms around that.

Now, I have a very brief agenda for you. We've got about 90 minutes for this tutorial, and I'm not sure how the timing will work out, but I'll spend as much time as I can showing you a couple of things that you may or may not know. This is a pretty novice-level tutorial, so there may be some content at the beginning that you're already very familiar with. Then, once I've shown you some things, we'll try some things: we'll work on an example, perhaps together, in order to make use of
what we see about the Python object model in order to actually design some kind of system. Now, I suspect that we won't have enough time in the example portion to complete this example. The example that I'll show you in a moment is about modeling simple linear planar circuits and automatically computing what the current and voltage at different points are. There's a little bit of mathematical work in there, and we may not get as far as that, but hopefully we'll get just far enough to think about how we would design a tool that could do that using what we see in this first portion.

Okay, so let's get started. This is me on Twitter. You can see that tmate link there; if you click on it, you'll see something that looks like this. This is the session that I'm going to be working out of, so if you go to my Twitter feed at @dontusethiscode and you click on that link, you won't have to squint at the screen up here. Also, all the people at home who follow me on Twitter will get to see this talk for free, but they won't get the audio, so they'll have to figure out what we're doing.

Now let's get started. Let's talk about Python objects and the Python data model, and we'll start very simply. I mean, a lot of you are data scientists, and really, when we talk about Python objects, the most you may have ever touched upon is maybe a pandas DataFrame or a numpy ndarray and some dotted methods on it: instead of saying from numpy import sum, you wrote numpy ndarray dot sum. You know that there are methods and attributes on these objects, but you haven't gotten that much further, and in a lot of your code you may work almost entirely with the built-in types. So it may be the case that if we had a simple program that was trying to model a simple circuit, we might start by modeling each of the components of that circuit as just a tuple. We can see that if we do that, we get a simple tuple type, and it has some fields associated with it, and
there's nothing too complex about that. Now, when we actually try to interact with that, we have to access the fields of this tuple, and we'll see that that's going to be a little bit unpleasant. We have to use direct field accessing, and it's kind of ugly. It's not clear to us what component 0, 1, 2, or 3 might be if this block of code that's accessing the fields is very far away from this block of code, but it kind of works. We might at some point make use of tuple destructuring in order to give names to these fields: we might destructure this into four different variables and then access those variables directly. That's kind of nice, but we still have this fundamental problem that when you destructure the tuple, you need to know the order of the fields themselves, and if this code here is very far away from this code here, it may be very difficult to keep these two in sync.

What we might then try is something a little bit fancier. We might say, you know what, I want to be able to access the fields of this object directly; I want to have some self-documenting behavior here. So instead I'll use a dictionary: the keys of the dictionary will be the field names, and the values of the dictionary will be the actual values of those fields. That kind of works; there's nothing too fancy there. When I go about accessing the fields, it's not all that difficult: I just use my square bracket indexing for the dictionary. It's a little bit ugly, a little bit clumsy; I have to add square brackets, and I have to have these single quotes or double quotes. But it's a little bit nicer than the tuple in that if I were to add fields later, I don't have to go and change all the code that I've already written, and visually it's a little bit closer to being self-documenting.

Now, I might then try something very simple. Maybe, for example, I'm a JavaScript programmer and I really like how the objects in JavaScript can be accessed using the square bracket syntax or
the dotted syntax. So I create something like an attrdict, and I just override the __getattr__, __setattr__, and __delattr__ for a Python object so that it just calls the dict's __getitem__, __setitem__, and __delitem__. And so here I have effectively the same code as before, except now I can access the fields by name using this dotted syntax, just like I can in JavaScript. Now, this is kind of silly, and please don't ever do this. This is a really bad idea. It doesn't gain you a whole lot, and I don't know why people really seem to hate typing those extra square brackets and the double quotes.

But if you dive a little bit deeper and you ask, is it that silly? It's not really that silly, because if we were to write this as a normal Python object, just a regular class with an __init__ method, and we look at an instance of that class, we'll see that inside it there's an attribute called __dict__, with two underscores before and after, and it's pretty much the same dictionary as the dictionary above. That __dict__ object is where it's storing all the instance state. So we might say, you know what, this attrdict structure is a very bad idea. (By the way, if you try to look up how to write that, there are two formulations that you'll see people talk about online, and one of them is a terrible, terrible formulation where you set the __dict__ of the instance to itself. You end up with potential memory pressure because it has a circular reference, so when the garbage collector tries to collect this, there's some delay in its collection. So don't do it at all, but if you're going to do it, use the formulation I showed you above.) However, the argument you might make is that it's not altogether that different from how Python objects are actually implemented under the covers.

Now, one of the things that we might have run across if we've ever done some object modeling in Python is that we have some system with lots of these resistor objects, because
we're modeling humongous circuits, and we're seeing that it's using a lot of memory. It turns out that this formulation of the Python object, which has this __dict__ object inside of it representing all of the instance state, does use a lot of memory. We can look at this object and ask the sys module what the size of this thing is, and this module will tell us that it's 48 bytes in size. So it's pretty big for one individual piece of data that really doesn't store a whole lot: just some part number, some manufacturer, and a resistance, presumably in base units of ohms, for this component.

Now, if we really got a bug about this and tried to do some kind of management or analysis of the memory use of our program, we could go pretty far. There are quite a few tools in Python, especially in Python 3, for analyzing memory use. There's a tracemalloc module, and with the tracemalloc module you can take snapshots and see what got allocated in between two snapshots. So for a very simple example like this, we could ask: is this thing really as large as I thought it was? If you actually run this, you can see that on average this line 11, which is the creation of this one object, had 48 bytes of additional memory allocated. So it probably takes about 48 bytes to store this structure, which is still very large, but at least we have some fixed bound on how big it is, and we can play around with trying to make it smaller.

Now bear in mind, another piece of advice: the moment anybody gives you a project that says our application's using too much memory, can you figure out how to reduce that amount of memory, and by the way it's written in Python, just find a new job, because those projects are always giant dead ends. Even though this analysis kind of looks like it works, it only works like this because we're dealing with very simple heap-allocated Python objects. The moment
you start to do memory analysis on large Python programs, you'll see that there are structures within the Python interpreter, like free lists in your attribute lookup mechanism, that never get deallocated. So an actual Python program may not necessarily deallocate memory that it no longer uses; it'll just hit a high-water mark and stay at that watermark. So with all of these exercises where somebody says, oh, our app is using a gigabyte of memory, how do you improve the memory usage, and you go and try to change a bunch of things, you won't be able to get the effects that you think you might get. Fundamentally, unless you really understand what memory means on all the platforms your application runs on (on your Windows platform you understand what VSS and RSS and all that mean, or when you run free at your Linux terminal you understand what all those numbers mean), it's a dead end.

So we won't go any deeper into this example, because this is pretty much a dead end career-wise if you ever get a project like that, and also analytically, because the typical answer to this in Python is to use a mechanism provided by the Python object model called __slots__, in which a Python object no longer has an instance dictionary. You save all the memory from that extra dictionary, and instead you have explicit locations for where the instance state of that type is stored. So here, this particular object cannot store additional information outside of the number, manufacturer, and resistance. If I try to amend this to add another field at runtime, it'll fail; you can see it fails because it doesn't have that attribute. So it loses that dynamic behavior in exchange for presumably being a little bit smaller. But in fact, in this case it's not smaller; it's actually bigger. So you can see: just don't bother with memory analysis in Python. It's not worth your time.

Now, what you should do instead, and the guidance that I should give you, is a very
important, very high-level piece of guidance for whenever you're writing a program, or even working on the example that we're going to look at after we go through some of these topics: think about Python as an orchestration language. Back in the day, when we were trying to popularize Python, it was called the ultimate glue language, the ultimate orchestration language: a language which takes components that are written in other languages, like C or C++, and orchestrates the mechanisms inside those components. That might be some kind of analytical code that somebody wrote ten years ago that nobody wants to rewrite, but today we don't want to be designing experiments or setting up models while still using malloc and free; we want a nicer interface for that. Well, then we reserve the manipulation of the problem, the manipulation of the business entities, or the manipulation of the experiment to the Python level, and the actual core computation to this library in C or C++ or Fortran or whatever it might be.

Those entities turn into what I like to call restricted computation domains. A very common one that you're almost certainly familiar with is the numpy ndarray. If you think about it, a Python object is always boxed. There's no distinction in Python between boxed and unboxed objects like there is in Java. In Python everything is boxed, and so everything is an object that you can call methods on, that is also heap-allocated (with the exception of certain things that might be interned, or other optimizations, but for the most part they are heap-allocated) and takes a large amount of space. Whereas with the numpy ndarray, the contents of the ndarray are contiguous machine types, so they're unboxed. What happens in Python is you often design your system so that you have business entities that are boxed entities, and then you put them into a computation domain like a numpy ndarray or a pandas DataFrame. Inside that computation
domain they get unboxed, so that the computation domain, which restricts your access to those entities, can then perform certain specializations and optimizations: for example, putting these in some kind of cache-contiguous structure so that you can do fast sums, or taking Python integers and turning them into machine integers with specific bit widths so that you can use processor-level instructions to compute operations on them. You can see that pattern occur, more or less, in any code example that uses pandas or numpy or NetworkX or any of these other tools.

In the case of numpy, you might see some code that looks like this. You have three boxed Python integers. You put them into a numpy ndarray. You perform some computation inside that array. Once it's inside that ndarray, because it is a restricted computation domain, because numpy says these are machine types, they are fixed-bit-width integers (so they are int64s), and there are no interdependencies between these items, it can choose to perform this element-wise multiplication in whatever fashion it finds most efficient. It could auto-parallelize it, or it could use machine-level instructions, or it could use some more sophisticated processor-level instructions to auto-vectorize this multiplication. And then you pull them back out. So you can see there's almost always this pattern: you have the boxed object, you put it into the computation domain, which draws a box around everything, the objects themselves become unboxed, and then you pull them back out as boxed objects. This tends to be the pattern by which large object systems work.

So as we go through the example, one of the things you should think about is that any time you're working on some production system in Python and you're modeling in terms of Python objects, one of the limitations you have is that it's just going to be slow. It's going to be slow, and it's going to use a lot of memory, and it's going to be slow and use a lot of memory
because Python objects are heap-allocated. They're not contiguous, so even if you have computations that should be local in some fashion, you won't get that locality in terms of where the objects are in the heap. And also they're just huge, right? In the previous example we needed 48 bytes to store what, two strings and an int? That's a lot of space to store not a lot of data. The reason is that there's a lot of overhead for a Python object: you need to manage what the type is, you need to manage its reference count, and so forth. So whenever you're designing a system like that, it'll inevitably turn into a situation where you draw some box around the business entities, some manager box, and inside of that you create a computation domain, and then you restrict access to it. That's how you get your performance back.

One of the things people sometimes bring up when I talk about this is: is a restricted computation domain a monad? It's not really a monad. When you think about monads, you often think about putting a box around items and interacting with them only through some specific API in relation to that box. But what makes a monad interesting, I think, is the bind operation, and there's really no notion of a bind operation here. There's also the ability to provide combinators on top of them, and we don't have that. This is a lot simpler than a monad; it's just some computation domain that gives you the ability to specify how the underlying data is stored and how you process that underlying data.

So what happens when you do these things in Python is you end up writing code that has some manager class that manages everything. For our resistor network, let's say we had some product that had some resistors on it, and we want to provide some computation on that. Maybe that computation would be that we want to sum or find the mean resistance of all these resistors.
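
The on-screen code is not part of the captions, so here is a minimal sketch of the boxed-to-unboxed round trip described for numpy. The variable names are illustrative assumptions, and the int64 dtype is requested explicitly rather than left to the platform default.

```python
import numpy as np

# Three ordinary (boxed) Python integers.
values = [1, 2, 3]

# Entering the computation domain: the values are stored as contiguous,
# fixed-width machine integers (int64 is requested explicitly here).
xs = np.array(values, dtype=np.int64)

# The element-wise multiplication happens inside the domain; numpy is
# free to choose how to carry it out.
doubled = xs * 2

# Leaving the domain: tolist() hands back ordinary Python int objects.
result = doubled.tolist()
print(result, type(result[0]))
```

The point of the sketch is only the shape of the flow: boxed objects go in, the work happens on unboxed machine types, and boxed objects come back out.
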
And so we might have some individual Python object that represents this boxed individual business entity, which stores information that's not computational in nature. The manufacturer and the part number are not computational in nature: they're not used in any computations, but they're used for your production management of that system. For example, you ingest a bill of materials, and that bill of materials needs to know what's the part number, what's the manufacturer, and what's the resistance. You then perform some computation to validate this product or this piece of hardware, and then you say, okay, I validated it; now who do I go and send shipping orders to? So you need that business information, but it's not actually used as part of the core computation.

Then you might have some computational entity, and this Product here represents that computational entity. It takes the components that you give it, in this case just a bunch of resistors, extracts the information from them, and puts it into a pandas DataFrame. Now, if you remember how a pandas DataFrame works internally, it has this block manager, and in the case of the pandas DataFrame here, the underlying storage will probably be a numpy ndarray. So effectively my manager class has numpy ndarrays, which are contiguous structures of machine types on which these computations can be performed very efficiently. When I add a __getitem__ to pull items out of this computation domain, I can restore the boxed Python structure by just reconstructing the resistor object on the way out.

So what you can see in this example is that we start with a product and we give it a couple of Python objects. It deconstructs those Python objects to pull the information out and stores it in an efficient format in that pandas DataFrame. I can then perform operations on it; for example, I can do a .mean() operation that will be performed in a very efficient fashion. And then I can ask it to give me an object back out, and it'll reconstruct the object on the way out.
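
Since the captions do not include the code shown on screen, the following is a hedged sketch of the manager-class pattern just described, not the speaker's actual code. The class and field names follow the talk (Resistor, Product, number, manufacturer, resistance); the method name mean_resistance and the part numbers and manufacturers are hypothetical placeholders.

```python
import pandas as pd


class Resistor:
    """Boxed business entity: one Python object per component."""

    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance


class Product:
    """Manager class that owns the computation domain (a DataFrame)."""

    def __init__(self, *components):
        # Deconstruct the boxed objects into columnar storage.
        self.df = pd.DataFrame({
            'number': [c.number for c in components],
            'manufacturer': [c.manufacturer for c in components],
            'resistance': [c.resistance for c in components],
        })

    def mean_resistance(self):
        # Computed by pandas over the numeric column, not a Python loop.
        return self.df['resistance'].mean()

    def __getitem__(self, idx):
        # Rebuild a boxed Resistor on the way out of the domain.
        row = self.df.iloc[idx]
        return Resistor(str(row['number']), str(row['manufacturer']),
                        int(row['resistance']))


# Hypothetical part numbers and manufacturers, for illustration only.
product = Product(
    Resistor('R-001', 'acme', 10),
    Resistor('R-002', 'acme', 20),
    Resistor('R-003', 'globex', 30),
)
print(product.mean_resistance())
second = product[1]
print(second.number, second.manufacturer, second.resistance)
```

The business fields ride along in the same DataFrame here for brevity; a larger system might keep only the numeric columns in the computation domain and the business metadata elsewhere.
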
So this is how these structures tend to look. You really wouldn't do something like this in C++. In C++ you'd often just write a C++ object and assume that the zero-cost abstractions of C++ would make it automatically efficient. In Python you have to add this additional abstraction layer, because pure Python objects are very good for orchestrating the operation you're trying to perform, but they're not very good at representing the data in a fashion that is optimal for computation.

With that set aside, let's go back to our friend the tuple. We probably don't really want to use a tuple to represent any sort of data in our program, even if we're doing simple data analysis and the only sophisticated type we deal with is maybe our input data before we stick it into a numpy ndarray. We probably don't want to use tuples past maybe the first prototyping stage. Instead, we might want to use something like the namedtuple from the collections module, which is just a mechanism by which we can automatically generate code that subclasses tuple, and that automatically generated subclass gives you very nice things. If we look at this code, you can see it gives us a very nice printable representation of the underlying object, and we have dotted attribute access to these fields, so we don't have the unfortunate ergonomics of the underlying tuple, where we have to access the fields by position and then, if we add extra fields, we have to go and update our existing code. Now, the limitation of the namedtuple is that it's still a tuple, so you still have that immutability, which may or may not be desirable for you. But fundamentally this is probably better than your underlying tuple structure.

But if you were to show me some code and say, I'm modeling some system, it's a production system, and I started by
representing all of my objects as just built-in Python types, tuples and dicts; did I do the right thing or not? The real answer to learn from all of this is that one of the design principles of Python is: spend more time with your family. In other words, if you're given a task, and that task is loosely defined, as most business tasks in a production context are, and you're told, I need you to do this one specific task, well, then do the bare minimum that you can do in order to achieve that one requirement. What you'll see as we go through some of these examples is that Python gives you certain ways to take a very simple representation of an object and to increase its complexity as the complexity of the requirements increases.

So the answer in Python is not to take a C++ approach to designing your system. In C++, maybe the approach you might take is that you first think really hard about the underlying types that you're representing and the operations between them, and you write everything out, and if you get it right, then each marginal requirement that you get is only a very small marginal amount of work. So there's a lot of upfront effort and then a very small marginal amount of work as you go. In Python we try to get something a little bit more linear, in the sense that you produce a very marginal amount of work today, wait until the requirement comes in, and then see if you can extend that without causing changes to code that uses the code that you've written. That's things like the elevation from a bare tuple to a namedtuple in this example: if I had started this example with just a regular tuple, then, other than the dotted lookup, anywhere else this code might be used would still work. So turning this here into a namedtuple just means that I add some functionality, but I don't subtract an existing feature. I don't have to rewrite any code that was already using this structure. And so
this is maybe the idea behind some of how Python is designed: really keep it as simple as possible from day one. If you're given a requirement, try to get it done by 5:00 p.m., go home, spend time with your family, and then wait until tomorrow, when you get a new requirement, to escalate the complexity of the code that you've written. As we go deeper into this tutorial, we're going to see some examples where the complexity is indeed very, very high, and there are some mechanisms that are very sophisticated and maybe not mechanisms you want to use from day one. They are mechanisms that you should employ when a requirement says that the only way to get this done with the system you have written is to use this sophisticated tool. Don't use it from day one, because you're just going to be sitting writing code all night, and your family's going to be at home crying, saying, you said you'd be home at 8:00 p.m.

One thing that I want to draw your attention to, and maybe one of the motivations for why we're going to look into some of this sophistication, is this. If you look at where we started this first section of our tutorial, representing this resistor object as just a tuple, and you look at where we've ended up, representing it as a namedtuple, there's one little piece that's missing. In the tuple representation we had one extra field, and that extra field was the type of component that we had. The reason we might have that extra field is that we might have some code somewhere that says: look at the type, and if it's a resistor, do this particular operation, and if it's a capacitor, do this other operation. If it's a resistor, then we know a certain relationship between the current that passes through it and the voltage; if it's a capacitor, it's a completely different relationship, and it may be nonlinear for certain circuit elements. When we turn this into a namedtuple, we see that the code slightly changes. Instead of directly
checking the type field, we use this isinstance function that's provided to us in the builtins. Ultimately these two blocks of code do the same thing: they're checking some object to see what its type is, and then they're dispatching on that type to perform some operation. This isn't even type-based polymorphism; this is just a very simple check: if you're a resistor, do this, and if you're a capacitor, do that.

Now, one thing that is very interesting and meaningful here is that, from the code at the top to the code at the bottom, we are now moving towards using what you might call a shared vocabulary. One of the things that we can think about the Python object model is that it provides us with a shared vocabulary that pushes our code to integrate better into the Python vernacular, into Python systems as a whole. In other words, if I ask you to give me an API for figuring out whether a thing is a type of a thing, one way you could do it is to represent those things as tuples and say, there's a field; check to see if that field is exactly this string. Another way you could do it is to say, in Python I'm given a built-in function called isinstance, and that built-in function reads, to somebody reading through that code, as asking this question: is x a thing of this type? How I implement that check may be customizable, and in this particular case it is very customizable; we'll see later how to customize that instance check. But fundamentally what you're looking to do is to employ a particular vocabulary that is made available to you by the built-in functions and by the data model in Python, so that your code reads very closely to the intention that you have. So when you read through this code, it should read as less mechanical. It should read as asking the exact question you were trying to ask: are
you an instance of this thing? Whereas this code here looks a lot more mechanical: oh, is your zeroth field this particular string? It doesn't have that close correspondence between the intention and the mechanism that's used to express that intention.

So let's talk about the basic object model in Python. What we know about the basic object model in Python is that we often use it to implement what we might call protocols. For a very simple Python object like this Resistor class, there's some protocol that represents initialization, there's some syntax associated with that, and there's a close correspondence between the syntax that invokes a protocol and the way that we go and implement that protocol. For example, the syntax here would be this type-name-parentheses syntax; that is what we call initializing. So when we look at this line, we say, okay, initialize a Resistor with this particular data, and we need to implement what that means for this Resistor type. So we write an __init__ function, and the __init__ function is very simple: it just sets fields. There's nothing more complicated than that.

If we do this, we can see it works, and it's okay, although the printed representation of this thing doesn't look very nice at all; it's kind of ugly. So we might think, okay, what's missing here? What do we need to implement in order to improve the printed representation? We may know that in Python there's a function called repr. It's another function provided to us in the builtins, and it means: give me the human-printable representation of this object. So what we might think is, given that close correspondence between some built-in function, or some syntax, or some bytecode, as you'll see in a moment, there should be some way to implement that protocol. In the case of the built-in repr, it means implementing the __repr__ data model method. If we go ahead and implement this data model method, then we've satisfied
the ability for this object to get a human-printable representation. And you can see here when I print this out, it has a human-understandable representation: it doesn't tell us just the type of the object and the object's ID, it tells us information about that object. Now if you go to the part of the Python documentation, docs.python.org/3/, I don't know, just go there and find it from there, or actually just Google for data model Python, you'll see that there are certain rules around the implementation of all these protocols, certain guidelines. What you'll see often is that these protocols are rightfully called protocols, because usually what you're doing is taking some fixed mechanism by which you answer a question or perform some operation, you're hooking into it to add some small amendment or some small modification, and then you're dispatching back to that same protocol on either a constituent object or on a base class. One example of that: if you go to that data model page and you look at the rules around repr, one of the rules that people generally try to abide by when they're writing the repr is that you should be able to cut and paste the thing that this prints out into your Python terminal, and when you paste it you should create an equivalent object of the same type. Here that doesn't work, because you can see this is a numeric expression, not a string, and this is a variable name, not a string, so there's something missing here: I need to put extra quotes around that. It may seem like a very minor thing, oh, I'm just missing some quotes, I should just add some extra quotes here. And if I add these extra quotes here and here, then what I get is a repr that conforms to the standard practice, where if I cut and paste this line right here and paste it in my terminal, I'll get another Resistor object that should be equivalent to the first one. But what you might
think is, well, in this case it may or may not make sense, but what if this field here has double quotes in it, or maybe that field has single quotes in it? Then I have to figure out how to do that quoting; I have to come up with some complex code to figure out how to quote that thing or to escape quotes in that thing. And instead what I'm missing is that final piece, which is that these protocols almost always dispatch back to the same protocol on either a constituent object or on a base class. And so the answer here should be something like this. The only real change that I've made is I put a bang-r, !r, at the end of this; I put this field formatting specifier in this f-string, and what that means is this is equivalent to just calling repr inside my __repr__. These are equivalent. And so the guidance here is that whenever you're implementing these protocols, almost always you're going to call the protocol from within the protocol, and so all you're really doing in the implementation of that protocol is adding your slight modification and then dispatching back to the protocol to do the work as expected. And when I finally make this work, and I add an __eq__ here so that I can actually see they're equal, and I cut and paste that result that gets printed onto this line over here, we'll see everything works as expected. So I get the resistor printed out nicely; if I cut and paste it at my terminal, it's equal to the original object. And so that's the general guidance that I have for repr. This may answer why, for example, if you ever opened a file in Python, say the file that we'll look at in a moment, with open network.json as f, and you tried to print out the repr for that thing, you saw it had angle brackets before and after. Some objects you see in Python have these angle brackets before and after, and you might wonder why that's in the repr. Well, this is not valid Python syntax, and so in some
cases it's impossible for that human-printable representation to give you something that you could cut and paste to get an equivalent version of the original object. In the case of a file, it's because there's going to be a lot of hidden information here, like the file pointer at the operating-system level, or certain buffers, maybe write buffers, that you may not be able to recreate in any fashion to reconstitute this. And so they explicitly put these angle brackets before and after the repr to indicate that what I'm showing you is a human-printable representation that could not ever be reconstituted into the original object, because there's some hidden state that is impossible for us to capture. It's no longer some kind of immutable or stateless object like a resistor was, or some object with some controllable, visible, and manageable state. Now it turns out that this stuff can be a little tricky to get right. Let's say I start with my Resistor and I add in a variable resistor, like a potentiometer, into this problem, and my Potentiometer just subclasses from my Resistor, nothing too fancy. I create a potentiometer here, and maybe I should call it p for potentiometer, and then I just try and print it out, nothing fancy at all. What you'll see is that the repr tells me it's a Resistor, but it's not, it's a Potentiometer, and it's just simply wrong. The reason for that is, when I wrote this repr, I hard-coded the name of the type. So maybe I need to do something fancier: I need to dynamically present the name of the type as part of the repr by actually looking up the type at runtime and printing that out. Now is this an actual improvement? Well, in this particular case you can see it prints this out correctly, but this isn't quite an improvement, because it turns out that this idea of dispatching back to the underlying protocol doesn't always work for every
protocol in a very nice fashion. It's often the case that you do have to just go and write the boilerplate to re-implement something, and reprs are a very clear example of that. Because if in my Potentiometer class I then add in the min and the max (if you remember, a potentiometer has a slider on it, you slide it and it goes between two levels of resistance, and wherever the slider is, that's the current resistance), so maybe I add in a min and max resistance field, but the rest of the fields are the same as the underlying Resistor, so it really is just extending this underlying Resistor class, I've got to write a new repr, because I have these extra fields I need to print out. This repr might be able to dynamically figure out what the type of the class is, like we saw in the previous example, but it'll never be able to figure out to print out those extra fields as well. And so I'll have to rewrite this. Actually I missed one field here, so we'll add in that extra field. So there you go. Now you might look at this and say, this doesn't sit right with me, because every time I implement these protocols, like I implement __len__ or __getattr__ or __getitem__ or __call__, I'm almost invariably reusing that protocol, and here I'm not really doing that; there just seems to be a lot of boilerplate with this repr. And unfortunately it's unavoidable, because the only way you can avoid this is if you go ahead and try to create your own protocols or your own object system. And so here's an example of something that you might try to do: you might try to tell the Resistor type, you know what, these are the fields that you have, and along the way we'll also make those slots, because for some reason people seem to really love talking about slots, even though I think they're about as boring as can be. And so because I
can identify what the fields are, then when I write the repr for my Resistor down here, I can just have it dynamically pick out what those fields are going to be and print that out. That means that any subclass of this just needs to say, oh, I'm a Potentiometer, I'm all the Resistor fields plus these two extra fields, and I get that repr for free. What you can think you're doing here is you're kind of creating your own universe, your own set of protocols, your own object model. You're creating a set of rules that every type in this hierarchy needs to abide by in order to be consistent, because it could be the case that, for example, you subclass something from this, some other type of resistor, and you forget to fill in the fields, and if you forget to fill in the fields, the repr of the thing fails. And so you might additionally need to add in some mechanism by which you can enforce, from the base class down to the derived class, that you fill out this object model. But fundamentally what you're seeing is that you're beginning to diverge from what the rules of a normal Python object dictate by adding your own rules: if you create your own subclass here, you also have to fill out this extra field structure, otherwise you potentially introduce inconsistencies in the way this operates. But for the most part this works. Now is this worthwhile, is all of this additional mechanism worthwhile? In this particular case, no. All you save yourself is a little bit of boilerplate that you would have just written once, that would have just been a cut-and-paste job. It's almost certainly not worthwhile to start adding these complex object systems on top of the Python base objects just to save yourself a job you could just give to the intern, right? And so there are definitely cases where you want to build up these
object systems. For example, if you're designing a library like an ORM, of course you need to have some kind of object system; you need to have a tight set of rules for how people subclass from your base ORM object. But for something as simple as this, no, it's really not worth it. And so I think one of the other pieces of guidance for how to make best use of the Python object model is, really, don't overthink it. Sometimes there is just cut and paste, sometimes there's boilerplate. Whenever you're writing your __init__s there's some necessary amount of boilerplate there; just learn to live with it, because trying to get around it starts getting you into a deeper and deeper hole. For example, this object system has other duplication, right: these fields here are repeated here. So am I going to automatically generate the __init__s as well? And what if the __init__ needs to take some modality that's not part of the fields? Then how do I build into that abstraction the ability to have these additional modalities? This is going to become more and more complex as it starts to interact with the real world, whereas the cut-and-paste approach would give you that infinite flexibility, where, oh, this particular subclass needs to have an extra flag, and that flag is not a field, it's just some modality flag for guiding the rest, like the units which the resistances are in. And then you can see that everything will just fall apart. So you really have to be judicious about using that. Now if you really wanted to be clever, you could try very hard to use some of Python's introspective nature in order to get around building your own object system, but fundamentally you're basically doing the same thing. And so in this example here, instead of identifying what the fields are, I use the inspect module's signature to look at the signature of the __init__, to figure out from the signature of the __init__ what the fields would
be, and then to automatically generate the repr from that. And so the magic line here is just this line here, which figures out what the fields are and then automatically prints them. So you see both of these representations. But if there were some modality here, this modality would be interpreted as a field, not a mode, and I'd have to say here, oh, look at all of the parameters to this __init__ function that happen to not be keyword-only arguments or whatnot, or you'd have to get people to say, you know, this is a read-only field, and use annotations or some nonsense like that. You're just adding a lot of complexity for what is really just a little bit of boilerplate that you just have to suck up and write. Now all of this discussion of creating an object system should remind you of something else that you've seen in the Python standard library. In Python 3.7 we got in the standard library our own version of tools that have existed outside the standard library for some time, which is automatic generation of the boilerplate for a class. In the standard library they're called dataclasses, and these are mutable types that automatically generate a __repr__ and __init__ for you. You use the variable annotations that were added in Python 3.6, I believe, in order to specify information that's never actually used anywhere, so don't think that this is actually checking that these are strings and a float. No, it doesn't actually do that, but maybe your type-hinting tool does that. I don't know, I don't use those; I don't think they're that useful either. But you get a lot of code for free, and so from our perspective of doing the least amount of work in order to implement the system, I think there's something worthwhile here, and I am personally a big fan of dataclasses, although I would say, you know, it doesn't really matter what you put there unless
you're really using the type-hinting stuff. Although maybe if you were to extend dataclasses yourself to add your own mechanisms, you might put something here that might be more interesting, like some automatic validator or initializer, something better than the PEP 484 type-hinting stuff, because the variable annotation just gives you some structured place to put some information which you could then retrieve as part of your object system. Now one of the things I want to draw your attention to is that, as we saw in the previous example, as we get deeper into the Python object model and we implement all these underscore functions, we're effectively creating a vocabulary that describes how our objects interact and how we interact with those objects. So in this case we have some network of circuit elements, and we can ask that network, how big are you, how many elements do you have? What would that mean: how many elements you have, how many connections you have, how many nodes you have? There's some particular meaning that you might assume when somebody asks how big is this network, and the way that you might implement that is by using len. len is a very natural way in Python to express trying to figure out how big this thing is, and you implement it with your __len__. Or you might need to retrieve something from this network, for example you might need to retrieve one of the elements by its part number, and you can think this idea of retrieving something corresponds in Python to the square bracket syntax, and so you implement __getitem__. And this would be better than writing your own get_part_by_number function or your own get_size function, because the thinking behind this is that by sticking close to the vocabulary provided to you by the built-in functions and the Python data model, what you're doing is reusing a vocabulary that already exists, that
describes the way that Python objects that people are very familiar with interact. You're encouraging the objects that you create to better integrate with the core language itself; you're encouraging the code that you write to minimize the additional entities that somebody needs to understand in order to understand the code that you wrote. And so I would say that implementing __len__ might be a net benefit over writing a get_size, even though you might say get_size is more explicit. A Python programmer looks at len on an object and intuitively thinks, this is saying what's the size of this thing. And as long as this network has one unique, privileged notion of its size, in this case we're assuming that would be the number of elements, then this produces code that's fluent, that integrates nicely into existing Python code, and that minimizes the amount of additional API documentation we have to have. Because you can think, if you had a get_size you'd have to write some API documentation, and I suspect that the API documentation you'd write for __len__ would be much smaller: __len__ would just be, give the size of the network in elements, whereas here you might need to add additional qualifications. Now unfortunately one of the downsides is it's not always clear where or how this vocabulary is specified, or what the terms in this vocabulary mean, unless you've been using Python for a very long time. And there may be cases where you look at how Python works and you might assume that there's a vocabulary in place when there really isn't. One example that I ran into a very long time ago was the hash method in Python. We know there's a built-in function called hash in Python, and if we look at that we could say, oh, this gives us some unique identifier for an object that might represent this object in a fixed integer space. And so I might then use that if I need to store my objects in my own custom object database, or store them in some fashion
that needs to be retrieved off of disk. I might say, well, I need to find some way to compute this hash; you know what, I'll just use the built-in hash protocol, I'll use the built-in hash function and the __hash__ method, and that's how I'll make it work. But unfortunately that's not what hash means in Python. Hash means something much narrower in Python: it means give me a single integer that can then be used to compute the array index for how this thing will be stored in a set or a dictionary. For example, if you have a Python object and its __hash__ always returns negative one, hash does not actually return negative one. It is impossible in CPython for the hash of an object to be negative one; it always gets coerced to negative two, because deep in the bowels of CPython this negative one value is used as a C return value to indicate that there was an error in the computation of the hash. So every time a __hash__ method returns negative one, it gets coerced to negative two; there's a line of code there where it says, if it's negative one, return negative two. And so you can think, why does this make sense, because it need not happen in PyPy or in any alternative, non-CPython implementation? Well, it's because this protocol does not mean, oh, just compute a hash that could be used in many different places, and could be used in order to store this in an object database that is stored on disk. This means very specifically and very narrowly, compute something that will be used by the Python interpreter itself for the purposes of that running instance of that Python interpreter, and no more. And sometimes that small and very subtle distinction may not be visible when you're reading through the documentation or reading through Python code. For the most part these corner cases are fairly rare, but there are these corner cases where you look at something and say, oh, this is a nice way to express an idea using the Python
data model, and it's not, because there's some nuance there. For example, if you look a little bit deeper, hash may also in some cases add some randomization factor in order to avoid denial-of-service attempts against Python systems that are built to have some kind of user exposure. And so the actual hash value that's produced by this method may not always be predictable, and it may change between different invocations of the same interpreter with the same version of Python and the same code base on the same machine, and so it's not something that you can reliably use to compute something. In this case it would actually be the case that if you wanted to have your objects serializable into your own object store, and they need to have some unique identifier in that object store, then you might actually have to write your own hash function without the underscores, and you might have to now duplicate the entities, because this object here might then have both a __hash__ and its own hash method. This may seem unnecessarily messy, unnecessarily redundant, but this one is being used for internal use by the Python interpreter and this one's being used for some kind of external protocol. And so being able to distinguish what these protocols are, whether they're actually for broad use to describe the interactions in your objects or they're just used for hooking into the Python runtime, can sometimes be a little bit tricky to see, although I can tell you that there are fairly few circumstances where this occurs. Most of the time you can reuse these protocols to express whatever meaning you want. An example of that, and this may make some of you think of the pathlib module in Python: one of the reasons that people like the pathlib module is it gives you controllable, programmatic access to paths, like you can switch out suffixes and get parent paths, you can resolve them, you get relative paths and
absolute paths without trying to do string manipulation. So what you do is you put that string data into some structured object, and that object then gives you structured ways to manipulate it. But other people like it because you can use the solidus, you can use the slash, to mean the same thing as, you know, directory slash directory. And this I think was controversial, because I think a lot of people really objected to the use of what appears to be a mathematical operator to just do some cute syntax. And you can really go as far as you want with this. So here's a stupid example of using matmul, the at sign, to automatically generate emails. This is absolutely pointless. This is not an arithmetic operation; all of the assumptions that you might bring to matmul, like for example it performs some non-element-wise operation, it may or may not be associative or commutative, those all break. This is just gimmicky syntax, and some people seem to like that. I don't know, I think you can go a little bit too far, and this is an example of going way too far. Now that said, pathlib, for whatever reason, people seem to like, so you can't complain too much. Now let me dive a little bit deeper into maybe some of the motivation behind all of this, because I think that sometimes looking at some of these mechanics might be overwhelming. So I want to give you a case study, a simplified example of a real-world case study, of where some of these mechanisms are valuable. And one of the thought processes you can have for when you might use some of these mechanisms, especially some of the more sophisticated mechanisms we're about to look at, is to think about how use of the Python object model not only can give you this common vocabulary to describe the relations between your objects, but can also give you the ability to rearrange the complexity of your program to serve
certain human needs. So to give you an example, let's say that we have a simple report, and that report has some information before and after. So we've actually done the circuit analysis and we've figured out what the currents are, and we can see, given some layout of the circuit and some dialing on that potentiometer, where the currents were in the before line, and then after we dial that potentiometer differently, where the currents are. And for some reason in this particular example one of the currents dropped out and another one dropped in, so maybe we dialed one of the potentiometers so that there was no longer a span between those two nodes. One of the things we might want to do is actually compare the before and the after, so I'm going to generate a simple report that looks like this: these are the before currents, these are the after currents, this is the absolute delta, and that's the percentage delta. And it turns out that in many fields this kind of report is common; it was actually something that I spent quite a bit of time doing when I was building P&L systems for front-office trading desks, because it's often the case that you'd have to come out with a new financial model for a bond, and you'd have to see how that model compared against the old model, so you might price this over a set of days and look at the differences. And one of the common things that somebody asks for when you generate a report like this is to be able to flag things that are severely out of variance. Because if you look at the differences here, you can see for current one it went from 10, I guess amps, to 14 amps; that's a meaningful change. Current two went from 15 to 14; that could have been, you know, if we were using some kind of Kirchhoff's laws, inverse matrix formulation, some kind of rounding error, so that might not even be worthwhile to look at. And then here you can see
this current five went from 1 to 50, so clearly something went really wild with the resistance on that span, because we increased the current by, you know, roughly five thousand percent. And so how might we want to look at that? Well, in the confines of a very simple text report, maybe we'll just add a little star at the end of the lines; we'll add a little flag. So we'll say, looking at the percent difference, if it's greater than 50 percent add a star-star, if it's greater than 10 percent a little star, as simple as can be. If we look at the report now, we can see we've starred that and that as being very meaningful things to look at. It's a very common thing to do: imagine you're comparing two runs of a model and you need to figure out where the meaningful changes were from day one to day two, or before and after we made a change. Now the question is, is it likely that this functionality here, where you're constructing that flag, might appear in different places within your system? Of course, because the guidance for how to generate this flag might be something that is necessary not only in this particular report but in any other subsequent report that you do. Maybe you have some model and you're looking at its performance, its memory use, or maybe even its predictive capabilities, and this percentage flag is something you want to use in all three places. And so instead of cutting and pasting these five lines of code, you need to find some abstraction that allows you to reuse the same percent-flagging mechanism in three places. Now as a data scientist who's maybe mostly comfortable writing functions, you might say this is a perfect example of refactoring: just extract this to a function. So I have a simple function called get_flag; I give it the percentage difference, it looks at the percentage
difference and it gives me that flag, and I generate my report from that, and everything works as I expected. And now in the different parts of my system where I need to use that flag, I just reuse this function. This is very basic modularity, right? But I want to suggest to you a different approach to take, one that might comport a little bit better with what we've been looking at so far about building this common vocabulary, but also about how we can rearrange the complexity of a program using this Python object system. And so as an aside I have a small question for you. We know that there exist many protocols in Python, like __getattr__, __getitem__, and __call__. __getattr__ is for looking up an attribute by dotted-name syntax, __getitem__ is for looking something up with square brackets, and __call__ is whenever you invoke something with parentheses after it. What is the difference between these three? Clearly there's a difference between __getattr__ and the other two, because the argument for __getattr__ is always a string, with maybe underscores in it, so you can never really implement __getitem__ with __getattr__; that's impossible. But what's the difference between __getitem__ and __call__? Why does it matter if I chose in my protocol to use square brackets here or parentheses? You might say that what it conveys to the reader of this code may be different: in this case you're conveying, look something up; in this case you're conveying, compute something. But mechanically there really is no difference. The code here is the same; the protocol is different, the syntax is different, but it may not be clear where that difference is. Now putting that aside, let's rearrange the complexity in this original example. Instead of having a function that gets the flag, let's create a structure, and we'll call it a range dictionary. Just like a normal dictionary it'll look things up, but unlike a normal dictionary it won't look things up exactly; it'll look up what range things belong to. So if the value that
it's looking up is in the range of 0 to 10%, it'll give me just an empty string; 10% to 50% will give me a single star; 50% to infinity will give me a double star. And then I'll amend my report, so in my report I just look up the flag using this lookup. That kind of makes sense, because we know that square brackets kind of mean look something up, so this means, in this very last line of my report, look up the flag given the percent difference. And we can see there's a very close correspondence between what we originally wrote, which used a function formulation, and what we're writing here, which uses some kind of lookup, some __getitem__ formulation. Now the actual mechanism that we might employ to make this work might look something like this: we'll have RangeDict extend dict, and it'll implement part of the dictionary protocol called __missing__; when you look for a key that doesn't exist, this gets invoked. This is a very inefficient implementation, because it's a linear search through the ranges, but you can imagine that you could recode this as a more efficient, maybe logarithmic, search through the ranges, and you could also even add some logic up here that makes sure the ranges are not overlapping and that there are no gaps between them. But fundamentally, beyond all of those mechanics, and I'll hide the mechanics here, think about how this is different from the first example. If we compare these two, in the first example we have one function called get_flag that tries to figure out what the flag should be, and in the latter example we have a piece of data that indicates the flags, and a type. And you can see there's the same fundamental complexity in both of them; in fact the same operation exists in both of them. If we wanted to make the first one generic in the same way this is, we could pass in the ranges as some input data, and we'd have some code that looks like this inside of that
function to figure out, are you within the ranges or not? But fundamentally what's the difference here? Well, I think what you can see is that we've taken that fixed amount of complexity and we've moved it into two places: we've moved the complexity for the lookup mechanism into the type, and we've moved the complexity for the data that feeds that lookup mechanism into the data here. And so we've taken that complexity and split it apart. Now is that beneficial or not? Well, this is a lot more sophisticated in terms of the use of Python than the first example, but what you could think is, if these ranges were to change later, it'd be a lot safer to change them here than to change them in the first one. If you wanted to make sure that this was done correctly, not programmatically but just by eyeballing it, it's probably a lot easier to eyeball it here than in the first version. And so you can see that when you rearrange the complexity in your program in the right way, you might facilitate certain human actions, like refactoring, maintenance, or confidence in correctness, maybe not automatically verified correctness but just human understandability of the code, in different ways. Now all of that should remind you of something else from the Python standard library, ChainMap, and ChainMap is a perfect example of this. It's often the case in the problems that we solve that we have some mechanism that looks kind of like layers or scopes or something like that: there's some lookup, and there's some way to add a layer in front of that lookup to modify it. And that's exactly what the ChainMap is: it's a data type that gives you this chained lookup. And so in the ChainMap you might have two resistors, and this may be part of some simulation you're trying to do. For example, you might have the resistances of the network supplied by the ChainMap, and then when you want to test something you might push into this ChainMap a
new value for the first resistor, and when you're done testing you might just delete it off the front. What you can see when you run this example is that the value of this resistor goes from ten to fifteen to ten, because the ChainMap is just an ordered lookup through these different maps, and it returns the value from the first map that it finds it in. Now, you might dislike this, because you can say some of this list manipulation, this del and insert, is probably not particularly efficient; we're always told to try and avoid those structures. If you want to, you can always be cute in Python. People might advise you not to be cute, but you should be who you are. And so you could always swap out the maps in the ChainMap for a deque, and now you have very high-efficiency insertion or removal from either side of the ChainMap. And so now you can write a very high-performance, almost sort of copy-on-write or view mechanism on this underlying data. Now, you probably might not want to use the ChainMap directly, but if you lift that up into your own custom type, then your custom type can have this very nice automatic undo behavior without having to employ a very sophisticated mechanism, taking the sophistication of how that undo works, isolating it in the type itself, and using something from the standard library to do that. There are some severe limitations to this approach to writing Python. One of the questions I asked you before was: what's the difference between __getattr__, __getitem__, and __call__? __getattr__ is very, very clearly different from the other two; __getitem__ and __call__ may be very similar. Here's an example of three things, and I'm curious what the difference is between them: a named function, __getitem__, and __call__. I can do the same thing with each; the syntax is very, very slightly different. I switch between parentheses, to square brackets, to a dotted name lookup and then parentheses. Very slight differences here. What's the
fundamental difference? Which one do I choose? I'm told that maybe I should choose one of these two, because this one adds unnecessary entities for somebody who's reading my code to try to understand, so they have to look up what it means. Whereas here there may be certain assumptions that, if you correctly abide by them, somebody can read that code, just take the assumptions that pop into their head, and continue understanding what the code does at a very high level before they have to dig into the documentation. Whereas every time you have a dotted function, somebody's going to really feel like, despite what its name might be, I really need to look at the docs for that. So maybe that's one guidance, but that's a human guidance. Is there any technical distinction between these two? It turns out that there are some significant ones. First, what if you need to pass two arguments in? You can't do that with __getitem__: __getitem__ takes literally one argument, it's just the thing to look up, whereas __call__ can take as many arguments as you want. Now you might say, well, that's not such a big deal, because if I pass in two arguments, I'll pass them in using the same syntax, but they'll be passed in as a tuple and I'll just destructure that tuple there. That's a little hokey, but where this really breaks is: what if you need to use keyword arguments? There's no possible way to add keyword arguments in this structure unless you start mimicking JavaScript, you know, passing objects to functions. There's no other way to add keyword arguments here with the nice syntax, whereas with __call__ there's a way to do keyword arguments. And so what I said originally was that oftentimes what you might do in Python is start with the simplest implementation and then increase the complexity as the complexity of the problem you're trying to solve increases. When you're choosing how to implement the parts of the data model, that may not always be the case, because
sometimes you can code yourself into a corner. For example, imagine that this type here represented some kind of object database, some storage, and you wanted to use the square bracket lookup to pull some object out of that database by a path. But then you need to add a modality to that: are you running against the production database for this database, are you doing a dry run versus a non-dry run, or are you doing verbose versus non-verbose? You immediately are no longer able to use the square bracket syntax, because it can no longer accommodate that keyword argument modality, and so you have to drop back into using maybe the call protocol, and you'll have to go and rewrite a lot of code. And so sometimes being very judicious about which protocol you implement may be important, because switching between these is harder than you might expect in certain cases, or there may be sufficient technical requirements that force you into one of these versus the other. Now you might say, well, as a precaution I'll always only use __call__, and never use square brackets, never use __getitem__. But that may not be a wise or judicious move, because there may be cases where __getitem__ really is the most natural way to look at it. But you really have to keep at the forefront of your mind: what are the limitations that I'm buying into? There's another also very serious and severe limitation here that we haven't talked about at all but have hinted at slightly, and I want to talk about one thing that you may have seen and may have wondered about, which is a class method. Oftentimes when people talk about the object model, they kind of think about the object model as: well, there's a bunch of underscore methods, there's properties, there's static methods, and there's class methods, and that's about it. And they might talk about class method as, oh, it doesn't take an instance, it takes a class, and static method doesn't take any arguments. That may be as
deep as they go. But instead I want to guide you through a different way to think about some of these things, and one of the other serious limitations of how you choose to use the Python object model to implement this common vocabulary. Here I have a simple Network that's initialized with a set of resistors, and it just stores that information; it doesn't do anything interesting with it, right? What if this data came from a file? Should I write my __init__ like this? Maybe it takes a file name, and if you provide the file name, then you open up that file and you read the resistors from that. It's a very common thing: I might not always want to instantiate this object from data that exists written out in my program; I want to give it a file and have it figure out how to read and parse that file and do something with it. And so you might say, oh, that's not so bad. It's a little bit clumsy, because I'm not sure what I do if I provide both this argument and this argument, but it's not so bad. Don't do this. One of the guidances that is not explicated in the data model documentation for Python is guidance around what an __init__ should look like. As I told you previously, sometimes it's not worth creating your own object systems, your own kind of pseudo-versions of dataclasses, in order to avoid writing boilerplate, and the two common boilerplates that you might want to avoid writing are __repr__ and __init__. My guidance for you is this: __init__ is always, and should always be, boilerplate. It's just boilerplate, so just leave it as boilerplate. Your __init__ should almost never do anything fancier than setting attributes, checking that you're constructing a valid object, or constructing derived information. You'll see an example later where we construct some derived information, like we take the connections of a network coming in and infer maybe what the elements are. But here, doing something like reading from a file is a terrible idea. You know why? What format is this file in?
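A minimal sketch of the boilerplate-only __init__ described above may help here. The class body, the attribute names, and the choice of total resistance as the "derived information" are assumptions made for illustration; they are not taken from the speaker's slides.

```python
class Network:
    """Hypothetical Network type, written only to illustrate the guidance."""

    def __init__(self, resistors):
        # Copy the raw data so later changes by the caller do not leak in.
        resistors = dict(resistors)
        # Check that we are constructing a valid object.
        for name, ohms in resistors.items():
            if ohms <= 0:
                raise ValueError(f"resistance of {name!r} must be positive, got {ohms}")
        # Set attributes.
        self.resistors = resistors
        # Construct derived information (an assumed example of it).
        self.total = sum(resistors.values())

    def __repr__(self):
        # The repr mirrors the __init__ signature, so it round-trips.
        return f"{type(self).__name__}({self.resistors!r})"


net = Network({"r1": 10, "r2": 20})
print(net)
```

Because __init__ takes only the raw data, the repr can reproduce exactly how the object was built, which is the round-tripping property the talk is concerned with.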
It's in an XML file? OK, great, I have the data in CSV format; what do I do now? Am I going to pass in an argument called filename equals this, and say if filename is not None, and, I'll use pathlib, if filename.suffix equals Excel, do something here, otherwise if it's something else, do something here? No, this is going to be atrocious, right? And certainly I do not want to do that. Maybe it doesn't come from just a file name; maybe it comes from a file name or from a database. Do I want to pass extra arguments and have all these different ways to construct it, and have all of the modalities of my __init__ explode? And fundamentally, if I construct it from a file name, the __repr__ can't replicate how it got constructed, and then you're going to find that round-tripping from the __repr__ to the __init__ is going to be very difficult to make consistent. Fundamentally, what you can also think here, and this is one of the serious limitations that you have to keep in mind when you're designing objects in Python, is this: the protocol methods give you a common vocabulary for how to describe the interrelations between your objects, assuming that for each element of that vocabulary there is one privileged, unique meaning. So in the previous example we talked about a network, and we talked about using len to figure out what the size of the network was. Is there only one meaning for the size of the network? Well, maybe, maybe not. Maybe in our use case the only meaning is the number of elements, but in another use case, if we were for example doing some kind of mesh analysis, maybe it's the number of meshes. Maybe we need to surface both use cases: both the number of meshes, the number of loops that we're going to compute, so that's the number of rows in the Laplace matrix that we'll actually have to use to compute the voltage and the current on the network, and maybe also the elements, because we have some production need to say, OK, this thing passed some verification step, and go and go
through these elements: this is how many elements you're going to have to purchase in order to produce this. Now, if there are two meanings, which gets to use the underlying protocol method? You can see this problem here. Here we're saying there are many ways to initialize this object: you can initialize it from the actual data, or from a file, or from a database, or from something else. Which one is the one true right way to do it? And if there is no one true right way, no one unique, privileged way to implement that piece of vocabulary, then you should not use a protocol at all, because you're going to end up confusing somebody: they're going to call len, and the result they get is not going to comport with their assumption for what you meant by the size of the structure. Similarly here with this __init__: the arguments that you pass to it should comport with somebody's understanding of how to initialize the object. Now, __init__ is special in that an object must have an __init__, whereas an object doesn't necessarily have to have any of the other pieces of the protocol of the data model. And so the guidance there should be to always have the unique, singular meaning for the initialization of your objects be to just initialize them with the raw underlying data, do a bunch of boilerplate to set the object up, and not do anything interesting. Don't do any stateful actions, don't read a file, don't read from the database, don't do any of that. That should be the answer. What you should instead do is use class methods. Think about what class method does: it gives you a class, from which you can derive whatever you want. And so all the different ways to construct this object that are not constructing it from the raw data itself can be class methods, because class methods are not singular and privileged. You can have as many class methods as you want. If you want to construct this from a file, you can use a class
method called from_file. If you want to construct it from an Excel file, you can use one called from_excel_file, and these can be duplicated as you desire. And so a class method is not only this idea that you have some method that gives you a class, from which you can construct an instance, but also the idea that this is a mechanism by which you can add to a class protocols that are not unique and privileged, but instead might be duplicated in order to satisfy a variety of use cases when there's not one privileged way to, for example, construct an object. And so that's the general guidance for where class methods tend to be used, as what you might call factory functions. The reason why they're used in that fashion, and why you should never use __init__ for this, is that we no longer have one privileged way to construct this object, but instead have multiple possible ways to construct it. Now, with that said, let's dive a little bit deeper into the object model, and I'll touch upon a couple of small things that might be useful for you to know. Here we have our namedtuple representation of the resistor, nothing too fancy there. Now, when we look at this example here, whoops, let's do this one first, we can see that there's a problem in the data we've passed: we've passed a negative resistance, which for the purposes of this example shouldn't really exist. These resistors should always be positive. Now, it may be the case that in our program that's not an error that anybody makes, and so maybe we don't need to do anything about it. Maybe this is something that could just as simply be caught by code review. But if it were not something that could simply be caught by code review, or if there might be some additional value to catching it ahead of time, then we may need some mechanism by which we try to catch that error. Now, one of the things we could try to do is take the namedtuple, subclass from it, and implement __new__, and in this __new__ we
could check to see if the resistance was negative, and if it was, then provide an error message and say, oh, it's got to be a positive number, you gave me a negative resistance. And then that gives us some better guidance. But the thing is, if it's a namedtuple, this thing is immutable, so that's a fair thing to do. But if we need this resistance to be mutable, for example if it's a potentiometer, where we might dial it back and forth to change the resistance and it just has to be between the min and the max value, then it may very well be the case that we need to change the resistance later. And here, in this case, we can't change the resistance, because this thing is a subclass of a tuple; it's immutable. But if this were a rich object and we needed to be able to change this, then we'd need to figure out some way to do that. Now, if you're a Java programmer circa, like, 2015, maybe we might write getters and setters. We might say: in order to prevent this anomaly whereby we change this field to an invalid value, we're going to force you to go through this getter and setter to interact with the underlying data on this class. And so this check that we put up here might also be duplicated down here, and we do fundamentally get the ability to check that if you mutate this thing and set the resistance, it's a valid value, that it's indeed positive. Now the problem here is: how do we choose when to write our getters and setters? In fact, we may decide that every field always needs a getter and setter, because there could always be some need later down the line to add validation. How do we know ahead of time that this is the validation we need? And we're told as a guidance that we should never write code that hasn't been explicitly asked of us, because it's just going to add more lines of code that we then have to refactor later as a business
problem changes. And so there's something missing from here. Now, what you might already know is that this is why the property pattern exists in Python; this is why the property decorator exists. The property decorator gives you the ability to add hooks into the lookup or setting of an attribute after the class has already been constructed and already been used. And so here in this example I just use property, and then when you try to pull out the resistance, you pull out this field that has been demarcated as being private in this instance. When you try to set it, we put the check here, and then the code works. Now, the guidance here is not just that this is something that we didn't need to have from day one. In fact, our first version of this code might very well just look like this, right? It doesn't do any checks whatsoever, and maybe there's a bug there, but maybe if that bug is not common enough, it doesn't warrant the effort for those extra seven lines of code, and that's fine. And then when we discover that there is a requirement, we can add in these two checks here, and it'll start to catch invalid code more aggressively, but the rest of the code that we wrote for this no longer has to change. The API doesn't change, so we don't have to go from having a raw attribute to a setter and a getter. However, another thing that I would like you to think about from this is: what is the difference between a property and a named function? They both can compute something, right? There's space both here and here to compute something. So what's the difference between this and this? Well, clearly you can't fit arguments into the property, right? The property can operate on the state of the class, but there's no way to fit in either positional or keyword arguments. So just like the difference between __getitem__ and __call__, you can see there's a formal difference between these two. There's also another difference
which is that typically, when somebody looks at a property, they don't expect there to be a computation that's very time-consuming or that has a stateful effect, whereas with a call there's usually a free assumption that it might have a stateful effect. Now, as we said before, why not just always use this syntax? Why not always use call? It's the least restricted. One of the other principles, I think, of Python is that you'll often find there are dualities: there are structures that are more flexible and structures that are less flexible or more constrained. A very simple example: you have your def formulation for writing a function, and you have lambda. Why bother to use lambda? Yeah, it saves a couple of lines of code, but you can only have one line; you can't have any statements in it, it can only be expressions. Why have list comprehensions when you can just build the list with a for loop? Well, the reason is that oftentimes, when you're reading through code, if somebody makes the choice of using a more restricted structure, it helps the reader, because their guesswork for what actually happens behind the scenes, in order for them to kind of pierce the veil of that abstraction, is less. And so one of the reasons why you might choose to use a more restricted part of the data model is not only because you forecast that you don't need keyword arguments or some additional functionality, but because you're also subtly broadcasting to the reader that this thing only does so much, can only do so much, and will only ever be designed to do so much. And so in the case of property here, you're saying this is something that doesn't need to know any information other than the current state of the object, whereas call could need much more information that gets passed in, and you may not always be clear about what gets passed in, because it could have a **kwargs argument there, and so there could be fields that are passed and parameters they're
passing that are not explicated in the docstring. More fundamentally, however, this guides us to something kind of interesting, which is that there is a part of the Python object model about how you look up properties. And so, in the remaining 13 minutes we have, I want to do a deep dive into two parts of the Python object model to show you how much flexibility there is. Fundamentally, what I want you to think about here is that Python is not fundamentally dynamic; it's more operationally dynamic, in the sense that somebody has built a model and added hook points at specific parts. You can hook into those hook points, but there are going to be gaps where there's just not a way to hook into some behavior, or there's a corner case, or there's a wart. And so you'll see that even with __getattr__, __getattribute__, and __get__. You may already be familiar with __getattr__; we've seen it before. This is what gets hooked when you do an attribute lookup like abc, right? And so you can see here this __getattr__ just takes what it's looking up and gives you the string back in upper case; very simple. Now, you might also be familiar that if this attribute already exists, then __getattr__ doesn't get called. So you can see __getattr__ didn't get called. You may also be familiar that as part of that object lookup protocol there's actually also a __getattribute__ that gets triggered before you actually check to see whether that attribute existed or not. And so if you switch this from __getattr__ to __getattribute__, suddenly that call is intercepted. And so what you can think here is that there's some protocol, some steps for how you retrieve an attribute from an object, and you go through those steps, and in each step somebody has denoted hook points. One of them is: before you even look it up, call __getattribute__; then look it up normally, and if it exists there, return it; otherwise call __getattr__. And even things like property fit into that, because things like property
are implemented using another protocol called the descriptor protocol. What happens is, when you look something up on an object, if you don't see it on the instance and you don't get it back from __getattr__ or from __getattribute__, but you find it on the class, and the thing that you find on the class has a __get__ method, then you call that __get__ method. So you have yet another hook point, and this is how class method, static method, actually no, that's not how static method works, but it's how class method, normal methods, and all Python functions, actually, how class method, normal methods, and property are implemented, using this descriptor protocol. And so it's yet another hook point. Now, the reason to go through these, and to go through them quite quickly, is that knowing exactly the details of how this protocol works is less important than knowing that the protocol exists, and knowing that when you're asking a Python object to give you an attribute there tend to be certain hook points in that, and having a broad awareness that those hook points exist as part of the design of your system. The details of those hook points are multifaceted, like what do this instance and owner mean, and what can we use them for? There's quite a lot of detail there, but it's not necessary to know that. At a very high level, what's probably most important for most of you to understand as you set out to design production systems is that this exists at all, and how it's designed and structured. Similarly, I want to talk about the object construction protocol. We know that we can construct an object by writing class T, and then there's an object. But this is actually dynamic code. This isn't C++, this isn't C, this isn't Java. There isn't some bag of bits that represents this class, or, in C++ and C, no actual runtime representation of this class but only some compile-time representation to figure out how you can translate pointer accesses into direct offset load operations. This is
actually runtime code. You may be familiar that there's a dis module in Python, and that this module disassembles code. So if I have a simple piece of code like x + y and I disassemble it, I can see the Python bytecode. A very brief introduction to it: this is the line number, this is the offset for the bytecode, this is the bytecode, this is the operand, this is the interpretation of the operand. You can see that when we add two values we just call BINARY_ADD, and this is how Python works underneath the covers: it's running through these bytecodes, it sees the BINARY_ADD, and then it dispatches to PyNumber_Add or PyObject_Add or whatever, one of those methods that go and figure out what this addition operation actually does. In fact, in the case of BINARY_ADD, it'll first check to see if they're both unicode types and then perform a sequence concatenation as a fast path, and otherwise it'll dispatch to the numeric addition mechanisms, which will then dispatch to numeric addition or concatenation operations. Now, one interesting thing that you might see is that if you create a class inside a function and you disassemble that, you can see this thing called LOAD_BUILD_CLASS. Now, if we're running Python 2, which I really suggest most of you do, because it's going away soon, so you might as well get your last few minutes of Python 2 in, it's actually even easier to see: in Python 2 there's a bytecode called BUILD_CLASS. One of the correspondences that we saw very early in this talk was a correspondence between some built-in function and some underscore method that implements that built-in function as part of the vocabulary. There's also another correspondence: for many of these bytecodes, there's a correspondence between the bytecode and some mechanism that implements it. And so it turns out there's a mechanism that implements how a class
is built in Python. There is in fact a built-in function in builtins called __build_class__, and if you write your own version of it, you hook every class construction in the entire runtime of your program. And so here, if we run this block of code, we'll see that when we built this class T, we hooked into this code here, and we got the function that would create T and the name of T. Now, this is not a mechanism that you might commonly be aware of, in part because it is not a surgical mechanism; it is an incredibly broad sword, and as a consequence it's not particularly useful for anything in production. It's not used for any sort of business logic. I'll show you why. Let's say we hooked into __build_class__ to hook into when classes were constructed, and then we just very innocently imported the json module and the dataclasses module. This hooks into every class construction in your running program, and so it hooks into a bunch of stuff, and there's a very low likelihood you want to hook into all of those. So this __build_class__ mechanism is there, and it exists, and what it might be useful for is if, for example, you need to modify certain things like memory allocation, or you want to do some debugging: you want to see what classes were constructed, you want to note where, when, and why classes were constructed, you want to do some kind of logging or some kind of automated testing. For example, every time a class is run, maybe you run some tests associated with it under some certain debug mode, something like that. But other than that, it's not particularly useful. More useful might be the metaclass mechanism. Now, we don't have enough time to talk about what a metaclass truly is. I gave a talk about two years ago called how to be a Python blah blah expert blah blah blah, and it got a lot of hits on YouTube, because the title was just provocative enough that people wanted to click on it, either to put it down and say, oh, that's so easy, that's not Python expertise, or they were
genuinely students learning about it. But I would suggest you might look up that talk if you're interested in what a metaclass truly is. So for our perspective here, we'll just talk about a metaclass in terms of the mechanism that it provides: it's just a hook into the object construction process. In other words, when you construct the type here and you give it a metaclass, there are two hook points for when that type is constructed: one's called __new__ and one's called __init__. There's actually one more, called __call__; I'm not going to get into that here. They're just hook points for when this object is constructed. So if we run this code, we can see that as we constructed this T, we got hooks into that, and because we have a hook into the construction of this T, we can do things like modify it: we can add extra fields, we can do some fancy things. One fancy thing we might be able to do is supply the dictionary that's used for processing the body of that class. And so here this __prepare__, which is actually also available in Python 2, gives you the ability to hook into the dictionary that's provided as a namespace for the construction of the body of this class, which may seem very abstract, but you've almost certainly seen something like this before: that's how enum works in the standard library. This auto thing, like, what is it doing? Well, what's happening is that Enum has a metaclass, which is inherited, so this component inherits that metaclass. When you inherit that metaclass and you process this code, you're given a dictionary, and when that dictionary is filled up, actually, that's not exactly how enum works, but as the dictionary is filled up, you can do whatever magic you want. One of the things that is necessary is that in Python 2 you have to provide an ordered dictionary, because Python 2 dictionaries aren't ordered, so you do have to use __prepare__, which gives you these in order, and then you can use the __new__ or the __init__ on the metaclass in order to swap out these auto fields with an actual
numerical constant. In Python 3 you can just use auto, but in Python 2 that's not possible, because they didn't add auto to the enum type. But it really only looks about like this in Python. As I said, you just provide an OrderedDict when you're providing the namespace for executing this class, and as you execute this class you just look for all the items that are auto, and if they're auto items, you replace them with a numerical value. This is how you can implement this: about 15 lines of code to implement some fairly sophisticated functionality. Now, again, there are quite a few details here, and this is also Python 2 code, so, you know, get your last bits of Python 2 in before it goes away. But fundamentally, what's important to understand is that as part of building this universe in Python, there are a lot of these hook points, in both attribute lookup and object construction, that give you the ability to customize how Python does these operations you might take for granted. And the combination of all of these together is what gives you your own custom object system. This is what most large production systems end up looking like. I don't think we'll have time to go over the example, but in the example I'd sketched out a couple of places where you might see that beginning to peek out. Most Python systems start very simple, and they don't use any of this functionality, but instead they'll grow towards having their own custom object system: there'll be some set pattern for how all the objects are designed and what the objects need to have. They may use mechanisms like __init_subclass__. __init_subclass__ is the intended replacement for a wide variety of uses of metaclasses, and in __init_subclass__ you can actually do things like add extra keyword arguments to your class definition to define things that should be automatically generated on that class. And so one thing that you could do is make this resistor, you know,
automatically JSON serializable, automatically decodable from JSON, and you can provide guidance for how that's done in the keyword arguments when you derive this thing. Here I'm just giving it some measure information, so maybe it'll be able to print out nicely what the units are, or could do some unit conversion; that might be the use case. But that's what __init_subclass__ is for. Now the question is: if I don't do this explicitly, am I going to end up doing this implicitly? In my experience, most large production systems in Python end up creating their own object systems internally, and they end up not using the mechanisms provided to you by Python. And so they end up being just as complicated as all the examples that we've looked at so far, but much harder to understand, because all of the mechanisms employed are completely ad hoc, just written as needed. And so the final guidance that I would give you is that it is very valuable to understand the details of how all these hooks work. It's very valuable to understand how the Python object model works, because a large production system, not an analytical system but a production system that has analytical components, will eventually evolve in this direction, and if you don't understand all of the things that we've talked about in this session, there's a very high likelihood that the code will be just as messy and just as complex as everything we've seen so far, but even harder to understand, because it won't use anything from the language. So I hope you enjoyed that. I think that's all the time that we have for this session. That was Objectionable Content, I'm James Powell, thank you so much. [Applause]
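The ChainMap example from earlier in the talk, where a resistor's value goes from ten to fifteen and back to ten, can be sketched roughly as follows. The resistor names and the use of plain dicts are assumptions; the on-screen code is not part of the captions.

```python
from collections import ChainMap

# Base layer: the network's resistances (names and values are illustrative).
base = {"r1": 10, "r2": 20}
network = ChainMap(base)
readings = [network["r1"]]

# Push a test layer in front of the lookup: it shadows r1 without touching base.
network.maps.insert(0, {"r1": 15})
readings.append(network["r1"])

# Done testing: delete the layer off the front, and the original value is back.
del network.maps[0]
readings.append(network["r1"])

print(readings)  # [10, 15, 10]
```

The insert and del on network.maps are the list manipulation the speaker mentions; the base dictionary is never modified.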
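The guidance on class methods as non-privileged alternate constructors can be illustrated with a small sketch. The method names from_csv and from_json and the assumed "name,ohms" file layout are hypothetical; the talk itself only names from_file and an Excel variant.

```python
import csv
import io
import json


class Network:
    """Hypothetical Network whose __init__ only accepts the raw data."""

    def __init__(self, resistors):
        self.resistors = dict(resistors)

    def __repr__(self):
        return f"Network({self.resistors!r})"

    @classmethod
    def from_csv(cls, file):
        # Assumed layout: one "name,ohms" row per resistor, no header row.
        return cls({name: float(ohms) for name, ohms in csv.reader(file)})

    @classmethod
    def from_json(cls, text):
        # Assumed layout: a JSON object mapping names to resistances.
        return cls(json.loads(text))


a = Network.from_csv(io.StringIO("r1,10\nr2,20\n"))
b = Network.from_json('{"r1": 10.0, "r2": 20.0}')
print(a, b)
```

Each new input format becomes one more class method, while __init__ and __repr__ stay unchanged and keep round-tripping.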
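The property discussion, where validation is added later without changing the API, might look like the sketch below. The class layout and the error message are assumptions; only the idea of a positive-resistance check on a private field comes from the talk.

```python
class Resistor:
    """Hypothetical mutable resistor; callers keep using plain attribute access."""

    def __init__(self, resistance):
        # Assigning here goes through the property setter, so the check runs.
        self.resistance = resistance

    @property
    def resistance(self):
        return self._resistance

    @resistance.setter
    def resistance(self, value):
        if value <= 0:
            raise ValueError("resistance must be positive")
        self._resistance = value


r = Resistor(10)
r.resistance = 15          # same syntax as a raw attribute
try:
    r.resistance = -5      # now rejected by the setter
except ValueError as exc:
    error = str(exc)
print(r.resistance, error)
```

Code written against the earlier, unchecked version of the class (reading and assigning r.resistance) keeps working as is.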
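The two attribute-lookup hooks described in the talk, __getattr__ as a fallback and __getattribute__ as an interception of every lookup, can be contrasted in a short sketch. The class and attribute names are made up for illustration.

```python
class Fallback:
    existing = "found normally"

    def __getattr__(self, name):
        # Called only when the normal lookup fails.
        return name.upper()


class Intercept:
    existing = "found normally"

    def __getattribute__(self, name):
        # Called for every attribute lookup, before the normal lookup.
        return name.upper()


f = Fallback()
i = Intercept()
print(f.abc, f.existing)   # the missing attribute is hooked, the existing one is not
print(i.abc, i.existing)   # both lookups are hooked
```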
Info
Channel: PyData
Views: 3,618
Rating: 5 out of 5
Keywords: Python, Tutorial, Education, NumFOCUS, PyData, Opensource, download, learn, syntax, software, python 3
Id: 1SHi1kriJI4
Length: 86min 39sec (5199 seconds)
Published: Wed Dec 18 2019