The Clean Architecture in Python

Reddit Comments

Brandon Rhodes is a spectacular speaker; I highly recommend all his other videos as well.

25 points · u/cym13 · Jan 18 2016

An oldie but a goodie. Worth a repost.

I too like his emphasis on functional programming and quarantining of side effects.

10 points · u/funkiestj · Jan 18 2016

Brandon Rhodes is awesome. Combines great content with a good amount of humor. Communicates very well.

5 points · u/santiagobasulto · Jan 18 2016

It's a good talk, but I really don't like how he takes a big shit all over dependency injection by giving bad examples of it.

Here's his example:

def thing(web, database, file):
    ...
    stuff = Foo(web, file).whatever()
    ...
    more = Bar(database).do_it()
    ...

Instead, you should declare your dependencies on Foo and Bar, and not try to create them inside the function.

A good example of DI looks like this:

foo = Foo(web, file)
bar = Bar(database)

def thing(foo, bar):
    ...

Now thing doesn't care about the database, the web, or the file system, because those are concerns of Foo and Bar. They're inconsequential details to thing.

Maybe tomorrow, we'll decide that we'd rather call Ms. Cleo instead of talking to the database:

bar = MsCleoBar(phone)

thing didn't change and didn't suddenly ask for a phone; it just wants some object with a do_it callable attached to it.
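A minimal runnable sketch of that wiring (the class bodies here are invented just for illustration):

class Foo:
    def __init__(self, web, file):
        self.web, self.file = web, file

    def whatever(self):
        return 'stuff built from %s and %s' % (self.web, self.file)

class Bar:
    def __init__(self, database):
        self.database = database

    def do_it(self):
        return 'result from %s' % self.database

def thing(foo, bar):
    # Depends only on the foo/bar interfaces, not on web/database/file.
    return foo.whatever(), bar.do_it()

class MsCleoBar:
    def __init__(self, phone):
        self.phone = phone

    def do_it(self):
        # Same do_it interface; thing() never notices the swap.
        return 'prediction via %s' % self.phone

thing(Foo('web', 'file'), MsCleoBar('phone'))  # works unchanged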

15 points · u/[deleted] · Jan 19 2016

If only the audio was cleaner.

6 points · u/Sir_not_sir · Jan 18 2016

Indeed, a very good lesson. While I was already familiar with the concept from my foray into Haskell, and an attempt to make my Python code more functional, seeing it outlined like this does make it a fair bit easier to think about.

5 points · u/Gstayton · Jan 19 2016

I'm not a trained developer, and am just starting my journey, so I want to make sure I'm taking the correct information away from this talk. I enjoyed the discussion but am a little curious about how far the end conclusion should be followed; throughout the talk he kept breaking the larger functions down into smaller and smaller functions, to be collected at the top by procedural processes (in his terminology, coupling the various functions, whether data or IO).

Is this meant to suggest that our programs should be a collection of many, many smaller functions? And if so, should this be taken into other languages, including scripting languages (PowerShell, Bash, etc.)? To what end should this be taken; how narrow should we get our functions?

3 points · u/Vance84 · Jan 19 2016

I believe the Uncle Bob talk he refers to is this one.

2 points · u/NeoZoan · Jan 19 2016

Thanks for posting this.

I had what alcoholics call a moment of clarity.

2 points · u/[deleted] · Jan 20 2016
Captions
All right, everybody, welcome. Glad you've hung in there for the second afternoon of PyOhio. I'm Brandon Rhodes. When I'm not writing Python APIs or writing Python applications, I'm wondering why the code in my APIs and in my applications is such a mess. In the industry as a whole, I'm told (though at the moment the numbers are hard to find), more software projects fail than succeed, even to this day, worldwide, in businesses and institutions, and as an industry we're still trying to learn why.

A piece of that puzzle, I think, is the recent work that's been done in propounding the Clean Architecture, and I'm going to give some examples of how I believe it applies to Python. The inspiration for this talk is someone called Uncle Bob Martin. He's really big in Java and the strongly object-oriented, statically typed languages, and recently, in 2011 and 2012, he was thinking about a new way of organizing his applications that he called the Clean Architecture. It was one of several ideas that came out at about the same time with about the same goal, but his became more popular because, I think, he drew a better picture. There was someone else who came out with something called the hexagonal architecture, and it just wasn't as pretty; it didn't use colors. So the Clean Architecture is what people often refer to when they talk about this idea we'll explore, of putting I/O at the top level of your program instead of at the bottom.

The pith, the center, of the idea (this is not how he puts it; this is my spin on it) is this. You're familiar with the idea of a subroutine, where your code can be running along and then make a call. In Python the two forms of subroutine are the function and the method, where you can stop, invoke some other code, and wait for it to come back with an answer. The pith of the idea here is that we programmers have been spontaneously using subroutines backwards. For how long have programmers tended to use subroutines completely backwards, the wrong way? By my count we have been doing it for 62 years, and my proof is that I went back and found, in the proceedings of the 1952 ACM National Meeting in Pittsburgh, Pennsylvania (the second meeting of the ACM, but the first for which proceedings were published), the paper "The Use of Sub-routines in Programmes" by Dr. D.J. Wheeler of Cambridge and Illinois universities.
You might wonder whether I'm really going to pull anything of relevance out of this paper, because it was a very different world: a typical computer at the time had about a thousand words of RAM, could do about a thousand operations per second, and required a dozen people to operate. Could programming a computer with a thousand words of RAM really be anything like writing code in a modern language today? Here's just one example of something you'll find familiar from this paper. How complex could programming even be with only one K of memory? In the paper he says: "The preparation of a library subroutine requires a considerable amount of work. However, even after it has been coded and tested there still remains the considerable task of writing a description so that people not acquainted with the interior coding can nevertheless use it easily. This last task may be the most difficult." You had a thousand words in which to write your code, and people still didn't want to write the documentation. So the world he was working in, as I read this paper, seemed very familiar, though in some ways very strange.

He advocates that instead of just having one huge piece of code in your thousand words of memory, you split it into routines that call one another, like having several functions in your Python file instead of a single one. What does he advertise subroutines as being good at? Why would you organize code this way? He says the primary reason is to hide complexity: "all complexities should, if possible, be buried out of sight." And this, as we'll see, is where everything went wrong, and where he doomed us for the next several lifetimes of programming, because that advice leads programmers to a quite natural mistake. I/O is always a mess: trying to talk to a database, trying to parse JSON, trying to get things in and out of a file. It's often very idiosyncratic code that doesn't have a lot to do with the pure essence of what our program is trying to accomplish. And the characteristic error that we make is that we bury the I/O rather than cleanly and completely decoupling from it.

In the time allotted for this talk I'm only going to attempt one code example, so if we spend a second on it, it will set you up for the rest of the listings in the talk. This is a simple function in Python that uses a now-deprecated API on DuckDuckGo to look up the definition of a word. It builds a URL and, in this case, uses the requests library (I was a good citizen and marked those two lines with comments: there's the I/O, there's that ugly complexity we'd like to make disappear), and then, having gotten the JSON data back, it can look and see whether a definition was in fact returned for the word.
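(A sketch of the listing being described, reconstructed from the narration; the exact names and JSON keys are assumptions, not the speaker's literal code:)

import requests

def find_definition(word):
    params = {'q': 'define ' + word, 'format': 'json'}
    response = requests.get('http://api.duckduckgo.com/', params=params)  # I/O
    data = response.json()                                                # I/O
    definition = data['Definition']
    if definition == '':
        raise KeyError('that is not a word')
    return definition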
The natural thing that we tend to do is say: well, I/O is kind of messy; who knows whether tomorrow I might be using some other library to do my HTTP; who knows whether I might have a different way to ask for definitions (for instance, DuckDuckGo deprecates the API, though I guess that would invalidate all of this, so let's stay with the example of the way I make the HTTP request changing). We want to take that complexity and bury it, and so we make the fundamental mistake of the last sixty years: we pluck out the I/O and feel proud of ourselves for having done exactly what Dr. Wheeler said. We've hidden it in a subroutine. We have hidden the I/O, but have we really decoupled it?

Pace Wheeler, I assert that hiding is not enough if you want to control the complexity of your programs. Here's the listing again, and I will just ask this: if you want to call find_definition so that it doesn't actually do any I/O, because you're testing it, or because you've cached a result and just want to hand it the cached result instead of having it call your lower-level code, how do you do that? How do I call find_definition without it actually doing I/O? At least as the code is presented here, it's not possible. I have, you see, hidden the API; you don't see any API if you read find_definition, but I'm still tightly coupled to it. The I/O is an inevitable consequence of calling find_definition, whether it's visible in its code or not. I have hidden, but I have not cleanly decoupled.

What if we did everything the other way around? What if, when we saw a routine with I/O in it, ugly and idiosyncratic code that might change tomorrow, we rescued the logic instead of hiding the I/O? This is exactly the same set of lines, but in this case I have pulled out the data operations, made them separate, and left the I/O stranded at the top level of the program, rather than leaving my logic there. My claim is this: that listing 3, which we just looked at, is an architectural success, while the others were architectural failures. Listing 3 shows in miniature what the Clean Architecture does for entire applications. The coupling between the logic and the I/O, the thing in my program that brings logic and I/O together in a way that means they both have to happen at once, is now isolated to a small procedure that mates my logic and my external I/O operations together. It's very readable, because instead of blocks of logic operations I now have names for them: build_url, pluck_definition. The names document what each section of code is doing, where the first listing had no documentation for what those series of operations did.
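(A sketch of the rescued version; the function names follow the ones mentioned in the talk, while the details are assumptions:)

import requests

def find_definition(word):
    # Procedural glue: the one place where logic and I/O meet.
    url = build_url(word)
    data = requests.get(url).json()   # I/O
    return pluck_definition(data)

def build_url(word):
    # Pure function: data in, data out.
    q = 'define ' + word
    return 'http://api.duckduckgo.com/?q=%s&format=json' % q.replace(' ', '+')

def pluck_definition(data):
    # Pure function: no I/O, no side effects.
    definition = data['Definition']
    if definition == '':
        raise KeyError('that is not a word')
    return definition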
This should remind you a little of the Extreme Programming movement of the late 90s and early 2000s. Remember, they said that if you ever see a piece of code with a comment at the top, that's a sign that you actually have what wants to be a function. They would say: if you're writing high-speed C code and you want it to run fast, mark it as inline static so it gets inlined at compile time, but semantically make it something separate. XP people actually believed (this is why it was called Extreme Programming) that every comment was a bug, because every comment was knowledge that wasn't in your code, and if your code isn't explaining everything about what it's doing, it's bad code. So before you could commit in Extreme Programming, all the comments had to disappear. As in this case: we introduced a new name, a new identifier, build_url, that wasn't there before, so that the semantic information about what those three lines do becomes part of our program's actual semantics. In this maneuver we turn pure logic into functions and thus have to give the pieces names; in the same way that XP did, we find we're adding more semantic content to our code.

So our architecture in listing 1 was simply a procedure (procedure meaning something that has side effects: you call it, and some I/O has happened by the time it's done). Listing 2, the natural way of using a subroutine since the 1950s to hide complexity, resulted in hiding the I/O, but the top-level code there was still a procedure, and all of our logic was stranded in a routine that did I/O every time you called it. Listing 3, by doing the opposite maneuver, left the I/O up in the procedure and resulted in pure functions: downstream Python functions that don't do I/O and don't have side effects. They simply take some arguments that are data and return some results that are data.

This has incredible ramifications, among other things, for testing. How would we have tested listing 1 or 2, where the goal is for your tests not to need the network or to talk to DuckDuckGo? Imagine that you want your tests to run on an airplane, or at the airport, or somewhere you don't have Wi-Fi. Two techniques were developed over the 2000s, pioneered I believe in Java and the other big OO statically typed languages: dependency injection, and the idea of mocking, which in Python we can do through monkey patching, without even modifying our code, using something I'll show in a moment called mock.patch.

Dependency injection was pioneered in 2004 by another of the big OO thinkers, Martin Fowler. His idea was to make the I/O library or function that the routine needs to call itself a parameter. This is really easy in Python: functions are first-class objects, and modules are first-class objects that can be arguments to a function. So instead of having find_definition from listing 1 literally and always use the requests library, you could make that a parameter whose default, if it's not provided, is Kenneth Reitz's requests library, but which lets you substitute any other module-like object instead if you want to skip the call out to DuckDuckGo. And here's how you might write a test against that function: you'd make a fake requests library with a get call inside of it, just like the real requests library, but when it's asked for its JSON data it can just return a constant. The test can simply set up a fake answer; we're not really doing any I/O here, we're just going to hand this fake JSON data back when the definition is asked for, and we can now call find_definition and avoid any I/O by having it use our fake little requests library instead of the real one. So we get a self-contained test that doesn't actually spam DuckDuckGo with lots of requests, and doesn't need DuckDuckGo to be up and running and not to have blocked our IP address because we're running so many tests.
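(A sketch of that move, with the fake test double; the parameter name and fake classes are assumptions:)

import requests

def find_definition(word, http=requests):
    # The I/O module is now a parameter, defaulting to the real library.
    params = {'q': 'define ' + word, 'format': 'json'}
    data = http.get('http://api.duckduckgo.com/', params=params).json()
    definition = data['Definition']
    if definition == '':
        raise KeyError('that is not a word')
    return definition

class FakeResponse:
    def json(self):
        return {'Definition': 'a domesticated canine'}

class FakeRequests:
    def get(self, url, params=None):
        return FakeResponse()

def test_find_definition():
    # No network: the fake stands in for requests.
    assert find_definition('dog', http=FakeRequests()) == 'a domesticated canine'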
The problems with this are obvious. First, that fake requests library we wrote is not the real requests library, so the fact that we called it and got data back doesn't tell us that calling real DuckDuckGo will give us data back. And it might look simple for one service, an I/O routine that just needs to make an HTTP request, but a procedure that also needs, let's say, database and file system access is going to need lots of injection. What you tend to get, if you use dependency injection, is high-level functions that need everything including the kitchen sink, because if way down beneath them anyone tries to talk to the database, the database has to be dependency-injected; if another procedure needs the web, the web has to be dependency-injected. This problem has actually spun up to the level of huge frameworks (dependency injection frameworks, they're called) in the larger OO languages, because if the routine at the very bottom has to talk to the web, and you ever want to be able to test that code, then the top-level procedure has somehow got to get the information about what the web is right now: is it a test mock, or is it the real thing?

A dynamic language like Python fortunately has ways around dependency injection, so we don't wind up with the problem I just described. Thanks to the mock library, an incredible resource created by Michael Foord, we have the ability to live-patch our I/O libraries, briefly substituting fake versions of their calls that will return the data we want. (I believe the mock library is now part of the most recent Python 3s; it was so important it was added to the standard library.) In that case we can use the original listing 1 or the original listing 2, and just use the patch callable from the mock library to patch requests.get to be our fake version of it instead. Inside the with statement, inside the context, the block of code during which that patch is active, our test gets run, no real connection is made to the outside world, and we find out whether our function works against purported data from DuckDuckGo.

But whether you do dependency injection or whether you call mock.patch, I find the result kind of awkward and kind of sad. As a tester I feel like I'm fighting the structure of my application, like I'm trying to make it do something it would really rather not do. So how does testing improve when we factor out our logic, as in listing 3, where we take the logic that simply deals with data structures and rescue it by putting it beneath the I/O rather than above it? With find_definition split this way, the pure functions can be tested using only data: arguments go in the top, and a list or a string or some other data structure comes out the bottom. For example, if I want to test build_url, I just call it. I don't have to set up objects, I don't have to build things; I just call it with different arguments, and instead of going hunting for side effects, I can just look at the return value and see whether it's what I expected. No special setup is needed, no special preparation, I don't have to build a mock, and the test calls I'm making look exactly like the calls used in production, so I know they have a high probability of telling me whether my code will work in production. To test the second half of the logic, pluck_definition, which needs to pull out the value of the definition key or raise an exception: two simple tests and I have one hundred percent test coverage of it, again making a pure call that is not in any way adulterated or changed or adjusted from the way this function will experience reality when the code is in production. It sees exactly the same kinds of things come in and go out as it will when I use it for real.
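(A sketch contrasting the two styles, using the names from the sketches above; the patched test targets the coupled listing, the plain ones call the pure functions directly:)

from unittest import mock

def test_by_patching():
    fake_response = mock.Mock()
    fake_response.json.return_value = {'Definition': 'a domesticated canine'}
    with mock.patch('requests.get', return_value=fake_response):
        assert find_definition('dog') == 'a domesticated canine'

def test_build_url():
    assert build_url('dog').startswith('http://api.duckduckgo.com/')

def test_pluck_definition():
    assert pluck_definition({'Definition': 'a canine'}) == 'a canine'

def test_pluck_definition_empty():
    try:
        pluck_definition({'Definition': ''})
    except KeyError:
        pass
    else:
        raise AssertionError('expected KeyError')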
Being able to write the tests like that, by the way, taught me about a symptom of coupling I had never observed before, a symptom that tells me I might have locked pieces of logic together that could be more cleanly split apart. You'll note that all I had to do there was write one set of tests for building the URL and a completely different set of tests for whether I could parse the data that came back. In a lot of my older projects I had bigger, more complicated routines where the test for a good URL and good data was very easy to write, but where I then had to do different permutations of arguments to get each part of my logic to fail separately, because the logic wasn't split out where I could call each part on its own. With a big series of pieces of logic, where I want to make each part fail individually, I first have to make a bunch of calls with a bad URL that don't pass in the second piece of data, because I never reach that part of the code; and then a series of tests that give a good URL, so we survive the first half, but bad data, so that the second part fails. I now consider that pattern a symptom when I see it, a cry for help, if you will, from my application code, telling me that I have coupled two pieces of logic that are really separate: they do different things, and they're going to fail in different circumstances. Instead of leaving them coupled, and then having to fiddle with each variable in turn while holding the others constant, I might be able to rescue these pieces of logic into separate functions.
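(A sketch of the symptom; the routine and its checks are invented for illustration:)

def lookup(url, data):
    # Two separate pieces of logic coupled in one routine.
    if not url.startswith('http'):
        raise ValueError('bad url')
    if data.get('Definition', '') == '':
        raise KeyError('bad data')
    return data['Definition']

# To make each check fail individually, the tests must permute arguments:
#     lookup('ftp://x', {})                     fails the first check
#     lookup('http://x', {'Definition': ''})    must pass check one to reach check two
# Splitting the checks into separate functions lets each be tested alone.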
I do sometimes leave this pattern in my tests, if there's just so much state shared between the first and the second piece of logic that it's not reasonable to return all twenty things so they can then be the arguments to the second piece. This comes up a lot in astronomy, where an initial routine might set up a bunch of variables that the concluding logic then needs in order to succeed or fail, before throwing them away and returning a simple value. But if you look at the output of the first piece of logic and find it's rather modest, rescuing the two pieces of logic out into separate routines can make your tests less expensive, simpler, and easier to think about, because you're not building big, tall sequences of logic and contorting yourself to get the third thing that happens to fail. All of which is invalidated, by the way, if you then change the order of your operations, because now you need something different to succeed in order to reach the second or third exception that could happen in your code.

So that is a really, really simple example we've just gone through, almost trivially simple; I made it only as complicated as I thought you needed to get the point. In real life the Clean Architecture often involves much, much bigger pieces of code and the question of how they hook together, not nine-line functions from which we can pull one or two pieces out. What Uncle Bob Martin does, as he designs an entire application, is think through which parts of the business logic can survive being split off, taking data structures as arguments and returning data structures, such that the top level glues all of those pieces together. The I/O stays up at the top level, and the bottom levels are simply objects or functions that don't need to know where the data is coming from, where it's going, or how it's getting persisted, but instead simply enforce your business rules, do your computation, and leave it up to the caller where to put the results. He says, in one of those blog posts: in general, the further in you go, the higher level the software becomes; the outer circles are mechanisms, the inner circles are policies. The important thing is that isolated, simple data structures are what is passed across the boundaries, so that when any of the external parts of the system become obsolete, like the database or the web framework, you replace those obsolete elements with a minimum of fuss, because the innards don't know about the database and don't know that the web is there. If you need to replace the way your data is stored, or the way data flows in or out, you just make adjustments at the outside level and everything else keeps working. Back to our code, to make this concrete: we could change how we do the I/O, we could change how we batch up these operations, we could change what happens up at the top, all without having to change either of the functions down inside, because they take simple data as input, manipulate it, and return new data as output.

All right, you might say, but I would like to know whether my app works against DuckDuckGo; I do want to test my I/O code at least once, even if this pattern lets me do most of my testing with pure data. How do you test the top-level procedural glue? Here I'd refer you to Gary Bernhardt's talks at PyCon 2011 through 2013, where, coming from the Ruby world (that's his primary language), he explored a different form of this same approach, talking about how to make the majority of your tests very fast and invest in only a few tests that do the end-to-end, I/O-bound operations, the ones that tell you: yes, my app actually works, and will actually fetch real information from a database or whatever and work with it. His terminology is a little different from Uncle Bob's, but it works in much the same way: an imperative shell at the top level that does I/O, wrapping and using your functional core. The functional core, because it takes and returns data, can have lots of fast unit tests exercising directly all the ways it could fail and all the conditions it has to detect, while the imperative shell at the top hopefully needs only a few integration tests to verify that it works, because you're not having to hit the imperative shell with the twenty different ways a word definition you're looking up could be malformed. You do that by testing the functional core; you test the imperative shell just to make sure the pieces are hooked together correctly. Look at our top-level function from listing 3: there isn't even an if statement in it. It shouldn't require very many tests to confirm that it's doing the steps of your application in the right order.

This pattern, by the way, is already familiar to a lot of people who do functional programming. Languages like Lisp, Haskell, Clojure, and F# make it quite natural to write most of your code as pure functions, and then you get awkward and put the procedural stuff up at the top. Functional languages naturally lead you to process data structures while avoiding side-effect I/O; in a functional language you tend to call functions for what they return, not for the sequence of things they happen to do while the code is running. Here is an example of I/O as a side effect: I'm mixing a data-processing task, iterating over a Python iterator to get a series of words and uppercasing them, with an I/O task. When you call this function, you're not expecting to see any uppercase words as its return value; you're expecting that when it returns nothing to you, it will have had a side effect in the outside world of producing those words as output. If you want to test it, you're going to have to use mock.patch or something else to intercept standard output. And here is the same code split into a purely logical piece, which consumes an iterator that gives it words and produces (as a generator, in this case) a series of uppercase words, separately from the question of any side effects; it can then quite naturally be plugged into a top-level, as Gary Bernhardt calls it, procedural glue routine, which then does the I/O on its behalf.
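(A sketch of the pair of examples just described; the function names are assumptions:)

def print_words_uppercased(words):
    # I/O as a side effect: called for what it does, not what it returns.
    for word in words:
        print(word.upper())

def uppercased(words):
    # The same logic rescued as a pure generator: data in, data out.
    for word in words:
        yield word.upper()

def main(words):
    # Procedural glue that does the I/O on the generator's behalf.
    for word in uppercased(words):
        print(word)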
Procedural code tends to be called not because it's going to return anything interesting, but because of what it does, what it tosses out into the world or pulls from it; it tends to produce output as it runs. Functional code, on the other hand, tends to be organized in discrete stages that each produce data, which finally gets output at the end. In Gary Bernhardt's talks he talks a lot about the immutability of a lot of these functional programming languages. Imagine Python where you didn't have lists but only tuples, so that once a list was built you couldn't change it anymore. Imagine Python with dictionaries that, once built, could never be changed, where if you wanted to produce a new dictionary you had to ask for a copy of the old dictionary with one thing changed. A lot of these functional languages have immutable data structures that never change, and some programmers who are fans of the functional style say that it's much, much easier if they can pass a data structure to a function knowing it can't be changed, so that they don't have to go searching afterward to see if it looks different. Some of them like to claim that the whole point of this programming style is immutable data structures, so that you should feel guilty about having objects with writable attributes or dictionaries that you might update.
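(A sketch of that copy-with-one-thing-changed style in today's Python:)

old = {'word': 'aardvark', 'count': 5}
new = dict(old, count=6)     # a fresh dict; `old` is never touched
path = (1, 2)
longer = path + (3,)         # tuples impose the same discipline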
I'm going to make the argument that it is not immutability that makes the functional programming languages so clean, or at least that it's not the only thing. My guess is that the biggest advantage of data in a functional programming style isn't its immutability; it is simply the fact that it's data. Data structures you can see, you can reason about. Unlike a moving process, where you're worried about whether it's spinning off consequences in the right order, a data structure is just something you can look at and understand. Two examples from computing history to back me up on this.

The first is the famous Fred Brooks book, The Mythical Man-Month, about successes and failures in managing projects, written in 1975. It's very famous; you've probably heard of it because of the quote (this is back when, if a project was going slowly, they would just keep throwing more developers in): "the bearing of a child takes nine months, no matter how many women are assigned." There are some processes that do not get faster because you flood the organization with young, untrained people who don't know what's going on; hence his famous aphorism that projects go more slowly the more people you add. On the question of what's easier to understand, data or code (at the time, code was usually written out as flowcharts, and data was usually organized in memory in what they called tables), he said the following: "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." Very often, if you just show someone the way you've laid out your dictionaries and lists and other data structures, they can probably guess how you're going to run through those structures and get your job done. There's something much clearer about seeing the data that you wind up producing, or the data at an intermediate step, than staring at the steps in your program and, starting there without any bigger picture of what they're creating, trying to guess what result is being built or generated. So that's one famous thinker in computer science who I think would back me up that the data is where it's at.

But I'll also cite the famous 1986 showdown between McIlroy and Donald Knuth. Knuth, who largely invented computer science in the seventies as he wrote The Art of Computer Programming, practiced something called literate programming, with lots and lots of commentary woven in, so that a computer program could actually be published as a book explaining itself. He was given a task by a programming magazine: given a text file and an integer k (can you tell that computer science was invented by mathematicians?), print the k most common words in the file, and the number of their occurrences, in decreasing frequency. He produced ten pages of Pascal that did this. McIlroy admitted it was a formidable solution: Knuth's program tallies each word, as it is read from the file, in an associative data structure (something like our Python dictionary), a tree with 26-way (well, for technical reasons, actually 27-way) fan-out at each letter; to avoid wasting space, all of the sparse 26-element arrays are cleverly interleaved in one common arena, with hashing used to assign homes. Ten pages of Pascal. At the conclusion of his article, after reviewing Knuth's code and pointing out several bugs and edge cases that would make it crash, in one of the most famous moments in computer science, McIlroy replaced Donald Knuth's program with a six-line shell script. The first command finds every run of letters, uppercase A through Z or lowercase a through z, in the file and puts them together: it takes everything that's not a letter and turns it into a newline. The second command makes everything lowercase, so that we don't count words twice when they appear both at the beginning and in the middle of a sentence. sort brings all of the instances of the word "aardvark" together in a row, and then all of the instances of "Brandon" and "Python" and so forth. uniq takes those runs of identical words and counts them, telling you: six aardvarks, five Brandons, and so forth (I suppose I'd rate my popularity slightly below that of the aardvark). Then it sorts that output on the numeric field sitting in front of each word, so that the six aardvarks go first and so forth, and finally we ask sed for the first k lines, and then quit.

McIlroy points out that every one of these tools, back as Unix was being invented, was written first for a particular need but then untangled from the specific application: the person who first needed a sorter inside their program got it written, then stood back and realized that someone else might need to sort something someday, and went to the difficult work of pulling it out so it could work on any text input file, in any format. Now, the traditional lesson, and the one that McIlroy drew here, was that it's better to use simple, small tools that can be easily linked together, and if this were the only lesson we could draw from his showdown with Knuth, it would be a very good one, because Python makes this easy. Because of the iterator protocol especially, it's really easy in Python to link together a series of generators, throwing in sets and lists and dictionaries at just the right points, to get a lot of really interesting data processing done.
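(For comparison, a rough Python equivalent of the same stepwise transformation, using the standard library; this is a sketch, not McIlroy's actual script:)

import re
from collections import Counter

def most_common_words(text, k):
    words = re.findall('[a-z]+', text.lower())   # runs of letters, lowercased
    return Counter(words).most_common(k)         # tally, then rank by frequency

# most_common_words(open('book.txt').read(), 10)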
But today I want to draw a different lesson. To me, the shell script is simpler not simply because the steps are easy to describe (though they are; many of you who didn't know the tr command before could probably explain this script to someone tomorrow after a bit of head-scratching), but because I can picture the data. In between each of those pipes, in between each of those commands, data is flowing, and I can close my eyes and know exactly what that data looks like at each step. The shell script is the simpler solution because it operates by the stepwise transformation of data. I can picture what the output looks like at the conclusion of each command, and that's very powerful, because it lets you visualize very accurately what the program is doing, in a way that is not going to happen with ten pages of dense Pascal producing an in-memory hash table with 26-way or 27-way fan-out. This approach continually surfaces intermediate results that can be checked and examined; if you find it doesn't work, you can go back and find which step it failed at, examining, in each case, simple plain text.

So if I'm right that one of the big wins of a functional programming style is simply that it deals in data, which our minds can picture very easily, what then is the value of immutability? I think Gary Bernhardt got this right when he said (I believe this was in the 2012 talk, and this is my only slide about it) that the payoff of immutability is distributed computing. If all of your routines just take a data structure and return a data structure, it doesn't much matter what core they run on in a big cluster; you can push data out to a bunch of servers, run the data steps separately, and then collect the output back. A task that you've broken down into steps that simply take in data and return data can be hooked up to a message queue and fanned out across a very wide data center, so long as it's the return value that's important and not a side effect. If I call a routine and its value is that my data structure will now look different, it has to live on the same machine, because it's changing the copy of the data structure I've got in memory; but if it's the return value that's important, and not the way it monkeys with the data I already have in memory, it can run anywhere, so long as the result is delivered. Data and transformations are easier to understand, and I think easier to maintain, than coupled procedures.

If that's the case, Python has been evolving recently in exactly the right direction. Think about the kind of innovations that have marked the last decade and a half, as Python has grown from the language it was in the 1990s. In October 2000 we got the list comprehension. The list comprehension seems like a slight convenience, but it really changes you as a programmer: it takes someone whose job used to be modifying data structures (make an empty list, then go through and start changing it) and turns us into people who build new data structures that we often never touch again. Often my code today takes in a list and, in a series of comprehensions, just like that shell script, generates a series of intermediate results and a final data structure that it returns, without ever reaching back into one of the earlier results and feeling the need to change it. List comprehensions make it really easy to write Python code that's purely functional, where you're using write-once, throwaway data structures for your intermediate results rather than doing constant modification. And Python 2.4 saw the introduction of the sorted built-in. Remember how we used to sort? You had to build a list, give it a name, call its .sort() method (which returns None, so you couldn't use it in the middle of an expression), and then go back to the list, which has now been modified, to see the result. Thanks to Raymond Hettinger's addition of the sorted built-in, we now just ask for data to be sorted and returned to us in a single step, instead of having to build and mutate a data structure in several.
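(A sketch of the style being described: each stage builds a new, write-once structure, and sorted() returns a new list instead of mutating one:)

words = ['Aardvark', 'Brandon', 'python', 'aardvark']
lowered = [w.lower() for w in words]    # a new list; `words` is untouched
unique = set(lowered)                   # another new structure
ranked = sorted(unique)                 # sorted() returns a fresh list

# The old way mutated in place:
#     result = list(unique)
#     result.sort()      # returns None, so it can't sit in an expression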
As you try applying these patterns to your own Python code, remember that Python has several different ways of breaking a pattern out of your code. We have functions and methods, but remember that if it's iteration you want to factor out, you can build a generator, and if you have some setup and teardown you want to pull out, you can build a context manager. Older programming languages, let's say Java, have one way to break out a subroutine, and then you have to come up with design patterns, like the visitor pattern, to fill in for the lack of generators and context managers. Python has all three built in: you can pull out the middle of a loop as a function, you can pull out the loop logic itself as a generator, and you can pull out setup and teardown as a context manager. We have a lot of different ways of taking logic and doing that rescue operation, decoupling it from our I/O so that it can live separately.
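(A sketch of the three tools named above, applied to an invented record-reading example:)

from contextlib import contextmanager

def parse_record(line):
    # The middle of a loop, factored out as a function.
    return line.rstrip().split(',')

def records(lines):
    # The loop logic itself, factored out as a generator.
    for line in lines:
        if line.strip():
            yield parse_record(line)

@contextmanager
def opened(path):
    # Setup and teardown, factored out as a context manager.
    f = open(path)
    try:
        yield f
    finally:
        f.close()

# with opened('data.csv') as f:
#     for record in records(f):
#         print(record)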
Two real-world examples, just current projects of mine. One is Skyfield, an object-based API for astronomy backed by dozens of pure functions that were really easy to test. The functions implement the actual operations. The miserable thing about a method, by the way, is that it implicitly depends on the state of the whole object; it's often hard to test a method because it's not clear how much of the object needs to be set up and initialized in the test before the method can run. The beautiful thing about a function is that you just read the arguments and you know what to provide to it. A well-written function doesn't need extra globals or other persistent state set up and available in order to run, which makes testing and bug-fixing a lot easier. Remember, in the Zen of Python (which I hope you look at every morning as you get ready to code), second only in Python's motto behind "beautiful is better than ugly" is "explicit is better than implicit," and a function, if nothing else, is very explicit about its needs: right there in the argument list, it tells you what it needs to succeed. So Skyfield is one example I've done recently that has turned out really well, where I didn't strand my important logic coupled into my objects or I/O, but spun off everything I could into easy-to-call functions.

The other is something I use for filling in tax forms for myself, called luca; it's also on GitHub. The temptation there (and actually the first version did this) was, as it ran along computing fields, to immediately call the low-level PDF operations to write them onto the 1040 tax form or whatever. I was then able to rescue that deeply, deeply compromised and very difficult to maintain code by breaking it into phases. I first read in the entire input of the tax form while resisting the temptation to do anything with it; I simply read it into a data structure and return that. I then have a routine that takes the inputs to a tax form, and it's very easy to write these little routines that add up the numbers, do the rounding, multiply the percentages, and produce all of the output lines of the tax form, while resisting the temptation to write them out to the PDF. I hand that data structure to my PDF writer, whose only job is to fill text into fields. It was much easier to write and maintain the code once it was split so that data structures pass between phases, rather than making a slightly shorter program that immediately tried to go have side effects and thereby made itself very difficult to test.

So the pith of the idea is this. In the old days, if we wanted to get rid of all that pesky I/O, we would try to accomplish it by hiding it in a subroutine. The new idea being propounded is that if you really want to get rid of someone, you make them a manager: you put them in charge. Take all of that I/O-laden code and make it feel important by putting it at the top of your program; make it the procedural glue, leaving all of the little subordinate functions free to do their jobs.

Let's return to Wheeler. I have one last quote. In 1952 he gave us the subroutine, and I think that because of that initial phase of computing history in which we tried to use it wrongly, we have yet to realize its full power and promise. I'd like to end with a quote in which he described what he thought would someday happen now that we have subroutines: "When a program has been made from a set of subroutines, the breakdown of the code is more complete than it would otherwise be. This allows the coder to concentrate on one section of the program at a time without the overall detailed program continually intruding. Thus the subroutines can be more easily coded and tested in isolation from the rest of the program. When the entire program has to be tested, it is with the foreknowledge that the incidence of mistakes in the subroutines is zero, or at least one order of magnitude below that of the untested portions of the program." Thank you very much for listening. I'm Brandon Rhodes.
Info
Channel: Next Day Video
Views: 91,619
Rating: 4.9436007 out of 5
Keywords: pyohio, pyohio_2014, talk, BrandonRhodes
Id: DJtef410XaM
Length: 49min 53sec (2993 seconds)
Published: Sat Aug 09 2014