Beautiful Python Refactoring II - Conor Hoekstra - code::dive 2022

So, in front of us, something about Python this time: Beautiful Python Refactoring, part two. Conor Hoekstra is the person who is going to present it to you, with some random funny facts about him. "I am an avid runner." So guess how many kilometers per month he runs. Any guesses? "A lot" sounds good. 400 kilometers: that's what he's boasting right here in the description. We're going to check. "I wasn't allowed computer games or video games as a kid, except for one, which was Scrabble." OK. "And now I am a competitive Scrabble player." And I think he's also competitive in terms of the topics around Python. So please give a warm round of applause once again, inside this room, room number nine, during code::dive 2022, for Mr. Conor Hoekstra.

Thank you. All right, thank you for coming to my code::dive 2022 talk. I was here three years ago, and this is actually, if you don't count CppNorth, which I helped organize back in June, the first conference I've been to in person, and definitely the first conference I've traveled to since, obviously, the pandemic. So it's very exciting to be here, and I'm really looking forward to giving this talk because it's a super fun talk. I apologize: for the people on YouTube, I think the colors will be fine when you're watching this, but the yellow's a bit off here, which is upsetting because that's definitely not the color of the Python logo. But we will make do.

My name is Conor Hoekstra. I go by code_report online. Here's a little bit of an "about me". I've worked at a couple of different companies over the last decade, but for the last three years I've been working at Nvidia. We make GPUs, and currently I work on a research team as a research scientist, but for the first two years when I was working at Nvidia I was a full-time C++ developer, which is what I've been doing for the majority of my career. I've dabbled in a bunch of other languages; on one of my podcasts I refer to myself as a polyglot. But predominantly I code in C++, all the way up
to C++20, and I'm familiar with the standards, so I'm excited about C++23 coming out. But I've also done a bunch of Python programming, which is what we're going to talk about today.

Fun facts: Haskell at one point was my favorite programming language, back when I discovered it when I was at Amazon, and since then I've become a huge array language fan. So if you've heard of languages like APL, J, BQN... anybody in this audience heard of BQN? I see one hand. So yeah, BQN is a very, very niche language. It's actually my favorite language, though, but I'm not going to talk about that today. You can find some talks online if you want to see some really esoteric Unicode symbols of programming. And recently I've been picking up Rust, which is quickly becoming one of my favorite programming languages. Given that I'm a professional C++ developer and at one point Haskell was my favorite language, I should have started looking at Rust sooner, because I've heard that Rust is basically C++ plus Haskell. We'll talk about Rust a little bit in this Python presentation.

A couple more last things. I have a YouTube channel with 300-plus videos on various things; lately it's been programming-language focused. As I mentioned before, I'm code_report on Twitter. I also have a programming languages virtual meetup where we go through different textbooks. The first one we went through was Structure and Interpretation of Computer Programs, and we've gone through a couple of others, such as Category Theory for Programmers. Right now I've actually delegated the running of it to another individual by the name of Leslie, and they're working on two books right now. If you want, you can find that on meetup.com. I also have two podcasts, which I'm going to plug a little bit later.

Moving on: this is a sequel to a talk that I gave at PyCon, not in person but virtually, in 2020. It was very well received, and I've been looking
forward to giving the second version of this talk. That's that one on the right. It was also talked about on this Python podcast; if you want, you can go and listen: I have a chat with the host there. It's one of my favorite Python podcasts that I listen to, so if you are a podcast listener you might want to check it out.

Speaking of podcasts, as I mentioned, I host two different ones. The first one is Algorithms + Data Structures = Programs, or, as Bryce and I refer to it, ADSP. I think we're at 105 episodes now. Bryce is a colleague of mine who works at Nvidia as well; he's also on the C++ standards committee, so it is largely centered around C++, but we talk about other programming languages. I wax rhapsodic about array languages every once in a while, which entertains some of our listeners and bores others. Actually, we're temporarily becoming a Rust podcast for, I don't know, we'll see how long it lasts, because that's what I'm interested in right now. And on ArrayCast we talk about all the array languages that I mentioned before.

In the first talk, Beautiful Python Refactoring, I refactored code from a blog post. If you've seen that talk, this will be old to you. I'm not going to spend any time on this; if you're interested in what I'm about to show, you can go back and watch the 30-minute talk that's on YouTube. But basically, this code was from a blog post, and I took it from this code and refactored it to this, and then refactored it to this, and then refactored it to this. But I felt bad about this, because this wasn't my code. Whenever you're giving code reviews or, even worse, talks at conferences that people are going to watch online, I feel bad reviewing other people's code, because I don't want to hurt people's feelings. I always look back at code that I've written in the past and think, holy smokes, what was I thinking back then? So this time I'm going to be doing something a little bit differently.
In the last talk I talked about these different features that you can use: enumerate is a super useful function (we're getting it in C++23, I think), list comprehensions, conditional expressions, slicing. But I'm no longer going to be reviewing code from a blog post. In this code::dive 2022 talk I'm going to be reviewing code that I wrote. This is a game that I've written. It's actually more of a game trainer, called Hook Star, sort of a play on my last name, and it's basically a Scrabble trainer, if you will. So if you want to get better at Scrabble, this is a game or training program that will help you do that. It's MIT licensed, so you can go download it and use it. It works on Windows and Linux, at least that I know of; I haven't tried it on Mac.

If you aren't familiar with the game Scrabble, I'm going to show a little demo in a sec, but basically you're given a 15 by 15 grid and you have a rack of seven letters, and the goal is to build up words in a crossword fashion and score points along the way. You usually play with two people, but up to four can play, and the person with the most points at the end of the game wins. Each tile has a certain number of points. The details don't really matter; basically you're building up a crossword. And as I mentioned before, I wasn't allowed video games as a kid because of, I don't know, parental decisions, but my parents did let me have a Scrabble game, so it's one of my favorites.

I think at this point we are going to attempt a live demo. I did a live demo in my first talk ever, which was a lightning talk at C++Now 2019, and after that talk (it was only five minutes) I vowed never to do a live demo again. But this is not a coding live demo, it's just a playing-a-game live demo, so it's totally fine, which is everyone's famous last words. So we're going to run this Python program on the big screen. Oh, the coloring. I mean, people on YouTube can't notice, but if you're here live, I apologize.
It's not exactly... So, you know, this game could crash, we'll see. We're going to start off, and I'm going to play the first word rather quickly. What should we play? Let's play ZA. It's a two-letter word, short for pizza. You can see that when you have an invalid word it's orange, and when it's valid it's green. The reason it's invalid with just a Z is, one, there are no one-letter words in the Scrabble dictionary, and two, the first word has to go through the middle square, which is the pink one. Yeah, it's pink up there, just a hot pink up there. So we'll hit this. Note, though, that this is not one of the top 14 words. This is kind of like Scrabble plus Family Feud. Is anybody familiar with the TV show Family Feud? You guys might not have it. Oh, there are a few hands. It's like "top five answers on the board": you know, "favorite place to go on vacation", and people say Hawaii and that's the number two answer. So if you find one of the top 14 words, it'll reveal it while you're searching for your words. This is not going to be one of the top 14 words. These are the top 14 words here: Mavis, I've never heard of that before; ads I actually did know; and zas, so if I had added an S we would have had a valid word.

So now we'll let the computer play, and wow, this computer program is very good. The green highlights mean it's a seven-letter word, for which you get a bonus 50 points. So we're losing now, 72 to 22.
Pretty embarrassing. Does anybody want to try and guess? I mean, I have terrible letters right now. Hmm, we can try it; VIM might be a word. Let's try it up here, because I know VIM is a word. It's not one of the top 14. So just to show that the Family Feud part works, we'll play VIM. But I think QI, which is Chinese energy (I think that is the definition of that word, like when you're doing tai chi), is the top play. But we'll play VIM, because that's a great one, and we'll do VIMS. Holy smokes.

So there are three phases of the game: when I'm playing, when the computer's playing, and then this, which is called the pause for analysis mode. If I want to see where I could have played VIMS, I can just hit the arrow keys and it'll basically show me on the board. So if I had played Etc, whatever that is, and VIMS, that would actually have been the second-highest-scoring word. And we can scroll through all of these; you can see most of them are hooking off of Etc.

Maybe we'll do one more. Does anybody have... well, I see there's a terrible letter. This is a problem with doing a live demo. QI is basically the only good option here, so we could probably do QIS. So that's number seven. The yellow highlighting means that there are two other words that also score 19 points; I think that's just a cool thing to show. Sorry, what was the... I heard a suggestion from someone. No. [Laughter] Yes, there is a not-suitable-for-work word, and I think technically that is a valid Scrabble word, but in the interest of protecting the children on the internet I will not play that. Anyways, we'll leave it at that. You've got the gist of the game.

So the purpose of this talk is to go and look at some of the code that I've written and refactor it. I intentionally tried to write this code the way that I would have back when I was in university, you know, 10-plus years
ago, before I knew all the things that I know now, in order to give myself code that is more suitable for refactoring. The code is definitely not in a beautiful state, but there are a lot of parts that I have managed to find and refactor that are definitely nice. So let's hop back to our presentation here.

Time to refactor. We're going to look at, hopefully, six things, if I don't run out of time. You can find this game at the codereport Scrabble GitHub repo. All of the commits where I'm doing refactoring are labeled with a capitalized "Refactor:", so if you want to go take a look at these, you can see a bunch of them. Some commits are just bug fixes and stuff like that, but all of these are cataloged by hashes so that you can isolate the different refactorings that I did.

The first one we're going to look at is @dataclass. I had heard about this on Python podcasts but never used it, and it was by far the happiest moment of refactoring that I had on this code base. A part of this code base is a position, for figuring out where to play your Scrabble word. You can play across, you can play down, and you need a row and a column. Pretty straightforward. But when generating all of the possible words you can play, you put them in a list and you need to be able to sort them, and a position is a part of each play. You're sorting them by score, but the position needs to be sortable as well. So it started like this, but at some point I needed to add comparison operators and equality operators, and I also wanted a way to print them, and so I ended up with something like this. But one of the things about a position is that it never changes: once you generate your list of all your possible plays, you're never going to mutate it. So, similar to the previous talk, where Hubert was talking about how it's const by default and
you have to declare mut if you want to change something, I know these objects are never going to change. So I actually put a comment in the code: "figure out how to make this immutable". And when it came time to start doing refactorings, I discovered something called data classes, and this is the equivalent code that you can use to replace what I just had. If we go back a sec, all of this, the less-than, the equals, disappears if you just use @dataclass. It's phenomenal. I don't know these inside and out, but for my purposes I needed something that was comparable and sortable, which is why you set order=True, and I knew I was never going to change it, so you can set frozen=True. Python doesn't have a const or a mut keyword like Rust or C++, but frozen=True gets close-ish to that, which is super nice. And if you look at the diff, this is the actual diff where I'm deleting all that code. You can see my little comment, "TODO: make immutable", and I just replaced all of that with @dataclass. I thought this was phenomenal. To the Python pros, you might be thinking, "yeah, yeah, we've known about this for a while", but for someone who's not a Python pro this was super awesome, and I wish we had something built in to C++ to do this kind of thing. This is available in Python 3.7. I think I did see once that something was backported to 3.6, but if you're on an earlier version you might need to upgrade in order to get access to this.

This brings us to refactor number two: f-strings. Once again, I had heard about f-strings on Python podcasts, but whenever it came to writing my small little scripts I never really looked into how to use them, and they are phenomenal. Once you've used them, you're never going to go back. Specifically, the first piece of code that we're going to look at: if you look at my Family Feud list of
top 14 answers, you can see that I have the ranking, "1:", and then the word, and then in parentheses the score of that word. The way that I initially built this up was just how you naively would do this without knowing about anything better. First I have to convert my numbers into strings with str(), and then I need to add the colon with the space, and then I get the word, and then I add a parenthesis, turn my score into a string once again, and then close my parentheses. I'm sure that there are a ton of people out in the wild who don't know about f-strings and are writing code like this. This is just the naive way that you would build up any string in any language.

With f-strings you can do the following. This is so beautiful, in my opinion. If you compare this to what we had before, note that not only are we doing things inline, without having to build up the little strings with the colon and the parentheses, but I no longer have to worry about the type of what I'm trying to print. Integers are automatically converted to strings. I don't have the str() calls: they're gone, they disappeared. It's such a simple change, but I was going through my code, and I have little formatting statements all over the place because I'm drawing a board that has a ton of text everywhere, and I just didn't know that I can write f, quote, and then do these things with braces. Once you've done it, once again, you're never going to go back to writing strings the way that I was.

This is another example. Here I'm creating a string that represents all of the tiles that are left in the Scrabble tile bag. In Scrabble there are 100 tiles, and as you are playing your words you're drawing more tiles out of the tile bag. In competitive Scrabble there's something called the endgame, where you want to keep track of all of the tiles that are in the bag, because when
it gets down to no tiles left in the bag, you can know exactly what seven tiles your opponent has, and that can be very useful. If it's a close game, you don't want to play a word that could give him or her a very high-scoring word. So because you know what your opponent has, you can anticipate and know what to do. And this is the code. Note that I'm leaving some of it off, but it starts off with the space join, and then inside, once again, I'm doing a, plus colon, plus str(b). We can change this to something that looks very similar except for the f-string part, which is just {a}:{b}. Simple change. I'm sure a ton of Python folks in the room know this, but I know that there are going to be a ton of folks on YouTube watching this and thinking, "how come I didn't know this before, and no one told me?"

This is going to be our first digression. If you've watched talks from me before, you know that I like to go on digressions, and this is probably my favorite part of the talk. Let's take a look at the full Python line. After "for a, b in ..." I've left off the "...". If we fill that in, you can see why I left it off: there's a lot going on. But this is one of my favorite lines in the whole Python code base. If we reformat it into the following, it's a little bit easier to read, but let's look at what this is doing first. I explained it verbally, but if we zoom into the very bottom, you can see that there's the count of the number of tiles left in the bag, zero, which indicates that what's listed is what my opponent has. We're at the end of the game and the opponent only has two tiles left on the rack: one T and one U. So building up "T:", number of T's, "U:", number of U's, is what we're doing here. This is one of my favorite lines of code in the code base because we can do
this in Python. To do this in C++ is non-trivial: I don't have all of the utilities that I need in order to make this a one-liner. But you can do it in Python. If we break it down in the order that you read things: we have a join based on single spaces, so that's what's in between the "U:1" and "T:1". Then we have the f-strings for formatting. Then we have a list comprehension; you can debate what defines it, the brackets or the "for a, b in" (because that's also in loops), but there's a list comprehension there. Then we have sorted, which is a built-in function. Then we have Counter, which is a collection in the collections module that gives you a hash map where each key is a unique value that shows up in what you pass to it, and the values are the frequencies with which each shows up. That is exactly what we want right now: how many T's are left, how many A's are left, how many of each letter are left. And last but not least we have slicing. The way this works is that I store my tiles in a list of 100 elements and I just keep an index of where we draw from next. There are tons of ways that you could deal out the tiles, but the way I do it is I take a list of all of the letters in the distribution that they come in, shuffle it, and then, as the game progresses, just move this index. When you're displaying this, you just slice from where the index is to the end of your list, and those are the tiles that you pass to Counter.

What I find really interesting about this code is that you're reading everything in the exact opposite order of the way things are processed. What we're actually doing programmatically is the opposite: we're slicing first, then passing those letters to Counter, our collection that gives us a hash map of frequencies. Then we're sorting those, because a hash map is not sorted. Then we're doing our list comprehension, where we map our f-strings in
order to get the formatting we want, and then we join them together. This is complicated a little bit more by the fact that, technically, if you look at the brackets, the list comprehension is sort of out of order. And some of these are built-in functions, and some of these are methods on strings. I ignored the .items(), but that's a method on a hash map. It's so irregular. It's beautiful that we can do this in Python, but it's just not regular: I'm reaching for a method here, a built-in function here, a collection here, slicing here. It's just messy, in my opinion. It's amazing that we can do it, and it's better than C++ because I can't do it in C++.

However, this is the equivalent code in Rust. skip is our slicing. Most languages call this drop: if you have a list of 10 elements and you drop four, that just gets rid of the first four and leaves you with the last six. counts is the equivalent of our Counter collection; it's a method defined on an iterator trait that you can find in the itertools library. sorted is our sorted. Our list comprehension is our .map; list comprehensions are just a different way to spell a map in Python, and they're more idiomatic, in my opinion, so you should prefer list comprehensions in Python. Then we've got our f-strings here in format!, which is a macro, and then we have our join. This is so beautiful, in my opinion, and it's part of the reason why I'm falling in love with Rust. This is very Pythonic, in my opinion. It's Rust code, but it's a better version of the code I wrote in Python, and it's doing the exact same thing. counts doesn't require a data structure and is just a method implemented on iterators. All of these things are basically functions implemented on the iterator trait, which I'm not going to get into now because this is not a Rust talk, but the point is that you find these either in the standard iter library or in the itertools library, which is why you can chain
these all together, which is just so, so beautiful. Note that the most common name for counts is actually frequencies, at least that I've been able to find. You can see that there's an asterisk next to the Python Counter, because that's a data structure; it's technically not a function. This tool that I've built catalogs these same functions under their different names across languages. Haskell calls this count, pandas calls this value_counts, Rust calls this counts, and Clojure and Racket, which are both Lisp dialects, call this frequencies. So yeah, just very beautiful code.

I think the last thing I want to say is this. As we just saw in the last presentation from Hubert, about what C and C++ developers can learn from Rust, you always hear that Rust is kind of a competitor or successor language to C++ and C. But I just finished listening to every single episode from every single Rust podcast that I could find, and there's a common theme that comes up: yes, you do hear this, but there's also a huge category of programmers coming from Ruby and Python who actually don't care about performance, don't care about memory safety, and just find it a nicer Python for certain things. That's not to say "stop writing Python". I just think that this story is getting missed by communities at large: you hear that Rust is this memory-safe version of C and C++, but when I code in Rust I'm not particularly concerned with memory safety. I write C++ for a living; I know how to write memory-safe code for the most part. Everyone writes bugs, but for the most part I know what I'm doing. So when I go to Rust, it's just a natural way to write code. Anyways, I think this is food for thought: if you're in the Python community, definitely check out Rust, because there are a lot of things that feel very Pythonic about Rust when coming to the language.
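The Python one-liner discussed in this digression appears only on the slides, not in the captions. The following is a minimal sketch of what such a line could look like, assuming the bag is a shuffled list of letters plus an index of the next tile to draw. The names remaining_tiles, remaining_tiles_concat, tiles, and next_index are hypothetical and are not taken from the actual repo.

```python
from collections import Counter


def remaining_tiles(tiles, next_index):
    """Summarize the undrawn tiles as 'LETTER:COUNT' pairs, sorted by letter.

    Processing order is the reverse of reading order:
    slice -> Counter -> sorted -> f-string per pair -> join.
    """
    return " ".join(
        [f"{a}:{b}" for a, b in sorted(Counter(tiles[next_index:]).items())]
    )


def remaining_tiles_concat(tiles, next_index):
    """Same result in the pre-refactor style, using + and an explicit str()."""
    return " ".join(
        [a + ":" + str(b) for a, b in sorted(Counter(tiles[next_index:]).items())]
    )


# A tiny stand-in for the 100-tile bag (hypothetical data): four tiles have
# already been drawn, so only the last two letters remain.
bag = list("QZAEUT")
print(remaining_tiles(bag, 4))  # prints T:1 U:1
```

Both functions return the same string; the f-string version simply drops the manual str() conversion and the string concatenation.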
Right, that's the end of my digression. Back to refactoring. So, f-strings. As I mentioned, data classes are available in 3.7; f-strings are available in 3.6. You have to make sure that you're on those versions in order to be able to use those features.

That brings us to refactoring number three. We're actually going to do this in two parts, because this is really important and there are two different things that I want to highlight. The first thing we're going to talk about when using enums is this first example, where I'm using an enum to replace two Booleans. Your first thought is going to be, "why did you write the code like this?" The thing is that when I wrote it initially there were only two phases of the game: my turn and the computer's turn. There was no pause for analysis. So before I refactored it, I had these two Booleans to keep track of the state of my game: is it my turn, or is it the pause for analysis part of the game? And the way I kept track of whether it was the computer's turn was that if both of these were false, then it was the computer's turn, which in hindsight is clearly not a great way to do this. I can replace this with a single enum.

This is how it started: one Boolean. Ignoring the pause for analysis, I had two phases, player's turn and computer's turn. When player's turn was true it was my turn, and when it was false it was the computer's turn. But then I added this second Boolean when I wanted a phase for analysis, which led to the following behaviors. When it's true for player's turn and false for pause for analysis, then it's the player's turn. When it's vice versa, then it's pause for analysis. And when they're both false, then it's the computer's turn. Which then raises the question: what if they're both true? Well, that should never happen. So in this table you can see that
I've made a design mistake, and the key is to get rid of the Booleans and just replace them with an enum. Python does not have great support for what are known as sum types, but in, I think, Python 3.4, which my slides will confirm later, they added the enum module, which basically enables you to write sum types through the use of a class, your product type. Here you have your three different phases. You can think of sum types as where you are always saying "or": it's either the player's turn, or it's pause for analysis time, or it's the computer's turn. It's never going to be two of these things at the same time. When you do this, it leads to really nice cleanup of the code. This is the code diff where I'm deleting two Booleans and replacing them with a single enum, and it leads to refactors like the following. This is when it's time to generate all the words for the computer. You can see the condition: if it's not my turn and it's not pause for analysis, then implicitly it's the computer's turn. But this is terrible. There is tribal knowledge that I have in my head that lets me make sense of this. When we refactor this, it becomes "if it is the computer's turn", which is way more readable, and it gets rid of the tribal knowledge. I'm sure all of us, whether your language is Python or C++ or Java or whatever, have looked at code like this, where you have a more-complicated-than-it-needs-to-be conditional expression in an if statement that relies upon some tribal knowledge. Using these sum types, which explicitly lay out the different options that you currently have, makes the code more readable. It also helps in places in the code where I'm setting two different Booleans at once in order to make sure that I have the right phase of the game: I can just replace that with a single setting of my phase, "it's now time for a pause for analysis". So, enums are super useful.
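The slide code for this refactoring is not in the captions, so what follows is only a sketch of the idea, assuming a three-member enum for the game phase. Phase, its member names, and the helper phase_from_flags are hypothetical names chosen for illustration; the helper exists only to show how the old two-Boolean encoding maps onto the enum and why the "both true" combination was a design smell.

```python
from enum import Enum, auto


class Phase(Enum):
    """Exactly one of these holds at any time (a sum type)."""
    PLAYERS_TURN = auto()
    COMPUTERS_TURN = auto()
    PAUSE_FOR_ANALYSIS = auto()


def phase_from_flags(players_turn: bool, pause_for_analysis: bool) -> Phase:
    """Map the old two-Boolean state onto the enum (illustrative helper)."""
    if players_turn and pause_for_analysis:
        # The Booleans could represent this state; the enum cannot.
        raise ValueError("both flags are True, which should never happen")
    if players_turn:
        return Phase.PLAYERS_TURN
    if pause_for_analysis:
        return Phase.PAUSE_FOR_ANALYSIS
    # Both False: the implicit "tribal knowledge" case.
    return Phase.COMPUTERS_TURN


phase = Phase.COMPUTERS_TURN
if phase is Phase.COMPUTERS_TURN:  # was: not players_turn and not pause_for_analysis
    # ... the computer's move generation would go here ...
    phase = Phase.PAUSE_FOR_ANALYSIS  # one assignment instead of two flags
print(phase.name)  # prints PAUSE_FOR_ANALYSIS
```

With the enum, the fourth, meaningless combination simply has no representation, so no code has to guard against it.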
Super useful. At some point I will give a long-winded talk on algebraic data types, product types, and sum types, but today is not that talk. Just know that enums are very, very useful.

That brings us to the second refactoring that involves enums. I'm not going to show much code here. It just shows what I was doing all over the place while I was coding this up. When I started this project I didn't know that enums were, quote unquote, possible in Python, so whenever I needed something that was enum-like I would just write a comment, "convert to enum", to figure out later when I was refactoring, and then list a bunch of integers: one, two, three, four, five. This right here is for the bonus tiles on the grid. NO stands for nothing, it's just a basic tile; DL is double letter; DW is double word; and then there's triple letter and triple word. To convert this to an enum, it's as simple as that. And now, all over the code where I had just these constants DL and DW, it says Tile.DL and Tile.DW. You might argue, "why aren't you writing out the full version of these?", and it's just for style: I'm manually building up a 15 by 15 grid, so if I have to write out double word, double letter, nothing, nothing for each square, it's going to get quite verbose. And on a lot of Scrabble boards these are actually the labels that they put on the physical board. They don't spell out "double letter", they just put DL, so it's nomenclature that you'll be familiar with if you're a Scrabble player.

But the greater point that I want to make here is that these enums, once you start to use them, are all over the place. I'm not sure this is every single enum that I have in my repo, but I think it's most of them, and they're so useful. For some of them, I knew at a certain point in my project that enums existed, so I couldn't bring myself to write the equals one,
equals two, equals three without using an enum. But the point is: learn these and you'll start to use them everywhere, because it's such a fundamental type. It's a crime, in my opinion, that so many languages have classes or structs, which are product types, and then have no support for sum types. In C++ we got optional in C++17, we got variant in C++17, and we've had unions from all the way back. But sum types are the analog to product types, and you need rich support for both of them in a modern language, in my opinion. So use enums, because they show up everywhere and they're super useful. enum is a module, and it is available in 3.4, which I mentioned earlier.

So far we've gotten through three of our refactorings. Number one was @dataclass, which is in 3.7; two was f-strings, which is in 3.6; and three is enums, which is in 3.4. But for the fourth refactoring we're going to do something a little bit different. This is interactive now. I know they said questions at the end, but just shout out the answer if you think you know it. I'm going to show the code first, and you let me know what you think we should use in order to refactor this. This is the line of code. Actually, we can show a little bit of a demo. The comment is a little bit hard to read, so I'll brighten it up. I have a comment that says zero means off, one means across, and two means down. Absolutely terrible. If we go back to the game, you can see that I have this little arrow here. Right now it's in across mode; now it's in down mode, if I hit the down arrow. And if I play the word that we had, you can see the arrow has disappeared, because I'm typing letters. So there are three different modes for this. The question is: does anybody know what we should use to refactor this, instead of having this comment that describes what I'm implicitly doing all over the place? I was hoping people would
say uh enum is part of the answer but it's not the top level thing we want to use uh variant is close here's a hint it's not actually something that exists in Python as a module or a feature it's a library that you have to go and hunt for uh an enum is a good guess as well but it's not it and optional is the answer so if you've been in this room all day you'll have heard from Victor's talk who is right there and you'll have heard from Hubert's talk um mention of the optionals and so this is a library but it's basically just an implementation in Python of what you can get in C++17 and in Rust and enum is a great guess because it's an optional of an enum but I already have this enum that's existing for across and down but the third state shouldn't be added into this direction off is not a direction it's the absence of a direction and what an optional is is basically saying do I have this thing or do I not have this thing and when I have this thing it can either be across or down um so this is super useful once again this optional is another sum type so enums are a sum type optional is a sum type like this is a fundamental thing and I'm sure there's some people thinking this is not idiomatic Python code idiomatic Python code is you would use None and that may be true but I like to write Rust flavored Python and Rust flavored everything because Rust uses algebraic data types and they have built into their language both product types and sum types and so this is a sum type that I just think you always want to reach for in the cases where you have sort of the absence of something or something that's what an optional is for and uh so this is refactor number four is optional and uh I'm going on a digression now which is that I talked about this in the Programming Languages Virtual Meetup that I used to run and sort of delegated off this was a year ago I guess when we were working on Category Theory for Programmers chapter six talks about simple algebraic data types and if you
zoom in on the thumbnail for this video you can see that it's got a picture of the book but then also a bunch of programming language logos and then what they call their option type so if we make this unblurry you can see the following so Scala Rust OCaml F# all use Option and their two different states are Some and None Haskell PureScript and Elm use Maybe and Just and Nothing for their two different states C++ and Swift do different things they call theirs optional C++ is a little bit icky with the dot value and nullopt and Swift is nice it borrows the some and none but they just added an al instead of calling it option they called it optional and in Python we have the following so if you get it from the library optional.py you have of to represent something being there and then empty when you don't have anything there and if you use this so that's the end of the digression you'll end up with sort of the following refactors so all over the place I'm relying on sort of this input where I'm changing where the arrow is and where I need to start typing this is an integral part of the Scrabble training program um and I'm I'm checking you know is this equal to zero meaning is there no cursor there and that changes to is_empty uh if I'm relying on the fact that it's not zero with an implicit conversion to Boolean instead I replace that with is_present and note that everything on the right is more verbose it's definitely more that you have to type but it's so much more readable like self.cursor equals zero I'm assigning it to be zero but like what does that mean whereas when I'm calling uh Optional.empty but technically that should be self.cursor equals Optional.empty so much more clear that we're I mean it doesn't actually have the type there so you still don't know what it's an optional of which is one of the sort of downfalls of Python is it's not a statically typed language um but still in general all of these things are much
more readable in my opinion and optional is another tool that you should put in your toolkit even if you're writing Python which doesn't sort of have built-in support for it and you never know maybe in a future 3.12 or 3.13 they might add this as sort of you know similar to how they added enum as a module that comes with the standard library they might do that for optional as well I wouldn't be surprised if they did so this library you can find it at optional.py on PyPI and I think it works for everything greater than 2.7 there are some restrictions where they have not equals to 3.0 point something but in general you can use this in in most Python code bases this brings us to maybe the funnest of my refactorings so if you've watched any of my C++ talks I love to do these like name the algorithm where I find a piece of code in the wild you know you follow the no uh the Sean Parent rule um who's a very well-known person in the C++ community uh who says you know you shouldn't have just raw loops floating in the middle of your functions you know there's usually a name for that algorithm if you can find a loop in the midst of a bunch of other C++ code so we're gonna do the same thing with Python which is a little bit harder because they don't really have algorithms the same way that C++ does but nonetheless here's a raw loop that exists at one point in my code base where we're basically trying to find the rank from the back of a list of words so they've got negative index slicing which is beautiful actually taken from one of my favorite languages APL which is an array language invented back in the 1960s the ability to have a negative index and that wraps around the end so you're basically trying to find the play that is equal to so while it's not equal it's going to keep on incrementing the rank and then search from the back and basically right the code that this is used in is how hard do I want the computer to be so right now I have the computer playing
the third best word which still the computer usually beats me pretty handily with that but as time goes on you can adjust it to become harder or worse so I think the first thing we're going to do is talk about what the anti-pattern this is so if you watch my first talk you'll know and I've given this in a couple talks is ITM is my least favorite pattern anti-pattern and it's initialize then modify so when you're initializing some variable and then specifically in a loop but sometimes in you know conditional statements you're immediately modifying that which is what we're doing here I really really detest code like this because a lot of times you can make it more declarative by finding the function or the algorithm that does this piece of code and just you can assign it once again we don't have const or we don't have sort of const by default like in Rust but you can still write the code declaratively as if you could make it an immutable value so first we're going to switch this to C++ so we're now going to you know at one point it was a Rust talk now it's a C++ talk inside of a Python talk so does anybody know what the algorithm in C++ is that we would use to replace this code it's not entailment no this is a linear search so you could probably use uh an element but this is a semantically we're making the same replacement here uh not count_if rfind is correct but I think rfind is a method on string it's not actually a free function but uh it is exactly what we're looking for because the equivalent of rfind in a generic sense is just find with reverse iterators so if you notice we're calling .rbegin and .rend which means it's going to search from the back and then you have to call std::distance to basically get the integer value because find returns you an iterator but this is the game in C++ and technically some of you might be thinking this is pre-C++20 why aren't you using the ranges algorithms because with the ranges
algorithms I would have to basically insert the namespace and then pipe my player plays to the reverse views which I'm not actually sure if I like more so I'd have to pipe the sequence to a reverse view that then reverses that lazily and then passes it to a find algorithm in the C++20 algorithms so I'm aware that you could write this slightly differently but for the purposes of this talk we're doing it with pre-C++20 all right so back to the Python code does anybody know what combination of two things which isn't really an algorithm that we could use to solve this filter and slice you could do it that way but that's not what I've done you could do a reduction but there's something even more specific that Python provides us with that we can use for this dot index so there's actually a function that does the combination of std::find and std::distance in Python called .index and note we're using one of my favorite little sort of Python idiosyncrasies the colon colon negative one to reverse a list you could also do this with the reversed paren and then pass self.player.plays but as a fan of languages that spell whole functions with single Unicode characters that no one can read I obviously I'm going to prefer the uh incomprehensible colon colon negative one um but basically yeah and is this the most efficient way to do this probably not there might be a thing that doesn't reverse the whole list and searches from the back maybe there's an rindex that I just don't know about but definitely this code is an improvement in my opinion on the previous code where we had a whole loop and if you've watched my other talks you know that I'm not a big fan of loops so this brings us to our last refactor and we're doing great on time um which is just more of this avoid ITM so we started with the big ones dataclasses enums f-strings optional and now we're just having fun um because I actually I have a very hard time writing code like this because if I know that
there's a better way to do this in Python I just I can't bring myself to write a loop that one while loop is I think the only function that I didn't know like what index is that I had to go and look up everything else if I know that there's something for it I just I can't bring myself to write it however I borrowed these Scrabble engines so I wrote the first Scrabble engine in terms of finding all the solutions myself it was terribly inefficient um it just the combinatorial time complexity of it was egregious and I was distributing it using a Python library called joblib across 15 different cores it was so slow and so at a certain point when I was going to give this talk I realized I can't have a you know terribly slow program you know in front of a bunch of people so I went and looked up uh on YouTube because obviously that's where I'm gonna go find solutions and I found this talk by someone that goes by Boring Cactus I believe their name's actually Melody Horn and they have this hour and 18 minutes uh live programming of basically a Scrabble engine that they have also posted on their GitHub it's not licensed but I did DM the person on Twitter and ask if I could borrow the code and they said that was totally fine and it is an implementation of this 1988 paper called The World's Fastest Scrabble Program that makes use of a trie which is a data structure that is used for storing you know dictionaries and stuff like that I'm not going to go into the details but maybe there'll be a whole other talk on sort of the implementation of the algorithms behind this this program and they basically implemented this paper and so I went and borrowed that and while they were implementing pieces of this engine they specifically said that they weren't going to use certain features in order for this to be readable for for everyone so if there was something that was more pythonic idiomatic pythonically if you can say that they were going to avoid that just so that it was more readable
to regardless to to the folks watching the video regardless if they were a Python developer or not and so these are three methods on the board class that they designed and we're going to finish this talk by refactoring these three so the first one we're going to zoom in on is this init function which basically uh creates a 15 by 15 or in this case size by size so the first thing we're going to do is I don't need a size I'm playing Scrabble here I'm not playing some custom version um it's always going to be a 15 by 15 grid unless if you're playing Super Scrabble which is a different game that's actually way bigger but I'm not going to talk about that um and so we've done that now and so it's these two for loops that are irritating to me so if you've watched the first talk you'll know anytime you see this for loop where you can see where on the inside we've got row equals bracket bracket and then we're appending does anybody know what feature we can use to get rid of these three lines list comprehension exactly anytime you see initialization of a list and then inside a loop append append append append think list comprehension it's a beautiful feature of Python and you can change this into a single line just by multiplying uh this single list uh with a period times self.size and if you're thinking why just do it for the first one why not do it for both of them you're exactly correct we're going to get rid of the other for loop as well and so now we have two nested list comprehensions and uh it's it's beautiful like this is so much nicer than the lines we have before and I think this is more idiomatic because list comprehensions are in general more performant and I think more relied on by Python developers so moving to our second method all_positions this one's a bit tricky uh does anybody know how we can refactor this into a single line and hint it's not list comprehensions it makes use of something in a library called itertools I heard it from someone product so
product is an extremely useful function that is more commonly known as Cartesian product where given uh two sequences or even more than that but in this case we just need two it generates every single possible combination of each of the elements in those two sequences so if you have n elements in your first one m elements in your second one you're going to end up with a list of zipped sort of 2-tuples or pairs that is n times m elements long and uh you can see here most languages call this Cartesian product obviously it exists in Rust because Rust has everything C++ is getting this it says range-v3 here but that's actually out of date we're getting this in C++23 other languages just call this product F# calls this allPairs which although is a a different name it's actually a pretty good name because that's exactly what it's doing it's giving you all pairs of numbers but it doesn't generalize to the case where you might have n sequences instead of two but yes product and itertools in general as a library is uh super useful so we've taken two of our functions and refactored them which leaves us or methods and leaves us with one and it's this copy function does anybody know this depends on a library as well we can reduce this to a single line of code deepcopy I didn't know about this in fact I was very confused one day when I was trying to copy my list of lists and then I was like why isn't this working because you know I don't fully didn't fully understand the Python memory model at the time so then I I started copying the top level things and it still wasn't working then I realized wow if you have an arbitrarily nested list of lists you have to go and copy it every single level so there's a library called copy which provides you with this deepcopy which is basically exactly what we were just doing manually there so as before we had all of this code at the beginning and we've refactored it down
into basically a bunch of one-liners if you ignore the setting of self.size equal to 15 which um doing stuff like this makes me incredibly happy and uh like I said before this is not a criticism of of Melody's code they specifically mentioned that they were not you know trying to make use of these you know Cartesian products or deep copies and things like that they just wanted to spell things out manually so that everyone could understand them so this is sort of the recap uh six different refactorings dataclasses f-strings enums optionals use algorithms or you know built-in functions if you will or methods on containers and avoid my least favorite anti-pattern ITM if you can and as I mentioned before all of the different commits including ones that I haven't showed in this talk you can find on the GitHub repository and with that I will say thank you thank you thank you thank you thank you so much I knew one Connor it was John Connor from Terminator about your Dynamics and you know you're better than than him so thank you so much once again big applause for Mr Conor Beautiful Python Refactoring part two I guess we also have some Terminators like trying to ask difficult questions right here so maybe two of them from the room if anyone please do raise your hand right now this is the moment I'll just oh okay the microphone is closer to you right now yeah thanks uh yeah um with the optional uh you used dot get to get the value from it for the comparison yeah that can be total what happens if there's a None value there's a compare a None or so yeah I'll repeat the question because I'm not sure if the mic picked it up but it was uh you use dot get to access the value inside the optional in Python that's similar to using the dot value in C++ on the optional what happens if you do that and it's uh in the empty state uh bad things I'm actually not sure if it crashes the program I'm pretty sure it crashes the program though um so yes you and that's the nice thing about
I think languages like Rust correct me if I'm wrong but you can't actually do that in Rust Rust will not give you the ability like you have to exhaustively pattern match on all the different states um in order to get in inside so you'll never I could be wrong about that but yes you have to be careful and so there are pieces of my code where I'm doing dot get but in all of those circumstances I basically know that I'm at the point where the cursor has a value so there's a certain amount of defensive programming that you need to do on your own but yeah it's a it's a great observation and a language like Python it's definitely not going to help you with doing that can you not just compare the wrapped values in Python or like sorry uh what's the second part um like in Haskell you could say uh the Just you compare it with another Just value uh or a None maybe you compare it with the Just so you compare the wrapped values instead of unwrapping it and comparing it is that an option in Python as far as you know I actually don't know the answer to that question but if you find me after I can we can just try and see um but yeah I have I have not used that technique in my code base but if it is possible that would be useful to know great question there was at least one more person yeah exactly thanks uh I wanted to come back to this enum refactoring for phases so those phases were not only enums but like correct me if I'm wrong I'm not a pro player but they come in a sequence and they form a cycle right they come in a specific order uh can you think of a solution to uh benefit from this observation so that we can not only just say that now is the phase of my turn but it is like the next phase uh so that it is like clear pythonic and elegant so thanks to the fact that I have my laptop here um it's a great observation and if we go to a function called so I actually one of the refactorings I did is I created a class called cursor that has a method called rotator um can I skip
to this 12. and it's basically doing exactly what you said so it's checking the state of it and then switching to the next one is this the most here and I can go Ctrl B and so basically the code that I'm highlighting here is this so this is what you're talking about it's it's a cyclic sort of thing where you're always going from the player's turn to analysis and then from an analysis to the computer's turn and then from the computer's turn uh back to my turn and so this is the way I solved it I basically just have this sort of rotating method on a class that contains sort of one of these optional directions and then it also combines you know the X and Y which are the row and column um so this is what I did might not be the most idiomatic thing to do and potentially there is some kind of enum that has this automatic cycle I mean I wouldn't be surprised if there's a library out there that has something called a cycle enum or something like that um but yeah this is this is how I solved it in in the code base ladies and gentlemen thank you so much once again for all the questions I guess there will be still more because we are in for now uh lunch break so actually a good moment to grab something to eat thank you for attention once again I think he really deserves that Conor Hoekstra with us today thank you [Applause]
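The refactorings described in the second half of the talk can be sketched roughly as follows. This is a minimal illustration and not the speaker's actual code: the names `Direction`, `cursor_label`, `rank_from_back`, `make_board`, `all_positions` and `copy_board` are hypothetical, the rank is assumed to be 0-based, and the standard library's `typing.Optional` with `None` stands in for the third-party optional.py library mentioned in the talk so that the sketch runs without extra dependencies.

```python
import copy
import itertools
from enum import Enum
from typing import Optional

SIZE = 15  # a standard Scrabble board is 15 by 15


class Direction(Enum):
    """Only real directions live here; 'off' is the absence of a direction."""
    ACROSS = 1
    DOWN = 2


def cursor_label(cursor: Optional[Direction]) -> str:
    # None replaces the magic number 0 ("off") from the original comment.
    return "off" if cursor is None else cursor.name.lower()


def rank_from_back(plays, play):
    # Reverse the list with [::-1], then let list.index do the linear search.
    # This replaces the initialize-then-modify while loop (0 = last element).
    return plays[::-1].index(play)


def make_board(size=SIZE):
    # Two nested list comprehensions instead of two for loops with append.
    return [["." for _ in range(size)] for _ in range(size)]


def all_positions(size=SIZE):
    # Cartesian product of row indices and column indices.
    return list(itertools.product(range(size), repeat=2))


def copy_board(board):
    # deepcopy copies every nesting level, so the rows are not shared.
    return copy.deepcopy(board)
```

Note that `plays[::-1]` builds a reversed copy of the whole list before searching, which matches the speaker's remark that this is probably not the most efficient approach, and that `list.index` raises `ValueError` when the play is not in the list.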
Info
Channel: code::dive conference
Views: 3,960
Id: nXZQfdxWgh0
Length: 54min 6sec (3246 seconds)
Published: Tue Dec 13 2022