James Powell: Does Code Quality Really Matter

Okay, it is close to 2:45, so we're going to get started. I'm pleased to introduce to you James Powell. I first met James when he was giving his generator showcase showdown talk, I believe at a Strata, and at the time he was working for BAML. Who knows what BAML is? Bank of America Merrill Lynch. They're doing some really progressive things with Python and have over 3,000 Python developers. He was working for BAML at the time, and he got me giving talks on best practices. Since then I've given a few talks at the Python meetup group and twice at PyGotham, so I really owe a lot to James. He has since given talks on compiling Python 2 and 3 together, which is quite impressive. He's also known for his bad puns and one-liners, and now he will address the issue of the century, I think: does code quality really matter? Please welcome James Powell.

Thank you, Aaron, for that very thoughtful and heartfelt introduction. So the title of this talk is "Does Code Quality Really Matter?" We're at PyData in New York, it's Wednesday, November 11th, 2015, and I'm James Powell. There's a reason I say that: there are some people who are seeing this talk at some point in the future on YouTube, so it kind of helps to know where this talk is being presented and what the date is. Everybody looks at me like I'm crazy when I say that out loud. So, hi, I'm James. That's actually not as bad as I thought it would be, but let's try one more time: hi, I'm James. Even better.

So this is one of the last talks I'm going to give this year. This is a talk that I gave this past weekend at PyCon Canada. The demo that I wanted to show for the talk failed, and when I tried to resurrect the demo, the backup demo failed, and then I had a third demo in my back pocket, which also failed. About 15 minutes into a 30-minute presentation everything was going wrong. It was an absolute catastrophe, and I was so embarrassed. So I actually have PyCon Canada on this schedule with a cross-out
because it didn't work out, but hopefully it'll work out a little bit better for you; I rejiggered a couple of things just to make this a little bit easier a presentation to give. I've given a lot of talks this year, and this is about the second-to-last talk I'm going to give. I'm really glad to be here at PyData in New York. This is the 12th PyData event we've hosted and the 12th PyData that I've spoken at. I think it's been an amazing event so far, and I'd really like to thank all of you for sticking around. Our program is not quite over, and I'd like it if all of you could stick around: there are some really cool lightning talks at the end of the day, so hopefully you'll all be able to see those.

So you might wonder: what on earth are you doing if you're giving talks at 12 conferences? Some of these are lightning talks, but what on earth might you be talking about? Aaron mentioned a little bit about my background, but interestingly enough I almost never talk about anything related to the finance industry or data science. I usually give what I call avant-garde talks, and if you know a little bit about the avant-garde, these are talks which are intended to provoke and offend the sensibilities; thus, if you go back to the first slide, my email address and my Twitter handle. So bear in mind that I'm here to be a little bit provocative. Now, I'll go into a little bit of a meta talk and I'll show you why that works, or why that might be important, but before I do that I want to show you a little bit of a demo. It's a demo that sets the story for what I'm going to talk about. It's not intimately related to this topic, but it's something that I just wanted to show you.

So some of you saw the lightning talks on Monday, and on Monday I showed off something that I was very proud of, that I put together at PyTexas earlier this year. It is one of my worst and most gratuitous demos. Who saw the lightning talks? So, a bunch of you. What I was able to add
to Python is read watches, and I showed you some really horrible things you could do with them. Just to show that to you one more time: you can import rwatch. This is a module that can be pip installed, so let me just make sure, in this tab here: import rwatch. Then, from the sys module, I can create a variable, and let me create a function that takes a frame and an object, prints that it saw the object, and returns the object, and I can set a read watch on this. So every time somebody tries to access this variable, it'll trigger that custom code.

One thing that I left you with is: how on earth did I do this? The way that I did this, and some of the demos that you can do with this horrible, horrible feature, are actually quite enlightening in trying to better understand what software is, how software fits in, how engineering works, or in trying to get at a slightly more formalized, rigorous, or mathematical view of software.

So the way that I did this: CPython is a stack-based virtual machine, and in that stack-based virtual machine, let me go into my ceval.c, we have a for loop that runs over all of our opcodes, and for every opcode it just performs what the opcode would do. One example would be BINARY_MULTIPLY: you can see that when you try to multiply two things, it pops two values off the stack, multiplies them together, and sets the result back onto the top of the stack. Every time you try to perform an operation on some variable in Python, it has to be on the stack somewhere, and you can see there are these macros, POP and TOP. So what if I just went into the macro, we have to start at the beginning, the macro for pushing something onto the stack, and added some custom code there? Then every single time you push something onto the Python stack you can have a trigger. That's exactly what I wanted for a read watch, and so I added this. It was actually a fairly simple exercise to just add some code to trigger
something. Now the really wild thing is that here I have a patched CPython interpreter, and yet in the demo that I gave during the lightning talk I showed you that there were no tricks up my sleeve: what I showed you was this demo working on an unpatched CPython interpreter, a CPython interpreter that I had built from source and had tampered with in no way whatsoever. All I did was pip install a module. The deeper question is: how do you pip install patches to an interpreter? It turns out it's very easy, and it's a little technique that I cooked up earlier this year. So I have a version of this, let's go into sideload, where I have a sideloadable version and a non-sideloadable version. We have our rwatch module, and you can see it adds some stuff to the sys module here, and you can see here that what I do is hook PyEval_EvalFrameEx. It turns out it's actually really easy to just go into a C function on Linux or Mac, add a trampoline, and swap out C functions. By this mechanism I can distribute arbitrary changes to a CPython interpreter in a form that you can just pip install.

So there are a couple of things that I've shown in previous talks and previous lightning talks where I added features to Python, trying out and experimenting with things. I always thought the standard for how these things enter into CPython itself, into the project itself, is that people write a PEP and argue about it for months and months and months, and then somebody eventually has to write some implementation, and they argue about that for months and months and months, and at some point something gets done. I thought it'd be much more interesting if you could just go and write a feature that did anything at all, that added any kind of feature whatsoever to CPython, and you could make it pip installable by anybody out there, get a bunch of users, and then
short-circuit some of the discussion. That said, you can see this is absolutely the most gratuitous demo I have, and it's the reason why I came up with this: I wanted to see what the worst thing I could possibly do with CPython was. I think it's the demo I showed you during the lightning talks, where I do a read watch on the number one and return two, so when I add one plus one it gives you four, and when I assign it to a variable and do one plus one it gives you eight. You want to see that again? I can show that to you again now; it's neat. So that's easy: just write a watch. The watch takes a frame and an object, and you say: if isinstance of the object and int, and the int is between zero and ten, then return this times two; otherwise you return the object. Then you just pull defaultdict from collections, and you call sys.setrwatch with a defaultdict where you always return this watch function, and now the number one becomes the number two. If I assign this and I try to add it to itself, one plus one is now eight, and you can really give somebody a very bad day. You can do all sorts of crazy things with this. Who knows what's happening here? This is gratuitous; this is supposed to provoke you.

That said, I decided for this talk, instead of going in the direction of the avant-garde, to go in the direction of the postmodern. I was afraid, as I was putting together the slides for this, that even if the demos didn't fail and I didn't have egg on my face, this would be a little bit too high-concept for you. So bear with me, but there is a point. The question that I'm posing to you for this talk is: does code quality matter? And the answer to that I can give you right now. If you're here and you're trying to watch a talk about whether code quality matters, you already know the answer, which is no, it doesn't. You wouldn't come to this talk if you thought code quality mattered; you'd only come if you didn't think it mattered and you want somebody to back you up on
that. So the answer is no, code quality doesn't matter; we're done. A lot of people are going to come and see this, or view it on YouTube, and they're going to be asking this question, does code quality matter, and they're going to be asking themselves things like: do I really need to learn Python that well? Do I need to know Python well enough to understand the crazy code that he put on the screen just there, or any of the C stuff? And the answer is no. If you're asking that question, the answer is probably no.

So what's the point? Why am I even bothering? Why am I here? I ask that question of myself all the time, and it's not because I'm existential but because I'm trying to give you a good talk. So I'll tell you about the process. This is the secret; this talk is a total bait and switch, by the way. I'll tell you the process of how people give talks, how normal people give talks, how you get into speaking. If you are somebody who wrote scikit-learn, or matplotlib, do we have any other core developers of projects in the audience? We have Brett Cannon, who has been doing Python for 12 years. If you're one of those people, you've already made it. You don't have to worry about profile; you don't have to worry about giving cool talks that people will link to. You're already famous. You can just give talks about that project until you're dead. Tom Caswell is in the audience; he's giving a lightning talk today. What are you going to be giving a lightning talk on? Is it matplotlib? Oh, it is. Okay, very good. Does it work with matplotlib? Okay, very good, very good. So you can see, you've already made it.

If you're like me, what you do is you start by talking about something somebody else did. The reason you do that is it's an easy pitch to a conference: you say, I'm going to talk about something that somebody else did, and as long as I read through the documentation and the blog posts and the
Stack Overflow questions well enough, and as long as I'm an energetic personality, you can be pretty certain that the talk will be okay. That's exactly how I started in speaking. My very first conference talk was at PyGotham 2012; I talked about decorators and context managers, and I just explained to the audience how they worked. The only prep that I needed was reading through the documentation about decorators and context managers and coming up with some demos and some slides, and it was very easy. Since then I've given, I counted last night, I think 51 talks at 30-something conferences, so it's been a lot.

I'll show you the next step. Once you do that, you say: well, I don't want to keep talking about somebody else's work, I want to start talking about my own work. So you talk about some small library or some tool that you worked on, or, if you don't have an open source library or some tool or something you've built at work, you talk about some not quite totally mainstream approach to a common problem. You say: here's a problem we have in data munging, or here's an approach that people aren't using correctly in data visualization. That's exactly what I did. I started talking about generators, and if any of you have attended any of the past Python events, for about a year there I gave a lot of talks about generators, the different ways you could conceptualize them and the different ways you could model code with them. It's the standard practice.

Then the third phase is that you talk about your opinions, and you try to talk about some controversial ideas, and there's some message. I gave my second or third keynote ever at PyTexas earlier this year, and I was going to give a version of a talk about this other project that I worked on, this really small hacky thing, and I thought: well, I can't just go up in front of a bunch of people and show them some weird hacks; I have to have a message. So you have to go there,
and that's kind of where we are right now. So there's some message here, buried under the layers of meta and crazy CPython hacks. The most important thing is that the next step is not "then you get famous." The step is that it doesn't work unless you have already become famous. This is something that a lot of you may not realize, and it's very interesting. What happens is that when you're famous you have more bandwidth to communicate to people. I've been giving these kinds of talks for a while, and one difficulty is this: when you're famous you have a certain amount of bandwidth, so you can write a one-liner somewhere and people will pore over it; they'll try to understand all the nuances and all the depth of it. And when you're not famous... and by the way, you do not want to be famous. There's a certain obligation that comes with fame; it's not something that you want. But you have to be, like, Drake famous, not Guido famous; you have to be super famous.

And even in those situations you could end up like Leslie Lamport. Who knows the story of Leslie Lamport and Paxos? In 1989 Leslie Lamport, who was already a seminal figure in computer science, tried to publish a paper on a solution to the consensus problem in distributed algorithms, called Paxos. He had surrounded it with the same humorous trappings as he had another topic that he had made very famous, the two generals problem, or the Byzantine generals problem, and the paper was rejected. They said: we don't want this jokey trash. He put it in his filing cabinet. He sent it to some of his friends and then six months later asked them: do you know a novel solution to the consensus problem? And they all said no, and he put it back in his filing cabinet. About eight years later, when there was more desire for a solution to these problems, he took it out of his filing cabinet and republished it, and it turns out that Paxos is one of
the most exciting, I would say, and one of the seminal developments in all of distributed algorithms, and it took eight years for anybody to notice it. So you do have to be famous for this to work. If you come away from this talk having no idea what happened, don't worry, I won't be offended.

However, an easy way to fame is controversy. So I thought: how do I give the best talk? I'll talk about something that everybody has an opinion on, like: does code quality matter? It turns out you've seen this talk before. In fact, you've seen this talk twice, not from me but from other people. You've seen somebody ask whether code quality matters and tell you yes, of course it matters, and they probably showed you something like: if you need scalability, these slides are a little bit swapped, or if you need to solve really hard problems, then code quality really matters. And you've also seen this talk where somebody said no, and they said something like: well, it doesn't really matter if the code is good or bad; what's most important is time to market and user experience. You know, the code is just what we tell those trolls in the cave to do in order to make our customers happy. So you've seen this talk before, and they've given wildly different answers.

Unfortunately, when you've seen this talk before, it probably wasn't even about code quality. It probably was about something else entirely. I hate to say this, and it's going to be very controversial, but sometimes when you've seen this talk before you've seen a correlation between the speaker and the answer they've given. Somebody comes up on stage and they say, well, I'm really, really good at writing good code, and code quality matters. And then somebody comes up on stage and they say, you know, I've dabbled a little bit here and there, but I'm a really good product person and user experience person, and it turns out that's more
important. It turns out that, for everybody in the world, what's most important in the world also happens to be what they're good at. So that may hint to you at a certain bias for where this talk is going, because, you know, I know a couple of things about Python. Most of these talks are about selling some agenda, and code quality is just a prop for selling that agenda, for saying user experience is important and code quality doesn't matter. It was meant to provoke you and to get you to start thinking about the issue, but it wasn't really about code quality. And, as you'll see, this is an issue that we need to deconstruct.

Now I'll say: the avant-garde is not controversial. The stuff that I showed you before, and some of these wild things I can do with Python, I always thought those would be very controversial, which is why I surrounded myself with this verbiage of "don't use this code." I thought: I'm going to show you something really weird you can do in Python; I'm going to show you, you know, the best memoizer you can write. If you look at my gists on GitHub, I have all these little code samples of really neat things you can do that require a lot of thought and a lot of parsing in order to understand what they do. But that's not actually controversial. I've never had anybody come up to me and say, I hated what you did. Nobody cares about that kind of code. I've never, ever had that happen. I thought I would, which is why my email address is james@dontusethiscode.com, and I posted some of this stuff on a blog at seriously.dontusethiscode.com. My Twitter handle is, well, on IRC I'm dutc, for "don't use this code," and also because I didn't want a really long one. And when that one's taken, because every so often you get a netsplit or something and your IRC handle's taken, I fall back onto dutch, for "don't use this code, hoss": I'm from Texas. It's not controversial. The
only time I've ever had anybody yell at me after a talk is when I talked about Python 2 versus 3. Seriously, all I said was that Python 3 is where a lot of the effort of the core developers is going, and if you're able to use Python 3, there are a lot of neat features in it. And I had somebody harass me for 45 minutes at a bar about Python 2 versus Python 3 and why it doesn't matter, and I was shocked.

I want to set that aside, and before we can even talk about code quality I want to discuss with you the important question of what good code is. So who here in the audience has ever seen good code in their life? Who here in the audience has seen bad code? Who here in the audience works with data scientists? Yeah, and all of you have seen bad code too, huh? In the original version of these slides I actually name-dropped a couple of libraries that we all know are not that great. But who here thinks they know what good code is? Who has some metric? Does it involve elegance? Does it involve understandability? I'm sure you all have some metric that you use, and it turns out all of your metrics can be described in one very easy, very simple way: good code is code that I wrote, and bad code is code that you wrote. Every single time I've ever heard anybody talk about good code, it's always: good code is this code, and also that's the code I wrote, and bad code is the code that you wrote. For example, unnecessarily complicated code, that's the code you wrote. The code that covers all the complex special cases and makes sure to really solve the problem fundamentally, that's the code I wrote. The code that's so self-indulgent and over-engineered, that's the code you wrote. But the code that scales and adapts to the problems we're going to see in five months that we don't even know about, that's the code I wrote. Every single time this topic comes up, it's like an interview: every time you go to interview for a position, the questions you're going to be asked
are the questions which make you look the most like the interviewer. If that person is a really good programmer, they'll ask you programming questions. If they're not that comfortable with their programming skills, they'll ask you the "more important" questions, which are whatever they're good at. We see a lot of ourselves in this problem.

So here's a question: does code quality matter? The real answer to it is: go away. Just go away; don't ask the question. It's pointless. Every time you ask the question, you're going to be pushed towards one of these two extremes, and you're not going to be able to answer. But that would be a really lame talk if I just spent 25 minutes of your time telling you no, there's no answer. There is an answer to this, and the answer is that we need to deconstruct the question. If we really care about code quality, if we really care about understanding whether code quality matters or not, we need to actually have some semi-formal way to talk about code quality, to talk about what it is when we're writing code, and to really analyze where it might matter or not matter.

In the different talks on this topic, we've seen that sometimes people say code quality does matter, and they have very good reasons for it. They say: my startup could never have succeeded unless we made the right investments into these technologies; we would never have scaled to 20 billion users, that's not quite correct, we would never have scaled to so many users with only 17 employees. Or some people say: well, you know what, we pivoted 10 times, and the eventual market that we found was something that wasn't even related to the first thing that we were doing. The code quality of those first 19 approaches doesn't matter; what matters is execution time. And they're right as well. So it's clear that we can construct a scenario in which code quality does matter, and we can construct a scenario in which code quality does not matter. And if I were to tell you, well, code quality matters when
it does and it doesn't matter when it doesn't, that would be useless; it'd be tautological. Unfortunately, this question is useless as it's specified, and we need to find a better way to think about it. This is the talk that I've been trying to give all year: how do we encourage better ways of conceptualizing topics that are important to us as software developers?

Some of my background: when I did my undergraduate degree, I did it in liberal arts, and one of the topics that I covered was comparative literature and literary analysis. It's something that we all make fun of casually. You all think that you can sit in a literature class and, when somebody asks what the author was thinking, you can just BS whatever you want and that'll be accepted. But it turns out that it's a very difficult field, and you need to build some very abstract analytical tools to be able to effectively analyze something as simple as a poem or a literary work and derive any meaning from it. Some of these tools are given to us by the different frameworks in literary analysis: structuralism, post-structuralism, modernism, postmodernism. Postmodernism gives us some very interesting tools for taking questions that on their face look like they go in both directions and, instead of answering them by choosing yes or no, finding a middle path where we question the question itself, where we deconstruct the question.

Now, since I have time, I do want to make a slight comment as an aside here. This is a total rant, but I have the time for it, and frankly, what are you going to do? I mean, I'm on stage; I have the mic. Come on. Benchmarks. Earlier this year I was at JuliaCon, which was a fantastic event. Julia is an amazing language that has come up in the last two or three years. It's a NumFOCUS-supported project, and it is a fantastic tool if you're an R user or if you're interested in machine learning or data science.
At that event, one of the things that every single speaker wanted to show off about Julia was benchmarks. It turned out that all the speakers were statisticians, and they would come up on stage and say, well, look, this is a thousand times faster, and then have one slide with "1000x"; or they'd say, look, this is 23 times faster, and have a slide with "23x"; or they'd even have a graph. Not a single slide ever said whether this was the average or the median, or gave the variance or the standard deviation. And I thought: you're all statisticians, and if I came to you on your topic of interest and told you, well, the answer is three, and I didn't even give you a model, you would throw me out of the room. And yet for benchmarks we're allowed to do it. It's very interesting, because I thought, you know, if you have a thousand-times speedup, there's probably something there. But if you have a two-and-a-half-times speedup, what is a statistician going to tell you? We won't get into the null hypothesis, right, but how do you even validate that there's something there? So, benchmarks: that's my mini rant. Look how many topics we're covering in this talk. This is amazing; we've got some real good density of content here. But it's useless.

That said, that doesn't mean that we can never talk about benchmarks or that we can never talk about optimization, but we have to be more careful, more delicate, about how we talk about them. We have to talk about the model behind them. The purpose of this talk is to try to give you some model under which we can talk about this. I won't give you the only model, but I'll give you one model that I use to think about these things.

To preface this model, I'd like to say that most of the time the code that we write is an artifact of some business process. We're at PyData here; a lot of us are data scientists, and we're writing code to supplement some business that exists somewhere. We're
not writing code for our own personal pleasure; we're not writing it as an artistic endeavor. There are people in the world who do write code in that fashion, but for the most part we're doing it as part of our job. I think that a lot of the disagreement we have about how code should be written doesn't really account for the fact that most programmers who are writing code in a professional environment are trying their best to solve some problem in that environment. They're trying their best to supplement some business process with that code, and they do deserve a little bit of the benefit of the doubt. We shouldn't just bandy about things like "you're overcomplicating it." There may be a reason why the code was written that way, and we should be a little bit more sensitive about asking what these reasons might be.

That said, in these environments there always exists some business process, and that business process is amenable to automation, or to mechanization. Traditional manufacturing technology is all about mechanization: you take something that a human being would do and you do it better with a machine; you can do it more accurately, you can do it faster. This exists in the software world as well. There's some process, something that a human being is doing, that the computer can do more accurately or faster. In some cases there's even a tension between the two: the computer can do it less accurately but faster, or the computer can do it more accurately but slower. That's a very common description of much of the work that we do. We take a problem that somebody described to us. They say: well, I have this particular problem, and I don't want to have 50 analysts building these reports every day; I'd rather have one machine that takes all the data in, normalizes it, and builds a report. And so when we look at that, we say: how do we model this
problem? This is one area where I feel that in professional situations we put almost no effort into the actual process of understanding the model for the problem we're trying to solve. It turns out, and I've seen this happen so many times, that the world has, in the words of some postmodern theorists, an infinite profundity. Especially when you're trying to solve a human problem, a problem created by human beings, the world has infinite degrees of freedom. Some of the problems that we run into are in localization. How many times have you found a very simple solution to a problem that assumed that each element of a string is a letter and that things could be capitalized in a consistent fashion? Or how about date/time processing, where you assume that one time comes before another because they're lexicographically ordered, or you assume that there are no random gaps in date times, or that you can't add seconds here or there? A lot of the time the problems that we're solving are human problems, and so they're completely unconstrained by any logic or any reason, and this is an enormous number of the problems we're trying to solve. How many people have tried to build a billing system? In the 90s the typical project was: here, let's build a payroll system. And they suddenly realize that it's possible to have cycles in a payroll system, where somebody's boss is their own boss, or you can have all sorts of undefined behaviors. It doesn't bug the human beings who created that system, but it makes for infinite complexity for the people who are trying to program some solution to automate it.

What we see is that we always have to have some model, and that model always has to reduce the amount of information that can be expressed in the real-world system. So we take some very complex business problem, and we say we can, with a
set of assumptions, a set of axioms, handle some subset of this in some reliable fashion that is tractable; we can actually, within our lifetimes, write the code to solve this. Once we do that, we encode that model in a programming language. Now, it turns out that there are many layers of models, and I'll give you an example of how this works. When I was working in finance (actually, I still work in finance, but when I was working in big-bank finance) we worked on a lot of projects building large systems for trading desks. I always worked for the front office, and I was always attached to a trading desk. We had to build some system, and it turned out in that circumstance somebody was being paid money to pick up the phone and buy and sell things. Why they were being paid money I don't know, but that was the business problem that we had, and our goal was to automate that, to apply computers to do something the human beings can't. And one very easy dimension of that that we could automate was this: say you're a mortgage desk and you're trading a million things a day. There is no way that somebody can have on a piece of paper a list of all the things you traded and the profit or loss that you made. It's a seemingly simple problem for a human being to delegate that work to a computer, which can process data in that form, where it's mostly fairly regular. And it's a fairly reasonable thing to say: well, in order to supplement this business process, I need to have some reporting mechanism to do positions, P&L, and risk that facilitates the business process. You hand that off to your development team and they go wild. But in order to do that you have to make some simplifying assumptions, and these simplifying assumptions are almost always wrong, and over the course of a trading system that might last for ten years, each area where these are wrong will always show up. So an example from my career: we once had a system and it assumed that fixed
income instruments settled in one currency. So if you bought something in USD it would settle in USD. Sorry, if you bought some instrument and you paid American dollars, that was the end of it, and when it matured you'd get American dollars back. And only about two or three years in we suddenly had some situations where you'd buy something in American dollars and it would settle in three currencies, so you'd get a little bit of Canadian dollars, a little bit of Japanese yen, and a little bit of US dollars. And it completely broke the models that we had, because we had some assumptions there. There are other assumptions that we typically make that seem very reasonable. Another one was that the market value of a position is equal to how much you hold of it times how much a unit quantity is worth. Almost every trading system I've seen makes that assumption: the market value is equal to the market price times the volume. But there's an assumption there; it's an assumption of linearity. At one point I had an opportunity to work on a desk that traded a very illiquid instrument. It was a mortgage instrument, and the more of it you bought, the more the price changed, so there was no longer a linear relationship between the market value and the amount that you held. It may be that if you held a lot of it the price went up correspondingly, as you started to trade towards the extent of the liquidity. And all of these assumptions were never spoken about and never acknowledged, and they were so core to the system that we were building that the moment we had a problem that broke them, it would break the system; we'd have to have some very [unclear]. And I feel that the problem here was that we failed to acknowledge up front some of the assumptions we were making. We failed to acknowledge the model that we had, and we failed to say to some business leader that the system that we built made these assumptions; if the circumstances have changed and these assumptions are just no longer
valid, we may not be able to use the system, or we need some acknowledgement that this system is no longer applicable. Now, the solution to this, and the last step just for completeness, is that we encode this in Python, and we may encode it using an embedded DSL. So for example, in the case of pricing a portfolio, one thing we might realize is that we have many instruments in our portfolio and they may repeat. So you might have CDSs, and CDSs are bilateral contracts, so each CDS is kind of unique, except they're standardized, and so given the same parameters for the CDS the pricing should come out the same: given the same recovery rate, given the same survival curve, it should come out the same. And so you can avoid a lot of redundant computation by building this as a graph and doing computations only for unique instruments, however you decide to define instruments. You might build a graph yourself, and all of these big trading systems have some graph component somewhere, and the ways that you encode this graph component can be wildly different. There are some large organizations where they build this graph as part of the modeling software itself: every instrument that they trade, whether a bond or a CDS, will have some price method, and that price method will call some other method, which will call some other method, and the actual method chains in the software are built into a graph that you then apply some algorithms to, to cache and whatnot. At another organization I worked at it was completely different. Everything was built off of a price, and so the price of something was just: you pull some curve that represents its price, and then behind the scenes there was a graph that would tell you that if you change this curve in this way, it triggers a change here. That graph was built in XML, and it would represent the derivation rules for how a bump to a survival curve changes something over here. It was exactly the same model but it was encoded in a very different way, and
everything that one piece of software could do, despite looking very different, the other piece of software could do; they could do the exact same things and had the exact same restrictions, but the encoding was different. I think that in many cases the different encodings that we see amount to spelling. You know, we have some Canadians in the audience who have been gracious enough to join us, and we have an Australian in the audience as well, so there are situations where Australians and Canadians spell things a little bit differently than Americans, and there could be an argument as to whether this is a better or worse spelling, but for the most part it's just a different spelling. Unless somebody in the audience really wants to object: "u" or no "u", it doesn't matter too much. For a lot of dynamic languages, and especially for languages oriented around protocols, these different encodings, these different spellings, don't really matter that much. So you can already see that we're getting a much richer answer to our question, does code quality matter. I'll show you an example of this, and this is a common example in Python. Python is a very protocol-based language, and, you know, people go wild about generators, and I've given so many talks about generators. You could write a generator this way or write a generator this way. There are some subtle differences between the two, but the only real difference between the two of these, or I'd say the most major difference between the two of these, is notational. These are two different spellings for the same concept; you could probably swap one for the other and it wouldn't make that much of a difference. It turns out that this deconstruction of the question, I think, gives us a much richer answer to the question. That is, if you're asking about code quality in the case of how do I spell this thing, well, once you've
once you've restricted yourself to answering at that level, you can often make a very specific complaint about the code. You can say, well, I'd rather do this because it's notationally simpler, it's less code, it's easier to read, and if you understand what a generator is, that's fine. And it does require that maybe some newbies on your team who don't know what generators are might have to learn what generators are, and they might have to go beyond some very basic Python that they may have learned. You could make an argument for this, and you can say, well, I don't really want to make an argument for this, but you could potentially, and you can make a very targeted and very specific argument. That's better than "well, I wrote this one, so go away" versus "you wrote this one, and I don't want to have to know what a generator is." This is the point of this talk: how do we talk about code? How can we answer the question of whether code quality matters in a richer fashion? How can we build a framework for thinking about this where we can answer this in a very specific fashion? It's totally tautological to just say, well, sometimes it matters, sometimes it doesn't. Instead, with some framework, we can answer these questions more specifically. So for example, in the case where you say code quality definitely matters, without the right code quality my start-up couldn't have scaled to the millions and millions of users it has without having to hire lots of people, well, maybe that's a code quality issue that's not in terms of the spelling or the encoding in the programming language, but it's a code quality issue in terms of the fundamental modeling. Maybe somewhere along the way somebody made the right one. Say you're building a platform like Twitter: you can think very easily that the right model for how messages move within that platform can lead to something that can scale very nicely, and the wrong way of modeling
that can lead to something that can scale very poorly. We even see this in programming languages themselves: certain models for programming languages lead to languages that are very easy to optimize, and certain models lead to languages that are not very easy to optimize. At that level you could definitely say, well, the code quality does or does not matter. If you go one level higher and you talk about the business problem: if you don't have a business problem, then, you know, whatever you do, it's the weekend, have fun. If you do have a business problem, and the business problem is get this code out, say you work at a company like Microsoft and the thing is get version one out as soon as you possibly can, no matter how good or bad it is, then the question's already answered for you. (We have some people from Microsoft in the audience.) Or if the problem is, you know, we're building something that's very important, that's safety critical, then you can see the right framework for thinking about the problem can actually lead you to the answer much better than, I think, the way that we typically discuss things like code quality. Now, I will say that the model that I've given you involves a little bit of sleight of hand. I don't like sleight of hand, even though you can see it here, because there are a lot of presumptions. For example, I presume certain things about the model existing, or rather that the business problem exists in a certain fashion and it leads to a model in a certain fashion. If you think very carefully and kind of slice and dice the model I've given you, or at least the framework I've given you, you can see a couple of ways where I've made presumptions and then answered it with that presumption. That said, the purpose of this talk is not to get famous, it's not to be controversial, it's not to be avant-garde, but it's to give you a little dose of postmodernism, to imbue a little bit more meaning and a
little bit more thoughtfulness into this very fundamental question of whether quality matters or not. Now, one thing I never asked, or never presented to you, was whether or not the question itself is important, and I think that by virtue of you all being here on Wednesday at 3:24 p.m., it's probably important enough for you, unless the talks in the room kind of stink, but I think they're great, so it's probably a pretty important question. I will say that this is maybe one of my clearest postmodern talks. I added a little bit of avant-garde in the beginning because I really hate giving talks that don't have any code involved in them; that was just an excuse. But I hope you've enjoyed this, and this is something I'm very passionate about. So if somebody asks you, does code quality matter, you can say yes, certain encodings are notationally better than others; you can say yes, certain encodings prove to be poorly compatible with other encodings; you can say yes, certain encodings are redundant. You can answer very specific questions, and you can answer them given some framework for what it means to write code. I hope you've enjoyed this talk. I hope it's given you something to tweet about, or Tumblr about, or Pinterest about. I don't know what Periscope is. I do know what Tinder is; you can Tinder about this. So if somebody's like, hey, what are you doing tonight, you could say, oh, I'm thinking about a talk that I saw at PyData. That is appropriate, you can do that. But I hope you've enjoyed it. I'm James Powell, thank you very much. What, do people have... nobody asks questions, nobody ever has questions for talks. Okay, okay, okay, come on. [unclear] Yeah, yeah, but that's not right, right? I mean, there are actually many different ways to do things. I think what that means is, given a specific model for the problem you're trying to solve, there often is one obvious way. So one common thing is, let's say
we're doing something manipulating paths on a system, right? The way that we typically do it is we might do it with just a string, and say you want to split up the components of a path: you take a string and do .split on a slash, and that's the one right way to do it if you're able to make the assumption that a path is just a string of characters with a slash in it, right? And then when you make that problem more sophisticated, you say, well, actually I want to work on other platforms, which don't have a slash going in the same direction, you might take a more sophisticated approach and say, okay, I'll use os.path or pathlib or something like that. So for the different layers of the types of problems, I think Python is very nice at giving you a way to solve it with that assumption. There was a conversation in the hallway earlier about strings in Python, and you could say for certain ways that you think about strings in Python the built-in string might be appropriate: if you think about it as some sequence of characters that you manipulate in a particular fashion, just regular strings are enough. If you need to think about them in a slightly more sophisticated fashion, maybe you need to start introducing your own string library, but target it on some model where you have knowledge of how simple the problems are that you want to solve and how easily you want to solve them. And I'll say that if you ever sit in any of the classes I give (I do a lot of corporate training), you'll see that in the way that we present the examples, we'll always present the example with the simplest approach first, and there's usually one right way to do it, and then we'll say, well, consider this, this, and this corner case, which expands the problem you're trying to solve and also expands the code. [unclear] Like cyclomatic complexity? I could never actually figure out how to take Kolmogorov and actually relate Kolmogorov to cyclomatic complexity. Like
you're just trying to... So I will say this: I had an opportunity to look over a lot of metrics for software quality that I don't think anybody actually uses in practice, things like measuring definition-use pairs, looking at definition-clear paths, some formalizations. I had a textbook that I was going through, and nobody seems to ever do that, but I thought these might be interesting ways for us to guide, or add a little bit of additional information. I think one big problem that we have, by the way: code reviews are garbage. Most code reviews are just people expressing their own political will, right? Like, I like you, so go ahead, or I don't like you, so don't go ahead. And I think that as you start to do things like add metrics to that, you get over-focused on the metric. I think the only way to do a code review is when both parties are equally invested in the code actually working and being in production and also being completed, so it's not just "keep submitting me the code review until you guess what I want." Has anyone ever had the situation: just keep submitting the code to me until you're able to guess exactly what I'm looking for? Even in corporate [unclear], right? These could be guiding forces, but it's very important to not let them lead you astray. Other questions? Now, this is the best question-and-answer I've ever had; I need to give more. So, within the degrees of freedom that grammar provides you, right, publications have had this for the longest time, the various [unclear] have had style manuals: you know, the rest of the universe, do what you will, but for our purposes, for what we do and how we think about things... You know, I think that's a fantastic and very specific question. One thing that I'm personally very averse to is talking about computer languages in the same way we talk about human languages. I think human languages are a lot richer and a lot more complicated than programming
languages are, even though sometimes we can talk about fluency in both and things like that; I feel we're underselling human language. That said, PEP 8: everybody loves to, like, install flake8 and say, oh, this is not PEP 8 compliant. But the thing is, in my view, PEP 8 is how you want to write code to satisfy somebody who's approving your code as part of the CPython standard library approval or code review mechanism. And so I agree, for things like coding standards: if you're an engineer, your job as an engineer is to subordinate yourself to some higher purpose, some business objective, right? So I have my own way of writing code. I don't use spaces to indent. This came up during PyCon Canada, during one of the keynotes: somebody asked who uses two spaces and who uses four spaces. In my own code I use tabs for indentation and spaces for alignment, but if I were to submit a patch to CPython I would do exactly what I see around me, because I care more about getting the patch in than the specifics. And so something like a style guide, I think, is a very appropriate way to draw the boundaries: we know you have your own opinions, we know you're an individual, but this is what we've decided commonly as the way to do things. Other questions? Aaron. If you're designing an API, you mean the quality of the API or the discoverability? Mm-hmm. So CPython is a perfect example of this: the CPython source code is very easy to read through and understand. Yes, but I think what you see in practice is a data scientist will look at a programming exercise and they don't care about it. It's not like they don't want the code to be good, it's just that they care more about the thing that they do, which is data science. And a data scientist will say, well, who cares if the code sucks, at least it works, we'll just get some programmers to fix it up. And a programmer will look at the same project and say, well, the code
is what's most important; the data science I just pull out of a book: I just "from scikit-learn import model", "model train", and I'm done, right? And so there is this contention between the two of them, but I don't think that one side is necessarily right or wrong; I think there are some very unexplored assumptions behind the two. You're right in some specific cases, though: when you're talking about things like APIs, then the design of the API matters. But I would say that if you're talking about things like "Python for humans" and APIs like that, just as important as the way that the API looks, the way the function calls are named, is the fundamental model the API exposes, which I think is something that people consistently ignore. Because who here has worked at some organization where you had many different asset masters or corporate actions databases, and somebody said, well, can't we have one central repository for all the things that can happen to a company, and you just query it? And it turns out there are some very large information companies in the finance industry that are trying to build, like, one standard corporate actions database: this company split into these companies in this fashion. But what they seem to fail to realize is that the way that we want to deal with that information in a tractable fashion is usually specialized to the problem we want to solve. And so the reason we have this proliferation of different asset masters is that sometimes what we consider to be a fungible asset is different, sometimes the fundamental model of what we talk about is different. Sometimes two bilateral contracts with the exact same terms are the same, for example in the case of pricing, and other times they're different, if you're looking at them in terms of different position line items. So I think the API problem is a lot more subtle than just did you use a decorator rather than wrap it yourself, or did you use four spaces. I think we're out of time, but I'm glad that you
enjoyed my talk. I really enjoyed presenting, and please stick around for lightning talks; we have some really good lightning talks. Thank you so much.
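The path-handling answer from the Q&A can be sketched in a few lines of Python. This is only an illustration of the point made there, not code shown in the talk: the sample paths and the helper name `split_naive` are hypothetical. It shows that splitting on a slash works while a path is modeled as "a string with slashes in it", and that `pathlib` carries a richer model once that assumption no longer holds.

```python
from pathlib import PurePosixPath, PureWindowsPath


def split_naive(path):
    """Split a path under the simplest model: a string with '/' separators."""
    return path.split("/")


# Hypothetical sample paths, chosen only for illustration.
posix_path = "home/james/talk.py"
windows_path = r"C:\Users\james\talk.py"

# Under the simple model, str.split is enough.
print(split_naive(posix_path))

# The Windows-style path has no '/' in it, so the naive split
# returns the whole string as a single component.
print(split_naive(windows_path))

# pathlib encodes the separator rules of each platform explicitly,
# so both paths break into their components.
print(PurePosixPath(posix_path).parts)
print(PureWindowsPath(windows_path).parts)
```

The pure path classes are used here so the sketch behaves the same on any operating system; in everyday code `pathlib.Path` picks the flavor that matches the platform it runs on.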
Info
Channel: PyData
Views: 20,368
Rating: 4.6180372 out of 5
Id: QuTmLeWL3C0
Length: 50min 26sec (3026 seconds)
Published: Fri Dec 04 2015