Unit Testing for Data Scientists - Hanna Torrence

Captions

Hello everyone. Today we're going to talk about unit testing for data scientists. As was just said, my name is Hanna. I'm from Chicago, and I work at ShopRunner. ShopRunner is a Chicago-based e-commerce company; we like to refer to ourselves as the Amazon Prime for everyone else. We work with a network of retail partners, to whom we provide a bunch of data-driven tools to help them better connect with their customers, and then we have a membership base that we provide sweet benefits to, like free two-day shipping, free returns, and deals across this really broad network of partners. Despite our name, we don't actually do any of the delivery or logistics. On the data science team in particular, we spend a ton of time digging into this really rich cross-network data set, which is a lot of fun. If you're interested in shopping, I have a bunch of free membership cards; you should come see me after the talk.

All right, so we're going to talk about unit testing today. Everybody's super excited, right? Testing is so much fun. I hope you're as excited as this cat. I like cats; there are a lot of cats in this talk. There are going to be a bunch of code examples. All the code examples are in a GitHub repo that I will share at the end, and these slides will also be in that repo later today.

Testing has a whole bunch of jargon that, if you're not in software engineering, you may not be super familiar with. These are some of the terms. Today we are just going to worry about unit testing, so there are all these other types of testing that we're going to put to the side. Unit testing is testing a single piece of code in an isolated context. We're not going to worry about how everything pieces together, but we're going to make sure that each piece is doing what we want it to be doing.

Well-tested code has a bunch of benefits. It's really nice to work in a code base that has a lot of good unit tests.
It makes it super easy to find bugs, which is very nice; it's always better to find them earlier rather than later. It makes it much easier to iterate: you're not scared to make changes in a code base that has really good tests, because if you break something you're going to know, and you're going to get to make conscious choices about what to do about that. It makes it much easier to debug things: if something goes wrong and you don't know what's happening, you have this nice suite of isolated pieces that tells you where exactly something broke. And it pushes you to actually design better code. If you're writing tests for something and it's really hard to isolate pieces and it doesn't all fit together, that's often a clue that your code is not structured as well as it could be, and if you write code with the knowledge that you're going to be writing tests for it, you will write cleaner code. But basically, unit tests give you confidence that your code does what you think it does. My favorite and least favorite thing about computers is that they always do exactly what you tell them to. This is a great way to make sure that what you're telling them is what you think you're telling them.

So these all seem pretty nice. Why doesn't everybody write a whole bunch of unit tests? We just listed off a bunch of great benefits, but as data scientists, I bet many of you in the room have written a fair amount of code and not unit tested it. So why not? Learning to write good tests is definitely an investment. It takes time to learn to do it well, and it takes time to actually do it. But if you're ever going to have to maintain the packages that you write (which, if you want them to be used anywhere, you will), this investment is worthwhile, and you will gain the benefits over time of having good test suites built out.

Another tricky part as data scientists is that data science work follows a lot of different patterns than pure software
engineering does. When I first started as a data scientist, I was coming out of academia, and I had vaguely heard of a lot of these terms before, but I had never actually written a unit test. I think that's pretty common; I think a lot of us come to it that way. This is an awesome field in that it draws people from a really wide breadth of fields, many of them academic, and many of them not super well-versed in the kind of core software engineering skills that the people who maintain your websites at companies are. So as I started reaching out and trying to learn how to do this better, there was a super wide variety of resources. I kept coming up with well-written blog posts and example sets and all of this, but they were all tailored to a slightly different approach than all of the code that I was trying to write, and thus all the code that I was trying to test. So I'm here today to talk specifically about the patterns that data scientists work with a lot and how we can test them well.

In theory, we have workflows that look something like this: we read in a bunch of data, we build a bunch of features, we build a bunch of models, we hand results off to someone, we evaluate the results, we do some stuff at the end. Looks pretty straightforward. But in practice, particularly as you move towards production systems, you have things that look more like this. This is, for example, what a lot of our workflows look like at ShopRunner, where you have a bunch of pieces in play: you're reading data in, you're writing it out, you're calling APIs, you've got non-deterministic things in there. This quickly gets overwhelming, and it makes you just not want to write tests at all. But that is a bad idea, and you will regret it later.

So where do we start? When you start writing unit tests, whether you're working with a code base that is mature and in existence but has never been unit tested, or you're starting to write a new package from scratch, you want
to start really small. You want to start with a particular, specific functionality that you want to verify is actually happening the way you think it is. You want to use all the available tools, many of which we're going to talk about in a moment, to get everything else out of the way. Unit testing is all about isolation. It's all about taking this one defined functionality, ignoring everything else, and saying: is this piece working the way I think it is? Then you do that for all of your pieces eventually, and you're going to gain a lot of confidence in what's happening overall.

If you have an existing code base, don't try to write tests for it all at once. You will get overwhelmed, it will be frustrating, and you will give up, and that's really unfortunate. You really need to start small. On the data science team at ShopRunner (we have only had a data science team for a couple of years), we've worked mostly with new libraries that we've written, but we have other teams at ShopRunner who work with really extensive legacy code bases that no one entirely understands anymore. In those cases, they're slowly working to build test coverage by writing tests on the pieces as they work with them. So if I need to add something to this little piece of the code base, I'm going to add tests for that little piece as I work on it, instead of sitting down and trying to test this enormous monolith that I don't really understand.

And then, this is a super vague guideline, and I'm upfront about that, but: write tests as early as you feel like they are valuable. In a data science workflow, oftentimes we start out in notebooks. We do some exploratory work, we play around with stuff, it's not super fixed. Writing unit tests at that stage is not really useful. But you don't want to sit down with a system that is about to be deployed to some production infrastructure and say, "Oh, but I need to write tests," because at that point it's still useful for the future, but you missed out
on a lot of benefits along the way. So a good rule of thumb that we use (your mileage may vary) is that as you start pulling things into functions, pulling things into classes, and organizing your code into libraries, it's a really good time to start writing unit tests too.

Okay, so now we have the idea of unit tests. How do we actually write them? pytest is one testing framework for Python. There are many, and many of them are good. I like pytest because it's highly configurable and super flexible, but there's also very little boilerplate, so it's super simple to get started with, and it does the balance of those two really nicely. My team will also tell you that I get really excited when I get these green lines to appear on my screen. If you write a bunch of tests, you will learn that too.

This is a bit of a side note, but if you are using pytest, these are super helpful flags for pytest that help you get a lot more useful information out of the test runner. This is more for reference if you're interested, but super helpful.

Okay, so now we know something about unit tests and we have a framework. Let's write some code. We have this super useful function that adds a column to a pandas DataFrame. All of the functions that I'm testing in this talk are not functions I would actually advocate that you write, because they are extraordinarily simple, but they're just going to give us a thing that we can test, to mimic some of the patterns that you see in data science work. So we have this add_column function. We have a DataFrame, we have a column name, we have a default value for this column, and then we're going to make a new column and assign it that value. Done. Cool.

Now this is our first unit test. This is an example of a pytest unit test. It's just a Python function. We're going to give our function a name that starts with test_, which is going to let pytest find our tests. pytest does really sweet test discovery, like a lot of the
testing frameworks do, where if you name your tests in a consistent manner, when you just run pytest it'll go and find them all for you and run them all for you. We're then going to define the input to the function that we want to test; in this case, we're setting up a DataFrame. We're going to call the function that we want to test. We're going to set the expected value, what we think we should get out of this function. And then we're going to check that the value that we got was the value that we expected. Super simple. This pattern (setup, calling the function, setting what you expect to happen, and then asserting that what you expected to happen actually happened) is basically the pattern that all of your unit tests are going to follow. They're going to get more complicated, but this same structure is a good way to think about them.

Okay, so we tested our super simple function, and we know that it works. That's awesome. Let's now make our tests a little simpler and give ourselves some tools to expand and test a bunch more things, with pytest fixtures. Fixtures are a special type of function that pytest keeps track of in order to let you safely share resources, and definitions of resources, across tests. This may sound a little vague; we're going to give some examples in a moment. If you've used other testing frameworks before, they're basically a really modular way to do setup and teardown methods.

So this is defining a couple of new fixtures. You will recognize these DataFrames that we just defined in the test before. We have a function called df, which returns a DataFrame, and we have a function called df_with_col_d that returns another DataFrame that has a column D. So we have this normal Python function that returns a value (it can do whatever kind of stuff you want a Python function to do), and then we have a decorator that tells pytest that this is indeed a fixture.

There are a couple of ways that you can get pytest to find your
fixtures. If you define fixtures in the same file as your tests, it will just find them. That's fine, but as you get larger and larger test suites, that gets more and more complicated, and as usual you want to start separating stuff out into different files. We often have a fixtures file or folder, depending on how large a library we're talking about, and then you have this conftest.py file, which is part of pytest's setup, that you import all of your fixtures into. Then pytest knows that all these fixtures exist, and we can do cool stuff with them.

So now I'm going to write a test using these fixtures. You will notice that this test is very similar to the test that we did a few moments ago, except now it looks way simpler. We have these two arguments to our test now, and you will notice that the names of these arguments match the names of the functions on the other side. This is not by accident, and it is important, because pytest says: okay, I have this argument to this test; I'm going to go and look at all of my fixtures and see if I have a fixture with that name. If I do (in this case we do; we found this df fixture), I'm going to run this function, execute any code in it, take the value that this function returns, and funnel it into this variable that is an argument to the test. So when you call this test case on the right, you get an argument called df that contains this DataFrame with columns A, B, and C. The same thing happens with df_with_col_d, and this DataFrame gets passed into the function as well. So all of our setup work in this test has now been pulled out into these fixtures, and all we need to do is call the function and make the assertions that we care about.

This is super useful in that it lets you use the same data setup across multiple tests. If you were to try to do this with just global variables or something, you run into all sorts of complications where you try to edit something and you
change it, and now your next test gets a different value, and you can either break things or think that things are working when they're not, super easily, that way. So fixtures are a super nice, safe way to handle all that without having to worry about those dependencies.

There are also built-in fixtures that pytest comes with. Some of these are kind of esoteric things that you're not likely to use that much, and some of them are really useful. capsys is one that I use a fair amount. It is a fixture that captures everything written to standard out or standard error during the execution of your test. If you give capsys as an argument to your test function, remember, pytest will look at that argument and say: okay, this argument is called capsys; I'm going to go look for a capsys fixture. Okay, I have one. So it's going to capture everything that is sent to standard out or standard error and return it in this object, and then you can call this readouterr function on it. In this test example, function_one is literally just a function that prints out "inside function one", and here we can assert that that is in fact what is happening. This is often helpful if you have modeling jobs that print out debugging stuff, like information about what's happening. This is a good way to verify that you're printing out the things that you think you are, which may not feel super critical to the functionality of what's happening, but is super critical to knowing that the things you are using to debug your code are actually real.

Another useful side note: running pytest --fixtures will list out all of the available fixtures. This includes the fixtures that pytest defines itself as well as any that you have defined in your own code, with their docstrings. So it's just a nice way to check that everything's being pulled together the way you think it is.

Cool. So we have defined some fixtures, but we defined super simple ones.
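
The capsys pattern described above can be sketched roughly as follows. This is a minimal reconstruction, not the code from the slides: the name function_one and the exact printed message are assumptions based on the spoken description, and the test is meant to be collected and run by pytest, which supplies capsys automatically.

```python
def function_one():
    """Toy function under test (name assumed); it only prints a message."""
    print("inside function one")


def test_function_one(capsys):
    # pytest matches the argument name `capsys` to its built-in fixture
    # and passes in an object that captures stdout and stderr.
    function_one()

    # readouterr() returns what has been captured so far as (out, err).
    captured = capsys.readouterr()
    assert "inside function one" in captured.out
    assert captured.err == ""
```

No import is needed for capsys itself, because pytest resolves built-in fixtures by argument name in the same way it resolves fixtures you define yourself.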
Fixtures in and of themselves are actually really flexible. You can do all kinds of stuff with them. The most basic thing to do is to just return a value, like we did with the DataFrames a couple of slides ago, but they can actually do way more than that. You can define the scope of a fixture: by default, your fixture function is run at the start of every test, but you can also define it so it's only run at the start of every session, or every class, or a bunch of different levels. You can compose fixtures, so you can have fixtures that depend on other fixtures. You can execute custom teardown code when the fixture leaves its scope, so you can have a fixture that needs to be torn down at the end of a test and have that all happen automatically; this is the teardown side of the setup and teardown methods. You can also access the test context when you're in the fixture; basically, this is a way to let you pass parameters from a test to fixtures. And then you can actually parameterize the fixtures to depend on different variables.

It may not be obvious why you want to do many of these things, but there are cases where they're useful. This is one case. We write a lot of PySpark code at ShopRunner, so we then want to write tests for this code. We want to run the tests on a single machine; we don't want to spin up a Spark cluster to run our tests. So you can define a local Spark session, which is what we're doing right here, and this also shows several of the flexibility features that I just mentioned. We're defining the scope: the scope of this fixture is the session. When you run pytest, everything that happens after that is one test session. So what this does is create a Spark session, then run all of my tests in it, and then tear down the Spark session. This is useful because it takes a while to spin up a session, so you don't really want to do it
before and after every single one of your tests, because then you spend most of your time creating and destroying Spark sessions and very little of your time actually testing your code.

I talked about accessing the test context; that's what this request argument is. The request argument sends a bunch of information about what's happening in the test case to the fixture, and then the fixture can access it, like what's happening here with request.addfinalizer, which is one way (there are a couple) to add custom teardown code. This says that when I exit this scope (in this case my scope is the session, so when I'm done with the testing session), I'm going to call spark.stop, and that stops the Spark session, so I don't just have a Spark session running indefinitely on my computer. So this is one example of a thing that we want to do a lot that is really easy in this setup because of the flexibility of these pytest fixtures.

One other thing I mentioned is dependencies. If we're testing Spark code, we're going to want fixtures that, instead of being pandas DataFrames, are Spark DataFrames. But in order to create a Spark DataFrame, I need a Spark session. So this is another fixture that depends on the first fixture: it takes the Spark session defined in the first fixture and uses it to create a DataFrame that it then returns. Then I can reference this Spark DataFrame fixture in my tests and be good to go.

So fixtures do a whole bunch of things. If you need to set something up and you're not sure how to do it, definitely go look at the fixture docs, because they do all sorts of crazy things that I had no idea they would do when I got started.

Another super useful tool that you may have heard of is mocking. Mocking is a general idea that's used in all sorts of programming languages, and mock is also the name of the library that does mocking for Python, so some of this is Python-specific and some is not. We're going to give an
example here, just to think about why we might want to mock things. I bet everybody in this room has read from a database in Python code before. It's a thing that we do all the time, because our data often lives in databases, and we might have a function that looks like this. I wrote a super general generate_features function. It takes some credentials and does some stuff, then creates a SQLAlchemy engine and passes it to the pandas read_sql function. If you're not familiar with this setup, it's basically just a way to connect to a database, execute a SQL query, and get a DataFrame back. Then I'm going to do some processing on that DataFrame, generate some features, and return these features.

So this is one snippet in a larger function, and I want to be able to test this function. I want to know that my feature generation code is doing the right things. But when I'm running my tests, I don't want to actually access the database. Maybe I'm running them in a different setup that can't actually connect to the database. Maybe I just don't want to touch production servers from my laptop, because that's scary. All sorts of things. So we want to be able to get this database access out of the way. We want to replace this connection with a fake thing, so it looks like a connection but doesn't actually connect to anything. And then we want to be able to return some data from this read_sql function, because if we don't actually return anything, then the rest of the code is going to fail. We don't want that; we want to know what happens in the rest of the code, but we want to short-circuit it somehow. This is what mocks let you do.

So this is a test using mock. There are a couple of pieces here that are doing some interesting things. You will notice these mock.patch decorators on this function. patch is a function in mock that does this short-circuiting part: it goes and finds the place in the
code where you're using this function that you don't want to actually use, and it sticks something else in there instead. And then there are mock objects as well, which are usually the thing you stick in; we're going to talk a little bit more about mock objects in a moment. But this mock.patch essentially goes and grabs this create_engine function and this read_sql function and sticks a mock object in each one's place. Then I can call my generate_features function, and as it runs through, instead of actually accessing the database, it's going to make those same calls to these mock objects, which aren't going to do anything except what I tell them to. So here I have this read_sql mock. (We're going to talk more about the logistics of how all of this gets passed through in a moment, so don't worry too much about the details.) I'm setting its return value here to a DataFrame. This means that when the mock gets called, which is going to happen because it got patched into place, instead of not having anything to return because it didn't actually talk to the database, it has this DataFrame to return. So now my test can run generate_features, it has a DataFrame to play with, and I can make sure that the features my generate_features function returns are the features that I would expect given df as the input data.

So let's talk a little bit about mock objects specifically. We have this really awesome mock stuffed animal over there. A mock object is a funky thing. It's just a fake object that you can stick in a spot, and it does basically two things, and the point of those things is just to let you cut out dependencies. We talked about unit testing being testing things in isolation, and this is a really powerful tool to isolate things in complicated systems. So, things you might want to mock. As we just talked about, database reads and writes: you probably don't want to actually do
that during testing, so you can mock them away. API calls: you probably don't want to actually call the APIs, so you can mock them away. And then, in general, external functions that you don't care about testing, that aren't part of this one specific piece of functionality that you are currently trying to test. Often you will have a system where you mock part of it away while you're testing this piece, and then in the next test you're going to test that piece and mock the first piece away. So it doesn't mean that you're never going to test this piece of code, but it's not relevant to the current thing that we're trying to understand.

unittest.mock is part of Python's unit testing library, which is in the standard library in Python 3. If you're using Python 2, don't; but also, you can get mock as a library from PyPI, since it's backported to Python 2 as well. Python's mock library has a variety of mock objects in it, but almost always what you want is the default MagicMock object. These objects behave in interesting ways, and you can alter their behavior in a bunch of ways, but by default they accept any call that's made to them, any attribute access, any method access. They don't throw any errors no matter what kind of call you make on them. Every time you make any sort of call, the return value of that call is a new MagicMock object, and similarly, any time you access an attribute of a MagicMock object, that attribute is created as a new MagicMock object. And then all of these MagicMock objects record everything that happens to them: each MagicMock object records any call that's made to it and the arguments of that call, so they're really useful for assertions later on.

So this is just me playing around in a REPL with MagicMock objects. If you're not familiar with these objects, if you haven't used them a lot, I highly recommend this exercise of just trying a bunch of things, seeing what gives you errors, seeing what the return values are. It's super useful to just get a sense
for what happens, because it's not very intuitive; they're very strange objects. So the details of what I'm doing here are less important than the idea that it's really useful to play around with them a little bit.

I said mock objects do two kinds of things. One of the things that mock objects do is provide actions: they return stuff when you call them. As we just said, by default they return new MagicMock objects, but that's not always what you want, right? In the database read function that we just talked about, we wanted to actually return a DataFrame, because we wanted to know what would happen to that DataFrame in the rest of the function. So there are a couple of ways to get your mock to give stuff back.

The first way is return_value. This one's pretty straightforward: you set a return value, and then every time you call that mock, you get that return value back. So if I set my return value to be a list of three integers, every time I call that mock I get back a list of three integers.

The other way to get things back is side_effect. side_effect is very unintuitive to me, but it's super useful and you're going to use it a lot. There are three main ways to use it. The first one is the most common. Let's say I have a function that gets called several times in the function that I'm testing, and I want to mock it, but I need a different return value every time I call it. You do that with side_effect: you can set side_effect equal to any kind of iterable, and then each time you call the mock you get back the next thing in your iterable. So the first time I called this mock object I would get back the first object, the second time I called it I would get back the second object, and so on.

Another thing you can do is set the side_effect to be an exception, and then any time you call this mock, it will raise that exception. This can be
useful for testing error-handling paths in your code. I don't use it all that often, but that's one thing it's useful for.

Then this next one. These are all side_effect; it's the same attribute for all of these, and it does a couple of different things. The last thing is that you can set it to be another function. If you set your side_effect to be a function, then every time this mock object is called, the side-effect function is also called with the same arguments given to the call to the mock object. If that function raises an error, you get the error; if it doesn't, the mock returns the function's return value. So here I have a lambda function that takes one variable and checks if that variable is even or odd (this is just some modular math, if you're not familiar) and returns the string "odd" or "even" depending on whether the integer given is odd or even. So now, after setting the side_effect, if I call this mock object with one argument that's the integer 1, I get back a string that says "odd". If I call it with the integer 20, I'll get back a string that says "even". If I call it with three integers, I'm going to get an error, because that doesn't match the signature of the side-effect function. These functions are easy to do weird things with. I have only ever used them to do very hacky things that I would not really recommend, but they can be useful in certain cases.

Okay, so this is a bunch of actions that your mocks can take, the things that your mock can do in a test function. The other thing that we care about is the assertions that we can make on the mock objects when we're done with the test. So you mock some stuff, you patch it into the functions that you're trying to test, you run the function that you want to test, and then you have these mock objects that have recorded everything that's happened to them in the test. Sometimes you care whether or not something happened to them. A good example here: if you're mocking a write to a database, you might
well want to mock that write, and then you will want to assert that you wrote the thing you intended to write. So you want to check that whatever your write-to-database function is got called with the DataFrame that you were intending to write to that database. There are all kinds of assertion methods; this is a small subset of them. You can assert that a mock was called at all, you can assert that it was called exactly once, you can assert that it was called with certain arguments. You will also sometimes get into cases where the precise thing that you want to assert is not one of these methods, and in that case call_args_list is super useful, because it gives you back a list of all of the calls made to that object, and then you can do whatever parsing and whatever checking you want on them. So it's a little less clean but a lot more flexible.

One warning: you want to be wary of typos here. Remember that mocks accept any call made to them, with very few exceptions. This first case, assert_called_once, is a real mock assertion method that will check if that mock was called once; if it was, it passes, and if not, it throws an exception. In this next one, maybe I forgot exactly what this API was and I called mock.called_once. This is not a method that exists in the mock library, but it's not going to throw an exception, because mocks accept all calls made to them. You will instead get back a MagicMock object, and it will continue on its way, which means your test will never fail, which means you think you're testing something but you're actually not testing anything at all. So it's a good thing to be wary of. They have, in recent releases of mock, made some concessions to this: they've made it so that any method starting with "assert", if it's not actually an assertion, will throw an error. So if you're bad at typing like I am, and you call an assert_called method but you spell it wrong, then that will throw an error. So they've
started to help it you out a little bit but it's still something to be wary of mocs also have some like particular gotchas one is where to patch an object so when you call Mach that patch you give it a string that's a path of Python essentially Python imports but it's not what you think it's going to be if you're not used to this framework and if you don't think super carefully about how Python imports work so if I want to mock panda like pandas not read CSV call in my function I might think that the second way is the way to do that I want to mock pandas read CSV so I'm gonna call mock on this string pandas dot read CSV that's not gonna work instead you have to mock the object where it's used not where it's defined so in this first case in the with the green check mark here I have a module called PI test examples a file called functions to test in functions to test I import pandas as PD and then somewhere in that file I say PD read CSV so at the point in time where mock can stick my mock object in where it's supposed to be um pandas can replace these objects I'm sorry I am overtime I apologize the other mod is important the other Gacha is import order you can pay attention to these things in a future moment these it's a it's a good thing to check out the the order of your mocs is opposite of what you think it should be when you patch things though these last are just a couple of resources these are a couple of useful libraries you should check out and then these are some useful blog posts to read this bitly link contains up further and mark sense of list and this is a link to the repo that has all of these examples and will also have these life later on qualities from running over if you have any questions talk to you later [Applause]

Info

Channel: PyData

Views: 12,856

Rating: 4.8481011 out of 5

Id: Da-FL_1i6ps

Length: 39min 11sec (2351 seconds)

Published: Fri Feb 01 2019