TDD Full Course (Learn Test Driven Development with Python)

Hey everyone, Wes here. In this series we're going to be building a complete web application from scratch, and we'll be following a practice known as test-driven development.

Now, I'm sure you've heard of TDD if you're watching this video. It is essentially the practice of writing tests before we write production code. This was an idea that was really championed by one of my developer heroes, Kent Beck, along with his ideas about extreme programming, and I believe he described it as a way to essentially manage fear when you're coding. The idea is that we make assertions about what we would like our code to do before we actually write the implementation in production code to do it. If you're interested in further reading about this directly from Kent Beck, I would highly recommend checking out his book Test Driven Development, which I know is incredibly popular. Others have also written about it: Uncle Bob, in The Clean Coder, talks about its benefits at length, and I believe there's some of it also in Clean Code by Uncle Bob.

It has also been kind of a controversial topic in the past. I know that some people really see a benefit in writing tests before they write code, whereas others say that whether you write your tests before or after the code, the important thing is that you really have a way to exercise the production code that you write. But I think the general consensus among experienced programmers is really that TDD is an excellent practice to adopt for ensuring that you're writing testable, well-designed code.

Now, I do want to make a little bit of a distinction here. While TDD is extremely useful as a practice in writing testable code and building designs that you can test, it isn't going to make design decisions for you per se. So while it may be the case that you can write bad code with TDD, I think it's just far less likely, because in order to test something in code, generally you need to be able to isolate it.
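As a rough illustration of the test-first idea (this example is not from the video; the count_words function and its two tests are hypothetical names made up for this sketch), the assertions in the test class would be written first, and the small function added only afterwards to make them pass:

```python
import unittest


def count_words(text):
    """Hypothetical function under test: counts whitespace-separated words."""
    return len(text.split())


class TestCountWords(unittest.TestCase):
    # In a TDD workflow these assertions come first and fail
    # until count_words exists and behaves as described.
    def test_returns_zero_given_empty_string(self):
        self.assertEqual(count_words(""), 0)

    def test_counts_words_given_sentence(self):
        self.assertEqual(count_words("tests come first"), 3)


# Run the two tests programmatically and keep the result object around.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCountWords)
result = unittest.TextTestRunner().run(suite)
```

The function depends on nothing but its input, which is what makes it easy to test in isolation. In a real project the tests would more likely live in their own file and be run with a test runner from the command line.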
You need to be able to assert something about the functionality of a piece of code independent of some particular state, or at least with state controlled in such a way that you can say: given the state of the world is a certain way, the piece of code that I've written should behave as expected. Whereas if you take an approach of writing tests later, maybe after you've written a lot of code, there's a chance that you will have written some code throughout that process that's going to be really hard to extract and isolate without calling perhaps many other functions or methods, or setting up a whole lot of state, just to be able to test what you've written. So for that reason alone, I think TDD is a really good practice to at least learn, and maybe practice for yourself and see if you think that it helps you.

One of the other benefits of TDD is, of course, the fact that the code you produce when you are doing TDD is covered by tests, so at any point when you're working on your system you can run tests that make assertions about the code you've already written. In that sense, you can change code and extend code with much more confidence than you could otherwise, because you can actually test and make certain assertions about how your code behaves, as a result of having written tests for it as you wrote that code. I think that's clearly a benefit. You end up with a whole bunch of unit tests, each of which explicitly states the expectations that certain code has, so in a sense it forms a living documentation for your code base. As long as those tests are green, anyone can go into the test suite, look through all the test methods, and get a real sense for what the code is actually doing. This is going to be far more valuable than static documentation that's written at any point in the software's lifetime, and that's because as long as those tests are passing and the tests are written properly, then you can be certain
as a developer that those tests clearly document the working state of the code.

Okay, so now I want to talk about the three rules of TDD. The first rule is that you're not allowed to write a line of production code before you write a failing test. The second rule is that you're not allowed to write more of a test than is required to fail, and not compiling counts as failing. And the third rule of TDD is that you are not allowed to write more code than is required to pass the failing test. So what happens is you go through this loop: you write a failing test, you see the test fail, you read the exception, you see why it fails, and then you write a very small amount of code, just enough to get that test to pass. Once that test passes, you can go and clean up, reduce duplication, and do small incremental refactoring, and then continue on with that cadence: write another test making an assertion about the system, see it fail, write just enough code to make that test pass, do a small bit of refactoring, and work your way in this cadence through changes to the system.

If you've never done this before, it may seem incredibly foreign, and it might feel a little awkward writing a whole bunch of failing tests. It might even feel awkward being in this loop where you're actually executing your code every 30 seconds or so. If it's a compiled language, you're making sure that it compiles, you're making sure that the functions and methods you call are returning the types of results that you expect, and effectively you're exercising your code as you're writing it. So rather than spend hours and hours writing code, only to uncover maybe a dozen or two dozen bugs and then go back through and try to fix all of them, you are essentially fixing everything that is immediately broken as you work. One of the side effects of this, as I mentioned earlier, is that you end up writing code that's really loosely coupled, because
if you want to test the behavior of something, again, you want to test it in isolation. So it really plugs into a lot of ideas about clean code. Things like dependency inversion come into play here, the single responsibility principle, and many other principles. You find yourself really understanding what your code does, and writing it in such a way that it's easy to compose more complex functionality as you build it into the system.

So we're going to kick things off here. We're going to take a look at the application that we'll be building in this series. We're going to start from scratch, so we're going to write every test from scratch, we're going to look at the kind of cadence that we can get into when we do TDD, and we'll demonstrate how it can be used to build real-world applications. I hope you guys enjoy this series. Thanks for watching. If this video series is useful to you, I would really appreciate it if you liked and subscribed, and with that, let's go ahead and get started.

Okay, so let's take a look at the application that we'll be building the back end for in this series. We have a named entity finder, and the way this is going to work is that we type raw text into our search bar, for example: Judit Polgár is a Hungarian chess grandmaster from Budapest. When we select Find Named Entities, we get two different visualizations in response. First, we have the sentence itself parsed, and we can see the named entities highlighted in that sentence. Then we also get a table below with each entity and its particular type. We're going to be building the back end for this, where we receive this raw text at an HTTP endpoint, parse it using spaCy, which we'll look at momentarily, and then serialize that data and send it back to the client so that we can construct this front end. We'll be doing all of this using a test-driven approach, so we'll look at how we can get into a rhythm with TDD so that we write our
tests first, then we write the minimum amount of code that's required to get our tests to pass, then maybe we do some cleanup, and then effectively we just repeat that process until we have the fully tested features that we would like to spec out for this particular application.

So now, a quick look at some of the dependencies. Before we start, it would be good to install virtualenv. We'll be using Python, and with virtualenv we're going to be able to scope our project dependencies to this particular project, as opposed to installing the dependencies globally for the particular Python interpreter that you have on your computer. Anytime I make a new Python project, I'm typically going to be using virtualenv to isolate the dependencies just to that particular project. I'll paste links to all the sites that we visit here in the description, but you'll need to install virtualenv to set up a virtual environment for this project.

Next, we're going to be writing end-to-end tests as well as unit tests. With the end-to-end tests we're going to be doing some browser automation with Selenium, and for that to work we're going to need to choose a particular browser that we'd like to automate. For the purposes of this video I'll be automating Firefox using geckodriver, so head over to the link for geckodriver to download the latest binary if you'd like to use Firefox for the end-to-end tests. If you'd prefer to use something like Chrome, you can pick up ChromeDriver instead. Either one is definitely going to be compatible with Selenium.

Okay, next, I mentioned spaCy. We don't need to download anything here yet; we'll be installing all the Python packages with pip. But I did want to mention the website for spaCy. It'll give you a really great introduction to the library. It's a really powerful natural language processing library for Python, and we'll be using it specifically for named entity recognition, but it does much more than that. It's a
fully featured natural language processing library, so we can do anything from tokenization to part-of-speech tagging, and build all sorts of interesting natural language processing pipelines. It's a really fantastic library. I've used it at various jobs and on personal projects for NLP tasks, and it's the library I would definitely go to for this type of application.

Next, we'll be using Flask, which we'll also install with pip, but I did want to point out the Flask website here if you want to get more experience understanding what Flask can do. We're going to be using it to build a few very simple endpoints. We'll be building a GET endpoint and a POST endpoint, so that we can get the page, serve up some static HTML, and then make POST requests to post the data to our app, which is then going to serve us back the data we need for our feature.

And just to try out another example here, to demonstrate that it works for arbitrary text that we input: Mumbai is on the west coast of India. It does a pretty good job in general, with the language model that's loaded for our purposes here, of identifying named entities. So this should be a lot of fun.

All right, so let's go ahead and get started. The first thing we need to do is make a new directory for our project. I'm going to call this directory flask ner, and then we're going to move into that directory, and it'll be here that we create our Python virtual environment. We can do that with the command virtualenv env -p python3, which creates a new Python 3 virtual environment in this directory. Now we can activate it by running source env/bin/activate, and we're ready to install our dependencies. We'll pip install spacy selenium pytest flask, and this will just take a few seconds. Once this is complete, we're going to run python -m spacy download en_core_web_sm, so this is going to be the small
language model, the small English language model for spaCy, and it'll be the language model that we're using for our app. You can see that we get a message here that we can now load the model via spacy.load('en_core_web_sm'), and we'll take a look at that when we actually start writing some code.

The next thing I'd like to do is create a directory structure. Right now we just have our env virtual environment directory, so we'll make a few directories here: static will be used to serve up our JavaScript and CSS, templates will be used to serve up our index.html Flask template, and test will be where our unit tests and end-to-end tests are located.

Finally, we're going to edit a new file called setup.py, and this is going to allow us to actually install our current directory as a package. So: from distutils.core import setup, from setuptools import find_packages, and then here we just need to call setup and pass it a few arguments: the name of our app, which we can call flask ner, the version, which I'll say is 0.0.1, and a description, a simple NER API. Okay, so we'll save this file and then run the command pip install -e . (note the trailing dot) to install this directory as a package. The -e means editable, so we'll be able to edit the code in this directory and it will continue to be treated as a package without needing to reinstall it, if you will.

So let's start our TDD approach here with some unit tests. I'm going to move into the test directory, and in any editor of your choice just go ahead and open up a new file, which we're going to call test_ner_client.py. Here we're going to import unittest, which will be the library that we're using for our test runner, and we're going to create a new class called TestNerClient which extends unittest.TestCase. When we write our unit tests, they'll just be methods on this class. This class is a subclass of the unittest.TestCase base class, and so this provides us some
helper methods that we can use to do things like make assertions.

So let's think about the business logic that we want to exist. We don't have any code yet whatsoever, and we're inside of a test here, so we need to think about the behavior of this NER client object that we're going to use to extract some named entities. The way I'm thinking about it is that we will have this object, which maybe has a method on it that receives a string as input, and as output it's going to return some type of data structure that contains our named entities and the sentence, for instance, however that's going to be represented.

So we're going to write a new test here, and just to get warmed up, we're going to make maybe the simplest type of assertion we might make about such a client and its behavior. Let's imagine that we give this thing an empty string. We would still want it to return that data structure, and maybe it just doesn't have any data on it. Let's imagine that this data structure is a dictionary, and so our test might be that some method get_ents returns a dictionary given an empty string as input, nothing more or less than that. We're going to code by wishful thinking, in the sense that we're going to imagine that we had something called a NamedEntityClient and that we were able to get some entities from it by calling ner.get_ents. In this case we're going to pass it an empty string, and then we're at least going to assert that what we get back is a dictionary. So we can say self.assertIsInstance, passing ents, which is the result that we get back, and the Python type, which is dict.

Right now I'm imagining a structure that looks something like this: just a Python dictionary which has two keys on it, ents and html. ents is just some list of other dictionaries, perhaps, or maybe of an object type which has some properties on it, like the type of named entity and the text for the named entity, and then the other key in our dictionary will be
the HTML that we can output to our client to do the labeling of the named entities in sentence format. We'll do it like this because I know spaCy has the ability to give us HTML like that, so we might as well leverage it. The ents key, which contains a serialized list of entities as we would like to display them in our table, will be useful as well. So we'll imagine a structure like this getting returned from get_ents, and we'll keep this in mind as we think about how we might want to build the behavior for this NER client.

Okay, so all we have to do now is run this test. I'll be running the tests within Vim using a plugin (I'll put a link in the description if you're a Vim user and are interested in unit testing in Vim) that makes it really easy: I can hit my leader key and then another key on the keyboard and just run the current test under the cursor, which you can see is currently failing. But if you're just working from the terminal, what you can do is run python -m pytest, and this will also search, I believe recursively, through the directory and find all the unit tests. So either works.

Right now, as you saw, the test is failing, so let's examine it. As we do TDD, when we write our tests they're going to fail initially, and they're going to give us clues, if you will, about what we need to do to move forward. Here we see we have a NameError: NamedEntityClient is not defined. Well, that's true, we haven't made it yet, so let's go ahead and make it. I'm in the test directory now, so up a directory I'm just going to create ner_client.py. The directory structure should look something like this: ner_client.py is just in the root of our project directory. Here we'll have a single class, NamedEntityClient, which we need to initialize, and we don't really know what else we need to do yet, so we're just solving one
step at a time. If we head back to our tests, we should now be able to say from ner_client import NamedEntityClient, and when we run our tests we'll see that we're failing for a new reason now, which is that the NamedEntityClient object has no attribute get_ents. So let's go ahead and fix that: def get_ents, which we know takes a string input, so we'll just call this sentence, and we're asserting that it returns a dictionary. Let's start with the smallest amount of code that gets our test passing. We'll head back, and now if we run our test we can see that it's passing.

This clearly isn't the behavior that we ultimately want, but it's enough code to get green for the given test. We're simply making an assertion that our get_ents method returns a dictionary, and the simplest code to actually do that is what we have here: get_ents takes in a string, sentence, and returns a dictionary. So we're going to have to write more tests to explicitly test more behavior than this, and our existing tests should keep passing, because the assertions that we're making about the behavior of our code don't change.

So let's imagine a very similar test: test get_ents returns a list given a non-empty string. We're going to have our NamedEntityClient again, we'll grab our ents, and this time we pass a sentence, a non-empty string: Madison is a city in Wisconsin. And actually, if we run this test, we'll see that it also passes. Right now our code doesn't distinguish between an empty string and a non-empty string; in both cases it's returning an instance of a dictionary.

Okay, so let's start thinking about some more concrete behavior than just whether or not it returns a particular Python type. What we need to do here is think about the behavior of this class. Our tests are going to help us explore what the behavior of our class is through assertions, but we need to be very precise, if you will, about the behavior that we're making assertions about. So we
have to keep in mind that in this particular application we're not in control of developing the statistical language model that's being used to do that named entity extraction for us. If we were, say, building a custom model, which we could do with spaCy, then we might need to make assertions about the fact that it extracts things like Madison every time, or Wisconsin every time, from this string. But rather than make assertions about the behavior of the statistical model, which we just downloaded, we need to make assertions about the behavior of our object, the NER client, and about the structure and content of the results that our named entity client presents us with, given that its dependency, spaCy, gives us something in the first place. So we're not trying to test spaCy here, we're trying to test our NER client. The way we need to think about this is: given spaCy gives us A, then we expect B to be some condition that's met as a result of its use in our object.

I just mentioned that spaCy is a dependency of our named entity client, and that's true. When we think about dependencies, something pops into our mind here, which is the dependency inversion principle. We could do something like this: if we come back to our NamedEntityClient, we know that we're going to need spaCy, so we could import it here, and then here we could do something like self.model is equal to spacy.load('en_core_web_sm'). With spaCy, this returns us a callable that we can pass a string to and get a document back. Then here we could do something like doc is equal to self.model, passing it sentence, and then we could work through extracting the named entities from this doc, which is something that spaCy is going to help us with. But here we're tightly coupled to spaCy existing in the first place, and anytime we create an instance of a NamedEntityClient we're going to be loading this language
model into memory, or at least holding a reference to it in the client. If we're testing the behavior of this client, we don't really want to have to load this external statistical language model every single time we new up an instance of NamedEntityClient to test it. Furthermore, we may wish to use different models, or maybe not use spaCy at all. One of the ways we can accomplish that is through exercising the dependency inversion principle from SOLID. The idea is that rather than have NamedEntityClient depend on some detail, which is this model, it should instead depend on an abstraction that gets passed to it. So instead we can pass model here, and we'll say self.model is equal to model. In fact, the code in the method still acts the same; it's just acting on the model instance that's on NamedEntityClient. But rather than NamedEntityClient being in charge of instantiating that model, we pass it in from a higher level of abstraction, wherever we're collaborating with NamedEntityClient. We're not going to write any more code just yet, we're still going to return a dictionary, except the way that we construct a NamedEntityClient, with this new parameter in the initializer, will be a little bit different.

So now we know that our NamedEntityClient needs to take some model to get constructed, and to test this we need something to stand in place of the spaCy NER model. So we might have an NER model test double. This is where you hear about things like mocks and spies and other types of test doubles. We're not going to write out a full mock of the actual NER model provided by spaCy, but we do need to know a little bit about how it works to create a lightweight test double that can stand in for the behavior of the real spaCy model in production. When we're testing, we just need something that behaves in a way that we can control, that behaves like the spaCy model but doesn't load this entire language model every
time that we create a new instance of our class. Rather, it just behaves in such a way that we can say things like: given spaCy returns a named entity collection for something like this particular string, then our named entity client parses that result from spaCy correctly and puts it into a format and structure that it's responsible for and that we can work with. We can certainly make assertions about that.

Okay, so if we run our first test now, we're going to fail because NerModelTestDouble is not defined, so let's go ahead and define it. I'm going to create a new file called test_doubles in our test directory, and we'll make this class NerModelTestDouble. In order to make this, we should look at how spaCy actually works. So I'm going to move into our flask ner directory, source env/bin/activate, and then just start a Python REPL. We'll import spacy, and then typically what we do with spaCy is something like this: nlp is equal to spacy.load, and then we load in that model that we downloaded, the English core web small model, en_core_web_sm, I think. Then we say doc is equal to nlp, and you can see nlp is a callable, so spacy.load is returning us some Python callable, whatever it is, and we provide it with something like Madison is a city in Wisconsin. This in turn returns us a spaCy Doc, so if we do type(doc) we can see that this is an instance of the spacy.tokens.doc.Doc class. This doc has entities on it under ents, so we can see Madison and Wisconsin, and each of these ents themselves contain some additional data. So we can say something like ent.text and ent.label_ (and I'll mind the typo here) for ent in doc.ents. Here we can see that spaCy has all this metadata about the various things that it extracts. It's incredibly useful, and what we
need to do now is create a test double that can stand in place of the actual dependency we have in our client. So we need to create a mock, if you will, for nlp here, and then be able to set the behavior on the doc such that we can control what the ents look like. As users we're going to be passing arbitrary strings to this callable that we get back, which adds a little bit of complexity to the double, for sure, but we are then going to be getting some doc.ents given some string. The crucial point is that when we're writing tests we don't know what a user is going to type in here, so we don't know what the contents of doc.ents will be. But we do know something about the structure. We do know that there are a limited number of labels that spaCy can provide us with, and we know that the structure has certain properties on it: each entity has text, a label, start and end chars, and some other things that we can make use of. We're going to create a test double that behaves in a way that's useful for us to make assertions about the behavior of our class. So let's keep that structure in mind, and I'll leave this here just so we can see what spaCy actually does and how we can mimic that in our test double.

I can see that when nlp is initialized, it's initialized with some model, but what's important is that when we create this we get back a callable, which in this case is called nlp, and we call that with a string. So for our model test double, I'm just going to create an initializer here which we can pass a name for a model, and we can set it. This part doesn't really matter too much. But when we call it, we want it to return this spaCy Doc type, and we can create our own test double for a Doc as well. The key point is that we're going to return a DocTestDouble here, so that's what we'll do, and we're going to pass this a sentence
and we're going to pass it self.ents, because that's what we want to control. I'm just going to rearrange the panes here to make this a little bit easier to read.

So let's look at how we can control this. I'm going to create a method on our NerModelTestDouble called returns_doc_ents, which just takes in some entities, and we're just going to set those on the model instance. Now when we return a DocTestDouble, we set the entities for that doc on it straight away. This is going to allow us to do a typical sort of mock setup syntax, where we can say our test model is a new instance of NerModelTestDouble and it returns some controlled set of doc ents. We don't have strict interfaces in Python, being a dynamically typed language, obviously, but as long as we write our test double to sufficiently conform to the interface of the aspects of this model that we're using, namely that the ents take the same structure, then we can actually test the behavior as if it was spaCy under the hood providing us with the same type of structure.

So let's make our DocTestDouble. This is a test double for a spaCy Doc, so it's going to take in sent and ents, and we can see the structure of spaCy ents here, so let's just go ahead and replicate it. Each of the ents is going to be some object. This is actually called a Span in spaCy, so what we'll have is a SpanTestDouble, and I promise that'll be the last test double. That's going to get the ent text and then the ent label, for ent in ents. So really all we're doing here is setting self.ents on the doc to this list of spans, which have the text and label on them. In fact, this should be label_ (label underscore), as we can see that's the format that spaCy uses. So we're just replicating what spaCy would do, except we're saying we want to be able to set this, because we want to say: given spaCy has this behavior, we're going to make an assertion about what we do with it. So the last thing that
we really need to do now is to say class SpanTestDouble, which itself just takes a text and a label, and sets self.text to text and self.label to label.

Okay, so this is about as lightweight a test double as we can get away with, and I think that is really a good way to start. The other thing we could do here is bring in an external library, something like pytest-mock, or just monkey patch the methods on the classes of our dependencies, but I really like the control we have here. The one thing to keep in mind, as a bit of a trade-off, is that we have to be very careful that the doubles we create are true stand-ins, at least as far as the methods we're using here are concerned, for the way that our dependency actually behaves. So, just some thoughts to keep in mind. I really like doing it this way: you can see that in less than 30 lines of code we have a useful test double, and we can still exercise inversion of control.

Okay, I'm going to close our REPL here. Oops, and we have a typo here that we should fix: we need to use the list comprehension properly. So I think that's correct: SpanTestDouble, which takes in the text and the label, for ent in ents.

While we're here, I'll also show you a trick if you want to patch a method in place. We may not need to use it here, but I wanted to show it because I think it can be useful if you're building your own test doubles. A lot of times I'll create a patch method on a class whose method I want to patch, and then I write a nested function just to return whatever the return value is, and then just setattr the particular name of the method that we want to patch. We pass that name as a string, we set it to the patch that returns the value we want, and then we return self. This is nice because you can imagine a case where the test double that you're using in place of a real dependency calls a particular method, and here if we patch the
method in place and control its return value we can hold that constant to some return value and then make assertions like given some particular method on this dependency returns a value then we can do a test that way in this case for our purposes we're actually just patching the attribute ents which is just like a field on the instance and so we don't need to patch the method call but we can do that as well so i just wanted to throw that in here we'll head back into our ner client or rather the test for the ner client and so we have our model we need to bring that into play here so from test doubles import ner model test double and then here we can say model dot returns doc ents and let's imagine that it just returns an empty list and so we're not really testing just that it's given an empty string but what we can say is that given an empty string causes empty spacy doc ents and so if we run this test now we're passing again this test will be failing because model is not defined so we'll bring in our model and look what happens if we don't set the return for the doc ents and we'll say model returns doc ents empty list and what we're testing here is test get ents returns and i said dictionary or a list here this should be dictionary given non-empty string causes empty spacy doc ents okay so this is the case where we pass it an actual string and we're setting it up such that the natural language model doesn't return us any ents for that string that we get back a type of dictionary okay so now let's write something a little bit more interesting that we would like to see before we continue writing code for our client here we might say something like test get ents given spacy person is returned which is one of the spacy types this serializes to person which is the type that we would like to return with a different spelling here in this case so we want it to be more human readable than these sort of label tags that spacy returns so given we
have some model we're going to say that that model returns doc ents it looks something like this we'll say doc ents as a variable here to make it a little bit cleaner so spacy's going to give us in some cases something like this text and a name of an entity here so i'll stick with chess players and we'll go with laurent fressinet who's a strong french chess player and it's going to return us a label that says person in all caps so then we're going to create an instance of our named entity client with the model that we've prepared it doesn't matter what we pass it ner dot get ents but we're saying given that we pass it some string doesn't matter what the string is if spacy tells us the named ents in the string you passed it are this then our expected result is completely under our control so what do we want to see here well i don't really want to see text and i don't really want to see label underscore with a capitalized string person what we want to see is maybe something like this we talked about previously it's up here so our expected result is to be a dictionary so let's fix that it's going to be ents here this is a list of dictionaries which say ent and it would say laurent fressinet and we just wanted to say label with no underscore person and then we want a second key in our dictionary which is html and it has some html in here so we don't really know what that looks like yet so let's make an assertion about the ents on the expected result here so let's say that we need to assert list equal that the ents that we get back which is going to be on the key ents is equal to our expected result ents and so maybe we'll call this result to make it a little bit more clear so we're saying the ents key on the result object that we get back the result dictionary that we get back should equal the expected result and specifically the expected result ents key that we've set up here so given some doc ents that look like this that we get back from spacy our client should
behave in such a way that it returns to us a structure that looks like this and we don't know what the html looks like yet so we're just going to test for the ents key value at this time so let's run the test we'll see it fail we can see html is not defined so that's a name error that's not the type of error that you want all right so we'll try again key error ents okay so we're getting a key error because result that we're getting back doesn't have an ents key on it so let's fix that so now we finally go back into our code and we'll add a key ents it's an empty list for now and while we're in here we'll add the key html which will just be a string all right so let's run our test and now we can see that our lists differ so empty list is not equal to our list with laurent fressinet and a label of person on it so let's add some code to make this test pass we're going to come back into our ner client and so we have our model let's use it to get those ents off of it so we're going to say our doc is self.model dot sent or we pass it sent rather this is how we use spacy so we saw this a moment ago at the repl and so the entities that we get back here take this form ent is ent.text from spacy label will be ent.label underscore from spacy for ent in doc.ents and then ents will be entities and html is still empty string so let's head back to our test and we run our test oops and we need to make sure that this is a colon here okay so we'll head back to our test and we'll run our test now we can see sent is not defined so where we have get ents so we have another error here so this should be sentence head back to the test and we run our test now we can see something very interesting so we're still getting back just person in all caps rather than person which is what we would like to see so we're using the spacy tag here and we want to use our own custom tag that we mapped to so let's write the minimum amount of code that's required to fix this problem so what we want to
do here is we really want to map this so let's create a method map label or something and it can be a static method we're going to have some label map which is a dictionary and so given some spacy label here like person we just want to return person all right so that's the minimum amount of code required so now we can return label map dot get label okay so if we go back to our test and we run our test again we can see that it passes so you can see now how we're building the behavior of our named entity client by making assertions about its behavior and so it's not as if when we're doing test driven development that the tests are really driving the decisions that we're making about the implementations but what they do is they help keep us focused i think on the goal of the assertions that we're saying our code should do and while it does help sort of shape i would say the type of code that you write the tests aren't going to make sort of architectural decisions if you will they're not going to make decisions necessarily about how your objects collaborate for instance so for instance we could have made our named entity client not use the dependency inversion principle by passing a model into it but just creating a new instance of it inside of the class that for one thing would have made it much more complex to write unit tests for but it also would have made it depend on spacy itself and as you can see here this just depends on something that behaves like spacy and so that's sort of the general idea here so i'm going to move through the following tests relatively quickly and we'll just see how we can continue to extend the behavior of the class so we want to say test get ents given spacy norp which is one of the spacy tags we're going to resolve this to group so we'll just make up something here we want this to be lithuanian with a label of group and if we're going to run this again now we can see label none is not equal to group so
clearly what we need to do is we need to go back to our ner client and add a mapping for norp to be group need to put a comma here we'll go back to our test and run it and we can see that it passes you can also run all the tests with capital t with the vim test plugin that i'm using all right so we could do this for all of the mappings that we want so i'm going to do that really quickly here we want spacy loc to resolve to location we'll run the test we'll see it fail go back to the ner client loc would be location go back to the test and we run everything and everything's passing and so if you're kind of in the flow of things we can make some decisions here we don't have to just rigidly follow i would say this cadence especially when it comes to maybe just like updating a simple mapping here so what i'm going to do is i'm going to add a few more here so given spacy language this resolves to language and we're not going to see every test fail here we'll put in american sign language here and then asl label language just going to put two more labels in here there's a spacy label gpe which i believe stands for geopolitical entity we're just gonna say this is location again just for an example and so if you did want to see all of them fail a few of them will fail here two of them so we'll add our mappings go back to our tests and we'll run all the tests again nope and we're still failing on language that's interesting so let's go back to our mapping and i can see that i have a typo here line gage so this is another reason i'm actually pretty glad that this happened because this just demonstrates that if you are doing tdd or at the very least if you're writing unit tests you will find the typos that you make here if you have an ide that shows you typos then you can make the argument that you might find it then too but this is going to show you a big red error that something is not correct and so just
another reason i love tdd as a workflow for particularly starting new applications it's going to find those mistakes for you and sort of direct your attention to the right place okay so our tests are passing i'm going to write one more test here which is we want to make sure that it serializes multiple correctly so test get ents given multiple ents serializes all so we'll say that doc ents returns this one and say the text comes back as you poker label person what we would expect to get back is analogous all right so let's clean this up a bit and we'll run all of our tests again okay so we have eight passing tests here we could continue to make assertions about the html piece i'll leave it to you to do that because right now i'd like to move on to doing some automated browser testing we'll be doing our end-to-end test now so right now we have made an object that is fully tested actually consumes spacy potentially as a dependency and parses the results that we get back and can provide them in this sort of dictionary format that we have but we don't have an api so let's go ahead and write some end-to-end tests for accessing that api okay so in our test directory i'm going to create a new file in here called test e2e index or test index e2e maybe test index e2e dot py all right we're still going to use unittest as our test runner so we'll import this and this time we're going to from selenium import webdriver webdriver is going to be what we use to actually automate our firefox instance in this case if you're using chrome it'll do the same so we'll say class e2e tests again extends unittest dot test case and you can see that test case just to give you some additional documentation here is a class whose instances are single test cases okay the other thing that we're going to do here that we didn't do with our unit test is take advantage of two test hooks here set up and tear down and keep in mind they need to be camel case here so when we
start an end-to-end test we're going to create an instance of the firefox browser and then when the test is complete we want to close the browser so we don't just have a whole bunch of resources tied up with these open firefox browsers so we're going to set a driver on the instance equal to webdriver dot and you can see there's a number of different options here you could use chrome or even blackberry for some reason or android but here we're going to use firefox and here we just need to pass a parameter executable path and this is why i mentioned it was important to keep in mind where that geckodriver or chromedriver is located on your machine mine's in application slash geckodriver so it should just be an executable or some other type of binary and we're going to call self.driver.get and we're going to go to the root path of our api and so by default flask is going to serve up an app on port 5000 so we're just going to go to http localhost port 5000 and then in tear down we're going to call self.driver.quit now keep in mind that we don't necessarily need to host our app on localhost if this were a production end-to-end testing system then we might deploy the app to a staging environment in which case we could still run these tests from any machine and hit that staging environment so that's just something to keep in mind that is potentially useful all right so we're going to start the api portion of our app and we're going to do this as well in a test driven format we're just going to be using these end-to-end tests as the tests that drive our development so let's make an assertion about what happens when we open a browser and head to the root of our application one of the easiest things we can do is that we could test that the title in the browser contains our app name so we could say test browser title contains app name and we could say something like assert in named entity and then self.driver.title oops i need to fix a typo in executable path so now if we run this it
might seem kind of silly of course but what's going to happen is our browser is going to open up and we're going to go to port 5000 there's not going to be anything running on it obviously and our test is going to fail before anything even happens because it reached an error page and so when this happens a webdriver exception is thrown and we're actually just going to manually close this you could do more sufficient error handling to close the browser in this case as well but we don't have an api running so the first step is to write the minimum amount of code possible to get something running in the browser so that when we hit port 5000 we have the title named entity happening so what i'm going to do is in the root of the project here where our ner client is i'm going to create a new file called app.py and we're going to import flask or rather we'll say from flask import flask we're going to import render template and we're going to import request then we're going to set app equal to a new instance of the flask object just the name of the file gets passed and then we're going to create our first app route so we'll say app.route and then when we go to the root route in other words just forward slash we'll have an index action here and we're going to return render template with an index.html static page in our case it'll be static except for some javascript that's used to do things like show our table and modify the dom i'll place a link to the javascript in the description of this video we won't go through writing the javascript in this video so we're just focusing on the back end but i may make another video in the future about tdd for this specific file in vanilla js so now we need to create this index.html so i'm going to save this file and then in our templates directory we're going to create that index.html file and remember minimum amount of code to get things passing the worst thing would be to leave your project in a state where the tests aren't
passing so if anyone comes along and picks it up at least the tests are passing you might not have the right solution in place yet but no one has to go back and fix any broken tests so that's good so we'll create a title here named entity finder title okay so that's probably the minimum amount of reasonable code for us to write and then the last thing that we want to do here is say if name equals main then we want to run our application and we can do that in flask just by calling app.run and we're gonna pass it debug equals true here as well so we can run it in debug mode okay so that should work we need to actually run it now so in a new terminal here make sure you run source env bin activate of course and then run python app.py this should start your application on port 5000 if that's complete we can head back to our tests and we can try to run our first end-to-end test okay so it ran the browser closed and we can see that the test passed so selenium is going to drive the browser pretty quickly which is good we don't have to spend a lot of time doing manual testing but it's also not always going to be clear what's happening visually so we need to make sure that the assertions and things that we have in here are clear to other developers and testers coming along so let's write some more tests we can say test page title is named or rather test the page heading i should say is named entity finder so here we can use selenium to actually find elements on the page using css selectors so if we have some heading let's say we can say self.driver dot find element by id we can also find element by css selector which i like to do because we can use css selectors now if we were to find by id this actually brings up a good point if we take a look at the index.html so i'll save this go to index.html and we're clearly going to do something like this we're going to have a body i know that we shouldn't be writing code before we write tests but i
think you get the idea here we might have some id that's a heading here okay so we have an h1 tag like this we've got an id on it now when we're doing our end-to-end test we can select any element in the dom by using things like any css selectors but we might choose to use the id because that should uniquely identify the dom node but the other thing to keep in mind here is that there may be people working on the front end of this application who are making decisions about things like the name of the id here or other css selectors that we don't want to depend on when we're writing our end-to-end tests because someone might come in here and change it to say app heading being completely unaware of the end-to-end tests that we have in place which were dependent upon finding an element with this id instead so a useful way to get around this is to make use of data attributes so a lot of times what i like to do is to create one called data test id now when a front-end developer comes in and sees this and hopefully we communicate with the entire team about the reasoning for doing this but we would say hey we have this data test id don't mess with this do whatever you want with the ids but don't change what i have in data test id because we're explicitly using this as the attribute that we're selecting on when we do our end-to-end tests and so this gives everyone else the freedom to update the other attributes on the different dom nodes and makes it explicit that we use this when trying to select it for the purposes of selenium in this case or cypress which is another really fantastic end-to-end testing suite okay so now we have this data test id the identity finder so rather than find element by id we're going to do find element by css selector and then the css selector for a custom attribute or for any attribute is just square brackets and ours is data test id is equal to and we're using an f string here to interpolate some value so i'm going to clean up the quotes here and then
what i like to do as well is just create helper methods throughout this because we're going to be using them over and over again and so we'll say something like input element is well effectively this and we'll interpolate this value in place and then now we can just say self.find heading and we're finding our data test id so then we can self.assert equal and we can say that the heading on our page is named entity finder all right so let's run this test and we saw it flash pretty quickly assert that named entity finder is not equal to none it looks like it didn't find our heading so let's go back to index.html data test id should be the name of the attribute oh and clearly i just need to return this sorry about that so let's try again okay and this is getting the selenium object so forgot one more thing here we can call dot text to get the html text from it so let's try one last time here and now we can see the test is passing all right so let's think about another feature that we might want well we want a way to input some text so we can say test page has input for text we'll imagine that we have some input element self.find something like i don't know index text something like input text and then we can assert is not none input element we would run this we would see it fail and we can see that it's unable to locate the element so now we need to go add that element to the dom we'll add our data test id again and i'm going to be using bootstrap for this so we'll add a form control here as well just to anticipate that so we'll run our test again and we should see it pass okay so let's write a few more tests here we'll test the page has a button for submitting text if we ran this we would see the test fail so i'm just going to go ahead and add the button now and we'll run our test oops and this needs to be not none so we'll run the test and we should see it pass okay so
let's do something a little bit more interesting here let's assert that we want to be able to write some text into the input box and then click the button and this should generate a table for us so we're going to write test submitting sentence creates a table for us and here we're going to get our input element and we're going to get our submit button and you can see how we might want to extract these into some class that is reusable but for now we'll just reuse these we can call send keys on the input element so if we say send keys and then we say something like i don't know france and germany border each other or share a border in europe and then on the submit button element we can just call click and then we can say we have some table which we should find at ner table and then we can assert is not none table all right so if we run this we're going to fail for a different reason this time and that's because we don't actually have the javascript to handle that click event to then hit our back end api which will generate our table for us so what we can do is well it'll say unable to locate the element so let's actually go set up the html so that that element exists in the first place and then we'll incorporate the vanilla js script which will append elements and handle the fetch request to an api endpoint which we need to make so here we have some table we've got a heading for entity and a heading for type and then for now an empty tbody okay so the other thing that i want to do here is i want to bring in the javascript file that is responsible for hitting our endpoints and handling the on click events that's on the button here and then handling the dom updates as well so i've just copied that file in here so it's a javascript file and i'll paste a link to the repo where you can pull this file it's pretty minimal it just does a little bit of dom manipulation and invokes the fetch api to an endpoint that is expected to be running on our application at forward slash
ner it's going to accept a post request and it's going to attempt to get those results back from us and so it's going to actually use or invoke the ner client at that endpoint so we'll need to make sure that that actually works so if we run this test it actually passes and it's sort of like our test is named incorrectly here because it's not that submitting the sentence creates the table but the table already exists so we can say page has ner table on it now for something slightly different we'll look at how we can test drive some integration tests here to actually build out the post endpoint that our ner client will be used in so what i'm going to do is in our test directory i'm going to create a new test module called test api dot py and we're going to import unittest as usual we're also going to import json and from flask we're going to import request then from app which is our app we're just going to import app we'll make our class test api with unittest dot test case so this is all standard stuff but now we want to make some assertions about the ner endpoint so we need to make a post request at forward slash ner so let's test that our ner endpoint given some json body let's say returns a 200.
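A minimal sketch of how a test module like the one just described might begin. It is an editor's illustration rather than the code typed in the video: the inline `app` and its placeholder `/ner` handler are assumptions standing in for the real `app.py`, included only so the example runs on its own, and the public `status_code` attribute is used for the assertion.

```python
import unittest

from flask import Flask, jsonify, request

# Assumption: a stand-in for the tutorial's `app`, which would normally be
# imported with `from app import app`.
app = Flask(__name__)


@app.route("/ner", methods=["POST"])
def get_named_ents():
    # Placeholder handler: read the posted JSON and return a generic
    # success payload. The real endpoint would call the NER client.
    data = request.get_json()
    return jsonify({"received": data is not None})


class TestApi(unittest.TestCase):
    def test_ner_endpoint_given_json_body_returns_200(self):
        # Flask's test client sends requests without starting a server.
        with app.test_client() as client:
            response = client.post(
                "/ner", json={"sentence": "Steve Malkmus is in a good band."}
            )
            self.assertEqual(response.status_code, 200)
```

Saved as a test module, this could be run with `python -m unittest`. Without the `/ner` route the same test would fail with a 404, and without `methods=["POST"]` it would fail with a 405.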
so we're using a post endpoint here and we're thinking in restful terms sometimes that would be like a create of a resource and then so a lot of times you get a 201 or another 200 to describe that something's actually happening we're just going to return a generic sort of success response here and the way that we can set this up with flask is flask provides a nice method on app called test client so we can actually say with app.test client as client we can make a post request so let's say our response is equal to client.post at we're going to test by wishful thinking here and imagine that we have an ner endpoint with some json we're going to post a sentence this is what we would like to be able to post and we can post a sentence here steve malkmus is in a good band then we want to assert that the status code on the response so this is underscore status code in the case of flask is equal to 200 so let's see what happens when we run this test well the name rest is not defined because i called it response so let's try again and so now we're actually seeing that 404 is not equal to 200 so it didn't find this endpoint so the easiest thing we can do here is go back to our app actually app.py and create a new endpoint for it so we'll say app.route forward slash ner and we'll call this endpoint get named ents and the data since we're posting json we can just get off of the request so we can say request get json for this we need to make sure that we've imported request from flask and then we'll just return json.dumps some response so what does the response look like well we don't know yet let's just return true okay so if we come back to our test api and we run our test again now 405 method not allowed so that's interesting we need to update app.py to make this a post endpoint and we do that with methods equals a list and with one endpoint or one verb here post let's go back to test api and we'll run again okay json is not defined and steve malkmus's name is spelled
incorrectly so here we just need to import json and run our test and now we can see our test passes so now we have an endpoint that returns a 200 when we post some data with it in this case we have something about steve malkmus in it all right so now we're going to write another test and this is going to be basically an integration test i mean we're going to actually invoke our class and we could write a test to mock out the dependency of our test client and sort of test this like it was a controller test for instance but i'd like to move ahead and just do a real integration test with spacy involved and so given some sentence whatever it is let's make it so that we'll craft a sentence that we know spacy's ner will actually return some named entities for us and we'll make an assertion about it this will be useful so we'll say test ner endpoint given json body with known entities returns entity result in response okay so now we will again say with app test client and we could pull this out into a test fixture but seeing as we're not going to get much more involved in this this is fine for now okay so we've got our response where we're posting to ner with some sentence we're going to get the data back from that response hopefully we'll say json.loads and here we can actually call response.get data and we can make an assertion we can say something like assert data entities and maybe length of data entities is greater than zero and so we'll run this test and so now we get an error that a bool object is not subscriptable and that's because in our current implementation we're always just returning true so we want to actually wire this up now so what i'd like to do here is we're going to import spacy and then from our ner client we're going to import our named entity client and then here which is our sort of composition root in terms of the dependency inversion we're going to load in finally english core web small and we're going to load up our named entity client with
that spacy model then when we come down here we call request.get json we're going to set a result equal to ner.get ents which is our way of proxying the sentence to spacy in this case we're going to pass it the sentence on that data dictionary then our response will be whatever the shape of the response we'd like to be for our front end and our front end is actually expecting entities and so we pass that back and it's expecting html and so we pass that back okay and then we're just going to dump the response using json and return this so let's go back to test api and run our test again okay so we're getting a key error on entities so let's go back to app.py and we have a typo here again another great thing that unit tests can catch for us oops and here we call this ents actually okay so let's go back to test api and we'll run our test again and we'll fix this typo here so length of data entities is greater than zero and we'll run our test and we get a passing test we can also make an assertion that kamala harris is the named entity that we get back here so maybe we do something like assert data entities at the zeroth index dot ent is equal to kamala harris and we can assert that the label is person so this is sort of an integration test we can see if it succeeds and it does so now you can see just another way that we can continue to drive our work in this case using more of an integration test and actually using this test client sort of fixture that's provided by flask in order to make post requests against some endpoint and if we've done everything correctly then we can see that all 15 tests that we wrote throughout this tdd session are passing and we've done a mixture of unit tests here some end-to-end browser automation tests and an integration test on the api so thanks for watching and i hope you enjoyed the series
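As a recap, the pieces built in this session can be sketched in plain Python with no spaCy or Flask installed. This is an illustrative reconstruction, not the author's exact code: the class and method names follow the spoken descriptions, the no-argument `NerModelTestDouble` constructor is an assumption, and `build_ner_response` is a hypothetical helper standing in for the body of the `/ner` endpoint.

```python
import json


class SpanTestDouble:
    """Stand-in for a spaCy span: only the attributes the client reads."""

    def __init__(self, text, label):
        self.text = text
        self.label_ = label  # spaCy exposes the string label as `label_`


class DocTestDouble:
    """Stand-in for a spaCy doc exposing a controlled `ents` list."""

    def __init__(self, sent, ents):
        self.ents = [SpanTestDouble(ent["text"], ent["label_"]) for ent in ents]


class NerModelTestDouble:
    """Callable like a loaded spaCy model, but returns whatever we configure."""

    def __init__(self):
        self.ents = []

    def returns_doc_ents(self, ents):
        self.ents = ents

    def __call__(self, sent):
        return DocTestDouble(sent, self.ents)


class NamedEntityClient:
    def __init__(self, model):
        # The model is injected, so anything that behaves like spaCy works.
        self.model = model

    def get_ents(self, sentence):
        doc = self.model(sentence)
        entities = [
            {"ent": ent.text, "label": self.map_label(ent.label_)}
            for ent in doc.ents
        ]
        return {"ents": entities, "html": ""}

    @staticmethod
    def map_label(label):
        label_map = {
            "PERSON": "Person",
            "NORP": "Group",
            "LOC": "Location",
            "GPE": "Location",
            "LANGUAGE": "Language",
        }
        return label_map.get(label)


def build_ner_response(ner, data):
    # Hypothetical helper: shapes the client result the way the front end
    # is said to expect it, with `entities` and `html` keys.
    result = ner.get_ents(data["sentence"])
    return json.dumps({"entities": result["ents"], "html": result["html"]})


# Usage: given the model reports a NORP entity, the client maps it to Group.
model = NerModelTestDouble()
model.returns_doc_ents([{"text": "Lithuanian", "label_": "NORP"}])
payload = json.loads(build_ner_response(NamedEntityClient(model), {"sentence": "any"}))
```

Because the client only depends on something callable that returns an object with `ents`, the same `NamedEntityClient` could be handed a real loaded spaCy model in `app.py` without changing the class.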
Info
Channel: Wes Doyle
Views: 27,804
Keywords: tdd, test driven development, test-driven development, tdd tutorial, tdd vs bdd, python tdd, python tdd tutorial, pytest, pytest tutorial, spacy python, spacy tutorial, selenium tutorial, selenium python, flask tutorial, flask python, unit testing python, flask api, named entity recognition, machine learning python
Id: eAPmXQ0dC7Q
Length: 81min 52sec (4912 seconds)
Published: Tue Nov 24 2020