Consuming a Docker API with R

Captions
Okay, welcome back to Hutsons-Hacks. This is the third in a series of tutorials. In the first one we trained a machine learning classification model on the stranded patient dataset, available in the NHSRdatasets package, and we serialised that model; it could then be retrained weekly, and I'll show you how to set up that task scheduling process as part of this tutorial. Secondly, we took that model and put it into a Docker container: we created a Dockerfile along with all the files we needed to build the container, so refer to the second video for that.

The third video, this one, is in response to a lot of questions I've been getting from people saying: okay, we've got this great Docker container, and I like the Swagger UI we created, and the JSON does make it interoperable, but how do I actually interface with R? Say I've trained my model as the first phase, then deployed it into Docker as the orchestration step so it's available for everyone; that's the start of my pipeline. The end of my pipeline is consuming this with unseen data, and by unseen data I mean live data coming from a database or a stream. I'm going to pass it to that model and get my predictions back, and that's where this tutorial comes in: consuming our API from Docker with R.

First of all, a quick refresher on what our Swagger UI looks like. This is the Swagger UI we created in the last tutorial. We've got a GET request, getting data from the API, which just pulls back simple things like the connection status: a message, the time it connected, and the username. And we've got a POST request called predict, which is the handle at the end of the URL, so you'll have the docs page and then /predict at the end; to get the connection status you'd have /connection_status. If I just copy this, you'll see that what comes back is a JSON string. For a POST request it's a little different: we'd have to pass each parameter into predict, and it gets messy, we'd have to pass in age=52, then an ampersand, and so on, so we're going to use R to do that for us, which is essentially what we did in the last tutorial.

This structure was dictated by a YAML file created in the first tutorial, so refer to that if you need to catch up. If you look at the schema you've got: age, the age of the patient; whether they were referred from a care home; something called medically safe, in hospital terms meaning they can be discharged as medically safe; whether they're under the healthcare for elderly people specialty; whether they need mental health support; how many previous inpatient care days they've had in the last 12 months; whether they have any activity limitations; whether they have mobility problems, another flag; and whether they have a history of falls, which is something closely monitored in hospital settings in England, Scotland, Wales and Northern Ireland (should I just say the UK?). So we've got this Swagger endpoint, which is all cool, but I'm going to want to consume part of it: the POST request.
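Just to make that concrete, here's roughly what hitting the GET endpoint from R looks like. This is a minimal sketch: the host, port and route name are assumptions based on the container set up in the earlier videos, so swap in your own.

```r
library(httr)
library(jsonlite)

# Assumed address of the running container; replace with your own host/port
base_url <- "http://localhost:8000"

# The GET endpoint just reports simple status information
resp <- GET(paste0(base_url, "/connection_status"))

# Parse the JSON string the API sends back into an R list
fromJSON(content(resp, as = "text", encoding = "UTF-8"))
```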
We're not going to focus on the GET request, because it just returns status updates; the POST request is what we're really going to work with. If you don't know what a POST request is, it's something you learn about when doing web design: POST is a request method supported by HTTP that asks the web server to accept the data enclosed in the body of the request message, most likely for storing it. It's often used when uploading a file or submitting a completed web form. That's essentially what we're doing: when you fill in a web form on the internet, order some shopping, or register for a conference, those are all examples of POST requests.

So we've got this Swagger endpoint, and now we're going to use R to interface with it. I need a few packages here: httr for working with HTTP, jsonlite for working with JSON, rjson as well because there are a couple of handy functions in there, plus dplyr, and data.table for fast loading. Let's run those first; I'll bring the console window up. You can see data.table reports that it's using parallel threading.

Next we read in our production data. Assume you've got a live environment: this could be accessed via SQL, with a query that pulls all the patients that were not used in model training (you could have a flag marking whether each row was in the training step or not). Then we're going to JSON-ify it with a little helper function. It takes an input data frame: prod_data reads the file in with the data.table method and converts it to a data frame, then json_new uses the toJSON method on the production data to create a JSON string, and at the end I build a list, because R doesn't have multiple returns by default; you have to return a list to get multiple values back. So the function gives me a JSON string and a data frame.

Run the function and you'll see it appear in the environment; click on it and you get the function breakdown and parameters, and we've only got one parameter, the data frame. Inside the data folder, which you can see on the right, I've got this production CSV dataset; on your server this would be a live dataset that keeps getting updated, which you then pass through to your model for predictions. When I run it, I get a list of two outputs, because we created a list, didn't we: df is one list element and json is the other. Looking at the prod data, you can see quite a few observations: new observations the model has not been trained on, for which I want predictions. I want to find out whether the model predicts that each patient is going to be stranded or not.
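Here's a minimal sketch of that helper, assuming the CSV lives at data/production.csv and using the function and element names roughly as I describe them (yours may differ):

```r
library(data.table)
library(jsonlite)

# Read the live/production extract and return both a JSON string and the
# original data frame; R returns one object, so we wrap both in a named list
json_prep <- function(file_path) {
  prod_data <- as.data.frame(fread(file_path))  # fast load via data.table
  json_new  <- toJSON(prod_data)                # row-wise JSON records
  list(json = json_new, df = prod_data)
}

prepped <- json_prep("data/production.csv")
str(prepped$df)  # the raw observations; prepped$json is the string to send
```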
We don't have the labels for these patients yet, because a patient only becomes stranded once they've been in hospital for more than seven days. So say this is their first day in hospital, their first inpatient stay, and we want to predict whether they're going to be stranded. The benefit would be to say to your bed coordinators, or whoever deals with capacity: on day one we are already predicting this patient will be here longer than seven days, so you need to make beds available for the next six to eight days. It helps with capacity and planning requirements, and this is essentially why I created a live version of this model as a machine learning tool when I worked in a hospital setting; they found it really helpful, especially going into the community, where you'd essentially have social care provision at the end of the stay. Obviously the model I had there used many more variables than this; this is very much a simplified version.

Again, if you want to learn more about this dataset, go to the NHS-R Community page or find NHSRdatasets on CRAN. I co-authored it with a number of my NHS-R Community buddies: myself, Tom, Chris and Zoë. If you want to see the types of data and how to work with them, look at the vignettes: there's a length-of-stay model, which is a regression problem; A&E attendances, another regression problem; a mortality dataset; and the stranded model, which is what we're using. The stranded model vignette shows you what it's all about: the data fields, how you can do feature engineering to get more out of the data, how you evaluate it with confusion matrices, and so on.

So we've got this data, and these are the unseen observations we're going to make stranded or not-stranded predictions for. You can also inspect the JSON: I've got the JSON string, and that's how it would look passed as raw JSON. You've got the raw string encased in JSON tags, as you can see here: each list item carries multiple key-value pairs, an age and all the other fields, as lists you're going to pass through. Similar to Python, if you know it, JSON lists are always in square brackets.
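To make the shape of that string concrete, here's a tiny, made-up two-row example; jsonlite serialises a data frame as a JSON array (the square brackets) of key-value records, one per patient. The field names here are hypothetical:

```r
library(jsonlite)

# Two invented rows with a couple of stranded-model style fields
mini <- data.frame(age = c(52, 67), mental_health_care = c(0, 1))
toJSON(mini)
#> [{"age":52,"mental_health_care":0},{"age":67,"mental_health_care":1}]
```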
Okay, so this is the sexy part. Everything so far was just loading the data from a data frame and turning it into a JSON string. I'm going to pass this JSON string through to our function, but I also keep the original production data frame, because we're going to bind onto it later.

Let me take you through the steps. This function is a series of six steps that I initially coded out individually, but then turned into a function for ease. I've got post_to_api, and we're going to use our URL, which is the POST predict method: our IP address plus /predict, the name of our endpoint. httr lets you make all the standard HTTP requests, POST, GET and so on, so I pass the URL in, and the body is the JSON string. Remember we created that string from the data frame, so it passes in all the ages, and the API will read it as: list item one (or zero, depending on how your platform indexes), age is 52, then the same index of the next parameter, which will be something like previous periods of care, and so on. We also specify the content type as JSON, because we're passing a JSON string to the API; if you passed anything else it wouldn't work, so it's got to be JSON.

Then there's another httr function called stop_for_status. The POST request gets sent through, and if you don't wait for the status, you'll get a connection result saying it connected successfully but none of the response body. If I try this out in Swagger with the default parameters and hit execute, you can see the request URL, which is the curl request string, essentially what httr sends out, and then the response body, which gives me the stranded and not-stranded probabilities and tells me whether it connected successfully. I know it connected successfully because I've got code 200. For this example it's very much a not-stranded patient. We're going to do this for multiple patients, because we've got 450 observations, and we want the response for each one; that's why stop_for_status is really important, since without it you'd only get the status.

Next, having made the POST request and received the status for each of those 450 observations, we get the content from the result and parse it from JSON. Then we use a special trick: the do.call method with rbind to bind rows together, and lapply to loop over every element of the request body and convert it to a data frame. Each result is a not-stranded/stranded pair, and without this step it would come back as one long string; this method converts everything back into a data frame of not_stranded and stranded columns. Finally, we store the results in a list, and that function then has everything we need to work with the API.
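Pulling those six steps together, here's a sketch of the function as I've described it; post_to_api and the returned element names are my labels rather than anything fixed, and exactly how you unpack the response depends on the JSON shape your plumber endpoint returns:

```r
library(httr)
library(rjson)

post_to_api <- function(url, json_body) {
  # Steps 1-3: send the POST with the payload explicitly flagged as JSON
  result <- POST(url, body = json_body, content_type("application/json"))

  # Step 4: stop unless we got a success code back (e.g. 200); this also
  # ensures we wait for the full response body, not just the status
  stop_for_status(result)

  # Step 5: pull the response body out as text and parse the JSON to a list
  request_body <- rjson::fromJSON(content(result, as = "text", encoding = "UTF-8"))

  # Step 6: each element is a not_stranded/stranded probability pair;
  # lapply turns each into a one-row data frame and do.call(rbind, ...)
  # stacks them, rather than leaving everything as one long string
  preds <- do.call(rbind, lapply(request_body, as.data.frame))

  # Return the raw httr result and the tidy predictions together
  list(result = result, predictions = preds)
}
```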
Now we've got that, we're going to actually connect up to the API and get some predictions back. We run our URL assignment, so the URL is in the global environment, and call the post_to_api function we just created, passing in that URL and the JSON string we built in the first part of the process. Just to remind you, that's the massive string of key-value pairs, keys like needs mental health support, with each value encapsulated in a list, repeated for all 450 observations. We're down to nine variables here; if you count the tags you'll have one for the first variable, two for the second, three for the third, and so on up to nine, the last one being mobility problems, and if we check the data frame we can see the frail mobility problems field there. So let's get everything in the data that we need.

Okay, so we make the call, api_call_results, and I get back a list of four: the original httr request; the status code, showing it connected successfully; the list of prediction values, which is why we needed that special rbind/lapply trick to turn it back into a data frame; and the prod data we're going to bind it onto.

So now we've got our predictions, and we create the predictions data frame (you could call the df whatever you like). What's happened is this: it went to the API, sent all those observations through as a JSON string, waited for the response to come back, and the response body contains all the predictions as probabilities. If I open the df in my environment, I've got the probability of not stranded versus stranded for every one of those 450 patients.

Finally, we bind that data back on. We take the prod data and bind on the df, or let's call it predictions, then create a new column I'll call class, using a simple ifelse: if the stranded class probability is greater than 0.5, classify the patient as stranded, otherwise not. You could tweak this threshold if you wanted to be stricter with your model, say only flagging patients when you're 80 or 90 percent confident they're stranded; here we just say that anything greater than 0.5 is more likely stranded, based on the representation and patterns in the model. You'll see the data frame has gone up from nine variables to 12.
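In code, the call, the bind and the class label look something like this; localhost:8000 and the stranded column name are assumptions carried over from the sketches above:

```r
# Assumed address of the plumber /predict endpoint inside the container
url <- "http://localhost:8000/predict"

api_call_results <- post_to_api(url, prepped$json)
predictions <- api_call_results$predictions

# Bind the probability pairs back onto the original production rows
prod_with_preds <- cbind(prepped$df, predictions)

# Derive a class label from the stranded probability; 0.5 is the default
# cut-off, raise it (e.g. to 0.8) if you only want high-confidence flags
prod_with_preds$class <- ifelse(prod_with_preds$stranded > 0.5,
                                "Stranded", "Not Stranded")
```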
At the end of this you'll have a dataset with what we think the predictions are. The only way you'll know the true class, if you want to track accuracy later on, is once the person actually becomes stranded or not, i.e. when they're discharged. Since this is measured from admission at the start of the process, you'd have to wait four to five days to find out whether they were stranded, and then you can feed that back in as a feedback loop to your training process: if the model is not accurate enough and isn't predicting well on unseen data, try some other machine learning classification techniques. And if you want to find out more about caret and the different modelling options, I gave a talk for the NHS-R Community on advanced modelling with caret, where I showed how you can boost some of the weaker models using something called ensemble learning, so check that out.

So that's accessing our endpoint: we've got the predictions back, and then you could store them, send them to a database, or write them out as a CSV, say into the data folder as prod_with_preds, something catchy like that. We've got fwrite from data.table (and dplyr loaded as well, haven't we). If you view prod_with_preds, it's output everything, and if I go to the data folder, the file with the predictions is sitting there waiting for you. It should be larger than the original file, because we bound those extra columns on. And that's essentially how you consume the API.
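Writing the scored data out is a one-liner with data.table; the file name is just the "something catchy" placeholder from above, and a database write via something like DBI would work just as well:

```r
library(data.table)

# Persist the production data plus its predictions
fwrite(prod_with_preds, file = "data/prod_with_preds.csv")
```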
Another question I got asked is: I've got my machine learning model, and I want to create a scheduled task that automates its training, so it retrains every week. How often depends on how quickly you need the model to pick up on new patterns; inpatient data normally doesn't change much, so weekly is a good enough strategy. If you had massive models with lots and lots of data, retraining can consume a lot of resource and take days, so you'd want to think carefully about your retraining strategy.

So what I'm going to do now is go to our original folder, at the top left, and use a package called taskscheduleR to schedule this model. I've got the ML training script we developed in the first tutorial, and every time new data is fed in at the start of the process, the input phase, real start-of-the-pipeline stuff (SQL in, CSV in, or interfacing with another platform), I want to schedule it to retrain on that new data. That could even include the existing observations in our production data once they become stranded patients; you'd have a query that looks at the difference between the rows in the model and the rows in production, but I'll leave that for you to figure out.

taskscheduleR is cool, and it has a nice user interface too. Go to Addins at the top once the library has been loaded and you should find the option to schedule R scripts on Windows, which opens a lovely little Shiny application. I can select my model training script, it confirms the upload is complete, and I can tweak the settings, which change depending on the radio button selected. I'm not going to do it through the interface, though; I'll do it in code, because I just like doing it that way, but there's nothing stopping you using the add-in.

So I load taskscheduleR and the script to schedule, the one we trained at the very start of the process; in pipeline terms, this is the retraining phase. I point it at my ML model training script in R, so it's got a link to your training script name, and this is a file task that the system is going to run. Then comes the real key part: taskscheduleR has a create method, and it wants you to give it a task name. We could run this script nightly or weekly; let's call it weekly, that's a better description, and the R script it needs to trigger is the script to schedule. I like doing this in code because you can schedule multiple parts of your pipeline: for machine learning models you'd normally split into a few phases, so I'd do my training step, then hyperparameter tuning, which I'd schedule as a separate phase after the model has been created, then an evaluation step, then the putting-it-into-production orchestration step, like we saw with the Docker container, and then the consuming steps, which is what this tutorial is all about.

So we create this schedule: weekly, at 12:01, on a Tuesday. Let's create it. It was already created earlier, so we might get a warning here; yes, it's already there. You can then list the tasks you've got running from R, and if I open Windows Task Scheduler, you'd see at the bottom this stranded patient task, started and ready to go. That's my weekly task, and Windows will kick it off whenever it's due to run. So that's how you quickly create a retraining task for your machine learning model using taskscheduleR from R; you could just as easily create it in Windows Task Scheduler itself, running a little batch script that triggers your R scripts, but this way is a lot easier to understand.

So in this tutorial we've capped off parts of the process. We accessed our API in Docker: we got our production data, sent it as JSON through to the API, the API sent the predictions back, and we bound them onto our production data; you could then put that into a database somewhere, or into a BI (business intelligence) report, something like Power BI, to look at what your model is predicting, like a predictions tab. Finally, in response to a second question around model retraining and how regularly you should do it: the regularity is something you need to decide, but you can use taskscheduleR to create Windows tasks.
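Here's a sketch of that scheduling call; taskscheduler_create and taskscheduler_ls are real taskscheduleR functions, but the script path and task name are placeholders for your own:

```r
library(taskscheduleR)

# Path to the retraining script from the first tutorial (adjust to your project)
script_to_schedule <- file.path(getwd(), "ml_model_training_script.R")

# Register a weekly Windows task: every Tuesday at 12:01
taskscheduler_create(
  taskname  = "stranded_model_retrain_weekly",
  rscript   = script_to_schedule,
  schedule  = "WEEKLY",
  starttime = "12:01",
  days      = "TUE"
)

# List the scheduled tasks visible from R
taskscheduler_ls()
```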
There's also a cron-jobs equivalent for Linux, the cronR package, which follows a similar syntax to the Windows scheduler one. So I hope you found that really useful. Please let me know on Twitter @StatsGary if you've got any comments or feedback, drop some comments on the YouTube channel, or check out my tutorials at hutsons-hacks.info. Alright guys, stay well, and I'll see you next time.
Info
Channel: Hutsons-Hacks
Views: 163
Keywords: Docker, Production, Docker R, Data Science, MLOps, R ML, Machine Learning
Id: 2OFtMtYyVsw
Length: 27min 54sec (1674 seconds)
Published: Tue Sep 21 2021