Deliver high-performance ML models faster with MLOps tools

Video Statistics and Information

Captions
Welcome to our webinar. Today we're going to be talking about SageMaker MLOps. My name is Pranav Murthy; I'm an AWS Solutions Architect and an AI/ML specialist on the SageMaker team. "Hi, I'm Paul Hargis. I'm also an AI/ML specialist AWS Solutions Architect, and I will be presenting the second half of the webinar." Thanks, Paul. The agenda for today's presentation and demo will walk you through the typical patterns for exploratory data analysis (EDA) and how you can undertake EDA on SageMaker using notebooks in Studio or by leveraging processing jobs to process your data at scale. Then we'll quickly touch on model training and tuning, and on how to leverage the model registry to scale out model deployments within your organization. Paul will then cover inference and hosting and how to operationalize your workflow on SageMaker. In essence, we're going to work through the different pieces of the data science lifecycle, and at the end Paul will show how they can all be chained together using SageMaker tools.

Let's start with exploratory data analysis and how you can do EDA in SageMaker Studio. Today there are three ways to do EDA on SageMaker. You can use Amazon SageMaker Data Wrangler, which is a fast and visual way to aggregate and prepare your data for machine learning. You can do exploratory data analysis in situ in Studio notebooks: locally prepare your data, use it to train your model, and even train your models right in the Studio notebook. Or you can leverage SageMaker Processing jobs to prepare and analyze your data at scale. In today's session we're going to focus on notebook EDA, quickly introducing how to run exploratory analysis in Studio notebooks, and then how to run the same job at scale using processing jobs.

So what is an Amazon SageMaker Studio notebook, and how can you use it to perform EDA? Within Studio you get access to notebooks, so think of Studio as a web IDE. Right there within Studio you have tooling available such as SageMaker Experiments, which allows you to log your explorations and visualize them inside Studio, and working in a SageMaker Studio notebook is just as easy as working with notebooks on your laptop or in any other IDE. What's the advantage of running EDA in Studio notebooks? There are no environments to wrangle: it's as easy as launching a new notebook in under 30 seconds, pinning your notebook to a compute backend, and running your EDA, and you can even switch your backend and proceed to train a model, which helps reduce cost when you're running a heterogeneous workflow such as exploration plus model training. Amazon SageMaker Processing jobs, in turn, are a fully managed solution for data processing and model evaluation. Because processing jobs are fully managed, the cluster for distributed processing is handled by Amazon SageMaker in the backend; you don't have to worry about setting up clusters or tearing them down. You can leverage your own processing scripts, or you can bring your own containers into SageMaker to run your processing jobs.

So let's jump over to a demo.
The first demo shows how to do exploratory data analysis within your notebook. Here I have a few notebooks already running some code, and the first one we'll walk through does exploratory analysis in a SageMaker Studio notebook. You create a new notebook and choose the container image you'd like to run it on. I'm choosing the Data Science kernel; the Data Science image contains most of the popular data science frameworks, such as matplotlib, pandas, and NumPy, so data scientists don't have to spend time preparing containers or installing packages. These are available out of the box, and you just select the backend you'd like. I'd also like to point out that there are many images to choose from: if your code depends on, say, PyTorch, we have PyTorch CPU- and GPU-optimized containers for you to run your code on. So the first step is to choose an image type. In the next step we choose the kind of instance we'd like to run our exploratory analysis on. Right now my data is not that large, so I'm going to choose an ml.t3.medium instance for this analysis. Once I select my image and instance type, my notebook is ready to go.

I start by importing my dependencies. I have my dataset in the root of my project directory; your SageMaker Studio user profile has an EFS mount attached to it, so you can bring a sample dataset into Studio and use it for analysis, local model training, and other tasks. Think of a scenario where you have a small sample dataset you'd like to experiment with; once you're done experimenting, you may have a million rows to process, and that's where we'll leverage processing jobs, which I'll get to in a while. So let's read our dataset. You can see what it looks like: there are a lot of columns, and we're trying to predict whether fraud occurred for each row. It's effectively a credit card fraud detection problem, and the target is a binary label: zero means no fraud, one means fraud. Let's analyze the columns in the dataset. There are a whole bunch of floating-point columns, and we have a target column called Class, which is an int, so it is most likely a one or a zero. Now let's explore the distribution between the one and zero classes. We can see straight away that the zero class, the non-fraudulent transactions, is overwhelmingly larger than the one class; we can barely even see the one class on the chart. So this data is not ready for model training: we're going to have to balance it, and we'll also have to analyze how these features correlate with our target, which we can do in the next step. The one class is too small: we only have 492 fraud samples, while we have about 284,000 non-fraud samples. We then analyze how fraudulent transactions are distributed across amounts and transaction counts; we see a whole range of transactions at different amounts, but the distribution is quite skewed. Finally, we look at the percentage split of our imbalanced dataset: only about 0.1 percent of transactions are fraudulent, while about 99.9 percent are regular, non-fraudulent transactions.
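For reference, the inspection steps described above take only a few lines of pandas. This is a minimal sketch, assuming the Kaggle-style credit card dataset has been copied to the Studio home directory as creditcard.csv with Class and Amount columns (the file name is an assumption, not shown verbatim in the demo):

```python
import pandas as pd

# Load the sample dataset from the Studio user's EFS home directory.
df = pd.read_csv("creditcard.csv")

# Inspect columns and dtypes: the features are floats, Class is the 0/1 target.
print(df.info())

# Class balance: non-fraud (0) overwhelms fraud (1).
counts = df["Class"].value_counts()
print(counts)
print("fraud share: {:.3%}".format(counts[1] / counts.sum()))

# Distribution of transaction amounts for the fraudulent rows only.
print(df.loc[df["Class"] == 1, "Amount"].describe())
```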
Before we actually try to train our model, we're going to balance the dataset. I'm choosing a two-to-one sampling ratio, so for every two non-fraudulent transactions we keep one fraudulent transaction. Let's take a quick peek at the balanced data; everything looks good. Next, let's analyze how our features correlate with the target. One thing you might notice is that I'm using a Plotly chart to visualize the dataset; just like in your current data science environment, you can use any visualization tool within Studio, including popular libraries like Seaborn, matplotlib, and Plotly. Looking at the correlations, our Class column correlates strongly with some of these features: the V16 feature has a very strong negative correlation, and we see a very strong positive correlation with V11, so it's a mix of correlated and uncorrelated features. We could choose to eliminate some features so the model is more finely tuned to the strong predictors, but for this example we'll keep all of the features while training the model. We then check the distribution across fraudulent and non-fraudulent transactions, and as expected it's a two-to-one ratio. Finally, we summarize the analysis and upload the balanced credit card fraud dataset to S3 using pandas' built-in S3 support: just as you would write to a local file, you can reference an S3 path and the DataFrame will be written to S3.

This is fine: we did the analysis and we know how to prep our dataset. But how do we do this at scale? For that we turn to processing jobs. Just like the previous notebook, I choose an image type and an instance type and walk through this notebook. For this notebook I need to set up some session variables: I instantiate a new SageMaker session, I need a bucket where I can write the artifacts of my processing job, I need the role information, and so on. Before kicking off a processing job that runs the same steps as before, I need to upload my dataset to S3 so it is accessible to the processing container, so I go ahead and upload the unprocessed dataset to S3. Before kicking off the job, let's take a quick look at the processing script. My preprocessing script follows the exact same procedure as the exploratory data analysis notebook. We have a few imports, then we read a few command-line arguments that I can control, such as the training, validation, and test sizes for the data split. I provide a random state so I can deterministically repeat the process in the future, and I provide the target column I'd like to process. Finally, the input path: your S3 data is mounted into /opt/ml/processing/input, and if you have multiple folders, all of them are visible from that path onwards. Your output can be written to /opt/ml/processing/output, and the processing job takes care of uploading it to S3.
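A minimal sketch of the balancing and S3 upload steps described above; the bucket and key are placeholders, the random_state is an assumption, and writing directly to an s3:// path requires the s3fs package to be present in the kernel:

```python
import pandas as pd

df = pd.read_csv("creditcard.csv")
fraud = df[df["Class"] == 1]
non_fraud = df[df["Class"] == 0].sample(n=2 * len(fraud), random_state=42)

# Two-to-one non-fraud to fraud ratio, shuffled.
balanced = pd.concat([fraud, non_fraud]).sample(frac=1.0, random_state=42)

# pandas writes straight to S3 when given an s3:// path (via s3fs).
balanced.to_csv("s3://my-example-bucket/fraud/balanced.csv", index=False)
```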
So you don't have to manage the S3 push and pull yourself. In my processing script, I split my transactions, balance my dataset, split it into train, validation, and test, write the datasets to my local output path, and log a few parameters (I'll explain in a second what those parameters are) before completing the job. That's the essence of the preprocessing script: I'm replicating everything I did on the EDA side inside a script.

Now, how do I kick off a new processing job? It's pretty simple. I create a new scikit-learn processor and choose the framework version I'd like; there are a few different framework versions available, so you can choose the right one for your use case. You provide the type of instance you'd like and control the number of instances to run the processing job on. If you'd like to distribute the work across N instances you can definitely do that: choose the right instance type and distribute your processing job so it scales with the amount of data you have. Finally, I kick off a new processing job by providing my script as the entry point, a unique job name, and a processing input, which is the S3 path where I uploaded my dataset, along with the destination for that data inside the container and the data distribution type, for example sharded by S3 key (there are other distribution types you can choose from). My processing output goes to this S3 path, and inside the container that output is written to this path. The arguments that control my processing job, such as the training and validation sizes, can be supplied as command-line arguments here.

One more thing you may observe is that I'm wrapping this entire execution in what is called an experiment run. SageMaker has an MLOps experimentation toolkit called SageMaker Experiments, where every experiment a data science persona runs can be logged and tracked at scale. How do you do that? You launch the job inside a Run context, and then, switching over to the preprocessing script, you create a session inside the script, call load_run, and make sure that loaded run wraps your execution. What does that achieve? When you go into SageMaker Experiments, you'll see I've already created an experiment called credit-card-fraud-detection, and inside it each execution creates a new preprocessing run. Inside that run you get information about the job, such as its ARN, its name, when it was created, and who created or modified it, and you can look at the components of the processing job: where the input and output paths were, what random state was used, the target column, the test dataset size, and so on. Everything is logged here, and you can see how useful this is for reproducibility. Say you run your processing job today and ten days from now you're asked to reproduce it: without experiment tracking that's really hard, and you'd have to take notes or try to remember what the parameters of the job were.
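The launcher side might look roughly like the following sketch. It assumes the preprocessing script lives at scripts/preprocessing/preprocess.py, uses placeholder bucket prefixes and argument values, and the framework version is illustrative rather than the one used in the demo:

```python
import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.experiments import Run

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()

processor = SKLearnProcessor(
    framework_version="1.0-1",       # pick the version that matches your script
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,                # raise this to shard work across instances
)

# Wrap the job in an Experiments run so its parameters are tracked;
# load_run() inside the script attaches to this same run.
with Run(experiment_name="credit-card-fraud-detection", run_name="preprocessing"):
    processor.run(
        code="scripts/preprocessing/preprocess.py",   # hypothetical script path
        inputs=[ProcessingInput(
            source=f"s3://{bucket}/fraud/raw/",
            destination="/opt/ml/processing/input",
            s3_data_distribution_type="ShardedByS3Key",
        )],
        outputs=[ProcessingOutput(
            source="/opt/ml/processing/output",
            destination=f"s3://{bucket}/fraud/processed/",
        )],
        arguments=["--train-size", "0.7", "--validation-size", "0.15",
                   "--test-size", "0.15", "--random-state", "42"],
    )
```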
But with SageMaker Experiments and a simple wrapper around your execution, you can just go back in time, look at the experiment and its parameters, and reproduce your results.

So what have we done so far? We've seen exploratory data analysis inside a standalone notebook, we've seen how SageMaker Processing jobs help when you need to replicate that process at scale, and we've seen how tweaking the instance count and instance type can accelerate a processing job over a large dataset. What comes after processing? Flexible model training and tuning at scale. Once you've completed preprocessing, you want to train a model on that dataset. With Amazon SageMaker you can leverage the built-in algorithms, such as k-means, XGBoost, and PCA: you provide your dataset, choose the algorithm you'd like, and kick off model training, letting SageMaker do the undifferentiated heavy lifting of parallelizing and efficiently training on your data. Bring the data from the end of your processing cycle and train a model with one of SageMaker's many built-in algorithms. What if your organization wants to bring its own algorithms? You can definitely do that: bring your dataset, bring your custom algorithm as a Python script, and leverage SageMaker's supported framework containers, all of which are optimized for CPU and GPU execution. Select the framework of your choice and use the dataset from the end of your processing cycle to train a new model. Lastly, if neither of those methods works for you, you can bring your own framework, your own dataset, and your own algorithm in the form of Docker containers and use SageMaker training services to train your models at scale. You can kick off training jobs from a SageMaker Studio notebook using any of the three methodologies I just described, or you can go into the Management Console and create a new training job through the UI.

How can we train SageMaker models using the data in S3 from the end of our preprocessing cycle? I have a notebook that walks you through that. Just like the two previous examples, you choose an image type and an instance type and get started with a new model training task. At this stage you often want more complex frameworks, so you can choose from the many framework types and versions, and you can even choose GPU instances if you'd like to build your model locally within the notebook: switch the instance type to, say, a g4dn GPU-enabled instance so you can train your model reliably without running into out-of-memory issues or slow training, which is a possibility if you're only using a CPU instance. For this example, though, we're going to leverage training jobs. In the notebook we again initialize some SageMaker sessions that we'll use throughout, and the output paths from our previous processing jobs are referenced here: the training dataset is found in this path, and the validation and test sets are found in their respective paths.
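To feed those processed splits into a training job, each S3 prefix is typically wrapped in a TrainingInput channel. A short sketch with placeholder bucket and prefixes (in the demo these come from the processing job's output location):

```python
from sagemaker.inputs import TrainingInput

# Placeholder S3 prefixes; substitute the processing job's output paths.
base = "s3://my-example-bucket/fraud/processed"

channels = {
    "train": TrainingInput(f"{base}/train/", content_type="text/csv"),
    "validation": TrainingInput(f"{base}/validation/", content_type="text/csv"),
    "test": TrainingInput(f"{base}/test/", content_type="text/csv"),
}
# estimator.fit(channels) mounts each channel at /opt/ml/input/data/<name>
# inside the training container (exposed to the script as SM_CHANNEL_<NAME>).
```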
There are three examples here that showcase SageMaker training. In the first, we do scikit-learn training with a decision tree classifier. I've defined my framework and framework version; one thing you may notice is that the framework version I'm using in this example is 0.23, which is fine, and it shows you can choose whichever framework version you like: in the previous example I used a 1.x version, and here I'm using a 0.2x version. Next we define a prefix and a datetime string so we can customize our training job names and identify them later, and I'm going to log all of my training jobs into my experiment so I can visualize outputs such as my loss, accuracy, and F1 score. The method I'm using to train the model is bring-your-own-algorithm, leveraging SageMaker's SKLearn estimator. Here I define a new SageMaker scikit-learn estimator with an entry point, train.py, which lives under scripts/training/train.py. We also see customers sometimes asking for additional packages on top of what the SageMaker container provides, and you can add those here. Right here you can choose distributed or single-instance training; for this example it's just one instance, but you can scale the instance count and allow your code to use multiple instances for accelerated training. You can choose your instance type: this is a CPU instance, but you can swap in a GPU instance if your training demands it, for example for frameworks like PyTorch and CNN model training. Finally, all of the hyperparameters that govern how my model is generated are passed as command-line inputs to train.py, and I specify them as hyperparameters when instantiating the scikit-learn estimator. You can see I've defined the model criterion, max depth, and min leaf size, and if you go into the training script, all of those parameters are argparse arguments to the training job.

A few more things you might observe: during a training job, SageMaker provides environment variables for the train, validation, and test channels. The data in S3 is mounted onto your container, similar to a processing job, but at channel-specific paths, so you can read those environment values and use them throughout your training session. For the model output, you dump your model into the model directory, and it will be zipped and made available for downstream usage. Any artifacts you generate, such as images or charts, can be written to the output directory for downstream analysis, though we'll see how SageMaker Experiments lets us do that without even using the output directory. Then, in the fit call, I reference my train, validation, and test data paths and kick off a new training job.
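A condensed sketch of the estimator and entry-point pattern described above; the script path, hyperparameter names, CSV file name, and framework version are illustrative assumptions rather than the exact demo code, and `channels` is the TrainingInput dictionary from the earlier sketch:

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

role = sagemaker.get_execution_role()

sk_estimator = SKLearn(
    entry_point="train.py",
    source_dir="scripts/training",      # directory containing train.py
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,                   # raise for distributed training
    role=role,
    base_job_name="training-sklearn",
    hyperparameters={                   # forwarded to train.py as CLI arguments
        "criterion": "gini",
        "max-depth": 5,
        "min-samples-leaf": 2,
    },
)

# `channels` is the dict of TrainingInput objects shown earlier.
sk_estimator.fit(channels, wait=False)
```

Inside train.py, the channels and model directory arrive as environment variables, so a skeleton of the script might look like this:

```python
import argparse, os
import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

parser = argparse.ArgumentParser()
parser.add_argument("--criterion", default="gini")
parser.add_argument("--max-depth", type=int, default=5)
parser.add_argument("--min-samples-leaf", type=int, default=2)
parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
args = parser.parse_args()

# Read the mounted training channel and fit the classifier.
train_df = pd.read_csv(os.path.join(args.train, "train.csv"))
X, y = train_df.drop("Class", axis=1), train_df["Class"]
model = DecisionTreeClassifier(
    criterion=args.criterion,
    max_depth=args.max_depth,
    min_samples_leaf=args.min_samples_leaf,
).fit(X, y)

# Anything written to SM_MODEL_DIR is zipped and uploaded as the model artifact.
joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))
```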
That job will run in the background, so let's move on to the next example, which uses first-party XGBoost classifier training. In a very similar way, using SageMaker's image_uris.retrieve function, I check what images are available for the XGBoost framework. These were the available versions as of when the notebook was prepared, but there may be more if you upgrade your SageMaker Python SDK. I choose my specific framework version, and I name this training job "training-xgboost" so I can identify it in SageMaker Experiments. With a built-in algorithm there is no code you need to write; you just have to ensure that your dataset is visible to the algorithm. Here I provide the data as inputs and specify the content type, and I provide my role, instance count, and SageMaker session, set the hyperparameters for the training session, and kick off a new training task. Also note that the hyperparameters I set are exactly the same as the open-source XGBoost hyperparameters; there's nothing special about the hyperparameters for SageMaker's first-party algorithms.

That kicks off a second training task, so let's go take a look at both jobs. If I go into Experiments, I can see the scikit-learn job that's currently running and the XGBoost job that's currently running. But we don't want to watch paint dry, so let's look at an older job I already have. Here's an SKLearn bring-your-own-algorithm training run. Within my training job I logged values such as the final recall, the mean training accuracy at each sample size, and the cross-validation accuracy at different sample sizes for train and test, and those continuous outputs are available to me as a summary. But what if I'm a more visual person? You can go into the charts, where the same logs can be visualized as a line chart: you can plot against step, wall time, or relative time. For this example I choose step and plot the mean training accuracy at each sample size. It's mostly flat: the accuracy for this training job is around 0.97 to 0.98, so it hasn't varied a lot, which is good. I can also visualize the final accuracy or the final F1 score as a bar chart. Lastly, the most common practice is to log a confusion matrix for analysis at the end of a training session; SageMaker provides a built-in method for that, so you can log it very easily within your training job using the run.log_confusion_matrix API.

This is good: I can analyze a single run, do some deep dives into the model, and maybe leverage learnings from one training session to improve my next set of models. But what if I'm running a hyperparameter tuning session? What if I know the parameters, or at least the ranges of parameters, I'd like to train the model on? How does SageMaker Experiments help me there? Let's look at a very simple hyperparameter tuning example. I'm replicating the bring-your-own-algorithm scikit-learn example with the exact same framework version, but I'm altering the hyperparameter values for each training session and kicking off four training tasks. All four run in parallel, and you can visualize them in SageMaker Experiments once they're done, check their status in the SageMaker management console, or use the API to get the status of your training jobs. Again, we don't want to wait for these jobs to complete, so I'm going to open Experiments: I have a few jobs that I scheduled before this.
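A hedged sketch of the metric-logging calls described above, as they might appear inside the training script; `args`, `cv_accuracies`, `y_test`, and `y_pred` are assumed to be defined elsewhere in the script, and the metric names are illustrative:

```python
# Attach to the run that launched this job and log metrics so they show up
# in SageMaker Experiments summaries and charts.
from sagemaker.experiments import load_run
from sklearn.metrics import f1_score, recall_score

with load_run() as run:
    run.log_parameter("max-depth", args.max_depth)

    # Continuous metric: one value per step renders as a line chart.
    for step, acc in enumerate(cv_accuracies):          # cv_accuracies: assumed list
        run.log_metric(name="train:accuracy", value=acc, step=step)

    # Scalar summaries compare nicely across runs as bar charts.
    run.log_metric(name="final:f1", value=f1_score(y_test, y_pred))
    run.log_metric(name="final:recall", value=recall_score(y_test, y_pred))

    # Built-in confusion-matrix logging.
    run.log_confusion_matrix(y_test, y_pred, title="test-confusion-matrix")
```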
Let's take a look at those. As you can see, with a similar naming convention, I have a few training jobs that completed before this. I can go into individual training jobs, look at the metrics summary table, and see how the different models perform one after another, but that's not efficient, and it's not how people like to perform analysis at scale. What alternative can I suggest? Let me remove the current jobs from this table: what you can do is select a whole bunch of runs simultaneously and analyze them side by side. Here's the same table with every job overlaid, and from this I can say the first run was possibly the best, with a depth of 1 and a min leaf of 2. But again, I'm more of a visual person, so I can create a new bar chart, select my final F1 score, and see how all of these models compare against each other. You can also chart any loss outputs logged during training; for example, I can look at the mean test accuracy score at the different sample sizes logged during the training session. All four of them look very close to each other, without a lot to differentiate them, but from this analysis, going by the naming convention, I can conclude that the run ending in 4042 is the best.

Now I can go back and take a look at the 4042 run: I can see where the model is, the hyperparameters of the job, and the artifacts for the training job, if there were any. So my 4042 training job was the best. What do I do with it, and where is the model that's available at the end of that training cycle? Here's the best training job; I'm going to attach to that training session and take a look at where the model artifact is. Let me copy this, redefine and reattach, and there we go, there's our model. What can we do next? We can register our model into the model registry. A model registry is organized into model package groups and model packages. Model packages are individual model entities, such as the model produced at the end of your training session, the parameters that went into generating it, the algorithm type, and so on. If you have multiple models generated for a single experiment type, you can group them into model package groups. In essence, the model registry is a collection of your versioned models. So what do we do with the model at the end of a training session? First we create what's called a model package group. In this use case, all the models generated from the credit card fraud detection dataset go into a credit-card-fraud-detection model package group. You can have decision tree classifiers, XGBoost models, and other kinds of models all sitting under that group, with multiple versions of each, all automatically versioned, and you can use the model package group as a central hub for deploying models to an endpoint, controlling whether a given model version is approved for deployment or rejected.
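A minimal sketch of that flow, attaching to the winning training job and registering its model; the group name, training job name, instance types, and description are placeholders, and the group only needs to be created once:

```python
import boto3
from sagemaker.sklearn.estimator import SKLearn

sm_client = boto3.client("sagemaker")

# Create the group once; subsequent registrations version into it automatically.
sm_client.create_model_package_group(
    ModelPackageGroupName="credit-card-fraud-detection",
    ModelPackageGroupDescription="Models trained on the credit card fraud dataset",
)

# Attach to the best training job identified in Experiments (name is a placeholder).
best = SKLearn.attach("training-sklearn-2023-07-31-00-00-00-4042")

model_package = best.register(
    model_package_group_name="credit-card-fraud-detection",
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"],    # guardrails on allowed inference instances
    transform_instances=["ml.m5.xlarge"],
    approval_status="PendingManualApproval",
    description="Decision tree classifier for credit card fraud detection",
)
print(model_package.model_package_arn)
```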
So let's go ahead and create a new model package group. I've already created one, so let's take a look at it: here I have a model package group with two versions already registered. Let's walk through registering a new model. Given that this is my best training session, and I've attached to it, the model can be registered into the model registry by defining a few things about it. You can control your model's input and output content types, for example accepting only JSON or text/CSV, and you can control which instance types can be used for inference, effectively setting up guardrails around your model for deployment. You can set the approval status during registration to pending, approved, or rejected, and you can set the description of your model. I'm going to go ahead and do that; oops, I was missing the framework parameter, and there we go: pending manual approval. Our new model is registered. Let's go into the model registry and open it up: here's the new model waiting for manual approval. If I were the product owner for this model, then based on its description and metrics I could come in and reject the model, approve it, or have further discussions with the data scientist who generated it. We open it up, and this new model is now approved; it was updated by the data science user, which is me, and is ready for deployment. For the next set of actions after the model registry, I'll hand it off to Paul.

Yes, this is Paul, and I'll be picking up from where Pranav left off: having built a model and stored it in the model registry. In the remaining time we're going to talk about SageMaker's MLOps tooling, which primarily revolves around Pipelines and Projects. As you already know, no project is complete until you've deployed it, either to a real-time endpoint or a batch transform; SageMaker has both capabilities, and today we're going to use a real-time inference endpoint. I've got a demo to walk you through, but first some overview. In the ML lifecycle, most of what we do is iterative: building, training, tuning, and running preprocessing, all the things Pranav walked you through, happen over and over again. The tools we'll show you next help you achieve those goals efficiently: getting models from concept to production, tracking artifacts, deploying and managing particular versions of a model, and automating at scale. SageMaker Pipelines is a purpose-built tool set for exactly these things: it helps you compose and manage ML workflows, track model lineage, and replay and rerun workflows, so that once you create them and get them working you can schedule them or trigger them manually. We'll look at the visual tools available within SageMaker Studio, including the model registry, which is the central repository of trained models, and we'll also talk about how you can get full CI/CD support within these tools.
SageMaker Pipelines can be used across the end-to-end ML spectrum: all of the steps Pranav just showed you, preprocessing, training, and tuning, can be codified within a workflow, so we can prepare, transform, train, and validate. At the end of his presentation we left off with the model registry, and we need to deploy those models into production to get value from them. Today we're going to use the SDK to create a pipeline, and I'll show you both the inputs and the outputs of that. Within SageMaker Studio we can also visually inspect the pipeline at any stage, while it's running or after it has completed. The typical steps in a workflow include many different types of jobs, which we call step types; some of the ones we'll show today are processing, training, conditionals, and register-model steps. I'll show you the code that creates each of these steps, and then the pipeline is in charge of executing them in the order we specify. The conditionals give you branch points where you can check a condition and take one route or another within the workflow. We'll pick up from the model registry that has just been built, and that's where we'll begin our journey.

The CI/CD portion of the workflow is handled and constructed by SageMaker Projects, which is closely related to SageMaker Pipelines, so the first thing we'll do today is create the actual project. Projects are created from templates: SageMaker provides a set of templates out of the box, about seven or eight currently, but you can also download, modify, and upload your own organizational templates. We'll look at this in the demo portion, but a project consists of a series of hierarchical artifacts, and we'll walk through those. I'm going to stop this presentation and jump over to SageMaker Studio. Actually, before we look at Studio, I want to show you that the examples I'm showing today are freely available on GitHub, and this repository is the one I'm using; we'll provide it in the artifacts with the webinar. From the Studio home page, I've pulled in a set of project templates; these are available through what we call the Service Catalog, and if I search for them I can see that the Service Catalog holds a versioned set of templates, which are the ones we'll use today.

So let's get started. I've got a couple of notebooks up and running from the repository I showed you. The first thing we do is create a project, and we can do this in two ways: programmatically using the SDK, which is what I'm going to do today, or using the visual tools within SageMaker Studio. This notebook has already created a project, so I'll briefly run through the code: you see that we pick up data from the Service Catalog, choose a provisioning artifact, and locate a template, and then we programmatically call create_project. This actually created the project for me, and we see the outputs of that creation here; we wait for it to complete, and now we have a project.
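For reference, a minimal sketch of the programmatic project creation; the project name and the Service Catalog product and provisioning-artifact IDs are placeholders that the demo notebook discovers programmatically:

```python
import time
import boto3

sm_client = boto3.client("sagemaker")

response = sm_client.create_project(
    ProjectName="fraud-detection-mlops",
    ProjectDescription="Build, train and deploy project for fraud detection",
    ServiceCatalogProvisioningDetails={
        "ProductId": "prod-xxxxxxxxxxxxx",            # template product in Service Catalog
        "ProvisioningArtifactId": "pa-xxxxxxxxxxxxx", # template version
    },
)
print(response["ProjectArn"])

# Project creation is asynchronous; poll until provisioning completes.
while sm_client.describe_project(
        ProjectName="fraud-detection-mlops")["ProjectStatus"] != "CreateCompleted":
    time.sleep(15)
```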
The next few screenshots in the notebook show how you would do the same thing within Studio: again, we start from a particular template, select it, tell it to create the project, wait for it to complete, and we're ready to go. This diagram shows the actions we just took; whether we do this programmatically or through the Studio visual interface, the same steps occur. We import from the Service Catalog, we're going to create a pipeline in just a second, and the project also creates some seed code for us that lives in CodeCommit and CodeBuild, and it's all tied together with CodePipeline. I'll show you more of these details as we go. This project was already created, so let me hop over to the Studio UI. Most of the time you're looking at the file browser, and this left-hand panel is the navigation panel. From the Home button, all of the SageMaker components can be accessed: Feature Store, Data Wrangler, Experiments, Pipelines, right from here. What I did was open the Projects screen, which is right here, so I'll hop over there. You see a fairly new project that was created today; this is the project we just created. If we open it up, we see there's a hierarchy of artifacts that are part of this project. The first thing you'll note is that there is a repository; we can clone it, which brings it down from the remote to our local file system, and that's an option we have if we want to edit the code. The SageMaker project also created a pipeline for us, so we can open that pipeline and look at it. You can optionally create an experiment, and the end goal, as Pranav showed at the end of training and tuning, is that you have a model group to view. So this notebook essentially created what we call a project, and we can interrogate that project and get information about it right here.

The next notebook we'll pull from shows the model deployment phase of the pipeline. We've already created the project, and now we're ready to create the pipeline; the actual creation of the pipeline is in this notebook, so I'll scroll down to it. Here's a simple pipeline covering build, train, and tune. What we have to do is create these objects one by one: we instantiate them just as we would if we were executing the training live. We create the training object, but instead of calling estimator.fit directly, we encode it as a training step, which is part of Pipelines; this becomes one of the steps in the pipeline. We also create an evaluation step, and we created a preprocessing step as well, which I'll show you in a second. So we instantiate these objects and then encode each one as a particular step, depending on the type of step we're executing. We then create a model: we instantiate the model, gather some metrics, and again encode it as one of the step types, a model step.
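A sketch of how such steps can be wired up with the classic step constructors; `processor` and `sk_estimator` are assumed to be the objects from the earlier sketches, `raw_data_s3_uri` is a placeholder, and the evaluation step that produces the report is elided:

```python
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.properties import PropertyFile
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.inputs import TrainingInput

# Preprocessing step wrapping the SKLearnProcessor defined earlier.
step_process = ProcessingStep(
    name="PreprocessFraudData",
    processor=processor,
    code="scripts/preprocessing/preprocess.py",
    inputs=[ProcessingInput(source=raw_data_s3_uri,          # placeholder input URI
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name="train",
                              source="/opt/ml/processing/output/train"),
             ProcessingOutput(output_name="validation",
                              source="/opt/ml/processing/output/validation")],
)

# Training step consumes the processing step's outputs instead of fixed S3 URIs.
step_train = TrainingStep(
    name="TrainFraudModel",
    estimator=sk_estimator,
    inputs={
        "train": TrainingInput(
            step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv"),
        "validation": TrainingInput(
            step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
            content_type="text/csv"),
    },
)

# The evaluation step (not shown) writes its metrics to a JSON report that is
# exposed as a PropertyFile, which the condition step reads later.
evaluation_report = PropertyFile(name="EvaluationReport",
                                 output_name="evaluation", path="evaluation.json")
```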
I mentioned that we can use conditionals, and it's convenient sometimes, when a condition fails to give you the proper result, to encode a fail step, which can be used to alert your team of MLOps people that a particular pipeline has failed. Here's the encoding of the condition step: we embed a condition based on the metrics that have been produced, and in this particular instance we compare the AUC score produced by the training run.

The next step is to actually create the pipeline. We've talked about what pipelines do; now we create one using the SDK. This code creates a pipeline: we give it a name, pass in a series of parameters that control its execution, and, most interestingly, we wire together the steps in a particular order by listing them, and these become the steps of the pipeline. SageMaker Pipelines will render a diagram showing exactly what your pipeline is doing; this example is from the GitHub repository we're using. To execute the pipeline after it's been created, we do an upsert, which uploads it to SageMaker, and at that point we can list the pipeline definition. I pulled the definition out so we know what's going on: this is the actual pipeline I just created, and you can see how it encodes all of the information we're interested in. It has a section on parameters, and then come the steps: a processing step, training steps, evaluation steps, and the conditional. So the pipeline we've created is just a JSON document that encodes all of the steps, arguments, parameters, and conditionals, and we're able to download it here. To actually execute the pipeline, we call pipeline.start and provide the instance types and counts we want to use to run it; we can wait for it if we choose, and then list the steps.

Let's go look at a pipeline. We can get to it a couple of different ways; here we are in our project, and I've opened this pipeline, one of the artifacts created by the project. You can see I've executed it twice since creating it, so we can look at one of those executions. We can look at the settings and parameters used when the pipeline was created, but we can also open a specific execution instance. Here's an instance of an execution; again we can access settings and parameters, but we can also drill in and see information about this particular execution. We can see all of the input parameters for the training job, for example, and we can see its outputs, in this case the AUC numbers; we can also visit the logs of the training job if we want, since those links are provided. We can scroll around, evaluate each of these steps, and get more information about each of them.
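A hedged sketch of the conditional branch and the pipeline assembly described above; `step_eval` and `step_register` stand in for the evaluation and model-registration steps that are not shown here, `evaluation_report` is the property file from the previous sketch, and the JSON path and AUC threshold must match whatever the evaluation script actually writes:

```python
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline

# Fail branch: surfaces a clear error if the model misses the AUC bar.
step_fail = FailStep(
    name="AUCBelowThreshold",
    error_message="Model AUC below the acceptance threshold",
)

# Condition reads the AUC metric from the evaluation step's report.
cond_auc = ConditionGreaterThanOrEqualTo(
    left=JsonGet(step_name=step_eval.name,
                 property_file=evaluation_report,
                 json_path="classification_metrics.auc.value"),
    right=0.8,
)

step_cond = ConditionStep(
    name="CheckAUC",
    conditions=[cond_auc],
    if_steps=[step_register],
    else_steps=[step_fail],
)

# Wire the steps together; the pipeline resolves execution order from their dependencies.
instance_type_param = ParameterString(name="TrainingInstanceType",
                                      default_value="ml.m5.xlarge")

pipeline = Pipeline(
    name="fraud-detection-pipeline",
    parameters=[instance_type_param],
    steps=[step_process, step_train, step_eval, step_cond],
)

pipeline.upsert(role_arn=role)   # creates or updates the definition in SageMaker
execution = pipeline.start()     # kick off one execution
# execution.wait(); execution.list_steps()
```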
The last thing the pipeline does is register the model, so if we come over to the model registry, this model package group shows up here, and we can open one of those models. We see that some versions of this model are pending while others were approved, so I wanted to show you quickly where that happens. You can do it in the visual UI, or you can do it programmatically; this is how you would update a model package and approve it using code. But you can also come in here and open this up, and we can see several things. First, we can see all of its lineage: all of the changes in status that occurred to this model over time. When I created it, I first put it in a state of pending manual approval, which allows your MLOps team to come in and do some validation and quality checks before pushing it into production. Then, either manually or using the API, I approved the model, and you can see it has gone through a series of deployments. This particular model was deployed to staging, and we have tags out here indicating which piece of infrastructure it was deployed onto. We deployed it to staging a couple of times, presumably validated things there, and then pushed it into prod, so we have the indicator here that we pushed this model into prod. If I make some more screen space, you can also update the status directly right here in the UI: if you wanted to revert a model to rejected, or promote one to approved, you can do that with the Studio UI. So that shows essentially the end-to-end flow.

The last thing I'll show you is the pipeline itself, and now we're talking about CodePipeline rather than SageMaker Pipelines; this was one of the things created by the project. What we see here is a full end-to-end execution. We start with source code; perhaps this run was triggered by a particular code commit that changed the source, and that led to a build, which used CodeBuild. The next thing we do is deploy to staging, and this is where your MLOps team would come in and say, we've deployed to staging but we need to do some validation, and we get the indicator right here that it's waiting for approval. What I can do is come in, add my name, write "approved for push to production" as an annotation, and hit the approve button. What that does, as you can see, is mark the approve-deployment step, and that enables the deployment to our prod infrastructure, which will be going on right now. If we look over at endpoints and wait a few minutes, we should see it start to deploy; here are the endpoints from the previous run, where the staging endpoint is up and running and the prod endpoint is up and running. We can interrogate more information about an endpoint: if we had data capture enabled, we could see statistics from the endpoint, live monitoring of the endpoint, and so on.

So that concludes our walkthrough of SageMaker Projects, SageMaker Pipelines, and the lineage tools that go with them. These are the tools your MLOps team can use within SageMaker to build, manage, and monitor your models from end to end. Thanks for your time today, look for the artifacts, and thanks for attending.
Info
Channel: AWS Developers
Views: 8,453
Keywords: ML automation, MLops, Machine learning, AWS, Online Tech Talks, Amazon Web Services
Id: T9llSCYJXxc
Length: 61min 7sec (3667 seconds)
Published: Mon Jul 31 2023