How to Serve PyTorch Models with TorchServe

Captions
Welcome to the PyTorch 2021 Hackathon. My name is Hamid; I'm on the partner engineering team at PyTorch, Facebook AI. My job is to help ML developers like you improve their solutions with PyTorch using its best and greatest features, and I'm also one of the co-maintainers of TorchServe. Today I'm excited to tell you about TorchServe and give you some examples of how it works, so you have a better idea of how you can use it in the hackathon. During today's video I'm going to start by talking about model serving, then give a quick overview of TorchServe and walk you through how to use it, and finally end with a demo so you can see it in action.

What is model serving? You have trained your model, and now you have to integrate it into a larger system to make it available for running inferences. I'll be using the term model serving to refer to this integration and the subsequent use of the model. TorchServe is the native model serving solution for PyTorch: a performant and scalable tool for wrapping your PyTorch model in an HTTP or HTTPS API. In this short video I won't be going into the advanced features of TorchServe; instead I'll focus on what you need for the hackathon, namely how to install TorchServe and start serving inferences from your model.

The first step is to choose your handler. This can be a default handler that TorchServe provides out of the box, or your own custom handler. A handler in TorchServe is a Python script that includes all your model initialization, pre-processing, inference, and post-processing code. We already have out-of-the-box handlers for a number of applications, like image classification, segmentation, object detection, and text classification, and you can always write your own handler that fits your needs. This can be especially helpful when you are using a pre-trained model from a third-party library such as Hugging Face. As you can see on the right-hand side, you can simply initialize your model and apply your custom logic for any of the pre-processing, inference, and post-processing steps using the four main functions; a minimal handler sketch follows after this overview.

After you choose your handler, the next step is to bundle your model and all the supporting files it requires into a model archive (a .mar file in TorchServe terms) and place it in the model store so TorchServe can see it. Then you start TorchServe, configure it, and serve your model through the TorchServe inference API. The management API gives you control to register or unregister models and to scale the number of workers for your model up or down, and finally there are the logs, so you can see what's happening under the hood.

If you participated last year and saw the TorchServe tutorial, you will notice that we have added several examples, including Hugging Face Transformers and MMF models; MMF is a multi-modality framework from Facebook AI Research that allows you to combine signals from different modalities. We have also added ensemble support, where you can define sequential or parallel pipelines, with several examples to help you get started easily. TorchServe now has a new explanation API, powered by Captum, that provides insights about your model outputs at different levels; I encourage you to have a look at the Captum library at captum.ai. Also, if you are a Kubeflow user, TorchServe now has integrations with Kubeflow Pipelines, KFServing, and Google Vertex AI. If you are using MLflow as your MLOps library, TorchServe has been integrated as a deployment plugin, with different examples that make it easy for you to get started.
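To make the handler structure concrete, here is a minimal sketch of a custom handler built on TorchServe's BaseHandler. The class name, the bundled file name (model.pt), and the assumption that each request body is a JSON array of numbers are illustrative choices, not part of the original example.

```python
# Minimal custom handler sketch (illustrative names; not the exact handler from the video).
# TorchServe calls initialize() once per worker, then preprocess -> inference -> postprocess
# for every request batch.
import os

import torch
from ts.torch_handler.base_handler import BaseHandler


class MyClassifierHandler(BaseHandler):
    def initialize(self, context):
        # model_dir is the temporary directory where TorchServe extracted the .mar contents
        model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # Assumes a TorchScript model was bundled as model.pt in the archive
        self.model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location=self.device)
        self.model.eval()
        self.initialized = True

    def preprocess(self, data):
        # data is a list of requests; here we assume each body is a JSON array of numbers
        # of a fixed size, so the tensors can be stacked into one batch
        inputs = [torch.as_tensor(row.get("data") or row.get("body")) for row in data]
        return torch.stack(inputs).to(self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Return one JSON-serializable result per request in the batch
        return outputs.argmax(dim=1).tolist()
```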
Overall, TorchServe has grown in multiple areas. Along with the new features, many performance optimizations have been applied to it. This may not affect you if you are a first-time TorchServe user, but if you are more advanced you will want to enable these. I have included a bunch of examples here if you are interested in checking them out; I will add the links in the comments section of this video. If you are a cloud user and architecting your solution there, TorchServe has been integrated into different services on the major cloud providers, namely AWS, Google Cloud, and Azure. I've included more examples here as well; check out the comments section of this video for the links.

Before going to the demo, I wanted to mention that TorchServe is in production at companies including Matroid, Toyota Research Institute, and Wadhwani AI, serving models for different use cases from computer vision to NLP. Another example of a team using TorchServe is Dynabench, a research platform built by Facebook AI for dynamic data collection and benchmarking. If you follow ML competitions in this space, you might know FLORES, a low-resource machine translation competition; FLORES runs on Dynabench, which uses TorchServe. Please check the TorchServe GitHub repo, pytorch/serve, for more details and examples.

Now we are going to switch to the demo. First, let's make sure that we are in an environment with PyTorch 1.5 or higher (we are using 1.9, the latest version) and OpenJDK 11, version 11 of the Java Development Kit. If we have these two, we can install TorchServe along with the model archiver and the workflow archiver; the workflow archiver was recently added to support ensembles. We can install TorchServe using pip or conda; I'm using pip here. Now TorchServe is installed.

For this demo I'm going to use an image classification example with ResNet from the TorchServe repo. You can find this example at github.com/pytorch/serve under examples, and here I'm going to break down the steps for you. The example uses a pre-trained ResNet-18 model from torchvision. I'm showing this example because transfer learning is very popular in the ML space these days, and I'll show you how it works and how you can extend it to your own case if you are serving a pre-trained model with TorchServe. We access the model through torchvision.models, and if you want to fine-tune the model on your own data and task, you can do that here.

In TorchServe we can serve models in eager mode, or alternatively we can TorchScript the model and use TorchScript mode. PyTorch recommends TorchScripting for deploying your models, especially in a production setting: TorchScripting gives you an intermediate representation of your model that can run in a Python-free environment, which brings a good speed-up for your inferences. This is an optional step and part of the performance optimization recommendations; to learn more about PyTorch model and performance optimization offerings, check the tutorials on pytorch.org. I'm going to save the model weights here, and we are going to use eager mode for this presentation.
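As a quick sketch of this preparation step: download the pre-trained ResNet-18 from torchvision, save the eager-mode weights, and optionally TorchScript the model. The output file names are just examples, not the exact names used in the repo.

```python
# Sketch of preparing the model before archiving (file names are illustrative).
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Eager mode: save the state_dict and pass it to torch-model-archiver via --serialized-file,
# together with the model class definition (--model-file).
torch.save(model.state_dict(), "resnet18.pth")

# Optional: TorchScript the model for a Python-free, typically faster deployment.
# A scripted .pt file can be archived without a separate model class definition.
scripted = torch.jit.script(model)
scripted.save("resnet18_scripted.pt")
```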
Now we are going to create an archive file: a single .mar file that bundles your model checkpoint along with all the supporting files it needs. In the case of NLP this could be a vocabulary file, or an index file that makes the prediction results more human-readable. I'm going to use this command, and the important things to notice are: the name of the archive file; the version flag, which specifies the version of your model, since TorchServe can simultaneously serve multiple versions of the same model for A/B testing or benchmarking purposes; the model-file flag, which lets you specify the model class definition (we are using the definition provided by the example, which basically borrows the ResNet-18 definition from torchvision); the serialized-file flag, which lets you specify the model weights; the handler flag, which lets you specify the handler (here we are using a default TorchServe handler for image classification, but if you have a custom handler you point to its path here); and the extra-files flag, which lets you add supporting files to the model archive, in this case an index file to make the predictions more human-readable. Running this command creates the .mar file for us: we have used the model weights, the model definition, and the index file to build the archive. Now we make a model store and move the archive file into it.

Next we want to start TorchServe, and we are going to use this command. Again, the important things to notice are: the start flag; the model-store flag, which specifies the path to the model store where our archive file lives; the models flag, which specifies the name of the API endpoint and the archive file it will use; and at the end the ncs flag, which stands for no configuration snapshot. By default TorchServe saves its state as a snapshot, so the next time you restart it, it restores that snapshot; for the purpose of this demo I'm going to turn that off. Now we start TorchServe: we can see the logs scrolling, the worker threads are set, and the model has been loaded into memory and is ready to serve inferences.

Now I'm going to switch to another terminal and query the general status of the TorchServe server. I send a request to the management API to check the status of TorchServe; by default port 8081 is assigned to the management API. It shows us the model name and the model URL, which is basically the archive file we specified. If we want to learn more about the model being served, we can add the model name to the URL, and it shows all the details: the model version along with the runtime, the number of workers, the batch size, and everything else about the model being served.

Next we want to pass an input and run an actual inference. For this we use port 8080, which is assigned to the predictions API: we specify the model serving our inference and pass the model input, and the result of the inference is the top five predictions from the model. We also have the explanation API, which can provide insights about your model outputs; I'm not going to go through it here, but I encourage you to look at the examples in the repo to learn more about it. Finally, we want to stop TorchServe, and we can simply call torchserve from the command line with the stop flag; now TorchServe is stopped.
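Pulling these demo commands together, here is a sketch of the full shell workflow just described. The model name, file names, and the input image are illustrative; the exact paths differ slightly in the repo example.

```bash
# Sketch of the demo workflow (names and paths are illustrative).
pip install torchserve torch-model-archiver torch-workflow-archiver

# Bundle the weights, model definition, handler, and extra files into a .mar archive
torch-model-archiver --model-name resnet-18 --version 1.0 \
    --model-file model.py \
    --serialized-file resnet18.pth \
    --handler image_classifier \
    --extra-files index_to_name.json

mkdir -p model_store && mv resnet-18.mar model_store/

# Start TorchServe; --ncs disables the configuration snapshot
torchserve --start --ncs --model-store model_store --models resnet-18=resnet-18.mar

# Management API (port 8081): list registered models and inspect one model's details
curl http://localhost:8081/models
curl http://localhost:8081/models/resnet-18

# Inference API (port 8080): run a prediction on an input image
curl http://localhost:8080/predictions/resnet-18 -T kitten.jpg

# Stop the server when done
torchserve --stop
```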
To check your logs, look at the logs directory that has been created in your folder. You can see that it contains different logs: the access log for the model server, the model log, the TorchServe logs, and the TorchServe metrics. If you run into any issue, the model logs are a good first place to look.

Before we finish this demo, I want to skim through an NLP example that uses a custom handler to serve the popular Hugging Face Transformers. The example lives under the Hugging Face Transformers folder in the examples directory of the TorchServe repo. The main point I want to show you is that you can write a custom handler, a simple Python script that initializes your model and places all the pre-processing and post-processing code in one place; this bypasses the need to provide the model class definition explicitly to the model archiver. In this example we initialize the model and tokenizer from transformers, and you can see that all the pre-processing and post-processing code goes into one place in the handler. Another important thing to highlight is that all the extra files you pass to the model archiver are accessible in the handler: TorchServe extracts the archive file into a temporary directory, so you can access every file you bundled and use it in your custom logic as needed (a minimal sketch of such a handler follows at the end of these captions). I highly encourage you to check this and the other examples in the TorchServe repo for more use cases.

So now that you know what TorchServe is and how it works, you should have everything you need to get started with TorchServe and the hackathon. Feel free to reach out to us with any questions, suggestions, or feedback by opening an issue on the TorchServe GitHub repo. Thanks so much for joining us today, and good luck in the hackathon!
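As referenced above, here is a minimal sketch of a Hugging Face Transformers custom handler. The task (sequence classification), the assumption that the tokenizer and model files are bundled in the archive, and the plain-text request format are assumptions for illustration; the real example in pytorch/serve covers several NLP tasks and more configuration.

```python
# Sketch of a Hugging Face Transformers handler (assumed task: sequence classification).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler


class TransformersClassifierHandler(BaseHandler):
    def initialize(self, context):
        # TorchServe extracts the .mar file here, so everything passed via --extra-files
        # (config, tokenizer files, label maps, ...) is available under model_dir.
        model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir).to(self.device)
        self.model.eval()
        self.initialized = True

    def preprocess(self, data):
        # Assume each request body is raw UTF-8 text
        texts = [(row.get("data") or row.get("body")).decode("utf-8") for row in data]
        return self.tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to(self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(**inputs).logits

    def postprocess(self, logits):
        # Return the predicted class index per request; a label map from --extra-files
        # could be used here to return human-readable names instead.
        return logits.argmax(dim=-1).tolist()
```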
Info
Channel: PyTorch
Views: 17,572
Keywords: pytorch, hackathon, torchserve, model serving, ai, artificial intelligence, machine learning, ML, deep learning, API, ML model
Id: XlO7iQMV3Ik
Length: 16min 22sec (982 seconds)
Published: Tue Sep 21 2021