Introduction to TorchServe, an open-source model serving library for PyTorch

Captions
Hey everyone, this is Shashank from AWS. In this video I'll introduce TorchServe, an open-source model serving library for PyTorch jointly developed by AWS and Facebook. We'll take a quick look at what it is, how it works, what its key capabilities are, and how you can get started with it quickly.

TorchServe is open source, so head over to github.com/pytorch/serve and follow the install instructions. Once you've installed it, you can quickly start up a server by running torchserve --start. With the server running, you get access to two sets of APIs: the inference API, which lets you query the health of the system and submit inference requests, and the management API, which lets you register models, unregister models, version your models, increase or decrease the number of workers per model, and so on. These two services listen on ports 8080 and 8081 respectively, but those are just the defaults, and you can change them to any other ports you prefer.

TorchServe supports models written in both eager mode and TorchScript. You can use the torch-model-archiver utility, which ships as part of TorchServe, to take not just your model checkpoint but also the model definition and the state dictionary, and package all of them into a single archive file called a .mar file, which you can then optionally redistribute. When you're ready to deploy the models in the model store, simply use the management API to register the specific models you want hosted. By default, TorchServe switches on logging and metrics; both are fully customizable, so you can make them as verbose as you need or have them provide as little information as you want. Once the models are hosted, external client applications such as web apps, mobile apps, and other web services can invoke the inference API for prediction requests. TorchServe doesn't natively provide authentication, but you can leverage the capabilities offered by Amazon EKS, Amazon ECS, Amazon SageMaker, or a self-managed and hosted Kubernetes cluster for additional security.

Now let's take a look at an example showing these steps in action. I'm running this example on an Amazon EC2 instance, but you may very well run it on your laptop or desktop; simply head over to github.com/pytorch/serve and follow the install instructions for your platform. Once you've installed TorchServe, you can verify the installation by running the help command, which prints some helpful information about using TorchServe. Starting a server is easy: as I mentioned, you simply run torchserve --start, and you can provide additional configuration parameters. In this case I'm providing only one, the model store, which just tells TorchServe where to find models. You can now see on the left, from the log, that TorchServe is running. To make requests to this server, I'll open up another terminal (there are different ways to do this; I'm using Emacs) and submit requests to the TorchServe management and inference APIs: registering models, running inference, and so on.
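As a reference, here is a minimal sketch of those setup steps as shell commands, assuming a pip-based install, the default ports, and a model_store directory name chosen for illustration; check the repository for the exact instructions for your platform:

  # Install TorchServe and the model archiver utility
  pip install torchserve torch-model-archiver

  # Start the server, pointing it at a directory that will hold model archives
  mkdir -p model_store
  torchserve --start --model-store model_store

  # Health check against the inference API (default port 8080)
  curl http://localhost:8080/ping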
To start, let me download a model: a DenseNet-161 from the official PyTorch repository. Once the download finishes, you can see on my terminal that I have the DenseNet model weights. The next step in deploying this model is to create a model archive (.mar) file using the torch-model-archiver utility, which takes a few options. First, I specify a model name, densenet161, though you may very well call it something else. You can version your models, so I specify version 1.0. I then specify where my model definition is, where the weight file I just downloaded is, and an index_to_name.json file that maps the prediction outputs to categories or classes. Finally, the most interesting option is the handler. Notice that I'm not providing any custom handler; I'm just specifying the built-in image_classifier handler, which takes care of initialization, pre-processing, and post-processing so you don't have to manage any of those things. TorchServe provides default handlers for image classification, object detection, image segmentation, and text classification, so if you're deploying any of these kinds of models you just provide the handler name and you're ready to go.

Once the archive has been generated, you can see a new file called densenet161.mar. This is the model archive file, which I move into the model store directory so that TorchServe can find it. Now that the model is in the model store, we can register it with TorchServe so it can start serving requests. Registering is just as easy: we call the management API with models=densenet161.mar (the file in the model store directory), and a status message confirms that the model was successfully registered. You can list all the currently registered models by curling the management API on port 8081 at /models, and you'll see the one registered model. You can get additional information about this specific model by appending its name, /models/densenet161, which shows the model's name, version, and so on. You'll notice that the minimum and maximum workers are currently zero, which means there are no workers, no CPU threads, assigned to serve requests. You can change that by specifying the minimum number of workers, which we do here with a request to the management API for densenet161 setting min_worker=2. Having requested two minimum workers, I can query the model details again and see that the minimum worker count is now 2.

Now that the model has workers assigned, I can make prediction requests. First I download an image of a kitten, so I now have a kitten.jpg on my machine. Then I submit an inference request using the inference API on port 8080, calling /predictions/densenet161 with kitten.jpg, and you can see that the model thinks it's primarily a tiger cat or a tabby cat. Right below, in the terminal logs, you'll see densenet161 and log information about the inference request that was just made. Great, so now we have one model hosted.
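Here is a rough sketch of that whole DenseNet-161 workflow; the checkpoint and kitten image URLs, the model.py definition, and the index_to_name.json mapping are assumptions taken from the TorchServe examples and may differ in your setup:

  # Download the pretrained DenseNet-161 weights
  wget https://download.pytorch.org/models/densenet161-8d451a50.pth

  # Package the model definition, weights, and class mapping into a .mar archive
  torch-model-archiver --model-name densenet161 --version 1.0 \
    --model-file model.py \
    --serialized-file densenet161-8d451a50.pth \
    --extra-files index_to_name.json \
    --handler image_classifier

  # Move the archive into the model store and register it (management API, port 8081)
  mv densenet161.mar model_store/
  curl -X POST "http://localhost:8081/models?url=densenet161.mar"

  # Inspect the model, then assign two workers
  curl http://localhost:8081/models/densenet161
  curl -X PUT "http://localhost:8081/models/densenet161?min_worker=2"

  # Download a test image and submit an inference request (inference API, port 8080)
  curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
  curl http://localhost:8080/predictions/densenet161 -T kitten.jpg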
So how do you host multiple models? Following the same process, I can now host a Faster R-CNN model. I download this model just like before and create a model archive file; the process is the same, except that instead of the image classification handler, the handler is now the object detector handler, which again means I don't have to write initializers, pre-processors, or post-processors; everything is taken care of. I generate the model archive file, and the new fastrcnn.mar file is now available, as you can see here. Next I move fastrcnn.mar to the model store, making the model available to be hosted, and register it as before: I call the management API, simply specifying the name of the model, and you'll see that it was successfully registered. As before, to make this model serve traffic, I need to assign workers, so I set min_worker=2 and confirm it through the management API; scrolling up, I can see that I now have two minimum workers, because that's what I requested. You can request more workers if a specific model gets heavy traffic; scaling up or down is just as easy, a single API call. Finally, now that the model is registered and has workers, I can query it and make inference requests just like before. Let's take a look at all the registered models: I call the management API at /models, and you can see that two models are now registered. You can query, add, delete, and version models; everything is very simple. Once you're done with TorchServe, you can ask it to stop, the server shuts down, and you're all set.
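The second model follows the same pattern. Here is a sketch assuming a Faster R-CNN checkpoint already downloaded as fasterrcnn.pth; the archive name and file names are placeholders, not the exact ones from the video:

  # Package the object detection model with the built-in object_detector handler
  torch-model-archiver --model-name fastrcnn --version 1.0 \
    --model-file model.py \
    --serialized-file fasterrcnn.pth \
    --handler object_detector

  # Move it to the model store, register it, and assign workers
  mv fastrcnn.mar model_store/
  curl -X POST "http://localhost:8081/models?url=fastrcnn.mar"
  curl -X PUT "http://localhost:8081/models/fastrcnn?min_worker=2"

  # List every registered model, then stop the server when done
  curl http://localhost:8081/models
  torchserve --stop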
Info
Channel: Shashank Prasanna
Views: 4,142
Keywords: pytorch, torchserve, machine learning, deep learning
Id: AIrrI8WOIuk
Length: 10min 58sec (658 seconds)
Published: Tue Apr 21 2020