Deploying ML Models in Production: An Overview

Video Statistics and Information

Captions
Hi everybody, and welcome to a new mini-series. In this series you're going to learn how you can effectively deploy your machine learning models into production. The mini-series is made up of two videos: this one and the next. In this video you'll learn, from a theoretical perspective, different strategies that you can use to deploy machine learning models into production, and I'm also going to give you an overview of the different ML deployment tools available on the market. In the next video we're going to take one of these ML deployment tools, namely BentoML, and use it to deploy a sample machine learning model into production. Let's get started.

There are different strategies that you can use to deploy machine learning models into production. One is to take a model, wrap it within a service, and serve it through a REST API using endpoints. Another one could be to take your model and deploy it directly on an edge device like an Arduino or a Raspberry Pi. But if you don't need external access and rather need some sort of offline computation, you can just do batch serving. Of course, you're going to decide on a deployment strategy depending on the use case that you're tackling. In my practice as a consultant in machine learning and machine learning operations, most of the time I see strategy one used, in other words serving a model through a REST API.

Now let's take a look at a basic approach to ML deployment, and in particular to deploying a machine learning model within a REST API. Of course, we would start from the model itself. The model can be a Keras model, a PyTorch model, a scikit-learn model, whatever you want really. You take the model and you build an API around it using a web framework like Flask or potentially FastAPI. At this point you have all the ingredients to serve your model to the external world, so you can take your service and deploy it directly on a computation unit like an AWS EC2 instance. But there is a more sophisticated, better-practice approach: rather than deploying your service directly onto a computation instance, you can containerize it. You create a Docker image out of it, and now you can take the Docker image and deploy it on an EC2 instance, for example, or on GCP, wherever you want really. Or, if you have a system that's more complex, perhaps because it's made up of many different models with a lot of different use cases, the best-case scenario would be to take your Docker image and deploy it in a Kubernetes cluster. In that case you would have a sort of microservice architecture.
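To make that basic strategy a little more concrete, here is a minimal sketch of wrapping a trained model in a FastAPI endpoint. This is not from the video; it assumes FastAPI, uvicorn, scikit-learn and joblib are installed, and the model file name and request fields are illustrative placeholders.

```python
# Minimal "model behind a REST API" sketch (illustrative, not from the video).
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.joblib")  # a previously trained scikit-learn model (placeholder path)

class PredictRequest(BaseModel):
    features: list[float]  # one flat feature vector per request

@app.post("/predict")
def predict(req: PredictRequest):
    # Run inference and return the prediction as plain JSON.
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}

# Run locally with, for example:  uvicorn main:app --host 0.0.0.0 --port 8000
```

From here the same service can be containerized with Docker and pushed to an EC2 instance, GCP, or a Kubernetes cluster, exactly as described above.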
Now, all of the different approaches that we've seen so far, this basic ML deployment strategy, or strategies, because it's really more than one, have some drawbacks. First of all, the whole process is quite convoluted; you have to go through a lot of different steps. For example, you have to package both the ML code and the model artifacts together, you have to do this in a custom way, and it can be a bit of a hassle. Also, you have to create some infrastructure around your solution: creating the Kubernetes cluster, creating a monitoring infrastructure that watches all of your models deployed in production, or something as simple as writing documentation around your service API. None of that comes for free, because you're doing everything from scratch. On top of that, there's the issue that the web server you're using is most likely not optimized for ML inference. Don't get me wrong: I think Flask and FastAPI are fantastic web frameworks, but they're not particularly designed for ML inference. They expect high request throughput, as in many web applications, but not heavy computation per request. In that respect, these web servers are not ideal for the use case of ML inference.

For all of these reasons we now see the advent of ML deployment tools on the market. Some of these tools are open source, others are proprietary. Here I just want to list a few of them, guide you through some of the pros and cons that I believe they have, and then we'll focus on the one that we'll analyze in more detail.

Let's get started with TensorFlow Serving. TensorFlow Serving is a deployment tool, or framework if you will, that lives within TensorFlow Extended (TFX). TensorFlow Extended is a framework that manages the entire lifecycle of a machine learning project, and TF Serving is just one part of that, but it's quite powerful: it allows you to deploy your TensorFlow models directly. This works like a charm whenever you're using Keras and TensorFlow models, but if you're using other models, like PyTorch or scikit-learn, TensorFlow Serving is of course not going to work for you. The other issue I have with TensorFlow Serving is that, just like most of TensorFlow, the documentation isn't really that great.

Another valid option is MLflow. Once again, MLflow is not just a tool for deployment; there's way more to it, and indeed MLflow takes care of the entire lifecycle of a machine learning project. But one of the components that make it up is called MLflow Models. MLflow Models is a standard that allows you to package an ML project in a way that can be easily deployed into production. You don't necessarily get a simple or direct way of doing the deployment itself; you just get a very nice way of packaging your project and easily dockerizing, containerizing it, and once you have your container you can deploy it wherever you want. I think MLflow Models is really cool as a solution, but it has some issues. First of all, it can only be easily deployed to Azure and SageMaker, if I remember correctly. The other big issue I have with this approach to deployment is that you're going to pull in a lot of libraries, a lot of noise that you don't want, so it's not really a lightweight solution.
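Just to give a rough idea of the MLflow Models packaging workflow described above, here is a minimal sketch. It's illustrative only and assumes mlflow and scikit-learn are installed; the model, dataset and paths are placeholders, not anything from the video.

```python
# Sketch of packaging a model in the MLflow Models format (illustrative).
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the model as a self-describing MLflow Model directory
# (model artifact plus pip/conda environment metadata).
mlflow.sklearn.save_model(model, path="iris_model")

# Any consumer can load it back through the generic pyfunc interface,
# regardless of the library that produced it, and then containerize it.
loaded = mlflow.pyfunc.load_model("iris_model")
print(loaded.predict(X[:5]))
```

From such a packaged model directory you can then build a container and deploy it wherever you like, which is the "package, containerize, deploy" flow mentioned above.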
Let's take a look at another option, which is Seldon. Seldon is yet another framework that manages the entire ML workflow. The cool thing about it is that it is completely integrated with and built on top of Kubernetes, so if you are thinking of using Kubernetes and deploying your models there, Seldon is a really good solution. The only issue I have with it is that it is created and distributed by a company, and I personally prefer to go with completely free, completely open-source solutions. Regarding Seldon, I said that it is a full framework for ML projects, but it has a part called Seldon Deploy that is responsible only for the deployment of your projects.

There is also another great alternative if you want to work on Kubernetes, and this one is completely free and completely open source: it's called KServe. KServe is a part of Kubeflow, Kubeflow being the ML workflow framework that was developed by engineers at Google and has been used internally to work on machine learning projects. It's quite cool because it allows you to serve your models directly in a Kubernetes cluster, and since it is part of the Kubeflow environment, if you want to do more than just deployment you can easily do that, all of it on Kubernetes. The problem with KServe is that it is quite complex to set up, and, just like Seldon, you are completely locked into Kubernetes. So if you don't use Kubernetes, neither KServe nor Seldon is a good solution for you.

In most of my projects I've used BentoML. This is a killer solution for deploying machine learning models into production. BentoML's tagline is "simplified model deployment", and I think this sentence really summarizes well what BentoML is all about: BentoML is an open platform that simplifies ML model deployment and enables you to serve your models at production scale in minutes.

Let's take a look at the different features that BentoML has to offer. First of all, it provides service-oriented deployment; in other words, with BentoML you can create REST APIs. If you use BentoML you can throw Flask, FastAPI, or whatever web server you're currently using out of the window, because BentoML is going to replace it, and the great thing is that the option BentoML offers is way more performant than Flask or FastAPI, because it is indeed optimized for machine learning inference. The other great thing about BentoML is that, just like MLflow Models, it allows you to package all the necessary artifacts for a successful machine learning deployment into a single unit. You take your model, you take your code, and you package it into a bento; a bento is the unit of deployment used in BentoML. The great thing about bentos is that they're stored locally in a registry, and this registry also versions the different bentos. Another cool point about BentoML is that it supports all major machine learning libraries, so you can use PyTorch, scikit-learn, Keras, TensorFlow; no matter what you use, you have a solution that works with all of them. A bento also integrates perfectly with Docker: with a simple command-line instruction you can get a Docker image out of a bento, which you can then deploy wherever you want. BentoML also makes deployment on Kubernetes quite straightforward thanks to a tool called Yatai. Yatai is built on top of BentoML, and it allows you to take your bentos and deploy them at scale on Kubernetes in a very simple and straightforward manner. Another great feature that BentoML has is automatic documentation. This is a little bit of a hassle when you create APIs, right? You also have to write OpenAPI-style documentation, and with BentoML all of that is generated automatically for you, which is really, really cool.
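As a rough preview of what the next video covers in practice, here is a minimal sketch of what a BentoML service can look like, using a BentoML 1.x-style Python API. It's illustrative only: the model tag, service name and endpoint are placeholders, and it assumes a scikit-learn model was previously saved to the local bento model store under the name "iris_clf".

```python
# service.py -- minimal BentoML service sketch (illustrative placeholders).
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

# Load the previously saved model from the local, versioned model store
# and wrap it in a runner that handles optimized, batchable inference.
runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_array: np.ndarray) -> np.ndarray:
    # The REST endpoint BentoML exposes (and documents via OpenAPI).
    return runner.predict.run(input_array)
```

From there, as far as I understand the workflow, you serve the service locally with the bentoml CLI, build a bento out of it, and optionally containerize that bento into a Docker image for deployment.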
Of course, BentoML has some cons, and I think I've identified a couple of them. First of all, BentoML works only with Python. So if, for example, you are using Go for your machine learning models, then BentoML is not going to cut it for you, because it's a Python-only library. The other aspect is that BentoML is a tool focused only on deployment, which basically means that all the other aspects of your ML pipeline, like building and training your models or tracking your models, have to be done with other tools. So if you're using BentoML you have an extra tool to deal with, which is different from, say, Kubeflow or MLflow, where you basically have the entire ML workflow handled for you. In this case you have to deal with an extra tool that does only one thing, but I think that price is worth paying, because deploying with BentoML is actually quite easy.

By now you should have a decent understanding of the different strategies to deploy machine learning models into production and of the different tools available. In the next video we're going to focus on BentoML: we're going to create a sample model that classifies the famous MNIST digits dataset, and then we're going to deploy it into production using BentoML. I'll see you next time.
Info
Channel: Valerio Velardo - The Sound of AI
Views: 40,961
Id: Mrv3CZNWYEg
Length: 14min 26sec (866 seconds)
Published: Mon Jun 27 2022