Building Kubernetes cluster with GPU

Captions
I'm Daniel Whitenack, and I work for a company called Pachyderm; you'll hear more about that project a little later. Today I'm going to talk about building GPU-accelerated workflows with TensorFlow and Kubernetes, which is a really long title. I should have made it shorter and more exciting, but hopefully the talk itself will be exciting.

The last talk was a great setup: it covered some of the challenges around using GPUs on a data science team and how you can offload model training onto a GPU on Kubernetes. This talk is related, but it has a slightly different spin. The key word here is workflows. Model training is rarely the only piece of the puzzle; it's usually a small piece of a much larger one that includes a lot of data pre-processing, plus training, inference, post-processing, and visualization. What I want to get across today is how we run all of that together on Kubernetes while still being able to offload the important pieces to GPUs when we need them. We've worked with a bunch of different users and clients on this, so I'm going to describe how we do it.

To that end, I'll start with my picture of this bigger data-pipeline scenario, where model training on a GPU is one piece of something larger, and talk about where the GPU comes into play. Then we'll look at why Kubernetes is so good at managing this sort of thing, and what, if anything, we need to add on top of Kubernetes to support this kind of workflow. You're a technical crowd, so I know you won't just take my word for any of that, which is why I'll then go straight into a live demo: we'll deploy a bunch of TensorFlow and data-processing stages on Kubernetes and switch between CPU and GPU nodes. Fingers crossed that goes okay. Sound good, everybody? It's the only talk I have, so feel free to leave now if you want to hear something different.

Let's get started. In my mind, the typical workflow for a data science or analytics team is much broader than just model training. Training gets a lot of attention because it's the cool part, but there's a real struggle around managing the pipeline as a whole, and I think that problem is crucial: if you can't handle all the pre-processing, get the data to the GPU when you need to, handle the post-processing, share those resources, and reproduce analyses and hand them off to other parts of the team, you're going to struggle to create value for the business.

Let me share an example. It's very much just one example; what we'll talk about today applies in many more cases. Say we're processing images and doing some object detection. In that sort of workflow we might start with a raw data set of images from somewhere.
Let's not worry about where those images come from; we just have access to this raw data set. The first thing we probably need to do is pre-process them somehow: maybe our model needs them in a certain format or size, or maybe we need to pair them with other images, or label, format, or tag them in some way. There's a lot of different pre-processing we might want to do, and out of it we get a set of nice images to feed into model training. I've represented that as a single stage here, but when we work with people it could easily be fifteen stages of pre-processing developed by three different people on a team.

Then we have the cool model training stage, which takes in that data and trains some type of model, maybe a neural net. A lot of people raised their hands last night to say they're using TensorFlow, which is why I'll be talking about TensorFlow, but again, this is just an example; it applies to any framework you want to use, whether that's TensorFlow or Caffe or whatever. So we train our model with that framework on the pre-processed input, and then we serialize or export the model in some way so we can use it for inference. We're not going to retrain the model every time an image comes in that needs object detection; we need to serve that model somehow, with something like TensorFlow Serving, so serving ends up being a separate stage of its own.
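As an aside on that serving stage: one common way to stand up an inference endpoint for an exported TensorFlow SavedModel is the stock TensorFlow Serving image. This is an illustrative sketch rather than the setup used in this talk, and the model name and paths are placeholders:

    docker run -p 8501:8501 \
      -v /models/maps:/models/maps \
      -e MODEL_NAME=maps \
      tensorflow/serving
    # serves the versioned SavedModel directories under /models/maps/<version>/
    # over REST on port 8501

Whatever tool you use, the point is that inference becomes its own long-running stage, separate from training.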
And I haven't even added post-processing after that. So I already have three distinct phases, I've left out post-processing, and each of these could be expanded into multiple other stages. You can start to see how this becomes a bit of an orchestration nightmare. It's why, when I tell DevOps and infrastructure people that I'm a data scientist, they tend to start walking away; they don't like me so much anymore. But it is a real challenge: you're using multiple frameworks, some of them weird frameworks the rest of the engineering organization doesn't understand, and you have these multi-stage, distributed things that need to be managed and updated over time.

So we have these three distinct stages; let's talk about where GPUs come into that. Most of the time (not a universal statement, but most of the time) people use GPUs for model training, as was mentioned in the previous talk. That means we have two stages here that will run just fine on CPU nodes; basically the whole workflow, however many stages of pre-processing and inference we have, can run on CPUs, and then there's this one stage we want to run on a GPU node. That's essential for a lot of teams building models at scale: they need to run training on a GPU, but they also need to interface it with the other stages of pre-processing, post-processing, inference, and everything else. That's the general picture I want you to have in mind.

We know from previous talks, from more talks later today, and from things you've seen online that we can use GPUs in Kubernetes. But I don't think there's a lot of content and tooling out there for actually managing this sort of workflow on top of Kubernetes, beyond scheduling the individual pieces. That's really what I want to focus on: enabling the workflow, and getting the stages that need acceleration onto GPUs when we need them.

So, say we have these few stages. One of the things we need to do is make them portable, and here I'm preaching to the choir: you understand that I can Dockerize these different stages and run them in any environment, and that's great. I like not having to convince you of that at this conference, because at data science conferences I usually do. Containers let us package things up and get them running with reproducible behavior in another environment. But we don't want to be SSHing into machines and deploying these things manually, and Kubernetes, among many other benefits I don't have time to go through, gives us a great framework for taking these stages and deploying them, not only on CPU nodes but on GPU nodes, in a descriptive way: I can say that these workloads, defined by these containers, should run on these types of nodes, and then it happens. At this point we've made our individual processing stages portable, and we've made deploying all of them together portable, because we can run Kubernetes anywhere: your data scientists can develop these things, you can deploy Kubernetes on whatever infrastructure you want, and then deploy this whole set of stages on it.
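To make that descriptive scheduling concrete, here is a minimal sketch (not taken from the talk) of how you tell Kubernetes that a containerized training stage needs a GPU node. The pod name, image, and command are placeholders, and the GPU resource name depends on your Kubernetes version and device plugin; newer clusters with the NVIDIA device plugin use nvidia.com/gpu:

    apiVersion: v1
    kind: Pod
    metadata:
      name: model-training                        # hypothetical name
    spec:
      restartPolicy: Never
      containers:
      - name: train
        image: tensorflow/tensorflow:latest-gpu   # any GPU-enabled image
        command: ["python", "/code/train.py"]     # hypothetical training script
        resources:
          limits:
            nvidia.com/gpu: 1                     # scheduler places this pod on a node with a free GPU

The other stages simply omit the GPU limit and land on CPU nodes; the scheduler handles the placement.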
But that's not the only key. So what am I missing? Someone tell me. Let's say model inference is some type of serving thing; outside of the functionality of each of these pieces, what am I missing, operationally, to enable the workflow I described? The linkage between stages, right. What else? Lifecycle management, updating things over time, that sort of thing. What else? Data, yes. That's the first one that comes to my mind: I deploy these containers, and I'm a data scientist, so I want to process data. Where is the data? I have to somehow get the right data to the right code. Maybe it's stored in an object store, which was also talked about in the last talk, so let's say our data is in an object store. Somehow I need to get the right pieces of data, which are not everything in the object store, to the right pods to be processed. And then there's the linkage that was mentioned. And that's not all: I need to get the right data to the right code, and I need to run those steps of processing in a very specific, predefined sequence for things to go right.

Kubernetes provides a really great framework and foundation here, but, similar to how Borg inside Google differs from Kubernetes, with Borg including pieces that fill gaps in the context of what Google is doing, Kubernetes in industry doesn't offer us everything we need. We might want Vault for secret management, or Istio for a service mesh. In the same way, something needs to fill the gap of getting the right data to the right code, in the right order, on the right nodes.

So what I'm saying is that Kubernetes is great for machine learning: we get portability, scalability, auto-scaling, all of the things you're here for, and they're directly applicable to machine learning. But we need a little extra sugar, and that extra sugar is really important and not that trivial. We need to get the right data to the right code; we need to process the right data with the right code on the right nodes, whether that's a CPU workload or a GPU workload; and we need to trigger the right code at the right time with the right data on the right nodes. You get the idea. That's what I believe, and what our team believes, we need operationally to enable this. As a bonus, it would also be nice to maintain all of this over time in a sustainable way, which means tracking what's going on: versioning what data ran with what code on what nodes at what time. And especially if you're working with healthcare or finance data, you need to stay compliant and be able to reproduce results and show the provenance of what you did at each point in time.

All of this together is what we put into the open source project Pachyderm. Pachyderm is an open source data pipelining and data management layer on top of Kubernetes. By that I mean two things: data pipelining, which is about running the right code on the right data in the right sequence, and data management, which is about shimming the right data to the right code and collecting the output data. Together, these pieces enable what we talked about before: the right data, to the right code, on the right nodes, at the right time. That's the layer for Kubernetes.

The first piece of Pachyderm that enables this is data versioning. All data processed in Pachyderm is version controlled; think Git for data. You can set up collections of data, commit data into them, and make changes, and Pachyderm tracks all of those changes, which both gives us reproducibility and lets us know when there's new data, so we can trigger the right things at the right time.
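To make the "Git for data" idea concrete, here is a rough sketch of the corresponding CLI workflow; the repo and file names are made up, and the exact pachctl syntax varies by release (older versions hyphenate the subcommands, e.g. create-repo and put-file):

    pachctl create repo training                      # a versioned collection of data
    pachctl put file training@master -f pair0001.png  # commit a file on the master branch
    pachctl list commit training                      # history of commits on that repo

Every put adds to a commit on the branch, so you can always ask what the data looked like at an earlier point in history.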
Second, since we're running on Kubernetes, we use containers for analyses, and this is actually really important. It might be lost on this crowd, which takes for granted the unified layer containers provide, but data scientists struggle so much, and I struggled so much in my past, with a diverse set of tooling and with stringing it all together: using TensorFlow when I need it, then connecting that output to R to do some visualizations, while someone else builds some weird thing with Julia. Containers give us a unified framework for saying that our basic unit of data processing is a container, and we're unopinionated about what runs inside it: run TensorFlow, run R, run Python, run Julia, run a bash command, we don't care.

Next, we combine the containers for analyses with the data versioning to build up distributed pipelines, or DAGs of processing, where these containerized processing stages subscribe to versioned collections of data declaratively. You say, "I want to process this data with this image," and you build up a DAG of processing steps that is also scalable and parallelizable.

And finally, because we're versioning all the data and we know which Docker images we're using for each stage, we have complete, quote-unquote, provenance for any data anywhere. What I mean is that when we produce a result, we can easily ask for all the other pieces of data, the states of those pieces of data, and the states of our Docker images at the time that result was produced. That helps with compliance, maintainability, debugging, and all of that.

Since this is a technical crowd, I don't want to jump into the demo without giving you a few more details. Pachyderm is a layer that runs on Kubernetes, so Kubernetes forms the base, and that gives us most of what we need. Pachyderm runs as a pod on top of Kubernetes, and it talks to an object store, which is where all the data is backed. We talk to that pachd pod and tell it we want to process this data with this code; Pachyderm then talks to Kubernetes under the hood and spins up whatever pods are needed to do the processing. So if I have an inference stage or a training stage running TensorFlow, Pachyderm can spin up however many pipeline workers are needed for that stage, and there will be other worker pods doing the processing for the other stages.
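On a running cluster you can see that layering directly; the commands below are only illustrative, and the actual pod names and labels depend on how the cluster was deployed:

    kubectl get pods      # expect a pachd pod (plus etcd) alongside your other workloads
    kubectl get pods -w   # watch pipeline worker pods appear as pipelines are created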
Okay, enough of my blabbing; let's get to the good stuff. I've got a demo here, and sorry about the fuzzy text; I think the terminal will look a little better. To give you an idea, this is a dashboard you can use to see what's running as Pachyderm pipelines, and I have a pipeline running here; I'll show you what it looks like on the back end in a second. Here I'm doing image-to-image translation with TensorFlow, meaning an image comes in in one style and I want to transfer it to another style. In this particular case I want to bring satellite images in and automatically transfer their style to something like Google Maps images.

So here, in "training", each of these blue dots represents one of those versioned collections of data; remember, that's our first piece of the puzzle. In this collection I have a bunch of images I can use for training: I want to be able to translate images like the ones on the left into images like the ones on the right. Also, in my input here I have two input images, where I'm saying, "take this image and translate it." That's the input I want to transform with my trained model. So my inputs are the training data plus those input images. Then over here on the left I do the training: this next stage is the training stage. Then I do some model export; just by the way the scripts are set up, I change the format of the model, and that model is used in a "generate" stage along with pre-processed images. So I have a stage of pre-processing, and I feed the exported model and the pre-processed images together into the generate stage, which generates the output images.

Remember, each of these collections is one of those versioned collections of data, and each of my pipeline stages is a containerized analysis. If I click on one of these, here's my model training stage, and down here I can see how this pipeline stage is defined: it's defined by a Docker image and a command that's run in that image. I'm basically telling Pachyderm, "process this data using this Docker image, and when you do, run this command," which is just my Python script that uses TensorFlow. Each stage is defined in a similar way. In this case, all of my pipeline stages except "checkpoint", the model training stage, are just fine running on CPUs, and I can parallelize them very easily across CPU node instances, so I don't have to worry about using a GPU for those. Ideally, I want the pods running those processing stages to run on CPU nodes, and when I need to run checkpoint, I want it scheduled on a GPU node so the model training goes quickly.

Let's move over and connect some of the dots. If I look at what's running in this cluster right now, I can see pachd running, which manages all of the pipelining and data management, and then I have all of these pipeline workers. In this case I have a single worker for each stage of my pipeline, though that's by no means the only option: you could spin up a hundred workers to process a stage in parallel, and I can talk about that in the Q&A if you're interested. So I have each of these pods scheduled, and I've already put some example data in. If I look at what jobs have run, where a job just means one run of one of these stages (sorry about the line wrapping), I can see that checkpoint has run once, I've pre-processed a couple of times, I've run my model export, and I've generated images once, and that's reflected over here in the dashboard as well.
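Roughly, the CLI behind what the dashboard is showing looks like this; the subcommand spelling and flags differ a bit across pachctl releases (older ones use the hyphenated list-pipeline and list-job):

    pachctl list pipeline   # the pipeline stages and their current state
    pachctl list job        # one entry per run of each stage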
Actually, let me point something out: the reason that one stage ran twice, which also illustrates the data versioning, is that I put two images into that input-images repo, and I did it in two consecutive commits. I can go back to the original commit of the data and see that at that point in history I only had one image, and then I added the other one.

To illustrate how all of these things are automatically connected, let me put one more image into that input-images repo, on the master branch; remember, it's Git-like semantics for data. I put this third image in, and if I list the jobs again I'll see what happened automatically: Pachyderm saw there was new data to be processed in the input, and since I declaratively told it that this pod should be processing that data, it handed the data off to that pod, processed it, and sent it on down the line. That should run quickly, and we can see it actually pre-processed the new image and ran the last stage as well, all automatically triggered.

If we look at the output here, it's not super impressive. It looks sort of like a Google Maps drawing, but I only ran the training for one epoch, which, if you're familiar with training, is generally not sufficient. I did that because it runs for about 11 minutes on a CPU, and I wanted something that would actually execute within the time of this talk. Anyway, I can see that the output for the third image was generated automatically as well.

Let's take a step back. We now have all of our stages running as pods on Kubernetes, we've wired them together, we've connected the right data to the right code, and when new data comes in we automatically trigger everything downstream. The piece I haven't covered, which is really the punch line of this talk, is that this still isn't the complete story, because I need to make sure that at least one of these stages runs on a GPU: I want to run my training on a GPU. Let me show you what's happening under the hood. The way I told Pachyderm to spin up the checkpoint stage is with a specification that says: create a pipeline called checkpoint, use this Docker image and this command, run it on this input data, and run just one instance of it.
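As a sketch of what such a pipeline specification looks like (the image, command, and glob pattern below are placeholders rather than the exact ones from this demo, and some field names vary across Pachyderm versions):

    {
      "pipeline": { "name": "checkpoint" },
      "transform": {
        "image": "my-registry/pix2pix-train:latest",
        "cmd": ["python", "/code/train.py", "--input", "/pfs/training", "--output", "/pfs/out"]
      },
      "parallelism_spec": { "constant": 1 },
      "input": { "pfs": { "repo": "training", "glob": "/" } }
    }

You create it with something like pachctl create pipeline -f checkpoint.json (create-pipeline in older releases).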
Notice that I didn't say anything about a GPU. So Pachyderm said, that's all fine: whenever data comes into training, I'll run this and train your model on the new data. But as I said, the original training run took about 11 minutes; I ran it earlier this morning. Eleven minutes isn't much in the training world, but let's say we want to do better and make sure this runs on a GPU. First, let's check whether it actually ran on a GPU node. To show you how this cluster is set up: these are the instances in my cluster, running in AWS, though you can do the same thing in Google Cloud or Azure, or some hybrid setup. I have one node that's a p2.xlarge, which is a GPU node. So maybe I did run on that GPU; let's check. I list the jobs, look at this training job, get the logs for that job, and grep for CUDA. And... crap, I didn't find any GPU. This ran just as it normally would, on a CPU node.

Now, remember from the last talk what the standard workflow for data scientists using GPUs looks like these days: you do a bunch of stuff on your local machine, maybe some stuff in the cloud or on a cluster with CPU nodes, and when you need a GPU you turn around and say, "hey Frank, are you using the GPU node?" He says no, but then maybe Susie is using it, so you schedule a job on it, everything goes to pieces, everybody's angry, and it's not harmonious at all. We want to strive for something more harmonious. So I just want to make sure this stage runs on a GPU, and all I have to do is go in here, modify my pipeline spec, and set a resource limit saying I want to run with a GPU. And here's where you can cross your fingers. I update the pipeline and tell it to reprocess, meaning reprocess the data with the updated spec. Oops, you didn't cross your fingers quite hard enough: I forgot the GPU entry, so let me delete that job, since it would run for another 11 minutes and I don't want to keep you from lunch. Okay, now let's see what's running. We have checkpoint running again; let's get the logs again and... there's my GPU.

So all I had to do was set that resource limit. Pachyderm then knew I didn't want this on a CPU anymore, talked to Kubernetes under the hood, said this needs to run on a GPU, and the pod was scheduled on a GPU node, where it picked up the drivers, and we're off to the races. The rest of the stages still run on the CPU nodes, and everything is still wired together, so when this model comes out of the GPU training it's still supplied to the other stages running on CPU nodes, and I'm golden. I'll keep this up; it should finish in about a minute, and one minute compared to 11 minutes is a pretty good improvement. And there we go, it ran in about a minute. All right, thanks, guys.
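For reference, the change that moved checkpoint onto the GPU amounts to adding a resource limit to the pipeline spec sketched earlier. This is again a sketch, and the exact field layout differs across Pachyderm versions (older releases used a flat numeric gpu field; newer ones take a Kubernetes-style resource name and count):

    {
      "resource_limits": {
        "gpu": { "type": "nvidia.com/gpu", "number": 1 }
      }
    }

Then update the pipeline and ask it to reprocess, with something like pachctl update pipeline -f checkpoint.json --reprocess, and check the worker logs for CUDA initialization messages to confirm the training is actually running on the GPU.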
Info
Channel: Share Learn
Views: 774
Rating: 5 out of 5
Keywords: building kubernetes with gpu, Building GPU-Accelerated Workflows with TensorFlow and Kubernetes, Building GPU-Accelerated Workflows, Building GPU-Accelerated Workflows with Kubernetes
Id: MuGQwDpW2bk
Length: 32min 58sec (1978 seconds)
Published: Tue Dec 15 2020