Using GPUs on vast.ai for Machine Learning model building with Jupyter Notebooks

Video Statistics and Information

Captions
Hi, I wanted to make this video to talk about vast.ai. It's a service that allows you to access non-commercial GPUs to build machine learning models against. I've been using it to build my own machine learning models, and I find it really fast to fire up a Jupyter notebook and get going, and it builds the model really snappily. It's pretty inexpensive: you can fire up your Jupyter notebook and run your machine learning model on a decent GPU for a few hours, and it'll only cost you a couple of dollars.

One thing to keep in mind is that the machine should be thought of as pretty much ephemeral, so you want to back up your data regularly and get it off there. You also want to consider the fact that you don't know whose machine it is or where it is; it's just someone who signed up to this service and said "I can rent out my GPU". So security, I would say, is an issue. You don't want to trust it with secret data, or data that you shouldn't be allowing other people access to. It's great for learning, and for using open-source datasets, or things where you really don't care if someone gets access to your data or your code or anything else you put on there.

So I'm going to give you a quick demo of me on vast.ai: firing up a machine with a GPU, running my model in Jupyter notebooks, making some changes, backing those up to Git, and then basically blowing away the machine with the GPU on it. Let's take a look.

OK, so here we are in the vast.ai user interface, and we can create an instance from here. You can see I've got some credit there; it's pretty easy, you just go to Billing, add your credit card, and add some credit. So we'll click on "create an instance", and you can see there are basically two options. You can have on-demand, where you're basically running the machine and it's a little bit more reliable, or you can have interruptible, which is a little bit cheaper but means, obviously, that you can be interrupted: someone can take the
machine at any time while you're using it. I'm generally going for on-demand because I don't want to mess about with that too much. I could probably tweak my workflow so that I could be pretty sure I wouldn't lose anything at any time, but my workflow at the moment is more: I work, and then I commit something to Git and upload it to Git.

Then over here you basically say what kind of machine you want, so you're basically getting a Docker image running. You can get the TensorFlow Docker image, or the PyTorch Docker image, or there are a few others here like fast.ai, or you can run your own custom image. As I use fast.ai more, I probably will go down this custom-image route, just because I'm sure my setup script is going to get longer and longer, and it would make more sense for me to pull down everything I want to use in my own Docker image. Obviously that Docker image needs to be publicly available, so you couldn't put anything in a custom Docker image that's going to be a security concern, because vast.ai needs to pull that image down before you've authenticated or anything like that.

I'm using PyTorch, so I've got that selected here. It's going to run Jupyter Notebook; as it says here, that's the easiest thing. You can also have JupyterLab; I haven't used JupyterLab very much, as I kind of like keeping a notebook just as it is. Or you can run SSH. I've played with that, but I just find it easier to jump into Jupyter notebooks right now. If I was moving more of my code into Python modules, then I'd probably use SSH and run more scripted things, but at the moment, because I'm experimenting, the general machine learning process kind of favors Jupyter notebooks. As the code solidifies a little bit more, maybe I'll convert it to modules, run SSH, and fire up a machine and run the code across that. You can also pass a startup script; I'm actually running my startup script in the
Jupyter notebook, but I could potentially just paste it in here and run it here. So yeah, we'll stick with the PyTorch one.

Then there are a bunch of options. Generally they get more expensive as you scroll down, and you can see there are a lot of single-GPU instances, but you can actually get multiple GPUs; here there are like ten GPUs that you could use. I haven't got to the point where I'm playing with multiple GPUs, but I do intend to do that at some point. There's lots of other information here: what the processor is, what kind of disk you get, the speed of the disk, how much memory you get on the GPU, just the speed of everything, basically. Then you've got reliability here, and, to simplify things a little bit, vast.ai have created this DLPerf score. So I'm just going to go ahead and click "Rent" on this one, and you can see it's pending here. I'll go over to Instances, and you can see my instance here, and I can start it.

Before I start it, I'm just going to get my setup ready. I've got this directory here just for the demo, and I've got a script in here, and in this script I basically have all the things I want to run on that machine when I start it up. The way I do that is, in the Jupyter notebook, I create a bash notebook and just paste this in and run it. Then down at the bottom I have a little section that I run whenever I need to, just to commit my changes. In order to do this I need an SSH key, so we can create one just by running ssh-keygen. I'm going to paste that key into my setup script. This is one of the things you have to be wary of: I'm putting an SSH private key in here, so anyone who runs that machine could potentially have access to this key, and that's obviously a concern. Then we want to put the public key on the Git host, because basically I'm going to be using this key for talking to Git. What I'm doing is using GitLab; I never normally use GitLab, so it's not that
much of a risk. So I can paste the public key in there, I can call this "vast-ai-demo", and I can set it to expire tomorrow. I've only got one repo on here; it's the repo I'm working on, and the repo isn't that sensitive, so if this SSH key got compromised I'd be OK.

So let's go and start our machine. Oh, it's already running; it's been running four minutes. Let's connect to that machine. You can see that straight away we've got a notebook. I'll create a bash interface and copy my setup commands and paste them in here. I'm just going to move this to a separate window; let's make this a bit bigger. OK, so I'll just go through this script; let's call it "setup" so we can distinguish it. At the top here I'm installing just curl and git, because I use both of those. Then I'm making an .ssh directory, putting that private key in there as the default SSH private key, so Git will pick it up by default, and setting the permissions on it. I'm also making sure GitLab is in the known hosts so that I don't get any prompt, and setting my Git credentials; I could also paste in a config file. Then I'm checking out the repo. I first test that it's not already checked out, because I want to be able to run this idempotently; I just want to run it over and over again if I have to, so I'm adding things like this just to make sure the clone isn't going to complain that I've already cloned it. I do a git pull because, again, I might have already cloned it, I might want to just run this again, and I might have made changes since then.

Then this part here you don't need to worry about too much. Essentially I'm using Poetry in my repo, but Jupyter is running in the Python installed in the Docker image, so I'm just doing this to turn the Poetry dependencies into a requirements.txt file and install them that way. Then I actually remove Poetry, because I was hitting some problems, but those are all details you don't need to know about. And then I've got here... actually, I don't
need this line... just something to do a git pull, make my changes, and then push the repo. So yeah, let's go ahead and run this. It's saying the key is expiring soon, and you can see it's added GitLab to the known hosts; it's expiring soon because I set it to expire tomorrow. Let's see... here we go, and it's still running. It's downloading everything; all my images and everything are in this repo, my images being training data. OK, now it's installing the Python dependencies. Black here is something I've added to the notebook; it does some reformatting, which is pretty handy, because Jupyter Notebook is not very good at formatting and I'm used to working in an IDE where you can just hit a command and it formats your Python code nicely.

OK, so I'm going to run my Jupyter notebook from the beginning: restart the kernel and run all the cells. You can see the first thing I do is run nvidia-smi, just so I've got some kind of record of what it is I'm running on, and you can see the memory; I have about 24 gigabytes. I'm using this command just to output that information, and I've scattered it throughout my notebook so I can keep track of where the memory is going. I can look at this and say, well, did I really need 23 or 24 gigabytes of memory? If I see I'm only using 10, then I'll probably use a smaller GPU next time. I find this pretty fast: each training iteration is taking just over five seconds, and validation is taking just over a second. The model is ResNet-152, so it's a pretty big model.

Let's stop this and just add something here: "hello". So I'm going to go over here and commit this as "added a dummy cell". Cool, so that's pushed it up to GitLab. If I go to the repo, I've only got one project here, and you can see the last commit was this one, "added a dummy cell". So now I've got the changes committed, and if I exported the training
data, maybe the weights of the model, I could have pushed those up to GitLab as well. Generally I use GitHub, and I'm using GitLab here because I don't want to put up SSH keys that could give access to my GitHub, so I'm just using GitLab for that. Another option is to host your own Git somewhere, but there are lots of services where you can use Git.

So now we've been using this for 30 minutes, paying 90 cents an hour. I'm going to click here to destroy the instance, so we can calculate how much we spent on it: about 19 and a half cents. Maybe I built up my model in that time, maybe I didn't. Generally I use it for maybe one to three hours or something like that; usually by that point I've figured out a different direction I want to go in, so I open the notebook on my local machine, make some changes there, maybe find some new training data images, and then when I'm ready I'll jump back onto the GPU again.

So that's basically vast.ai. I have no affiliation with vast.ai; I'm just a user. I'm just learning machine learning, and I thought it was really great for building models, iterating, and just learning, and doing it fast, because these GPUs are pretty snappy. So if you use vast.ai and you've got some comments, like better workflows or things I could improve in my workflow, then I'd love to hear them. Thanks for watching.
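The key-creation step described in the walkthrough is a single ssh-keygen call. A minimal sketch; the filename, key type, and comment are my own choices, not necessarily what was used in the video:

```shell
# Generate a throwaway keypair used only for this rented machine.
ssh-keygen -t ed25519 -f ./vastai_demo_key -N "" -C "vast-ai-demo" -q

# The public half gets pasted into GitLab (Settings -> SSH Keys) with an
# expiry date; the private half gets pasted into the setup script, where
# anyone with access to the rented machine could read it, hence the expiry.
cat ./vastai_demo_key.pub
```

Because the private key travels in plain text to an untrusted machine, it should only ever grant access to a throwaway account or a non-sensitive repo, as in the video.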
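The setup script walked through in the demo looks roughly like this. This is a sketch, not the exact script from the video: the repo URL, paths, and Git identity are placeholders, and the Poetry-to-pip conversion assumes Poetry 1.x, where `poetry export` is built in:

```shell
#!/bin/bash
# Pasted into a bash cell on the rented machine; written to be safe to re-run.
set -e

# Tools used in the workflow that aren't in the base PyTorch image.
apt-get update && apt-get install -y curl git

# Install the throwaway private key as the default SSH identity so git
# picks it up, and lock down its permissions.
mkdir -p ~/.ssh
cat > ~/.ssh/id_ed25519 <<'EOF'
-----BEGIN OPENSSH PRIVATE KEY-----
(paste the throwaway private key here)
-----END OPENSSH PRIVATE KEY-----
EOF
chmod 600 ~/.ssh/id_ed25519

# Pre-trust GitLab's host key so the clone doesn't prompt.
ssh-keyscan gitlab.com >> ~/.ssh/known_hosts 2>/dev/null

# Git identity for commits made on this machine (placeholder values).
git config --global user.name "Phil Whelan"
git config --global user.email "phil@example.com"

# Idempotent checkout: clone only if it isn't already there, then pull
# in case changes were pushed since the last run.
[ -d ~/demo-repo ] || git clone git@gitlab.com:example/demo-repo.git ~/demo-repo
cd ~/demo-repo && git pull

# Jupyter runs in the Docker image's Python, not a Poetry virtualenv, so
# flatten the Poetry dependencies to requirements.txt and install with pip.
pip install poetry
poetry export -f requirements.txt --output requirements.txt --without-hashes
pip install -r requirements.txt
pip uninstall -y poetry   # removed because it was causing problems
```

The `[ -d ... ] ||` guard is what makes the script idempotent: the clone is skipped when the checkout already exists, and the pull brings it up to date instead.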
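The little section at the bottom of the notebook for committing changes is just a few git commands run in a bash cell. A sketch; the repo path and commit message are placeholders:

```shell
# Run whenever there's work worth saving off the ephemeral machine.
cd ~/demo-repo
git add -A
git commit -m "added a dummy cell"
git push
```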
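The memory check scattered through the notebook is nvidia-smi, prefixed with `!` in a notebook cell. The bare command gives the full report shown in the video; a query form like the one below gives just the memory numbers. Both need an NVIDIA driver, so they only run on the GPU machine:

```shell
# Full report, as run at the top of the notebook, to record what the
# rented machine actually is.
nvidia-smi

# Compact snapshot to scatter between training cells: tracks where the
# memory goes and whether a 24 GB card was really needed.
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
```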
Info
Channel: Phil Whelan
Views: 11,254
Rating: undefined out of 5
Keywords:
Id: cPsQBJ7Y9n0
Channel Id: undefined
Length: 14min 0sec (840 seconds)
Published: Sat Feb 20 2021