Building My Ultimate Machine Learning Rig from Scratch! | 2024 ML Software Setup Guide

Video Statistics and Information

Captions
Hey, what's up guys! Today we're going to be building and setting up a machine learning server that I can use to offload machine learning jobs from my main gaming computer, leaving it free to play video games or edit videos. We're going to be using these parts, and once the computer is built we'll install Linux and set up all of my favorite pieces of software that I use for my daily ML projects. The goal is for me to be able to remote into this computer and launch a training job, speak to a local LLM, or run some Stable Diffusion. Let's jump right into the parts that we have for today.

Given that this machine will be used solely for machine learning, let's start off with the most important part: the NVIDIA GeForce RTX 4080 Super. It has 16 GB of GDDR6X memory and over 10,000 CUDA cores, and I can't wait to make those cores crunch the matrix multiplications. While originally designed for video games, the ML community soon figured out that the highly parallelizable tasks in ML map really well onto graphics hardware, since that's exactly what graphics cards excel at. 16 GB of VRAM should be enough to run LLMs like Llama or Mixtral locally, albeit they'll probably have to be quantized; we'll talk more about LLMs later in this video.

The brains of this machine will be the Intel Core i9-14900K. Having a good CPU is also very important for ML tasks, because it's what will be handling the less parallelizable work, such as data pre-processing and moving that data between RAM and VRAM. Also, some ML algorithms that aren't heavy in matrix multiplications are more sequential in nature, benefiting from a more powerful CPU.

Speaking of RAM, we've got 96 GB of DDR5 RAM from G.Skill. Once the 16 GB of VRAM gets saturated by large datasets or models, the system will rely on this RAM as the next tier of the caching hierarchy, so it's good to have plenty just in case; we should avoid going to the disk at all costs.

For persistent storage I'll be using this 1 TB Samsung 980 Pro NVMe SSD that I had lying around. We'll store all of the model weights and data that aren't in flight here, so hopefully 1 TB will serve our purposes.

The glue of the system, the motherboard, will be the ASUS ROG Strix Z790-I Gaming. It has the newest Intel chipset for the LGA 1700 socket, so it should allow our components to perform at their peak.

For the CPU cooler I have the NZXT Kraken 280 AIO. It should be enough to keep our CPU temps in check while the CPU is doing our bidding, and it also comes with a display to show PC metrics like temperature or component utilization.

For the power supply I have the 1,000 W SFX-L power supply from Corsair. The nice thing is that it supports the new 12-volt high-power (12VHPWR) connector that the new 40-series cards use due to their ever-increasing power demands; it also allows for more direct communication between the GPU and the power supply.

And last but not least, the case that will be holding all of our components is Cooler Master's new NR200P V2. It should be able to provide our components with adequate airflow, and we'll be putting a Corsair AF120 Slim fan at the bottom of it to accompany the CPU cooler's fans.

So let's get to finally building this PC. Let's begin with installing the components that go on our motherboard. We start with the CPU: I slowly lower it into the socket and lock it in place. Then we carefully insert the RAM DIMMs into their slots. Next we prepare our CPU cooler: we screw the cooler bracket onto the board and then mount the fans on the radiator. Since the radiator will be at the top of the case, it'll be acting as an exhaust.
Next we bring out the case. This was my first time working in a small form factor case, and my God, was it challenging; all of the parts barely fit and I was surprised I didn't break anything. To access the inside I had to remove all of the side panels and unscrew some of the railings. We slowly place the motherboard in the case and screw it onto the pre-installed standoffs. Then, because I forgot to do this earlier, I insert the NVMe drive into the motherboard; it probably would have been easier if I had done this at the beginning, but oh well.

Next we prepare our power supply. Because of the case's form factor, the PSU goes on the side rather than the bottom of the case, which means the case has to come with an extension cable to power the PSU. Now we get to mounting the radiator at the top of the case. It was a really tight fit, with only a couple of millimeters of clearance, and I had to reorient the fans so they pull air through the radiator and out of the case. Next we do some cable management and add the PSU cables to power our components, and we also install the pump block onto the CPU. Afterwards I removed the GPU brace from the case and attached it to the 4080 Super. Inserting the GPU was also a chore because of all the cables getting in the way, but finally everything fell into place. After I installed the bottom fan alongside the fan that already came with the case, setting them both as intakes, I reinstalled the panels, and ta-da, the build is finally complete after several arduous hours.

Let's now switch over to the software side of things. We're going to be installing Ubuntu Server 22.04 LTS, codenamed Jammy Jellyfish, which is the latest long-term support version of Ubuntu. Since we'll only be remoting into the server, I really don't want to waste any resources on a whole desktop environment, which is why I'm electing for the server version. However, if working with just the terminal seems daunting, don't worry, we'll be covering some apps that will still let us interact with the server through a GUI. Not to mention that if you really want a desktop environment, everything we'll be installing today will work exactly the same on the desktop release of Ubuntu. Also, because this is an LTS release, Canonical has guaranteed stability and security updates until April of 2027. I prefer working with the LTS release because it tends to be more stable, especially with the various pieces of software we'll be installing today.

Those of you that have seen my NAS video know that I'm a huge fan of Proxmox, which would allow me to run virtual machines on this server. While Proxmox probably seems more appropriate for a server than Ubuntu, this machine only has one GPU. GPU virtualization, which can split up the GPU across multiple VMs, isn't really something I want, because I want the GPU to be solely dedicated to one ML task at a time, and since I'll be the only one using it, I'd rather just have the OS be as close to the bare metal as possible. We'll talk about Docker later in this video, though, which will still let me run multiple jobs at the same time if I really want to.

I downloaded the ISO from the Ubuntu website and flashed it onto a flash drive with Rufus. I then plugged in the flash drive, a keyboard, a monitor, and Ethernet before I turned the machine on. As it was starting up I jammed the Delete key, because that's how I boot into the BIOS on my specific motherboard; I selected the flash drive as the boot device and hit save and exit. The actual install process is pretty simple. I first noted down the IP address that was assigned to my machine, because we'll need it later.
Then I chose the 1 TB NVMe drive as the install disk, set up my profile, and enabled OpenSSH so we can remote into the server, and that's it. After the install process is complete, I press reboot. Once the machine has rebooted and the login screen has shown up, I unplug all the peripherals, because we will no longer be working directly at the server.

Now, from my gaming computer, I can remotely connect to the server with SSH because both machines are on my home network. I use the IP address that I noted down earlier and the password that I set up during the install. I run the classic sudo apt update and sudo apt upgrade just to ensure that everything is up to date, and if I run htop we can see all 32 threads of our CPU and our 96 GB of RAM.

The first piece of software I always install whenever I have a fresh install of Linux is called Tailscale. Tailscale is a VPN service that enables all of my devices to talk directly to each other as if they were on the same network. This is unnecessary if I'm only using my gaming computer, which always stays at home, to access my server, but now I can remote into my server from my laptop or even my iPad from anywhere in the world, provided they also have Tailscale installed and set up. While Tailscale isn't directly related to machine learning in any way, it makes managing my server remotely so much easier. I run the Tailscale install script from their website, run tailscale up, and authenticate with my Tailscale account. Now this server is truly my personal cloud server, one that I can access from anywhere at any time.

The next thing we need to install is our graphics drivers. This step took a little bit of trial and error, but the way that worked best for me was to download the NVIDIA drivers directly from their website. I selected the graphics card that I had and the OS, and then copied the download link to wget in the terminal. I'll leave the exact steps I followed down below, but after the download completed I had to change some values in a couple of different files; then I simply ran the .run file and accepted all the suggested values. I'm not sure if my issues stemmed from an unlucky combination of my hardware, kernel version, or driver version, but hopefully your experience is a little smoother. Finally, to ensure that the drivers were successfully installed, we run the nvidia-smi command, and ta-da, we see the name of our graphics card.

Now let's get some LLMs running locally. We'll be using an app called Ollama, which is probably the easiest way to get up and running; you'll be able to use Ollama even if your computer doesn't have a GPU. They have a wide variety of LLMs that you can pick and choose from: some are better at natural language, while others are better at coding tasks. After running their install script, all we need to do is run ollama run and the name of the model you want. I chose Mixtral. After a brief download we're able to chat with the LLM and ask it anything our heart desires. The poem being generated on screen is in real time and hasn't been sped up at all; it's reasonably fast. When choosing what LLM to run, you need to take into account the amount of VRAM your GPU has and what the expected usage is for that specific model; some models list their requirements on their Ollama page. If a model is running really slowly, consider using a more heavily quantized version, which uses less precision for the model parameters, resulting in less VRAM usage. Here's a quick snapshot of our GPU utilization during token generation: it's using about 50 watts of power and our VRAM is fully saturated.
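Just to recap this part, the terminal flow looks roughly like the following; the Tailscale and Ollama install scripts are the ones linked from their websites, and the username and IP address here are only placeholders:

# connect to the server from my gaming computer (placeholder user and IP)
ssh sourish@192.168.1.50

# bring everything up to date
sudo apt update && sudo apt upgrade -y

# install Tailscale and bring it up (prints a link to authenticate the device)
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# after installing the NVIDIA driver from the downloaded .run file,
# confirm the card is visible
nvidia-smi

# install Ollama and start chatting with Mixtral
curl -fsSL https://ollama.com/install.sh | sh
ollama run mixtral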
Now let's see how we can use a code LLM to help us program. My go-to IDE whenever I'm working on my ML projects is always VS Code, because of its wide array of extensions, and, you guessed it, there's one that lets a local code LLM help us out while we're programming. Now, we could use VS Code on my gaming computer to SSH into our server, but let me show you a cooler way. The nice thing with VS Code is that it's built with Electron, meaning it can run in a browser as a progressive web app. code-server is a service we can install on the server to access a web-based VS Code session from any device on our Tailscale network. Their install command is provided on their website; after running it, I use systemctl to have code-server run as a background service. I also edit the values in the config.yaml file to make sure the service is listening on all interfaces, so our other Tailscale devices can access it. After restarting the service, I navigate to the IP address of my server in my web browser at port 8080. Now we have a fully fledged programming environment that has a GPU and can be accessed from any web browser on our Tailscale network. My favorite feature is probably that you can open this site from your iPad and save it to your home screen, so the next time you can launch it as if it were a regular app; connect a keyboard and mouse to your iPad and you have exactly the same experience you'd have on your laptop. The great thing is that the terminal works exactly how you'd expect, even letting you connect to other machines on your network if needed.

Before we run a code LLM locally, let's make sure we have Python installed. I personally love to use Miniconda, which is a lighter distribution of Anaconda. It allows us to have multiple installations of Python simultaneously on our computer, which is useful when we need different versions of Python or different accompanying packages for different projects. To install Miniconda I run these commands, which can be found under the Miniconda section on the Anaconda website, and if I run conda --version we see that we're good to go. Running which python also shows the path of the Python executable, and I'm able to print out hello world in an interactive Python shell.

Now let's connect VS Code to our LLM and generate some code. The one I'll be using today is Code Llama 13B, but feel free to choose a different one; we can download it with the ollama run command again. Now we can use the Continue VS Code extension to talk to our code-generation LLM. After installing it, we have to update the config.json file to let it know that we'll be using Code Llama, and I also made sure our tab autocompletion feature uses Code Llama. Now we can chat with it directly from the IDE and also use it for code autocompletion. I go through the sample tutorial that the Continue extension provides. When I highlight the first function, Code Llama correctly identifies the algorithm as bubble sort. When I ask it to generate comments for the next function, it does so while neatly identifying the input parameters and return type. Finally, I can also have it help me debug an issue: I try to add a list of strings, and the extension lets me prompt the LLM directly with the error from my terminal output. The LLM correctly finds the issue, though its solution isn't necessarily the most robust. You can also connect the VS Code installation on another computer on your network to the LLM being served on your server.
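Roughly, the commands behind this part look like this; the config path and port are code-server's documented defaults, the Miniconda installer URL is the generic Linux x86_64 one from Anaconda's site, and codellama:13b is the Ollama tag for the model I used:

# install code-server and run it as a background service for my user
curl -fsSL https://code-server.dev/install.sh | sh
sudo systemctl enable --now code-server@$USER

# make it listen on all interfaces instead of just localhost, then restart
sed -i 's/^bind-addr:.*/bind-addr: 0.0.0.0:8080/' ~/.config/code-server/config.yaml
sudo systemctl restart code-server@$USER

# install Miniconda and check it
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda --version

# pull the code model that the Continue extension will talk to
ollama run codellama:13b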
To do that, I had to make the Ollama service listen on all interfaces on my server, and then I could simply point the Continue extension on my gaming computer at the IP address of my server; the default port for Ollama is 11434, and you simply have to change the apiBase field in the config.json file. Thanks to Tailscale's DNS resolution, I can just use the name of my server. Because both machines are on my home network, communication between the two is blazingly fast, letting me program without a hitch, and now my gaming computer doesn't have to carry the burden of running the LLM locally.

Let's now set up PyTorch, which we'll use in just a bit to run a local video generation model. We'll first create a new conda environment named local_diffusion. PyTorch's website has a very simple section that gives us the correct install command based on our OS and whether we want the GPU version with CUDA or just the CPU version; all we have to do is run that command in our environment. I recommend you go with the latest CUDA version unless you specifically need an older one, and if you're on Windows or Mac, make sure you click the respective boxes for your OS to get the right install command. And now we're done. We can have our local LLM generate a small script to ensure that PyTorch can see our GPU, and... perfect, exactly what we wanted. The reason this process was so smooth is because we already did the heavy lifting of configuring our graphics drivers earlier.

Let's go to Hugging Face to check out a really cool text-to-video model. The AnimateDiff-Lightning model generates stylized videos based on a prompt and can be run 100% locally. All we need to do is run the sample code that comes with it; you may need to install a couple more packages, such as diffusers, transformers, and/or accelerate. I made a couple of changes, such as increasing the number of steps, choosing a more stylized base model, and enabling CPU offloading so we're not limited by our VRAM. I also set the prompt to a rabbit programming at a computer. The model had to be downloaded, but that's only a one-time thing; afterwards you're free to generate as many animations as you want. The video generation is extremely quick, and in VS Code we can easily view the output GIF afterwards. There are so many cool open-source models on Hugging Face that you can simply download and run locally, so it's definitely worth your time to do a little exploring. For Stable Diffusion specifically there are also web UIs like ComfyUI or AUTOMATIC1111, so definitely consider checking them out if Stable Diffusion is something you're particularly interested in.

Now we're going to be installing another tool that's not directly related to machine learning but is still used very extensively by the community: Docker. Docker will allow us to run separate containers that each have their own configuration and dependencies, such as PyTorch, TensorFlow, or CUDA. Docker has an install script provided on their website; I prefer the convenience script because we won't be serving any real production traffic. I like to set it up so I don't have to be root to use Docker commands, which can be done by following these instructions, and now, after running the hello-world Docker container, we can see the success message; everything is working as intended. Next we have to enable our Docker containers to use our GPU, which can be achieved with the NVIDIA Container Toolkit. We follow the steps on NVIDIA's website and run those commands; then we can run one final command to ensure that our containers are set up properly with our GPU.
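Put together, that part of the setup looks roughly like this; the PyTorch install line is whatever pytorch.org gives you for your OS and CUDA version, and the Docker and Container Toolkit commands follow the convenience-script and post-install steps from their documentation:

# PyTorch in its own conda environment, plus a quick GPU check
conda create -n local_diffusion python=3.10 -y
conda activate local_diffusion
pip3 install torch torchvision torchaudio    # use the exact command from pytorch.org
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

# Docker via the convenience script, then allow running it without sudo
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER    # log out and back in for this to take effect
docker run hello-world

# after installing the NVIDIA Container Toolkit per NVIDIA's instructions,
# wire it into Docker and confirm a container can see the GPU
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run --rm --gpus all ubuntu nvidia-smi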
Ta-da! That nvidia-smi output was run inside of a Docker container. So, now that we have the ability to run any Docker container on our server, even the ones that need a GPU, I wanted to show you guys a really cool reinforcement learning tool from NVIDIA called Isaac Gym. This library provides several different environments in which you can train virtual robots. The idea is that rather than wasting time and resources training a robot in real life from scratch, you can virtually train thousands of instances of it in parallel and then transfer the final model to the real-world robot. Obviously the challenge is making sure that the virtual environment is as close to the real world as possible, but that's where Isaac Sim comes in. We're going to be using NVIDIA's Isaac Sim toolkit, which is built on their Omniverse platform, for this section. I won't dive into the exact steps I went through; I'll just leave the instructions in the description. After starting an interactive bash session in the Isaac Sim container, I cloned the Isaac Gym environments repo inside the container in order to run a couple of training jobs, just so we can check them out. The really cool part is that we can remote into this container and see what's going on as if we were working directly on our desktop. On my gaming computer I had to download the Omniverse Launcher, from which I could then install the Omniverse Streaming Client. After starting it, I give it the IP address of my server, and now we can see the Isaac Sim interface and all of our robots training in parallel. There are some really cool environments: everything from little ants, to robotic hands practicing their dexterity, to factory arms learning how to screw a nut onto a bolt, and even toy quadcopters learning how to fly. The really cool part is that when I went to NVIDIA GTC last month, I actually attended a presentation by some NVIDIA researchers who used Isaac Gym to train a robotic hand to rotate a cube to match a given orientation; if you'd like to learn more about that, definitely check out my previous video.

So far we've been able to install and run some really cool programs that the ML community has put together for us, but none of them relied on TensorFlow, the other popular deep learning framework. I wanted to quickly take a couple of minutes and walk you guys through how to install and set up the GPU-accelerated version. We first need the CUDA Toolkit, which is a development environment that also contains additional GPU-accelerated libraries. It contains the libraries that TensorFlow uses to run on the GPU, and it also contains the full CUDA runtime and the nvcc compiler, allowing you to write, compile, run, and debug your own CUDA code that runs directly on the GPU. The reason we didn't need to install this for PyTorch is because PyTorch already ships with the necessary CUDA binaries. To know what version of the CUDA Toolkit we need, we go to the TensorFlow website's download section; if you scroll down, you'll see the requirements listed there. Thankfully we already have the drivers installed and ready to go. At the time of writing this video, it seems we need version 11.8 of the CUDA Toolkit. We also need to download cuDNN separately, which we'll do afterwards. I recommend always using the run file to install the CUDA Toolkit: it lets you better manage multiple versions of CUDA on the same machine, and it doesn't override your graphics driver, which can prevent further downstream issues. You then run the commands to install the CUDA Toolkit, making sure that you deselect the driver option.
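For reference, the run-file install goes something like this; grab the exact runfile URL from NVIDIA's CUDA 11.8 download page, since the filename below is just the one that was current for 11.8.0:

# download the CUDA 11.8 runfile (exact filename comes from NVIDIA's download page)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run

# install only the toolkit, leaving the already-installed driver alone;
# drop --silent if you prefer to deselect the driver in the interactive menu
sudo sh cuda_11.8.0_520.61.05_linux.run --silent --toolkit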
Next, we need to add the location of the CUDA binaries to our PATH so the OS knows where to find the programs that ship with the toolkit. Because I want my PATH additions to persist beyond the current shell session, I'll add them to my .zshrc; replace .zshrc with .bashrc if you're using bash. Now, after I refresh my terminal and type nvcc --version, we can see that the CUDA compiler successfully displays a version number. Installing cuDNN is also pretty straightforward: again, I just run the commands listed on the cuDNN website, finishing with the final command that's meant for CUDA version 11.

Now that all of the prerequisites are met, installing TensorFlow is literally just one more command. I create a new Anaconda environment named tensorflow, set it to Python 3.11, and then run pip install tensorflow[and-cuda], with the and-cuda in brackets. To ensure that everything works as intended, we can run the following commands, which can be found on the TensorFlow website for you to copy and paste: the first checks that TensorFlow is working, and the second ensures that TensorFlow can see our GPU. The nice thing is that if you don't want to set up the entire CUDA Toolkit, you can simply use TensorFlow in a Docker container. All you have to do is run one command to download a Docker image with TensorFlow and the CUDA runtime already set up; after the download, it runs the same TensorFlow command to print the name of the GPU, no CUDA Toolkit necessary.

Well, there you have it, guys. This is the machine that I'll be using for all my future machine learning projects that I'll be making videos on. I have so many cool things planned for you guys that I just cannot wait to share. I think my main takeaway from this project was just how important reading the documentation is whenever you're trying out new tools or software, and when that doesn't work, you may have to get your hands a little dirty and persistently scour the internet for a solution. If you enjoyed this video or have any specific questions about anything we set up today, please let me know down in the comments below. But until next time, see you guys!
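For reference, the TensorFlow portion above boils down to roughly these commands; the PATH lines assume the default /usr/local/cuda-11.8 install location, and the verification one-liners are the ones the TensorFlow install page suggests:

# make the CUDA 11.8 binaries visible (append to ~/.zshrc or ~/.bashrc to persist)
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
nvcc --version

# TensorFlow with bundled CUDA libraries in its own environment
conda create -n tensorflow python=3.11 -y
conda activate tensorflow
pip install "tensorflow[and-cuda]"

# verify: run a simple computation, then check that the GPU is visible
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

# or skip the toolkit entirely and use the GPU-enabled Docker image
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"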
Info
Channel: Sourish Kundu
Views: 5,962
Id: ayWcs5FbxGY
Length: 21min 42sec (1302 seconds)
Published: Tue Apr 16 2024