Raspberry Pi Supercomputer for Quants | How to build a Raspberry Pi Cluster | SLURM Cluster Config

Video Statistics and Information

Captions
G'day YouTube, and welcome back to the ASX Portfolio channel. After watching too many videos about Raspberry Pi clusters, I convinced myself that it would be a really good idea to buy a couple and make a cluster. I used a cluster throughout my thesis project at the University of Queensland and found it a really helpful resource for running heaps of scenarios with different parameters, whenever I wanted to run resource-intensive or computationally expensive scripts a number of times. So what we're going to do is create our own cluster so we can throw Python scripts at it on this YouTube channel, and maybe get into scenario and stress testing. Whether it's a good investment or not, I'm going to do it, it's going to be interesting, and we're going to use it on this channel. So let's get into it.

Essentially I've got four Raspberry Pis here. The head node is the 8 GB RAM model, and the other three worker nodes are the 4 GB RAM model. I needed a link here, so this is the Ethernet switch that we're actually going to use to power all these Raspberry Pis over the Ethernet cable; for that you also need the PoE+ HATs that we're going to put on top. I've got a tower here so we can store it all, and of course I've got a bunch of Ethernet cables that I've had to purchase.

For now we're actually not going to make a Kubernetes cluster like the videos that got me interested in this process to begin with; we're going to put SLURM on there. Now, what's SLURM? SLURM is the Simple Linux Utility for Resource Management. Essentially it's just a scheduler, and it's going to make it really easy to run scripts in parallel, spread them across nodes, and get back our results. Very useful. Most maths and physics departments will have some kind of cluster that their research students can use, and that's the idea with this. It's obviously going to be on a very mini scale, because we've only got 20 GB of RAM, but the idea is that this is scalable and we could add to it. Each of these Pi 4 Model Bs runs at 1.5 GHz, which isn't too bad as processing capability goes, so we could just keep replicating this, adding more switches and more Raspberry Pis, to make it more powerful. That'll be interesting going forward, but let's just get set up straight away; I'm going to build it out first.

The tutorial we're going to be following, and you can follow along with me if you want to make one of these yourself, is from Garrett Mills on Medium. He's put together a three-part series on how he created his own cluster in a very similar fashion: how he loaded SLURM onto his cluster, installed Python, and then how he went about incorporating something called OpenMPI, which enabled parallel computing on his cluster. So we're going to go through that three-part process, learn how to use SLURM, and hopefully get parallel computing set up on this cluster. So stay tuned for that, and let's get on with the build.
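To give you a taste of where this is all heading, a SLURM batch script on a cluster like this might look something like the sketch below. The script name, the Python file and the option values are just placeholder assumptions for illustration, not anything we've set up yet:

    #!/bin/bash
    # job.sh - a minimal SLURM batch script (hypothetical names throughout)
    #SBATCH --job-name=scenario-test    # label shown in the job queue
    #SBATCH --nodes=3                   # spread the job across three worker nodes
    #SBATCH --ntasks-per-node=1         # run one task on each node
    srun python3 monte_carlo.py         # hypothetical resource-intensive script

You would submit it from the master node with "sbatch job.sh" and watch it in the queue with "squeue".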
this drive with raspbian is by using the default raspberry pi imager now the operating system that we're going to be using we want a headless operating system we just want to ssh into all these pies i don't need desktop environment installed so i'm going to click on other and i'm going to go through to raspberry and pi os lite 32 bit click on that we'll choose the storage device i've only got one on there 63.9 gigabytes now once i've clicked that generic storage device i just click right now you just wait for that to write and i'll be back with you in a second so once that's completed what you want to do is just unplug disconnect the usbc and then reconnect that micro sd card and we'll just press cancel there okay and what you should see now is that there is a boot drive d now in this boot drive what we want to do is go to config now we want to run all the way down to the bottom part of this config file and what we're going to do is we're going to enable 64-bit so we're going to go arm underscore 64-bit equals one and what that's doing is saying that when we load this even though we downloaded the 32-bit we want a 64-bit os um that's enabled on this raspberry pi environment so now if we go out of the convict file so now that we've changed the configuration all you want to do is go to the command line and we want to enable ssh so what we have to do is go null space greater than symbol space ssh enter now it says access denied but you'll see that that file actually did get created within our boot microsd card so that's fine we have an empty file and it's called ssh that is all we need to be able to enable ssh on this raspberry pi environment so now what i want you to do is just eject this eject this micro sd and we're going to flash the other three so that exact same process three more times okay i'm going to do that and i'm going to be back with you in just a sec cool so now you flashed all your sd drives what you need to do is add them into your raspberry pi so just unconnect it from the ethernet cables and then place in all the sd cards what we're going to do is we're going to power up each one of these raspberry pi's each one at a time just to make it easy because what we need to do is do the network setup so we need to work out what ip address each individual raspberry pi has that will make it very easy for us and the way we're going to do that you could use a tool like nmap as this as this guide suggests or you can go in a better way is into your router's interface and determine exactly which ip address your specific pi is so let's do that second option so it's a bit of a mess here with cables but what we're going to do we're going to just get our master done first so let's plug in the power over ethernet cable you'll should see the power button the lights come on so the leds and we're just going to let that boot so once you've let that boot up for a few seconds then you're going to come into your router interface and you're going to go to wired clients now i can see my raspberry pi and i can see the ip address of the raspberry pi now i've just blanked that out there and you can look at what your individual ip addresses are what i want you to do is go ahead copy that and place it down in a notepad now this is going to be our master node so potentially we can call it master node here so once you've done that we're going to do the exact same process for each single one so now in turn i'm going to add the second pi and give that power and once it's powered up we should see that come on 
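To recap, the headless setup on each card comes down to two small changes on the boot partition (D: was just the drive letter on my machine):

    # added as the last line of config.txt on the boot partition:
    arm_64bit=1

    # plus an empty file named "ssh" (no extension, no contents) in the
    # root of the same boot partition; on Linux or macOS you could create
    # it with something like (path is illustrative):
    touch /media/<you>/boot/ssh

The firmware reads arm_64bit=1 and boots the 64-bit kernel, and the presence of the ssh file tells Raspberry Pi OS to enable the SSH server on first boot.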
So now, in turn, I'm going to add the second Pi and give it power, and once it's powered up we should see it come on. Cool, so you can see the second one has shown up there. Just take that IP address, make a note of it, and we're going to call this one "node1". You don't need to rename it here on the router's interface, but it'll make it easier for us to identify later if we come back in here. Now let's power up the third one.

Apologies there, sorry: I thought I was recording my screen, but what I'm doing is turning the Raspberry Pis on one at a time and then coming in here into my router interface. You will all have a router interface; go look through the documentation from when you first plugged in your router, and if you don't have that, you can use nmap to work out which device is on which IP address. As you can see, I've got the names here on the left-hand side: I've already set up masternode and recorded its IP address, and node1 and recorded its IP address, and I've just got the next default Raspberry Pi with the next IP address. So I'm going to make a note of that IP address and call this one "node2". Again, you don't need to rename it on your router's interface, but it's going to make identifying things easier later. Then I'm just going to power on the last one here; once I saw the little fan spinning, I knew the whole thing had booted up, so I grabbed the IP address of the last one and called it "node3".

So, I've just put the cluster in the corner there, and what we want to do now is actually SSH into these Pis. We're going to use PuTTY to SSH, and we're going to save these sessions down so we can do this again and again. First, enter the IP address of the master node; we're going to save each one of these sessions, so let's call this one "masternode", save, and we'll open that. You're going to get a pop-up here whenever it's the first time: "the server's host key is not cached in the registry, do you want to trust it?" Yes, I do. Log in as the pi user with the default password, which is raspberry, press enter, and we're in.

Cool, so now that we've SSH'd in, we're going to set up each Raspberry Pi individually. The first thing to do is run sudo raspi-config. By the way, copying and pasting into the terminal is quite annoying: Ctrl+C to copy, obviously, but to paste into the terminal you've got to right-click. So, sudo raspi-config: we're going to open the config utility, change the default password, set the locale and time zone, and expand the file system. Let's go into the default password; go ahead and enter your new password. Now go down to localisation options: we're going to set the time zone, and the time zone we want is Australia, and we're going to make it, not Brisbane, but Sydney. Cool, so once you've done that, you can go out and finish.
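In plain commands, that per-Pi setup looks roughly like this (PuTTY does the same job as the ssh command; the menu path is from memory of raspi-config):

    # from your PC, connect to each Pi in turn:
    ssh pi@<ip-of-pi>      # default user "pi", default password "raspberry"

    # then, on the Pi, open the configuration utility:
    sudo raspi-config
    # -> change the default password
    # -> Localisation Options -> Timezone (Australia/Sydney in my case)
    # -> expand the filesystem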
Next, we're going to have a shared storage device, and that's going to be this solid-state hard drive here. You can use any drive; obviously a 64 GB USB stick will work as well. It doesn't really matter; I've got this lying around, so I'm going to use it. This is going to be shared across all four Pis in the cluster as a network file system. So go ahead and mount it on your main node, your master node, and make sure you use the USB 3 port. You'll note that I'm already logged in through SSH to my master node here, which is step 4.0. Now we're going to identify what that USB drive is: lsblk, and you can see that we have the partition there on sda.

For the reformatting we're actually going to use fdisk, so we need to make sure we're the root user: you just go sudo -s, and that will make you root. After that, we go fdisk /dev/sda. "Changes will remain in memory only until you decide to write them; be careful before using the write command." Excellent. After that, we list the partitions with p, and we can see our two partitions there. So let's delete partition number one: "partition 1 has been deleted". Let's do the same thing for the second one. Now we make our new partition: all we have to do is use n; for the partition number we default to 1; for the first sector just press enter, and for the last sector just press enter. That's created a new partition with a Linux file system. To view it, all we need to do is press p, and you can see that file system there. Excellent, so we've created our partition; now all we need to do is write that change with w, and you can see that's happened. Let's just run the command we used before, lsblk, and we should see that new partition now.

Now, to actually make our file system, we're going to run the mkfs command from the tutorial, but leaving off the ext4 part, so we're just going to go mkfs /dev/sda1, enter that, and let it do its thing. Okay, so now that that's written, we're going to create the shared folder directory, and then give it access and permissions. Excellent.

Now, setting up automatic mounting, and this is very important: I want you to identify what the UUID is. If we just run blkid, well, you can already see it up there, the UUID, and you can see that the file system is ext2. So we're going to copy that, then sudo nano /etc/fstab (and I spelt nano incorrectly there). Once we're in here, we're going to set it to mount automatically at boot: just open up a notepad, copy that UUID, and put it in place of the one in the line from this tutorial, and instead of ext4 we'll just put ext2. Once we've done that, copy it in there, and we can exit the file. After that, we want to mount the drive in the location we've specified, which we're calling /clusterfs, and we've mounted the drive. Excellent. Now we're just going to run those permission lines again; I'm not sure why we have to do them again, but I assume it's because now that we've mounted the drive, we're applying the permissions to the mounted file system.

Now, to export the NFS share: we need to export this mounted drive so that the other nodes can access it. What we need to do is install the NFS server, so we're just going to copy that line and let it download. Great. Then we nano /etc/exports and add the following line; I'll copy and paste it. Don't forget to update it with your specific IP address; I'm just going to blank mine out and put in my specific address. Lastly, we run the following command to start sharing from that kernel server: sudo exportfs -a.

Now, "mount the NFS share on all the clients": so now that we've got that share exported from the master node, we want to mount it on all the other nodes so they can access it, and repeat this process for all of the other nodes. We'll have to install the NFS client, create the mount point, give it access, and then set up automatic mounting. Excellent.
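Pulling those master-node storage steps together as a sketch (the device /dev/sda and the mount point /clusterfs follow the tutorial; the permission bits and export options shown are the tutorial's common ones from memory, so double-check them against the guide):

    sudo -s                              # become root
    fdisk /dev/sda                       # interactive: p, d (twice), n, w
    mkfs /dev/sda1                       # with no type given, defaults to ext2
    mkdir /clusterfs                     # shared mount point
    chown nobody.nogroup -R /clusterfs   # open it up for every node
    chmod -R 766 /clusterfs              # illustrative mode; use the tutorial's
    blkid                                # note the UUID of /dev/sda1

    # line added to /etc/fstab (your UUID goes here):
    # UUID=<your-uuid> /clusterfs ext2 defaults 0 2
    mount -a

    apt install nfs-kernel-server -y
    # line added to /etc/exports; the .* wildcard whitelists the whole LAN,
    # which is the workaround I explain next:
    # /clusterfs 192.168.1.*(rw,sync,no_root_squash,no_subtree_check)
    exportfs -a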
Okay, so I'm going to jump into the other nodes now. So now I'm in node1: I'm going to install the NFS client, make the directory, and give it the necessary permissions. We want this to mount automatically, so edit the /etc/fstab file with sudo nano; once I've done that, I'll save it down, then mount it with sudo mount -a.

Okay, that was a bit of a workaround, but essentially the issue there was that in /etc/exports you actually need to whitelist your whole range of LAN IP addresses. The way you can do that, say your network is 192.168.1.-a-whole-bunch-of-things, is to put a dot and an asterisk at the end. That will allow all the IP addresses on your network to get onto this network-attached file storage. Then you need to give it read/write access and all those other options, export that with exportfs -a, and then, on each individual node, create that mount-point directory and mount the external drive.

So now we need to configure the master node; we need to get this scheduler, SLURM. It's going to be overkill for a personal cluster, but it's going to be useful for us, especially if we scale up. So we've logged into our master node. Step one: to make name resolution easy, we're going to add the hostnames and IP addresses of the nodes to the /etc/hosts file, adding the following lines; I'll just update them for my setup. So go ahead and sudo nano into that file, /etc/hosts, and save it down. Then install the SLURM controller packages and let that download. We'll be using the default SLURM configuration file as a base and copying it over; once this is done downloading, we'll go through all these steps.

So now that we've installed the SLURM controller package, all we have to do is configure it. The first thing we do is cd into the configuration directory from the guide: "no such file exists", and actually I think it's now just slurm, not slurm-llnl. Yep. Once we're there, we cp, so we just copy this default config file: "permission denied", so we'll go sudo, become root, and try the same thing. No problem. Once we're in there, well, actually we'll copy that entire thing, then nano slurm.conf. Awesome, so now we're in the SLURM config. Great. Finally.

Okay, so we can start updating this information. We're going to set the control hostname: I'm going to call it masternode and put in the IP address; I'll just black that out for you guys. Next, customise the scheduler: we go down to SelectType, near the scheduling section, and, well, it's actually already set; we've got select/cons_res and CR_Core by default, so that's good. Set the cluster name: let's call it asxportfolio. Now we need to add the nodes, to tell the cluster which compute nodes we have: there should be an example compute node entry, so delete it and add the following lines; I'll have to grey mine out here because they have those IP addresses listed. Great. Then we need a partition. Excellent.

Next, configure cgroup support: the latest update of SLURM brought integrated support for the kernel's cgroups, and to set this up we create the file cgroup.conf. No... I could have been mistaken. Okay, so I'm going to save this down, and should we do what this is telling us to do? Sure: nano cgroup.conf. Awesome. We'll just have to change the cgroup release path to the slurm one and save that down. Then we're going to whitelist devices as well, so we'll nano to create that file. I'm not even sure exactly what this is doing, but I'm sure it's just whitelisting the specific devices that jobs can work with on this cluster: /clusterfs, yes, /dev/sda (that's our drive), our CPUs, the terminal ports, etc. Great, so just save that.
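For reference, the worker-side mount and the key slurm.conf entries look roughly like this. The hostnames, IPs and cluster name are from my setup, and the exact option names follow the tutorial's version of SLURM (newer versions use SlurmctldHost where older ones used ControlMachine), so treat this as a sketch:

    # on each worker node:
    sudo apt install nfs-common -y
    sudo mkdir /clusterfs
    sudo chown nobody.nogroup /clusterfs
    sudo chmod -R 766 /clusterfs         # illustrative mode, as before
    # /etc/fstab line so the share mounts on every boot:
    # <master-ip>:/clusterfs /clusterfs nfs defaults 0 0
    sudo mount -a

    # key lines in the master's slurm.conf (IPs blanked; CPUs=4 per Pi 4):
    # SlurmctldHost=masternode(<master-ip>)
    # ClusterName=asxportfolio
    # SelectType=select/cons_res
    # SelectTypeParameters=CR_Core
    # NodeName=node1 NodeAddr=<node1-ip> CPUs=4 State=UNKNOWN
    # NodeName=node2 NodeAddr=<node2-ip> CPUs=4 State=UNKNOWN
    # NodeName=node3 NodeAddr=<node3-ip> CPUs=4 State=UNKNOWN
    # PartitionName=mycluster Nodes=node[1-3] Default=YES MaxTime=INFINITE State=UP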
Next: copy the configuration files to shared storage. In order for the other nodes to be controlled by SLURM, they need to have the same configuration file, as well as the munge key, so copy those to shared storage to make them easy to access, like so. Excellent.

A word about munge: munge is the authentication system that SLURM uses to run commands and processes on the other nodes. Similar to key-based SSH, there is a private key on all the nodes; requests are then time-stamped and sent to the node, which decrypts them using the identical key. This is why it is so important that the system times be in sync, and that all the nodes have the same munge key file.

Enable and start the SLURM control services: so let's enable munge (we'll just get back out of root for that) and then start the munge service; then the SLURM daemon, doing the exact same thing again (we're still on the master node here, just setting it up); and now the control daemon. Well, this would have taken ages to work through via the documentation, so I'm grateful that we're able to freely copy what Garrett Mills has done here; thank you for all your hard work. Reboot: this step is optional, but if you're having problems with munge authentication, or your nodes can't communicate with the SLURM controller, try rebooting.

Now, configure the compute nodes: in turn we install SLURM (so we'll need to do that again), update the hosts file, copy the configuration settings, set up munge, enable, test, etc. Awesome. Okay, so we're going to log out of that and do it for the other three nodes. We sudo nano /etc/hosts, yes, and then we copy over the configuration files, just doing this one at a time. "No such file exists": we don't have the llnl directory here, so press enter, and again here we don't have the llnl. Cool.

We'll test that the munge key copied correctly: we'll enable the service with sudo systemctl enable (looks like it has) and start the process. Great. After that we can manually test munge to see if it's communicating: run the following to generate a key on the master node and try to have the client node decrypt it; run this on the client, and we call the master node, entering pi@masternode's password... "invalid credential".

So I actually encountered a bit of a problem here, and it was because I had to reset my master node. I didn't share that in the tutorial, but I did reset it, and I forgot to update the clock. The clock is extremely important: it has to be set correctly and synchronised across all these nodes, otherwise the process will not work. But if you did follow this tutorial step by step, the only other difference in setup is the slurm-llnl directory: with a new install you do not need to include the llnl part of that, so it's just the slurm directory. So if you just copy this tutorial exactly for step six, then you will be able to verify that SLURM works. I also found that you didn't need to actually use the pipe to unmunge, so don't do that when you're doing this step; just use this command here.

So now we're on step seven, and we're just going to confirm that it all works: sinfo on my master node, and I can see that we have our cluster made up of three compute nodes, all in the idle state, and they're called node1 to node3. Remember, I'm on my master node. So now if we run a command across the nodes, using all three nodes and running hostname, we get three hostnames back; if we do the same thing with two nodes, we should only get two nodes back; and with one node, we should only get one node back.
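Those last verification steps, spelled out as commands (a sketch; the munge decrypt test is the tutorial's, and the clocks on all nodes must agree for it to succeed):

    # on the master: enable and start the services
    sudo systemctl enable munge && sudo systemctl start munge
    sudo systemctl enable slurmctld && sudo systemctl start slurmctld
    # on each worker it's slurmd instead:
    #   sudo systemctl enable slurmd && sudo systemctl start slurmd

    # from a worker, generate a credential on the master and decrypt it here:
    ssh pi@masternode munge -n | unmunge    # want STATUS: Success,
                                            # not "invalid credential"

    # back on the master, confirm the cluster is alive:
    sinfo                        # node[1-3] should be listed as idle
    srun --nodes=3 hostname      # runs hostname on all three workers
    srun --nodes=2 hostname      # ...on two of them
    srun --nodes=1 hostname      # ...on just one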
So hopefully that was a lot of fun and you're going to try it at home. If not, feel free to follow along next episode, where we're actually going to learn how to use SLURM and run some resource- and computationally-intensive tasks, maybe do some scenario testing. So stay tuned for that. Until next time, YouTube, see you later.
Info
Channel: QuantPy
Views: 21,092
Keywords: raspberry pi cluster, raspberry pi, raspberry pi server, managing a cluster, raspberry pi cluster rack, controlling multiple raspberry pis, how to build a raspberry pi cluster, raspberry pi cluster case, how to control a raspberry pi cluster, managing a raspberry pi cluster, slurm cluster, quant, quant finance, mini supercomputer, slurm, using slurm with python, how to install slurm, how to install slurm on ubuntu, how to install slurm on, how to install slurm on ubuntu 20.04
Id: l5n62HgSQF8
Length: 33min 4sec (1984 seconds)
Published: Tue Nov 16 2021