Raspberry Pi Supercomputer Cluster

Captions

Hello, my name's Gary Sims and this is Gary Explains. Today I want to look at how you can build a supercomputer using a cluster of Raspberry Pi boards. So if you want to find out more, please let me explain.

Now, modern-day processors are multi-core processors. Even on a Raspberry Pi you might get four Cortex-A53 cores; in smartphones there might be eight cores; in laptops, four cores; in desktops and servers, maybe 16 cores or 32 cores, just depending on how much money you want to spend. Now, when you write a program that has to do lots of compute-intensive tasks, the more cores you have on that processor the better. So if you can split up your task into individual chunks and give each core its piece of data, then they can work individually and then come back and give you a result, and that's multi-threaded programming. Now, I have a whole video on multiprocessing, multi-threading and multitasking, and I'll leave a link to that in the description below.

Now the problem is, if you want to calculate something that's very, very complicated, you know, predicting weather systems or all the kinds of things that they do with supercomputers, then just having 16 cores or even 32 cores is not going to be enough. You need thousands of cores, and that's where a supercomputer comes in. A supercomputer in this context is a set of individual nodes. Each one is a computer that boots its own operating system; it has a CPU, sometimes multiple CPUs with multiple cores in them, and they're all connected together. So you have all these nodes together, and cumulatively they might have, let's say, a thousand cores, or whatever it is that you're aiming for.

Now the problem is, when you write a program for multi-threading, you're running inside that one computer: you compile the program, you run it, and it runs there. When you have all these nodes, you have to have a way for the computers to talk to each other and to share out this task, and
then, once they've done their chatting and their processing, they can come back and give you the results. So that takes a different style of programming than the traditional multi-threaded program that you might get on a PC or a laptop.

And so there are three stages in building a supercomputer: one is you need the actual hardware, the second is you need to actually do some setup, and the third is you need to write your program. So let's look at those three stages.

Now obviously, in a real-world supercomputer you're talking about lots and lots of hardware that's got lots and lots of cooling, it requires a lot of electricity and it costs a lot of money. But you can actually build the concept of the supercomputer using something simple like the Raspberry Pi. Then you don't have problems with price, you don't have problems with the electricity bill, you don't have problems with heating, but you can still replicate the theory and the basis behind supercomputing. So in this example I've got myself four Raspberry Pi Model 3 boards. I've got them together here in this nice little rack, and they are all connected using USB power, so it's very simple to get it all up and running. They also all have wireless built into them, which means I don't have to run an Ethernet cable to each one; I can do it over the wireless network.

So once you've got the hardware together, the next thing is to do the setup, the software configuration. Each of these boards is running Raspbian, and I've also done two important things. One is I've made sure that each board can talk to the other boards using Secure Shell without a password, using public and private keys. Now, I'm not going to go into how you set that up; there are lots of tutorials on the internet. But the idea is basically this: you should be able to copy files using scp, or log in to any of the nodes in your cluster using ssh, with a private/public key pair, which means it doesn't ask you for the username, doesn't
ask you for the password; it just connects straight away. That's needed because when all this data is flowing about, you need authenticated, secure ways of transmitting that data without you having to type in passwords and worry about whether this message is allowed to come from here or not. So you have to get that set up first of all.

The other important step is you have to install some kind of MPI library. MPI is the Message Passing Interface, and this is a standard way of sending blocks of data from one node in a supercomputer cluster to another. There are various different implementations of it. I'm using MPI for Python, which is a way of allowing MPI messages to be sent across a supercomputer cluster just using the Python programming language.

Then the final step is you need to write the program itself. Now, it's not like writing a normal program where you just go around in a loop and say, hey, let's do this. You can't just say, oh, please magically run this on all the nodes in my supercomputer. You need to write your program in a specific way, so that it knows it's running in a cluster, it knows it's got these other nodes that it can talk to, it knows to send that information out to them, and it knows to gather that information back in again so that it can analyze the final results.

So what I've done is I've written a program that allows me to check if a number is a prime number or not, and I'm going to do that by sending out that number to one of the nodes, getting it to check it, and then send the answer back. In fact, the way it works is like this: with four boards, each board having four cores, I've got 16 cores, so the MPI program sees it as a 16-node setup. What I actually do is create an array of 16 numbers, and then you can say to MPI, please scatter that across the nodes. What it will do is send one number to each of the nodes (which really is four nodes, each with four cores), and then at the receiving end those nodes will receive that
number and say, right, this is my number, I've got to check if that's prime or not. It will check if it's prime or not, and then it will send back the result, and so the master node gathers in the numbers. So these are two of the important words for MPI: scatter and gather. Now, there are actually many different models in the MPI setup and different ways of using them; this is a simple way of showing how you send out the data and bring it back in again.

Now actually, if you run that for just one prime number, it's really slow. That's even slower than if you did it on just one node, and the reason for that is, to get this running, each of the nodes has to first of all spin up and be ready to receive these MPI messages. You then have to send the MPI messages over the network, it then has to do the calculations, then it has to send the messages back, they have to be gathered in again, analyzed, and the results printed out. So the overheads of doing all that just to check whether one number is prime or not are really, really high, particularly if you check for lower numbers. You know, is 7 a prime number? Well, I send the message over a network, it works it out, it sends it back again. Well, that'd take forever compared to me just checking locally if 7 is a prime number or not.

So what I do in my program is, actually, I've created a huge chunk of data that's maybe got a thousand numbers for each of the nodes, and then that gets scattered across all the nodes: please check these 1,000 numbers, then come back and give me the result. Now, even this method itself is not very efficient; there's a lot of numbers you've got to send out. I could maybe, for example, send a beginning and end of a range, and maybe that would be easier. But this is the way I've chosen to demonstrate that you start with a big chunk of data and then you scatter it across the nodes in your supercomputer. Finally, once all the nodes have processed a thousand numbers each, they come back and they tell the master node, here are the results, and then very quickly it can say,
well, these are all the prime numbers from the list that has been checked.

OK, so two things quickly. One is, this program will be available on my GitHub repository, and you will find a link to that in the description below. And now let's head over to my cluster, log in via the command line, and see this actually running in real life.

OK, so what we have here is four terminal windows, each one connected to one of the four boards. The one here in the top left is the first one, the second is here on the top right, the third one on the bottom left, and the fourth one on the bottom right. What we're going to do is show you that each one is basically not doing anything; they're idle. You can see here the four processor cores on each one are pretty much doing nothing. So what we'll do first of all is I'm going to show you that I have here a program written in Python, and you'll find it on every single node. If we go and hit number 4, for example, we can do exactly the same thing, and we can see that the same script is there. That means that when it runs across the supercomputer nodes, the same script is run on each of the four different machines. We'll put this one back now into htop.

OK, and so the first thing we're going to do is run this script just on one node, and the way you do that is you use this mpiexec program. The time bit here at the beginning is a way of seeing how long it runs, and then it says basically run Python and then this program I just showed you, to run now just on one Raspberry Pi. So notice it says there are only four cores, and it's starting to find the prime numbers. If we look here at the other devices, we can see their activity has remained zero, very idle; there's nothing going on. This will take about 30 seconds or so to complete as it finds all the different primes, and the other three nodes here are doing nothing; it's all happening just on this one node. So it's about to come to its completion now, and we'll see how long
that took. 90 [unclear] prime numbers found so far ... hundred one thousand found [unclear] five hundred and nine thousand nine hundred ... took 33 seconds. OK, so that's what it does if you run it on just four cores on one Raspberry Pi board.

Now, before you run it on the cluster, you have to define a file, which I have called cluster file, which has got the IP addresses of the different nodes in the cluster. So here we can see 33, 43, 53 and 47. If I do an ip address show wlan0, we can see here that this is 47, number 47, which is the last one here. If we go down to this bottom one, here in the bottom right-hand corner, and run the same command, ip address show wlan0, we can see here that this is 53. And the other two are also listed in that cluster file.

OK, so now we want to run it across all the different nodes in the cluster. What we do is use the same mpiexec command, but now notice we have this bit here that talks about hostfile, cluster file, which points it to that file I created with the list of IP addresses. So now when we run this, we will be able to see that it runs across all the different nodes. Let's just fire that off. The first thing to notice is that the other devices here, the other boards, are all starting to run at 100% as they are being used. Notice how it now says total cores 16, because it's running across all of the different boards. If you look here on this one, for example, you can see these first four tasks, the Python tasks, in there. And it has finished, and it finished in 16 seconds.

OK, so that was twice as fast. Now, some of that is because of the overhead; if this was running for much longer, for several minutes or several hours, then that overhead would become negligible. But what we saw was that by running it across all four nodes in the cluster, using 16 cores, we got a much, much faster time, and that is the essence of how you run supercomputer programs.

OK, so there you have it. You saw the program running there, and you saw that
it is quicker using the cluster. I'm sure I could actually make it more efficient to make it even quicker on the cluster, and it would gain efficiency even more as you got to even bigger numbers. But that's how supercomputers work: no matter how complicated the task, you still have to break it down into smaller tasks, and those smaller tasks are run on the individual nodes in your huge supercomputer.

And of course you could build a computer much more powerful. You could do this using PCs, you could do this using laptops, but of course then you do have the issues of cost, of electricity and of cooling. Whereas if you do this with four Raspberry Pis, or even ten Raspberry Pis or twenty Raspberry Pis, you get the idea of how to distribute the tasks across many, many nodes, and then you can verify that. And then one day, if you do actually get access to a real supercomputer, you can run that program on there and see how quickly it really does run.

OK, that's it. My name's Gary Sims, this is Gary Explains. I really hope you enjoyed this video. If you did, please do give it a thumbs up, don't forget to subscribe, and that's it, I'll see you in the next one.
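
The scatter and gather pattern described in the captions can be sketched roughly as follows. This is not the program from the video (that one is on the presenter's GitHub repository); it is a minimal single-process simulation, assuming one chunk of numbers per core. The function names `is_prime`, `make_chunks`, `check_chunk` and `run` are hypothetical. In an MPI for Python (mpi4py) version, the two marked steps would be handled by `comm.scatter(chunks, root=0)` and `comm.gather(local_primes, root=0)` instead of the plain list operations used here.

```python
# Single-process sketch of the scatter/gather prime check (hypothetical names).

def is_prime(n):
    """Trial division; simple rather than fast, which is fine for a demo."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def make_chunks(numbers, workers):
    """Split the numbers into one list per worker (round-robin)."""
    return [numbers[r::workers] for r in range(workers)]


def check_chunk(chunk):
    """The work each worker does on the chunk it receives."""
    return [n for n in chunk if is_prime(n)]


def run(numbers, workers=16):
    # The master prepares one chunk per worker (16 cores in the video's cluster).
    chunks = make_chunks(numbers, workers)
    # Stand-in for scatter plus the per-worker computation.
    results = [check_chunk(chunk) for chunk in chunks]
    # Stand-in for gather: the master merges the partial results.
    return sorted(p for part in results for p in part)


if __name__ == "__main__":
    primes = run(list(range(1, 101)), workers=16)
    print(len(primes), "primes found:", primes)
```

Sending a whole chunk to each worker, rather than one number at a time, is what keeps the messaging overhead small relative to the computation. In the run shown in the video, going from 4 cores (33 seconds) to 16 cores (16 seconds) gave roughly a 2x speedup rather than 4x, which is consistent with that overhead still being significant for a short job.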

Info

Channel: Gary Explains

Views: 792,916

Keywords: Gary Explains, Tech, Explanation, Tutorial, Supercomputer, Cluster, Raspberry Pi, MPI, Message Passing Interface, Supercomputer programming, Rasperry Pi 3, Raspberry Pi 3+, Raspberry Pi 3 Model B, Python

Id: VzcarXuVUvU

Length: 12min 32sec (752 seconds)

Published: Thu Jun 13 2019