100x Faster Than NumPy... (GPU Acceleration)

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

[Music] if you've coded in Python before you've probably come across the main python libraries two of the biggest examples of course being numpy and scipy these examples are used extensively in the scientific community and for a good reason they provide many important functions that are used all the time in scientific research but compared to what we're about to explore they're nothing with the right tools you can take a simulation of 400 particles and turn it into a simulation of ten thousand particles but at no extra time cost the secret of course is proper utilization of computer hardware and more specifically the GPU which most people have access to these days welcome to the first episode of the GPU accelerated python series the boltzmann distribution [Music] so the general idea behind these videos is I'm going to be taking code that I wrote in an older video in this case it's the boltzmann distribution code uh put a link to that video in the description and it'll be on the right hand side here and then the new optimized code will be on the left hand side here and then you can actually see how much faster something like Pi torch in this case makes the code run uh on GPU I should say by the way in this video that the graphics card I'm using is a GTX 3070 so depending on the speed of your graphics card maybe you have a faster one or a slower one the code might run a little bit faster or slower but the 3070 is a pretty you know mid-range graphics card in terms of you know the sort of things that scientific researchers will be working with so the first thing to note is the Imports and you'll see that I have a few more Imports here and less here so in order to get this GPU boost we're going to be using the machine learning library Pi torch and so that's you know exactly what you might think of it you think okay pytorch that's machine learning that's training AI models Etc well the reason why pytorch is so good at that is precisely because it uses the GPU and if you go into the back end functions that pytorch uses they're all optimized functions for you know linear Transformations and things like that and you can borrow those into other applications in this case we'll be looking at it directly but there are also libraries that build you know fast integration fast other things on top of Pi torch So today we're going to look specifically at pytorch and how it can help for this simulation but in future videos we'll get more into the other functions so I'll import my libraries here and I'll import it here so this is the new code the fast code here and so remember that what we're doing here is we're simulating the boltzmann distribution so the question is what is that well the idea is that if I have a bunch of particles in a box and they're all sort of flying around and bumping into each other what happens over time is that their velocity the distribution of their velocities so the statistics of their velocities approaches what's known as the boltzmann distribution so no matter what initial state they're in you can have a bunch of them coming to the right at 500 meters per second to bunch of them going the other way at 500 meters per second and they start in a random state but after colliding with each other and bouncing off the walls a bunch it will always approach a certain distribution and so you can actually perform a simulation of this and prove it and so that's what we'll do here and so the idea is that you need to simulate a number of particles in this box simulate them colliding with each other bouncing off the walls for you know an extended period of time and then look at how the distribution of their velocities changes over time so in each case we'll take 16 particles here and we're going to randomly initialize their positions now in this case we're going to be in a 2d plane it's not a box it's like a square they're all confined in the square and so what you need is a random x and y coordinate so if I have 16 particles and I randomly assign a bunch of particles here using the torch.rand function you can see that my r is going to have a bunch of x coordinates and a bunch of Y coordinates 16 values each same thing here with numpy I do the exact same thing it's just a different function so first thing to notice is I'm using torch.rand the same way I would use numpy.random.random and so what that does is it uniformly distributes the um particles in the square and what I do is I give certain indices to these particles so whenever a particle is greater than 0.5 so by the way these particles are distributed between zero and one so the ones that start to the right I'm going to call these Index right and the ones that start to the left I'm going to call it index left so I'm keeping track of which particle started on the right and left hand side the reason for that is when I make an animation of the simulation I want all the ones coming from the right to be red and the left to be blue and then you can sort of see how they all mix in together it's just a little visual feature and I do the exact same thing here so now I have my indices as well so if I look at uh index R and notice it's a tensor it's a pi torch tensor so it basically tells me these ones are on the right and then the index left of course tells me which ones are on the left so I get the same thing same thing here only with numpy arrays so this is the slower way of doing things of course not so much at this point because I only have 16 particles but I'm also randomly sort of putting particles in places and I have my Index right and index left of where the particles came from so the next thing we're going to do is we're going to give an ID to each particle so there's 16 different particles there'll be 16 different IDs and so here I use the torch.range and it just basically generates a list from 0 to 15. these are going to be the IDS of the individual particles and again torch.arrange is analogous to numpy.range so I can also generate the IDS in numpy the slower way like this and I also get my thing here notice that this is a tensor this is an array and the tensor is an object that can live on the graphics card and speed it up quite significantly so now I have my initial particles randomized in the box and I can plot the initial configuration of them just by doing a scatter plot with all the locations of the particles so here I can make a scatter plot you can see that in torch for example I have to convert to CPU before I plot so I've assigned that I'm using my graphics card here using pytor and so whenever I plot something you you have to take it off the graphics card first so I use dot CPU that allows me to plot this tensor well not this tensor but this tensor here which lives on my GPU device on the plot here so that now I have my plot of the particles and I can do the same thing in numpy notice that in numpy I don't have to call CPU so I have my particles on the right starting as red and the left starting is blue so that's my starting location and eventually I'm going to need to assign velocities to them and we'll actually create the simulation so again same thing on both sides we're going to uh sort of initialize the velocities of the gas and we're going to make all the particles on the right hand side move to the left at 500 meters per second and the left-hand side they're going to be moving to the right at 500 meters per second so you have these two gases colliding with each other and then you'll show that over time even though all the particles have a speed of 500 meters per second it will eventually distribute in such a way that the distribution of the velocities is according to a boltzmann distribution which we'll get to so the way to do that is well okay I initially sign all the velocities as zero right so if I just look at the shape of this tensor V notice that it's shape 2 by 16 because that's how many particles I have so I can look at V Dot shape always very useful in both um numpy and Pi torch so these are the XY coordinates of all 16 particles uh then what I can do is I can say okay I'm going to take v0 so V 0 is going to be the um x velocity of all these particles so right now it all starts at zero and at any of these indexes remember these ixr is indexed to the right so I Index this array only the values at these indices so only the points on the right get a velocity of negative negative 500 meters per second and to the left get a velocity of 500 meters per second so if I look at V after running this cell you can see that some of them are moving to the right at 500 meters per second and others to the left at 500 meters per second and that depends on their location so if I confirm this with r you can see that of course this one starts at the left because it's less than 0.5 so it moves to the right whereas this one is starting on the right hand side it's a little bit greater than 0.5 so it's moving to the left so useful way to understand these arrays so of course this here on the left was done with tensors on the right hand side it's done with numpy using the np.zeros so you're starting to see that every function in numpy is analogous to a pi torch function which is really handy when it comes to taking code that's initially coded in numpy and converting it over to something like torch for that huge speed boost so this is probably the most difficult part of the simulation remember we have a bunch of sort of billiard balls in space they're traveling towards each other and they're going to collide with each other and when they Collide they're going to transfer momentum to each other that's how the speed of the particles changes and that's how the distribution of all the velocities approach the boltzmann distribution so we need to determine for example if two billiard balls are going to collide with each other of course in both cases and so you have to think okay well how am I going to keep track of this because basically what we'll do is we'll run a simulation where we advance time a little bit they all move a little bit then we check to see if they've collided with each other if they haven't collided with each other then we move them again move them again move them again and for any pairs that are colliding with each other we need to send them going in opposite directions so we advance time a little bit at a time we check for particles that have collided with each other and for those particles we change their velocities so in order to determine if two particles Collide you need to find the distance between all the different pairs of particles so that's important when you consider all pairs of particles so if there are n particles there's going to be n times n minus 1 over 2 pairs that's just simple combinatorics so if I have 16 particles I'm going to have 16 times 15 divided by 2. so 8 times 15 is going to be 120 pairs so you see why these simulations get complicated if you're simulating 16 particles you already have to keep track of all these different pairs and the distances between all these different pairs remember we need to keep tracking of all the different pairs at once so for 16 particles if I have 120 different pairs I need to Loop through all 120 of those pairs and check to see if the two particles are within colliding distance every time and so that can be difficult and so this is combinatorics and to make this easier we'll get pairs of particle IDs this is just so to understand at the beginning and also to compare the two functions so this is really in some ways where the real speed up comes in pi torch with large numbers of particles so you'll note that in the numpy case on the right here first I use this combinations function so what it does is it takes in this list of IDs just the IDS looks like this and it's going to give me all possible pairs of IDs so I convert it to a list and then I convert it to a numpy array and so to give me for example 0 1 because that's one pair 0 2 0 3 0 4 all those sorts of things and of course if I look at the actual array it does give me exactly those things so it's the pairs of all the particles in this case it's the pair corresponding to the index of that particle now it turns out that Pi torch is a function that does that exactly as well so here IDs is the tensor and it's also going to give me the pairs of all particles so instead of having to use this combinations function I can use this built-in Pi torch function to do this and sure enough it gives me the exact same thing remember this is an array it lies on the CPU whereas this is a pytorch tensor of course it has the option to be on the GPU which is exactly where I add it so I I go dot 2 device and that puts it on the GPU so any operations I do with this are going to be very very fast so remember in the simulation we need to consider the distance of all pairs if we're doing running a simulation where we advance time Advanced time we need to be looking at the distances of all the pairs and then for any pairs that lie within a certain distance when the two radius of the balls or the billiard balls touch whenever we get less their distance between the centers is less than the two times the radius then they collide with each other that's sort of how billiard balls Collide as soon as the distance between the centers is twice the radius they're going to bounce off each other so that's how we need to simulate here we need to check for that every sort of iteration of the simulation and so we need to ask how are we going to do that well the first thing we can do is we can get the pairs of all x positions so remember that r0 gives me my X position here and if I want the pairs of all x positions well I need to get the common of Torx the same way I did with the particle IDs with this array here and so what I can do there is I can use the same combinations function as above and sure enough I get the X pair distances so this gives me x one of one particle and of the other particle as well and it corresponds exactly to the IDS pairs as well and again I do the same thing in numpy here and I get the same X pairs so if I want the distance Delta x i j between pair between particle I and particle J all I need to do is take the difference between these two values here and that gives me Delta X between particle 1 and particle two so my particles are here Delta X gives me this distance here and so I just use the torch.diff function all it will do is I'll take the difference between these two elements so I index along this axis equals one so it takes differences along this axis and then I unravel it to get a 1D array and this will give me an array of the sort of distances between all particles all pairs of particles so remember I only have 16 particles but my distances array is going to be much larger so now the total distance well I need Delta X of the pair plus Delta i y of the pair squared this gives me the euclidean distance between the pairs and whenever this distance between the pairs is less than twice the radius we know the two billiard balls Collide and we have to send those two ones in opposite directions so instead of going by all the code that we did about above I get my X Pairs and Y pairs DX and d y and then I just you know take the sum of the squares which is the euclidean distance and I get the distance between all my pairs of particles and remember that these pairs these distances correspond to all the different ID pairs as well um IDs pair I think I called it so for example this distance here corresponds to particle 0 and particle one or for example this part distance here you'd have to scroll down and find the corresponding pair that that corresponds to so these distances give me the distance between all pairs and I need to look every time I run the simulation I need to look at all these distances and whenever this value is less than twice the radius I need to make the particles collide with each other so the next question is you know once these particles do hit each other what's going to be the corresponding velocities afterwards in other words if you have two sort of objects collide with each other in space how do you use momentum transformation for spheres remember the Spheres are going to hit at a certain angle what do their final velocities look like well I'm not going to go into the physics you know you know too much depth here but you can show and this would be a good exercise with spheres and geometry that the final velocity of the two spheres well the first sphere is going to have this final velocity where you have you take your first velocity and your second velocity and the corresponding positions as well of the two particles and you can get your new velocity for sphere one and your new velocity for sphere two and so this holds obviously in you know both simulations and so for example if I set my radius equal to 0.06 and I want to know which pairs are colliding I can use my D pairs array so this is the distance between all pairs and I look for values where it's less than two times the radius so if I go like d pairs is less than 2 times 0.06 it will give me the location of the billiard balls that are already colliding with each other by the way this is bad if they're colliding with each other before the simulation starts it occurs for when the the particles of the gas are sufficiently large so 0.06 is actually a large radius here um but basically every iteration of the simulation it will tell me whether these are um colliding with each other so I need to pay special attention to these indices here notice that I use the exact same notation in pi torch and numpy to get these so in this case this is giving me the pairs of the particles that are colliding in this simulation which is done on Pi torch and this one in numpy so I just re-ran the simulation there was a little bug here with my computer but now the indices are the ones colliding on the left are zero two three four the ones on the right are one eight five eleven so sure enough if I index my velocities and I actually look at the corresponding new velocities you can see that these are going to be the corresponding V ones and the corresponding V2s of the particles that are colliding and of course I can get my positions of the r1s and of the r2s so then we basically just use this equation here above of course in this case note I'm using the torch.sum function in this case I'm using a numpy functions to apply this operation then we basically just use this equation here to compute the new velocities here of course I'm using torch functionality here I'm using numpy functionality but it looks very similar so now we basically have all the functionality we need to perform the simulation everything that I include in these functions here is stuff that we've done before so in this case here's like getting the distance between the pairs using that uh combinations function for both numpy and torch this function just computes the new velocities given sort of these old velocities and old positions here uh the only thing we're adding here is the motion so this is something that is going to be called every sort of iteration of the simulation where you slowly move in the particles and colliding them Etc so here I'm going to store the position of my particles at all times because eventually I'm going to want to make an animation of the simulation so right now I started as just empty uh same thing here in numpy you can tell the difference between the two functions my initial state is going to be the initial state of R here so that's the randomized state that I specified above same thing with the velocities I also pass in the ID of all the pairs the number of times that I want to simulate so maybe I want 10 000 iterations the delta T between different iterations maybe it's like 0.0008 seconds or something and then the D cut off which is basically you know two times the radius of the particles uh then what I do is I Loop through for each iteration of the simulation I get all the IDS of the colliding pair so that's IC for Collision um I set the velocities at these corresponding indices equal to the new velocities so I'm updating part of that array based on these uh Collision IDs it's the same sort of um thing I was doing here to get the values of the arrays um and then what I do is if they're hitting the sides of the box as well I have to adjust it so anything that is tries to pass outside of the box well it gets a negative velocity so it has positive velocity and R is greater than zero it's just going to reverse its velocity same thing for the left hand side of the box and the top and the bottom so this is something new as well again you can see the corresponding functions in numpy it basically looks the same only I'm replacing torch with NP so you just have to know which torch functions to use so I can Define these functions in both cases then I'd basically Define the radius of my particles I set D cut off equals two times the radius I set a very super small delta T and I simulate for a thousand different time points here so not super long but again remember have to keep track of many particles so here's where you have to start paying attention to the computation time and it actually won't make a difference or it might even be worse for pi torch for small numbers of particles because there's an overhead associated with using the GPU so if I run this a few times just to get some statistics here you can see it takes almost exactly 2.1 seconds to run each time and let's look at numpy in fact um here the radius is a little let me set the radius the same size just uh for consistency look at that that is interesting it runs you know 20 times faster for numpy than Pi torch and I'm here claiming pytorch runs faster than numpy and so someone who you know does this okay we have our simulation we've shown that numpy runs a lot faster why use Pi torch well you have to remember there's an overhead with this and it turns out that as you ramp up the number of particles the GPU actually does become way more efficient and I'll prove it to you but as you can see right now this is very interesting numpy seems to run faster 0.1 seconds versus two seconds but don't worry I'm going to show you that pytorch is indeed much faster maybe not for 16 particles but definitely for 40 000 particles uh so this is just a little plotting script this is just going to plot circles on the screen showing the locations of all the particles so you know this is just some code here I don't want to get too much into the actual animation here if you're interested you can look in the code that I'll put in the description below um there's some code here as well that I have to make an animation of the simulation in both cases so I'll run that I suppose for both cases here now I should note that here I'm using a much better animation plotting script than I did here in my old video uh this is because I'm using the technique of blitting so this whole series is about optimization um blitting is a way of only redrawing uh things on the new frame that have changed so in this case only the positions of the particles have changed so I don't need to redraw the axis every time um and here I'm not using blitting and it makes a big difference for you know large animations where especially where there's a lot of different things on the screen moving around and so again we can actually look at this animation uh I'll just show it here um so you can see the particles are colliding with each other and bouncing off the walls and you know it looks just about right sometimes you get this issue where two particles when they start in the same position they become connected like that and that's because they start with their radius less than um you know like their initial position they're inside of each other that can lead to glitches that usually happens for simulations where the radius is pretty big here and there's ways to fix that if you're clever but I didn't want to get too much of that into this video um but for many many many small particles it doesn't matter as much so now we're going to make a simulation with many more particles and what I'll do is I'll start with 400 particles each and all I'm doing is I'm running the same simulation above except with 400 particles in each case so I showed you that numpy runs 20 times faster than Pi torch which is my plane runs faster so that doesn't look good for me right now but watch as I run up to 400 particles it's going to run faster reason for that okay remember 400 particles I need to keep track of basically 400 squared different pairs so the computational burden goes up as a factor of N squared for the simulation so let's first run it in um numpy and see how long it takes and I think it takes about 40 seconds so I'm going to skip to the end okay so the simulation just finished in numpy it took about 48.5 seconds so not the end of the world I mean 50 seconds that's pretty you know reasonable in terms of if you're doing if this was the total number of particles you wanted to simulate for all intents and purposes I mean sure it's fine now let's go to pytorch and see how long it takes to run two seconds so two seconds versus 48.5 seconds okay and you think all right well what does it really matter why not just wait an extra 40 seconds for your simulation I mean it's not that much of a speed boost you could just you know 40 seconds things take way longer anyways but the real power here is okay maybe it's as fast for 400 particles but let's now do 10 000 particles and let's see how long it takes to run okay so look at that this is beautiful you can see it took about almost exactly the same amount of time as the numpy simulation only we have 25 times more particles in this case so you tell me which one is better and by the way this is super important to sort of illustrate where gpus become important because you saw above okay like this is doing 25 times more particles same amount of time whereas here it took two seconds and here it took 0.1 seconds that's awfully confusing this one is like 20 times slower and again there is an overhead for using GPU but when you have powerful simulations like this where many things are happening at the same time you know that it speaks for itself that you can do 25 times more particles compared to numpy and not a lot of people know about this that's why I'm trying to put these videos out there is that you know for a while I was like that too I thought okay numpy is the most optimal way of doing everything well obviously that's not the case if you have good Hardware so it's so important to like look at your task and think can I use a GPU to sort of speed it up and if it's you know where you're doing a lot of like simulation type work the GPU is just essential so okay and I want to show you sort of why that this 10 000 particles is better for showing that it approaches the true sort of uh boltzmann distribution so remember I said the the speed of all the initial particles was 500 meters per second they move around they collide with each other in the box and eventually it approaches some sort of distribution uh that probability density function is equal to F of V this problem PDF and it's basically um a normalization factor times V times e to the negative V squared so it's a function that looks like x e to the negative x squared kind of with some you know proportionality factors here uh same thing sort of here and so what I'll do is I'm going to plot an array of these and here I'm actually just going to plot the PDF so I get the PDF here of course I'm using numpy on this side it doesn't matter this is just for plotting purposes and then I'm going to plot the final histogram of the velocities so the v's uh next to the probability density function in both cases um so here of course what I'm doing is I'm taking the sum of the squares at the 400th time point right so the sum of the squares that's like square root of v x squared plus v y squared and then I take the square root so that gives me the magnitude of the speed so I'm going to make a histogram of that in this case I'm going to use um 50 different bins and in this case I can only use 10 bins because I have much less statistics so let's look at the difference of these plots so here's uh using those statistics and here's using these statistics so you could see that as I simulate more particles look at that you get a much better sort of Statistics that approach this uh probability density function so of course x axis is velocity and this is a PDF and so a lot of the particles although they all start at 500 they sort of approach this distribution here uh the same thing happens here but of course you know when you when you have so little bars it doesn't look great and I could always add 50 uh bars if I wanted to but it's going to be very noisy right and also there's less particles so you know um in this case I only have 400 particles so they're not colliding with each other as often um it's it's much better to use more particles and then you get um far more statistics as well on your histogram so finally let's make an animation of both cases because I think it's really interesting to uh look at this so here I'm going to make the animation with um well this is uh 10 000 particles and I'm not going to run this so this is another thing as well this is again updated code from the previous video in this case I was doing something very slow uh ignore what I did in the previous video of course uh here I'm using a much more optimized way using blitting so of course I'm only so you can see that Blitz equals true when I run the animation and I'm only updating the things that change so in this case it's the location of the billiard ball markers um and so I you can run both these codes to make the animation and you'll see this so again they initially all start at 500 meters per second they collide with each other and then you get this beautiful convergence to this particular distribution function here so it's you're basically validating a law of physics and so yeah this is a very simple use case like this isn't going to be modern research or anything but of course as soon as you start dealing with particles that have complicated forces between each other and interactions and things like that they might converge to a different distribution and that can be interesting for novel research but the main point of this video is that we had an application where we were doing a simulation that involved many particles and we use the functionality of Pi torch to massively speed up the computation time of that simulation so thank you for watching this video there's going to be much more videos in the series so I look forward to seeing you next time uh please like And subscribe of course join the Discord server below order one of these beautiful t-shirts here uh with your fellow Billy on it as well uh and I'll see you next time

Info

Channel: Mr. P Solver

Views: 78,084

Rating: undefined out of 5

Keywords: python, physics, speed up python, speed up python code, how to speed up python, python speed up code, how to speed up python code, gpu, python speed, python speed test, ai physics, blender 2.8 gpu python, accelerate python with gpu, gpu intensive physics simulations in blender, physics simulations in blender, cloth physics - python/pygame devlog #4, cloth physics, physics ai, physics podcast, gamedev physics, quantum physics, jax python, machine learning python

Id: iSEAidM-DDI

Channel Id: undefined

Length: 28min 48sec (1728 seconds)

Published: Thu Apr 13 2023