Prime React: Fast Inverse Square Root — A Quake III Algorithm

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

the fast squirt am I allowed to call it the fast squirt okay who is seeing this one I've seen it a couple of times only to forget it in a week I know I watched it I'm like wow I'm smarter like 30 minutes later someone's like Hey how do you do the fastest in uh the fast squirt and I'm like uh uh should we watch it okay well everybody wants to watch it all right let's go game company it's software open source the engine for their video game Quake 3 arena in that source code fans of the game discovered an algorithm that was so ingenious it quickly became famous and the only thing this algorithm does is calculate the inverse of a square root it was the inverse Square it is the inverse squirt see I thought it was just the squirt not the inverse chord by the way this is clearly called a squirt the square root that's not true that's not real okay if I had to write a piece of code that would calculate the inverse of a square root this is how I would do it here I'm using the C programming language the same programming language used for Quake 3. but to be fair I wouldn't actually write problem one didn't use rust it's probably faster in Rust honestly they probably it's so safe and fast it probably works every time write the square root in there myself people who work more closely with the c language or with CPU design have already figured out how to calculate the square root and the provided algorithm to us in the math.h file that we programmers can then just include in our program so what could possibly be so interesting about the Quake 3 algorithm How does its software calculate inverse square roots at first glance it doesn't seem to make any sense where does this number 0x5f3759df come from what does this have to do with taking square roots and why is there a disgusting curse word in the second comment there's a disgusting curse word yeah no he should have abbreviated it if you would have put WTF would have been way less offensive he should have got the I mean come on everybody knows about this I know it is actually funny that he put this comment here and this is somehow he has text reveal animations I know those were pretty sexy I'm not gonna lie to you this is the only way one should comment code it's a sexy curse word I like how this one's commented out though I don't know why second iteration can be removed oh yeah we'll learn about it should be magic happens here that's really okay so I I did one time come across a piece of code when I was programming robots for the government uh to help destroy people uh there was this one piece of code that was in C and it was just a series of constants and variables just being right and left shifted and added to and the only comment on it was we add two because it's 5. not a single other comment in the entire system and guess what there was neither a two Nora 5. anywhere within that code a bunch of constants oh there's a bunch of numbers there were sevens there was Niners I mean there was everything in there but there was not a two or a five explicitly and I was just just so confused I'm kidding it didn't actually there's no killing anybody okay it's just a fun store to say you know uh senior VP of lead engineer I know caviar no I haven't I bet you the Apollo Source probably has some pretty hilarious stuff in this video I will show you how with some cool bit manipulation you can take in for square roots and the algorithm that does this Bears the name the fast inverse square root first of all why would the game engine want to calculate 1 divided by square root of x if you want to implement physics and like I don't care we already know looking at the code again we can see that the beginning is pretty harmless we are given a number called number as input the number we're supposed to take the inverse square root of first with the variable I I'm pretty sure this is actually a type I mean I've been doing a lot of typescript lately that's that's actually the type the the the name is is float we declare a 32-bit number then we declare to a 32-bit decimal numbers X2 and Y and then we store 1.5 into the variable with the obvious name three halves the next two lines is simply copy half of the input into X2 and the whole input into y but it's after that where the magic happens take a moment to look at it again okay so obviously the first thing right here is that I assume a float and a long are the same size in C architecture right because it's a double and a long long correct did I get that correct I think I'm correct on that one so by casting it as a pointer taking a reference and casting the reference as a pointer to a long you don't convert you don't do a convert cast instead you actually just you literally map the floating bits onto a long bit right what the hell's even that I just I literally just explained it so I I believe I understand what's happening here so before he goes on here just looking at the code we have a pointer to y Y is a float we cast it as a pointer to a long and then we concrete ties it meaning that the bits that are in the float are mapped now in the long it's not value casted it's bit value casted by the way this does not lead to security bugs okay this is not how you accidentally rmrf your home directory because you made a tilde directory in one of your subdirectories and you didn't realize it that's not how these things happen okay well the longer you look at it the lesson makes sense and the comments on the right are not really helpful either but they do hint that there are three steps to this algorithm puzzling together these three steps will show us the brilliancy of this algorithm but before we start with these three steps let's first take a look at binary numbers we said that in the first line we declare a 32-bit integer in the C programming language called a long that means we're given 32 bits and we can represent oh I really think one of the greatest tragedies in all of programming is that there's int there's short there's int there's long and there's long long and nobody was like we should probably just use numbers how about we just call this one i-64. oh there's also Char and bite which Char may not be a byte equivalent in length have you not had a long long yeah there's long longs it just just to me that's like the greatest tragedy ever that in during during the conceptions of this that they didn't they're just like ah it's this big will happen if we want to double it that's long end I want to long it or a short end really you can NC I believe you you know you can just say short or you can actually say short int so it means cut in half short int long end yes it's there's no double long long thankfully we now have standard end yeah the standard end I only use that specifying Things based on like words and meanings is just the worst uh let's say that long long long and short falls under the exact same problem as like undefined versus null you're putting meaning on something you're putting you're putting some sort of like idea on top of a word that doesn't fit one to one I hate I hate those kind of remember with it okay but I think you all know how to do that this is one two three four and so on up to around 2 billion but in the next line signed long the we declare two decimal numbers and C called a float again we're given 32 bits and we have to represent the decimal number with it how would you do that if you and I were designing decimal numbers this is probably one way we would do it just put a decimal point in the middle in front of that's not what you do I had to hand calculate out these things as part of like graduating from college you know how hard it is to do floating Point operations or like how you do multiplication on a CPU like the iterations of addition and all that stuff that stuff's really hard it was really fun it's a decimal point we count in the usual way one two three four and so on and after the decimal point there no surprises either just remind yourself that this is binary so instead of tenths hundredths and thousands we have halves fourths eights sixteenths and any combination of them yeah give your three-fourths also known as 0.75 but this idea is actually terrible yeah it's fun in the sense that I think a good way to describe the difference between long and long long is by looking at Theo's mustache compared to Primes foreign mustache okay we're not gonna do that not in this chat okay this chat okay stash comments let's keep it on the minimum okay you know sure mine is majestic beautiful full-bodied free range and grass-fed and sure I mean Theo's mustache is not nearly as glorious I mean but he's trying you know what and you can't ever be upset at somebody trying okay he's an intern in the mustache Department I just happen to be a VP you know like I shouldn't look down on him he's doing the Lord's work out there as an intern okay the world goes around on internships okay and I'm happy that he's deciding to intern in the mustache Department we're happy to have him we've decimated the range of numbers we can represent before we could represent numbers to around 2 billion now only to about 32 000. luckily people much smarter than us have found a better way to make use of those 32 bits they took inspiration from scientific notation the same way we can systematically represent numbers like 23 000 as 2.3 times 10 to the 4 and 0.0034 as 3.4 times 10 to the minus 3 we can also represent them in a binary system where here for example 1 1 0 0 could for example be 1.1 times 2 to the 4. the standard they came up with takes the name IEEE 754-185 I triple East standard seven five one eight defines the following we are as usual given 30. the first bit is the sign bit if it is zero the number is positive that's how you get negative zero by the way if you're wondering this little first bit that's how that's how it all that's how it's all done that one little guy makes it that's it if it is one the number is negative but the number is Quake three provides to the Fast and the square root are always positive I mean obviously they're positive if we would have to calculate one divided by square root of minus five something definitely has gone wrong so for the rest of this video we ignore the sign bit as it is always zero then the next eight bits Define the exponent that means two to the one two to the two two to the three two to the four and so on with 8 Bits we can represent numbers between 0 and 255. but that's not exactly what we need we also want negative exponents this is why everything is actually shifted down by 127. so instead of 2 to the four we actually have 2 to the four minus 127. if we actually want the exponent to before the bits need to be set to 131 because 131 minus 127 is for classic the last 23 bits are the montissa they might as usual in scientific notation we want I had a teacher that just loved that I swear I heard the term mentisa one million times in my life okay Mandisa I wanted to denote one digit followed by the comma followed by the decimal places but with 23 bits we can represent numbers from zero to but not including 2 to the 23. again that's not exactly what we need for scientific notation we need the montissa to go from 1 to 10 or in binary scientific notation to go from one to two so we could do something that we've already done before put that little range there said two to one I got I just my brain just divided by zero there for a quick second the comma after the first bit this automatically gives us numbers from one to two but this naive approach is wasteful you see the people that design standard 754 realized that in binary something happens that happens in no other base what look at the first digit in scientific notation the first digit is by definition always non-zero but in binary there is only one digit that is not zero one and if we know that the first digit will always be a one there is no need to store it thus we can save one bit by moving the comma one double the value being an extra one in the number it represents now armentiza is between one and two even though 23 bits gave us numbers between 0 and 2 to the 23 we scaled them down to get numbers between zero and one and then we added an extra one to get numbers between 1 and 2. and this already is the main part of IEEE standard 754 but just the so-called Norm I want you to take a little quick second here you know we're watching a pretty intensely technical video and just think about this for a second casual color think about this casual color okay just put your think cap on for a second when was the last time you thought about anything I know most of you probably programmed typescript JavaScript Ruby on Rails Lua for for a living and let's just be real when you do substring of a string you're not even sure if you're copying it or creating a reference to it you don't even know so when we're breaking down how a how a float works this shit's like magic okay this is like we deal with so little stuff in modern world like you forget how crazy the foundation is on computer science like some dude was sitting there up at like three in the morning being like if you dropped a one you got to you got to represent like it comes running in like I'll save the pit a safe one bit of probably like people like high-fiving him he's like running and you're just like what right like that it's just crazy it just makes no sense like the world we live in this is is it's just so different right now people come running in they're like what happened he's like I use 600 tail s and everyone's like high-fiving and man the borders are so round but so not round at the same time it's amazing foreign I can slow count these nuts I'm sorry is that sexual harassment realize numbers the informed viewer knows that the standard also includes denormalized numbers not a number Infinities and two zeros but we won't go into those because in Quake 3 it just happens that these are never inputs into our algorithm otherwise something definitely has gone wrong anyway at no point should our game engine have to normalize a vector with infinite length for this algorithm and for the rest of this video it will be useful to think of the mantissa and exponent as the binary numbers they are if we are given two numbers one being the mantissa and one being the exponent 23 bits and 8 Bits respectively we can get a bit representation with 2 to the 23 times e plus M if you think about it because multiplying e by 2 to the 23 just shifts e by 23 digits so that's how one could write the bits but we get the actual number behind the bits with this formula this should seem familiar to you here we have the exponent with 127 subtracted from it and here we have the mantissa with the extra one in front but now something completely different for no obvious reason at all let's take the logarithm of that expression since we're doing computer science here we take the logarithm base too we simplify as much as we can for no apparent reason let's just take the logarithm you're just like okay like how many of you just accepted the fact that he just out of nowhere tossed out a log like that just throws out a log and you're just like okay yeah the log of bass too yeah it's base two because it's binary we all know that it's easy it's binary come on really log is normal this is a normal log I mean I mean I mean we should all be familiar with the natural log at least and take out the exponent but then we get stuck but not so the creators of Quake developer Gary taroli knew a trick to get rid of the logarithm you see the trick is an approximation to log of 1 plus X for small values of X log of 1 plus X is approximately equal to X if you think about it this approximation is actually correct for x equals 0 and x equals one but we'll add an additional term mu this correction term can be chosen freely again with mu equal to 0 this approximation is correct at zero and one but it turns out that setting mu to this number gives the smallest error on average for numbers between 0 and 1. so going back to our formula we apply our trick as M divided by 2 to the 23 is indeed a value between 0 and 1. we rearrange a little bit more and we finally see why we did all those calculations M plus e that man just tossed out like 17 maps that quick map just a little quick math obviously see it this way okay so you know I used to be really good at math uh I was really really good at math but I'm not good at math anymore like when I see this I go oh yeah I like I'd have to sit down I I'm sure I could get to this this position it would just take a moment right like how we just did that you know he like you know like come on let's go back here trick all right so he converts this thing this just becomes X Plus mu plus this m divided by 2 to 23 is indeed a value between zero and one okay we rearrange a little bit more yep or and we finally see why we yeah we did all those calculations okay that makes sense look at that skill Gap is that skill Gap uh nonchalant problem solving 101 yeah okay perfect M plus e times two to the 23 appears that's our bits representation there we go that's where that number comes from how much you want to be interested we applied the logarithm to our formula and got the bit representation just scaled and shifted by some constants so in some sense the bit representation of a number is its own logarithm armed with this knowledge we can finally start with the three steps of the fast inverse square root hit me Daddy okay here we go the first step is actually not complicated it just looks complicated because it's memory address trickery yep so we started why and now we want to do cool bit manipulation tricks floats unfortunately don't come with the tools we need I'm gonna skip that one because that one that one's fairly obvious just reinterpret what you're pointing at as a long instead of a float that's what that step is yo yo baby yo baby can you play for me you'll pay me can you please okay oh there we go we're at the what the stage the intuition behind the second step is the following remind yourself bit shifting a number to the left doubles it yep and shifting it to the right half sit classic but what would happen if we did something like this to an exponent doubling an exponent squares the number and having the exponent gives us the square root but now also negating the exponent gives us 1 divided by square root of x that's exactly what we need so let's remind ourselves what our goal is here we have our number stored into Y and our goal was to calculate one divided by square root of Y as I've already said calculating this directly is too hard and too expensive but we've extracted the bits from Y and we've seen with the IEEE standard 754 that the bits of a number are in some sense its own logarithm that means in I we have stored log of Y up to some scaling and shifting I claim that their problem becomes way easier if we work with logs instead of trying so hard to calculate one divided by square root of Y we instead calculate log of 1 divided by square root of Y we rewrite this to log of Y to the power of minus a half so we can take out the exponent calculating this is stupidly easy you might think oh no we have a division in there didn't you say in the beginning that divisions are slow well yes but remember instead of dividing by two we just pitch shift once to the right this already explains why we do minus I bit shifted ones to the right but why is this number 0x5537590f here again that's two to the 23 let's go well because our logarithm is actually scaled and shifted so let's calculate and understand where it comes from not at all let gamma be our solution then we know that log of gamma equals to log of Y to the power minus a half which equals to minus a half times log of Y now we replace the logarithm with the bit representation and then we just solve for the bits of gamma I'll spare us the details but this is the result the magic number turns out to be the remnants of the error term mu the scaling factor and the shifting now we have the bits of a solution I always forget how easy that was uh like I'm over here thinking about like that that's so difficult but not that was it's dude it's just we can just reverse the steps from the evil bit hack to get back the actual solution from those bits well actually not the exact solution just an approximation this is why we need the third step who thinks of this well you got to remember that during this age a how fast was a processor so this was late 80s so the processor was 16 megahertz am I correct on that one uh how fast uh CPU 1988 to me oh no here we go okay we're up to eight do we have some 80s okay so let's just say it's 88. yeah we're sitting at like 10. oh I mean I guess you could have some 25s 10 is somewhere between 10 to 25. right is this for Quake three I thought this was uh prior to is this Quake threes oh okay it's 99 megahertz okay okay so I was a decade off my bad my fault okay so now a decade we're looking pretty good right we're looking we're looking like we're like hundreds of megahertz um do you know how many operations you have to perform for a division and this square root we're talking like I mean it's it's hundreds upon hundreds of machine instructions but then we do that for like thousands of things it's like you you you just you just couldn't right like there was not enough there's not enough units of power to actually do that you know what I mean so you have to you have to really have I don't understand the gentleman coding or anything yeah of course we code baby we code so much [ __ ] we also like to watch things why you gotta be like that white guy coming here with that with that small twitch energy okay okay right now your flops are small okay I want you to bring up your flops floating Point operations don't forget that after the previous step we have a pretty decent approximation but we did pick up some error terms here and there but thanks to Newton's method we can make a really good approximation out of a decent one classic native 500 years later method finds a root for a given function meaning and finds an X for which f of x equals zero it does so by taking an approximation and returning a better approximation you keep on neutralizing the process until you're close enough to the actual solution but it turns out that here we are already close enough to the actual solution that one iteration suffices to get an error within one percent welcome to Costco the only things new face method needs is the function and its derivative and what Newton's method does is that it takes an x value and tries to guess by how much it is off from being a root it does so by calculating f of x and its derivative yep then it does you can write f of x as Y and the derivative as d y over DX we have the ratio between Y and the X offset and Y itself so to get the X offset we just divide y by the ratio so then we simply subtract this offsets the informed viewer can now verify that the last line is one such Newton iteration applied you know thank you Zaskar welcome to Costco I love you Newton plus ratio dude um I would like to say that when you're watching uh math stuff and someone's like the informed viewer nothing makes you feel so stupid in your lifetime I guess we're a bunch of not informed viewers oh man Newton was not a sex Hatter tides to the function f of y equals one divided by y squared minus X notice that y being a root of this function Newton could most certainly not invert a binary tree okay Gauss could Gauss was really good he did that whole like German map min max flow how many times you could enter an exit German seven River City thing you know what I'm talking about action is equivalent to Y being the inverse square root of x I really encourage you to verify this last line of code since it's really surprising that even though both the function and Newton's method have a division in them the Code does not which means that our algorithm is and stays fast now we finally understand the fast inverse square root it only took us the knowledge of the IEEE standard 754 a trick to outsmart the C programming language magic bit operations and the calculus behind Newton's method to be real I only outdone these C programming language okay I'm like seg Falton but real talk like I mean just just sit there for a second and think about that like this guy could come up with that like the fact that we're watching a video about how it happened didn't mean that we're watching a video of it happening in real time this person had to just like look at the formula and be like wait a second weapon if I take a log takes the log he's like oh gosh classic 2 to the 23 mantissa trick black Bam Bam Bam look at that I just reduced it wait a second right like just kept on like just kept on going and then it was like boom no division no division no squirting let's go what I don't know this stuff I can't fathom how much time this would have taken me by myself to come up with if someone said there was a solution that was significantly faster and you need to use extreme math to understand it I think I could figure out maybe in one year if I devoted all my energy and Life Source to it I might did it it's crazy twitch is off today twitch has been just having a bad day recently regularly um is anybody here else worried that twitch is like that we don't know something is about to happen to Twitch like when you really think about it we have significantly more ads poor service recently it's something about to happen you know something about that happened this was not CarMax so Carmack did not do this Carmack let's see don't look at that oh don't look at that either that's Carmack did not write this they mentioned the person's name it's the it's the Forgotten developer right it's not Ramos and it's not uh Carmack I don't know who it is twitch is dying let's all go to Mastodon that's it we're going to Mastodon everybody everybody get out your distributed Federated systems we're going to Mastodon

Info

Channel: ThePrimeTime

Views: 151,261

Rating: undefined out of 5

Keywords: programming, computer, software, software engineer, software engineering, program, development, developing, developer, developers, web design, web developer, web development, programmer humor, humor, memes, software memes, engineer, engineering, Regex, regexs, regexes, netflix, vscode, vscode engineer, vscode plugins, Lenovo, customer service

Id: 0K4vzfwV-wo

Channel Id: undefined

Length: 30min 19sec (1819 seconds)

Published: Fri Feb 03 2023