Quake III’s Fast InvSqrt(): the 0x5F3759DF Algorithm

whoisslimshady
15 min read · Mar 8, 2021

In 2005, id Software open-sourced the engine behind its video game Quake III Arena. Digging through that source code, fans of the game discovered an algorithm so ingenious that it quickly became famous. The only thing this algorithm does

is calculate the inverse of a square root. If I had to write a piece of code to calculate the inverse of a square root, this is how

I would do it. Here I’m using the C programming language, the same language used for Quake 3. To be fair, I wouldn’t write the square root myself: the library designers have already figured out how to calculate square roots and provided an implementation in the math.h header, which we programmers can simply include in our programs. So what could possibly be so interesting about the Quake 3 algorithm?

How does its software calculate inverse square roots? At first glance,

the code doesn’t seem to make any sense. Where does this number 0x5f3759df come from? What does it have to do with taking square roots? And why is there a disgusting curse word in the second comment?
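For reference while reading the rest of this article, here is the function in question, essentially as it appears in the Quake III Arena source (one adjustment on my part: the original declares i as a long, which was 32 bits wide on the compilers of the day; I use int32_t below so the sketch behaves the same on today’s 64-bit systems):

```c
#include <stdint.h>

float Q_rsqrt(float number)
{
    int32_t i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * (int32_t *) &y;                   // evil floating point bit level hacking
    i  = 0x5f3759df - (i >> 1);              // what the f***?
    y  = * (float *) &i;
    y  = y * (threehalfs - (x2 * y * y));    // 1st iteration
//  y  = y * (threehalfs - (x2 * y * y));    // 2nd iteration, this can be removed

    return y;
}
```

Calling Q_rsqrt(4.0f) returns roughly 0.499, within 1% of the true 0.5.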

Prerequisites

C, number systems, Newton’s method, the IEEE 754 standard, and basic mathematical skills
I will show you how, with some cool bit manipulation, you can take inverse square roots, and I’ll explain the algorithm behind the name fast inverse square root. First of all, why would the game engine want to calculate 1/sqrt(x)? Say you want to implement physics, lighting, or reflections in your game engine.

It helps if the vectors you’re calculating with are normalized to have length 1, because otherwise your vectors might be too short or too long, and when you do physics with them, things can go wrong. As all of you know, the length of a vector is sqrt(x² + y² + z²).

If that looks familiar, it’s because you’ve seen it in two dimensions:

it’s just the Pythagorean theorem. So if we want to normalize a vector’s length to 1, we have to scale everything down by the length of the vector.

I mean, obviously: if we divide the length of the vector by the length of the vector, we get one.
So all that’s left for us to do is divide x, y, and z by the length:

x/sqrt(x² + y² + z²), y/sqrt(x² + y² + z²) and z/sqrt(x² + y² + z²)

Or, equivalently, multiply by one over the length:
x*(1/sqrt(x² + y² + z²))
y*(1/sqrt(x² + y² + z²))

z*(1/sqrt(x² + y² + z²))

You might already see where this is going. Calculating x² + y² + z² is easy; the question is how fast, in code, you can implement

x*(1/sqrt(x² + y² + z²))

All that takes is three multiplications and two additions, apart from the square root and the division. Additions and multiplications are common operations that processors have been designed to execute very fast. The square root, on the other hand, is a terribly slow operation, and division is not much better. This is not good if we have several thousand surfaces, each with a vector that needs to be normalized. But it also means that here we have an opportunity for speed improvements.

If we can find even just an approximation of 1/sqrt(x), as long as it’s fast, we can save precious time. The fast inverse square root is exactly such an approximation, with an error of at most 1% while being about three times as fast.

Looking at the code again, we can see that the beginning is pretty harmless.

We are given a number, called number, as input: the number we’re supposed to take the inverse square root of. First, with the variable i, we declare a 32-bit integer. Then we declare two 32-bit decimal numbers named x2 and y, and we store 1.5 into the variable with the obvious name threehalfs. The next two lines simply copy half of the input into x2 and the whole input into y.
But it’s after that where the magic happens. Take a moment to look at it again; the longer you look at it, the less sense it makes. The comments on the right are not helpful either, but they do hint that there are three steps to this algorithm. Puzzling together these three steps will show us the brilliance of this algorithm. But before we start with these three steps, let’s first take a look at binary numbers. We said that in the first line

we declare a 32-bit integer, in C called a long.

That means we’re given 32 bits to represent a number with, and I think you all know how to do that. But in the next line we declare two decimal numbers, in C called floats. Again we’re given 32 bits, and we have to represent a decimal number with them. How would you do that? If you and I were designing decimal numbers, this is probably

one way we would do it: just put a decimal point in the middle. In front of the decimal point we count in the usual way, 1, 2, 3, 4, and so on, and after the decimal point there are no surprises either. Just remind yourself that this is binary, so instead of tenths, hundredths, and thousandths we have halves, fourths, eighths, sixteenths, and any combination of them; a half plus a fourth gives you three fourths, also known as 0.75. But this idea is actually terrible: we’ve decimated the range of numbers we can represent. Before, we could represent numbers up to around 2 billion; now only up to about 32,000.
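That naive split can be sketched as 16.16 fixed point: 16 bits for the integer part, 16 for the fraction (an illustration of the idea, not anything from the Quake source):

```c
#include <stdint.h>

/* 16.16 fixed point: the low 16 bits hold halves, fourths,
 * eighths, ...; the high 16 bits hold the integer part. */
typedef int32_t fix16;

fix16 to_fix(double v)   { return (fix16)(v * 65536.0); }
double from_fix(fix16 f) { return f / 65536.0; }
```

The signed integer part tops out at 32,767, which is exactly the range problem described above.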

Luckily, people much smarter than us found a better way to make use of those 32 bits. They took inspiration from scientific notation:

the same way we can systematically represent numbers like 23,000 as 2.3 × 10⁴ and 0.0034 as 3.4 × 10⁻³,

we can also represent numbers in binary, where for example 11000 could be written as 1.1 × 2⁴.

The standard they came up with is called IEEE 754.

IEEE 754 defines the following:

We are, as usual, given 32 bits. The first bit is the sign bit: if it is 0 the number is positive, and if it is 1 the number is negative. But the numbers Quake 3 feeds into the fast inverse square root are always positive.

I mean, obviously they’re positive: if we had to calculate 1 divided by the square root of negative 5, something has definitely gone wrong. So for the rest of this article we ignore the sign bit, as it is always 0.

Then the next 8 bits define the exponent: 2¹, 2², 2³, 2⁴, and so on. With 8 bits we can represent numbers between 0 and 255, but that’s not exactly what we need: we also want negative exponents. This is why everything is actually shifted down by 127.

So a stored exponent E actually means 2^(E − 127). If we want the exponent to be 4, the bits need to be set to 131, because 131 − 127 = 4.

The last 23 bits are the mantissa. As usual in scientific notation, we want to denote one digit, followed by the point, followed by the decimal places. With 23 bits we can represent numbers from 0 up to, but not including, 2²³.

Again, that’s not exactly what we need. For scientific notation we need the mantissa to go from 1 to 10, or in binary scientific notation, from 1 to 2. We could do something we’ve already done before and put the point right after the first bit, which automatically gives us numbers from one to two, but this naive approach is wasteful.

The people who designed IEEE 754 realized that in binary, something happens that happens in no other base. Look at the first digit in scientific notation.

The first digit is by definition always non-zero, but in binary there is only one digit that is not zero: one. If we know that the first digit will always be a one, there is no need to store it. Thus we can save one bit by moving the point one digit to the left and fixing an extra one into the number it represents. Now our mantissa is between one and two: even though 23 bits gave us numbers between 0 and 2²³, we scaled them down to get numbers between 0 and 1, and then we added an extra 1 to get numbers between 1 and 2.
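We can verify this layout by pulling the fields out of a float’s bits (a sketch; memcpy is the well-defined way to read a float’s bits in modern C):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Decode a positive, normalized float into its IEEE 754 fields
 * and rebuild its value as (1 + M/2^23) * 2^(E - 127). */
double decode(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);         /* the raw 32 bits      */

    uint32_t E = (bits >> 23) & 0xFF;       /* 8 exponent bits      */
    uint32_t M = bits & 0x7FFFFF;           /* 23 mantissa bits     */

    double mantissa = 1.0 + M / 8388608.0;  /* implicit leading one */
    return ldexp(mantissa, (int)E - 127);   /* times 2^(E - 127)    */
}
```

For example, 6.0 is stored with E = 129 and a mantissa of 1.5, since 1.5 × 2² = 6.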

And this already is the main part of the IEEE 754 standard, or at least the so-called normalized numbers. The informed reader knows that the standard also includes denormalized numbers, NaNs, infinities, and two zeros, but we won’t go into those, because in Quake 3 it just happens that these are never inputs to our algorithm; otherwise something has definitely gone wrong, since at no point should our game engine have to normalize a vector of infinite length. For this algorithm and the rest of this article, it will be useful to think of the mantissa and exponent as the binary numbers they are. If we are given two numbers, M being the mantissa and E being the exponent (23 bits and 8 bits respectively),

we can write the bit representation as I = E · 2²³ + M.

This works because multiplying E by 2²³ just shifts E left by 23 digits. So that’s how one can write the bits, but we get the actual number behind the bits with the formula (1 + M/2²³) · 2^(E − 127).

This should seem familiar to you: here we have the exponent with 127 subtracted from it, and here we have the mantissa with the extra one in front. But now for something completely different. For no obvious reason at all, let’s take the logarithm of that expression; since we’re doing computer science, we take the logarithm base 2: log((1 + M/2²³) · 2^(E − 127)) = log(1 + M/2²³) + E − 127.

We simplify and take out as much as we can (the exponent comes out), but then we get stuck. Not so Quake developer Gary Tarolli:

he knew a trick to get rid of the logarithm.

The trick is an approximation to log(1 + x): for small values of x, log(1 + x) is approximately equal to x.

If you think about it, this approximation is actually exact at x = 0 and x = 1. But we’ll add a correction term μ: log(1 + x) ≈ x + μ. This correction term can be chosen freely; with μ equal to zero the approximation is again exact at zero and one, but it turns out that setting μ to 0.0430 gives the smallest error on average for x between zero and one. Going back to our formula, we apply our trick, since M/2²³ is indeed a value between 0 and 1.

We rearrange a little more and finally see why we did all those calculations: (M + E · 2²³) · (1/2²³) appears, and that’s our bit representation: log(y) ≈ (1/2²³) · (M + E · 2²³) + μ − 127.

So let’s think about what we just did. We applied the logarithm to our formula and got the bit representation, just scaled and shifted by some constants. So in some sense, the bit representation of a number is its own logarithm. Armed with this knowledge, we can finally start with the three steps of the fast inverse square root.
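You can check the claim that the bits are their own logarithm directly in code (a sketch, using the μ = 0.0430 correction from above):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Approximate log2(x) for positive x using only x's bit pattern:
 * log2(x) ~= I / 2^23 - 127 + mu, where I is the 32 bits of x
 * read as an integer. */
double log2_from_bits(float x)
{
    const double mu = 0.0430;
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    return i / 8388608.0 - 127.0 + mu;   /* 8388608 = 2^23 */
}
```

For x = 5, this gives about 2.293 against the true log₂(5) ≈ 2.322.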

First step

The first step is actually not complicated; it just looks complicated, because it’s memory-address trickery. We stored our number into y, and now we want to do cool bit-manipulation tricks with floats. Unfortunately, C doesn’t come with the tools we need to do bit manipulation on floats. The reason is that floats were never designed for it: they are inherently tied to the IEEE 754 standard.

Longs, on the other hand, were designed for bit manipulation. For example, here’s one trick: bit shifting a long to the left doubles it, and bit shifting it to the right halves it. Yes, if your number is odd you do end up rounding, but hey, we’re willing to accept such inaccuracies as long as the algorithm stays fast.

C, like pretty much all programming languages, does provide a way to convert from a float to a long. This conversion does what most programmers need it to do, namely convert a decimal number to an ordinary integer as best it can: if we give it a float like 3.33, it converts it to the integer 3. But this is not the conversion we need here. First of all, we don’t care about the resulting integer; we want to somehow keep our float. Secondly, the bits that lie behind our number get all messed up, and we don’t want this conversion to mess with our bits. The only thing we want to do is put the bits, one to one, into a long.

The way you achieve this is to convert the memory address, not the number. First we take the address of y; this is the address of a float. Then we convert that address from a float’s address to a long’s address.

The address itself doesn’t change, but C now thinks that the number living at that address is a long. So when we read what’s written at that address, C reads the number as if it were a long. Like this, we’ve tricked C by lifting the conversion away from the number itself to the address of that number, and that’s how we get the bits of a number into i. I don’t know what else to say; that’s just how C works.
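Here is that address trick in miniature, next to the modern way of doing the same thing. A caveat: the cast version formally violates C’s strict-aliasing rule; it worked on the compilers of the day, and memcpy produces exactly the same bits while staying well defined:

```c
#include <stdint.h>
#include <string.h>

/* Read a float's bits the Quake way: reinterpret the address. */
uint32_t bits_via_cast(float y)
{
    return * (uint32_t *) &y;   /* C now "thinks" an integer lives here */
}

/* The well-defined modern equivalent: copy the bytes. */
uint32_t bits_via_memcpy(float y)
{
    uint32_t i;
    memcpy(&i, &y, sizeof i);
    return i;
}
```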

Second step

So let’s go to the next step. The intuition behind the second step is the following. Remind yourself that bit shifting a number to the left doubles it and shifting it to the right halves it. But what would happen if we did something like this to an exponent? Doubling an exponent squares the number, and halving the exponent gives us the square root; negating it on top of that gives us 1 divided by the square root of x. That’s exactly what we need. So let’s remind ourselves what our goal is: we have our number stored in y, and our goal is to calculate 1/sqrt(y).

As I’ve already said, calculating this directly is too hard and too expensive. But we’ve extracted the bits from y, and we’ve seen with the IEEE 754 standard that the bits of a number are, in some sense, its own logarithm.

That means that in the variable i we have stored log(y), up to some scaling and shifting. I claim that our problem becomes way easier if we work with logs: instead of trying so hard to calculate 1/sqrt(y), we instead calculate log(1/sqrt(y)).

We rewrite this as log(y^(−1/2)),

so we can take out the exponent: −(1/2) · log(y). Calculating this is stupidly easy. You might think: oh no, we have a division in there, and didn’t you say in the beginning that divisions are slow? Well, yes, but remember that we can do bit shifts now. Instead of dividing by 2, we just bit shift once to the right. This already explains the − (i >> 1): minus i, bit shifted once to the right. But why is the number 0x5f3759df here again in the code? Well, because our logarithm is actually scaled and shifted.

So let’s calculate and understand where it comes from. Let Γ be our solution. Then we know that log(Γ) = log(y^(−1/2)), which equals −(1/2) · log(y).

Now we replace each logarithm with its bit representation, and then we just solve for the bits of Γ. I’ll spare us the details, but

this is the result: bits(Γ) = (3/2) · 2²³ · (127 − μ) − (1/2) · bits(y).

The magic number turns out to be the remnants of the error term μ, the scaling factor, and the shifting. Now we have the bits of the solution, and we can just reverse the steps of the evil bit hack to get the actual solution back from those bits. Well, actually not the exact solution, just an approximation; this is why we need the third step.

Third step — Newton’s method

After the previous step we have a pretty decent approximation, but we did pick up some error terms here and there. Thanks to Newton’s method, though, we can turn a decent approximation into a really good one.

Newton’s method is a technique for finding a root of a given function, meaning it finds an x for which f(x) equals zero. It does so by taking an approximation and returning a better approximation, and usually you repeat this process until you’re close enough to the actual solution. But it turns out that here we are already close enough that one iteration suffices to get an error within one percent.

The only things Newton’s method needs are the function and its derivative. What Newton’s method does is take an x value and try to guess by how much it is off from being a root. It does so by calculating f(x) and its derivative.

We can write f(x) as y and the derivative as dy/dx. We have the ratio between y and the x offset, and we have y itself, so to get the x offset we just divide y by the ratio. Then we simply subtract this offset to get our new x. The informed reader can now verify that the last line of the code is one such Newton iteration applied to the function f(y) = 1/y² − x. Notice that y being a root of this function is equivalent to y being the inverse square root of x.
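Here is that iteration on its own, for f(y) = 1/y² − x. Working out y − f(y)/f′(y) with f′(y) = −2/y³ gives y · (3/2 − (x/2) · y²), with no division left in it:

```c
/* One Newton iteration for f(y) = 1/(y*y) - x, whose positive
 * root is y = 1/sqrt(x). Note: no division anywhere. */
float newton_step(float x, float y)
{
    return y * (1.5f - (x * 0.5f) * y * y);
}
```

Starting from a rough guess, each step lands noticeably closer to the true inverse square root.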

I really encourage you to verify that last line of code, since it’s really surprising that even though both the function and Newton’s method have a division in them, the code does not, which means that our algorithm is, and stays, fast. Now we finally understand the fast inverse square root. It only took us knowledge of the IEEE 754 standard, a trick to outsmart the C programming language, magic bit operations, and the calculus behind Newton’s method.

Sources: YouTube and Wikipedia

As always, thanks for reading.


whoisslimshady

Just a boring guy who falls in love with machine learning