Growing up, Quake was one of those games that just
blew me away with what a computer was capable of. It helped inspire me to become a developer in my
own right. One of the first books I ever got on coding was the Graphics Programming Black Book
by Michael Abrash, although I confess that most of it was over my head at the time. Released in
1997, this book isn’t as relevant as it once was, but it holds stories and insider info about the
development of one of the most groundbreaking games ever made. Today, I’d like to dive into one
of those stories, but with fresh eyes after more than 20 years of additional experience.
So visibility determination isn’t easy, even today, as I showed with my recent deep dive
into current methods. It remains very much an area of active research and development.
Rewinding the clock back to the mid 90’s, and we go back to a time when this field was very
much in its infancy, comparatively speaking.
The Quake dev team faced this very
problem. The game was unplayably slow, and since they were breaking new ground,
there was no clear solution to turn to.
When we talk about visibility determination, for
the purposes of the video, we’re talking about what goes into drawing a pixel on the screen.
So you’re down here, this is the camera, and you’re looking off in this direction, so
what we need to know is what’s visible to you.
In modern games, this mostly boils down culling
out what’s off screen, which is called frustum culling, then doing some occlusion culling
to figure out what objects are behind other objects, so we can cull those as well.
But back in the 90’s, this was being done in software on a desktop computer that would choke
on just playing a song today, something we all take for granted as background work.
Quake originally required a pentium 75, which had a processing speed of approximately
126 MIPS, or million instructions per second, according to google. Take note that
this is instructions per second, not floating point operations per second, which
is what you might see on more modern hardware.
By comparison, the phone in my pocket is capable
of somewhere in the realm of 1142 GFLOPS, or 1.1 Million mflops, if I haven’t messed up a decimal
somewhere, so the gulf in computing power, even though this is a silly comparison, simultaneously,
it's’ good for younger viewers to have a grasp of just how much of a potato that thing was
compared to anything you have now. Not to take anything away from what it was, in its day.
The cost alone of simply rasterizing a triangle was high, so you needed to be fine grained, and
only draw the polygons that were visible.
There was also no gpu in any modern sense, as in
a gpu that you could offload a significant amount of the work to. No depth buffer, no shaders,
not even hardware T&L, like you had nothing.
So you’re working with a few more constraints than
today when doing your visibility determination, you don’t want to simply discard
the objects that aren’t visible, that’s still way too much. You want to get even
more granular than that. In an ideal world, you’d somehow know just the polygons that are visible,
that contribute actual pixels on the screen.
In an ideal world.
And of course, that magical test, whatever it is, has to be fast.
Quake stored its levels in a BSP, or binary space partition, which is a data
structure you don’t run into much these days.
Conceptually, they’re not especially difficult to
grasp. Let’s take a simple 2d example, and we’ll build a little tree to illustrate what they do.
So right now, the space, the level, whatever you wanna call it, it’s empty. We can add a wall,
let’s call that wall A, and the wall has an orientation, like this way is forward, and
obviously the other way is backwards.
The wall creates a splitting plane, so
that forward, or in front of the wall, we’ll call it empty space, while behind the
wall, this is solid. This is an important thing to take note of, because it’ll be
very relevant throughout using the BSP.
So we’ve defined the wall, let’s define a
node, in this case the root of the BSP tree, we’ll call it A. This node has 2 children,
the left node represents what’s in front of A, which is empty space, while the right
is behind A, which we’re calling solid, this represents the region behind the wall.
One wall isn’t terribly interesting, so let’s add a 2nd wall now, and for simplicity
I’m just going to make it down here, like this. So this is wall B, it faces in this
direction. We have to insert this wall into the BSP, so starting at A, we have to check,
is this wall B in front of, or behind A?
We know it’s in front of A, so we’re going
to go to A’s child, the empty space node, and replace it with a B node. And this
new node of course has a left child of empty space, and a right child of solid.
The left child of B represents this region here, notice how it's the region in front of A that's
now being split by B. The left child of B represents the region in front of B, while the
right child represents the region behind B.
Let’s repeat the process now, but
with a horizontal wall C on top, facing in this direction. So C is in
front of A, and it’s in front of B, so it get’s inserted here, replacing B’s left
child, and so you have this new node C, with a left empty child, and a right solid child.
And I’ll leave it to you to do this on paper, but if I were to add this new wall
D, which faces in this direction, then it’d end up here as a child of C.
So we end up with this kinda boring BSP tree, that’s basically just a line of nodes descending
to the left. But the really cool part of BSP’s is they enable you to do fast operations.
Let’s bring in a slightly more complex BSP, just to make the test’s a bit more
interesting. This one is basically the same, but I’ve inserted a few more walls.
If you’re following along on paper, take special note that I just inserted them
in alphabetical order, and notice how E is in front of D, but when I go to insert
F, it’s behind E, thus on the right.
Now, let’s test whether a point is in empty
space, or in solid space. This point here, very obviously by looking at it, is in empty
space, but we can confirm that with like, actual code trivially. Starting at the root node,
is this point in front or behind the splitting plane? In front, obviously, so we move to the next
node, and it’s in front of that, so again, we move on. And we repeat that process, over and over
again, until inevitably we end up in a leaf node, either a solid node, or an empty node. In this
case, an empty node, signifying that this point is in empty, or free space. Or in other words, if
that’s a player, they’re allowed to be there.
So BSP’s can accelerate these kind of
tests, they allow you to rapidly discard large sets of the world. But another interesting
property is that each of these nodes here. Each of these represent a portion of the space.
They, in fact, split the space of their parent. So starting at the root node, this is all space.
This node’s left child is one half of the space, while the right child is the other half.
When we look at this node, B, it partitions this space here, meaning that it’s left child
is this space in front of it’s splitting plane, while it’s right space is this space behind.
And we can continue in that way, looking at C, this node is splitting this highlighted
region here along the plane that we provided, meaning that the left child of C represents
this space over here, while the right child of C represents this space over here.
And likewise, we can continue on and on, and these nodes carve up the space
into these ever finer convex subspaces, and we end up with these 3 convex subspaces that
i’ve coloured, which correspond to these empty leaves in the tree. Which is kinda neat.
That’s one of the big takeaways, bsp’s carve up space, and you can of course calculate
bounding volumes for these subspaces if need be, and having bounds on these nodes allows you to
rapidly discard entire portions of the world, making bsp’s useful not only for rendering but
for other gameplay systems like physics.
Lastly, BSP trees give you a natural
ordering when you draw. They can be used to easily give you surfaces in front
to back or back to front order.
If we take a really simple bsp tree, we’re
just going to have a couple lines here, so you could follow along on paper if you wanted
to. And we’re going to place our eye here. Now the interesting thing with the bsp is, the direction
doesn’t matter, only the position of the eye.
So to draw, the algorithm is pretty
straightforward, you walk the tree and at each node, decide to either draw the left
or right child first. So you start at the root, if you’re in front of the splitting plane, draw
the back of the tree first, then this node, then the front. If you’re behind the
splitting plane, do the opposite. Then of course you recurse and do the same, so you
you go through nodes B, C, and D eventually.
So that gives you immediately the order B, because
that’s behind, then A, because that’s the node in question, then you have to draw this subtree,
so we recurse, we’re in front of C, so we draw behind C, which is nothing, then we draw C, then
we recurse, which just ends up drawing D last
So we get easy back-to-front order by
simply walking the tree, easy peasy.
So they give you quite a lot of advantages,
but this isn’t quite enough to draw effectively because of one major problem, overdraw.
Overdraw is a problem that’s still being tackled on modern hardware 30 years later,
and in Quake this was really serious. This was being done in software, and although they
did implement a z-buffer, it had a performance impact and was limited to dynamic stuff.
If you draw front to back, in some terrible cases, you may end up drawing and redrawing
the same pixels over and over again. You could easily imagine cases where
overdraw could reach 10x or more.
So this was the big problem to be tackled, how
do you reduce, or even eliminate overdraw?
This part in the story was one of the most
interesting to me, the sheer number of approaches Carmack took, attempting to solve this.
He attempted to use a beam-tree to track visible fragments, a subdividing raycast approach, almost
like binary search raycast, representing the world with planes rather than explicit vertices,
a 1 bit draw buffer tracking whether or not a pixel had been draw yet, rasterizing the
polygons into spans and clipping using those, and an approach that involves tracking the
portals between leaves, and clipping further away geo and portals to these portals.
I won’t dive into all of them, but let’s poke at the first couple of them.
The first thing mentioned, the beam tree is really interesting, because finding
information online was difficult, references are pretty sparse or result in dead links.
There’s some scattered references to some papers back in the mid 80’s that you can read,
which gave me a bit of context, and there’s a very brief explanation in the book as well.
From what I understand, they’re in essence a special type of bsp tree, but instead of being
built directly from the polygons of the world, you do something very different.
If you’re looking out at the world, your camera would be down here, and the viewing
volume that you see is formed by 6 planes. You’ve got your planes on the left and right,
you’ve got your planes on the top and bottom, and you’ve got your planes on the near and far.
Now let’s consider a single triangle out here in the world, and we have the camera looking at
it. So, if you take an edge of that triangle to begin with, and you form a new triangle
using those vertices on the edge, and the camera’s position. From this you can form a plane
that passes through the camera and the edges of the triangle, similar to the viewing volume.
You can then repeat the process for each edge of the triangle, building another plane that passes
through those 2 vertices and the origin.
So this is called a beam, this kinda volume
that’s projected out from the origin to the triangle, and you’d insert this into
a bsp, not the whole thing, only the planes that you calculated, the side planes.
So let’s visualize this, so this is our screen, and you’d start the beam tree off by inserting
4 planes representing the bounds of the view frustum, the visible area or the edges of the
screen. And you’d end up with a pretty basic tree, kinda similar to what we saw earlier.
We’ll colour the “drawable” area green, so anything in the drawable area is good to go.
So the basic idea is to use this secondary BSP to check what’s been drawn on the screen. So let’s
say I want to draw this triangle here, you can tell from a glance that this whole triangle falls
into the “drawable” area in the beam tree.
What would actually happen is that you’d insert
this triangle into the beam tree, by inserting each of the planes of the triangle, which in
2d basically look like lines. So one by one, those would get inserted into the tree, and
you’d end up with a tree more like the one on the right now. These open leaves of the beam tree
correspond to regions that are still drawable, which you can go test on paper if you like, but
I’ll leave that to you and we’ll move on.
Let’s say now, I want to draw this second
triangle, we know by just glancing at this that if we were to walk the BSP tree, we’d end up in
this region here in front of F, which corresponds to this leaf on the bsp tree itself.
As you prepare to draw this triangle, you walk the beam tree and if it ends up in a
visible area, you’d update the tree and draw the triangle. But if it doesn’t, there’s no tree
changes and thus the polygon isn’t visible.
That’s my understanding of it anyway. In
theory, a great approach, but as Abrash notes, the worst case was still terrible, and
maintaining the beam tree wasn’t free either, so in the end they had to scrap the approach.
Another interesting approach was to use raycasts, since the bsp is great at accelerating those.
So by this point, you should be reasonably familiar with the idea of walking the tree to
figure out whether a point is visible or not, it just involves descending and
doing the plane tests until one reaches either an empty or solid leaf.
So a raycast isn’t really all that different, except you’re of course dealing with a line. If
you were to say, overlay the line over the bsp, you can clearly see where it would
get cut by the splitting planes and chopped up. So you’d end up with a bunch of
fragments of the original ray, each fragment lying in either a solid or empty node.
Of course, you don’t need to chop it by every plane though, that’s just wasteful. You
can be smart and do the closest section first, you can see this section, if you walk it
down the tree, it ends up in an empty node, and thus it never hits anything.
So we go ahead and process the other side, and this part of the ray of course gets
chopped by this plane here, so of course what you’ll do is process that part first, so
we sorta can clearly see that it’ll end up in an empty node, and thus hits nothing.
This next part, behind the wall, well, we KNOW that it hits something, like us the
viewers, but we need to prove it in code. Well, the next node down the tree is this node here,
with this splitting plane, so the ray gets lopped into 2 parts, and we’ll take this part
here and walk it down the tree. And of course, it ends up in a solid node, so we’re done, we
know for a fact this intersects something, we know where it intersects, we know everything.
So if we take a look at an example screen, the subdividing raycast idea, at least in my head
given the description, kinda works like this. You divide the screen into some number of areas, they
mention use an 8x8 grid, so 64 in total, and for each you cast a ray, using the exact same raycast
approach to clip the ray against the BSP.
Let’s take a look from the side, so here’s
a side view, here’s the camera here, and we’ll have some walls here to fill out the
scene a little bit. So what you’d do is you’d fire out a bunch of rays at the world, which
of course would get clipped against the BSP as I showed before. Let’s assume that every ray hits
something, so each of these rays is being extended until they touch a surface. The question would
then be, what did they hit. We’ll colour them with the colour of the surface they touched.
Now, let’s focus on these 2 rays. If 2 rays are adjacent, and hit the same polygon,
then you can simply interpolate the surface in between, easy enough.
That’s a best case scenario, so now if we were to focus on another pair when they don’t hit
the same surface. So these neighbouring raycasts hit different surfaces, so then you in essence
have to do a binary search by firing a ray in between and repeating the process.
Then it kinda follows that eventually, after repeating this process a buncha
times, that either all the adjacent rays will be on the same surface, or they’ll be
at pixel level, at which point you bail.
I could easily imagine this working
great in one of those old Quake levels, where polys took up a lot of real estate on
screen, and chances were pretty good that just a few iterations of this subdivided raycast
approach could cover most of the screen.
From the sounds of it, this
ultimately failed due to dropouts, basically problems with tiny polygons.
There were other approaches, as Abrash mentions, some were merely kinda tossed around,
while others saw actual implementations. I’m guessing based on the descriptions that
the draw buffer and span buffer approaches, since they seem like iterations on existing
techniques, were tried out, as well as the portal clipping, but the vertex free surfaces idea, maybe
something they just punted around, who knows.
Here’s the point in the story where we have the
miraculous turnaround. If this was an episode of House, you’d have him look at a bus going
by, or a leaf falling, or something random, and suddenly have an epiphany.
According to Abrash, we’ll just read his words here: When I came in on
Monday, John had the look of a man who had broken through to the other side--and also
the look of a man who hadn’t had much sleep.
And then he goes on to describe how
Carmack figured it out. The solution being precalculating the visibility between
leaves in the bsp. This would be called a PVS, or potentially visible set, which now, nearly
30 years later, how it works is widely known, but it was a real game changer in its era.
The general idea is that, let’s say that we’ve got our level here, we’ll just do a 2d version
instead of a 3d version, and let’s just quickly understand what we’re looking at here.
So each of these regions here, let’s say these regions are leaves in a BSP. We’re taking
some liberties here, but let’s run with it.
Then each of these spaces is separated by these
green lines, these would have been splitting planes, and you can figure out that where there’s
no polygons, those form a sort of border between adjacent spaces that are called portals.
So given this shaded region here, of course any region that it’s immediately adjacent to
it, is, of course, visible right away. You can figure that out without a lot of effort using
the BSP since they basically share a portal. The trick is really whether or not these further away
regions are visible from that initial region.
I mean, you could do something like cast a
zillion rays from one portal to the next, and see which other leaves they manage to
hit, but that’ll be really unreliable.
Quake did this by erring on the side of being
conservative. Secondary neighbours are usually visible, not always, but pretty much most
of the time. So we’ll consider a neighbour of a neighbour to be visible right away.
So the effort was mostly on leaves further out than this. The question now is whether
section 4 here is visible from section 1.
The approach they employed was by using a
separating plane, so they’d create a plane from each edge of the portal, to the opposite
side of the other portal, here we have our separator plane, and the portal lies completely
in front of it, so it’s not eliminated yet.
We’ll do the separator plane for
the other side, and lo and behold, the portal lies completely behind the plane,
getting culled out. In a small way, this is just like the beam tree we just talked about.
If we were to say, expand section 2 so that the portals end up a bit fatter, you can see
that now the separating plane just catches the end of the portal from section 3 to 4,
making section 4 visible from section 1.
That's the general idea, and it extends
pretty easily into 3d from there. In 2d, of course we only needed the 2 separating
planes, but in 3d you would do this for each edge. As I said, it’s a conservative
approach, but works incredibly well.
BSP’s have since faded from their glory days,
you don’t really hear much about them anymore. Levels now are dynamic, requiring a degree of
flexibility that’s necessitated new approaches.
The original Quake source code was released
over 24 years ago at this point, and that, along with subsequent sequels, have been a source
of some incredible gems. Perhaps, most infamously, the fast inverse square root function, which
is this abomination you see before you.
This was a story that for some reason or other,
has stuck with me for over 20 years. I read this at a time when I was still struggling with basic
concepts, and it shaped, at least in a small way, how I approach things today.
I find that it’s really easy to be enamoured with a clever solution, I mean we
all appreciate an elegant answer to a problem, but it’s also in the failures, the missteps
and dead ends along the way, that we can find some really valuable insights.
The final solution, at least to me, evolves naturally from the failed attempts
that preceded it. Understanding the beam tree, the portal clipping ideas, a lot of
the basic ingredients were there.
I remember, I had just barely started learning
to program and struggled hard with the sparse information Abrash provided. Looking back
now, I actually appreciate how it pushed me to work harder, to really strive to
understand the underlying concepts.
If you think you’d like to learn more about
gamedev from an experienced graphics engineer, I have a variety of courses available, very
suitable for beginners, so if, for example, you ever wondered if understanding quaternions
was possible, check them out. Otherwise, if you’d still like to support me and help
choose what topic I cover next, you can vote on my Patreon page.
Cheers