Recently, I played Returnal, and as someone that doesn't play AAA games frequently, I was shocked by how many high-resolution textures the developers were able to cram into it. The game unabashedly used several gigabytes
of video RAM in every area that I encountered, but even that felt lower than what should be necessary
to display everything that I was seeing. And it got me thinking, how were they able to
fit so many textures into the memory that they used, and, once they were there, how
were they able to bind and read from them? I've been working on a voxel engine named
Zepha for the last several years, and in doing so, I've discovered just how stringent GPUs
can be with how many textures they allow to be "bound", or readable, at once. Typically you'll get somewhere in the realm of 16 to 32 textures
in today's graphics cards, but your mileage may vary, like, a lot. This has been a recurring issue for me. I need to render large swaths of the terrain at once,
which requires all of the textures in the area to be loaded simultaneously. Up until now, I've been Texture Atlasing to accomplish this,
which is where multiple textures are inserted into one massive image. But that's not without problems of its own. For example, I wanted to add anisotropic filtering, but to do so, I'd have to add a lot of padding
around every texture in the texture atlas, and that wastes valuable texture space, and memory. And even then, a whole maximum size texture atlas in my engine
might only be enough to hold a couple dozen textures the size of Returnal's, so something was clearly up. So how did they do it? I was determined to figure out how modern
games were able to take advantage of so much VRAM. And that's exactly what I did. As a quick disclaimer, I don't know which of these
techniques Returnal specifically uses. However, I'd be pretty shocked if it didn't use all of them,
as they're quite prolific in AAA gaming today. I'm also not an expert, by any means. Information surrounding these topics is scattered
and incomplete, and I had a hell of a time even finding out what I know now. I'll have links to the resources I found in the description, but if I make any mistakes in my explanations
in this video, I'll include corrections in a pinned comment. Apologies in advance. That being said, I feel like topics like this are even more
important to cover, given how scarce the information is on them. A video like this could have saved me months
of wasted effort, had it existed. But it didn't, so that's what I'm here to fix. Sparse Bindless Texture Arrays are by far
the most important graphics optimization I learned about. However, these words don't refer to a single
technique, they refer to three. Each of which can be mixed and matched as necessary
to get the most out of texture memory. So let's break it down. First of all, we have Sparse Textures, which is where space for a texture
can be allocated in virtual memory, without having it be allocated in physical memory. What's the distinction? Physical memory is what your Graphics Card actually has
to work with, the number you'll see on the box, or in its name. For example, the GTX 1060 6GB has 6 Gigabytes
of physical memory. Virtual memory, on the other hand, is just a list of
addresses that may or may not point back to physical memory. These addresses are unconstrained by the physical
memory of the card, there can be terabytes of virtual memory accessible
at any time. So what's the catch? Well, it's that virtual memory doesn't actually store anything. It can only point back to physical memory,
which is where any texture data actually goes. Let's say we have a large texture, like really
large, but we only need to access certain parts of it at once. The GPU doesn't need to have all of the texture
in physical memory just to read a small part of it. We can actually cut it up into pieces, and only upload
the relevant parts at any given time, saving memory as we go. This is a sparse texture! When it's created, it immediately reserves
all of the virtual memory it needs to point to each chunk of the texture in physical memory,
but it doesn't actually allocate any physical memory to go with it. Then, once a part of the texture is uploaded to physical memory,
the virtual memory is updated to point to the physical memory that now exists. This process can be reversed, too. By removing part of the texture from physical
memory, the virtual memory pointing to it is reset, and that physical memory can be
repurposed for other textures. That's the gist of sparse textures: Immediately reserve space for all the parts
of the texture in the near-unlimited virtual memory, and then only upload to physical memory as needed. This technique is super powerful, and synergizes
really well with Texture Arrays. Texture Arrays are exactly what they sound like. Instead of having one texture, why not
just have a whole list of them? Modern GPUs can have as many as 2048 textures
in one texture array, and binding all of those textures only takes up one bind slot,
which is equivalent to using a single ordinary texture. There are a few stipulations to this. The main ones being that all of the textures
have to be the same size, shape, and format. However, most of the time developers will
only use one or two texture formats, and to solve the size issue, multiple texture arrays
of exponential sizes can be created, and each texture can be put into the smallest array that fits it. But creating texture arrays for really large
texture sizes can be prohibitively expensive. Creating a massive texture over two thousand
times uses a lot of memory, after all. This sucks, because a game might not need
a maximum length array for the textures that it uses. If the devs know how many textures of each size they'll
be using in advance, they can size their arrays around that. But if the game might need new textures on demand, or it's meant
to handle user-generated content in any way, that might not be possible. So, we combine the two techniques we've learned already. We create a texture array of sparse textures. This is, intuitively, called a Sparse Texture Array. The array part allows many textures to exist
and be accessed with only one bind slot, and the sparse part allows the textures to not
occupy any physical memory until needed. When a new texture is added to the array, one of the layers
of the array, which itself is a sparse texture, is committed to physical memory. This allows the GPU's memory to be used a
lot more efficiently, since instead of each Texture Array occupying memory for its entire length,
it only occupies it for the number of textures it's actually holding. However, once we start employing this technique,
we might find we're running out of bind slots anyways. Because each size of texture needs its own
array, 8 or more bind slots might be necessary. And it's quite possible that a game might
need more than one texture array's worth of textures of a specific size, requiring it
to have another array, which takes another bind slot. On top of that, sometimes bind slots need to be used for
other graphical effects, such as SSAO kernels or wind effects. And if all of the bind slots are exhausted,
then we're back to square one. No other textures can be bound, and the game
will be unable to reference all of the data it needs to render. That is, unless we use Bindless Textures. Bindless Textures allow a GPU to access a
texture without it being bound to a bind slot. This was by far the most shocking of the techniques I learned. It felt like a rug pull, why was I even using
bind slots if I could just… not? It turns out that over the last several decades
of computer gaming, GPUs have changed a lot. GPUs used to need explicitly bound textures
to know what to draw, but that's just not a limitation anymore. And thus, bindless textures were born. By getting the 64-bit handle of a texture
and passing it to the GPU as a uniform, any number of textures can be read at once, regardless
of the number of bind slots the GPU has. And that brings us to the full name: Sparse Bindless Texture Arrays. Arrays of textures up to hundreds of items long,
that don't require any bind slots to use on the GPU, which allows us to have hundreds of textures loaded at once. That's a far cry from our initial limitation
of 16 to 32 textures total. But that's not all we can do. I have a bit less to say about this topic, since
texture compression doesn't typically work for pixel art very well. That said, it's a valuable technique for games
that use higher resolution antialiased textures, so I figured I'd cover it. GPU Texture Compression basically works the
same as traditional image compression that you would see in something like a JPEG. The difference is that it uses algorithms optimized
for fast texture reads to avoid inflating render times. I won't go into the full detail about these algorithms, because
I don't really know how they work, and you don't have to either. OpenGL will do all the work for us, all we need to do
is tell it we want the texture to be compressed and it'll happen. There are three levels of texture compression
that are widely supported by modern graphics cards, each of which provides different compression
ratios in exchange for different levels of data loss. These techniques will lose texture precision
no matter which one you choose, they are lossy. But if the textures are detailed enough to
begin with, this may not be noticeable, and it may be worth the 4x or even 6x memory savings
that compressed textures can have. Before I learned how texture compression worked,
I assumed it was a lot more complicated, I thought I'd have to do something special on the
CPU and GPU side to make it work, but really, it's entirely opaque. Other than specifying that a texture should be compressed,
the process for writing and reading them is identical to normal textures. It just works. One important caveat is that, right now, GPUs supporting sparse textures
are not required to support sparse compressed textures. They might, but they also might not. This could cause issues when trying to implement
the techniques presented earlier in this video in conjunction with compressed textures. In that case though, I'd probably recommend
just using bindless textures without sparseness or arrays. So far, we've taken a look at different ways
to represent textures in memory, but I'd like to cap this video off with a quick detour to look
at how we can reference these textures more efficiently in our models. All models in video games are made up of a
collection of points, called vertices. Each point stores information about its position,
direction, texture, and more. And the combination of these points into triangles
is what makes the model actually have an appearance. Developers are free to change the layout of
this information to best suit their needs. As complex models can easily be composed of millions of vertices,
even a small reduction in size can improve the memory footprint a lot. With sparse bindless texture arrays, the naive
way to reference a texture would be through four floating point values, where the first
two values are the x and y position of the corner of the texture that the vertex is using,
and then the last two are the texture array and the layer within that array that the texture
is contained in. This index alone uses 16 bytes, and depending
on the complexity of the model, multiple textures may need to be referenced for a single vertex. We can have huge savings by using bit shifting
to store this information more compactly. I've done this in my engine, and I think I've found a format
that balances precision and flexibility really well. I reserve 21 bits for the x and y positions
of the texture, stored as integers. I then use 11 bits for the array layer, which
conveniently ranges from 0 to 2048, which is the minimum number of texture layers
required to be supported by OpenGL 4.0. Then, the remaining 11 bits are used to specify
which texture to read from, which allows up to 2048 textures to be referenced at any time. That's way more than 32. All of this data fits into 64 bits, or 8 bytes. Using this layout instead gains 2x memory
savings over the floating point system. There may be some noticeable precision loss
if a game uses exceedingly large textures. In which case, it might be worth it to sacrifice
the number of textures or texture layers in exchange for more precise position information. But that's up to the specific game to decide. And that's all I have for today! Hopefully this information was able to help some people,
or at least serves as interesting food for thought for the curious. One last disclaimer I'll give is that all of the features I've talked about today,
aside from Vertex Optimization, are extensions to the OpenGL spec. They are widely supported on modern desktop graphics
cards, but they may be missing on mobile or consoles. If a game is meant to target both desktop and those platforms, it may be worth it to use
these features if they are supported, and fall back to basic textures if not. This topic was really interesting to learn
about, despite wanting to kick myself every time I learned a new piece of information that could
have saved me days or months of effort. This is my first attempt at making educational
video content, so I'd really appreciate hearing what you liked and disliked about this presentation,
and what I could do better next time. I'll try to reply to every comment on this
video, so do leave me a message, and I'll get back to you! I should probably also give the obligatory
interaction reminder. As of this video's creation, I have a grand
total of… 19 subscribers. So I would really appreciate a like or subscribe
so that YouTube has a chance of showing it to others. I'm gearing up to the release of my game sometime
next year, so I'll be posting a lot more game dev content in the coming months. If you're interested in that, stay tuned,
or join my Discord and chat with me. I also have a Twitch where I stream every Saturday,
and a Tumblr where I post various development updates. Links are down below. Other than that, I want to express my deepest
thanks to the people that have inexplicably chosen to give me their support on Patreon and Twitch. It means a lot to me that people are interested
enough in what I do to help me do it. Thanks to… Butter, Luna, Maeve, Silent_rkgk, Therem, and Zythia. Thank you so much. I appreciate it more than you know. Anyhow, thanks for sticking around till the
end, see you next time!
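
For anyone who wants to see the vertex format from earlier written out, here's a minimal sketch of what that 64-bit packing could look like in C++. The bit counts (21 + 21 + 11 + 11) come straight from the video, but the order of the fields, the `TexRef` struct, and the `packTexRef` / `unpackTexRef` names are all assumptions made up for this example, not the actual layout used in Zepha.

```cpp
#include <cassert>
#include <cstdint>

// Assumed field order (lowest bits first), using the bit counts from the video:
// [ x: 21 bits | y: 21 bits | layer: 11 bits | array: 11 bits ] = 64 bits total.
constexpr unsigned kPosBits = 21;
constexpr unsigned kIndexBits = 11;
constexpr std::uint64_t kPosMask = (1ull << kPosBits) - 1;      // 0x1FFFFF
constexpr std::uint64_t kIndexMask = (1ull << kIndexBits) - 1;  // 0x7FF

// Hypothetical unpacked form of a texture reference.
struct TexRef {
    std::uint32_t x, y;   // integer texture position, 0..2097151 each
    std::uint32_t layer;  // layer within the texture array, 0..2047
    std::uint32_t array;  // which texture array to read from, 0..2047
};

// Shift each field into its own bit range and OR them together.
inline std::uint64_t packTexRef(const TexRef& r) {
    assert(r.x <= kPosMask && r.y <= kPosMask);
    assert(r.layer <= kIndexMask && r.array <= kIndexMask);
    return static_cast<std::uint64_t>(r.x)
         | (static_cast<std::uint64_t>(r.y) << kPosBits)
         | (static_cast<std::uint64_t>(r.layer) << (2 * kPosBits))
         | (static_cast<std::uint64_t>(r.array) << (2 * kPosBits + kIndexBits));
}

// Reverse the process: shift each field back down and mask off the rest.
inline TexRef unpackTexRef(std::uint64_t p) {
    return TexRef{
        static_cast<std::uint32_t>(p & kPosMask),
        static_cast<std::uint32_t>((p >> kPosBits) & kPosMask),
        static_cast<std::uint32_t>((p >> (2 * kPosBits)) & kIndexMask),
        static_cast<std::uint32_t>((p >> (2 * kPosBits + kIndexBits)) & kIndexMask),
    };
}
```

A shader would do the same shifts and masks on its side to recover the four values. If a game needs more position precision, the two constants at the top are the only things that would have to change, as long as they still add up to 64 bits.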