How do Games Load SO MANY Textures? | Sparse Bindless Texture Arrays

Captions
Recently, I played Returnal, and as someone that doesn't play AAA games frequently, I was shocked by how many high-resolution textures the developers were able to cram into it. The game unabashedly used several gigabytes of video RAM in every area that I encountered, but even that felt lower than what should be necessary to display everything that I was seeing. And it got me thinking, how were they able to fit so many textures into the memory that they used, and, once they were there, how were they able to bind and read from them? I've been working on a voxel engine named Zepha for the last several years, and in doing so, I've discovered just how stringent GPUs can be with how many textures they allow to be "bound", or readable, at once. Typically you'll get somewhere in the realm of 16 to 32 textures in today's graphics cards, but your mileage may vary, like, a lot. This has been a recurring issue for me. I need to render large swaths of the terrain at once, which requires all of the textures in the area to be loaded simultaneously. Up until now, I've been Texture Atlasing to accomplish this, which is where multiple textures are inserted into one massive image. But that's not without problems of its own. For example, I wanted to add anisotropic filtering, but to do so, I'd have to add a lot of padding around every texture in the texture atlas, and that wastes valuable texture space, and memory. And even then, a whole maximum size texture atlas in my engine might only be enough to hold a couple dozen textures the size of Returnal's, so something wasn't adding up. So how did they do it? I was determined to figure out how modern games were able to take advantage of so much VRAM. And that's exactly what I did. As a quick disclaimer, I don't know which of these techniques Returnal specifically uses. However, I'd be pretty shocked if it didn't use all of them, as they're quite prolific in AAA gaming today. I'm also not an expert, by any means. 
Information surrounding these topics is scattered and incomplete, and I had a hell of a time even finding out what I know now. I'll have links to the resources I found in the description, but if I make any mistakes in my explanations in this video, I'll include corrections in a pinned comment. Apologies in advance. That being said, I feel like topics like this are even more important to cover, given how scarce the information is on them. A video like this could have saved me months of wasted effort, had it existed. But it didn't, so that's what I'm here to fix. Sparse Bindless Texture Arrays are by far the most important graphics optimization I learned about. However, these words don't refer to a single technique, they refer to three, each of which can be mixed and matched as necessary to get the most out of texture memory. So let's break it down. First of all, we have Sparse Textures, which is where space for a texture can be allocated in virtual memory, without having it be allocated in physical memory. What's the distinction? Physical memory is what your Graphics Card actually has to work with, the number you'll see on the box, or in its name. For example, the GTX 1060 6GB has 6 Gigabytes of physical memory. Virtual memory, on the other hand, is just a list of addresses that may or may not point back to physical memory. These addresses are unconstrained by the physical memory of the card; there can be terabytes of virtual memory accessible at any time. So what's the catch? Well, it's that virtual memory doesn't actually store anything. It can only point back to physical memory, which is where any texture data actually goes. Let's say we have a large texture, like really large, but we only need to access certain parts of it at once. The GPU doesn't need to have all of the texture in physical memory just to read a small part of it. We can actually cut it up into pieces, and only upload the relevant parts at any given time, saving memory as we go. 
This is a sparse texture! When it's created, it immediately reserves all of the virtual memory it needs to point to each chunk of the texture in physical memory, but it doesn't actually allocate any physical memory to go with it. Then, once a part of the texture is uploaded to physical memory, the virtual memory is updated to point to the physical memory that now exists. This process can be reversed, too. By removing part of the texture from physical memory, the virtual memory pointing to it is reset, and that physical memory can be repurposed for other textures. That's the gist of sparse textures: Immediately reserve space for all the parts of the texture in the near-unlimited virtual memory, and then only upload to physical memory as needed. This technique is super powerful, and synergizes really well with Texture Arrays. Texture Arrays are exactly what they sound like. Instead of having one texture, why not just have a whole list of them? Modern GPUs can have as many as 2048 textures in one texture array, and binding all of those textures only takes up one bind slot, which is equivalent to using a single ordinary texture. There are a few stipulations to this. The main ones being that all of the textures have to be the same size, shape, and format. However, most of the time developers will only use one or two texture formats, and to solve the size issue, multiple texture arrays of exponential sizes can be created, and each texture can be put into the smallest array that fits it. But creating texture arrays for really large texture sizes can be prohibitively expensive. Creating a massive texture over two thousand times uses a lot of memory, after all. This sucks, because a game might not need a maximum length array for the textures that it uses. If the devs know how many textures of each size they'll be using in advance, they can size their arrays around that. 
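To put rough numbers on the two points above (picking the smallest array that fits a texture, and how expensive a fully backed array gets), here is a minimal C++ sketch. The function names, the choice of square power-of-two arrays, and the 4 bytes per texel are all assumptions made for illustration; they are not taken from Zepha or any particular engine, and mipmaps are ignored.

```cpp
#include <cstdint>

// Hypothetical helper: side length of the smallest square, power-of-two
// texture array that can hold a width x height texture.
// Assumes dimensions far below 2^31, as real texture sizes are.
constexpr std::uint32_t bucket_side(std::uint32_t width, std::uint32_t height) {
    std::uint32_t needed = width > height ? width : height;
    std::uint32_t side = 1;
    while (side < needed) side <<= 1;  // round up to the next power of two
    return side;
}

// Bytes needed to back every layer of a non-sparse array
// (single mip level, uncompressed).
constexpr std::uint64_t full_array_bytes(std::uint32_t side,
                                         std::uint32_t layers,
                                         std::uint32_t bytes_per_texel) {
    return std::uint64_t{side} * side * layers * bytes_per_texel;
}

// A 300x180 texture lands in the 512x512 array.
static_assert(bucket_side(300, 180) == 512, "smallest fitting bucket");

// A 4096x4096 array with 2048 layers at 4 bytes per texel needs
// 128 GiB if every layer is backed by physical memory.
static_assert(full_array_bytes(4096, 2048, 4) == 128ull * 1024 * 1024 * 1024,
              "full-length array cost");
```

The second check is the reason full-length arrays of large textures are impractical on their own: the cost scales with the number of layers reserved, not the number actually used.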
But if the game might need new textures on demand, or it's meant to handle user-generated content in any way, that might not be possible. So, we combine the two techniques we've learned already. We create a texture array of sparse textures. This is, intuitively, called a Sparse Texture Array. The array part allows many textures to exist and be accessed with only one bind slot, and the sparse part allows the textures to not occupy any physical memory until needed. When a new texture is added to the array, one of the layers of the array, which itself is a sparse texture, is committed to physical memory. This allows the GPU's memory to be used a lot more efficiently, since instead of each Texture Array occupying memory for its entire length, it only occupies it for the number of textures it's actually holding. However, once we start employing this technique, we might find we're running out of bind slots anyways. Because each size of texture needs its own array, 8 or more bind slots might be necessary. And it's quite possible that a game might need more than one texture array's worth of textures of a specific size, requiring it to have another array, which takes another bind slot. On top of that, sometimes bind slots need to be used for other graphical effects, such as SSAO kernels or wind effects. And if all of the bind slots are exhausted, then we're back to square one. No other textures can be bound, and the game will be unable to reference all of the data it needs to render. That is, unless we use Bindless Textures. Bindless Textures allow a GPU to access a texture without it being bound to a bind slot. This was by far the most shocking of the techniques I learned. It felt like a rug pull, why was I even using bind slots if I could just… not? It turns out that over the last several decades of computer gaming, GPUs have changed a lot. GPUs used to need explicitly bound textures to know what to draw, but that's just not a limitation anymore. 
And thus, bindless textures were born. By getting the 64-bit handle of a texture and passing it to the GPU as a uniform, any number of textures can be read at once, regardless of the number of bind slots the GPU has. And that brings us to the full name: Sparse Bindless Texture Arrays. Arrays of textures up to hundreds of items long that don't require any bind slots to use on the GPU, which allows us to have hundreds of textures loaded at once. That's a far cry from our initial limitation of 16 to 32 textures total. But that's not all we can do. I have a bit less to say about the next topic, texture compression, since it doesn't typically work for pixel art very well. That said, it's a valuable technique for games that use higher resolution antialiased textures, so I figured I'd cover it. GPU Texture Compression basically works the same as traditional image compression that you would see in something like a JPEG. The difference is that it uses algorithms optimized for fast texture reads to avoid inflating render times. I won't go into the full detail about these algorithms, because I don't really know how they work, and you don't have to either. OpenGL will do all the work for us; all we need to do is tell it we want the texture to be compressed and it'll happen. There are three levels of texture compression that are widely supported by modern graphics cards, each of which provides different compression ratios in exchange for different levels of data loss. These techniques will lose texture precision no matter which one you choose; they are lossy. But if the textures are detailed enough to begin with, this may not be noticeable, and it may be worth the 4x or even 6x memory savings that compressed textures can have. Before I learned how texture compression worked, I assumed it was a lot more complicated. I thought I'd have to do something special on the CPU and GPU side to make it work, but really, it's entirely opaque. 
Other than specifying that a texture should be compressed, the process for reading and writing them is identical to normal textures. It just works. One important caveat is that, right now, GPUs supporting sparse textures are not required to support sparse compressed textures. They might, but they also might not. This could cause issues when trying to implement the techniques presented earlier in this video in conjunction with compressed textures. In that case though, I'd probably recommend just using bindless textures without sparseness or arrays. So far, we've taken a look at different ways to represent textures in memory, but I'd like to cap this video off with a quick detour to look at how we can reference these textures more efficiently in our models. All models in video games are made up of a collection of points, called vertices. Each point stores information about its position, direction, texture, and more. And the combination of these points into triangles is what makes the model actually have an appearance. Developers are free to change the layout of this information to best suit their needs. As complex models can easily be composed of millions of vertices, even a small reduction in size can improve the memory footprint a lot. With sparse bindless texture arrays, the naive way to reference a texture would be through four floating point values, where the first two values are the x and y position of the corner of the texture that the vertex is using, and then the last two are the texture array and the layer within that array that the texture is contained in. This index alone uses 16 bytes, and depending on the complexity of the model, multiple textures may need to be referenced for a single vertex. We can have huge savings by using bit shifting to store this information more compactly. I've done this in my engine, and I think I've found a format that balances precision and flexibility really well. 
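A minimal C++ sketch of this kind of bit packing, assuming the 64-bit split described in this section: 21 bits each for the x and y positions, 11 bits for the array layer, and 11 bits for which array to read from. The order of the fields within the 64 bits, the `TexRef` struct, and the `pack`/`unpack` names are hypothetical; the transcript does not say how Zepha actually arranges them.

```cpp
#include <cstdint>

// Hypothetical layout, from the lowest bit to the highest:
//   bits  0..20  x position   (21 bits)
//   bits 21..41  y position   (21 bits)
//   bits 42..52  array layer  (11 bits, values 0..2047)
//   bits 53..63  array index  (11 bits, values 0..2047)
struct TexRef {
    std::uint32_t x, y, layer, array;
};

constexpr std::uint64_t kPosMask = (1ull << 21) - 1;  // 21 one-bits
constexpr std::uint64_t kIdxMask = (1ull << 11) - 1;  // 11 one-bits

// Pack the four fields into one 64-bit value (8 bytes instead of the
// 16 bytes that four 32-bit floats would take). Out-of-range values are
// masked off rather than rejected.
constexpr std::uint64_t pack(TexRef r) {
    return (std::uint64_t{r.x} & kPosMask)
         | ((std::uint64_t{r.y} & kPosMask) << 21)
         | ((std::uint64_t{r.layer} & kIdxMask) << 42)
         | ((std::uint64_t{r.array} & kIdxMask) << 53);
}

// Recover the four fields by shifting each one down and masking it.
constexpr TexRef unpack(std::uint64_t p) {
    return TexRef{
        static_cast<std::uint32_t>(p & kPosMask),
        static_cast<std::uint32_t>((p >> 21) & kPosMask),
        static_cast<std::uint32_t>((p >> 42) & kIdxMask),
        static_cast<std::uint32_t>((p >> 53) & kIdxMask)};
}

// 21 + 21 + 11 + 11 = 64, so the largest value of every field
// fills all 64 bits exactly.
static_assert(pack(TexRef{2097151, 2097151, 2047, 2047}) == ~0ull,
              "fields use every bit");
static_assert(unpack(pack(TexRef{1000, 2000, 5, 7})).layer == 5,
              "round trip keeps the layer");
```

In a shader, the same shifts and masks would be applied to the packed value to recover each field before sampling.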
I reserve 21 bits each for the x and y positions of the texture, stored as integers. I then use 11 bits for the array layer, which conveniently ranges from 0 to 2047, giving 2048 layers, which is the minimum number of texture layers required to be supported by OpenGL 4.0. Then, the remaining 11 bits are used to specify which texture to read from, which allows up to 2048 textures to be referenced at any time. That's way more than 32. All of this data fits into 64 bits, or 8 bytes. Using this layout instead gains 2x memory savings over the floating point system. There may be some noticeable precision loss if a game uses exceedingly large textures. In which case, it might be worth it to sacrifice the number of textures or texture layers in exchange for more precise position information. But that's up to the specific game to decide. And that's all I have for today! Hopefully this information was able to help some people, or at least serves as interesting food for thought for the curious. One last disclaimer I'll give is that all of the features I've talked about today, aside from Vertex Optimization, are extensions to the OpenGL spec. They are widely supported by modern graphics cards, but they may be missing on mobile or consoles. If a game is meant to target both, it may be worth it to use these features if they are supported, and fall back to basic textures if not. This topic was really interesting to learn about, despite wanting to kick myself every time I learned a new piece of information that could have saved me days or months of effort. This is my first attempt at making educational video content, so I'd really appreciate hearing what you liked and disliked about this presentation, and what I could do better next time. I'll try to reply to every comment on this video, so do leave me a message, and I'll get back to you! I should probably also give the obligatory interaction reminder. As of this video's creation, I have a grand total of… 19 subscribers. 
So I would really appreciate a like or subscribe so that YouTube has a chance of showing it to others. I'm gearing up to the release of my game sometime next year, so I'll be posting a lot more game dev content in the coming months. If you're interested in that, stay tuned, or join my Discord and chat with me. I also have a Twitch where I stream every Saturday, and a Tumblr where I post various development updates. Links are down below. Other than that, I want to express my deepest thanks to the people that have inexplicably chosen to give me their support on Patreon and Twitch. It means a lot to me that people are interested enough in what I do to help me do it. Thanks to… Butter, Luna, Maeve, Silent_rkgk, Therem, and Zythia. Thank you so much. I appreciate it more than you know. Anyhow, thanks for sticking around till the end, see you next time!
Info
Channel: Aurailus
Views: 25,068
Keywords: indiedev, gamedev, opengl, gpu, c++, programming, programmer, graphics, indie, gaming, coding, code, game development, indie development, indie developer, indie game developer, game, game developer, auri collings, aurailus, gamer, texture, art, pixel art, programming art, auri, voxel, voxel game, voxel gaming, minecraft, blocks, block game, minecraft game, minecraft clone, returnal, returnal game, triple a, AAA, bindless, arrays, texture arrays, sparse bindless texture arrays, SBTA
Id: YTfdBSjitd8
Length: 12min 32sec (752 seconds)
Published: Sat Sep 23 2023