I Remade Minecraft But It is Optimized!

Captions

Since I started working on this Minecraft clone, I have had a few goals in mind: I wanted it to be a finished product and to work in multiplayer, I wanted it to have nice shaders, and I also wanted it to run well on older hardware. So today I will finally talk about both the smart optimizations and the simple tricks that I have made for this project. I will also answer the question of whether many draw calls are bad by reducing 3600 draw calls to only one; the answer might just surprise you. We will start from around 20 FPS but with a good rendering foundation, and I will apply 3 optimization techniques on top, including a custom GPU geometry memory arena, to see how far we can get. So let's go.
The first thing that I wanted to optimize very well was the block renderer, because that's the first building block of Minecraft. All of the other algorithms and data structures that optimize the game will come on top. So let's start with this, and then I'll add the other optimizations. This is the geometry that I want to render. As I explained in the first video, it is important not to render faces that are hidden by other blocks, so you actually don't want to draw an entire cube all the time, but rather individual faces of cubes. And this led to a very interesting optimization.
Normally when rendering you would send the geometry that you want to draw to the GPU, for example the shape of this face, and then you would use what is called a uniform to specify a position for the entire object in the world. But we can do better. What I realized is that in Minecraft I always have the same geometry, one of 6 possible faces; the only things that differ are the position and the material. So I flipped the rendering process around. The process works like this: I can only render quads of predefined shapes, and for each quad I have to specify the shape, the texture to apply, the position, the torch light, the sun light, and some flags.
Now for the shape and the texture I can just use a 16-bit index for each, getting me to one int so far. For the position I need a full int each for the x and z directions, but for y the build limit is very small, so I don't really need the top 16 bits. For the lights, the maximum light value is 15, so I just need 4 bits for each light type, and those can be placed here, leaving me with another 8 bits for the flags. So I managed to fit the entire information about one face in just 4 ints. Remember this, because I will return to this number in a second, and keep in mind that usually people would use that much just to encode the position; I managed to encode position, shape, material, and even some flags here.
Now this data is sent as a vertex array, but I am doing instanced rendering. So I am drawing 4 vertices for each face, and this data is configured to be the same for the entire face, so nothing changes per vertex except the vertex ID; that's why I said that I flipped the rendering process around. And because I managed to fit all of the data in only 4 ints, this actually means a single attribute that fits within one cache line, so this can only be extremely good for performance.
I calculate the actual shape and the normals in the vertex shader. This is just a matter of putting all of the possible geometry in a buffer; then all I need to do is index the correct position, taking into account the face shape and the current vertex ID. I calculate the normals on the fly, which helps with the dynamic geometry, and to be fair I don't think it costs that much to calculate. I will mention that I didn't use a geometry shader because they are apparently very slow, and it turns out that I didn't need one in the end. And animating the geometry is pretty easy: I just use a different face index to specify that I want to animate that face, and then I apply some functions in the vertex shader. This gets us to here, at around 20 FPS at a render distance of 60 by 60 chunks.
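A minimal C++ sketch of how such a four-int face record could be packed on the CPU side. Only the bit widths (16-bit shape and texture indices, 16 bits of y, 4 bits per light, 8 bits of flags) come from the transcript; the field order and the names `PackedFace`, `packFace`, and the `face*` accessors are assumptions made for illustration, not the author's actual layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of one face in four 32-bit ints (16 bytes).
// The bit widths follow the transcript; the field order is an assumption.
struct PackedFace {
    std::uint32_t shapeAndTexture; // bits 0-15: shape index, bits 16-31: texture index
    std::int32_t  x;               // full int for world x
    std::uint32_t yLightsFlags;    // bits 0-15: y, 16-19: torch, 20-23: sun, 24-31: flags
    std::int32_t  z;               // full int for world z
};
static_assert(sizeof(PackedFace) == 16, "one face should occupy exactly four ints");

inline PackedFace packFace(std::uint16_t shape, std::uint16_t texture,
                           std::int32_t x, std::uint16_t y, std::int32_t z,
                           std::uint8_t torchLight, std::uint8_t sunLight,
                           std::uint8_t flags) {
    assert(torchLight <= 15 && sunLight <= 15); // light levels must fit in 4 bits
    PackedFace f{};
    f.shapeAndTexture = static_cast<std::uint32_t>(shape)
                      | (static_cast<std::uint32_t>(texture) << 16);
    f.x = x;
    f.yLightsFlags = static_cast<std::uint32_t>(y)
                   | (static_cast<std::uint32_t>(torchLight) << 16)
                   | (static_cast<std::uint32_t>(sunLight) << 20)
                   | (static_cast<std::uint32_t>(flags) << 24);
    f.z = z;
    return f;
}

// Accessors mirroring what a vertex shader would do with shifts and masks.
inline std::uint32_t faceShape(const PackedFace& f)      { return f.shapeAndTexture & 0xFFFFu; }
inline std::uint32_t faceTexture(const PackedFace& f)    { return f.shapeAndTexture >> 16; }
inline std::uint32_t faceY(const PackedFace& f)          { return f.yLightsFlags & 0xFFFFu; }
inline std::uint32_t faceTorchLight(const PackedFace& f) { return (f.yLightsFlags >> 16) & 0xFu; }
inline std::uint32_t faceSunLight(const PackedFace& f)   { return (f.yLightsFlags >> 20) & 0xFu; }
inline std::uint32_t faceFlags(const PackedFace& f)      { return (f.yLightsFlags >> 24) & 0xFFu; }
```

A record like this would typically be uploaded once per face as a single integer vector attribute with a per-instance divisor, so all four vertices of the quad read the same 16 bytes.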
Now there is one last hidden secret here that I have found to be very interesting, and it has to do with the index buffer, or the way you draw your geometry. You see, I never quite got the point of the index buffer. Let's say we have a cube: with an index buffer you only have to send 8 vertices instead of 36, and this seems nice, except that in practice, once you need a different normal for each face, you actually still have to send 24 vertices. And in my case an index buffer wouldn't help me in any way, because I am not even sending different data per vertex. And here is why optimizations are difficult: they have many hidden secrets. Apparently, whenever you use an index buffer, each vertex that you reuse has to be processed only once, in the best case. And this also applies when using things like triangle strips or triangle fans. So instead of rendering 2 triangles, I render a triangle fan, and this means that the vertex shader has to run only 4 times instead of 6, reusing the shared vertices. And this gets us to the present moment, at around 30 FPS.
So now let's apply another 3 optimizations, but before that I want to remind you that it is very important to always measure, and to have systems to help you with that. In the episode about shaders, I showed this tool that shows how much time each system takes. Now I have also added this tool to show me how chunks are loaded, and take a look at that: some chunks get recreated multiple times. This has to do with the light system updates, and it clearly is a thing to optimize, but I never would have noticed it without this tool.
OK, now, the first 2 optimizations are related in a way, and they have to do with overdraw. I have an expensive pixel shader, so if I am drawing a nice thing here and then another thing comes on top, I waste a lot of time. This is called overdraw. How do I know if my pixel shader is expensive? I test it, of course: if I disable the fancy shaders, we go from 30 FPS to something like 45.
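Returning to the four-vertex triangle fan: the lookup of "face shape plus vertex ID" into a table of predefined geometry can be sketched on the CPU in C++ as below. The table contents, the face numbering, and the names `kFaceCorners`, `cornerFor`, and `fanTriangleNormal` are assumptions for illustration; the real lookup lives in the author's vertex shader and is not shown in the transcript. The corners are listed in fan order, so vertices 0, 1, 2 and 0, 2, 3 form the two triangles of the quad, and the normal is derived with a cross product rather than stored.

```cpp
#include <cassert>

struct Vec3 { int x, y, z; };

// Assumed face numbering: 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z.
// Four corners per face of the unit cube, counter-clockwise seen from outside.
const Vec3 kFaceCorners[6][4] = {
    {{1,0,1}, {1,0,0}, {1,1,0}, {1,1,1}}, // +X
    {{0,0,0}, {0,0,1}, {0,1,1}, {0,1,0}}, // -X
    {{0,1,1}, {1,1,1}, {1,1,0}, {0,1,0}}, // +Y
    {{0,0,0}, {1,0,0}, {1,0,1}, {0,0,1}}, // -Y
    {{0,0,1}, {1,0,1}, {1,1,1}, {0,1,1}}, // +Z
    {{1,0,0}, {0,0,0}, {0,1,0}, {1,1,0}}, // -Z
};

// What the vertex shader would do: pick a corner from face shape + vertex ID.
inline Vec3 cornerFor(int faceShape, int vertexId) {
    assert(faceShape >= 0 && faceShape < 6 && vertexId >= 0 && vertexId < 4);
    return kFaceCorners[faceShape][vertexId];
}

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Normal of fan triangle 0 (v0,v1,v2) or 1 (v0,v2,v3), computed on the fly.
inline Vec3 fanTriangleNormal(int faceShape, int triangle) {
    Vec3 v0 = cornerFor(faceShape, 0);
    Vec3 va = cornerFor(faceShape, 1 + triangle);
    Vec3 vb = cornerFor(faceShape, 2 + triangle);
    return cross(sub(va, v0), sub(vb, v0));
}

inline bool same(Vec3 a, Vec3 b) { return a.x == b.x && a.y == b.y && a.z == b.z; }
```

Drawn as a fan, each face needs only these 4 vertex IDs; drawn as two independent triangles it would need 6, with corners 0 and 2 processed twice.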
Now a trick to reduce overdraw is to sort the chunks from the closest to the farthest, and this works because things that are closer to the player are more likely to end up on the screen. And luckily, we already have the chunks sorted, because we need them to be sorted when drawing the transparent geometry. So we can have this optimization for free. This doesn't boost the FPS very much, however, maybe only in some cases, so let's try something even better.
Z pre-pass. This is a very easy optimization, and I want to make a tutorial about it, so subscribe to not miss that. But basically, I render the entire geometry first, but only to the depth buffer, and then I render it again, but this time I change the depth test so that it only renders fragments whose depth exactly matches what is already in the depth buffer. This reduces overdraw to 0, but at the cost of having to process the geometry of the scene twice. And if the geometry of the scene is more expensive than the pixel shader, this can actually decrease the FPS. On its own it does seem to increase the FPS, but when combined with the geometry sorting it does nothing, and it even decreases it at high render distances, so it has to go.
And finally we get to what is sometimes called unified geometry. But I will quickly mention that, like last time, you will be able to vote on what the next Minecraft clone video should be about, so make sure to subscribe fast to not miss that poll. OK, so you have probably heard that issuing many draw calls and having many VAOs is bad for performance. A friend of mine who is very good with GPU programming told me that nowadays that's not true. But then I saw that there are mods that allow Minecraft to have insane render distances, at better FPS than my clone, and that felt personal. So I tried to optimize this, because right now I have one separate VAO for each chunk, so this scene binds 3600 different VAOs and issues 3600 draw calls.
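The front-to-back chunk ordering described above can be sketched in C++ as a plain sort by squared distance to the player. The `ChunkPos` type and the function names are hypothetical; the transcript only says the chunks are already sorted for the transparent pass, not how.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical chunk coordinate on the horizontal plane.
struct ChunkPos { int x, z; };

// Squared distance is enough for ordering and avoids a square root.
inline long long distanceSquared(ChunkPos c, ChunkPos player) {
    long long dx = static_cast<long long>(c.x) - player.x;
    long long dz = static_cast<long long>(c.z) - player.z;
    return dx * dx + dz * dz;
}

// Orders chunks from closest to farthest so near geometry fills the depth
// buffer first and hidden fragments behind it can be rejected early.
inline void sortFrontToBack(std::vector<ChunkPos>& chunks, ChunkPos player) {
    auto closer = [player](const ChunkPos& a, const ChunkPos& b) {
        return distanceSquared(a, player) < distanceSquared(b, player);
    };
    std::sort(chunks.begin(), chunks.end(), closer);
    assert(std::is_sorted(chunks.begin(), chunks.end(), closer)); // debug check
}
```

Opaque geometry would be drawn in this order, while the transparent pass would walk the same list in reverse, from farthest to closest.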
So to optimize it I created a single unified geometry pool. Inside it I store the data of all of the chunks, and this also meant that I had to create an allocator for it. Fortunately, this wasn't that difficult. I just have a linked list on the CPU to encode the used memory blocks, and an unordered map that points to the list elements for fast access. Finally, I have to bind only one VAO and issue one big indirect draw call using glMultiDrawElementsIndirect.
And after this optimization, the performance improvement is... none. I'm serious, it is literally 0. So how is that possible? We're talking about reducing 3600 draw calls and VAO binds to only one, with no FPS difference. Well, these are the things to consider. First of all, yes, there is indeed a small but visible improvement in CPU usage when using only one draw call. The FPS doesn't change, though, because the CPU is not the bottleneck here, but maybe for a slower CPU this would make a difference. Also, for an older graphics card or a less powerful one, this could still make a difference. Now, for a modern GPU the optimal way to do things is to bind your stuff first and then issue draw calls. If the things you are drawing are very small, the GPU will render them before you have time to send new data, wasting time waiting. But in a normal use case that won't happen. And since I use bindless textures, I don't bind any resource in between draw calls.
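A minimal C++ sketch of an allocator in the spirit of the one described above: a linked list of memory blocks ordered by offset, plus an unordered map from chunk ID to list element. The class name `GeometryArena`, the first-fit policy, the tracking of free blocks in the same list, and the merging of neighbouring holes are all assumptions; the transcript does not say how the author's allocator picks or reclaims blocks. Offsets are in abstract units (for example, faces) inside one big GPU buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>

// Returned when no free block is large enough.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

class GeometryArena {
    struct Block { std::size_t offset; std::size_t size; bool free; };
    std::list<Block> blocks_; // ordered by offset, covers the whole buffer
    std::unordered_map<std::uint64_t, std::list<Block>::iterator> used_;

public:
    explicit GeometryArena(std::size_t capacity) {
        assert(capacity > 0);
        blocks_.push_back(Block{0, capacity, true});
    }

    // First-fit allocation; returns the offset inside the buffer or kNoSpace.
    std::size_t allocate(std::uint64_t chunkId, std::size_t size) {
        if (size == 0 || used_.count(chunkId) != 0) return kNoSpace;
        for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
            if (!it->free || it->size < size) continue;
            if (it->size > size) {
                // Split: the remainder stays free right after the new block.
                blocks_.insert(std::next(it),
                               Block{it->offset + size, it->size - size, true});
                it->size = size;
            }
            it->free = false;
            used_[chunkId] = it; // list iterators stay valid across inserts
            return it->offset;
        }
        return kNoSpace;
    }

    // Frees a chunk's block and merges it with free neighbours.
    bool release(std::uint64_t chunkId) {
        auto found = used_.find(chunkId);
        if (found == used_.end()) return false;
        auto it = found->second;
        used_.erase(found);
        it->free = true;
        auto next = std::next(it);
        if (next != blocks_.end() && next->free) {
            it->size += next->size;
            blocks_.erase(next);
        }
        if (it != blocks_.begin()) {
            auto prev = std::prev(it);
            if (prev->free) {
                prev->size += it->size;
                blocks_.erase(it);
            }
        }
        return true;
    }

    std::size_t blockCount() const { return blocks_.size(); }
};
```

The offset returned for each chunk is what a per-chunk entry in the indirect command buffer would point at, so that one multi-draw call can cover every chunk in the pool.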
So there you have it, this is the conclusion. And if you are not impressed, keep in mind that there are still many important optimizations that I haven't talked about, like frustum culling, drawing chunks only in a circular radius, or optimizing the very expensive screen space reflections that I have right now. Also, this render distance is equivalent to the maximum render distance of 32 in vanilla Minecraft, so running Minecraft at max render settings with shaders on at an almost playable FPS is a good achievement. But again, there are many things to be added and many things to be talked about in the next videos. So don't forget to vote for the next video, and until then, check out another video from my channel. See you!

Info

Channel: Low Level Game Dev

Views: 89,828

Keywords: making Minecraft in c++, how I optimized minecraft in c++, minecraft c++ clone optimizations, optimizations, c++ optimizations, opengl optimizations, minecraft optimizations, cpp, c++, opengl, minecraft in cpp, minecraft in cpp optimized, i remade minecraft optimized in cpp, gamedev, game programming, opengl optimizations cpp, low level game dev, minecraft clone, minecraft programming cpp

Id: gFX7Di2aTCY

Length: 9min 39sec (579 seconds)

Published: Mon Apr 08 2024