I Optimised My Game Engine Up To 12000 FPS

Video Statistics and Information

Captions
Have you ever been excited about a new game, only to be disappointed that it lags? I've spent the past six years creating a game engine, and I've been shocked at the things that can make or break performance. I'll show four simple optimizations that you can use to make your games run much quicker. Instead of just talking about them, we'll use them to optimize something that's found in a lot of games: voxels.

Voxels are these small cubes, and this world is composed of millions of them. Right now this runs at 50 frames a second, and we'll increase it up to 2,500 using the first optimization.

If we rendered these voxels one at a time, we would render this map at one frame a second. That's the opposite of optimization, so instead we'll group them into chunks of 32. All these voxels now share the same mesh, which lets us render all of them at the same time. We can then combine more chunks together to create a larger world. This works, but it renders very slowly, so let's start optimizing it.

If we have a look at the mesh for a chunk and then zoom in further into a voxel, we can see that it's composed of 12 triangles. That means the entire world is composed of 115 million triangles. The fewer triangles we have, the faster our game will run, so let's get rid of some. If we look at these two voxels, there are four triangles between them that we'll never be able to see. So when we generate the mesh for this chunk, we'll skip the triangles that are between two voxels. This brings our triangle count down massively, to only 1.2 million.

But we can reduce it further. If we look at this wall, we can see it's composed of voxels that all have the same metal texture. It looks like a large flat surface, but it's really composed of 1,500 triangles. We don't need all these individual triangles; we're better off combining them together. The algorithm for this is too big to fit on screen, but I've put a link to it in the description. We can also combine them sideways, but not until we've done another optimization in step four. This brings
us up to 2,500 frames per second, just by reducing the number of triangles.

The next thing we can optimize is what's stored inside each triangle. Every voxel face has two triangles, and each triangle is composed of three vertices. Each of these vertices stores position and normal vectors and a texture ID. This adds up to 28 bytes per vertex, and for the entire world that's 24 MB. Reducing our triangle count increased our performance, and so will reducing our memory usage, so we'll compress all this data into one integer without losing quality.

To start, every vertex needs a position. Currently we're using a vector for this, which holds three floats. These floats are great for standard 3D models because they can store fractional values, but every triangle in our mesh lines up to a grid, so we can use a byte instead. That's a quarter the size of a float, but we can go even smaller. A byte holds eight bits and can store numbers up to 255, but our chunks are only 32 voxels wide. With seven bits, our mesh will be half the size at up to 127 voxels wide. With six bits, our mesh can be up to 63 voxels wide, and with five bits our mesh can be up to 31 voxels wide, but our chunk can't fit into that. So we'll use six bits for each of the X, Y and Z positions. We can reduce this further, but not until we've done another optimization in step three.

Each vertex also stores a normal vector, which helps us calculate lighting and other effects. This vector stores the direction that each triangle is facing. For example, here all the green triangles are facing up and all the red triangles are facing left. Since voxels are cubes, there are only six possible normals they can have, so we'll store the number zero for up, one for down, two for right, and so on. Then in the vertex shader we can turn this number back into a vector. We can fit these six directions inside three bits, which is a huge saving.

Lastly, each triangle contains a texture ID, which tells the shader which texture to apply. We only have 70 unique textures, so we can store this
ID using seven bits. All the data we need to render a vertex can now fit inside 32 bits, which is the size of one of the floats from the old vector.

But the vertex shader needs to know how to unpack this data into its separate components. We'll use bit masking for this. The top row here is a mask, and it matches the number after the AND (&) symbol in the shader. When we apply this mask to our vertex data, we get a new number with only the bits that line up with the mask. To unpack the rest of our data, we'll shift it to the right and then apply the mask again.

We now have all our data and can run our shader as usual, but all the chunks are overlapped in the same position. That's because every vertex's position is now in the range 0 to 32, so we need to tell each chunk where it is in the world. To do this, we'll create a world position uniform in the vertex shader and update it before drawing each chunk. The vertex shader then adds this world position to the vertex's position. Our world now renders correctly and our memory usage is down to 3.4 MB. This increased our frames a second up to 4,000, just by reducing the memory usage.

But we're still storing more data than we need to. If we look at this wireframe, every voxel face has six vertices, and two of them have the exact same position. Storing the same data twice feels like a waste. Can we render a voxel face with only four vertices instead? Currently these triangles are rendered individually, which means we need three vertices for each of them. But OpenGL also lets us render triangles as strips, where each extra vertex builds a new triangle off the last. The problem is everything gets joined together as one long strip. I thought surely there's a way to tell OpenGL to start a new strip every four vertices, but I couldn't find anything.

But I found an even better solution. When games render particles, they use instancing. They have a base particle model and a list of positions, and then OpenGL draws this model at all of these positions. We didn't have to create
a huge mesh with all these particles in it, and even better, each particle is a triangle strip and they're not joined together. So we'll use instancing to render our voxels. We'll have a base model for the voxel face, which has only four vertices, and then instead of a list of positions we have a list of our 32-bit voxel data. This voxel data contains everything we used to store in the mesh, but rather than storing it six times for each face, we only have to store it once.

But when we render the world, everything is facing up. That's because our base voxel model is facing up. We need to rotate it based on the face direction in the vertex shader. If the face is up, we'll lift the model up; if it's facing left, we'll rotate it to the side; and the same for the other directions.

But if we look at the world, we still have all these gaps. That's because earlier we combined our triangles together, and we're no longer storing how long each combined face is. We need to store this length, but we don't have room for another six bits. Thankfully, now that we're using instancing, we can reduce the size of our position data. Before, we had to store positions in the range 0 to 32, which is 33 unique numbers. But since our base voxel model is already one unit wide, we only need to store positions up to 31. Five bits can store exactly 32 unique numbers, so we're in luck. We also only need five bits to store the length, because each combined face can be up to 32 voxels long. Now in the vertex shader we can use this length value to stretch out the base voxel model.

This reduced our memory usage by a factor of six, down to 580 kilobytes. That's because we're only storing one vertex for each face instead of six. Also, the vertex shader only runs four times for each face because we're using triangle strips. Together, this increased our frames a second up to 6,500, and the next optimization will increase it even further.

To render this world, we're sending all these commands to the graphics card. This is typical for a game, but we can replace
them all with just one. To do this, we need to change the way we store our data. Right now, each chunk has its own list of instance data in its own buffer. This means we need to ask the graphics card to render each of them one at a time. Instead, we can create one huge buffer and give each chunk a small portion of it.

To render a specific part of this buffer, we can use the longest OpenGL function ever created. The first two numbers refer to the four vertices in our base voxel model. The next two numbers refer to the start and length of the part of the massive buffer that we want to draw. If we want to render multiple parts of this buffer we could repeat this function, but now we're back to multiple draw commands. We could just render the entire combined buffer, but that would render chunks that are outside our field of view. We only want to render chunks that we can actually see. So what we'll do instead is store these parameters in a special kind of buffer called an indirect buffer. Then we can use the second-longest OpenGL function to render them all at the same time.

But when we draw them, we have that issue from before where they're all overlapped. That's because we can't use shader uniforms anymore to position each chunk. To fix this, we'll use another buffer called a shader storage buffer object. We can store anything we want in this buffer, and in this case we'll store the world position of every chunk we're trying to draw. Each of these commands has a unique ID called gl_DrawID. This ID increments for each of our draw commands, so we can use it to select the right position for each chunk.

We're now rendering the entire world with one draw command, which increased our frames a second up to 12,000. That's because our CPU spends hardly any time telling the graphics card what to do, and the graphics card has a huge command it can focus on. But 12,000 frames a second isn't the limit. Now that we're using one draw command, we've unlocked two bonus optimizations. The first is related to an
optimization that every game uses to ignore triangles that are on the other side of a model. It works by checking if the points in the triangle are going clockwise, and if they aren't, the triangle isn't rendered. This speeds up rendering because we're not wasting time on the triangles that are facing away from us. But the vertex shader still has to run every frame to determine if these triangles are clockwise or not. If we only render the anticlockwise triangles, we can see there are a lot of them that are facing away from us. This means we're running the vertex shader for all these triangles and then just discarding them.

But we can stop these triangles from ever reaching the vertex shader. Let's say the player is standing in this blue chunk. We know they'll never be able to see the red triangles in these chunks, because they're facing away from them. We also know they'll never be able to see these triangles in these chunks. So rather than creating one mesh for each chunk, we'll create six. The first mesh only contains triangles that are facing up, the second only contains triangles that are facing down, and so on. We're still only using one buffer and one draw command, but we have an extra step to only render the meshes that are facing the player. The vertex shader is no longer wasting time on triangles the player can't see, which brings our frames a second up to 14,000.

We're now up to the final optimization. Earlier I mentioned that we can only combine faces in one direction, not two. That's because we don't have room to store another five bits for the combined horizontal length. But we can make room by deleting the face direction. We can do this because of the shader storage buffer object we created before. Right now it stores the world position of each mesh in each chunk, but we can also store the face direction of each mesh. This works because each chunk now has a separate mesh for each face direction. So in the vertex shader we'll replace our face code with the SSBO. We now have five bits free
in our vertex data, which is enough to store the second combined face length. By combining faces in two directions, we've reduced our triangle count down to 79,000. This is now running at 177,000 frames per second, which is the highest it'll go. Compared to the start, we're using a tiny fraction of our original memory usage and triangle count. This reduction is the main reason it's rendering so much faster.

You can run these demos yourself and experiment with the code; details are in the video description. There are three more optimizations that I use to render massive amounts of terrain, which you can watch in this video on screen.
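
The first step in the captions, skipping the triangles that sit between two voxels, can be sketched in a few lines. This is a minimal illustration and not the engine's own code: the names `visible_faces` and `triangle_count` are hypothetical, voxels are assumed to be integer grid coordinates, and only the first three direction indices (up, down, right) come from the captions.

```python
from typing import Iterable, List, Tuple

Voxel = Tuple[int, int, int]
Face = Tuple[int, int, int, int]  # x, y, z, direction index

# Neighbour offsets indexed by direction: 0 = up, 1 = down, 2 = right.
# The order of the remaining three (left, front, back) is an assumption.
FACE_OFFSETS = [(0, 1, 0), (0, -1, 0), (1, 0, 0), (-1, 0, 0), (0, 0, 1), (0, 0, -1)]


def visible_faces(voxels: Iterable[Voxel]) -> List[Face]:
    """Return only the faces that do not touch another solid voxel."""
    solid = set(voxels)
    faces = []
    for x, y, z in sorted(solid):
        for direction, (dx, dy, dz) in enumerate(FACE_OFFSETS):
            # A face is hidden when the neighbouring cell is also solid.
            if (x + dx, y + dy, z + dz) not in solid:
                faces.append((x, y, z, direction))
    return faces


def triangle_count(voxels: Iterable[Voxel]) -> int:
    """Each visible face is drawn as two triangles."""
    return 2 * len(visible_faces(voxels))
```

Under these assumptions a lone voxel yields 12 triangles and two touching voxels yield 20 rather than 24, which matches the four hidden triangles mentioned in the captions.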
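
The face-combining algorithm itself is only linked from the video description, so it is not reproduced here. As a much simpler stand-in, the sketch below merges faces along a single row, which is the one-direction case the captions describe. The function name `merge_runs` and the row representation (a texture ID per cell, or `None` where no face is visible) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int, int]  # start index, length, texture ID


def merge_runs(row: List[Optional[int]]) -> List[Run]:
    """Merge neighbouring faces with the same texture into single runs."""
    runs = []
    start = 0
    while start < len(row):
        texture = row[start]
        if texture is None:
            # No visible face in this cell, so there is nothing to merge.
            start += 1
            continue
        end = start + 1
        # Extend the run while the next cell has the same texture.
        while end < len(row) and row[end] == texture:
            end += 1
        runs.append((start, end - start, texture))
        start = end
    return runs
```

A full row of 32 faces with one texture collapses to a single run of length 32, so it can be drawn as one stretched face instead of 32 separate ones.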
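
The packing and bit-masking steps can be tried outside a shader. The sketch below uses the field widths from the captions (six bits per position axis, three bits for the normal, seven bits for the texture ID), but the order of the fields inside the integer and the names `pack_vertex` and `unpack_vertex` are assumptions, since the captions do not state them.

```python
from typing import Tuple

POSITION_BITS = 6  # values 0..63, enough for vertex positions 0..32
NORMAL_BITS = 3    # values 0..7, enough for six face directions
TEXTURE_BITS = 7   # values 0..127, enough for 70 textures

POSITION_MASK = (1 << POSITION_BITS) - 1
NORMAL_MASK = (1 << NORMAL_BITS) - 1
TEXTURE_MASK = (1 << TEXTURE_BITS) - 1

# Bit offset of each field; this ordering is an assumption.
Y_SHIFT = POSITION_BITS
Z_SHIFT = 2 * POSITION_BITS
NORMAL_SHIFT = 3 * POSITION_BITS
TEXTURE_SHIFT = NORMAL_SHIFT + NORMAL_BITS


def pack_vertex(x: int, y: int, z: int, normal: int, texture: int) -> int:
    """Pack one vertex into a single integer that fits in 32 bits."""
    fields = ((x, POSITION_MASK), (y, POSITION_MASK), (z, POSITION_MASK),
              (normal, NORMAL_MASK), (texture, TEXTURE_MASK))
    for value, mask in fields:
        if not 0 <= value <= mask:
            raise ValueError(f"value {value} does not fit in mask {mask:#x}")
    return (x | (y << Y_SHIFT) | (z << Z_SHIFT)
            | (normal << NORMAL_SHIFT) | (texture << TEXTURE_SHIFT))


def unpack_vertex(data: int) -> Tuple[int, int, int, int, int]:
    """Shift each field down to bit 0, then mask off everything above it."""
    x = data & POSITION_MASK
    y = (data >> Y_SHIFT) & POSITION_MASK
    z = (data >> Z_SHIFT) & POSITION_MASK
    normal = (data >> NORMAL_SHIFT) & NORMAL_MASK
    texture = (data >> TEXTURE_SHIFT) & TEXTURE_MASK
    return x, y, z, normal, texture
```

With this layout the five fields take 28 bits in total, so the packed value always fits in one 32-bit integer, and `unpack_vertex(pack_vertex(...))` returns the original values.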
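
The single-draw-command setup and the per-direction meshes can be sketched on the CPU side as follows. Each command uses the four-integer layout that OpenGL defines for indirect array draws (`count`, `instanceCount`, `first`, `baseInstance`). Everything else is an assumption for illustration: the `SubMesh` type, the function names, the direction order beyond up, down and right, and the simple visibility test, which keeps a sub-mesh only if the camera could be in front of at least one of its faces.

```python
import struct
from typing import List, NamedTuple, Sequence, Tuple

Vec3 = Tuple[int, int, int]
CHUNK_SIZE = 32

# Direction index to face normal; the order after "right" is an assumption.
FACE_NORMALS = [(0, 1, 0), (0, -1, 0), (1, 0, 0), (-1, 0, 0), (0, 0, 1), (0, 0, -1)]


class SubMesh(NamedTuple):
    chunk_origin: Vec3   # world position of the chunk's minimum corner
    direction: int       # index into FACE_NORMALS
    first_instance: int  # offset of this mesh inside the shared buffer
    instance_count: int  # number of faces in this mesh


def may_face_camera(mesh: SubMesh, camera: Sequence[float]) -> bool:
    """Conservative test: could any face of this sub-mesh point at the camera?"""
    normal = FACE_NORMALS[mesh.direction]
    for axis in range(3):
        if normal[axis] > 0:
            return camera[axis] > mesh.chunk_origin[axis]
        if normal[axis] < 0:
            return camera[axis] < mesh.chunk_origin[axis] + CHUNK_SIZE
    return False


def build_indirect_commands(meshes: Sequence[SubMesh],
                            camera: Sequence[float]) -> Tuple[bytes, List[Tuple[Vec3, int]]]:
    """Return indirect-buffer bytes plus one (origin, direction) entry per draw."""
    commands = bytearray()
    per_draw = []
    for mesh in meshes:
        if mesh.instance_count == 0 or not may_face_camera(mesh, camera):
            continue
        # count = 4 strip vertices, instanceCount, first = 0, baseInstance.
        commands += struct.pack("<4I", 4, mesh.instance_count, 0, mesh.first_instance)
        per_draw.append((mesh.chunk_origin, mesh.direction))
    return bytes(commands), per_draw
```

The `per_draw` list is in the same order as the commands, so entry `i` is what a shader would look up with `gl_DrawID == i` once the list is uploaded to the shader storage buffer.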
Info
Channel: Vercidium
Views: 463,921
Keywords: optimization, fps boost, game development, game dev, game optimisation, game engine, game engine optimisation, game engine optimization, optimisation, game development optimisation, game development optimization, game performance, game engines, game engine development, optimise, optimize
Id: 40JzyaOYJeY
Length: 11min 57sec (717 seconds)
Published: Sun Mar 17 2024