Vulkan API Tutorial - 4 - Command Buffers

Captions
Hi everyone, and welcome to this Vulkan API tutorial. This time we're going to take a look at Vulkan command buffers, how to send some commands to the GPU, and we're also going to take a small look at the Vulkan pipeline. After that we're going to discuss synchronization. Synchronization is a really huge part of Vulkan, and we need to discuss it in this tutorial before we continue to the window creation. Also, from now on you can find all of the source code that I handle in my tutorials on GitHub; the link is in the description of the video.

The first thing I want to mention: if you go to the Renderer class, I use an underscore in front of the names of variables that are not public, and this is not really recommended. It's legal in a namespace or in a class, but it's still not recommended. Anywhere you see two underscores next to each other is always illegal. And sorry about my function names, I'm going to stick with this, but an underscore followed by a capital letter is apparently reserved by C++ globally. I personally don't care about this too much, but it's still not recommended, and names starting with an underscore and a capital letter are reserved for C++ itself. If you read about what's legal and what's not in C++, it's like reading a law book, so you're bound to make some mistakes, like I did. Anyway, not caring about it, this is done, this is what I'm going to stick with, but you might want to leave the underscores out.

Okay, I actually found a bug in one of the validation layers. I think it's a bug, and I reported it, so this is waiting to be processed by LunarG, but I think that's going to take some time, so I'm going to go ahead with the tutorial anyway.

Another thing to note: we're building for 64 bits, but if we turn this to 32 and try to run it now, it's going to give us a bunch of errors. That's because all these handles now become not pointers but uint64_t objects, which are not pointers, so you cannot
put a null pointer into these objects. What we should be doing is using VK_NULL_HANDLE instead of a null pointer for these handles. VK_NULL_HANDLE is zero; it's just a define to zero. As you might already know, if you put a zero into a pointer, that's guaranteed to be a null pointer, but a null pointer is not guaranteed to be zero, so that causes a little bit of a headache here, I guess. That's easy to fix right now, and I'll try to do it this way in my future tutorials; I do want my tutorials to work on 32-bit operating systems and programs as well. I'm changing all of these, let's try again, and it should work now.

Okay, so what exactly is a pipeline, a Vulkan pipeline? We do know that it's our way of rendering polygons into framebuffers, but here's a proper example of it. Vulkan has two kinds of pipelines, one for compute shaders and one for drawing. This image is not the pipeline, and it doesn't really represent the pipeline too well, but it does represent the process that the pipeline goes through. When we draw something, it goes to the input assembler, from there to the vertex shader, from there to the optional tessellation shader, from there to the optional geometry shader, and then it goes to the primitive assembler, rasterization and fragment operations.

The pipeline in Vulkan is a fixed-function pipeline, which means that you cannot change the order of execution inside the pipeline. The pink areas are fixed-function stages, and you cannot program these stages at all. The yellow ones are stages that you can program, that is, write shaders for. Then you've got constants and a bunch of buffers and images that you can render into, or use as a texture, or use as vertex input. This pipeline is obviously a really important part of Vulkan; this is how you render everything to your framebuffers and onto your window surfaces.

Going back to the code now, let's start making our own command buffers. Command buffers are how you send things to the GPU to be
calculated on the GPU. Before we do that, we actually have to create a command pool first. So inside the main function, after we initialize the renderer, I'm going to create a command pool with vkCreateCommandPool. We actually need a device for this, so I'm going to create a temporary variable for the device from the renderer's _device. By the way, I disabled the private section of the renderer for now because I want to access these private variables temporarily. We're not going to use this code for more than learning purposes, so I'm just going to do it like this. When we create the window we are going to have to look into these command pools again; we actually need to send some commands to the GPU when we create the window surface, because of the framebuffer, which we need to create ourselves inside GPU memory.

Continuing here, we're going to give this vkCreateCommandPool function the device and a command pool create info, so VkCommandPoolCreateInfo (pool, not poop). Anyway, sType is VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO. In these videos I'm going to start skipping this sType variable, because you should pretty much already know it. Okay, cool. pNext is going to be a null pointer. queueFamilyIndex: I'm going to show you that in a bit. The next thing we need is the pool create info flags, and there are a few options for this, but before that I'm going to explain this pool a little bit.

To create command buffers you need a pool, and you allocate those command buffers from the pool. After you're done with the command buffers, you don't actually need to free them; it's all good if you just destroy the pool itself, which is going to destroy all the allocated command buffers as well. If we take a look at the flags, we have two options: VK_COMMAND_POOL_CREATE_TRANSIENT_BIT and VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT. The reset command buffer bit means that we can reset and re-record the command buffers
that we allocate from this pool, and the transient bit means that we are going to free the command buffers in a relatively short timeframe. It's kind of a hint to the system that the command buffers allocated from this pool will be relatively short-lived, perhaps only used once. I actually want to reset the command buffer, so I'm going to put the reset bit in here.

Now for the queue family index. This is the family index that we figured out when we created the device, so what I need here is our graphics family index. This is the index of the queue family that supports graphics, and we need it in here.

Okay, moving on. We can give the pool create info pointer to this function, the allocator is a null pointer, and then the command pool handle. Let's create that at the top of this list: a VkCommandPool, let's name it command pool, and let's give that to the vkCreateCommandPool function. Now this should work; let's see if it does. Okay, it works, but the debug report says that an object of type VK_DEBUG_REPORT_OBJECT_TYPE_COMMAND_POOL_EXT at this location has not been destroyed. We created something, and when we use a vkCreate function we also need to destroy it. So vkDestroyCommandPool, and this takes in the device, the command pool and a null pointer.

The next step is to actually allocate the command buffer, so vkAllocateCommandBuffers. There we go: the device, the allocate info, and the last one is the command buffer itself, so a VkCommandBuffer handle; let's give a pointer to that to this function. Let's also create the VkCommandBufferAllocateInfo and give a pointer to that to this function as well. In this command buffer allocate info we give the pool from which this command buffer is going to be allocated, so commandPool is our command pool. commandBufferCount is how many command buffers we want to allocate from this pool using this function, so one is good for now. The next is level, and this is a big fancy one; I'm going to explain it in a bit. For now let's set it to VK_COMMAND_BUFFER_LEVEL_PRIMARY; let's just
use primary for now, I'm going to explain it in a bit.

Now that we have the command buffer, we can actually start recording it. So vkBeginCommandBuffer: you give it a command buffer and the begin info. In this begin info structure there is pInheritanceInfo, which we're not going to touch for now. For flags we can give it a few flags; the flag types are here. If you give it the one-time-submit bit, you can only submit it once; after that you need to reset it or re-record it. The render-pass-continue bit I'm going to explain in a bit. The simultaneous-use bit basically means that this command buffer can be in use at the same time by another submit. If you don't use this flag, you can only submit it once at a time.

The render-pass-continue bit is related to secondary command buffers; it is ignored if the command buffer level is primary. How this actually works is that you can submit only primary command buffers to the GPU, and the secondary command buffers are executed through the primary command buffers.

Now there are two options for what you can do in your application. You can record one huge primary command buffer and update that for every single frame, and then you render everything on your screen when you submit it. However, you can also keep command buffers in memory: you record them and then you just reuse them. Recording a command buffer is going to take a little bit of CPU time, not much, and it's still a lot better than OpenGL, but better yet, you can actually reuse these command buffers: record them only once and then reuse them.

The idea behind the secondary command buffers is that when you create an object like a mesh and you render it, you record the command buffer for that object once, and you make it a secondary command buffer. If you plan your program well enough, you basically don't need to re-record the command buffers for your objects. Even for dynamic meshes you don't need to
re-record the command buffer ever again. You will, however, need to update the primary command buffer from time to time, but when you can reuse the secondary command buffers you can save some CPU time. I'm going to show you how much soon. With the render-pass-continue bit on a secondary command buffer, the render pass continues from the primary into the secondary command buffer; it'll become clear later. Actually, I don't want any of these flags for my begin info, so I'm going to leave all of them out.

What we need to do next is end the command buffer recording: vkEndCommandBuffer, and give the command buffer handle here. When you allocate your command buffers, each command buffer is put into the initial state. When you begin your command buffer, it is put into the recording state, and you can record the commands for your GPU. After that you call the end command buffer function, which basically compiles your command buffer into something executable on the GPU, and that means the command buffer state is changed to executable.

After this we can actually submit the command buffer to the GPU, and we do it like this: vkQueueSubmit. This function takes a queue, and we don't have the queue yet; I'm going to get it in a second. Then the submit count, the submit info (a list of submit infos, actually), and a fence.

What we need first is the queue, and we don't have that yet, so let's go back to our renderer and make a queue under our device. Let's create a VkQueue; we have a queue handle here, so let's initialize it with VK_NULL_HANDLE. In the renderer function called init device, on the last line, I'm going to call a function called vkGetDeviceQueue, giving it the device, the queue family index, which is the graphics family index, queue index 0, and a pointer to the queue handle. This function returns nothing; it only fetches our queue handle for us. The parameter queueIndex
tells basically which queue we want to fetch from here, and this value needs to be smaller than the queue count for this family. We could also get another queue if we had allocated, say, two queues here; if you set the queue count to two, then there are two queues like this, with the handles obviously being different. Anyway, that gets us the queue handle.

In main.cpp, let's get the queue handle, queue. There we go. Now let's give it to vkQueueSubmit. Submit count is how many submit infos we are going to submit in this one go; we are only going to submit one. Then a pointer to the submit info, which I'm going to create in a second, and a fence. We're not going to provide a fence yet, so VK_NULL_HANDLE goes in here. Let's create a structure called VkSubmitInfo, the submit info structure. In this structure we have quite a few parameters that we can set if we want to, but what we need to do at a minimum is set commandBufferCount, how many command buffers we're going to submit; I'm going to submit one command buffer, since we have one. Then pCommandBuffers, which is a list of command buffers; I can give it as a pointer here to our only command buffer.

This takes the command buffer, which is now executable, and submits it to the queue on the GPU. Now basically all we need to do is wait for it to finish, because the GPU obviously runs concurrently with the CPU. So vkQueueWaitIdle: I'm going to wait for this queue to be idle. This is a really inefficient way of synchronizing, but it is really easy, so I'm going to do it for now. Now this should work just fine; let's see what happens. Well, it didn't give us any errors, so it should work just fine.

Let's actually put a command into our command buffer. What are the commands? It's basically vkCmd: all the functions that begin with vkCmd are command buffer instructions for the GPU. So what I could do in
here is set the scissor, or actually the viewport. I'm going to set the viewport. This doesn't do anything, since we don't have a render pass, a pipeline or anything defined yet, but all of these vkCmd functions take the command buffer as the first parameter, so the function knows which command buffer to record to. First viewport: let's just put something in here. Let's create our viewport like so. This would set our viewport. This is just an example of how you can set a viewport, and you need to set this per command buffer. Where OpenGL has a sense of state, it's a huge state machine, Vulkan does not. So when you render anything at all, you have to set the viewport basically every single time you render a mesh, for example. But that's okay, we're going to discuss more about that later.

Anyway, there are a few problems that we need to take care of. A lot of why the GPU is so fast is that it runs concurrently; almost every single command is run simultaneously or concurrently. So with this submit, the CPU is just going to rush through all of this, and the GPU will finish this command buffer as soon as it has some time for it. Now we wait for the queue to be idle, but we can actually do a little bit better, because this waits for the whole queue to be completely idle, and we only want to wait for this submit, the commands that we submitted, to be done. If we try to run it right now without waiting, it's going to crash or give us an error anyway, because we are destroying the command pool before its command buffer has completed. The command buffer wasn't completed on the GPU before we tried to delete it. We need to wait on the CPU side for it to complete before we can continue our program and delete the pool.

A really easy way to do this is with a fence. A fence is a synchronization between GPU submits and the CPU. So what I'm going to make is a VkFence, a vkCreateFence call, and a VkFenceCreateInfo. Now in this fence
create info we have flags, basically, and the only value we can give to flags is VK_FENCE_CREATE_SIGNALED_BIT. We don't want that, so we can just delete this. When the fence is signaled, the program can continue, and we want the GPU to signal our fence, so let's not set it by default. That creates our fence. Now we can give the fence to the submit in here.

Now we can wait for this fence, and we do that with a function called vkWaitForFences: the device, how many fences we want to wait for, and a list of fences; we've got one, so a pointer. Then waitAll, VK_TRUE. If this is VK_FALSE, we only need to wait for one of these fences to complete; if we set it to VK_TRUE, then all of these fences need to be signaled before we can continue from here. We've only got one fence, so that doesn't really matter.

The timeout is the last option, and it is in nanoseconds. This might not be accurate; if you wait for one nanosecond, it's going to round that up to some bigger number that your system can support. A nanosecond is like one clock cycle of a one gigahertz clock anyway, so this is usually going to be a huge value. I basically want to wait here indefinitely if the fence never gets signaled; there's a special value for this, and that's UINT64_MAX. That's basically all it takes to synchronize between one queue submit and the CPU.

This function also returns a value, and we can check that it returns success. Usually it returns VK_SUCCESS, but if we set the timeout value to zero, that means it is not going to wait for the fence at all. So if I run this again, it's going to return VK_TIMEOUT. Notice that this is a positive VkResult value, which means it's not an error but a timeout: the timer ran out before the fence was signaled. And this is going to crash now, so let's set this to UINT64_MAX, and we can be pretty sure that this is always going to finish just fine. Actually, no. Okay, yeah,
I forgot to destroy the fence. So vkDestroyFence, the device, and now it runs fine. Okay.

Moving on, the next type of synchronization is going to be a semaphore. So let's create a VkSemaphore and call vkCreateSemaphore: the device, a pointer to the semaphore create info, the allocator as a null pointer, and a pointer to the semaphore handle. In this semaphore create info structure there's basically only sType and flags, and the flags are the same as for the fence; if I go into the flags, you can create the semaphore being set by default when you create it. We don't want that, so we don't do that.

How you use a semaphore is a little bit different from a fence, because a fence waits on the CPU side for the GPU to be ready. A semaphore tells the GPU when another process on the GPU has been completed, and I'm actually just going to show it to you. For this I'm going to make two command buffers. I'm going to make this command buffer a list with two slots, and I'm going to divide this code a little bit like this. I'm going to copy this. Also, let's make this work in here: because this is now a list, it doesn't need to be a pointer to a list. On the second command buffer (second, not third, there we go) I'm going to do exactly the same process; it doesn't really matter this time, I'm just modifying this code so it works on two different command buffers. I also need to submit twice, and the second submit is going to be for the second command buffer, like this.

Now we basically have two command buffers. They are identical in this case, but they don't have to be. I want to fully run the first command buffer first and then the second one, but on the CPU side I don't want to wait for them to complete; I want the GPU to handle the execution order. The GPU is actually a really powerful machine, and as you might already know, the power comes from concurrent processes. That means that when you submit the first command buffer, before this first
command buffer has been fully completed, the second command buffer might already start executing. That's a bit of a problem if, for example, the first one writes into some buffer or image and then the second submit, the second command buffer, reads from it. If you submit them one after the other like this, this could actually crash your GPU, because the second one can start executing before the first one has fully finished and the memory for the image, for example, has been fully written.

So what you want to do here is use a semaphore, and we can introduce it like this. On the first submit, in the submit info, we signal a semaphore. So signalSemaphoreCount: we're going to signal one semaphore. And pSignalSemaphores is a list of semaphores to signal; we've only got one, so it's a pointer in this case. What this means is that as soon as the first command buffer that's being submitted here is complete, it's going to signal this semaphore. Also, I don't want the fence here, so VK_NULL_HANDLE goes here. I'm going to do the same in the second submit, so I'm going to fully ignore the fence; I'm just going to wait for the queue to be idle. What you would do in a normal application is continue to do all the work on the CPU and let the GPU handle the processing of the command buffers.

Okay, so on the second command buffer submit we are going to wait for semaphores. How many semaphores we wait for is one, so waitSemaphoreCount is 1. Then the submit info's pWaitSemaphores: this is a list of semaphores that we wait for; we've only got one, so it's just the one semaphore.

We do need to take one additional step, because if we try to run this now, it will say that in vkQueueSubmit the required parameter pSubmits pWaitDstStageMask is specified as null, and that should not be the case. Also, I forgot to destroy the semaphore; there we go. Okay, what we need here is the submit info's pWaitDstStageMask, and
this is a list of stages or stage masks. Let's see; yeah, it's a list of pipeline stage flags, and it can be any of these ones here. Each of these means a specific stage inside the pipeline, so if you remember this diagram, the fragment shader bit, for example, would be the stage where we run the fragment shader. So this is going to be a list of the stages where we are going to wait for the semaphores.

For example, if we write into an image in the first command buffer, and in the second command buffer we're going to access that image, but we don't need to access the image before we reach the fragment shader, then we could do something like this: VkPipelineStageFlags, I'm just going to name it flags, and I'm going to give it VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, and give the list in here. What now happens is that the first submit is going to run first because it's submitted first, and then the second is submitted right after that one. The GPU is going to process the first, and if it has time, it's going to start processing the second as well.

We set the wait destination stage mask for the one semaphore. This is a list; you can specify different stage masks in this list, and this flags list must have the size of waitSemaphoreCount. What this does with VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT is that in the second command buffer, by the time it reaches the fragment shader and before it actually executes the fragment shader, it's going to wait for the first command buffer to be fully executed and done.

If you want to make sure that none of the work of the second submit gets done or processed before the first one is fully completed, you can define an earlier stage of the pipeline. We don't actually have a pipeline yet, we haven't implemented it, so what we can use here is VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, and that means that all of the first needs to be done before any of the second is done. And this is all done outside the CPU,
so we don't need to waste CPU time waiting for the first one to be done before starting the second one. Now this should work just fine; let's see if it does. And it does, and that's good.

So this was a little bit complicated already, but it's not over yet, because there are also events, which I'm actually not going to touch. I haven't found a really good place to use them, but basically an event is something that you can signal or read on both the CPU and the GPU side. On the GPU you can wait for an event, and on the CPU side you can check the state of that event.

The final part is the pipeline barrier: vkCmdPipelineBarrier. The first parameter is a command buffer; let's give the first one here. The second parameter is pipeline stage flags, the same as with the semaphore here, so let's give it VK_PIPELINE_STAGE_ALL_COMMANDS_BIT; I'm going to explain what these do later. For the destination, the same thing. Dependency flags I'm not going to worry about too much yet. Then memoryBarrierCount; next is bufferMemoryBarrierCount, with no buffer memory barriers; and the last one is imageMemoryBarrierCount, and we're not going to use any of those either. So this would make a pipeline barrier.

How this works is that you're not actually giving line-by-line instructions to the GPU; you're giving it a kind of to-do list, and we can define where this barrier should execute. The pipeline barrier is a two-stage operation. You define the source stage mask, which tells the pipeline barrier which stages of the pipeline need to be done before we continue to execute the barrier itself. Then there's the destination stage mask, which basically tells the barrier that we should not continue running the stages defined in the destination stage mask before the pipeline barrier itself has done its own job. But we will talk more about that later. When we create the window in the next tutorial, I'm going to use this pipeline barrier to
transition images from the undefined state to something else; we'll see what it is.

Okay. If you did manage to understand all of this, I'm actually really proud of you. This took me about two weeks to learn myself, reading a lot of material on the Internet. Fortunately everyone's tackling the same barrier and synchronization problems on the GPU together, so there's a lot more information now than there was a month ago.

Anyway, I wanted to show you one more thing, and you have probably seen this already. I got the source code and I got to play around with this, and this is a really nice model. This is currently rendering on core OpenGL. You can see that the frame time is about 4 milliseconds; the CPU is taking about 3.25 milliseconds per frame and the GPU is taking about 0.7 milliseconds per frame. Now let's animate this, because it is actually a really cool animation, and there's a lot of stuff in this card. This never gets old.

Hold on, let me just resize this a little bit. Okay, so the blue bar at the bottom of the screen is the CPU time and the green one is the GPU time. GL core takes a lot of time on the CPU to calculate all of this, and if we multiply this card by 40, perhaps, you can see that it is kind of laggy. This would be really unplayable in any game.

Now if we go to Vulkan (I'm going to set the worker threads to 1), this is a lot better already. It's barely playable; I could play a game with a frame rate like this, just barely perhaps. The frame time is 70, nope, 30 milliseconds, and you can see the CPU time has gone down a lot, so now it's mostly on the GPU. My GPU takes almost 30 milliseconds to render one frame. We have threads here, and we can increase the number of threads up to four in this example. You can see the CPU time goes down even more. Now
this is good news for us: more work can be done on the CPU on the game logic side.

Now let's go and reuse the command buffers. Now we can see something really nice: see the CPU time, it's barely 30 microseconds. Why is this? Because previously we recalculated the command buffers on every single frame. You can see that that was already a lot better than on the OpenGL side, but if we actually just reuse them, if we record the command buffers once and then just resubmit them, we get a really huge boost in CPU time. That basically leaves our whole CPU to the game logic. This is a lot of polygons now rendered on the screen; I actually don't know how many, but anyway, cool stuff.

You should note that this is grouped by materials, and the GPU time is still almost 30 milliseconds. If we go to rendering individual objects, the GPU time is going to go up to 40 milliseconds per frame, so, like on OpenGL, doing draw calls per material is still faster. If we use OpenGL core for this again, you can see that this is completely unworkable. OpenGL with material groups is a lot better already.

That's pretty much all I had to show you this time. Next time we're going to create a window, so until then, see ya.
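
The command buffer lifecycle described in the captions (initial state after allocation, recording after vkBeginCommandBuffer, executable after vkEndCommandBuffer) can be sketched as a small state machine. This is only a plain C++ model that does not call Vulkan: CommandBufferState, CommandBufferModel and its methods are hypothetical names chosen for illustration, and it assumes re-recording from the executable state is allowed, as it would be for a pool created with the reset command buffer flag.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the three states named in the captions.
enum class CommandBufferState { Initial, Recording, Executable };

// Hypothetical stand-in for a command buffer; it stores command names
// instead of real GPU commands.
class CommandBufferModel {
public:
    // A freshly allocated command buffer starts in the initial state.
    CommandBufferState state() const { return state_; }
    std::size_t commandCount() const { return commands_.size(); }

    // Models vkBeginCommandBuffer: moves to the recording state.
    // Assumption: beginning again from the executable state re-records,
    // discarding the previously recorded commands.
    void begin() {
        if (state_ == CommandBufferState::Recording) {
            throw std::logic_error("begin() while already recording");
        }
        commands_.clear();
        state_ = CommandBufferState::Recording;
    }

    // Models a vkCmd* call: only legal between begin() and end().
    void record(const std::string& command) {
        if (state_ != CommandBufferState::Recording) {
            throw std::logic_error("record() outside begin()/end()");
        }
        commands_.push_back(command);
    }

    // Models vkEndCommandBuffer: recording becomes executable.
    void end() {
        if (state_ != CommandBufferState::Recording) {
            throw std::logic_error("end() without begin()");
        }
        state_ = CommandBufferState::Executable;
    }

    // Models the precondition of a queue submit: only an executable
    // command buffer may be submitted.
    bool canSubmit() const {
        return state_ == CommandBufferState::Executable;
    }

private:
    CommandBufferState state_ = CommandBufferState::Initial;
    std::vector<std::string> commands_;
};
```

In this model, calling begin(), then record() for each command, then end() leaves the buffer in the executable state, and the same buffer can then be submitted repeatedly without recording again, which is the reuse the demo at the end of the video relies on.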
Info
Channel: Niko Kauppi
Views: 18,197
Keywords: Vulkan, API, Tutorial, specification, graphics, programming, gpu, graphics card, c++, cpp, code, coding, advanced, close to metal, glnext, command buffers, commanding gpu, pipeline, overview
Id: Bu581jeyTL0
Length: 51min 9sec (3069 seconds)
Published: Thu Mar 31 2016