Master Class: The CRYENGINE Rendering Pipeline

Captions
I'm Theodore, the lead rendering engineer here at Crytek. Welcome to this section of the curriculum; it's mostly about rendering. To start off, I'd like to give you an idea of what we understand by rendering at Crytek. For us, rendering is the entire process of going through the scene, finding which objects are visible, which objects need to be rendered, sending them to the GPU, and then generating the final pixels. So it's a pretty long process, and there's quite a bit of code involved that you'll probably come into contact with. I did some rough numbers with Linux tools: we have roughly between 400,000 and 450,000 lines of code related to rendering, and then roughly another 60,000 lines of shader code. So yes, quite a chunk. We currently support four graphics APIs: DirectX 11, DirectX 12, Vulkan, and the GNM API on the PlayStation 4. On top of that we also support the VR devices, like the Oculus Rift, the Oculus Quest (coming soon) and the HTC Vive.

My goal for this presentation is to give you a rough overview and to provide you with the tools to dig into the code yourself. This is mostly for coders. The core idea is that you get to know where to find the code, that you understand the general philosophy behind it, that you understand the data flow, and that you get an overview of our rendering pipelines. That should enable you to jump in later and do the work you need to do yourself, hopefully.

Regarding the code modules: the rendering code lives mostly in two modules, the 3D engine and the renderer DLL. The 3D engine code can be found in Code/CryEngine/Cry3DEngine and the rendering code in Code/CryEngine/RenderDll. We build those directly into DLLs and load them at runtime, so the 3D engine becomes Cry3DEngine.dll, while the render DLL is actually built in four, actually five, versions, one per graphics platform: CryRenderD3D11.dll, CryRenderD3D12.dll, and so on.

Functionality-wise the modules cover different areas. The 3D engine is mostly in charge of processing objects: determining their visibility and updating them. The renderer is the one that interfaces with the graphics APIs. You can see that in the kinds of data structures and objects each one implements: the entire scene data structure, the octree, lives in the 3D engine, as do the terrain system and the material manager, while the renderer has things like the shader manager, GPU resources and the actual rendering algorithms.

These modules, as I hinted at, have a very split functionality: the 3D engine is mostly in charge of preparing a frame, whereas the renderer is in charge of submitting the frame to the GPU, and there is obviously an exchange of work: once the 3D engine is done with frame preparation, it has to hand the frame over to the renderer. Tasks that typically happen in the 3D engine are the scene graph traversal, where for each view we need to put on the screen we go through the spatial data structure (the octree), and visibility and occlusion culling, where we figure out which objects are actually visible in that view and, for some views, also check whether objects are occluded by other objects; then there are streaming updates and so on. On the renderer side we convert the data we got from the 3D engine into actual graphics API commands, for example uploading new mips to a texture or issuing draw calls.
To be performant we do the work of these two modules in a pipelined fashion: while the 3D engine is preparing a frame, the render thread is actually rendering the previous frame. I added the thread designations here: the 3D engine works mostly on the main thread, the renderer works mostly on the render thread, and they overlap, so the 3D engine prepares a frame while the renderer is at the same time rendering a frame. After both have done their work, we have a sync between the threads to avoid problems in the next frames, basically to keep the workloads from diverging too far. That means when the 3D engine is done preparing a frame, it will wait until the render thread is done rendering the previous frame, and vice versa. This process goes on from frame to frame: prepare frame on the 3D engine side, render the previous frame on the render thread side, thread sync, prepare frame, render frame, thread sync, and so on.

This is a typical pipelining approach, and unfortunately it brings its own complexity with it. For one, we cannot allow direct communication between those two threads: if the main thread prepared data and sent it directly to the renderer, and the renderer read it directly, it would end up in the wrong frame, because they're working on distinct frames. So what we introduced in the engine is a command buffer, and all communication needs to go through it. The 3D engine adds commands into that buffer, and at the frame sync between two frames we give that buffer to the renderer, which can then process those commands. We don't have an established communication path for the reverse direction at the moment, so if the renderer needs to tell the 3D engine something, there is no established way of doing that. If you think about it, it's actually a sort of ill-posed problem, because you'd be sending data from the past into the future. If you look at the data flow, say you start on the render thread in frame n (sorry, that's actually a mistake on the slide, it should be frame n minus two; the next one is correct): if the render thread sent a command or exchanged data from frame n minus one, it would arrive in frame n plus one, so there's a two-frame difference. That's the kind of thing we generally don't do.

So for commands we have this command buffer approach. For actual data, like objects, we need something more performant, because we potentially have a million objects and we can't send a million commands; that's just too slow. Instead we have what we call a render view. That is a class which is allocated in the 3D engine at the beginning of the frame; all the algorithms in the 3D engine have access to it and can fill it with data. When the 3D engine is done preparing the frame, the render view is passed as a whole to the renderer, the renderer reads its contents, and when it's done with it, it returns it to a pool for reuse.

There is also a legacy strategy based on multi-buffering, which you will still find in some classes in the renderer. This would typically be global state being duplicated, having two or more copies of it, with the main thread writing to one copy and the render thread reading from another. There are some helper structs, like the fill-thread-ID and process-thread-ID global variables, which you could use to index that data; if you really want to know about it, look for the frame render stats struct in the renderer. But it's not advised to use this, because it simply doesn't scale, and in the future we'll actually remove all this multi-buffering as well.
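To make this hand-over a bit more concrete, here is a minimal, self-contained C++ sketch of the frame-pipelined command-buffer idea. These are not the engine's actual classes or names, just an illustration of the pattern: the main thread records commands for frame N while the render thread executes the commands recorded for frame N-1, and the two buffers are swapped at the frame sync.

    // Not the engine's classes: a minimal frame-pipelined command buffer.
    #include <condition_variable>
    #include <cstdio>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <vector>

    using RenderCommand = std::function<void()>;

    struct FramePipeline {
        std::vector<RenderCommand> fill;     // written by the main thread
        std::vector<RenderCommand> process;  // read by the render thread
        std::mutex m;
        std::condition_variable cv;
        bool frameReady = false;
        bool quit = false;

        // Main thread: record a command for the frame currently being prepared.
        void AddCommand(RenderCommand cmd) { fill.push_back(std::move(cmd)); }

        // Main thread: frame sync -- wait until the render thread has taken the
        // previous frame, then hand over the freshly filled buffer.
        void SyncAndSubmit() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return !frameReady; });
            std::swap(fill, process);
            fill.clear();
            frameReady = true;
            cv.notify_all();
        }

        // Render thread: execute the commands of the previous frame.
        void RenderLoop() {
            for (;;) {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [this] { return frameReady || quit; });
                if (quit && !frameReady) return;
                auto commands = std::move(process);
                frameReady = false;
                cv.notify_all();
                lock.unlock();
                for (auto& cmd : commands) cmd();  // "submit to the GPU"
            }
        }
    };

    int main() {
        FramePipeline pipe;
        std::thread renderThread([&] { pipe.RenderLoop(); });
        for (int frame = 0; frame < 3; ++frame) {
            pipe.AddCommand([frame] { std::printf("render thread: drawing frame %d\n", frame); });
            pipe.SyncAndSubmit();  // thread sync between the two frames
        }
        { std::lock_guard<std::mutex> lock(pipe.m); pipe.quit = true; }
        pipe.cv.notify_all();
        renderThread.join();
    }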
So now we know roughly how the threading model works and how the 3D engine interacts with the renderer. Let's dig a bit deeper into the 3D engine and look at the actual classes. I picked out the most important ones, the ones you will most likely come into contact with.

For a start we have the CMatInfo class, which represents material data. It contains everything you need to render a material: a set of textures, a shader, a shader technique, a number of shader constants and so on. A simple material description. Next to that we have the render mesh, which is the representation of a mesh. It has a list of geometry streams for vertices and indices, and it contains a list of what we call draw chunks. Each chunk represents one draw call, so it's basically a little structure that says: if you want to draw this section of the mesh, use this material, this start vertex and this index count.

Building on top of those two we have a higher-level object, the stat object. It owns a render mesh and a material, and it extends their functionality with LODs and sub-objects. LODs are lower-resolution, lower-poly-count models used at a distance; sub-objects are generally used in the engine for breakable things. If you want to break an object into parts, you allocate sub-objects inside the stat object, and each one has its own transform matrix that can be animated.

Then we have the render node. IRenderNode is our render node interface, the base class of all renderable objects. Very common objects in CRYENGINE are CBrush, CVegetation and CTerrainNode, and then there are the character node, the particle emitter and ten or more other node types, but these are the most common ones. These are the actual objects that are embedded in the scene graph, in the octree.

Speaking of the octree: octree nodes are represented by the COctreeNode class. Each one represents, as the name suggests, a single node in our octree. It has a list of pointers to child nodes (eight child node pointers), and it owns a list of the render nodes that are embedded in this octree node. A good detail to know: each object can only ever be present in one octree node. We don't have a data structure where the data lives only in leaf nodes; objects can live in intermediate nodes as well, but we have the guarantee that each object can only be in one node. This allows some optimizations in the engine later.

And then there's the render view, which, as I explained before, is our main container for passing data between the 3D engine and the renderer: it's filled by the 3D engine and read by the renderer.
I made a little diagram of how the brush class actually connects to its data members. A brush is basically a single object in the world which is usually not animated, so it's just a static object. It has a pointer to a material and a pointer to a stat object. The material itself is defined in a somewhat recursive manner: it has its material information, and it also has a list of sub-materials, which are again CMatInfos. The stat object has a render mesh and a number of LODs; there's a define, currently five, so each stat object can have five LODs, and each LOD is then again a stat object, so again it's sort of a recursive definition, and then we have the pointer to the material.

Going deeper into the render mesh: the render mesh itself has an index stream and a number of vertex streams (the vertex buffer stream member is actually an array), and then the chunks are the individual sections of the mesh, each representing one draw call, as I explained before. On the bottom left you can see the chunk described in a bit more detail: it has something like a first vertex ID, a first index ID and a material ID. The material ID is basically an index into the list of sub-materials in the material. So if you know the render mesh and you know the material, you can pretty much draw the object on the screen; you have all the information needed for drawing.
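As a small illustration of those relationships, here is a simplified sketch with hypothetical types (the names and fields are mine, not the exact CRYENGINE ones), showing how a chunk's material ID selects a sub-material and how each chunk maps to one draw call:

    // Hypothetical, simplified versions of the structures just described.
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Material {
        std::string shaderTechnique;          // which shader/technique to use
        std::vector<Material> subMaterials;   // recursive definition, like CMatInfo
    };

    struct DrawChunk {                        // one chunk == one draw call
        uint32_t firstIndexId;                // where this section starts in the index stream
        uint32_t indexCount;                  // how many indices to draw
        uint32_t firstVertexId;               // base vertex for the section
        uint32_t materialId;                  // index into Material::subMaterials
    };

    struct RenderMesh {
        std::vector<uint32_t> indexStream;    // shared index buffer
        std::vector<float>    vertexStream;   // simplified: one packed vertex stream
        std::vector<DrawChunk> chunks;        // the individual draw-call sections
    };

    // With a render mesh and a material you have everything needed to draw:
    // loop over the chunks, pick the sub-material each chunk refers to, issue a draw.
    void DrawObject(const RenderMesh& mesh, const Material& mtl)
    {
        for (const DrawChunk& chunk : mesh.chunks) {
            const Material& sub = mtl.subMaterials.empty()
                                      ? mtl
                                      : mtl.subMaterials[chunk.materialId];
            // A real renderer would bind sub.shaderTechnique and the vertex/index
            // streams here, then issue: Draw(chunk.indexCount, chunk.firstIndexId,
            // chunk.firstVertexId).
            (void)sub;
        }
    }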
Okay, so let's look at the rough execution flow of the 3D engine. It all starts with the RenderWorld function, which is called by CSystem. A lot of things happen every frame; I only picked out the most prominent, longest ones. From left to right we have the particle manager update, which updates all the particle emitters and then kicks off update jobs for the particles themselves; in many scenes we have hundreds of thousands of particles that all need to be simulated and updated, so that is done in jobs started here. Then we have the RenderScene function, which is the larger scope for processing a view and generating all the data for it. It starts off with preparing the shadow views: typically in our game scenes we have five shadow cascades, each one a different view on the scene, so this preparation sets up five or more different cameras for shadow rendering. Then we have the terrain system update, where we figure out which terrain sectors are currently visible, what their screen coverage is, which textures are needed, whether we need to stream anything in for them, whether we need to update terrain meshes, and so on.

And then we have the actual scene graph traversal, which is more or less the most complicated part and also the one you will likely see most. This is where we take all the views of a scene and try to figure out which objects are visible in each view, and produce the rendering data for those objects. It's split into parts: there is the indoor octree traversal, where we have a visibility area system in which you can draw shapes, and all the objects inside these shapes are represented with their own octree, meaning we can cull an entire tree if we're not inside that area, so it's kind of a portal system. Then we have the outdoor octree, which is what you mostly use in outdoor games; I'll go deeper into that on the next few slides. There is also a special case for rendering objects that can't be jobified, and then we have our merged vegetation system, which allows us to render massive amounts of grass and trees. At the end of RenderWorld we have updated all the objects and all the algorithms, and we can go through the list of objects and determine what LODs we need for them, what meshes, what texture mips and so on; that's part of the streaming manager, which schedules updates for the streaming systems.

To give you a feeling for how things interact in the engine, I've prepared a frame capture of our Woodland level in what we call the boot profiler. Here is a functional overview over five frames. On the left you see the render thread, the main thread, and then a bunch of job system workers; on this machine I had something like seven or eight job system worker threads running, plus physics, audio and network, so it's actually using quite a few of the cores on the system I captured this from. If you press Ctrl and use the mouse wheel you can zoom into the view, so let's focus on one of these "loop render thread" profile tags. If you click it, it hides everything outside the scope of this profile label, and now you can see all the work that is done in the time frame of this loop. This loop took around 20 milliseconds, and you see all the work done on all the threads during that time.

Let's look at the main thread, the 3D engine. We already saw CSystem Render and the 3D engine's RenderWorld, and then what I explained before on the slides: RenderWorld, then RenderScene, then the particle manager and so on. If you hover over the individual blocks you can see the particle manager update, prepare shadows, terrain, the outdoor octree, then this large "render non-job objects" block, then merged meshes, and at the end the streaming update.

In the meantime, while all this processing is happening on the main thread, the render thread is actually doing rendering. It takes the data of the previous frame, starts with updating some things in our pipeline and preparing data to render, and then executes our graphics pipeline, which starts off with the G-buffer rendering, then shadow map rendering, then we wait for draw jobs to be done on the threads, and then we do our deferred effects like screen space reflections, light volumes, deferred lighting, then forward opaque objects, fog and so on. It's a complex pipeline; we'll get into it a bit later. You can see the work is distributed fairly evenly. There is a section here (I don't know if you can see the mouse) where the main thread is waiting for the render thread, so in this capture we are render-thread bound, but you can see the parallelism of the two threads nicely.

If we scroll down a bit to the job threads, you'll see there are a lot of octree jobs: for example COctreeNode RenderContent, which recurses into RenderCommonObjects, occlusion-culling AABB tests and so on. The 3D engine keeps the job threads pretty busy over this time span; almost all of these blocks are RenderContent jobs, and here I even caught a draw job from the render thread. There's fairly good utilization across all the threads. I hope this illustrates roughly how the execution of the main thread works in the 3D engine, how we distribute work across jobs, and how the render thread works in parallel on the rendering.
I'll go back to the slides now and dig a bit deeper into how the octree traversal actually works. As you saw, on the main thread we figure out which views need to be rendered and then start traversing the scene graph, the octree. For each node the main thread traversal algorithm hits, it creates a check-occlusion job; this goes into the job queue, and some job thread will eventually pick it up. So we have a bunch of check-occlusion jobs being scheduled by the main thread traversal. When these jobs are processed by the job threads, they test whether the octree node is actually visible against the camera frustum and against the occlusion system. If the node is visible, the job issues another job, which again goes into the job queue (for simplicity I drew it on the same thread): the render content job. This is where the actual heavy lifting in the 3D engine starts.

A render content job always processes one octree node, and it loops over all the different objects that this node contains. First it goes over the vegetation and calls CVegetation Render, the render interface function; then it goes over the brushes list and calls the brush Render; and the rest of the objects are grouped together into a shared list, the common objects list, where it calls the virtual IRenderNode Render, which is then automatically dispatched to the correct implementation of the render node interface. For objects which support multi-threaded processing, for example brushes, we immediately execute the processing function inside this render content job. Objects that have dependencies, some special objects like roads and such, depend on other objects and we cannot guarantee thread safety, so these we cannot process immediately; we have to push them into another queue, the cull output queue, which is then later processed by the render non-job objects function that we saw before.

So again: the check-occlusion job issues a render content job; the render content job processes all the render nodes that the octree node contains; the ones that can be multi-threaded are done in place, and the ones that can't are pushed into the queue. When the octree traversal is finished, the main thread starts processing all these deferred, non-threadable objects: it iterates through the queue, picks out each object and processes it on the main thread. We saw this in the capture before, and it's actually a pretty massive chunk of it: RenderScene takes roughly 10 milliseconds, and eight milliseconds of that is doing non-threaded stuff, which is somewhat unfortunate.
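Here is an illustrative sketch of that scheduling scheme, with hypothetical types rather than the engine's API: one check-occlusion job per node, a render-content job for visible nodes, in-place processing for thread-safe objects, and a deferred queue that the main thread drains afterwards.

    // Hypothetical types, not engine code: job-based octree traversal.
    #include <cstdio>
    #include <future>
    #include <mutex>
    #include <vector>

    struct RenderObject {
        const char* name;
        bool threadSafe;               // can it be processed from a job thread?
    };

    struct OctreeNode {
        std::vector<RenderObject> objects;
        std::vector<OctreeNode>   children;
        bool visible;                  // stand-in for frustum/occlusion test result
    };

    std::mutex                g_deferredMutex;
    std::vector<RenderObject> g_deferredQueue;   // "cull output" style queue

    void RenderContentJob(const OctreeNode& node) {
        for (const RenderObject& obj : node.objects) {
            if (obj.threadSafe) {
                std::printf("job thread: rendering %s\n", obj.name);   // processed in place
            } else {
                std::lock_guard<std::mutex> lock(g_deferredMutex);      // defer to main thread
                g_deferredQueue.push_back(obj);
            }
        }
    }

    void CheckOcclusionJob(const OctreeNode& node) {
        if (node.visible)              // real engine: frustum + coverage-buffer test
            RenderContentJob(node);
    }

    // Main-thread traversal: schedule one check-occlusion job per octree node.
    void Traverse(const OctreeNode& node, std::vector<std::future<void>>& jobs) {
        jobs.push_back(std::async(std::launch::async, CheckOcclusionJob, std::cref(node)));
        for (const OctreeNode& child : node.children)
            Traverse(child, jobs);
    }

    int main() {
        OctreeNode root{{{"brush", true}, {"road", false}}, {}, true};
        root.children.push_back({{{"vegetation", true}}, {}, true});

        std::vector<std::future<void>> jobs;
        Traverse(root, jobs);
        for (auto& j : jobs) j.wait();                  // traversal finished

        for (const RenderObject& obj : g_deferredQueue) // "render non-job objects"
            std::printf("main thread: rendering %s\n", obj.name);
    }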
Now that we've arrived at the render function for each object, I'd like to give some detail about how the rendering of the objects actually works internally, by looking at the CBrush Render. So we start at CBrush Render, which calls into the stat object's render-internal function, which then goes through a hierarchy of other function calls. Importantly, for example, it will select which LOD to render, and in case we are in the middle between two LODs we actually issue two renders, render both, and do some blending between them on the GPU. Eventually we end up in the CRenderMesh Render function, which loops over all the chunks and calls the render view's add-render-object, slowly filling the render view with data for the draw calls.

This is a pretty expensive process: imagine that every single frame, for each object, we need to go through all its draw calls and add them to the render view. That's a lot of operations repeated every frame, and in the majority of cases nothing changes; the draw calls that need to be done for an object are just the same as in the previous frame. So we have an optimization called permanent render objects, where we build a shortcut. The idea is: if you have performed this CRenderMesh Render once and generated all the little blocks (they're called SRendItems in CRYENGINE), you don't need to do it again the next frame; you can just store them persistently, and that's what's done in the permanent render object. We go through the render once, store the render items in a permanent list, and the next time we hit CBrush Render we figure out "oh, we've done this before": it's much simpler now, we can just add the permanent render object and be done with it, with no need to recurse through the call stack hierarchy we had before. This speeds up the engine massively. In case we don't have this, so in the first frame, or in case the object doesn't support permanent render objects, we still have to go through the old path and perform all the iterations through the stat object render and so on.
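A minimal sketch of the permanent-render-object idea, with hypothetical simplified types: the expensive chunk walk is done once, the resulting render items are cached on the object, and later frames just append the cached items until something invalidates the cache.

    // Hypothetical types, not engine code: caching render items across frames.
    #include <cstdint>
    #include <vector>

    struct RendItem { uint32_t chunkId; uint32_t materialId; };   // stand-in for SRendItem

    struct RenderView { std::vector<RendItem> items; };

    struct PermanentRenderObject {
        std::vector<RendItem> cachedItems;
        bool compiled = false;        // set once the items have been generated
    };

    // Expensive path: walk all chunks of the mesh and emit one render item per chunk.
    std::vector<RendItem> BuildRendItemsFromMesh(uint32_t chunkCount) {
        std::vector<RendItem> items;
        for (uint32_t i = 0; i < chunkCount; ++i)
            items.push_back({i, /*materialId*/ i});
        return items;
    }

    void RenderBrush(PermanentRenderObject& permanent, RenderView& view, uint32_t chunkCount) {
        if (!permanent.compiled) {
            // First frame (or after invalidation): do the full walk once and cache it.
            permanent.cachedItems = BuildRendItemsFromMesh(chunkCount);
            permanent.compiled = true;
        }
        // Every frame after that: just append the cached items, no recursion needed.
        view.items.insert(view.items.end(),
                          permanent.cachedItems.begin(), permanent.cachedItems.end());
    }

    // If the mesh or material changes, the cache must be rebuilt next frame.
    void Invalidate(PermanentRenderObject& permanent) { permanent.compiled = false; }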
We're pretty much done with the 3D engine, so let me just give you some useful CVars that are very helpful when you work with it. e_BBoxes shows you the octree node and object boxes; it draws the bounding boxes of all objects on screen. e_DebugDraw has something like 20 different debug modes; the most common ones are the LOD modes, which color-code the LODs so you can visualize LOD transitions, and you can also display per-object draw calls, so how many draw calls each object produces in which pass, and so on. If you append the question mark operator you actually get help on these. Then we have the check-occlusion CVar, which turns off occlusion culling and visibility culling if you set it to zero. We also have a special mode for debugging the coverage buffer, which is our occlusion culling system; it's a pretty involved system, but in a nutshell it reads back the depth buffer from the GPU and then, for the next frame, tests all object bounding boxes against that buffer to see whether they're occluded by other objects. You can use that CVar to visualize the coverage buffer on screen. Then there is sys_MaxFPS to fix the frame rate; it only works when you turn off vsync (otherwise the frame rate is vsync-locked), but it's useful for example when you have to debug timing-related problems. If you have some flickering on screen, the first thing I usually do is set sys_MaxFPS to 2 and see if the flickering is periodic; then you know pretty much immediately that there's a one-frame dependency issue. You can turn shadows on and off with e_Shadows (I think it has something like four different values, where you can turn off sun shadows or the shadows of local lights), ca_DrawChr turns off drawing of skinned geometry, e_Entities disables drawing of entities, e_Brushes disables rendering of brushes, e_Vegetation disables vegetation, e_Sun turns the sun off or on, and then you can turn water volumes and the ocean on and off.

Okay, so much for the 3D engine; let's go to the renderer, and this is where it becomes interesting, at least to me. In this section I will present the API layers we have in the renderer, basically the interface functions and classes that let you interface with the graphics APIs, and in the second part we'll have a look at how we render things: we'll look at our graphics pipeline and see what algorithms CRYENGINE implements.

First, the rendering API layers. We have a pretty strict hierarchical system when it comes to the rendering API. At the bottom we have the native layer, which is basically an implementation layer: it contains all the code specific to a certain graphics API, so on DirectX 11 we have the DirectX 11 implementations of the low-level API, on DirectX 12 the DirectX 12 implementations, and so on. Ideally this layer has been written once and will never be touched again, because we don't expect anything to change in there and hopefully there are no major bugs anymore; it's something I really hope you never have to come into contact with.

On the next level up we have the low-level API. This is our high-performance rendering API, so it's not like OpenGL; this layer is more in the spirit of Vulkan and DX12. In OpenGL and DirectX 11 you have an immediate-mode rendering API where you issue commands: bind this texture, do this draw, set this vertex buffer, set this rendering state. That's not how Vulkan and DX12 work. Vulkan and DX12 have these compound objects: you can't say "set the shader", instead you say "set this object", and that object contains the shader, all the depth states, the rasterizer state, the output merger states, the vertex formats and so on; you set the object as a whole. So they grouped a lot of simple functions into bigger objects, and this is reflected in our API as well. It's fairly small, we only have seven classes here, but those are rather big compound objects.

For example, the CDeviceResourceSet is not a single resource but a set of resources: you allocate an object and then you can put any sort of resource into it, textures, buffers, constant buffers and so on, so it's a collection of resources. Then we have what we call a resource layout; that's also a concept that comes from DX12. It is a description of all the resources that a specific draw call needs: if you do a draw on the screen, you need to create a resource layout for that draw, and that layout needs to contain everything you will ever read in the shader, every texture, every buffer, every constant buffer. In DX12 this would be called the root signature. Then we have the device stream set, which is a set of vertex and index streams, and the device command list, which is the interface where you actually issue commands: it's the interface that finally has the draw command, a command for binding resource sets, a command for binding a resource layout and so on. Then we have the pipeline states; that's the object I referred to previously, the one that contains the shader and all the GPU state together. We have two versions of it, the graphics pipeline state object (PSO) and the compute pipeline state object: one is used for general graphics rendering and the other for compute. And the last one is a singleton object, our object factory: any object you want to create needs to go through the object factory. If you want a resource set, you go to the object factory and say "give me a resource set".
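To illustrate the compound-object and object-factory ideas, here is a conceptual sketch with hypothetical types (not CRYENGINE's actual low-level classes): a whole pipeline configuration is described up front and then created, and shared, through a central factory.

    // Conceptual sketch only: a "compound" pipeline state created via a factory.
    #include <cstdint>
    #include <map>
    #include <memory>
    #include <string>
    #include <tuple>

    struct GraphicsPSODesc {
        std::string shader;            // which shader (and technique) to run
        uint8_t     depthFunc;         // depth-stencil state
        uint8_t     cullMode;          // rasterizer state
        uint8_t     blendMode;         // output-merger / blend state
        uint8_t     vertexFormat;      // input layout

        bool operator<(const GraphicsPSODesc& o) const {
            return std::tie(shader, depthFunc, cullMode, blendMode, vertexFormat) <
                   std::tie(o.shader, o.depthFunc, o.cullMode, o.blendMode, o.vertexFormat);
        }
    };

    struct GraphicsPSO {               // the immutable compound object bound as a whole
        GraphicsPSODesc desc;
    };

    // Singleton-style factory: all PSOs are created (and shared) through one place,
    // so identical descriptions map to the same object.
    class DeviceObjectFactory {
    public:
        static DeviceObjectFactory& Get() { static DeviceObjectFactory f; return f; }

        std::shared_ptr<GraphicsPSO> GetOrCreatePSO(const GraphicsPSODesc& desc) {
            auto it = m_cache.find(desc);
            if (it != m_cache.end())
                return it->second;                       // reuse existing compound object
            auto pso = std::make_shared<GraphicsPSO>(GraphicsPSO{desc});
            m_cache.emplace(desc, pso);
            return pso;
        }

    private:
        std::map<GraphicsPSODesc, std::shared_ptr<GraphicsPSO>> m_cache;
    };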
This layer is used by the next layer in the hierarchy: the high-level API builds on top of the low-level API and uses all its classes, but that's pretty much it; we have very few algorithms in the entire CRYENGINE that use the low-level API directly, apart from the next-higher-level API. You are free to use it, so if you really need high-performance code and speed you can, but I have to warn you it's fairly tricky: there are a lot of dependencies and a lot of boilerplate code. Instead, most of the CRYENGINE rendering algorithms use the high-level API, and this is where things are much more programmer-friendly.

We only have four major classes that do drawing and rendering. There is the scene render pass, a kind of high-performance pass that draws the hundreds of thousands of objects sent by the 3D engine; it's a specialized thing. Then we have the full screen pass, which issues a full screen triangle or full screen quad; the primitive pass, which renders multiple primitives, multiple objects, into a render target; and the compute render pass, which issues a dispatch. On the right side, colored differently, we have the resources: the CTexture class, which represents a texture object; the CConstantBuffer class, which represents a constant buffer; CShader; CGpuBuffer; a specialized buffer manager handle; and a sampler state.

So, the important classes, which you already saw on the previous slides: for the resources, the texture, GPU buffer, constant buffer and CShader; in the high-level rendering API the full screen pass, primitive pass, scene render pass and compute render pass; another hierarchy layer above those which we call the graphics pipeline and, below that, the graphics pipeline stages; and on the lower level the resource sets and render passes. A render pass basically represents all the output textures that some draw call has, so it's a collection of render targets and depth targets. Then there's the device stream set (as I said, a collection of vertex and index streams), the pipeline states, the layout and the command list. For exchanging data with the 3D engine we have the SRendItem (if you remember, the 3D engine puts everything that needs to be rendered into the render view in the form of SRendItems) and then the render view itself.

Now for the low-level API. Again, you will likely not come into close contact with this one, but it's good to know some details. On the resource side we have these classes; well, it's three abstracted classes and a bunch of non-abstracted ones. The CDeviceTexture is basically a texture allocated by the graphics API, so it contains the memory for the texture, in a certain format and with a certain usage pattern. Then you have the CDeviceBuffer, which represents a buffer on the GPU. A texture is a formatted piece of memory: you have format operations on it, for example you can do bilinear filtering on it, you can do typed reads (you can load RGBA), you can do sample-compare and so on. A buffer is, just like in C++, a general memory buffer: the GPU doesn't necessarily care too much about the layout of these objects, and on the GPU it also doesn't go through the general filtering hardware, it's just a memory load, basically like in C++.
Textures are typically used when you need sampling operations, interpolation operations, when you want to do all sorts of data remapping; if you just want to read data, if you just want to have structures on the GPU, you would usually use a buffer. So, CDeviceBuffer represents a buffer on the GPU. Then we have sampler states, which configure the texture sampling operations, for example a bilinear sampler state, or wrap, clamp and so on. And then we have the shader; unfortunately we haven't gotten around to abstracting this into a device class, so we still use the DirectX 11 interfaces, and in some places you'll find ID3D11PixelShader or ID3D11VertexShader even in high-level code. That's something we will hopefully fix soon; to make this work on our other platforms, we simply redefine those classes to the platform-specific ones.

In the high-level API we have the CTexture, and that's the object you will likely see quite often. What's important to know about it is that it can be empty: it holds a pointer to the device texture, the actual GPU texture, but this pointer can be empty. That means you're free to allocate empty texture objects and only allocate the memory for them on demand, which makes the high-level code a bit easier to work with. We have various state tracking inside this class: for example each texture knows about streaming, it knows which mipmaps it has streamed in already and which ones it would like to have, and it has name and size properties and so on, so in a debugger you can actually inspect quite a bit of state. There is an invalidation callback mechanism as well, tied to the property that the texture can be empty: as client code you can register a listener on the texture, and whenever the texture decides to drop mips or to completely delete its GPU memory, you get a callback. That is needed in the more complex rendering objects like the full screen pass, which for example has a mechanism to realize if someone externally changed one of the textures it needs to read from or write to.

The creation of CTexture objects happens via GetOrCreate functions on the device object factory. There are various specializations: get-or-create 2D texture, get-or-create texture object, get-or-create render target and so on; there's a depth-stencil target version as well. The "get or create" here means that the renderer, or the device object factory, will do its best to reuse memory: if you create a texture and then delete it again, the device object factory will typically keep it around for a couple of frames, 30 or 40 frames, and if you happen to ask for a texture object with the same size and format in the next frame, it'll just recycle it. This increases the runtime memory overhead, but on the other hand it's quite a speed-up and allows high-level code to allocate and delete things on the fly every frame; the high-level code doesn't need to care about persistence.
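Here is a small, hypothetical sketch of that get-or-create / delayed-release pattern: released textures are parked in a pool for a number of frames, and a later request with the same size and format recycles the parked object instead of allocating a new one.

    // Hypothetical types, not engine code: pooled texture reuse with delayed release.
    #include <cstdint>
    #include <memory>
    #include <vector>

    struct TextureDesc {
        uint32_t width, height;
        uint32_t format;
        bool operator==(const TextureDesc& o) const {
            return width == o.width && height == o.height && format == o.format;
        }
    };

    struct DeviceTexture { TextureDesc desc; };   // stand-in for the real GPU allocation

    class TexturePool {
        struct Entry { std::unique_ptr<DeviceTexture> tex; uint32_t releaseFrame; };
        std::vector<Entry> m_free;
        uint32_t m_frame = 0;
        static const uint32_t kKeepAliveFrames = 30;   // keep released textures ~30 frames

    public:
        std::unique_ptr<DeviceTexture> GetOrCreate(const TextureDesc& desc) {
            for (size_t i = 0; i < m_free.size(); ++i) {
                if (m_free[i].tex->desc == desc) {      // same size + format: recycle it
                    auto tex = std::move(m_free[i].tex);
                    m_free.erase(m_free.begin() + i);
                    return tex;
                }
            }
            return std::unique_ptr<DeviceTexture>(new DeviceTexture{desc});  // real allocation
        }

        void Release(std::unique_ptr<DeviceTexture> tex) {
            Entry e{std::move(tex), m_frame};           // park it instead of freeing
            m_free.push_back(std::move(e));
        }

        void OnEndFrame() {                             // drop entries that got too old
            ++m_frame;
            for (size_t i = 0; i < m_free.size(); )
                if (m_frame - m_free[i].releaseFrame > kKeepAliveFrames)
                    m_free.erase(m_free.begin() + i);
                else
                    ++i;
        }
    };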
Then we have the buffers. There's the CGpuBuffer class, a general-purpose buffer object. It's quite easy to use, I would say almost like a C++ vector: you can put arbitrary data into it, with no restrictions on formats or usage patterns. The tricky thing with all GPU resources is the interaction with the GPU. When you allocate a resource, write into it and pass it to the GPU, the resource is what we call in flight: the GPU will read from it at some point in the future. If you then decide on the render thread to write to it, you can easily produce a race condition where you're writing data while the GPU is reading it at the same time. In the best case that gives you small visual defects, like flickering or some things rendered wrong; in the worst case it will actually crash the GPU. To prevent this we have a multi-buffering mechanism built into this buffer: whenever you update the contents of such a buffer, the underlying system checks whether the memory is actually in use by the GPU, and if yes, it allocates a new block of memory, a new device buffer, and uses that instead, handing it back to you so you can fill the new one. All of this happens behind the scenes, so you don't need to care about it; it's good to know about, but you usually don't have to get in touch with these systems. It also has an invalidation callback mechanism for this case, so when a buffer relocation happens and a new buffer is allocated, callbacks are fired.

Then we have the device buffer manager. This is a special thing just for vertex and index streams, and it exists basically for performance reasons: it's a high-level manager for vertex and index streams which exposes various usage patterns for these streams and implements optimized strategies for managing their memory. The usage strategies are BU (buffer usage) immutable, static, dynamic and transient. As the names suggest, if you allocate a buffer as immutable you're telling the system: I'm allocating this buffer once, I will fill it once, and then it stays static, it never changes. Static is a little less strict: I'm allocating a buffer, I will fill it, and then it will take a long time before I fill it again. Dynamic means more frequent updates: I'm updating this buffer almost every second or third frame, quite often. And then we have transient, which is an even higher update frequency; that's a usage type for fire-and-forget data, data that you prepare once, send to the GPU, and then neither you nor the GPU cares about it anymore. You fill it, send it to the GPU, and the next frame you fill a new buffer.

All these strategies are implemented in specific ways inside the buffer manager. For example, transient is basically one very large buffer: whenever you say "please fill this with contents" it just gives you a new chunk, you fill it, and at the end of the frame it hands it to the GPU and throws it away; the next frame it allocates a new massive buffer. It's a bit of a special use case, but it's used for example when you send vertices that change every frame to the GPU in a vertex stream: you fill it once, the GPU consumes it once, and then it's garbage. Dynamic is less efficient; it's also a sub-allocation mechanism, but here we actually track the usage, so if you allocate a buffer with dynamic usage and fill it, the buffer manager knows it's in use, and when you decide to delete the buffer and throw it away, it gets marked as not used and can be reused.
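As an illustration, here is a simplified sketch of the usage hints and of how a transient-style strategy can work: one large buffer that hands out chunks linearly and is simply reset at the end of the frame. (A real implementation would also keep the previous frame's buffer alive until the GPU has finished reading it; that part is omitted here.)

    // Simplified illustration, not engine code: usage hints and a transient allocator.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    enum class BufferUsage {
        Immutable,   // filled once, never changes
        Static,      // filled rarely
        Dynamic,     // updated every few frames; allocations are tracked and reused
        Transient    // fire-and-forget: filled once, consumed once, gone next frame
    };

    class TransientAllocator {
        std::vector<uint8_t> m_memory;   // "one very large buffer"
        size_t m_cursor = 0;

    public:
        explicit TransientAllocator(size_t bytes) : m_memory(bytes) {}

        // Hand out the next chunk; the caller fills it and forgets about it.
        uint8_t* Allocate(size_t bytes, size_t alignment = 16) {
            size_t aligned = (m_cursor + alignment - 1) & ~(alignment - 1);
            if (aligned + bytes > m_memory.size())
                return nullptr;                       // out of transient memory this frame
            m_cursor = aligned + bytes;
            return m_memory.data() + aligned;
        }

        // End of frame: everything handed out this frame is garbage now.
        void Reset() { m_cursor = 0; }
    };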
The cool feature of this buffer manager, apart from its speed, is the defragmentation support: it has automatic support for defragmenting all the buffers it knows about. To give an example, in Hunt we have something between 200 and 400 megabytes of buffers, and without defragmentation, because we constantly allocate and delete buffers, we end up going out of memory pretty soon due to fragmentation. With this feature enabled we actually shuffle unused buffers around and compact them, and it's a lot more stable; you'll have to believe me, unfortunately.

Then we have CConstantBuffer. If you're familiar with OpenGL, that would be the equivalent of a uniform buffer, for shader constants. This is also a class where we predict the usage will be very high-frequency writes, very often of the fire-and-forget kind, so we have a specialized manager for this one as well. On most platforms we basically have the same sub-allocation scheme as in the regular buffer manager, but specifically for constant buffers: on DX11.1 and upwards, and on Vulkan and GNM. The reason we don't have this on DX11.0 is that it requires pooling of buffers, and we then need to bind these buffers with an offset; you need to be able to tell the graphics API "bind this constant buffer, but starting at this specific offset", and you don't have that in DX11.0. There we let the driver do this work, which is often also quite fast, but we don't have direct control over it, so it can produce side effects we can't control. And this is important to know: the constant buffer is the only resource you can actually bind directly yourself; all other resources, when you want to bind them to the shaders and the GPU, need to be put into a resource set. This is the only one that can be bound directly.

Then the sampler state handle. As I said, this represents a texture sampling configuration. Inside the device object factory we have a global database of sampler states, and whenever you ask for a sampler state you will likely get one that has already been created before, so there's typically a lot of sharing. To make the interface a bit more convenient we have pre-allocated the most common ones and put them into an enum, so you can directly use the default sampler states, point-clamp for example, which gives you a point-clamp sampler, or trilinear wrap, trilinear clamp and so on. If you need a custom state that is not predefined, you can always go through get-or-create. So much about resources.
So this is our class for implementing a full screen pass. If you look at it, it's basically an object with some, I hope, pretty self-explanatory functions to set up state and then to render. Let's go quickly through the functions. The way this is built is that for each full screen pass you allocate a new full screen pass object, so it's heavily object-based. I stole this code from the bloom stage, which does a horizontal and a vertical blur pass, so the bloom stage has two full screen pass objects, one for the horizontal pass and one for the vertical pass: each draw is one object. Each object can be set up with a bunch of state setter functions. For example, the set-primitive-flags call sets some flags; in this case the flag is "reflect shader constants, PS", which tells the system: I, as a programmer, want to set shader constants by name, so I directly want to call the set-constant function with a name and some values to set it in the shader, as simple as it gets. The PS annotation means I only want to do this for the pixel shader, not for the vertex shader; there are other flags to allow this on other shader stages as well.

The next one, set-primitive-type, sets up inside the pass what kind of geometry you want to draw, and we have a list of predefined geometries. I believe the default one is actually the procedural triangle, so this call is obsolete, I believe, I'm not entirely sure. This is typically the fastest one on newer GPUs: it just generates a full screen triangle on the GPU. As a little side note, if you're interested: there are many ways to draw full screen effects. You can draw two triangles to cover the screen, you can draw a full screen quad if the API supports it, or you can do this full screen triangle thing, and it has been shown that on the most recent GPUs the full screen triangle is fastest. The reason is that, for one, if you do two triangles, the edge in the middle between the two triangles will be rasterized twice, and also the cache behavior is best for a single triangle because the rasterizer produces the fragments in the most optimal order. Anyway, side information; this should be the default, and it's the one you should usually use if you want speed.

Then we set the shader we want to use to render: the first argument is the actual shader, the second one is the technique name, so we use the Gaussian blur technique here; that's a predefined name. Then we set the output targets: we put this texture as a render target in slot 0, and we say no, we don't want a depth target. Then we set the textures we want to read from; this sets the scaled HDR target texture on texture slot 0, and we also need to bind a sampler if we want to read from that texture. Then we call begin-constant-update, which instructs the pass that we now begin to set shader constants, and we do that: we set the HDR params constant by name. And then we call execute, and execute will actually perform the draw call. So far, I hope, it's a pretty easy to use interface.

What is very important about all these passes is that they have persistent state: when you set up the pass once, the state is retained for future frames. So in theory you could call all this setup code once and then every frame just call execute, execute, execute, without having to set it all up again. In practice you obviously often have to change some things, like new constants; here, for example, the constants might change, so some of the data is dynamic. That's why we have this dirty mechanism, the if at the beginning: it checks whether the pass is fully set up and whether anything has changed inside the pass. The first time we go through this code the pass hasn't been set up yet, so we do all the setup; in the next frames we just skip all these function calls, because the pass hasn't changed, and that way in subsequent frames we render the entire thing with just three calls instead of ten or twelve.
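To make the walkthrough easier to follow, here is a reconstruction sketch of that usage pattern. The class is a stub so the example is self-contained, and the method, shader, texture and constant names (SetTechnique, GaussianBlur, HDRTargetScaled, HDRParams0, plus the invented "HDRPostProcess", "BloomTempRT" and "LinearClamp" placeholders) are approximations taken from, or made up around, the talk, not the verbatim CRYENGINE API. The point is the shape of the code: set up once, keep the state, and re-run the setters only when the pass is dirty.

    // Reconstruction sketch with a stub pass class; names are placeholders.
    #include <cstdio>
    #include <map>
    #include <string>

    struct FullscreenPassSketch {
        bool dirty = true;                       // becomes true again if inputs change
        std::string shader, technique, renderTarget, inputTexture, sampler;
        std::map<std::string, float> constants;

        bool InputsChanged() const { return dirty; }
        void SetTechnique(const std::string& s, const std::string& t) { shader = s; technique = t; }
        void SetRenderTarget(int /*slot*/, const std::string& rt) { renderTarget = rt; }
        void SetTexture(int /*slot*/, const std::string& tex) { inputTexture = tex; }
        void SetSampler(int /*slot*/, const std::string& smp) { sampler = smp; }
        void BeginConstantUpdate() {}
        void SetConstant(const std::string& name, float v) { constants[name] = v; }
        void Execute() {                          // stand-in for issuing the draw call
            dirty = false;
            std::printf("draw: %s/%s -> %s\n", shader.c_str(), technique.c_str(), renderTarget.c_str());
        }
    };

    // One pass object per draw, e.g. the horizontal blur of a bloom stage.
    void RenderBloomHorizontal(FullscreenPassSketch& pass, float blurAmount) {
        if (pass.InputsChanged()) {               // the "is dirty" check at the top
            pass.SetTechnique("HDRPostProcess", "GaussianBlur");
            pass.SetRenderTarget(0, "BloomTempRT");        // no depth target
            pass.SetTexture(0, "HDRTargetScaled");
            pass.SetSampler(0, "LinearClamp");
        }
        pass.BeginConstantUpdate();               // constants may change every frame
        pass.SetConstant("HDRParams0", blurAmount);
        pass.Execute();
    }

    int main() {
        FullscreenPassSketch horizontalBlur;      // persistent across frames
        for (int frame = 0; frame < 3; ++frame)
            RenderBloomHorizontal(horizontalBlur, 1.0f + 0.1f * frame);
    }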
So this is a pretty decent optimization that you can do, but you don't have to. What does "dirty" mean? The is-dirty check also catches cases where you as a user have edited the shader, where you rewrote the shader code: the system detects that the shader has changed, reloads and recompiles the code, the full screen pass becomes dirty, and you set it up again. The same goes if someone reallocates the input texture or changes its size.

Next in the complexity hierarchy we have the next more complex object, the primitive render pass. It's similar to the full screen pass, except that it renders multiple draw calls into the same target. Each draw call is represented by a CRenderPrimitive object, which you can set up just like the full screen pass before: you have functions to set all the resources, the textures, the shader, the geometry, the render state, and then you can hand it to the primitive render pass and the pass will render it. What's important here is that the pass does not take ownership of the primitive, so you as a client are responsible for buffering those primitives across frames, for keeping them alive until the pass is done rendering.

Here is an example from the clip volume stage; you can already see there's a bit more code now. As a first step we need to prepare the pass: we tell the pass what the output targets are and what the viewport and size are, and then we tell it that we are actually starting to add primitives. Then we fetch a primitive from our list of pre-allocated primitives (this is a reference here, we're just shortening the name), and we set it up with functions similar to the ones on the full screen pass: we set the technique, and we set the render states, which are a bit more complex this time, because we want a greater-equal depth test and we want to enable stencil. Then we set a custom vertex stream, so we pass a handle to a custom stream that holds the vertex data with a custom format and a custom stride; we also set a custom index stream with a custom stride. We set the culling mode, here front-face culling, and we set the stencil state; this is a more complex state, since the stencil state itself is a 32-bit value which encodes all the possible variations you have in stencil: the operation for stencil pass, the operation for stencil fail, the operation for depth fail. Then we set the actual draw information, so we tell the primitive: this is a triangle list, the start index and start vertex are at the beginning of the buffers (offset zero), and we render this many indices.

And then we have to compile the primitive. This is important: the compilation step takes all the setup you have done inside the primitive and converts it into runtime data. For example, it will create a resource set containing all the resources you bound, it will create a pipeline state for the shader and all the GPU state you set up, and so on. Only after you have compiled the primitive can you render it inside a pass, and that is already the next step: we have compiled the primitive, we can now add it to the pass, and when we call the pass execute, the primitive will be rendered to the screen. Done. So this is the next more complex version of a full screen pass; it allows you to render arbitrary geometry, an arbitrary number of geometries, into a single target.
Then we have the equivalent of a full screen pass, but in compute. Just to give you a very broad idea: on GPUs we typically have specialized hardware to deal with vertex data, triangle data and primitive data, so there are hardware blocks that take vertices and the description of those vertices (say, that it's a triangle or a triangle list), rasterize them on the screen and produce a shader invocation for each pixel that intersects those triangles. That's quite a bit of fixed function on the GPU. But the units that execute the shaders are actually fairly general-purpose: the execution units on the GPU are similar to CPUs, except they are extremely wide SIMD machines, so on AMD you have 64 data threads running concurrently, on NVIDIA 32 or 64. And since these units are general-purpose, the graphics APIs, starting with DX11, have evolved to a point where you can run code on them without using all the fixed-function hardware: you can write a shader and run it on this general-purpose computing hardware in this massively parallel environment. That's what compute is. Compute is really great if you have massively parallel tasks, for example coloring each pixel of an image: there are millions of pixels we want to touch, and they all do the same thing; that's perfect for compute. So some things in CRYENGINE, and basically in all high-performance game and rendering engines, are better realized as compute than as graphics pipeline shaders, and that's why we have this compute render pass.

It works very similarly to a full screen pass: you have functions to set the technique and functions to set the resources, set textures, set samplers. You have a function to set an output unordered access view (output UAV); that's something that also came with compute. In a normal graphics pipeline setting you're always tied to the pixel the rasterizer produced, so you can't really output anywhere else; in compute you can write to any location in a buffer or a texture you want, and that's what an unordered access view is: you basically have random scattered writes if you want, and that's exposed via this output UAV function, where you just pass a general buffer. Then again we have a begin-constant-update, and we call set-constant twice to update shader constants. And then we set the dispatch size; this is another compute peculiarity. It configures how the work is grouped into units of threads on this massively parallel system. For example, on AMD GPUs like in the consoles you have 12 compute units, each unit can run 64 threads in one cycle, and each group of 64 threads is called a wavefront; each unit has a buffer of 10 wavefronts in flight, so you get concurrent execution of 12 times 10 times 64 threads on this GPU every single cycle, which is pretty ridiculous. In order to optimally utilize these massive amounts of concurrent threads you need to group them properly, and that's what you do with the dispatch size, in a nutshell; there's a lot more detail if you really need to get into it, but it's a pretty fascinating topic.

There is one special call in here, prepare-resources-for-use; this is something you just have to do before execute at the moment, and it will go away in future versions of CRYENGINE. It has to do with the fact that when you write to resources, or switch resources from reading to writing, you sometimes need to perform transitions; the GPU needs to do operations, and that happens inside this prepare call.
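Here is a small sketch of the usual dispatch-size calculation (not engine code): the shader declares a fixed thread-group size, and the dispatch issues enough groups to cover the whole image, rounding up at the borders. The 8x8 = 64 group size is an illustrative choice matching one AMD wavefront.

    // Illustrative dispatch-size calculation for an image-sized compute pass.
    #include <cstdint>
    #include <cstdio>

    struct DispatchSize { uint32_t x, y, z; };

    constexpr uint32_t kGroupSizeX = 8;   // must match [numthreads(8, 8, 1)] in the shader
    constexpr uint32_t kGroupSizeY = 8;

    DispatchSize ComputeDispatchSize(uint32_t imageWidth, uint32_t imageHeight)
    {
        // Integer round-up division so partially covered tiles at the right/bottom
        // edge still get a thread group.
        DispatchSize d;
        d.x = (imageWidth  + kGroupSizeX - 1) / kGroupSizeX;
        d.y = (imageHeight + kGroupSizeY - 1) / kGroupSizeY;
        d.z = 1;
        return d;
    }

    int main()
    {
        DispatchSize d = ComputeDispatchSize(1920, 1080);
        // 1920/8 = 240 groups in X, 1080/8 = 135 groups in Y, one slice in Z.
        std::printf("Dispatch(%u, %u, %u)\n", d.x, d.y, d.z);
    }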
So those were the three easy objects, the ones that are very simple to use. Now we're getting to the more complex stuff, and the reason it's complex is that this is where we actually do the heavy lifting: if the 3D engine decides to send a million objects for rendering, this is what handles that in a performant way. We call it the scene render pass. It has multi-threading capabilities, meaning it can automatically split the work and perform it on multiple threads, and it's built for high-speed rendering of literally hundreds of thousands to millions of objects.

To get to that sort of speed we need some quite specific constraints on the rendering data. One of these constraints is reusability of the same resource layout for all draws: whenever you have a lot of objects that you want to push into a scene pass, they all need to share the same resource layout, the same description of how resources are bound to the shaders. The way this is currently hard-coded is that we have one resource set (remember, a resource set is a group of resources) which is set at per-pass frequency, one resource set that is set at per-material frequency, one resource set that is set at per-draw frequency, and one inline constant buffer per draw. These are the only four bindings you can have, and they are at these three frequencies. When rendering such a scene pass, these resources are bound exactly as described: when we start the pass we set the per-pass resource set once and keep it set, we don't overwrite it; whenever a new material needs to be rendered we set the per-material resource set, and it stays bound until we have a material switch. You can already see one good operation that we should do in the engine, and that we actually do: sort all the draw calls by material, so we group them together, which means fewer material switches. Then we have the per-draw things: for every single draw call we can have one special resource set, and this is where all the draw-call-specific data goes. It's currently used for example for skinning, where we put dual quaternion buffers for the skinning transforms into it, or for tessellation, where we have another per-draw buffer containing the adjacency information for each triangle. And then we have the per-draw inline buffer, which is the shader constants for each draw; typically that's stuff like the object matrix and so on. As I said, you can only draw things that follow this setup of resources, but you can draw a lot of them, so it's restrictive but performant. An example would be the opaque pass in the scene G-buffer stage.

Now, something I haven't explained yet. I told you that the 3D engine sends the renderer a render view, which is populated with SRendItems, and that each SRendItem holds all the information needed to issue a draw call. I omitted a step here: the SRendItem holds compressed information to do a draw call, but the draw call typically also needs resources and GPU state, so we have to have a process that converts each SRendItem into a representation that can be accepted by a scene render pass. That's where the compiled render object comes into play. Before we give the render view to the renderer, we call a function called compile-modified-render-objects, and this function loops over all SRendItems and creates and fills a compiled render object for each render item.
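Here is an illustrative sketch, with hypothetical types, of that binding-frequency scheme: draws sorted by material, the per-pass set bound once, the per-material set rebound only on a material switch, and the per-draw set plus inline constants bound for every draw.

    // Hypothetical types, not engine code: per-pass / per-material / per-draw binding.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct ResourceSet { int id; };

    struct CompiledDraw {
        uint32_t     materialId;
        ResourceSet* perMaterialResources;
        ResourceSet* perDrawResources;      // e.g. skinning or tessellation data
        float        objectMatrix[16];      // goes into the per-draw inline constants
    };

    void BindResourceSet(const char* slot, const ResourceSet* rs) {
        if (rs) std::printf("bind %s set %d\n", slot, rs->id);
    }

    void ExecuteScenePass(ResourceSet& perPass, std::vector<CompiledDraw>& draws) {
        // Fewer material switches: group draws that share a material.
        std::sort(draws.begin(), draws.end(),
                  [](const CompiledDraw& a, const CompiledDraw& b) { return a.materialId < b.materialId; });

        BindResourceSet("per-pass", &perPass);          // bound once for the whole pass

        uint32_t boundMaterial = UINT32_MAX;
        for (const CompiledDraw& draw : draws) {
            if (draw.materialId != boundMaterial) {     // rebind only on a material switch
                BindResourceSet("per-material", draw.perMaterialResources);
                boundMaterial = draw.materialId;
            }
            BindResourceSet("per-draw", draw.perDrawResources);
            // ...upload draw.objectMatrix into the per-draw inline constant buffer,
            // then issue the draw call here.
        }
    }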
Inside the compiled render object we store all the data we need for rendering: for example a pipeline state, or actually a bunch of pipeline states, a resource set for the extra resources, a constant buffer and so on. This is basically the last step in the hierarchy of data transfer: the 3D engine goes over the scene graph and does its visibility and occlusion culling on all the objects; for the objects that survive these tests we go into the render mesh and loop over the render chunks; each chunk is converted into an SRendItem that goes into the render view; and before we render, the SRendItems are converted into compiled render objects, and only then can we actually interface with the graphics API.

So much for the interface. Now, we have a lot of rendering algorithms, a lot of rendering code in the engine, and we need some sort of hierarchical structure to organize it; this slide roughly describes how we do that. At the bottom of the hierarchy we have the rendering algorithms, which are normally implemented with the high-level rendering API. For example, the bloom algorithm we saw before would have two full screen passes, one for a horizontal blur and one for a vertical blur. The algorithm is then put into a pipeline stage; for bloom we have a bloom stage, which contains these two passes and nothing more, but there are cases where a pipeline stage contains more than one algorithm. And then all the pipeline stages together that we need for rendering a frame form a graphics pipeline.

In 5.6.3 we currently have four graphics pipelines. We have the standard graphics pipeline, which consists of all the stages, components and algorithms we need to do the full high-quality rendering that CRYENGINE does; it is a deferred rendering pipeline with all the post effects that CRYENGINE offers, all the deferred effects and so on.

A quick excursion: traditionally, when graphics chips and APIs were first built, everything was rendered forward. That means you issue a draw call to the GPU, the draw call runs a shader, the result is put into a buffer, and the buffer is presented at the end. That's forward rendering, and it implies that inside this shader you need to read the material attributes for the draw call, loop over all the lights in the scene and calculate the material-light interaction. If you look at this in a computational complexity sense, it's an order of number-of-draws times number-of-lights algorithm. Deferred rendering splits this into two sub-processes: the first renders only the geometry and outputs the geometry information into a buffer; once that's done, the second loops over all the lights, applies them to the generated buffer and produces the image. Computationally this has complexity on the order of number-of-objects plus number-of-lights. In big scenes with a lot of lights this makes a massive difference (number of lights times number of objects versus number of lights plus number of objects); in small scenes it pretty much doesn't matter. CRYENGINE supports both modes, but since in the standard graphics pipeline we typically deal with highly complex and visually very complete scenes, we prefer the deferred rendering algorithms. When we render on mobile, for example, we have far fewer lights and far fewer draw calls, and there we actually tend to prefer the forward pipeline.
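The complexity argument can be made concrete with a toy sketch (purely illustrative, not engine code): forward shading evaluates every draw against every light, while deferred shading writes the geometry attributes once per draw and then runs the lighting as a separate pass whose cost scales with the lights (times screen pixels), not with the number of draws, so the two costs add instead of multiply.

    // Toy comparison of forward vs. deferred work, counting "units of work" only.
    #include <cstdio>
    #include <vector>

    struct Draw  { int id; };
    struct Light { int id; };
    struct GBufferSample { int drawId; };   // stand-in for normals/albedo/etc.

    void ForwardRender(const std::vector<Draw>& draws, const std::vector<Light>& lights) {
        long shadingEvaluations = 0;
        for (const Draw& d : draws)
            for (const Light& l : lights) {   // every draw shades against every light
                (void)d; (void)l;
                ++shadingEvaluations;
            }
        std::printf("forward:  %ld evaluations\n", shadingEvaluations);   // draws * lights
    }

    void DeferredRender(const std::vector<Draw>& draws, const std::vector<Light>& lights) {
        long work = 0;
        std::vector<GBufferSample> gbuffer;
        for (const Draw& d : draws) { gbuffer.push_back({d.id}); ++work; }  // geometry pass
        for (const Light& l : lights) { (void)l; ++work; }  // lighting pass over the G-buffer,
                                                            // independent of the draw count
        std::printf("deferred: %ld units of work\n", work);                 // draws + lights
    }

    int main() {
        std::vector<Draw>  draws(10000, Draw{0});
        std::vector<Light> lights(500, Light{0});
        ForwardRender(draws, lights);    // 5,000,000
        DeferredRender(draws, lights);   // 10,500
    }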
But back to the pipelines we currently have. The standard pipeline is a deferred one with the full set of effects and post effects, so the full feature set of CRYENGINE. Then we have a minimal graphics pipeline: a forward-only pipeline with a very limited set of effects, used for example in Sandbox for object preview viewports, or for ocean reflections and so on. We have a specialized character tool pipeline, basically a pipeline built specifically for the character preview viewport (don't ask why we had to do that). And then we have a mobile graphics pipeline, which is currently work in progress; parts of it are already in 5.6.3, and it is specialized for mobile devices.

Let's look at the standard graphics pipeline. These are all the stages we have in there, ordered not by importance but alphabetically by name; if you count them, it is 36 stages, so it is a fairly complex thing. Again, I will not be able to cover everything, but I will try to give you at least the most important parts conceptually. When we render a frame, we start on the left: we take all the draw calls stored in the render view and generate the G-buffer out of them. That means we bind three G-buffer targets (I will go into the details a bit later), draw all the opaque draws for the G-buffer, and output the material and surface attributes into specific buffers. Once this is done, we generate shadow maps, so for each of the shadow views we render a depth map with all the objects in it. Then we build the deferred effects, all the screen-space effects, for example screen-space directional occlusion (SSDO), screen-space reflections (SSR), the shadow mask and so on: these are all effects that, in the deferred context, you can run in screen space and that depend on the G-buffer and maybe the shadow maps and the deferred data.

Then we go to the tiled shading stage. This stage is the shading part of deferred rendering, so this is where we take all the geometry attributes, loop over all the lights, and light each pixel in the image. Unfortunately I cannot make this image much bigger, but you can already see that the colors look a bit weird; this is because the output of this stage is in linear space and in high dynamic range, so you can easily have color values of five or six thousand for the sun and values down around ten to the minus five, a massive range of intensities. We typically store this in 16-bit floating-point precision. When this image is generated, we have basically processed and fully lit all the opaque geometry that can be done with the G-buffer, so this is a finished image in HDR space for the opaque, G-buffer-compatible stuff. On top of that we add fog (distance fog) and forward rendering, so particles and transparent stuff like glass and water. After that we are pretty much done with the geometry pipeline: at this point we have rendered everything the scene has in terms of geometry, and we go into post-processing.

The post-processing pipeline is again very complex, but the most important parts are these: first we take the HDR image and convert it into something that normal monitors can actually understand, which is done in the tone mapping stage; then we have color grading, which is artist-configurable to allow setting a color mood, so you can change all the colors in the image, much like in film processing.
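The tiled shading step in that walkthrough is worth a small sketch. The real implementation is a compute shader working on 8x8 screen tiles (as mentioned again later in this talk); the version below is a generic, heavily simplified illustration with made-up resource names and placeholder helpers, not CRYENGINE code. The idea is simply: one thread group per tile, read the G-buffer attributes for your pixel, then loop over the lights binned into that tile.

```hlsl
// Generic tiled-shading sketch -- resource names, layout and helpers are illustrative.
#define TILE_SIZE           8
#define MAX_LIGHTS_PER_TILE 256   // made-up per-tile capacity

struct Light          { float3 position; float3 color; float radius; };
struct SurfaceAttribs { float3 normal; float3 albedo; float smoothness; };

Texture2D<float4>       g_GBufferA       : register(t0); // e.g. normals + lighting model id
Texture2D<float4>       g_GBufferB       : register(t1); // e.g. albedo
Texture2D<float4>       g_GBufferC       : register(t2); // e.g. reflectance + smoothness
StructuredBuffer<uint>  g_TileLightLists : register(t3); // per-tile light count + indices
StructuredBuffer<Light> g_Lights         : register(t4);
RWTexture2D<float4>     g_HDROutput      : register(u0); // 16-bit float target in practice

uint g_NumTilesX;   // number of 8x8 tiles per row

SurfaceAttribs DecodeGBuffer(float4 a, float4 b, float4 c)
{
    SurfaceAttribs s;                      // placeholder decode
    s.normal = a.xyz; s.albedo = b.rgb; s.smoothness = c.a;
    return s;
}

float3 EvaluateBRDF(SurfaceAttribs s, Light l)
{
    return s.albedo * l.color;             // placeholder material-light interaction
}

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void TiledShadingCS(uint3 dispatchId : SV_DispatchThreadID, uint3 groupId : SV_GroupID)
{
    uint2 pixel = dispatchId.xy;

    // Reconstruct the surface attributes for this pixel from the G-buffer.
    SurfaceAttribs surf = DecodeGBuffer(g_GBufferA[pixel], g_GBufferB[pixel], g_GBufferC[pixel]);

    // Walk the light list that was culled for this 8x8 tile.
    uint tileIndex  = groupId.y * g_NumTilesX + groupId.x;
    uint listOffset = tileIndex * (MAX_LIGHTS_PER_TILE + 1);
    uint lightCount = g_TileLightLists[listOffset];

    float3 radiance = 0;
    for (uint i = 0; i < lightCount; ++i)
    {
        Light light = g_Lights[g_TileLightLists[listOffset + 1 + i]];
        radiance += EvaluateBRDF(surf, light);
    }

    g_HDROutput[pixel] = float4(radiance, 1);
}
```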
Then, as the last stage, we have our anti-aliasing (post-processing anti-aliasing), and ta-da, we are done: the final frame. How long this process may take depends on the game you are making: if you run at 30 fps you have a budget of 33 milliseconds, at 60 fps it is 16 milliseconds, and so on; in VR you usually have a budget of 11 or 12 milliseconds. And you do all of these processes in that time, so it needs to be quite efficient.

I will now give you a bit more detail on these stages, because they are the most important ones; later we will look at a capture of a frame and, if you are interested, dig a bit deeper to see exactly how this stuff runs.

Let's start with the G-buffer. As I said, this renders compiled render objects from the render view into targets that store material data. We have three different layout modes for this buffer. The first one is our standard lighting model: in the RGB components of the first target we store the normal in eight bits, and to get more precision out of it we compress it into a best-fit encoding before storing it. In the W component, the alpha of this render target, we store the lighting model, which occupies two bits; for this mode we store a zero. Then we have the surface albedo, the surface reflectance (chroma-encoded in YCbCr) and the surface smoothness. For lighting model one we have support for transmittance and subsurface scattering: we store almost the same data as before, except that in the alpha channel of the second target we store the subsurface scattering profile ID. The engine defines various subsurface scattering profiles, for example for skin and for marble (there is a third one I am not sure about right now), and here we store the index of the profile to use; we also replace the reflectance with transmittance, again encoded. And then we have the third G-buffer layout, which is used for parallax occlusion mapping with self-shadowing: again the same components as before, normal and albedo, but the lighting model is two now, and in the blue channel of the third buffer we store the parallax occlusion mapping self-shadowing contribution.

Okay, the shadow map stage. This is where we perform shadow rendering, again based on the render view: the render view gives us the shadow frustums, and this stage converts them into depth maps. For the sun we have cascaded shadow maps; in the default configuration that means five shadow maps covering the view frustum, with bigger and bigger extents the further away they are from the viewer. These can be split into dynamic cascades and static, cached ones: the dynamic ones are re-rendered every frame, whereas the cached ones are updated only when requested, which is basically an optimization. For local lights (point lights, area lights, omni lights) we have what we call a shadow pool. That is a gigantic depth render target, and each light, or each side of an omni light, gets a chunk of that target. The reason we do this is that, based on how far the light is from the viewer, we can scale the resolution up or down so we do not waste space and processing time. We also have a controlled caching scheme in there: on the light you can configure the update rate, i.e. how often its shadow maps are updated.
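To give a feel for how the cascaded sun shadows are consumed later in the pipeline, here is a generic cascade-selection and sampling sketch. This is not CRYENGINE's shadow code: the resource names, the fixed cascade count and the conventional (non-reversed) depth comparison are assumptions for illustration only.

```hlsl
// Generic cascaded-shadow-map sampling sketch -- illustrative only.
#define NUM_CASCADES 5   // the default sun setup described above

float4x4           g_CascadeViewProj[NUM_CASCADES];
Texture2DArray     g_SunCascades : register(t0);   // one depth slice per cascade
SamplerState       g_PointClamp  : register(s0);
static const float g_DepthBias = 0.0015;

float SampleSunShadow(float3 worldPos)
{
    // Walk from the finest cascade outwards and use the first one containing the point.
    // The sun is directional, so the projection is orthographic and no perspective divide is needed.
    for (uint c = 0; c < NUM_CASCADES; ++c)
    {
        float4 lightPos = mul(g_CascadeViewProj[c], float4(worldPos, 1.0));
        float2 uv = lightPos.xy * float2(0.5, -0.5) + 0.5;

        if (all(uv >= 0.0) && all(uv <= 1.0))
        {
            float storedDepth = g_SunCascades.SampleLevel(g_PointClamp, float3(uv, c), 0).r;
            return (lightPos.z <= storedDepth + g_DepthBias) ? 1.0 : 0.0; // 1 = lit, 0 = shadowed
        }
    }
    return 1.0; // outside all cascades: treat as unshadowed
}
```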
Next is our actual lighting stage, what we call the tiled shading stage. It loops over all the pixels on the screen and evaluates the light-surface interaction for each pixel. It does this by first reading the attributes from the G-buffer, then looping over all the lights that affect this pixel, evaluating the BRDF for each light based on these surface attributes, accumulating all the light contributions, combining them with the previously calculated deferred effects, and outputting the result. The end result, as I showed you before, is a fully lit version of all the content rendered up to now. This is implemented as a compute shader for performance reasons, and it runs on 8x8 screen tiles.

Next we have the forward scene passes. Remember, in our pipeline we are now basically here: we have covered the first half and are moving into the last section of geometry rendering, and then into post-processing. As I said before, this is the last section of geometry rendering, so it renders forward basically everything that cannot go into the G-buffer. It is split into multiple passes. We have forward opaque, which, if you think about it, is actually a curious thing: why wouldn't we put opaque stuff into the G-buffer? The reason is that the G-buffer layout is quite restrictive, and we have some draw calls that simply need more material attributes. Hair, for example, has a very specific shading model with very specific inputs, so we just cannot fit it into the G-buffer; that is why we render it in this pass after the G-buffer. The good thing is that since these shaders render opaque stuff, we still have access to deferred effects like the shadow mask or SSDO, so we can still support these effects on those shaders without much extra cost. After this we have transparent rendering, where we render things like particles, glass, fog and so on. These shaders render transparent stuff, so we cannot build deferred data for them; if they need to receive shadows, they have to do all the shadow map sampling themselves, which is fairly expensive, so if you want full lighting and effect quality on these effects, it can get expensive very quickly. The transparent stuff is further subdivided into before-water and after-water. This is to get a correct rendering order: we first do the before-water part, then render the water surface on top, and then the after-water part, so that everything blends approximately correctly. And that is it for geometry rendering.

Next we have the tone mapping stage. This is where we actually convert the HDR data into LDR data for display on regular monitors; we do not yet have a pipeline for HDR monitors, that is also in the process of being built. By default we use filmic tone mapping with Hable's curve, a well-known remapping in the industry. The controllable parameters of this curve are exposed in Sandbox in the Environment Editor, so you can for example control the steepness of the curve, the scales and so on.

Then we have the color grading stage, which is done after tone mapping. Each color can be remapped by an artist-driven color chart; it is actually a color volume, basically a simple lookup table in the engine. We support two modes for this lookup table: a static lookup table, which is set up in the Environment Editor and used exactly as configured, and a dynamic way of setting up the color charts via Flow Graph, where you place a node, set a path to the texture and set transition times; the color charts are then blended and animated over that transition.
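Going back to the tone mapping stage for a moment: the filmic curve mentioned there is John Hable's well-known operator, so a small sketch is easy to give. The constants below are the commonly published defaults, not the values the engine ships with; in the engine the equivalent parameters are the ones exposed in the Environment Editor, and the actual shader code will differ.

```hlsl
// John Hable's filmic curve with the commonly published default constants.
float3 HableCurve(float3 x)
{
    const float A = 0.15; // shoulder strength
    const float B = 0.50; // linear strength
    const float C = 0.10; // linear angle
    const float D = 0.20; // toe strength
    const float E = 0.02; // toe numerator
    const float F = 0.30; // toe denominator
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F;
}

// Map an HDR color into displayable range; exposure and white point are the tweakables.
float3 TonemapFilmic(float3 hdrColor, float exposure)
{
    const float whitePoint = 11.2;                  // linear value that maps to white
    float3 curved     = HableCurve(hdrColor * exposure);
    float3 whiteScale = 1.0 / HableCurve(whitePoint);
    return saturate(curved * whiteScale);           // output/gamma transform not included
}
```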
Then we have anti-aliasing as the last step in the pipeline. We currently support four different anti-aliasing algorithms. One is SMAA 1x; SMAA stands for subpixel morphological anti-aliasing. It works by looking at the image and trying to detect edges: it looks for known patterns like L shapes, diagonal lines and straight lines, and if it detects one of those shapes it reconstructs the edge by filling in pixels. The base version does just that. Then we have the 1TX version, which on top of that adds a temporal reprojection component: we look up the color of the pixel in the previous frame and merge it with the current pixel, and to keep temporal coherency we clamp the merge, basically per component. The 2TX version does a more sophisticated clamp; the details are pretty deep technical stuff, to be honest, but it works slightly better. And then we have purely temporal reprojection anti-aliasing, which does not do the edge detection and reconstruction but purely blends the current data with data from previous frames. A pretty cool feature is the anti-aliasing debug CVar (r_AntialiasingModeDebug): as in the image, it gives you a zoomed-in part of the frame, and the little dot actually shows you the sampling pattern. When we do temporal reprojection, we typically jitter the image slightly every frame to get a slightly different rasterization and more data for the temporal averages. I would encourage you to try this out, it is a pretty cool feature.

This pretty much concludes the section about the graphics pipeline and the graphics interfaces, so let me give you some useful CVars again. The r_Driver CVar lets you select the graphics API; you can choose DirectX 11, Vulkan or DX12 here. r_Width and r_Height configure the width and height of the rendered image. r_ShowRenderTarget is a very helpful tool: it lets you visualize any render target in the system on screen, by name. r_DisplayInfo changes the display info text in the top right. r_DebugGBuffer shows you the G-buffer targets on screen. e_ShadowsCascadesDebug gives you color-coded shadow cascades. r_Profiler shows the frame profiler on screen. r_Wireframe gives you wireframe drawing. r_AntialiasingMode selects the anti-aliasing algorithm. r_DeferredShadingLights turns off local lights in deferred or tiled shading, which is very useful for performance checks and actually for bug fixing as well, for example when checking whether lights are causing flickering or divergence. r_DeferredShadingEnvProbes disables environment probes. r_DeferredShadingTiledDebug shows you the light overlap for each screen pixel: it gives you a color-coded map of how many lights need to be evaluated per pixel. And then we have r_UseZPass, which lets you select whether or not to use the depth pre-pass; a depth pre-pass means we issue all the draw calls that go into the G-buffer but render out only depth, to speed up early depth rejection and to shade each geometry attribute only once per pixel.
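Before moving on, a quick sketch of the temporal component of those anti-aliasing modes: they boil down to reprojecting last frame's color and clamping it against the current frame's neighborhood before blending. The version below is a generic illustration with made-up resource names and a made-up blend factor; CRYENGINE's actual SMAA TX and temporal shaders are more involved.

```hlsl
// Generic temporal reprojection with a per-component neighborhood clamp -- illustrative only.
Texture2D<float4> g_CurrentColor  : register(t0);
Texture2D<float4> g_HistoryColor  : register(t1);
Texture2D<float2> g_MotionVectors : register(t2);
SamplerState      g_LinearClamp   : register(s0);

float4 TemporalResolvePS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    int2   pixel   = int2(pos.xy);
    float3 current = g_CurrentColor[pixel].rgb;

    // Build a per-component min/max box from the 3x3 neighborhood of the current frame.
    float3 boxMin = current;
    float3 boxMax = current;
    [unroll]
    for (int y = -1; y <= 1; ++y)
    {
        [unroll]
        for (int x = -1; x <= 1; ++x)
        {
            float3 c = g_CurrentColor[pixel + int2(x, y)].rgb;
            boxMin = min(boxMin, c);
            boxMax = max(boxMax, c);
        }
    }

    // Reproject: fetch last frame's color where this pixel was, clamp it into the box
    // so stale history cannot ghost, then blend it with the (jittered) current sample.
    float2 velocity = g_MotionVectors[pixel];
    float3 history  = g_HistoryColor.SampleLevel(g_LinearClamp, uv - velocity, 0).rgb;
    history         = clamp(history, boxMin, boxMax);

    const float blendFactor = 0.1;  // weight of the current frame (made-up value)
    return float4(lerp(history, current, blendFactor), 1.0);
}
```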
[Answering audience questions] The games we shipped so far, The Climb and Robinson, use the deferred renderer, and we are currently working on a port of The Climb for Oculus Quest, where we are using the forward renderer; so it really depends on the game setting and on the performance of the GPU as well. No, the split is inside the transparent section; it would require changes in the pipeline, but that should be fairly easy to do. It would have some drawbacks, though: you would not be able to see anything below the water surface. Yes, of course, our transparent shaders can use the same data structures as the tiled shading algorithm, which is, per screen tile, a list of the lights that intersect that tile; the water shader actually has this implemented. I think in the Material Editor there is even an option, I think it is called PBR, where the shader loops over all the lights that intersect the pixel and calculates their contribution. It uses the data structures of tiled rendering: we need a light list for each tile anyway, and that list can also be used in other passes, it is not specific to the tiled algorithm. This stuff is super expensive though, so if you look at a very wavy ocean with tons of overlapping wave geometry, you will likely be in trouble.

Let's move on to the shader system. As I said before, we have roughly sixty thousand lines of shader code written in HLSL, spread across 63 different shaders, and we have a split into include files and actual code files, like we have in C++: the actual modules are stored in the .cfx files, and shared shader code, shared functions and shared techniques are stored in the .cfi files. Conceptually we can split shaders into two categories: scene object shaders and pipeline utility shaders. Note that this split is not hard-coded, and we have shaders with both kinds of functionality, but from a conceptual point of view I think it is easier to understand this way.

Scene object shaders are the ones available to artists in Sandbox: you find them in a drop-down and can apply them to arbitrary objects in the scene. They provide techniques for the supported scene passes; as you heard, objects that come from the 3D engine always go into scene passes, so these shaders need code specifically for those passes. These are the four commonly used techniques: a Z pre-pass technique, which runs a depth-only pass; Technique Z, which runs when the object is rendered into the G-buffer; Technique ShadowGen, which runs when the object is rendered into a shadow map; and Technique General, which is used for general forward rendering. If a shader does not implement one of these techniques, an object that binds this shader will simply not be rendered in the corresponding scene pass. And, importantly, since they are bound to scene passes, their inputs need to conform to the scene pass conventions, the per-pass resources, the per-material resources and so on. Examples of these are the Illum shader, the Vegetation shader, the HumanSkin shader, the Hair shader and so on.

Then we have pipeline utility shaders. These are something graphics programmers typically come into contact with when they modify the post-processing pipeline or internal functionality, for example code to downscale images. They are not exposed to the UI and have no restrictions when it comes to data inputs and outputs: as long as the layout in the shader matches the one in the C++ code, you are fine. Typical examples are DeferredShading.cfx, PostEffects.cfx, the Scaleform shaders and so on.
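As a tiny example of what such a pipeline utility shader might look like, here is a hypothetical full-screen downscale pass. It is not taken from the engine; it only illustrates the point that these shaders have no fixed conventions beyond a layout agreed between the shader and the C++ side.

```hlsl
// Hypothetical utility shader: downscale an image to half resolution in a full-screen pass.
Texture2D    g_SourceTex   : register(t0);
SamplerState g_LinearClamp : register(s0);

cbuffer DownscaleParams : register(b0)   // layout only has to match the C++ side
{
    float2 g_SourceTexelSize;            // 1 / source resolution
};

float4 DownscalePS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // 2x2 box filter over the destination pixel's footprint in the source image.
    float2 o = 0.5 * g_SourceTexelSize;
    float4 c = g_SourceTex.SampleLevel(g_LinearClamp, uv + float2(-o.x, -o.y), 0)
             + g_SourceTex.SampleLevel(g_LinearClamp, uv + float2( o.x, -o.y), 0)
             + g_SourceTex.SampleLevel(g_LinearClamp, uv + float2(-o.x,  o.y), 0)
             + g_SourceTex.SampleLevel(g_LinearClamp, uv + float2( o.x,  o.y), 0);
    return 0.25 * c;
}
```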
Our shader system makes heavy use of a permutation system. This means we have a special, preprocessor-like syntax with #if / #endif that you can plug into your shaders to selectively enable parts of them. For example, here is a section from the Illum shader with an #if block for the blend layer feature. The parser automatically makes two versions of this shader: one where the blend layer is disabled, where it simply strips out all the code inside the #if, and a second where the blend layer is enabled and all the code for this permutation is included. So this shader generates two versions, two permutations.

This is heavily used by two major groups of flags. First we have the material permutation flags; these are the permutations available to artists in Sandbox. They are declared in the shader's .ext file, for example Illum.ext, which you find together with the shaders two folders higher in the hierarchy. These files contain a list of Property blocks, and each one defines one permutation flag. For example, this property is named detail mapping, which produces a Detail Mapping checkbox in the Material Editor (you can see it here in the legacy Material Editor), so as an artist you can enable this permutation by clicking the checkbox in Sandbox. Material permutation flags occupy one bit inside a 64-bit integer, which means we can have at most 64 permutation flags per shader; that is what the mask is about, it specifies which bit is used, and you need to be a bit careful not to give multiple permutation flags the same mask, because the shader system would not be able to distinguish them. We also have a UI name and a description, and that is pretty much it.

Then we have the runtime permutation flags. These are for programmers; they are typically global and shared across all shaders. Again they live in an .ext file, but this time a global one called RunTime.ext, which lists all the possible permutations; we have only 64 of them, one bit per permutation. What is important to know here: shaders can only see these flags if they are mentioned specifically in the precache tags, as down here. So if you add a runtime flag to a shader, make sure you go to RunTime.ext and copy the corresponding precache line for your shader, otherwise you will never see it.

The shader we use most commonly in CRYENGINE is Illum, by far; I would guess it can represent ninety percent of all the hard surfaces we see on a daily basis. It uses a standard microfacet model with industry-standard shading terms, and it is the one used most often. Vegetation is pretty common as well; it has a specialized translucency model and specialized custom vertex animation for grass bending, so the grass moves in the wind and so on. Then we have Hair, which has a specialized BRDF for hair rendering, Glass, which has specialized glass shading methods, and so on. But these are the most common ones, so when in doubt, always use Illum.
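To illustrate the permutation syntax, here is a hedged sketch in the spirit of the Illum example above. The flag names, resources and surrounding code are placeholders for illustration; the real flags are the ones declared in Illum.ext and RunTime.ext.

```hlsl
// Illustrative permutation block -- flag names, resources and helpers are placeholders.
Texture2D    diffuseMap       : register(t0);
Texture2D    secondDiffuseMap : register(t1);
Texture2D    blendMap         : register(t2);
SamplerState materialSampler  : register(s0);

half4 GatherDiffuse(float2 uv)
{
    half4 diffuse = diffuseMap.Sample(materialSampler, uv);

#if %BLEND_LAYER
    // Only compiled into the permutation where the artist ticked the corresponding
    // checkbox in the Material Editor (a material permutation flag from the .ext file).
    half  blendMask   = blendMap.Sample(materialSampler, uv).r;
    half4 secondLayer = secondDiffuseMap.Sample(materialSampler, uv);
    diffuse = lerp(diffuse, secondLayer, blendMask);
#endif

#if %_RT_ALPHATEST
    // Runtime permutation flags (the %_RT_* ones) are toggled from C++ at runtime and
    // must appear in the shader's precache line in RunTime.ext to be visible here.
    clip(diffuse.a - 0.5);
#endif

    return diffuse;
}
```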
Shaders are compiled by standard third-party tools, a different one for each API: for DirectX we use fxc (DX11 and 12), for Vulkan we use DXC, and for PS4 we have yet another compiler. This compiles the shader source code into the driver-specific representation, which is then compiled yet again into a GPU-specific variation inside the driver. What is important to know is that we only have direct compilation on Windows PC; all other platforms require the remote shader compiler. The remote shader compiler is a separate tool distributed with CRYENGINE; you can run it from Tools/RemoteShaderCompiler (the CrySCompileServer). It is a network tool: it listens on a port, your local engine connects to it, the tool compiles the shader for you and sends the compiled shader blob back.

If you want to ship a game with CRYENGINE, you also want to ship the shaders with the game, because each time a player runs the game and a shader is missing, the compilation causes a stall that interrupts the user experience, which is quite annoying. So we have a mechanism to pre-compile shaders. First of all, the most commonly used shaders already ship with CRYENGINE; they are in the shaders cache .pak and shaders bin .pak files and so on. If you have additional ones, you can generate those as well with an offline process, which works as follows: each time the remote shader compiler is asked to compile a shader combination, it stores the request to a file, and at the end of the day you can gather all the requests received that day, feed them into an offline process, and generate the shader cache from that. The offline generation has one nice property: we have knowledge of all the shaders that are needed, so we can do a bit of filtering and optimization across different shaders. We also have a caching mechanism for locally compiled shaders: if you run the engine on your machine and hit a shader that has not been compiled yet, then once you get the compiled shader, either from the remote compiler or from your local compiler, the compiled shader blob is stored in the user folder, so the next time you run the engine it is already there pre-compiled.

And I think that is it so far, so we can fire up Sandbox and do a bit of shader editing. To get shader editing to run we need a few steps. Navigate to the engine installation under C:\Program Files\Crytek\CRYENGINE Launcher (the 5.6 engine in my case) and open system.cfg, and in here you add a line enabling r_ShadersEditing. This puts all shaders into editing mode, where you can modify the shader text and the engine will recompile it. Save this, go back to the folder where the engine resides, go into the Engine folder and unpack the shaders .pak file; this should give you a folder with all the text files for the shaders in it. Then, to be safe, please delete everything that still has "shaders" in the name, and we should be set. Now we can go back to the Crytek folder, go to the GameSDK 5.6 folder and run the Editor.

Okay, let's make a new level; the default settings for the height map and the terrain work fine. Let's add some object: go to Create, Brush, and add the simple brush into the scene. Now let's inspect this object. Bring up the Material Editor (Tools, Material Editor); it is a bit difficult on one screen, so I'll put it over here. In the hamburger menu go to File, Pick Material From Scene, and just pick, say, the roof material. In my case this is, as expected, the Illum shader. Let's have some fun with that.
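For reference, the configuration part of that setup comes down to a single line in the engine's system.cfg; the value shown here is an assumption, since the video only says to add the r_ShadersEditing line.

```
r_ShadersEditing = 1
```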
Let's go back to the extracted shaders and open HWScripts/CryFX/Illum.cfx. You can see there is already quite a bit of code here. In here you can see this technique with a Script statement, and it says TechniqueZ = ZPass. As you saw before in the slides, this means that for the G-buffer this object uses a technique called ZPass. If you search, you will not find it in this file: the ZPass is shared between almost all shaders, so I will tell you where it is because I know, but you could also do a global search over all the shader files. You can find it in CommonZPass.cfi, so please open that file, and in here you will find the ZPass technique. If you take a closer look at this technique, for the pixel shader it uses Common_ZPassPS; that is the thing we need to edit. You can just search for it in this file, and you will find the code that outputs the G-buffer attributes for this shader. If you scroll down, there is all sorts of code to calculate normals, albedo, smoothness, reflectance and so on, until at the very bottom we hit the piece of code where all these calculated attributes are output to the G-buffer.

Let's overwrite the albedo, so we write attribs.Albedo = (1, 0, 0). Save the file and go back to the Editor; you can immediately see the result of changing the albedo. Now let's output something more fun than just red. We are a bit limited right now, since we only have access to values already in the shader, so let's pick this one: IN.WPos.w, which is basically the distance to the camera in meters. Switch back to the Editor (for some reason I have to trigger a manual shader reload), and this outputs the distance from the camera to the pixel; if I fly around, you can actually see how things closer to the camera get darker. Let's do something more interesting: okay, three lines, and it is an interesting effect, purely based on data already available in the shader. If you bring in more textures and more data, you can do a lot more.

What we did just now is edit a scene shader: we modified the G-buffer output of the scene shader and overwrote the albedo, in the last version with a blue color. The next example is changing a post-processing shader. This is where we do the post-processing, the color chart and the tone mapping, and if you plug these few lines in here, we get a nice circular post-processing effect, all done procedurally with just a bunch of lines of shader code. I think what is important for every programmer who needs to work with the renderer is that you can hopefully use the slides and the notes in them as a reference when you actually do the work. Okay, that's it, thanks guys.
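For reference, the two experiments from the walkthrough amount to something like the lines below, placed where the calculated attributes are written out. This is a paraphrase of what was said in the video, not a verbatim copy of CommonZPass.cfi, and the exact struct and member names (attribs.Albedo, IN.WPos) may differ slightly in your engine version.

```hlsl
// Inside the pixel shader that writes the G-buffer attributes (Common_ZPassPS):

// First experiment: overwrite the albedo with a constant color (red).
attribs.Albedo = half3(1, 0, 0);

// Second experiment: visualize the distance from the camera to the pixel in meters;
// in this shader IN.WPos.w already holds that distance. Values above 1 will clamp,
// so you may want to scale it down to see a wider gradient.
attribs.Albedo = IN.WPos.w;
```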