How do Video Game Graphics Work?

Video Statistics and Information

Captions
Video games have spectacular graphics, capable of transporting you to incredibly detailed cities, heart-racing battlegrounds, magical worlds, and breathtaking environments. While this may look like an old western train station and locomotive from Red Dead Redemption 2, it's actually composed of 2.1 million vertices assembled into 3.5 million triangles, with 976 colors and textures assigned to the various surfaces, all with a virtual sun illuminating the scene below. But perhaps the most impressive fact is that these vertices, textures, and lights are entirely composed of ones and zeroes that are continuously being processed inside your computer's graphics card or a video game console.

So then, how does your computer take billions of ones and zeroes and turn them into realistic 3D graphics? Well, let's jump right in.

The video game graphics rendering pipeline has three key steps: Vertex Shading, Rasterization, and Fragment Shading. While additional steps are used in many modern video games, these three core steps have been used for decades in thousands of video games for both computers and consoles, and are still the backbone of the video game graphics algorithm for pretty much every game you play.

Let's begin with the first step, called vertex shading. The basic idea in this step is to take all the objects' geometries and meshes in a 3D space and use the field of view of the camera to calculate where each object falls in a 2D window called the view screen, which is the 2D image that's sent to the display.

In this train station scene, there are 1,100 different models, and the camera's field of view sections off what the player sees, reducing the number of objects that need to be rendered to 600. Let's focus on the locomotive as an example. Although this engine has rounded surfaces and some rather complex shapes, it's actually assembled from 762 thousand flat triangles using 382 thousand vertices, with 9 different materials or colors applied to the surfaces of the triangles. Conceptually, the entire train is moved as one piece onto the view screen, but in practice, each of the train's hundreds of thousands of vertices is moved one at a time.

So, let's focus on a single vertex. The process of moving a vertex, and by extension the triangles and the train, from a 3D world onto a 2D view screen is done using 3 transformations: first moving a vertex from model space to world space, then from world space to camera space, and finally from the perspective field of view onto the view screen. To perform this transformation we use the X, Y, and Z coordinates of that vertex in modeling space, then the position, scale, and rotation of the model in world space, and finally the coordinates and rotation of the camera and its field of view. We plug all these numbers into different transformation matrices and multiply them together, resulting in the X and Y values of the vertex on the view screen as well as a Z value or depth, which we'll use later to determine which objects block others.

After three vertices of the train are transformed using similar matrix math, we get a single triangle moved onto the view screen. Then the rest of the 382 thousand vertices of the train and the 2.1 million vertices of all the 600 objects in the camera's field of view undergo a similar set of transformations, thereby moving all 3.5 million triangles onto a 2D view screen.
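As a rough sketch of the transformation chain described above, the snippet below moves a single vertex from model space to the view screen. The matrices, coordinates, and camera values are illustrative placeholders (assuming an OpenGL-style perspective projection), not data from the game or any particular engine.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 matrix that moves a point by (tx, ty, tz)."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def rotation_y(angle_rad):
    """4x4 matrix that rotates a point around the vertical (Y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    m = np.eye(4)
    m[0, 0], m[0, 2] = c, s
    m[2, 0], m[2, 2] = -s, c
    return m

def perspective(fov_y_rad, aspect, near, far):
    """Perspective projection matrix for the camera's field of view."""
    f = 1.0 / np.tan(fov_y_rad / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = (2 * far * near) / (near - far)
    m[3, 2] = -1.0
    return m

# One vertex of the model, in model space (homogeneous coordinates).
vertex_model = np.array([1.0, 2.0, 0.5, 1.0])

# Model space -> world space: place and rotate the train in the world.
model_to_world = translation(10.0, 0.0, -30.0) @ rotation_y(np.radians(45))

# World space -> camera space: the inverse of the camera's own placement.
camera_placement = translation(0.0, 2.0, 5.0)
world_to_camera = np.linalg.inv(camera_placement)

# Camera space -> view screen: apply the field-of-view projection.
projection = perspective(np.radians(60), 16 / 9, near=0.1, far=1000.0)

clip = projection @ world_to_camera @ model_to_world @ vertex_model

# The perspective divide gives the screen X and Y plus a depth (Z) value.
ndc = clip[:3] / clip[3]
print("screen x, y:", ndc[0], ndc[1], " depth:", ndc[2])
```

Repeating this multiply for every vertex of every visible object is exactly the kind of work that is spread across a GPU's thousands of cores.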
This is an incredible amount of matrix math, but GPUs in graphics cards and video game consoles are designed to be triangle mesh rendering monsters and have evolved over decades to handle millions of triangles every few milliseconds. For example, this GPU has roughly 10,000 cores designed to efficiently execute up to 35 trillion 32-bit multiplication and addition operations every second, and, by distributing the vertex coordinates and transformation data among the cores, the GPU can easily render the scene at 120 or more frames a second.

Now that we have all the vertices moved onto a 2D plane, the next step is to use the 3 vertices of a single triangle and figure out which specific pixels on your display are covered by that triangle. This process is called rasterization.

A 4K monitor or TV has a resolution of 3840 by 2160, yielding around 8.3 million pixels. Using the X and Y coordinates of the vertices of a given triangle on the view screen, your GPU calculates where it falls within this massive grid and which of the pixels are covered by that particular triangle. Next, those pixels are shaded using the texture or color assigned to that triangle. Thus, with rasterization, we turn triangles into fragments, which are groups of pixels that come from the same triangle and share the same texture or color.

Then we move on to the next triangle, shade in the pixels that are covered by it, and continue to do this for each of the 3.5 million triangles that were previously moved onto the view screen. By applying the red, green, and blue color values of each triangle to the appropriate pixels, a 4K image is formed in the frame buffer and sent to the display.

You're probably wondering how we account for triangles that overlap or block other triangles. For example, the train is blocking the view of much of the train station. Additionally, the train has hundreds of thousands of triangles on its backside that are sent through the rendering pipeline but obviously don't appear in the final image. Determining which triangles are in front is called the visibility problem and is solved by using a Z-buffer or depth buffer. A Z-buffer adds an extra value to each of the 8.3 million pixels corresponding to the distance, or depth, of that pixel from the camera.

In the previous step, when we did the vertex transformations, we ended up with X and Y coordinates, but we also got a Z value that corresponds to the distance from the transformed vertex to the camera. When a triangle is rasterized, it covers a set of pixels, and the Z value or depth of the triangle is compared with the values stored in the Z-buffer. If the triangle's depth values are lower than those in the Z-buffer, meaning the triangle is closer to the camera, then we paint in those pixels using the triangle's color and replace the Z-buffer's values with that triangle's Z values.

However, let's say a second triangle comes along with Z values that are higher than those in the Z-buffer, meaning the triangle is further away. We just throw it out and keep the pixels from the triangle that was previously painted with lower Z values. Using this method, only the closest triangles to the camera, with the lowest Z values, will be displayed on the screen. By the way, here's the image of the Z or depth buffer, wherein black is close and white is far.
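Below is a heavily simplified sketch of rasterization with a depth test, assuming we already have each triangle's screen-space vertex positions and depths from the previous step. The tiny 8 by 8 "screen", the triangles, and the colors are made up purely for illustration; a real GPU does this in hardware across millions of pixels.

```python
# A toy rasterizer with a Z-buffer (depth buffer), purely for illustration.
WIDTH, HEIGHT = 8, 8  # a real frame buffer would be e.g. 3840 x 2160

frame_buffer = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]    # RGB per pixel
depth_buffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]  # Z per pixel

def edge(a, b, p):
    """Signed-area test: which side of the edge a->b the point p lies on."""
    return (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])

def rasterize(v0, v1, v2, color):
    """v0, v1, v2 are (x, y, z) in screen space; color is an (r, g, b) tuple."""
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle, covers no pixels
    for y in range(HEIGHT):
        for x in range(WIDTH):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel center is outside this triangle
            # Blend the three vertex depths to get a depth for this pixel.
            z = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
            # Depth test: keep this fragment only if it is closer to the
            # camera than whatever is already stored for this pixel.
            if z < depth_buffer[y][x]:
                depth_buffer[y][x] = z
                frame_buffer[y][x] = color

# A near red triangle partly covering a far blue one.
rasterize((1, 1, 0.3), (7, 1, 0.3), (1, 7, 0.3), (255, 0, 0))
rasterize((0, 4, 0.8), (7, 4, 0.8), (4, 7, 0.8), (0, 0, 255))
```

Drawing the two triangles in either order gives the same picture, because the depth test, not the draw order, decides which one ends up in front.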
Note that because these triangles are in 3D space, the vertices often have 3 different Z values, and thus each individual pixel of the triangle needs its Z value computed using the vertex coordinates. This allows intersecting triangles to properly render out their intersections pixel by pixel.

One issue with rasterization and these pixels is that if a triangle's edge cuts through a pixel at an angle and the triangle covers the center of that pixel, then the entire pixel is painted with the triangle's color, resulting in jagged and pixelated edges.

To reduce the appearance of these jagged edges, graphics processors implement a technique called Super Sampling Anti-Aliasing. With SSAA, 16 sampling points are distributed across a single pixel, and when a triangle cuts through a pixel, a fractional shade of the triangle's color is applied to the pixel depending on how many of the 16 sampling points the triangle covers, resulting in faded edges and significantly less noticeable pixelization (a rough sketch of this sampling idea appears at the end of this passage).

One thing to remember is that when you're playing a video game, your character's camera view as well as the objects in the scene are continuously moving around. As a result, the process and calculations within vertex shading, rasterization, and fragment shading are recalculated for every single frame, once every 8.3 milliseconds for a game running at 120 frames a second.

Let's move on to the next step, which is fragment shading. Now that we have a set of pixels corresponding to each triangle, it's not enough to simply paint by number to color the pixels. Rather, to make the scene realistic, we have to account for the direction and strength of the light or illumination, the position of the camera, reflections, and shadows cast by other objects. Fragment shading is therefore used to shade in each pixel with accurate illumination to make the scene realistic. As a reminder, fragments are groups of pixels formed from a single rasterized triangle.

Let's see the fragment shader in action. This train engine is mostly made of black metal, and if we apply the same color to each of its pixel fragments, we get a horribly inaccurate train. But once we apply proper shading, such as making the bottom darker and the top lighter, and by adding in specular highlights or shininess where the light bounces off the surface, we get a realistic black metal train. Additionally, as the sun moves in the sky, the shading on the train reflects the passage of time throughout the day, and, if it's night, the materials and colors of all the objects are darker and illuminated by the light of the fire. Even video games such as Super Mario 64, which is almost 30 years old, have some simple shading where the colors of surfaces are changed by the lighting and shadows in the scene. So, let's see how fragment shading works.

The basic idea is that if a surface is pointing directly at a light source such as the sun, it's shaded brighter, whereas if a surface is facing perpendicular to, or away from, the light, it's shaded darker.

In order to calculate a triangle's shading, there are two key details we need to know: first, the direction of the light, and second, the direction the triangle's surface is facing. Let's continue to use the locomotive as an example and paint it bright red instead of black. As you already know, this train is made of 762 thousand flat triangles, many of which face in different directions.
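Here is the super-sampling idea mentioned a few paragraphs up, as a minimal sketch: it estimates how much of one pixel a triangle covers by testing a 4 by 4 grid of sample points (16 in total) and blends the triangle's color accordingly. The point-in-triangle test, sample layout, and colors are illustrative simplifications, not how any specific GPU lays out its samples.

```python
def point_in_triangle(p, v0, v1, v2):
    """Same-side test: True if point p lies inside triangle v0-v1-v2."""
    def edge(a, b, q):
        return (q[0] - a[0]) * (b[1] - a[1]) - (q[1] - a[1]) * (b[0] - a[0])
    d0, d1, d2 = edge(v0, v1, p), edge(v1, v2, p), edge(v2, v0, p)
    has_neg = d0 < 0 or d1 < 0 or d2 < 0
    has_pos = d0 > 0 or d1 > 0 or d2 > 0
    return not (has_neg and has_pos)

def pixel_coverage(px, py, v0, v1, v2, samples_per_side=4):
    """Fraction of the pixel at (px, py) covered by the triangle,
    estimated from a 4x4 grid of sample points (16 samples total)."""
    inside = 0
    for i in range(samples_per_side):
        for j in range(samples_per_side):
            # Evenly spaced sample positions within the 1x1 pixel square.
            sx = px + (i + 0.5) / samples_per_side
            sy = py + (j + 0.5) / samples_per_side
            if point_in_triangle((sx, sy), v0, v1, v2):
                inside += 1
    return inside / (samples_per_side * samples_per_side)

# A triangle edge slicing through the pixel at (3, 3):
coverage = pixel_coverage(3, 3, (0.0, 0.0), (8.0, 0.0), (0.0, 6.0))
triangle_color = (200, 30, 30)
background = (20, 20, 20)
# Blend the triangle's color with what is behind it by the covered fraction.
blended = tuple(round(coverage * c + (1 - coverage) * b)
                for c, b in zip(triangle_color, background))
print(f"coverage = {coverage:.2f}, blended pixel = {blended}")
```

A fully covered pixel gets the triangle's full color, while an edge pixel gets a proportional mix, which is what softens the jagged staircase look.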
The direction that an individual triangle is facing is called its surface normal, which is simply the direction perpendicular to the plane of the triangle, kind of like a flagpole sticking out of the ground.

To calculate a triangle's shading, we take the cosine of the angle theta between these two directions. The cosine theta value is 1 when the surface is facing the light and 0 when the surface is perpendicular to the light. Next, we multiply cosine theta by the intensity of the light and then by the color of the material to get the properly shaded color of that triangle. This process adjusts the triangles' RGB values, and as a result we get a range of lightness to darkness across a surface depending on how its individual triangles are facing the light.

However, if the surface is perpendicular to or facing away from the light, we don't want the shading to go negative or drop all the way to zero, because that would result in a pitch-black surface. Therefore, we clamp cosine theta at a minimum of 0 and add in an ambient light intensity times the surface color, and we adjust this ambient light so that it's higher in daytime scenes and closer to 0 at night.

Finally, when there are multiple light sources in a scene, we perform this calculation multiple times with different light directions and intensities and then add the individual contributions together. Having more than a few light sources is computationally intense for your GPU, and thus scenes limit the number of individual light sources and sometimes limit the range of influence of the lights so that triangles will ignore distant lights.

The vector and matrix math used in rendering video game graphics is rather complicated, but luckily there's a free and easy way to learn it, and that's with Brilliant.org. Brilliant is a multidisciplinary online interactive education platform and is the best way to learn math, computer science, and many other fields of science and engineering.

Thus far we've been simplifying the math behind video game graphics considerably. For example, vectors are used to find the value of cosine theta between the direction of the light and the surface normal, and the GPU calculates it using the dot product divided by the product of the two vectors' norms. Additionally, we skipped a lot of detail when it came to 3D shapes and transformations from one coordinate system to another using matrices. Rather fittingly, Brilliant.org has entire courses on vector calculus, trigonometry, and 3D geometry, as well as courses on linear algebra and matrix math, all of which have direct applications to this video and are needed to fully understand graphics algorithms. Alternatively, if you're all set with math, we recommend their course on Thinking in Code, which will help you build a solid foundation in computational problem solving.

Brilliant is offering a free 30-day trial with full access to their thousands of lessons. It's incredibly easy to sign up, try out some of their lessons for free, and, if you like them, which we're sure you will, you can sign up for an annual subscription. To the viewers of this channel, Brilliant is offering 20% off an annual subscription to the first 200 people who sign up. Just go to brilliant.org/brancheducation. The link is in the description below.

Let's get back to exploring fragment shading.
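As a rough sketch of the per-triangle shading calculation described above, the function below combines a surface normal, a material color, a list of light directions and intensities, and an ambient term. The normals, colors, and light setup are made-up placeholder values, not data from the game.

```python
import math

def normalize(v):
    """Scale a vector so its length is 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def shade_triangle(surface_normal, material_rgb, lights, ambient_intensity):
    """Per-triangle ('flat') shading: cosine theta times light intensity,
    summed over all lights, plus an ambient term, applied to the material
    color. `lights` is a list of (direction_toward_light, intensity) pairs."""
    n = normalize(surface_normal)
    total = ambient_intensity
    for light_dir, intensity in lights:
        l = normalize(light_dir)
        # cos(theta) between the surface normal and the light direction;
        # clamp at 0 so surfaces facing away are not given negative light.
        cos_theta = max(0.0, dot(n, l))
        total += cos_theta * intensity
    return tuple(min(255, round(channel * total)) for channel in material_rgb)

bright_red = (220, 30, 30)
sun = ((0.3, 1.0, 0.2), 0.9)        # mostly overhead, fairly strong
campfire = ((-1.0, 0.2, 0.0), 0.3)  # low and off to one side, weak

# A triangle on top of the boiler (normal points up) versus one underneath.
print(shade_triangle((0, 1, 0), bright_red, [sun, campfire], ambient_intensity=0.15))
print(shade_triangle((0, -1, 0), bright_red, [sun, campfire], ambient_intensity=0.15))
```

The upward-facing triangle comes out bright, while the underside falls back to the dim ambient term rather than going completely black.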
One key problem with it is that the triangles within an object each have only a single normal, and thus each triangle shares the same color across its entire surface. This is called flat shading and is rather unrealistic when viewed on curved surfaces such as the body of this steam engine.

So, in order to produce smooth shading, instead of using surface normals, we use one normal for each vertex, calculated as the average of the normals of the adjacent triangles. Next, we use a method called barycentric coordinates to produce a smooth gradient of normals across the surface of a triangle. Visually, it's like mixing 3 different colors across a triangle, but instead we're mixing the three vertex normal directions.

For a given fragment, we take the center of each pixel and use the vertex normals and coordinates of the pre-rasterized triangle to calculate the barycentric normal of that particular pixel. Just like mixing three colors across a triangle, this pixel's normal will be a proportional mix of the three vertex normals of the triangle. As a result, when a set of triangles is used to form a curved surface, each pixel will be part of a gradient of normals, resulting in a gradient of angles facing the light, with pixel-by-pixel coloring and smooth shading across the surface (a rough sketch of this interpolation appears at the end of this passage).

We want to say that this has been one of the most enjoyable videos to make, simply because we love playing video games, and seeing the algorithm that makes these incredible graphics has been a joy. We spent over 540 hours researching, writing, modelling this scene from RDR2, and animating. If you could take a few seconds to hit that like button, subscribe, share this video with a friend, and write a comment below, it would help us more than you think, so thank you.

Thus far we've covered the core steps of the graphics rendering pipeline; however, there are many more steps and advanced topics. For example, you might be wondering where ray tracing and DLSS, or deep learning super sampling, fit into this pipeline. Ray tracing is predominantly used to create highly detailed scenes with accurate lighting and reflections, typically found in TV and movies, where a single frame can take dozens of minutes or more to render. For video games, the primary visibility and shading of the objects are calculated using the graphics rendering pipeline we discussed, but in certain video games ray tracing is used to calculate shadows, reflections, and improved lighting. On the other hand, DLSS is an algorithm for taking a low-resolution frame and upscaling it to a 4K frame using a convolutional neural network. Therefore, DLSS is executed after the graphics pipeline and ray tracing have generated a low-resolution frame.

One interesting note is that the latest generation of GPUs has 3 entirely separate architectures of computational resources or cores. CUDA or shading cores execute the graphics rendering pipeline. Ray tracing cores are self-explanatory. And DLSS is run on the Tensor cores. Therefore, when you're playing a high-end video game with ray tracing and DLSS, your GPU utilizes all of its computational resources at the same time, allowing you to play 4K games and render frames in less than 10 milliseconds each, whereas if you were to rely solely on the CUDA or shading cores, a single frame would take around 50 milliseconds.
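Returning to the smooth-shading idea from earlier in this passage, here is a minimal sketch of blending vertex normals with barycentric weights. The per-pixel weights would come from the same edge-function math shown in the rasterization sketch above; the normals here are made-up values for a triangle on a curved surface.

```python
def interpolate_normal(bary_weights, n0, n1, n2):
    """Blend the three vertex normals of a triangle using barycentric
    weights (w0, w1, w2), then re-normalize the result."""
    w0, w1, w2 = bary_weights
    blended = tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(n0, n1, n2))
    length = sum(c * c for c in blended) ** 0.5
    return tuple(c / length for c in blended)

# Vertex normals of one triangle on a curved part of the boiler
# (made-up values: one leans left, one points straight up, one leans right).
n_left  = (-0.5, 0.87, 0.0)
n_up    = ( 0.0, 1.0,  0.0)
n_right = ( 0.5, 0.87, 0.0)

# A pixel near the first vertex gets a normal close to n_left,
# while a pixel near the triangle's center gets an evenly blended normal.
print(interpolate_normal((0.8, 0.1, 0.1), n_left, n_up, n_right))
print(interpolate_normal((1/3, 1/3, 1/3), n_left, n_up, n_right))
```

Feeding these interpolated, per-pixel normals into the shading calculation from before is what turns a faceted-looking mesh into a smoothly lit curved surface.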
Ray tracing and DLSS are entirely different topics with their own equally complicated algorithms, and therefore we're planning separate videos that will explore each of them in detail.

Furthermore, when it comes to video game graphics, there are advanced topics such as shadows, reflections, UVs, normal maps, and more. Therefore, we're considering making an additional video on these advanced topics. If you're interested in such a video, let us know in the comments.

We believe the future will require a strong emphasis on engineering education, and we're thankful to all our Patreon and YouTube Membership sponsors for supporting this dream. If you want to support us on YouTube Memberships or Patreon, you can find the links in the description.

This is Branch Education, and we create 3D animations that dive deeply into the technology that drives our modern world. Watch another Branch video by clicking one of these cards, or click here to subscribe. Thanks for watching to the end!
Info
Channel: Branch Education
Views: 3,280,132
Keywords: Video Game Graphics, GPU, Graphics Card, Ray Tracing, DLSS, NVIDIA, game development, Video Games, Graphics, Game Graphics, Sceen, Monitor, video games, 3D Graphics, RDR2, How do Video Game Graphics Work?, How do Graphics Work?, How do GPUs Work?, how do graphics cards work, Rendering Pipeline, Graphics Rendering Pipeline, Rasterization, Vertex Shading, Fragment Shading, game design, computer graphics, game devlog, GPUs, 3090Ti, 3090, Graphics Cards, 4090, 4080, CUDA Core, Shading Core
Id: C8YtdC8mxTU
Length: 21min 0sec (1260 seconds)
Published: Thu Dec 21 2023