You hear it everywhere. DLSS this, DLSS that. But what is it? And why is it doing the impossible? This AI-based technology is being talked about in the media a great deal, and it is supposedly a way of dramatically speeding up computer games and virtual worlds. Does this really work? Well, when Jensen Huang, NVIDIA’s CEO, said that soon every single pixel on your screen for these games will not be rendered, but generated, presumably by an AI, that really got my attention. That sounds insane, and if hearing this you think
that this cannot possibly be true, I don’t blame you for a second. We can’t just create all this footage; there has to be a proper system that can compute reflections, diffuse interactions, and all that. Right? Well, not quite. In 2018, I had a little project where I wanted to generate photorealistic images of material models quickly. Now, generating an image through ray tracing took 40 to 60 seconds, and that was a bit too long for my taste.
So I tried to write a neural renderer that could take just the description of a material model and immediately, in real time, synthesize what it would look like. And it was able to do all this not in 60 seconds, but in 3 milliseconds. Yes, it was 20,000 times faster.
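If you are curious what a neural renderer like that can look like in code, here is a tiny sketch. Note that everything here, the layer sizes, the parameter count, even the idea of using a plain MLP decoder, is a made-up stand-in for illustration; the actual architecture from that project is not described in this video.

```python
# A toy neural renderer: a material description vector goes in,
# an image comes out in a single forward pass, no ray tracing.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class ToyNeuralRenderer(nn.Module):
    def __init__(self, num_material_params=10, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.decoder = nn.Sequential(
            nn.Linear(num_material_params, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, image_size * image_size * 3),
            nn.Sigmoid(),  # keep pixel values in [0, 1]
        )

    def forward(self, material_params):
        pixels = self.decoder(material_params)
        return pixels.view(-1, 3, self.image_size, self.image_size)

renderer = ToyNeuralRenderer()
material = torch.rand(1, 10)   # one hypothetical material description
image = renderer(material)     # milliseconds instead of a full render
print(image.shape)             # torch.Size([1, 3, 64, 64])
```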
However, it was limited to this very scene. So, how did this research field improve in the last
5 years? Do we have anything more usable now? Oh boy, if I could tell you. Well, meet DLSS. Deep
Learning Super Sampling. This is a system where hardware and software work together to
create an incredible experience where games and virtual worlds really run faster
than what should be possible. So, how? Well, as of DLSS version 3, get this, by generating 7 of every 8 pixels that appear on the screen. Yes, more than 85% of the pixels are just generated, not computed by, for instance, ray tracing or other traditional techniques.
That sounds flat out impossible. But it is possible. The first step is running super resolution in real time. This means that a coarse image goes in, and an AI works out which details would be missing if we pretended that this were a fine image instead, and then it synthesizes those details.
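To make the super resolution step a bit more concrete, here is a minimal sketch of the idea in PyTorch. DLSS’s real network is proprietary and far more sophisticated, so treat this only as the general recipe: upscale the coarse frame, then let a learned model add the missing detail.

```python
# Toy super resolution: upscale a coarse frame, then predict the
# missing detail as a residual. This is the general idea only; the
# actual DLSS model and its inputs are not public.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySuperResolution(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        # A tiny network that guesses the details the coarse frame lacks.
        self.detail = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, coarse):
        # Pretend the coarse image is a fine image...
        upscaled = F.interpolate(coarse, scale_factor=self.scale,
                                 mode="bilinear", align_corners=False)
        # ...and synthesize the details that are missing.
        return upscaled + self.detail(upscaled)

model = ToySuperResolution()
coarse = torch.rand(1, 3, 540, 960)  # e.g., a 960x540 render
fine = model(coarse)                 # a 1920x1080 output frame
print(fine.shape)                    # torch.Size([1, 3, 1080, 1920])
```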
Second, optical flow. We can look at two adjacent frames of the video game and try to estimate what has happened between them, and where things are going. With this, entire intermediate frames can be synthesized, creating the illusion that the game is running more smoothly than our hardware can actually run it.
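Here is a small sketch of how an intermediate frame can be synthesized once the motion between two frames is known. I am assuming the flow field has already been estimated (DLSS 3 uses dedicated optical flow hardware for that part), and this crude one-direction warp ignores the occlusions that a real system has to handle.

```python
# Toy frame interpolation: warp frame A halfway along the estimated
# optical flow to synthesize the in-between frame. Real systems warp
# in both directions and handle occlusions; this is only the idea.
import torch
import torch.nn.functional as F

def synthesize_middle_frame(frame_a, flow):
    # frame_a: (1, 3, H, W); flow: (1, 2, H, W) pixel offsets from A to B.
    _, _, h, w = frame_a.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    # Move each pixel half of the way toward where it goes in frame B.
    x_mid = xs + 0.5 * flow[:, 0]
    y_mid = ys + 0.5 * flow[:, 1]
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    grid = torch.stack([2 * x_mid / (w - 1) - 1,
                        2 * y_mid / (h - 1) - 1], dim=-1)
    return F.grid_sample(frame_a, grid, align_corners=True)

frame_a = torch.rand(1, 3, 270, 480)
flow = torch.zeros(1, 2, 270, 480)       # stand-in for the estimated motion
middle = synthesize_middle_frame(frame_a, flow)
print(middle.shape)                      # torch.Size([1, 3, 270, 480])
```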
Combining the two, starting from a small pool of pixels that are actually computed, both image quality and smoothness can be improved at the same time. What a time to be alive!
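If you are wondering where that 7 out of every 8 pixels figure can come from, here is a quick back-of-the-envelope check. I am assuming performance-mode numbers here, rendering at half resolution in each dimension and generating every other frame entirely; the exact ratio depends on the quality setting.

```python
# Back-of-the-envelope for "7 of every 8 pixels are generated",
# assuming half-resolution rendering plus frame generation.
super_resolution = (1 / 2) * (1 / 2)  # render 1 in 4 pixels per frame
frame_generation = 1 / 2              # fully generate every other frame
computed = super_resolution * frame_generation
print(computed)        # 0.125 -> only 1 in 8 pixels is computed
print(1 - computed)    # 0.875 -> more than 85% are generated
```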
But it does not stop there. It gets better! They have also announced DLSS 3.5, where, hopefully, all of this can be improved even further through something they call ray reconstruction. Oh my goodness, this is going to be so good. But, how? Well, normally, when we perform ray tracing, we
simulate the path of millions and millions of rays, and bounce them around in a scene.
And if we can’t afford to wait hours at a time for an image, we will have to settle for a really noisy image. Like this.
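To see why fewer rays mean more noise, here is a tiny Monte Carlo sketch for a single pixel. The "rays" here are just random samples of the pixel's true brightness, but they show the key property: the error only shrinks with roughly the square root of the sample count, which is why clean images take so long.

```python
# Why few rays give a noisy image: a Monte Carlo estimate of one
# pixel's brightness. Error shrinks roughly as 1/sqrt(samples).
import random

def estimate_pixel(samples, true_value=0.5):
    # Each "ray" returns a noisy observation around the true brightness.
    total = sum(random.uniform(0.0, 2 * true_value) for _ in range(samples))
    return total / samples

random.seed(1)
for samples in (1, 16, 256, 4096):
    print(f"{samples:5d} rays -> estimate {estimate_pixel(samples):.3f} "
          f"(true 0.500)")
```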
Denoising techniques exist that try to guess what the missing information could be, but they are not perfect. Not even close. Some important details between objects can be missed entirely, and reflections can still be a lot less clear than in a full simulation. And unfortunately, we have a problem: it gets even worse.
This image will undergo an upscaling step, where these errors get magnified even more.
Denoising and upscaling are two separate steps typically done by two separate models, so
why not do both of them with the same AI model? And now, ray reconstruction enters the scene.
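Here is a structural sketch of that difference. The models below are made-up stand-ins, since ray reconstruction's actual architecture is not public; the point is only that a single network trained end to end can avoid the upscaler magnifying the denoiser's mistakes.

```python
# Two separate models versus one combined model, as a structural sketch.
# Everything here is an illustrative stand-in, not NVIDIA's pipeline.
import torch
import torch.nn as nn
import torch.nn.functional as F

def separate_pipeline(noisy, denoiser, upscaler):
    clean = denoiser(noisy)   # step 1: guess the missing information
    return upscaler(clean)    # step 2: upscale it, mistakes included

class CombinedDenoiseUpscale(nn.Module):
    # One model that denoises and upscales in the same step, so it can
    # be trained end to end on the final, upscaled result.
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, noisy):
        up = F.interpolate(noisy, scale_factor=self.scale, mode="bilinear",
                           align_corners=False)
        return up + self.net(up)  # detail synthesis and denoising together

model = CombinedDenoiseUpscale()
noisy = torch.rand(1, 3, 540, 960)
print(model(noisy).shape)  # torch.Size([1, 3, 1080, 1920])
```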
Ray reconstruction learned from a ton more information than the previous models: it was given 5 times more training data. And look at this step. This one is right on the money. This is tailored to retain high-frequency information, those fine details, to make sure that before the upscaling step, we have the highest-quality information available. So, is it better than
what denoisers did before? Oh my goodness, look at that! We finally get back that shadowy region that was previously filled in with incorrect information by the denoisers. Yummy! So good! And that high-frequency detail retention can
also be witnessed here. Loving it. Now note that, ideally, we would be talking about a peer-reviewed research paper where we can see all the weak points and failure cases; that is my home turf. This is not a research paper, so it is harder for me to find the flaws, so please bear in mind that it may have weaknesses that were not shown here. And in
the meantime, this technology is being handed out to millions and millions of people all
around the world, and it is incredible. So, this will be available only for the
shiniest, newest graphics card owners, right? Well, hold on to your papers, Fellow Scholars, because here comes the best part: no! This comes to all RTX graphics cards, even
the older ones that you can pick up for a couple hundred bucks. This can breathe new life into aging hardware, which is highly appreciated. Note that the particular game or app you’re running has to be ready for DLSS 3.5 for all this magic to happen. Now, I am a light transport researcher
by trade; I dream in rays of light and ray tracing, if you will, so I am incredibly happy to see this technology make it into the hands of real users in the real world in such an incredible way. And make no mistake, this is not perfect by any means: there are games where it does not appear to work very well, and some people prefer not to use it at all, anywhere. However, note that this is still nascent technology, and it can already
synthesize more than 85% of the pixels on our screen, which is something that I thought would only
be possible in a science fiction movie. Maybe not even there. I am absolutely stunned by these
results, even with their weaknesses. And don’t forget to invoke the First Law of Papers. The First Law of Papers says that research is a process. Do not look at where we are; look at where we will be two more papers down the line.