Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. This paper is called Enhancing Photorealism Enhancement. Hmm! Let's try to unpack what exactly that means.

It means that we take video footage from a game, for instance GTA 5, an action game where the city we play in was modeled after real places in California. Now, as we live in the age of neural network-based learning algorithms, we have a ton of training data at our disposal on the internet. For instance, the Cityscapes dataset contains images and videos taken in 50 real cities, and it also contains annotations that describe which object is which.
And the authors of this paper looked at this and had an absolutely insane idea. The idea is: let's learn from the Cityscapes dataset what cars, cities, and architecture look like, then take a piece of video footage from the game and translate it into a real movie. So, basically, something that sounds impossible. That is an insane idea, and when I read this paper, I thought it could not possibly work in any case, but especially not given that the game takes place in California, while the Cityscapes dataset contains mostly footage of German cities. How would a learning algorithm pull that off? There is no way this will work.
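For the more technically minded Fellow Scholars, the core recipe behind such a game-to-photo translation can be sketched in a few lines of PyTorch. Note that this is a simplified, hypothetical sketch, not the paper's actual method: the real system also taps into the game's internal rendering buffers and uses more sophisticated perceptual losses. Still, it shows the adversarial idea, where a generator nudges game frames toward a photographic look and a critic learns to tell the result apart from real photos:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Enhancer(nn.Module):
    """Generator: nudges a rendered game frame toward a photographic look."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        # Predict a residual, so the network only has to learn the
        # game-to-photo appearance shift, not the whole image.
        return torch.clamp(x + self.body(x), 0.0, 1.0)

class Critic(nn.Module):
    """Patch discriminator: scores whether local image patches look real."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, x):
        return self.body(x)

G, D = Enhancer(), Critic()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

# Stand-ins for real data loaders: unpaired batches of rendered game
# frames and real photos (e.g. Cityscapes crops), [B, 3, H, W] in [0, 1].
game = torch.rand(4, 3, 128, 128)
real = torch.rand(4, 3, 128, 128)

# Discriminator step: real patches should score 1, enhanced patches 0.
fake = G(game).detach()
real_logits, fake_logits = D(real), D(fake)
d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
          + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the critic while staying close to the input frame,
# so the scene's content and layout are preserved.
fake = G(game)
fake_logits = D(fake)
g_loss = (F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
          + 10.0 * F.l1_loss(fake, game))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```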
Now, there are previous techniques that attempted this; here you see a few of them. And… well, the realism is just not there, and there was an even bigger issue: the lack of temporal coherence. This is the flickering you see when the AI processes each frame independently and therefore does not process them consistently. This quickly breaks the immersion and is typically a deal-breaker.
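How do we put a number on this flickering? A common recipe in video-processing papers, though not necessarily the exact metric used here, is the warping error: align the previous output frame to the current one with optical flow, then measure how much the two still disagree. A minimal sketch with OpenCV, where the frame variables are hypothetical placeholders:

```python
import cv2
import numpy as np

def warping_error(prev_out, curr_out, prev_in, curr_in):
    """Mean abs. difference between curr_out and the flow-warped prev_out.

    prev_in/curr_in: consecutive *input* frames (uint8 BGR), used for flow.
    prev_out/curr_out: the corresponding *enhanced* output frames.
    """
    prev_gray = cv2.cvtColor(prev_in, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_in, cv2.COLOR_BGR2GRAY)
    # Flow from current to previous: curr(y, x) ~ prev(y + fy, x + fx).
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (gx + flow[..., 0]).astype(np.float32)
    map_y = (gy + flow[..., 1]).astype(np.float32)
    # Sample the previous output where the flow says each pixel came from.
    warped = cv2.remap(prev_out, map_x, map_y, cv2.INTER_LINEAR)
    return float(np.mean(np.abs(curr_out.astype(np.float32)
                                - warped.astype(np.float32))))
```

A temporally consistent method scores near zero wherever the flow is reliable; a flickering one scores much higher.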
And now, hold on to your papers… and let's have a look together at the new technique. Whoa! This is nothing like the previous ones! It renders the exact same place, the exact same cars, and the badges are still correct and still refer to real-world brands. And that's not even the best part. Look! The car paint materials are significantly more realistic, something that is really difficult to capture in a real-time rendering engine. Lots of realistic-looking specular highlights off of something that feels like the real geometry of the car. Wow.
Now, as you see, most of the generated photorealistic images are dimmer and less saturated than the video game graphics. Why is that? This is because computer game engines often create a more stylized world, where the saturation, haze, and bloom effects are more pronounced. Let's try to fight the bias where many people consider the more saturated images to be better, and focus our attention on the realism in these image pairs.

While we are there, for reference, we can have a look at what the output would be if we didn't do any of the photorealistic magic, but instead just tried to breathe more life into the video game footage by transferring the color schemes of the real-world videos in the training set. So, only color transfer. Let's see. Yes, that helps… until we compare the results with the photorealistic images synthesized by this new AI. Look. The trees don't look nearly as realistic as with the new method, and after we see the real roads, it's hard to settle for the synthetic ones from the game.
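As a side note, such a color-transfer baseline typically follows the classic statistics-matching recipe popularized by Reinhard and colleagues: shift and scale each color channel of the game frame so its mean and standard deviation match those of a real reference photo. The paper's exact baseline may differ; here is a sketch of the classic version, approximated in OpenCV's Lab color space:

```python
import cv2
import numpy as np

def color_transfer(game_bgr, real_bgr):
    """Match the Lab-channel statistics of game_bgr to those of real_bgr."""
    src = cv2.cvtColor(game_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(real_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
        # Normalize the game channel, then re-apply the photo's statistics.
        src[..., c] = (src[..., c] - s_mean) / s_std * r_std + r_mean
    out = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```

This changes only the global color mood, which is exactly why it cannot fix unrealistic trees or road textures.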
However, no one said that Cityscapes is the only dataset we can use for this method. In fact, if we still find ourselves yearning for that saturated look, we can try to plug in a more stylized dataset, and get… this! This is fantastic, because these images don't have many of the limitations of computer graphics rendering systems. Why is that? Because, look at the grass here. In the game, it looks like a flat 2D texture, a trick used to save resources and render each image quicker. However, the new system can put more real-looking grass in there, grass that reads as a fully 3D object where every single blade is considered.

The most mind-blowing thing here is that this AI finally has enough generalization capability to learn about cities in Germany and still be able to make convincing photorealistic images of California. The algorithm never saw California, and yet it can recreate it from video game footage better than I ever imagined would be possible. That is mind-blowing. Unreal.
And if you have been holding on to your papers so far, now squeeze that paper. Because here we have one of those rare cases where we squeeze our papers not for a feature, but for a limitation… of sorts. You see, there are limits to this technique too. For instance, since the AI was trained on the beautiful, lush hills of Germany and Austria, it hasn't really seen the dry hills of LA. So, what does it do with them? Look, it redrew the hills the only way it has seen hills exist, which is with trees. Now, we can think of this as a limitation, but also as an opportunity. Just imagine the amazing artistic effects we could achieve by turning this trick to our advantage.

Also, we won't need to create an 80% photorealistic game like this one and push it up to 100% with the AI. We could draw not 80%, but the bare minimum, maybe only 20%, for the video game, a coarse draft, if you will, and let the AI do the heavy lifting! Imagine how much modeling time we could save for the artists as well. I love this. What a time to be alive!
Now, all of this only makes sense for real-world use if it can run quickly. So, can it? How long do we have to wait to get such a photorealistic video? Minutes? Hours? No! The whole thing runs interactively, which means that it is already usable: we can plug it into the game as a post-processing step.
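A post-processing step here simply means the network sits at the very end of the per-frame pipeline: the engine renders as usual, and the enhancer transforms the finished frame before it is displayed. A sketch of that loop, where render_frame, enhance, and present are hypothetical stand-ins for the engine's actual hooks:

```python
import time

def run_game_loop(render_frame, enhance, present, target_fps=30):
    """Run the game with a neural enhancement pass after each frame."""
    frame_budget = 1.0 / target_fps
    while True:
        start = time.perf_counter()
        frame = render_frame()   # normal game rendering
        frame = enhance(frame)   # neural photorealism pass
        present(frame)           # display the enhanced frame
        # Interactive use only works if both steps fit in the frame budget.
        elapsed = time.perf_counter() - start
        if elapsed < frame_budget:
            time.sleep(frame_budget - elapsed)
```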
And remember the First Law of Papers, which says that two more papers down the line, it will be even better. What improvements do you expect to happen soon? And what would you use this for? Let me know in the comments below! Thanks for watching and for your generous support, and I'll see you next time!