This idea sounds like science fiction, but by
the end of the video, you will see that this makes perfect sense. So, what is it? Well,
consider image inpainting. It is amazing. What can it do? Well, when we cut out a part
of an image, then, bam! It fills in the void with plausible information. This is image
inpainting. And now it works on video too. But it gets even crazier. Image outpainting
also works. Whoa! What is that? Well, we can essentially extend the image in any direction
and once again, fill it in with plausible data. Now please note the choice of words here: in
both cases I said it fills it in with plausible data. Data that could be there. But synthetic
data nonetheless. Now here is an insane idea from Google’s researchers. What if we took
these, and filled them in not with information that could have been there, but with information
that was actually there. Filling in with reality. That is of course, impossible, right? Well,
look. Oh yes. It seems like it is impossible. If we try to complete this image with previous
techniques, for instance, Stable Diffusion, we get something that is plausible, you know,
the hat continues, the Post-its also continue, that is good, but still, it is likely not the
real thing. So, can I get the real thing? Well, let’s think together. What if we are trying
to outpaint a historical building? Wait a minute, that is the key! If we try to fill in information
for something that we have other photos of, it might be possible. Let’s give it a
try. This is the incomplete input photo, and here are our other photos. Now note that this
is still quite hard. We can’t just copy it. The angles are different, the lighting is different,
and the lens distortion is really different. But, in the age of AI, let’s see if it can be
done. And…oh wow! Look at that! Perfection. And it can do it for a variety of scenes
over and over again. It appears to work pretty much everywhere. Well,
it does not work everywhere, and I’ll tell you about that in a moment.
Still, all this is absolutely amazing. But wait. How do we know how real these
photos are if there is nothing to compare to? Well, let’s make sure that there is something to
compare to. Let’s take a real photo, cut off the top, and now we know exactly what should be there.
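
The hold-out check described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's evaluation code: the helper names `hold_out_top` and `psnr_in_mask` are hypothetical, PSNR is one possible similarity score chosen here for simplicity, and random arrays stand in for a real photo and for the outputs of an image-completion method.

```python
import numpy as np

def hold_out_top(image, fraction=0.25):
    """Blank out the top `fraction` of rows.

    Returns the incomplete copy and a boolean mask of the removed pixels.
    """
    rows = int(round(image.shape[0] * fraction))
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[:rows] = True
    incomplete = image.copy()
    incomplete[mask] = 0
    return incomplete, mask

def psnr_in_mask(reference, candidate, mask, peak=255.0):
    """PSNR in dB, computed only over the held-out pixels (assumed metric)."""
    diff = reference[mask].astype(np.float64) - candidate[mask].astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy stand-in for a real photo: we know exactly what the top should contain.
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
incomplete, mask = hold_out_top(photo, fraction=0.25)

# Two hypothetical completions: one that recovers reality, one that guesses.
perfect_fill = photo.copy()
noisy_fill = photo.copy()
noisy_fill[mask] = rng.integers(0, 256, size=noisy_fill[mask].shape, dtype=np.uint8)

print("perfect fill PSNR:", psnr_in_mask(photo, perfect_fill, mask))
print("random fill PSNR:", psnr_in_mask(photo, noisy_fill, mask))
```

Scoring only the masked region matters: the untouched part of the photo is identical in every completion, so including it would flatter a method that merely guesses.
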
Stable Diffusion does not know. Paint by Example, a paper from almost exactly a year ago,
does not know at all. But the new technique called RealFill, this one knows. Look. That is
incredible. Almost pixel-perfect reconstruction. My goodness. What a time to be alive! Now
note that this is not a copying machine. It has access to information about the room,
but it has to understand which part is missing, and what that part would look like from
this angle. So it fills in reality after all. And it does it over and over
again with breathtaking accuracy. Now, I noted that it is still not perfect. I mean, all of these look nearly
perfect. So where are the issues? Ah. Of course. Text. It’s always the text.
Every time. We finally left behind the age of AI systems generating mangled, incorrect
hands, mostly, but text is still a challenge. I am fairly sure that this is something that will
be possible just one more paper down the line. And can you imagine what will be possible two
more papers down the line? My goodness. We can already do a pretty good reconstruction from just
one image. Not even a set of images. One image. This is supposed to be a failure case. If this
is a failure case, bravo, sign me up right now! So, adding a little more information to the
AI by reusing already existing images: that was the crazy idea, which, in hindsight,
makes perfect sense. What a brilliant paper. Loving it! And one more thing. I have
a little daughter and when she was a baby, we could not really afford a good smartphone
to take better images of her. However, there are a lot of pictures, and I was thinking
that within my lifetime, there would surely be an AI able to upscale those not-so-great
images to a higher resolution. And it should not fill them in with things that could be there, but
with things that are really there. And finally, we are here. I can’t believe it. And all it
needs is one to three photos. And as a family, we have thousands of photos of
ourselves to learn from. So good. This was Two Minute Papers with Dr. Károly
Zsolnai-Fehér. Subscribe if you wish to see more.