Zebras, Horses & CycleGAN - Computerphile

Video Statistics and Information

Captions
What I've been wondering is, you know, if you've got a lot of pictures of horses and you really want to turn them into a lot of pictures of zebras, how are you going to do that, right? Marker pen. Yeah, marker pen, just draw some lines on there. Yeah, that is quicker. Today we're going to talk about CycleGAN, which is a really interesting innovation in generative adversarial networks (GANs). Now Rob has already covered in one of his videos what a GAN is and what it's typically used for: generating images of interest. Pictures of trees, pictures of people, pictures of buildings, you know, you can train GANs to do these kinds of things. What's interesting about GANs is that you actually have two networks: you have one network that's learning to generate images, and you have a second network that's learning to try and tell the difference between the generated ones and some example real images, and by training the two together you force the generator to produce more realistic-looking images.

There's another domain where GANs are useful, right, which is not generating an image from scratch, it's turning an image from one domain into another. So, for example, you might take a photo and try to make it look like a painting, or you might take a black-and-white photo and try to make it a colour photo, or something like this. There are lots of ways to do this kind of style transfer, but GANs are one of the ways you could do it. It's going to be slightly different to what Rob was talking about, because in this case we have an image going in, right, and not just noise, and we have an image coming out. So one way you could train this is just with a standard kind of encoder-decoder network like we've talked about before. So you might have an image, let's say here, so this is a photograph, and you have a kind of encoder-decoder network like this that we're going to call our generator, and then it outputs another image. These aren't typically too big, but they usually do filter the image down and back up again. And then we have, I don't know, an artist's impression or something like that of the picture. Now, one way we could train this is to get a bunch of paintings and a bunch of photos of where those paintings were sort of captured from, right, or commission a bunch of artists to go out and paint a load of stuff that we have the photographs for, and then we've got this supervised training data that we could use. We could just put in pairs of a photo and what it's meant to look like in painting form, and then we can train a network that hopefully will map one to the other. And the idea, I guess, is that you then put in a new photo that it's never seen and it does a nice job of painting it, right? That's the hope.

So this is like a kind of basic level of training up a network to produce some kind of style transfer. But the problem is that it's a bit of a pain to have to get these pairs of data, right? So a lot of the time the data isn't paired. A lot of the time we have a lot of pictures of paintings and a lot of photos, but they all look like completely different places, so producing a mapping between one and the other is a little bit more difficult. What a GAN does is it has a discriminator here. So this is going to be a discriminator, and it asks: is this a real or a fake image? The better the discriminator gets at detecting the fake paintings, the better the generator will have to get at converting photos to paintings, right?
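The video doesn't show any code, but as a rough illustration of the two pieces just described, here is a minimal sketch using PyTorch (an assumption; the layer sizes are illustrative and not the architecture from the CycleGAN paper): an encoder-decoder generator that takes an image in and puts an image out, and a small convolutional discriminator that scores real versus fake.

```python
# A minimal sketch, not the paper's architecture: an encoder-decoder generator
# and a small convolutional discriminator, written with PyTorch as an assumption.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder: filter the image down and back up again."""
    def __init__(self, channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

class Discriminator(nn.Module):
    """Scores whether an image looks like a real painting/zebra or a generated one."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # grid of real/fake scores
        )

    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    # Quick shape check on a random stand-in for a 256x256 RGB photo.
    x = torch.randn(1, 3, 256, 256)
    G, D = Generator(), Discriminator()
    print(G(x).shape, D(G(x)).shape)
```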
That would be the idea, and this works pretty well: it works well for generating photos and for changing photos into other domains. But GANs have a lot of problems with training that make them quite difficult to use in practice. One of them is something called mode collapse. Imagine that we're trying to produce a GAN that takes a picture of a horse and converts it into a picture of a zebra. So what's happening is we put in a random picture of a horse, and then the discriminator asks: is it a real zebra or a fake zebra? And if it's determined to be a fake, the generator is going to have to get better at changing that horse into a zebra. Horses into zebras is a kind of arbitrary problem, and you might wonder why we'd do that. Well, for fun mostly, but also because it's really just a texture change: the shape of a zebra is approximately the same as the shape of a horse, which makes this problem slightly easier, right? We're talking about adding stripes, we're not talking about totally changing the shape of the object.

The problem is this: let's imagine that the generator has worked out how the discriminator works and just says, okay, I'm going to produce this exact same picture of a zebra. (This is going to look bad, by the way. So this is my picture of a zebra. An untrained GAN would have done better.) So let's suppose the generator generates this same picture every time: any picture of a horse goes in, and it just generates that picture. You asked for a zebra, here's your zebra; it's the only one you're getting. This is something called mode collapse. The idea is that the generator has now completely failed to produce interesting pictures and is just producing the same picture every time. Imagine we wanted a generator that produced digits like six, five, four, three, and the discriminator was determining whether they were real digits or not. Well, if you just produce a six every time, the problem is so much easier, and the discriminator can't complain about that, because they are technically digits. This is the exact same problem.

So CycleGAN is essentially a two-GAN system that deals with this very problem. So how does it do it? Well, what we have is two GANs, alright, because if one is good then two is better. And what we try to do is make sure that not only are we taking our image and turning it into something that looks good, and is indistinguishable from a real one, but also that we can then go back again, to prove that we haven't just generated some zebra image that's got nothing to do with the input, right? So this is our input. We have a generator network G, so this is G, which is going to generate a picture of, let's say, a zebra. Then we have another network F, alright, which is taking pictures of zebras and turning them back into pictures of horses. So there are two GANs, and there are going to be two discriminators, right? There's a discriminator here saying, is this a real picture of a zebra? And there's a discriminator here saying, is this a real picture of a horse? So there are a lot of loss functions being applied here. What's a loss function? A loss function is a function we use to calculate how wrong or right a particular network is. So in this case, the loss function says: you were supposed to say this was a real image, and you said it was a fake image, so your error is that much, right? Without going into any numbers.
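To make the wiring concrete, here is a minimal sketch of the structure just described: two generators, G (horse to zebra) and F (zebra to horse), and one discriminator per domain. The tiny placeholder networks are assumptions for illustration only, not the generators and discriminators used in the actual CycleGAN implementation.

```python
# A minimal sketch of the CycleGAN wiring: two generators and two discriminators.
# The tiny conv nets are placeholder assumptions, not the paper's architecture.
import torch
import torch.nn as nn

def tiny_generator(channels=3):
    # An image goes in and an image of the same size comes out.
    return nn.Sequential(
        nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, channels, 3, padding=1), nn.Tanh(),
    )

def tiny_discriminator(channels=3):
    # Scores whether an image looks like a real member of its domain.
    return nn.Sequential(
        nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, 4, stride=2, padding=1),
    )

G = tiny_generator()              # G: horse -> zebra
F = tiny_generator()              # F: zebra -> horse
D_zebra = tiny_discriminator()    # "is this a real picture of a zebra?"
D_horse = tiny_discriminator()    # "is this a real picture of a horse?"

horse = torch.randn(1, 3, 128, 128)   # stand-in for a real horse photo
fake_zebra = G(horse)                  # forward translation
recovered_horse = F(fake_zebra)        # go back again: should match the input
print(D_zebra(fake_zebra).shape, recovered_horse.shape)
```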
That's basically what it does, right, and this is how we train these things. You calculate a loss that says, look, you were supposed to say this was a fake zebra, but actually you said it was a real zebra, so next time change your weights so that you say it was a fake zebra. That's what the loss function is for. So a horse comes in, the G generator turns it into a zebra, theoretically, right? This goes into the F function, which turns it back into a horse, and then we have a final loss, which is: what is the difference between this horse here and this horse here? Because if we can't recreate the exact same image, then we've just produced a different picture of a zebra, and that's cheating. For this to work, we have to convert this horse into a zebra, not just produce a picture of a zebra. You see what I mean? Right, it's the same with something like style transfer to a Monet painting, right? We take a photograph, we turn it into a Monet painting, it looks great, but if we can't turn it back, then what you've done is just drawn a Monet painting for me; it's got nothing to do with the input. So in some sense these G and F functions are inverses of one another: one does a function and the other one undoes the function. What's interesting about CycleGAN, apart from the fact there's this kind of loop here, is that they have this loss that also measures the distance between this image and this image. So you're not only ensuring that these are both producing realistic-looking inputs and outputs, but that these two at the end are the same. And you can do it both ways: you can go horse to zebra to horse and measure this distance here to make sure they're the same, and you can go zebra to horse to zebra and make sure that distance is the same, right? So you can work in both directions: you can put F first or G first.

So this is exactly how you train it. You take a number of images of horses and a number of images of zebras, and then you train it just like a GAN. You say, well, here's a horse image going in: was it a real image or not? Right, and sometimes you give the discriminator real images and sometimes you give it fake images, and it tries to get better at that. You then put it through F to turn it back into a horse, and the other discriminator is also working to make sure this F function is getting better, and then you make sure these two are the same as well. One way of looking at this is that you're trying to separate out what the content of an image is and what the style of an image is, and the content should be the same. Alright, the content of the horse image and the zebra image should be the same, because if it's not, you won't be able to retrieve one from the other. The style is, you know, aspects of how it looks: is it a painting? Is it a zebra or a horse? These are things that we can apply later, kind of on top of the base image. That makes sense.

And this can be applied to a whole array of different problems. It's used in medical imaging, right? We're looking at it in plant imaging. It's used for style transfer. So in the paper alone there were about ten different examples of things you can do. You can take pencil drawings and turn them into real renderings of an object, so like, you know, a sketch of a shoe into an actual picture of a shoe. You can go from photos to Monet paintings to Van Gogh paintings. You can go from horses to zebras, apples to oranges, and the list goes on, right? And if you have a look online you'll see all kinds of weird and wonderful things that people are using this CycleGAN, and variants of it, to do. What sort of resolution can it do? Is it good?
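Here is a hedged sketch of the losses just described, reusing the G, F, D_zebra and D_horse networks from the previous snippet: an adversarial term for each generator (the discriminator should be fooled into saying "real") and a cycle-consistency term (the round trip should reproduce the input). The use of mean-squared error for the adversarial term and the weighting factor are illustrative assumptions, not values taken from the video.

```python
# A minimal sketch of the generator objective: adversarial + cycle-consistency.
import torch
import torch.nn as nn

adv_loss = nn.MSELoss()    # "was that a real zebra or a fake zebra?"
cycle_loss = nn.L1Loss()   # "is the horse we get back the horse we put in?"
lam = 10.0                 # illustrative weight on the cycle-consistency term

def generator_losses(horse, zebra, G, F, D_zebra, D_horse):
    # Horse -> zebra -> horse.
    fake_zebra = G(horse)
    rec_horse = F(fake_zebra)
    # Zebra -> horse -> zebra (you can put F first or G first).
    fake_horse = F(zebra)
    rec_zebra = G(fake_horse)

    # Adversarial terms: the generators want the discriminators to output "real" (1).
    d_fz = D_zebra(fake_zebra)
    d_fh = D_horse(fake_horse)
    adversarial = (adv_loss(d_fz, torch.ones_like(d_fz)) +
                   adv_loss(d_fh, torch.ones_like(d_fh)))

    # Cycle-consistency terms: the round trip should give back the original image.
    cycle = cycle_loss(rec_horse, horse) + cycle_loss(rec_zebra, zebra)

    return adversarial + lam * cycle
```

The discriminators are trained separately, alternating with the generators, on a mix of real images and the generators' fakes, just as described above.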
No, it's okay, right, the resolution's alright. So the kind of training data we're talking about is in the hundreds of pixels square, so 200 to 500 pixels square usually. Training them takes all the memory; once they're trained, at what we'd call inference time, you can put through bigger images, and there are ways you could train it so that it will work on bigger images when it actually comes to applying it, right? But the images I've been testing on are smaller, 512 by 512, something like that. We're not converting 4K videos yet? There is work, which maybe we'll talk about at some point, like StyleGAN, which does try to produce higher resolution output. What you'll find, I think, if you watch a video where a CycleGAN has been applied to every frame, is that it's going to be a little bit noisy, and you're going to have to take steps to try and smooth that out. There are a few videos of, like, parts of 2001: A Space Odyssey as a Picasso painting, where every frame is a Picasso painting. It's quite something to look at, right, but there's a bit of noise, because, you know, if the input changes even slightly, something is going to change in the output, right, and so it sort of flickers. To get it to be smooth you're going to have to do some kind of extra processing.

So have a go at this, right? You can download the code online, we'll put a link in the description. You can download it, and as long as you've got a graphics card you can start putting random pictures of horses in, turning them into random pictures of zebras, and feel a bit pleased with yourself. These kinds of GANs are being used for all kinds of things that you see, like super-resolution, for example, where you're taking a very low resolution image and you want to sort of blow it up so you can see a lot of fine detail. You know, that sort of CSI "zoom in on that and enhance" thing. I have some concerns about this, because one thing to remember is that these are trained to look nice; they're not trained to be an accurate representation of the ground truth specifically, right? And I'm saying this to cover my own back more than anything else: if you take a blurry image and you zoom it up and it looks like me, that's probably a coincidence. You can't trust that it was actually me. My point is, you're not going to be able to use it in a court of law, because it could be argued that it's just created an interesting-looking face; it hasn't recreated the actual face from those few pixels. It's going to be a while before we're looking at networks that can actually work out what someone truly looked like.

If you look at the input and output of this CycleGAN, for example, you'll see that it'll take a horse and turn it into a zebra, and when it turns it back the images are close, but they're not the same. Often the grass is a little bit muted, I think probably because zebras tend to be on drier grass, sort of out on the plains and stuff, right? So it changes the rest of the image as well to deal with this. So there are interesting things it does. Super-resolution is one thing it's used for; medical imaging, converting from one domain of medical image, like an MRI, to a CT scan, right?
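A small sketch of the inference-time point made above: because the generator in the earlier snippets is fully convolutional, a trained network can be run on images larger than those it was trained on. The checkpoint file name, image file and sizes here are hypothetical placeholders, not artefacts from the video.

```python
# Running a trained generator G (from the sketch above) on a larger image than it
# was trained on. File names and sizes are hypothetical.
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),   # scale pixels to [-1, 1] to match Tanh output
])

G.load_state_dict(torch.load("horse2zebra_G.pth"))   # hypothetical checkpoint
G.eval()

img = to_tensor(Image.open("horse_512x512.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    zebra = G(img)   # works at 512x512 even if training used ~256px crops
```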
I'm not quite sure what the use case for that is, but, you know, there are a lot of these sorts of things that can be done. The recent Samsung video where they animated people's faces, and things like deepfakes, right, which I would just say are morally questionable, but these are all things generated using these generative adversarial networks. You're taking inputs, which are, you know, noise, or other images, or pose estimations if you're trying to make a face do something, and then it's generating an image that looks like that thing. So there are lots of ways of controlling what different bits of this do to get an output that you want.

Encryption, and specifically kind of modern encryption and how it works. Now, before we jump straight into something like the Advanced Encryption Standard, I wanted to talk about SP networks, or substitution-permutation networks, because they are the basis for a lot of modern cryptography. Not all of it, but a lot of symmetric...
Info
Channel: Computerphile
Views: 118,735
Keywords: computers, computerphile, computer, science, University of Nottingham, Dr Mike Pound, CycleGAN, GAN, Generative Adversarial Network, Machine Learning, AI, Artificial Intelligence, Neural Network, Deep Learning
Id: T-lBMrjZ3_0
Length: 13min 10sec (790 seconds)
Published: Thu Aug 01 2019