Is DeepFake Really All That? - Computerphile

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

Clean the camera lens pls

👍︎︎ 3 👤︎︎ u/khan9813 📅︎︎ Jun 04 2021 🗫︎ replies
Captions
let's get out and say straight away that most  of what you see online that's claiming to be   deep fake isn't deep fake right it's some  kind of visual effects of some description   right people being made to look younger in movies  that isn't at the moment done with deep networks   they're not that good yet you know there's been  some very impressive demonstrations how much of   that is deep learning isn't always disclosed  but i'm really interested in the topic because   i don't think that deep fakes are that convincing  yet but i think they will be convincing in five or   ten years and then we have some serious problems  defects usually trained with a lot of pictures   of one person and another person and a network  learns to basically convert them over now one   of the things about being on computer file is  we have quite a back catalogue of videos of for   example me and steve so i had a go at this i  trained it up and it's the stuff of nightmares yeah so they were developing all this it doesn't  work amazingly all the time it doesn't handle   you know when the view goes off and things  like this but on the other hand for something   where i basically just put in a bunch of source  material it just kind of did it you know that's   pretty impressive there's a lot of stuff a lot of  talk about gans and things like this actually a   lot of deep fake isn't gans but they are encoder  decoder networks like we've talked about before   we have videos on gans and we have a video on  auto encoders or encoder decoders depending on   what your preference is the problem we're trying  to solve is that we're trying to take a face   and turn it into a different face right or it's  basically just trying to draw someone's face on   top of an existing one right and that allows you  to take a source video locate the face in it in   each frame and then paint on the new face right  and theoretically replace someone now there's a   lot more to faking than this you've got to do the  voice the hair and things like this right and one   of the problems if i'm trying to transfer my face  onto someone who doesn't look anything like me   is that it doesn't look good even if the face  is good most people who've taken interest this   will have seen this famous tom cruise look-alike  video it's very very impressive right how much is   visual effects and the actual deep learning i  don't know but the actual looks a lot like tom   cruise to begin with right it wasn't going  to work with me as the bass how do we train   a deep fake right so let's suppose we're taking  person a and we're trying to turn them into person   b so we need to first collect a load of faces of  each person and to do that we can use existing   deep networks so for example there are networks  which will find the key landmarks on your face   there are networks which will detect face we've  done face recognition before and so what that   means is you can take a video or series of videos  and find a line and crop out the faces essentially   automatically and that makes your training process  much much easier so you do this and let's say   you collect i know 5 000 pictures of me and 5 000  pictures of steve right and you crop right down to   the face level because if you have a load of arm  waving in it that's a much harder generate image   generation problem which is for another day right  if the person is front and center and just looking   like this then maybe you can get away a few images  but if they're looking and doing this and they're   smiling and you know facial expressions for the  network to learn all those different mappings   you need lots more data not to mention things  like lighting changes at different areas and stuff   right so this is person a i don't know why i've  drawn a full person on here because that doesn't   make sense right this is person b so these are  cropped images of faces not not all people right   um sort that out in post for me please so the deep  network we're using here is called an auto encoder   right i often refer to it as an encoder decoder  right because i just it's just in the literature   that i use that that seems to be what it's called  but it's not really important the important thing   is you have a piece of network here which down  samples your image makes it smaller but also   extracts interesting features and it turns it  into features and this is called the encoder so   this is an e for encoder right and we have another  encoder here now actually these are the same one   these have the exact same neural network weights  they're trained in the same way and we have   features here an example of this kind of network  is the one when we talked about face recognition   trying to take someone's face and put it into that  space where you can just determine who they are   right that's essentially what this is doing this  is going to be some 500 1000 2000 digit barcode   should we say representing all the interesting  aspects of their face then we have a decoder so   the decoders is essentially the exact opposite it  makes the image bigger again and returns it back   to the original size and we're trying to produce  another face which is an exact copy of this first   one right so this is the decoder now what's  interesting about these deep fake technologies   is the encoder is shared right which means that  essentially the same encoder is used to turn a   into features and b into features but the decoder  is unique to each person so this might be mike's   decoder this might be steve's decoder and so  how you train this is you just put in a load of   images of me you've put them through this encoder  decoder and you try and get as close a copy of me   out now if you think this isn't an image in here  this is sort of high level feature information   so this is essentially compression right it's  it's got to down sample this whole image into   a very small feature space of just the most really  important information and then it's going to turn   it back into an image and the same here it's going  to take a picture of steve or someone else going   to down sample them into the feature space and  then it's going to up sample them back because the   whole point of this network is you're trying to  make sure this matches this and this matches this   if your feature space isn't a good representation  of your face you're not going to be able to do   that right so that's why the features end up  useful right this is what auto encoders do   they turn your input into a fit into a latent  space or a feature space and then you turn this   back into the image the encoder is a set of  instructions that take a face and summarize   it and then the decoder is takes that summary  and turns it back into an image and it's going   to include things like orientation and stuff like  this now how do we actually turn me into steve or   vice versa right all we do we train this all up so  we make sure a looks like the output of a b looks   like the output of b there are other generative  adversarial versions of this you could use right   but this is just nice and straightforward and then  all we do suppose we want to turn steve into me   we put in a picture of steve right we encode  it into this shared feature representation   which is you know what what poses he has he  got but then we use mike's decoder so we switch   this decoder out for my one we turn an image of  the person we're trying to change we turn them   into the feature space and then we use someone  else's decoder to return them to a different face   right that's how it works and then you get a nice  picture of me out nice is relative right um and   you know i've trained this up and it works pretty  well right it's it's a little bit a little bit   weird wireless networking older than you didn't  believe it was super straightforward right i   just got a load of videos from you right which  downloading them was the slowest part and then i   basically it automatically extracted all the faces  aligned them all nicely and then i essentially   just trained it from there and i played around  with a couple of networks involved read up from   what they did but basically it it very it holds  your hand through the whole process and then at   the end you can put in a video and it'll give  you out a video with my face instead of steve's   there's obviously a lot of problems with it right  first of all the networks are still not quite high   resolution enough to deal with 1080 and 4k video  you know i didn't put enough training data i put   in quite a bit but not enough not varied enough if  you were really committed to providing incredibly   impressive output you would need to pay a lot more  attention and train it for a lot longer than i did   right i was just seeing how things go it heats up  my whole house this graphics card when i train on   it so um i was i was too hot so i turned it off  this seems like a bit of a one-trick pony right   well actually um so deep fake all this sort of  image to image conversion which is really what   it is it's quite common outside of faces  as well so for example satellite imagery   right people are using it for mri to try and  convert between one kind of scanner and another   because then you can get a deep network  that works on both and things like this   there's loads of times where style transfer  which is a sort of what this is a subset of   um is really really useful right so there's a huge  amount of research happening not just in faces   but in other domains as well right some of it's  like this some of it's using stargand cycle gan   you know it depends right um but yeah this is a  pretty broad uh broad area for me these networks   are pretty standard networks that i use day to  day but they're somewhat interesting right but   i'm interested in what happens to this in a few  years time when these networks start to become   really really good right because you know the pace  at which deep learning is improving is impressive   right no not at every task right it doesn't  just solve everything but you've got to imagine   that for low quality cctv the kind of stuff that  needs to hold up in court or something like this   it's not going to be that many years before we  can completely replace faces or other things   objects in those scenes right and we have to be  prepared for that we're long since past the idea   that you could put a lid on this right and i'm not  saying that would even be a good thing right so   for example i used a github repository called face  swap to do my training like really cool download   it have a go for legitimate reasons please but  it has something like 30 000 stars on github   10 000 forks you know people are using this  in bulk right and and this is not the only one   you can't you can't ban this kind of  technology because it's just academic   discovery right and the other thing is that if  you try and ban it you end up in a situation   where only a certain people who already understand  it still have access to some technology and no one   else can access it right and that isn't that isn't  going to solve any problems like you could imagine   a situation where only people with resources could  tamper videos not everyone else and then you've   got a whole different set of problems you've not  really got a solution and anyway in the list of   things it says please don't use this for illegal  activities right problem solved i mean no like   i mean deep fake has obviously been amused because  it is used a lot for illegitimate activity right   for putting faces on things for faking politicians  words and stuff i would prefer to focus not on the   fact that it's being used for those things but  some kind of solutions we might think about as to   how we actually fix those problems because they're  not going to stop just because we ask nicely right   we have to think about you know in 10 years time  when you can't tell the difference between a and b   what do we do then like can you still use video  evidence right can you trust anything you see   on a social media site that has someone talking  if they can fake video and audio perfectly well   no not really right so there are some solutions to  this that people are exploring so the most obvious   one would just be you this is a problem caused  by deep learning right we'll solve it with deep   learning so we just take a video or a picture we  put it into a classifier trained on deep fakes and   say is it fake or not right this is you know  the adversarial approach it works okay i mean   it's going to be dependent on getting enough video  data will it work on the videos of me don't know   probably but will it generalize i don't know  what you end up happening is that people are   researching better techniques to perform image  conversion and then someone develops a better   way of detecting it and you just go like this and  the whole thing you know it's like virus detection   it's just like antivirus and virus detection  right the viruses become more sophisticated   they they encrypt themselves and things like  this and it becomes harder to check and so on   you have to come up with new new techniques  so i think that deep learning does have   a role to play in trying to detect stuff like this  i suppose what i'm wondering about is what happens   when they become so good that it is impossible to  tell the difference right now i don't i don't know   how likely that is but i would say pretty  likely given that given that for an image   we're only talking about a few hundred pixels  across right you can make pretty convincing   facilities using cgi already right so doing this  automatically does not seem out of a question   i think that probably cryptography  has a pretty big role to play   you know there are mechanisms in cryptography like  mesh authentication codes to stop messages being   tampered with right there's digital certificates  to verify ownership of some kind of file or   message right you could imagine that if these  things were built into like cctv cameras   you couldn't tamper with the message because  you wouldn't be able to recompute the signature   and things like this right i've not thought  this through right there with people in the   comments do discuss it in the comments right and  tell me why i'm wrong but it seems to me that   cryptographic techniques to stop things being  tampered with and make sure you know where things   have come from and who signed what they have a  role to play in trying to stop some of this i   think right those things don't exist in this area  yet we have stuff like digital water marking but   it's all pretty low key more research is going  to be needed here pretty pretty soon i'd say   cipher text that we'll be breaking later does  it honestly start with zeus as in conrad's use   yeah the reality of random yeah yeah yeah  yeah um you see all kinds of words in the   random ciphertext and they mean absolutely  nothing another row of characters became   as it were normal to the paper and could be AUTOMATIC CAPTIONS - Manually Published
Info
Channel: Computerphile
Views: 130,140
Rating: undefined out of 5
Keywords: computers, computerphile, computer, science, Computer Science, University of Nottingham, DeepFake, Deep Fake, Dr Mike Pound, Mike Pound, Dr Steve Bagley, Steve Bagley, Deep Learning, GAN, Neural Net, Auto Encoder, Encoder Decoder, AI
Id: IT6-5ZbabVg
Channel Id: undefined
Length: 12min 30sec (750 seconds)
Published: Fri Jun 04 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.