Secrets Hidden in Images (Steganography) - Computerphile

Captions
So cryptography is the idea of encrypting a message so that, although everyone knows the message has been sent, they can't actually find out what it means. Whereas in steganography we're trying to hide the fact that we've sent a message at all. So a classic example would be if I was writing you a letter and then I wrote in invisible ink a whole different letter between the lines or on the other side or something like that... And only you knew that that was going to be there. So you get home and everyone else maybe looks at the letter and thinks "it's not very interesting at all". And then of course you can uncover the secret message. Today we'll talk a bit about "digital image steganography" because obviously there's a huge scope for hiding things in digital images: images can be megabytes or more and you can hide files of megabytes or more in them. But of course as the amount of steganography in images increases, so do the attempts to try and find it. So there are lots of statistical approaches to try to find these things as well. Perhaps the simplest form of steganography in an image is "least significant bit steganography". So if we've got a bitmap of any kind (a PNG or BMP) then we can change the lowest bits to be our message and you'll have an almost imperceptible change in the way the image actually looks. - It's a bit like if I change the number 800,351: if I changed the 1 or the 51 on that it's not gonna have a massive effect... - That's exactly right, the number's so big that in the grand scheme of things it makes no difference. So generally speaking, in every single byte of an image we'll change the last bit, or maybe the last two bits if we're really trying to cram in a lot of data. Every byte is eight bits; we take the last two and change them to our message in the hope no one's going to notice. So, for every byte (that's every 8 bits) six of them are the regular image and two of them are our secret message, so a quarter of our image is now secret message.
So if we have a normal pixel, it's going to be 4 bytes long (so that's one byte) so for each byte we're talking about the last two bits in that byte. So that could be a 1; we can change it to 1, change it to 0, or leave them both the same. And what we do is we read off our message, so let's say the message we're trying to hide is 10 11 01. Okay? We get to the first byte and we say: well this is great, our last two bits are already 1 and 0 so we don't need to change anything at all, so that byte stays as it is. So we go to the next byte, so this will be maybe red and this might be green in our pixel. Okay? The last two bits of this byte are 0 and 1, the two we're trying to put in from our message are 1 and 1, so we change this one to a 1. So by changing that second least significant bit from 0 to 1 we've just increased this value by two, and we're talking about one channel in a huge image - changing by two levels is probably not going to be too noticeable. If we start changing the most significant bits then that might be a problem. All right, so I've written a program to do this and I've tried to hide a rather large file inside another rather large image. OK, so this is a nice picture of a tree. It's about 3 (and a bit) megapixels in size. So this is the original image of our tree and that is the steganographic image. First one, into the second one. - It's not changing! - It is changing. When you only change the two least significant bits of an 8 bits per channel image you're not going to see a huge amount of difference. If you actually do a subtraction on the images, you can see a difference, but in general it's going to be pretty imperceptible. The really good thing to do would be to never release the original source image. I can tell that something has changed because I've got the original and the new steganographic image with me.
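The two-bit replacement described above can be sketched in a few lines of Python. This is only an illustration, not the program used in the video: the function names `embed_lsb2` and `extract_lsb2` are made up here, the cover data is a plain byte string rather than a decoded image, and the third cover byte is an invented value, since the worked example only describes the first two.

```python
def embed_lsb2(cover: bytes, bit_pairs: list[int]) -> bytes:
    """Hide 2-bit values (0..3) in the two least significant bits of each cover byte."""
    if len(bit_pairs) > len(cover):
        raise ValueError("message does not fit in the cover data")
    stego = bytearray(cover)
    for i, pair in enumerate(bit_pairs):
        # Keep the top six bits of the image byte, replace the bottom two.
        stego[i] = (stego[i] & 0b11111100) | (pair & 0b11)
    return bytes(stego)


def extract_lsb2(stego: bytes, count: int) -> list[int]:
    """Read back the first `count` 2-bit values."""
    return [b & 0b11 for b in stego[:count]]


# The message from the worked example: 10 11 01
message = [0b10, 0b11, 0b01]

# First byte already ends in 10, second ends in 01; the third is invented.
cover = bytes([0b10110110, 0b01100101, 0b11001000])
stego = embed_lsb2(cover, message)

print(list(cover))   # [182, 101, 200]
print(list(stego))   # [182, 103, 201]
```

The first byte is untouched, the second goes up by two levels, and the receiver recovers the message with `extract_lsb2(stego, 3)`.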
But if I just sent out an image of my dog and I never sent out the original that the camera took, no one's going to know that it's been imperceptibly changed because they haven't got a reference. If you take a public domain image and change it, it's gonna be easy to look for the original source. - [...] - Exactly. The other thing is, it'll work better on photographs where there's a lot of variation (in the intensity levels anyway). So this steganographic image has the entire works of Shakespeare buried in it, which comes to (when it's zipped up) about... 1.5 MB, something like that. This kind of simple steganography can be detected. This image here is an image that I've created by taking only the last two bits of each channel. I've scrapped all the other information. If a pixel has a value of 0, it's black, if it has a value of 3 it's white, and then it ranges in-between. And you can see that it's a tree there, so you can see even in the last two bits that there's a tree and the sky is particularly bland. So if you look instead at the steganographic image, I've applied the same filter to that and you can see that the amount of noise has increased massively, because that noise is all hidden in those 2 least significant bits. So if you compare the bits from one image to the other, you can see a difference, and so hiding a message in the least significant bits is fairly obvious, particularly if you have the original for comparison. So this is the difference between those two images and I've massively scaled up the difference. I mean it looks very gray. These white pixels and black pixels are values of plus or minus 3 intensity changes. So we're still talking very small differences over the image and it's very evenly distributed, it's all sort of spread noisily throughout the image. - Yes, so you can't actually tell that there's a tree there now. - No, you can't tell there's a tree there. Which could be a clue!
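The filter and the difference image described above can be approximated like this. It is a rough sketch over flat lists of channel values rather than real image files; `low_bits_view` and `difference` are hypothetical helper names, and the sample values are invented.

```python
def low_bits_view(channel_values):
    """Keep only the two lowest bits and stretch them to gray levels:
    0 -> 0 (black), 1 -> 85, 2 -> 170, 3 -> 255 (white)."""
    return [(v & 0b11) * 85 for v in channel_values]


def difference(original, stego):
    """Per-value difference between the stego data and the original."""
    return [s - o for o, s in zip(original, stego)]


# Invented sample values: only the two lowest bits differ between the lists.
original = [200, 101, 54, 255]
stego = [203, 101, 52, 252]

print(low_bits_view(stego))          # [255, 85, 0, 0]
print(difference(original, stego))   # [3, 0, -2, -3]
```

Because only the bottom two bits are ever replaced, each difference stays within plus or minus 3, which is why the difference image has to be scaled up before anything is visible.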
Perhaps a more sophisticated method of hiding something in an image would be to hide it inside the Discrete Cosine Transform coefficients of the JPEG. So we talked a little bit about the DCT and how we convert an image into a series of cosine waves. And we have coefficients saying how much of each of those waves we have. If you change those coefficients instead of changing the raw pixel values, you have a much less predictable effect on the image: if you change the value of one of the large alternating current (AC) coefficients from 202 to 201 you're gonna have a very imperceptible difference, and it's going to happen over that entire 8x8 block, so you're not going to be able to see the clear sort of steganographic noise that we just looked at on that tree. A common algorithm that we see in use is called JSteg. So I see what you did there. And what JSteg does is it goes in and, if it can, it will cram DCT coefficients full of as much data as it can. And what it does is: any coefficients that aren't 0 or 1 (because changing those might be a little obvious), so usually the low-frequency ones, might change up or down, and you can see again that the difference is almost imperceptible. So here's a picture of a panda and what I've done here: I couldn't cram in as much information as before so it's just much better than this one. So there is the original and a steganographic image. And I've looked at these and I've [...] a bit of difference and you can see that again it's very, very, very slight, so these pixels again have only changed by plus or minus 3, maybe one, maybe two. - So that's just zoomed in on the... - That's zoomed in on the difference right there, so you can see that, yes, the pictures have changed, but have not changed by a lot. And the other crucial thing about hiding your message in the DCT coefficients: the JPEG has already completely messed up the least significant bits of the image.
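As a rough illustration of the idea of skipping coefficients that are 0 or 1 and nudging the rest, here is a simplified Python sketch. It is not JSteg itself: it works on a plain list of integers standing in for quantized DCT coefficients, the function names are invented, and overwriting the lowest bit of negative values with Python's bitwise operators is an assumption made for this example.

```python
def usable(c: int) -> bool:
    """Coefficients equal to 0 or 1 are left alone."""
    return c not in (0, 1)


def embed_in_coefficients(coeffs: list[int], bits: list[int]) -> list[int]:
    """Overwrite the lowest bit of each usable coefficient with one message bit."""
    slots = [i for i, c in enumerate(coeffs) if usable(c)]
    if len(bits) > len(slots):
        raise ValueError("not enough usable coefficients for the message")
    out = list(coeffs)
    for i, bit in zip(slots, bits):
        out[i] = (out[i] & ~1) | bit   # value changes by at most 1
    return out


def extract_from_coefficients(coeffs: list[int], count: int) -> list[int]:
    """Read the lowest bit of the first `count` usable coefficients."""
    return [c & 1 for c in coeffs if usable(c)][:count]


# Invented coefficients for one block, and a four-bit message.
block = [202, 0, -3, 1, 5, 0, 2, -1]
stego_block = embed_in_coefficients(block, [1, 0, 1, 1])

print(stego_block)   # [203, 0, -4, 1, 5, 0, 3, -1]
```

With this pairing a usable coefficient never turns into 0 or 1, so the receiver skips exactly the same positions when calling `extract_from_coefficients(stego_block, 4)`.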
So if you do an image like I did where we're looking at just the bits, we will no longer be able to see a tree, we'll just be able to see very general JPEG noise, and that will be exactly the same in our steganographic image, so you can't do what they call a visual attack by looking and seeing if there is a steganographic message hidden inside, because there's no real change. So this is the original and I'm only showing here the two least significant bits. And you can see that they form into little blocks [...] blocks is the 8x8 DCT blocks. And this is the steganographic data, so you can see that the blocks have changed, but the distribution of noise throughout the image hasn't changed at all, so it's very difficult to see there's a message buried in there. And if the message took up only a certain amount of the image, it's hard to see where in this image the message is. You could be trying to read off every DCT coefficient when in fact only some of them have a message in. - If you were sending this to someone as a message... how would they get it out? - OK, so in general you would also encrypt the message because, you know, you'd better be safe than sorry, why not use encryption. So we encrypt our message, we put it into DCT coefficients or in the least significant bits, and then we send it off to someone. Now, they're going to have to have known the process that we used because if they don't, they're gonna be looking in the wrong place, so they'll know that we used JSteg or F5 or one of the other DCT steganography tools and they'll basically run the program, they'll type in their decryption key, which will actually remove the encryption, and then out comes the message. When JSteg was invented, it was robust to a visual attack, so you couldn't look at it and go: "well that clearly has been altered".
So they had to try and come up with (researchers had to try and come up with) some other way of detecting that an image has had a JSteg message buried in it, and what in fact happens is that the coefficients change ever so slightly. Because we're applying quantization to our DCT coefficients, most of them will be set to zero. OK? And JSteg won't put anything in there, because it's too obvious; it'll only put data in a few at the top corner that are big, and you'll find that there's a subtle imbalance produced in where your coefficients are, so you expect most of your coefficients to be 0 and then a fair few of them to be -1 or 1, and the counts of -2 and 2 to be very close to zero. And in fact you start to get a few 3s and 4s that you weren't expecting and the distribution of these numbers goes off a little bit and you can start to predict that a JSteg file has been buried inside. What's more is that this happens in each 8x8 block, so you can actually do this test on every block and find out which blocks have messages in them and which blocks don't. And you might find for example that the first 60% of the file has a message in and then it abruptly stops, and that's a blatant clue that we've got something that isn't taking up the whole image. It has just been written sequentially into the file. So if we take the number of occurrences of each DCT coefficient - so nought (0) is going to be the most common, then maybe -1 and 1 - and we plot those in a graph with frequency on the Y-axis and the DCT coefficient on the X-axis, we get what's called a histogram, and that's simply a plot of the frequency of occurrences of various things. So you can do a histogram on an image, but you can also do a histogram on these DCT coefficients and find out whether they've been imperceptibly changed. Once people started routinely detecting JSteg, other people came along and decided well that's, you know, it's too obvious, so let's try and make it more subtle.
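Counting how often each coefficient value occurs, as described above, is straightforward to sketch. The example below only builds the histogram; deciding whether a given histogram looks suspicious is a separate statistical step that is not shown. The function name and the sample coefficients are invented for illustration.

```python
from collections import Counter


def coefficient_histogram(coeffs):
    """Return (coefficient value, number of occurrences) pairs, sorted by value."""
    counts = Counter(coeffs)
    return sorted(counts.items())


# Invented quantized coefficients: mostly 0, a few -1 and 1, very little else.
coeffs = [0, 0, 0, 1, -1, 0, 2, 0, -1]

for value, count in coefficient_histogram(coeffs):
    # Coefficient value on one axis, frequency drawn as a bar of '#'.
    print(f"{value:>3} {'#' * count}")
```

Running the same count separately on each 8x8 block would give the per-block view mentioned above, so a tester could compare blocks near the start of the file with blocks near the end.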
So what they did was they wrote DCT steganography approaches where they pay attention to the statistics of the coefficients and try to keep them balanced. So if you put in a 1 you try to take one out somewhere else so that you keep the histogram and the probabilities of these coefficients occurring the same. And that makes it much harder to use your standard histogram analysis technique to find out whether there's anything in the image. But now what they can do with the power of machine learning is: take, let's say, a thousand images, 10 of which may or may not have something buried inside, and a classifier will pretty well find out which ones they are. You just have to have a lot of positive and negative samples to throw at it. - It all sounds wonderful but you know, [???] - Well, yeah, so aside from spies, I should say I'm not using these techniques. You know, everyone's watching.... - [???] are going through your Instagram - Exactly, so I think one of the most common uses is digital watermarking. So in normal steganography what we want to do is try and hide a message as well as possible. And then all that really matters is the person on the other side can get it and no one else really notices. In watermarking what we want to try and do is fingerprint the file so that we know where it came from, or we know it's ours, maybe for copyright reasons or to trace who's been distributing illegal material. And the key to a watermark is, instead of it being as much payload as possible, so instead of trying to cram the entire works of Shakespeare into an image, what you should be doing is just a small... let's say a small logo or a small piece of text repeated over and over, so that if the image gets cropped or re-compressed, it's still there. You can imagine that stock photo companies might do this to try and make sure that people aren't distributing their files elsewhere.
And you can imagine that they would trawl through the web looking for steganographic images embedded in their particular way. Another case you might find is if you were distributing pre-release DVDs of a film and then it gets leaked onto the internet... If there's steganographic data buried in the source, you'll be able to see who it was that leaked it. - Each file could be tailored... - Each file could be tailored to the person they originally sent that file to, and then when that particular file finds its way onto the Internet, that person is going to be in trouble. What was vital in recreating this image is now gone and we're not going to get it back, and in fact that's exactly what you do see, so if we show the actual output here we can see [...] is kind of visible but it's been completely dwarfed by all this random noise that's been added...
Info
Channel: Computerphile
Views: 1,267,079
Keywords: computers, computerphile, computer science, steganography, university of nottingham, secrets
Id: TWEXCYQKyDc
Length: 13min 13sec (793 seconds)
Published: Tue Aug 04 2015