Hashing Algorithms and Security - Computerphile

Video Statistics and Information

Reddit Comments

The moon analogy could not be more spot on lol, thanks for sharing

u/DonDucky, Nov 09 2013 (1 upvote)
Captions
Let's say you want to transfer a file from one computer to another, and it is really important to know that it's got there intact, in one piece. You could send it multiple times and then compare them all, but what generally gets used is something called a hash algorithm.

A hash algorithm is kind of like the check digit in a bar code or on a credit card. I think James Grime talked about this a long, long time ago on Numberphile. The last digit in a bar code or on a credit card is determined by all the other digits on it, and if you change one of those digits the last one changes as well, so as you type it into a computer you can know instantly if you've missed a key somewhere. So a hash algorithm is kind of like that, but for an entire file that might be megabytes or gigabytes in size. What it gives you is a code, 16 or 32 or 64 characters, generally hexadecimal, basically just one long number expressed in that way, that is a "sum up" of everything that's in that file. You do all these manipulations to it and crush it down, and crush it down, and crush it down, and what it comes out with is this thing that says "this is a summary of that file". You can never make it work backwards, you can't pull that data back out, but it's like a signature. It's like a confirmation that this file is really what it says it is.

The simplest hash algorithm I can think of would just be something like: add up all the digits in the file, which is 4, 9, 14, 23. That's not a good hash algorithm, for a few reasons.

Hash algorithms have three main requirements. The first one is speed. It's got to be reasonably fast: it should be able to churn through a big file in a second or two at most. But it also shouldn't be too quick. If it's too quick, it's easy to break, and I'll explain that later.

The second requirement is that if you change one byte, one bit, anywhere in the file, at the start, in the middle, at the end, then the whole hash should be completely different. This is something called the avalanche effect. If you're interested in how this is achieved, do look up the actual algorithms themselves. It would take me an hour to explain even vaguely how they work in a friendly way, but if it's your kind of thing, do look it up. Suffice it to say: one bit gets flipped anywhere in the message, then the whole hash is completely and utterly different.

The third requirement is that you've got to be able to avoid what are called hash collisions. This is where you have two documents which have the same hash. Obviously, there is a mathematical principle called the pigeonhole principle: if you have 50 pigeons and 25 pigeonholes, then you have to stuff two pigeons into one of the pigeonholes. That's a terrible analogy when you say it like this, but if I could explain it: there are incredible numbers of documents out there that are possible, and the hash, meanwhile, is just one fairly long number. There will be files out there which naturally have the same hash, and that's okay, because it is so unlikely that we can deal with that. It's never going to happen naturally. But if you can artificially create a hash collision, if you can, say, create a file and change the name, then we have a problem. And that's where security comes into these, because if I can make a file that sums to a certain hash, then I can fake documents. I can send different things and have this signature match.

So let's say I have an important document, something that's, I don't know, the "permission to go to the moon". I don't know why I said that.
Oh yeah, "permission to go to the moon", let's say that, and it's got someone's name on it. That file is sent, and along with it, through other channels, comes this hash to verify that this is actually the document. Now let's say I can intercept that file and I can change it. Because the hash algorithm is broken, I can change the name and change the data and change whatever. I can send someone else to the moon, because I can make this hash the same through carefully tweaking the bytes.

Now, it's incredibly difficult to do that in practice. You'd want a massive file and a lot of computer code. But there are old hash algorithms, like MD5, which was used for many, many years, which now have these collisions out in the wild and are considered broken, because you can get a file, not a document with text in but computer code, anything like that, where it's possible to send something malicious and have it come out with the same hash. So this is important.

This is where speed comes in. If the hash is too slow, no one will want to use it, but if the hash is too fast, if you can create new ones in a few processor cycles, then you can fairly easily create documents that match a particular hash. It is, in a very real sense, an arms race. As I said, for many years MD5 was the accepted algorithm, and it's still used for a few things, but MD5 is now thoroughly broken, because computers are fast enough and there are a few sort-of interesting tricks you can use to try and create hash collisions deliberately.
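The idea of tweaking bytes until the hash matches can be made concrete with the toy "add up everything in the file" hash mentioned earlier. What follows is only a minimal Python sketch, not anything shown in the video: `toy_sum_hash`, `bit_difference` and the document contents are made up for illustration, and SHA-256 from the standard `hashlib` module stands in for a real hash algorithm. The first print shows the toy hash giving the same value for two different documents; the rest flips a single bit and counts how many bits of the SHA-256 digest change, which is typically around half of the 256.

```python
import hashlib


def toy_sum_hash(data: bytes) -> int:
    """Hypothetical toy hash: add up every byte and keep the low 8 bits."""
    return sum(data) % 256


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Made-up documents: the same bytes in a different order.
original = b"Permission to go to the moon: Alice Smith"
forged = b"Permission to go to the moon: Smith Alice"

# The toy hash cannot tell them apart, so forging a match is trivial.
print(toy_sum_hash(original) == toy_sum_hash(forged))

# Flip a single bit of the original and hash both versions with SHA-256.
flipped = bytes([original[0] ^ 0x01]) + original[1:]
d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()
print(d1.hex())
print(d2.hex())
print("bits changed:", bit_difference(d1, d2), "of", len(d1) * 8)
```

Reordering bytes is enough to fool the toy hash because addition does not care about order; a real hash algorithm mixes every input bit into the whole output.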
The other problem with MD5 is that, because it was used so much and it was used everywhere on the web, Google has become an exceptionally good resource for breaking it. You wouldn't want to store a password this way. I'll talk about that in a later video: don't use something like this for storing passwords. But people did, for many years people did, and in a lot of cases a word will be stored next to its MD5 hash for some reason. If you type an MD5 hash into Google, frequently the word it was hashing comes out, which means that for pretty much every word in the English language, and a lot of other passwords besides, the MD5 can be solved by typing it into Google. So MD5 is comprehensively, constantly broken.

So everyone moved to something called SHA-1, and now there are rumors that that might start to be broken soon, if it hasn't already. Because computers keep getting faster, hash collisions are easier to generate, so everyone is moving to SHA-2, which for the time being is secure. SHA-3 is going through the process of being ratified by all the agencies now, and in a few years that'll be the standard.

Ultimately, I should really emphasize this: **Don't use this for storing passwords.** I'll talk about that in a later video. These are used for verifying files, for verifying transmission, and that's all they should be used for.

There is one last thing, which is that occasionally you will see download sites offering software who say: here's the file we're going to send you, click here to download it, and if you want to be safe, here's the hash of the file so you can be sure it's the right one. That's a terrible idea. I mean, it will verify you've got the download intact, but they're selling this as "we guarantee that this software is safe, and you can check it against that hash", which is a bad idea, because if someone has been able to get into their website and change the software they're sending, it's pretty trivial to change that hash as well.

So that is hash algorithms: that is taking a big chunk of data and turning it into a small amount to verify it. And in a later video I will talk about how that's used, and how that shouldn't be used, for actually keeping things secure.

This episode of Computerphile was brought to you by audible.com, and you can go to audible.com/computerphile and download a free book. They've got a huge range that you can listen to on all kinds of devices: your phone, or in the car, things like that. I was thinking about a book to recommend, and it made me think about the first audiobook I ever listened to, and that was Treasure Island. I listened to it on a cassette next to my bed as I was going to sleep each night. I checked the Audible website and they do have Treasure Island, so that's my recommendation today. Why don't you check it out: audible.com/computerphile, free book, and thanks to them for supporting our videos.
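For the file-transfer check described at the start of the video, a minimal sketch in Python might look like the following. It is an illustration under assumptions rather than anything from the video: SHA-256 from the standard `hashlib` module is used as the algorithm, and the function names `file_sha256` and `matches_published_hash` are made up. A match only shows that the file arrived intact; as the video points out, it proves nothing about safety if the hash came from the same site that served the download.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file as 64 hexadecimal characters."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a hash received through another channel."""
    return file_sha256(path) == published_hex.strip().lower()
```

Reading the file in chunks keeps memory use flat however large the file is, and the digest is the same as if the whole file had been hashed in one call.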
Info
Channel: Computerphile
Views: 1,190,735
Rating: 4.9429021 out of 5
Keywords: computers, computerphile, hash, md5, security, tom scott
Id: b4b8ktEV4Bg
Length: 8min 12sec (492 seconds)
Published: Fri Nov 08 2013