Everyone's passwords are terrible and they should change their passwords right now. Let's just get that right out in the open, OK? You all have bad passwords, and you should feel bad. Probably not necessarily people who watch Computerphile, but the majority of the public don't have good passwords, and it's a real problem. It's a problem because companies like LinkedIn and TalkTalk get hacked, and a bunch of hashed passwords go out onto the Internet, and then within, you know, hours, half of them have been cracked. And then people are going: "Oh well, this username and this password's been cracked. Let's just go and log on over there and see if that username and password combination gets me into their Amazon. Oh, it does? That's good news." And so on.
Password cracking has massive implications for password security: for what passwords you need to use, how you need to store your passwords, and so on and so forth. In a previous video, Tom Scott talked about how to store passwords. Tom Scott: "Please please please please please, look up a recent tutorial for the language you're using." As a company, right? Those things are still true. The hashing algorithms you have to use have become longer, because the older ones don't hold up as well, which we'll see in a minute. So some things have changed, but really the principle remains the same. We don't store passwords unencrypted in a database because that's a terrible, terrible idea. What we do is pass them through something called a "one-way pseudorandom function", which basically takes some plain text password and turns it into gibberish. And then, when someone tries to log in, we do the same operation on what they just typed, and if the gibberish matches up we know they've typed in their password correctly, without actually having to know what their password is. But if these hashes get dumped on the
internet, then we can't reverse them, because they're just random nonsense. But what we can do is test a load of different words by hashing them and seeing if the hashes match any of the ones in the leaked list, and if they do, we know we've cracked their password. And that's really, really easy to do. I'm going to show you, and it scared me the first time. So yeah, let's see. I've changed my password, let's put it that way.
Okay, now this is what is essentially a terminal, but this is Beast, the aptly named Beast, which is one of our deep learning servers. I'm not using it for deep learning right now and nor is anyone else, I don't think, so just for a moment we'll borrow it. It's about two or three times bigger than a normal desktop, but it's not server-rack size, and it's sitting somewhere behind a bunch of locks, I think on this floor somewhere. I haven't seen it. Well, I saw it getting built, but then it disappeared. Maybe we'll go and look at it some time. So if we type in nvidia-smi we can see what this is
equipped with. For now, most of my contact with it is through my terminal: I ask it to do things and it does them very, very fast. This particular server has four Titan X graphics cards in it. A Titan X is one of the foremost graphics cards. There are new generation 10 NVIDIA graphics
cards coming out and some AMD cards, but a Titan X is still performing massively well. Certainly for deep learning it's very good because it has 12 gigs of onboard RAM. Now in some games 12 gigs of onboard RAM might be necessary for really high texture resolution. So if I wanted to play, you know, the new Doom game and there was no one about, then I could. Apart from the fact that they've installed Ubuntu on it, so that doesn't help me much. It might be Fedora. Let's not go into that though. So let's just say it's Linux, right, and we've installed Caffe and other deep learning libraries, and lots of people are using it all the time to do interesting deep learning problems. We've got a huge array of different problems, but right now we will use it to do some password cracking.
I downloaded a program called cudaHashcat. Hashcat is one of the foremost password cracking tools. It lets you do lots of different types of password cracking, which I'll talk about, and it does it very, very quickly because it makes use of the graphics card, or in this computer's case it makes use of all four graphics cards in parallel. Each of these graphics cards is capable of somewhere around, I think it's ten thousand million, so 10 billion hashes per second. My standard graphics card at home, which is pretty good, is about four billion, so these are more than two times faster each, and there's four of them. Okay, so this is over eight times faster, let's say about 10 times faster than my computer at home. It takes plaintext password hypotheses, hashes them using MD5, and compares them to a list, at a rate of 40 billion per second.
[Off Camera] And how many words are there just in the English dictionary?
[Mike] More than you'd think. A lot more than you'd think, which is in some cases reassuring. In other cases, if your password is not very long, not reassuring at all. Okay, so I'll show it working and I'll talk about the different kinds of
password cracking because they have different implications for passwords. Okay, so Hashcat is run off the command line. What I've got here, if I just show it... This example file is just a list of hashes that comes with Hashcat. There are about six thousand or so hashes in it that range in difficulty. So some of them are going to be "password1", because that's what some people's passwords are, and some of them are going to be much longer, 20 or 30 characters, almost random, and they're
going to be very difficult to crack. So we won't crack all of them now, but we will crack a fair few. If I just show you this, these are what the hashes look like. MD5 produces a hundred and twenty-eight bit hash. Now, MD5 should not be used by anyone ever, ever again. The problem is that older, lower-standard hashes like MD5 and SHA-1 still get used a lot for back end storage. Maybe the developers are thinking, "Oh, it's already in SHA-1, you know, it's a lot of effort to convert them all over. Maybe people won't be able to log in for a while... Hmm, let's probably not." Yes, do. Change your hashes to something like SHA-512 really quickly, because this is not acceptable. Hashing with it literally just takes longer for the GPU to process, and so you will go down from 40 billion to, you know, a few million, or a few thousand for really good hashing that's been iterated a lot of times. And that makes the process enormously harder, and so that would be what I would recommend as a developer. As a user, it just means you have to have a password that's acceptable, but you have to, in a way, assume that some of the websites that you use won't know what they're doing and will have it stored in MD5. If it's still in plain text, then all bets are off, there's nothing we can do.
Okay, right, so let's just run this in brute force mode. So the first type of password cracking,
which sees some use but not a lot, is brute force. So this is simply a case of starting with "AAAAAAA" and then "AAAAAAB" and "AAAAAAC" and so on, for different character sets. If we assume that there's going to be some subset of passwords that use only lowercase letters, we can brute force those very quickly, especially if they're not very long. So what I'm going to do first is run an attack on these passwords for, let's say, seven character passwords, all lower case letters. Okay, so that's like this: Hashcat, attack mode 3, which is brute force; example0.hash, the hash file; and then my mask, which tells it what character sets I'm going to use. So L is a lowercase letter, so 1, 2, 3, 4, 5, 6, 7 lower case letters. Thinking... There we go! So it's done it. Okay, if I scroll up, those are the passwords it found. Not very many, because, luckily for these users, there aren't very many lowercase-only passwords. [odd cut] ...Seven of them. But it went through every combination of lower case letters at length seven in, you know, a second? Which isn't great.
Alright? So we step it up a notch. Now we say, "Well okay, let's do eight characters," so we just add another L, and we run it and... forty billion attempts per second... Here they go, and they just keep coming, right. Each of these lines is a hash and the associated password that has been cracked. So what it means is, at some point it's tried "mycubana", the combination of letters that spell "mycubana". It's hashed it and found, "Oh, that does match one of the ones in our hash file, so we will put that in our output."
Okay, so let's just step it up a little bit further. Some passwords, for example, will have two digits at the end. Okay? The vast majority of passwords that have numbers in them have one or two digits at the end, maybe four, because they're dates. So let's say six character passwords with two digits at the end. Here we go. Aw, there were only a few of those, but we found them. There they are.
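As a rough illustration of the brute force idea described above, here is a minimal Python sketch. It is not how Hashcat works internally: it runs on the CPU, it is vastly slower, and the one-letter mask codes (`l` for a lowercase letter, `d` for a digit) and the function names are simplifications invented for this sketch. The "leaked" hashes are made up for the example.

```python
import hashlib
import string
from itertools import product


def md5_hex(text):
    """Return the MD5 digest of a string as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical mask codes for this sketch only: one code per password position.
CHARSETS = {"l": string.ascii_lowercase, "d": string.digits}


def brute_force(target_hashes, mask):
    """Hash every candidate that fits the mask and collect the ones that match."""
    targets = set(target_hashes)
    cracked = {}
    for combo in product(*(CHARSETS[code] for code in mask)):
        candidate = "".join(combo)
        digest = md5_hex(candidate)
        if digest in targets:
            cracked[digest] = candidate
    return cracked


# Made-up "leaked" hashes; the attacker only ever sees the hex digests.
leaked = [md5_hex("cat"), md5_hex("ab12"), md5_hex("Zebra")]

print(brute_force(leaked, "lll"))   # three lowercase letters
print(brute_force(leaked, "lldd"))  # two lowercase letters, then two digits
```

The third hash is never found by either mask, because "Zebra" contains an uppercase letter and is five characters long. That is the same reason each run in the demo only picks up the passwords that happen to fit the mask being tried.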
This is a good start, right, it's very, very quick. It starts to slow down as you increase
the number of characters, so when you're doing a naive brute force attack like this, the number of candidates is the number of characters in your character set to the power of the length of your password. In this case, there are 26 lowercase letters, so 26 to the power of 7 for when we were trying 7 character passwords. And then for, let's say, six character passwords with two digits on the end it's going to be 26 to the power of 6 multiplied by 10 to the power of 2. This is the status of the last attack I did: it had this many passwords to try and it's done them all, and it was doing it at 38 billion hashes per second. Which is why MD5 is not usable in any sense anymore, ever. Don't use it. Okay, is that clear yet?
Okay, so this is a start, right? Now the problem is, first of all, I only get a few hashes each time because, you know, this is only 6000 passwords. If it were LinkedIn's 40 million password database, you'd get a lot like this, but it's a bit of work for me to do this. There are ways of generating these masks automatically and iterating through them to try all the different combinations. The other thing is that once we start getting to 9 or 10 character passwords, even for this machine it starts to become impractical. Particularly if people are using larger character sets. So consider that this 7 length lowercase password is 26^7. Well, if you're using lower and uppercase, it's going to be (26*2)^7, which is 52^7... put me on the spot! And then if you add symbols it's more: if you're having symbols and numbers as well, it's going to be somewhere around 90^7, depending on your character set. So that's too much, even for this. At 7 length, it might be feasible. At 8 length, it starts to get pretty difficult if you're using symbols, and at nine it's still currently not really doable, even for MD5, okay, because that's how big the search space is. But most password cracking these days doesn't work this way.
Okay, this is a start, and you can pick up the really rubbish passwords. So if your password is six characters long, it's being cracked right now, and it's being cracked quickly, because we can go through all the 6 character passwords in seconds. For longer passwords, we have to make some assumptions about the way that people choose passwords. So, obviously, the password "password", or "password1", which is actually nine characters, is pretty good against brute force, but it's not a good password, because it's just the sort of number one password to be used. And so in your list of hypothetical passwords, it should be right at the top and the first one you try. Okay, so this is what a dictionary attack does. We have a dictionary, a list of commonly used words or commonly used passwords, and we try those. And then we manipulate them slightly, with rules, and try them again, and we append them to other words and try them again, and we do lots of different combinations of things and try them again. And it's pretty scary. It's much more effective than brute force, and so it's the current way that things are done. The hashing rate goes down a bit, because you're loading dictionaries and doing word manipulations, but it's still pretty quick.
So let me show you an example dictionary. This dictionary has common passwords that have been cracked from other sources. There are other password lists, like the RockYou list and soon the LinkedIn list, I'm sure, which will have a big impact because they are real passwords that people are actually using, so if you make a word list out of those passwords, that's going to be really effective. Let's run this example dictionary over our passwords, but let's also manipulate it a bit to make it a little more, well, scary is one way of looking at it. Right, so, Hashcat, but this time we're going to run in attack mode 0, which is a straight dictionary attack.
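A straight dictionary attack, as described above, is simple enough to sketch in Python. Again, this is an illustration and not Hashcat's implementation; the word list and the "leaked" hashes are made-up stand-ins for a real dictionary file and hash dump.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hashes, wordlist):
    """Hash each word exactly as written and report the ones found in the dump."""
    targets = set(target_hashes)
    cracked = {}
    for word in wordlist:
        digest = md5_hex(word)
        if digest in targets:
            cracked[digest] = word
    return cracked


# Made-up stand-ins for a dictionary file and a leaked hash list.
wordlist = ["password", "password1", "123456", "letmein"]
leaked = [md5_hex("password1"), md5_hex("P@ssw0rd"), md5_hex("letmein")]

found = dictionary_attack(leaked, wordlist)
print(sorted(found.values()))
```

The substituted variant "P@ssw0rd" is missed here because only the exact words in the list are tried, which is the gap that rule-based manipulation is meant to close.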
Okay, if I ran that against the example dictionary that we've got, then it would probably find, you know, a couple. It's very quick, because there aren't that many words, so it's already finished, and it found one: this chap with "13lexon". So what that's telling me is there's only one guy who happened to have the same password that's in the word list. Now, that's quite common, because in a really big database you're going to have a lot of people who have "password" and "password1234" and "12341234" and so on. All those people are going to be found this way. But what we really want to do is mix up the dictionary a little bit, swap a few letters around. So what these rules do is obvious things: they replace "i" with the number 1, or they replace "e" with a 3, or put an "@" in instead of an "a" or something. You know, mix it up a bit with common password substitutions: leet speak, weird things like this that people think are very secure, when in fact there are rules that just completely defeat them. Toggling case up and down: you know, if a password's viable, then the same password with the first letter as uppercase is also probably viable. Alright, so you do all these things, and there are rules to do this. Now, if someone does a lot of password cracking, because maybe it's their job, which is kind of a weird job, but people do do it, security experts and stuff, if you're really into this, then you have your own dictionaries and your own rules. I don't have my own dictionaries or my own rules, because, fun as this is, mostly I have other work to do. Now what I'm going to do here is put in a ruleset called dive.rule. Now I don't know who dive is, I expect it's an alias of some hacker somewhere. He's got quite a good rule set that works quite nicely with this hash file.
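The kind of rule-based manipulation described above can be sketched in Python as follows. The handful of rules here (capitalise the first letter, leet substitutions, a few appended suffixes) are invented for illustration and are far simpler than a real rule file such as dive.rule, whose contents are not reproduced here.

```python
import hashlib

# Illustrative leet-speak substitutions, matching the examples in the talk.
LEET = str.maketrans({"i": "1", "e": "3", "a": "@", "o": "0"})
SUFFIXES = ("1", "123", "1989")  # hypothetical common endings


def md5_hex(text):
    """Return the MD5 digest of a string as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def mangle(word):
    """Yield the word itself plus a few rule-generated variants."""
    yield word
    yield word.capitalize()
    yield word.translate(LEET)
    yield word.capitalize().translate(LEET)
    for suffix in SUFFIXES:
        yield word + suffix


def rule_attack(target_hashes, wordlist):
    """Try every mangled variant of every dictionary word against the dump."""
    targets = set(target_hashes)
    cracked = {}
    for word in wordlist:
        for candidate in mangle(word):
            digest = md5_hex(candidate)
            if digest in targets:
                cracked[digest] = candidate
    return cracked


wordlist = ["password", "letmein"]
leaked = [md5_hex("P@ssw0rd"), md5_hex("letmein123"), md5_hex("xK9#qLm2")]

found = rule_attack(leaked, wordlist)
print(sorted(found.values()))
```

Two dictionary words and seven variants each are enough to recover both the substituted and the suffixed password, while the random-looking third one survives. A real run multiplies millions of words by thousands of rules, which is why the substitutions people think are clever add so little.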
Okay so let's run it, and what it is going to do is go through all of the rules in turn, and for each rule it's going to go through the whole dictionary and try all those different combinations against these hashes. And here they come. So that was about a thousand hashes we just got, which is a little bit worrying. Some of them are rude, right, I've scrolled past those, because some people have rude passwords. Those passwords are bad passwords, because rude words are also in these dictionaries. Alright, so if you think you're being clever by putting swear words in your password, you actually are just making it weaker. Okay, these are some not absolutely terrible passwords in some sense; most of them are lowercase with a few numbers. "leanadrien", which might be a name, is OK, but probably a similar word's in the dictionary, and it got manipulated in some way and some letters got swapped around, and suddenly it was cracked.
So we've had some luck: we've done a bit of brute force, we've done a basic dictionary attack, we've used a few rules just to mix it up, and we've got some passwords. So far I've cracked, I think, about 1700 passwords out of about 6 and a half thousand. Some of these passwords aren't crackable, in the sense that you could be here for days and still have some left, but I think I've previously gotten about sixty or seventy percent fairly easily. So how can we get even better? Well, we use a better dictionary. That's the key. This example dictionary is fine, it's not very long, you know, some passwords are going to be in it, but as you remember, we ran it and it didn't find many passwords. It found some when we ran it through some rules, but it didn't find a lot. So what we really want to do is find a list of actual passwords that people are using in real life and use that. Now, luckily, these leaks happen all the time, and so passwords are just being dumped out onto the internet all the time.
So there's this password list called RockYou, which is a bit of a game changer in password cracking, if that's a thing, and basically it's 14 million or so passwords, I think, actually leaked from a proper database of real passwords that people were using. I think it was a gaming service or something like this, and then it got leaked. And the point is that if you run the RockYou list over these hashes you start to really get results, because there are just much more interesting passwords in the RockYou list, and there are just many more of them. So I'm doing the exact same thing as before, the same rule manipulations, but passing them over the RockYou dictionary, and we should get many more passwords. Shall we see? Okay, it's just compiling the CUDA and then off it goes. And here they come, and they're just going to keep coming, right. There's a lot more because we've got many rules and we've got 14 million passwords in this list. It's going to take quite a lot longer to do. Okay, but it's pretty fast! If I pause it and we look at the status, we're calculating now in total 8 to 10 thousand million hashes per second. So, about four times slower, because of all the dictionary manipulations we have to do, but it's still pretty quick.
[Off Camera] So you said compiling the CUDA. I've heard CUDA mentioned in terms of graphics cards before; what does that mean?
[Mike] So CUDA is an NVIDIA thing (actually, Hashcat can work on AMD cards as well), but what it basically does is compile a C-like intermediate language that tells the graphics card what to do. CUDA in detail is for a different video, but normally what a graphics card does is basically take a bunch of vertices in your world, transform them in front of your camera, and render them to the screen very, very quickly. And the reason it does it quickly is because it maybe has 2800, 3000 processors all doing the same basic stuff. Here, it's essentially taking the RockYou list, manipulating it using the rules, and testing these words as passwords at a rate of 8 to 10 thousand million per second. [off camera mumbling] Yeah, 10 billion per second, right? It's the way it's written here, so I keep saying 8 thousand million. 8 billion per second.
[Off Camera] So that's 8 billion attempts?
[Mike] Yeah. So, you know, it'll try "password1", then it will try "password2", then it'll try "password3" with a capital P, and so on and so forth, for much more complicated passwords. I mean, I paused it and we've already got 3000 now. We had 1700, so we've got 1300 in... I ran it for about 10 seconds? And if I keep running it, so I keep resuming, they're going to keep coming.
Alright, and some of these passwords are really, really hard to crack by brute force or by a normal dictionary, and this RockYou list has changed everything, in the sense that it's just so varied that you get passwords that you'd think are really good. If I pause it and we look at the passwords... I mean this one, "nik21061989". You could guess that, because it's the guy's date of birth, but it's been found in the dictionary. "spacelightning" is quite a long one, but it's two words put together, so that's not secure. Alright, so it's been found. And so on and so forth. "laurence0901": even if 0901 is completely random, you're going to get caught, because you've used your name. Alright, so we can just keep going with this, and they'll just keep coming. If I look at how long we've got left, we've done 18%, and we've cracked another 200 since I've been talking. So it's just going to keep going and finish off the database. So if, like MD5, you are doing fairly basic things, you can plow through jobs, and in this case I'm doing it with however many cores per GPU, with 4 GPUs. Which is a little bit worrying. I mean, it's still going. My current count is... I'm 47% through this particular attack. I could use different rules; there are other rules, like toggling case rules and things. I've got 3 and a half thousand now, nearly. So, about half of the passwords, right? And some of these passwords are good. So I guess, for the people watching, you've got to think: how good are your passwords? Are your passwords better than half the people in this list? And if they aren't, that's probably the next thing you should do, change them, you know? I mean, xkcd alluded to this, and we'll talk about that in a minute. You know, it didn't necessarily answer every question, but it did get a good message across. And then there are other aspects, you know, should you reuse passwords and so on.
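The developer-side advice earlier in the talk, to replace a single fast MD5 with a hash that has been iterated many times, can be sketched with Python's standard library. This is one possible approach and not something shown in the video: the choice of PBKDF2 with SHA-512, the per-user random salt, and the iteration count are all assumptions made for this example, not recommendations from the speaker.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor chosen for this sketch


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) using PBKDF2-HMAC-SHA512 with a random salt."""
    if salt is None:
        salt = os.urandom(16)  # a fresh salt per stored password
    digest = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, digest, iterations=ITERATIONS):
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, digest)


salt, stored = hash_password("spacelightning")
print(verify_password("spacelightning", salt, stored))  # True
print(verify_password("spacelightnin", salt, stored))   # False
```

Each guess now costs the attacker the full iteration count, and because every stored password has its own salt, a candidate has to be re-hashed for each user separately. A weak password such as "password1" is still guessed early; the scheme slows the attack down and does not rescue bad passwords.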
Decent video. Needs to mention salting the hash. Surprised he didn't mention rainbow tables, even by name once.
If anyone is freaked out, remember that he used the MD5 hashing algorithm and lower case only passwords, etc., i.e. he made it intentionally easy for the demonstration.
When I was younger I thought I was smart by using
ok, this is freaking scary. What is a good rule/set of rules to use for a password that is safe but easy to remember?
i use a password system. so i split the website's name into 2 parts. so let's say it's hotmail so i'll split that into hot and mail. then inbetween those 2 parts i use my general password. so it'll be like hot1234mail or ama1234zon etc etc. though i use a better pw than 1234 heheh. it seems to work ok. it's only annoying when sites give you a limit to the number of characters in your password, which seems stupid to me.
Jibawish. Hehehe.
He keeps mentioning the LinkedIn hack. That hack was the one that made me change all my passwords on all resources. It may be overkill, but I use KeePass to create a different 32 character password (or as long as the site allows) with CAPS, lower, numerals and special characters.
Now all someone has to do is hack my KeePass file! Then I'm screwed :(
So what of this still holds true when the passwords were properly stored with a salt? It seems to me that the whole approach about testing all passwords simultaneously would break apart.
Okay guys, i need a new password. any idea?
yup nice tricks