Cracking Enigma in 2021 - Computerphile

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

I loved this. If the video is too long for you, spoiler alert - he cracks it. The video is really about why it is not as trivial to crack as one might think. Definitely worth watching the whole journey. He explains in great detail how surprisingly complex enigma encryption is and why his solution even has a chance.

👍︎︎ 22 👤︎︎ u/j1mmie 📅︎︎ Apr 19 2021 🗫︎ replies

Always felt the plugboard was substantially less secure than the raw numbers would suggest, and I'd say this video also shows this.

If we went with letter frequency tables rather than IoC, and we had the rotor settings, we'd be able to determine with quite a lot of confidence when we had the correct plugboard for the most common letters (t and e in English, e and n in German). They're frequent enough that there's a good chance of seeing them just from the unplugged cypher text letters.

With a known plaintext, as Bletchley park had, it's even easier. A Bombe simply tried different combinations until it reached a contradiction, and could backtrack, eliminating whole swathes of possibilities.

👍︎︎ 6 👤︎︎ u/squigs 📅︎︎ Apr 19 2021 🗫︎ replies

I could listen to Dr Mike Pound for an hour

👍︎︎ 19 👤︎︎ u/Humberd 📅︎︎ Apr 19 2021 🗫︎ replies

It's sensationalized, but if you haven't seen Imitation Game, definitely check it out! It was very cool in the movie to watch Turing's computer run, trying to decrypt the Enigma messages.

👍︎︎ 8 👤︎︎ u/prismatic-io-taylor 📅︎︎ Apr 19 2021 🗫︎ replies

Does anyone know of any other old timey encryption methods that can be easily cracked by a hobbyist today?

👍︎︎ 2 👤︎︎ u/JB-from-ATL 📅︎︎ Apr 21 2021 🗫︎ replies

10 years ago I would have enjoyed this style of video a lot better. Now I want the conclusion at the start, and the video explaining it. Having to watch the whole thing or poke the timeline at random is ridiculous these days.

👍︎︎ 10 👤︎︎ u/DragonCalypso 📅︎︎ Apr 19 2021 🗫︎ replies

How hard is it to multithread work in Java? I was looking at his code, and the longest part of the search, the rotor search, is an embarrassingly-parallel problem, but still being done in a single-threaded way.

👍︎︎ 1 👤︎︎ u/MEaster 📅︎︎ Apr 20 2021 🗫︎ replies

Captions

so it was in the news that there's a 50-pound note coming out with alan turing on it now alan turing has been featured from time to time on our our channel and rightly so dave's done some fabulous videos on all the work that he and a lot of others were doing at bletchley park you know cracking enigma and it got me thinking you know now that we're in the 21st century and i've got a laptop that's much more powerful than all the cheering bombs put together how easy is it to break enigma like today right so i thought let's code up and find out the first thing i should say is they've done some really interesting videos on the history of enigma and uh do go and watch those right and it'll give you an idea for sort of things they were doing to try and break enigma back during the 40s not to trivialize it but it's really difficult isn't it it's really really difficult right you know this isn't something one does by hand right not quickly um the enigma machine is not a stupid idea right it's well designed the only thing that we've got now is much much more processing power and so things that we couldn't possibly brute force back then maybe we can start to begin to brute force now right and that's what we're looking at today so let's look very briefly at what the knitting machine is and then we'll have a go at actually breaking one and see whether it holds up if you recall it has some lights on some letters and it has this is my technical description has some buttons that you press keys are they cool sometimes so you press an a right you press an a and it goes into the machine and it goes through something called a plug board and the plug board will swap it so this is the plug board here now plugs just swap certain letters in pairs so maybe this a comes out as a an f right or something like that then this goes into the first rotor and so that maybe comes out as a q or something like that so you know it sort of just wiggles around in here and then this comes in this comes out as maybe you know a p i'm making this up and then this goes into the next rotor to stick to three rotary enigmas for today hey uh comes out as a sort of an s and then it goes through something called a reflector so this is getting quite complicated but this is all mechanical right these rotors literally physically turn around and they just have wires soldered from one end to the other that connect up so the reflector just bounces s goes to something else i can't remember which led is and then it goes back through this like this and then it comes around through the plug board again so maybe that d goes to i mean a's over here that was that was a bit silly of me wasn't it um over here for the purposes of this it's all absolutely fine and this comes out as a zed right so in my in in my weird botched enigma diagram the a went in and it came out as a zed it was encrypted as a zed now what's hard about this problem well you can take these rotors out right each rotor has different wiring there's usually five or eight rotors to choose from and you can put them in any position you like the next one is that every time you press a letter one of the rotors rotates and sometimes it rotates the next rotors along and that means that this mapping this transformation changes every time you press a character so you press a it's not going to do the same thing again no actually if you press a the rotor first turns and then lights up a letter right so it's not going to do the same again if you press aaaa the only thing you're guaranteed due to one of the quirks of enigma is you won't get any a's back but you'll get a lot of random letters back during the war they solved this by trying to find possible rotor configurations that definitely couldn't work or could work based on some guest playing text so maybe they had this idea that the first part of the message contained the word weather report and so if they put that in and they then they could find all the encryptions that couldn't possibly have come out as weather report and so on and they could start to narrow down what their options were now that's called a known plane text attack but during the war if they hadn't known the plaintext they wouldn't be able to crack enigma they didn't have the brute force power to do it nowadays we have pretty fast laptops right and you know beyond my laptop other computers are even faster than that would you believe so theoretically we could start to try out some of these combinations even if we didn't know any plain text what we're going to be looking at at least to begin with is a ciphertext text only attack that's an attack where we've only got the cipher text we don't have any idea what the plain text is and we want to see if we can guess what some of these settings would be there's a few interesting weaknesses of enigma that make it a little bit practical to brute force but not actually as much as you think we talk a lot about how a letter doesn't encrypt ever to itself and that's quite relevant for plain text attacks known plain text because you can then work out where plaintext definitely can't or could be we're not doing that today so i'm less worried about that particular property of enigma more interesting to me is the fact that if you get some of the settings right like you get this rotor correct but these two wrong that will often be better at decrypting than if you get them all wrong right so as we start to move slowly towards a solution we get a little hint that maybe this ciphertext isn't quite as good as it was and we can start measuring it so really what we need to do is find some way of putting some some ciphertext into an enigma machine with some configuration and then reading the output and saying is that a plausible sentence or not right and preferably we'd like to do it really really quickly because otherwise this could take a long time so we're not going to be doing any deep learning can't be bored with that um we're going to use very simple statistical properties of text to try and measure whether one sentence is better as english than another sentence right and there's a few of these and i've implemented all of them because i thought well let's just test them out right so the first one is something called index of coincidence so let's suppose you have a cipher text so for example oh here it is i'm just looking at random text right so yeah okay i'm just going to copy some random cipher text from here right so why i can't just randomize my own soft text i don't know but i feel more comfortable copying it than i do just thinking up clever interesting letters on my own said well so there's no number files on the fact that i don't pick as random as i think and stuff like that yeah so you then do s e uh b h w this is some actual cipher text that we'll be breaking later does it honestly start with zeus as in conrad's use yeah the reality of random yeah yeah yeah yeah um you see all kinds of words in the random ciphertext and they mean absolutely nothing because it hasn't been decrypted yet uh oh g yeah well exactly e u so let's suppose we have some type of text like this we guess some rotor settings and we put it through our enig machine enigma machine and that will decrypt it right now it will probably decrypt it incorrectly but where we accidentally stumble upon the right plugs or we stumble upon the right rotor configuration even for a briefly we'll find that this decryption is slightly better than completely random right because actually mostly this is completely random right yes we have this stipulation that letters can't turn into themselves but generally speaking it looks completely random so this is what we get out so h f my writing is bad today v v f l i n g now the interesting thing about this is i mean it's complete nonsense right let's let's i'm not cracking any words on this but i do recognize ing now ing is a fairly common trigram or set of three characters in the english language right now that doesn't necessarily mean this is correct but it's slightly more english should we say than this one right so if we were measuring some amount of how english is this sentence how you know then it's a little bit closer than this so maybe one of our rotors is in the right position and the others are wrong right or our rotor's in the correct position but our plugs are wrong or something like this right and the idea is that we slowly go through different configurations of our enigma machine i say slowly as fast as we can right and we measure statistical properties of these output sentences to find the ones that clo most closely resemble correct decrypted text right and we can do this without looking at them we don't have to look at them and say well that's a real word we just measure statistical properties so what are some of these statistical properties well the first one is called the index of coincidence or i o c this is the probability that when you pick two letters at random they'll be the same so for example if we randomly pick p and then we randomly pick p that's versus the same if we pick p and then l they're not the same you know we won't write the formula the form is not that complicated so but what you have to do is go through and count count every single character and how many of each one there are you produce a histogram and then you can calculate the index of coincidence based on this now for random text that is text that's been put through the machine are not decrypted the index of coincidence is usually something like 0.038 right which is basically everything's evenly distributed there's nothing interesting going on there at all but for decrypted english text we usually get a higher index of questions about 0.067 and i think it's something like 0.072 for german text one way of looking at it is it measures the the the fact that some characters have more higher probabilities than others if everything's equally likely you get something like this if some characters are quite common and so they tend to come up in pairs it starts to look a little bit higher right it never goes higher than this so well not really what we can do is we can work through our rotors different rotors different positions different settings and we can calculate our index of coincidence and we take the best scoring texts right so where our output has a higher index equations we think maybe we've got the rotor settings correct right and that's basically how it works there's been a number file right where they actually got to use the enigma machine and i'm super jealous they talked about the number of different combinations so it's all very well saying okay we'll just go through all the rotor settings and work out you know what the best one is right maybe if i have you know a super super super computer um but actually enigma has a nifty weakness in this sense right which is that if you get some of the settings correct this will improve right if i get the voter positions correct even though the rotors are in slightly wrong position the result will be better than if i've got the wrong rotors in place right if i get one of the plug board settings right the results would be better than if i got none of the plug board settings right because basically fewer characters will be incorrect right subtly so you've got three out of let's say five rotors or eight rotors so that's physically the three that happen to be in the machine at the time yeah and you've got their different positions right so if for today we just talk about five rotors just because i've always been sitting here a little bit longer then we've got 60 possible positions one we've got to choose three out of five and then they can all go in any slot you don't tend to have the same motor twice right because i mean they didn't have those duplicates of rotors yeah one set of them yeah yeah yeah yeah um so we've got that then for each of these we have a start position from from uh 1 to 26 which is you know what letter is showing on the top right basically how how rotated it is so you've got the start position or the indicator setting so there's you know times let's say 26 of those times by three of the different motors right then you've got the ring setting now the ring setting is essentially rotating the internals of the rotor right now actually if you if we ignore the notch for a minute which i'll talk about if you rotate the ring and you rotate the the um actual whole rotor they kind of cancel each other out so it's about the notch position really the notch is when the rotor turns it turns the next one along and so the combination of your start position and your ring setting will mean where your notch is and then when it turns around so you've got the ring settings right which is going to be 26 x 3 again then we've got the plug board which as you know swaps random characters and that's got something like 10 different pairs out of 13 possible pairs that's 150 trillion i think different combinations that's you know out of reach of my laptop um certainly when i'm doing all these decryptions as well and we're multiplied by all these things right the number file goes into much better detail on this we're looking at five today because uh again i don't want to be here all day it does get harder to solve and we'll talk about that so this is a lot of combinations it's too many combinations for me to go through even with this nice little industrial coincidence thing right even though when i get this exactly right i will just get the plain text out and it will have a very nice illness coincidence and i might be able to find it so what do we do well the weakness of enigma is that if we get some of these things right even if the others are wrong we get a little bit closer to the answer usually so for example if you get the correct three rotors in the correct positions and you get their start positions roughly correct if your ring settings are wrong all that'll do is mess around with the notches so you'll get bits of your plain text correct and then bits of cipher text and then bits of plain text and you get these kind of pockets of valid characters coming out of into the into the decryption it will still score better on ioc or any other metric so that's what we're going to do and this is the same with the plugs if we get the rotors and the start positions and the ring settings correct then we can start to guess plugs and generally speaking if we guess one correctly the output would better and we can then move towards a solution there's a lot of possible variations but the fact that we can deal with some of them at a time makes this practical right if we had to brute force through all the different variations it wouldn't be possible that's the idea so i've written some code to do this um if you want to have a go i'll make my code available but also there's a really good online tool called crypto which lets you do this in a visual way uh we'll put a link to that in the description but i've written some pretty simple code here i implemented an enigma machine because it was fun um and then i implemented a number of different fitness functions right which is how good is our decryption in index of questions is one right i also um maybe we can talk about some others another time so you were kind enough to send me some ciphertext i don't know what it is and it's been encrypted by some enigma configuration with i think five plugs the first thing i do is i go through all the different rotor configurations and i find the one that has the highest index of coincidence score when it decrypts that message so this is of five different rotors each one tried in each position and at each starting position so that's 26 for each one so that's quite a few combinations about 17 000 but 17 000 for a laptop in 2021 not such a big deal right takes somewhere around 10 seconds or something like that right so you can see what it's doing now is it's stepping through the different rotors so one two three one two four one two five and we've done about 10 or 15 configurations already and for each of these it's going through all the different starting positions but we're not looking at ring settings and we're not looking at plug board because that will just multiply this astronomically by the number of things we have to do so we're already on rota 3 in the left-hand side we're keeping going this same thing works exactly the same for eight rotors it doesn't really change anything it just takes slightly longer and i'm a bit lazy so i have actually coded up the other motors as well um interestingly enough some of the later rotors have two notches on right which is not it doesn't make any difference in terms of the cracking because um so that just means it turns the next one twice as well yeah yeah twice often yep um only really affects the first two rotors the last one doesn't ever really turn that often and it doesn't have any other motor to turn so here what we've got is we've got a list of the top performing rotor configurations so 2 5 3 is the best performing rotor configuration with start positions of 21 3 and 25 i'm using zero indexing right which is not how you would normally do it in enigma but it was easier for my array indexing to do this right and that has an index of coincidence of 0.043 which is a lot higher than 3 8. i say it's a lot higher it's a little bit higher good enough so that suggests to me i mean we actually i listed the top 10 here because sometimes you might not get the one on the top one you might get the next rotor configuration or something like this it's worth maybe if you were trying to really actually pay attention to this what you would do is maybe start doing further attacks on the top three rotor configurations just to keep your options open so we're gonna fix at two five and three because you know it saves time right um so given rotors two five three from left to right and their starting positions what we now can do is we can start to brute force through the ring settings so we can find the best possible ring settings right now this is almost instant because there's now only 600 or so of those right um we don't need to try the left-hand rotor because it doesn't really rotate and so we do that and it's already happened and the best ring positions were 0 3 23 right now the zero we ignore because it's not the ring possessing remember affects where the notch is and the left rotor doesn't turn anything over so it doesn't have any effect so given that this is the ciphertext we've got out this was our original ciphertext and this is our slightly better ciphertext now it still looks like total nonsense right but it has a much higher index of coincidence score than the original which means in some sense it's less random so if you look you might start to see groups of letters that might be a real word they might not i don't know right but some of the real letters are going to come out here we might not be able to see what they are so given this we can finally start addressing this really problematic plug board situation remember there's far too many plug ball combinations to realistically just try them all but again we have this wonderful benefit that if we get one of the plugs correct the result will probably be better than if we got none of them right so what we do is we go through each of the first 300 and so different possible plugs just one at a time and we see which the best one is right and then we fix it and we do the next for the next one and that's two plugs and then we do for next to the next one and that's three plugs right and so on and if we do that we very quickly come up with a few sets of plugs and our ciphertext is starting to look a lot better right so this is our ciphertext here the first letter is a nonsense but then is reposed to consider the quest que consider the quest can machines think ah see now i'm starting to guess what this might be oh i see what you've done here this is the alan turing paper so some of the letters are wrong right so it should be i propose to consider and it's j brapos to consider right and we're nearly there but that's because when we optimized the motor configuration we fixed the rings at zero zero zero so it was never going to find the exact correct thing so essentially the turnover is slightly wrong everything's slightly wrong but it's still pretty good now if we go back to our original question as to how secure is enigma the answer is not very secure right and the reason is not because it's trivial to break right this took me a little bit of effort and for short messages where these fitness functions start to break down because you don't have enough information they're actually actually very robust right for a 50 character message very very difficult to break using something like an index of coincidence because even if some of the letters start to appear right there's not very many of them the index is just noise in the war they limited or they they attempted to limit messages to something like 200 250 characters for this reason because index of coincidence was already known right and there's now more powerful metrics like trigram scores and quadrants which i've also implemented which often work better particularly for the plug board and so if you have a short message you don't get very much information on the different frequencies of different groups of letters and so there's really no way to know what's going on at all right and you get very lucky or you don't and most likely you don't right the other issue is of course the number of plugs i've only done five plugs here so i've cheated a bit right for most german messages were sent using 10 plugs you're going to need 1200 to 1500 characters before fitness functions are going to start to give you something right if you know what the playing text might be this becomes much much easier right because if you can fix these characters have to be exactly this your fitness function is much less noisy right i know actually i've implemented that as well and it just starts breaking it no no like nobody's business right so it is crackable right if you know if you can guess what plaintext is and of course modern cryptography assumes you know what the plain text is right at least for some of the message for example whenever you send an http message to a web server the beginning bit always says http get or something similar right but there's a very known structure to these things you can start to guess what they would be we can't assume that you wouldn't know what any of the plaintext was but even if we don't you can see that these index of coincidence and trigram scores and things can start to tease out some information um so going back to the beginning enigma is actually harder to crack than i thought right people always talk about how hard it was to crack during the war and that's absolutely fine but you just kind of assume that now it's 2021 laptops should be able to just click go for all the settings find yourself the ciphertext doesn't really work that way right you have to try and stumble your way towards it and often it doesn't work and there's noise in the output and so you have to try and work out whether what you're seeing is actually improvement or not and so on which i think is quite interesting modern ciphers don't have this problem if you have a 128 bit as key you can't brute force the first bit because different like the zero or one won't give you any better or worse playing text right it will just be nonsense each time and that's true of any amount so you can't do like the first half of the key and then the second half of the key which is kind of what we're doing here so modern ciphers don't have this problem and so on we can't do a lot of interesting things with this image after just one set of convolutions but we're getting there so this one is starting to be transformed some of them are noisier than others paint associated with it showing which was the proper setting with a at the top an a against the dot of paint AUTOCAPS MANUALLY PUBLISHED

Info

Channel: Computerphile

Views: 787,747

Rating: 4.8946471 out of 5

Keywords: computers, computerphile, computer, science, Enigma, Turing, Mike Pound, University of Nottingham, Cipher, Cracking, Crypto, Cryptography, Alan Turing, Bletchley Park

Id: RzWB5jL5RX0

Channel Id: undefined

Length: 21min 20sec (1280 seconds)

Published: Mon Apr 12 2021