Mathematical Cryptanalysis in the Real World

Captions
Hi everyone, I'm Mos, an assistant professor here at NYU Abu Dhabi. I'm hosting Nadia, and I'm also hosting the conference that most of you are participating in. Today, though, this is a public talk, so besides conference participants we have people from outside the university; that's one of the contributions NYUAD offers to the UAE — we offer public talks that everybody can come and enjoy. We took the opportunity to have Nadia here to talk both to the public and to the conference, so she has the difficult job of catering to non-experts and experts at once; we'll see how that goes.

A few words about Nadia: she is an associate professor in Computer Science and Engineering at the University of California San Diego. Her focus is mathematical and empirical cryptanalysis of public-key cryptography, which is essentially the title of this talk. She is the recipient of the NSF CAREER award, which is prestigious in the US, best paper awards from CCS and USENIX Security, and a Test of Time award from USENIX. From 2013 until 2018 she was an assistant professor at the University of Pennsylvania; she received her PhD in computer science in 2011 from Princeton, and she spent time as a postdoctoral researcher at UC San Diego and Microsoft Research New England. Nadia, the floor is yours.

Thank you very much for the invitation to be here. This is very exciting — it's my first time here. This talk is about mathematical cryptanalysis in the real world, which is going to be very general in some sense, so I figured I would start by making it somewhat concrete. Here I've put a cartoon diagram of a generic cryptographic protocol handshake, intended to abstract away the details of TLS, SSH, IKE — the network communication protocols that we use. Alice is you, the user, and Bob is some host she wants to connect to. In most of these protocols you start by doing some kind of key exchange, usually Diffie-Hellman, and then you authenticate the key exchange using digital signatures; it could be one-way or bidirectional authentication. After the handshake you switch to some kind of symmetric authenticated encryption mode — AES with whatever mode you want. The important part of this picture for this talk is that we have a key exchange algorithm and a digital signature happening.

The reason we do this kind of cryptography is that there are many places where you have some kind of network interference, and the cryptography is intended to protect against it — as I was reminded when eduroam was not working this morning. The cryptography we use for these public-key operations, the key exchange and the digital signatures, is actually quite old, and so are the mathematical assumptions underlying them. The paper that introduced public-key cryptography was the Diffie and Hellman paper from 1976. They introduced the discrete log assumption, which is still in use for both key exchange and digital signatures. Integer factorization was published about a year later in the RSA paper; in theory it can be used for public-key encryption, but it's not really used for that anymore — it's mostly used for digital signatures, and it is the pervasive digital signature algorithm in use. Then in the mid-1980s the elliptic curve discrete log assumption was formulated, and that can also be used for key exchange and signatures.
Those are basically the three mathematical assumptions underlying all public-key cryptography in use on the internet today. This is going to change very soon, and it is in the process of changing now. How many of you have actually read the Diffie and Hellman paper? About half the audience. For everybody who hasn't: it's amazing to go back and read it from a modern perspective, so if you haven't read it, you totally should. The first sentence of that paper, which not many of us get to write in a paper, is "We stand today on the brink of a revolution in cryptography" — and they were completely correct; it was not just ego that they got to write that. It is one of the foundational papers of the last century. And we stand today on the brink of another revolution in cryptography, which is that we are about to trash all of these assumptions and replace them with new post-quantum ones.

I felt obligated to put this slide in, because everything else I'm going to talk about for the rest of the talk is completely classical cryptography — but it is about to go away. We are in the midst of deploying the quantum-resistant algorithms that have been standardized by NIST, so there are new assumptions coming in that are believed to be secure against quantum computers, like module LWE, hash-based signatures, and NTRU. It is going to be a bumpy road, because thanks to cryptanalysis we are discovering facts about some of these algorithms at the last minute that are a little bit awkward. A couple of papers came out in 2022, very late in the NIST standardization process. One of them broke supersingular isogeny Diffie-Hellman — and not only was it broken, it wasn't broken by quantum computers, it was broken classically in ten minutes on a single core, which is really awkward given that the assumption had been around for at least a decade and people had been building all sorts of fancy crypto schemes on top of it. A second paper, "Breaking Rainbow Takes a Weekend on a Laptop," took out the most promising of the multivariate quadratic schemes. People are working on replacements, but this is the kind of progress that shows you can have decades of work on cryptography based on some assumption, and the assumption turns out to be wrong the moment somebody actually looks at it and understands it deeply. So this is going to be an interesting time.

Okay — for the rest of the talk we're going to go all the way back to 1976 and talk about RSA. I'm doing this because I like RSA: it's easy, we teach it to our undergrads, and there's a version that half of you teach to your undergrads as well. This is textbook RSA, and I'm going to specialize to signatures, because that's basically what we use RSA for; we don't even bother with the encryption part anymore. Hopefully you have all seen this before, because we're going to do versions of it, getting fancier and fancier over the course of the talk. In RSA key generation, in theory, you start by generating two primes p and q, say 1024 bits each, and multiply them together; that's your public modulus n. You have an encryption exponent e — everybody uses 65537 — and your decryption exponent d is e inverse mod phi(n). That's your public key and your private key. To generate a signature, you hash your message, you pad it (the details of the padding will be important later, but for now you can imagine just some kind of fixed string), and you raise it to the d-th power mod n; d is your secret exponent, and that's your signature, which you send over. To verify, you raise the signature to the e-th power mod n and check that it matches the padded message hash.
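For readers who want to try this, here is a minimal sketch of the textbook scheme she describes, written in Sage (the system used for the live demo later in the talk). The padding here is a placeholder fixed string, not a real padding standard, and the function names are ours, not any library's API.

```python
# Minimal Sage sketch of textbook RSA signatures (toy padding, illustrative only).
import hashlib

p = random_prime(2**1024, lbound=2**1023)     # two 1024-bit primes
q = random_prime(2**1024, lbound=2**1023)
n = p * q                                     # public modulus
e = 65537                                     # everybody's public exponent
d = inverse_mod(e, (p - 1) * (q - 1))         # secret exponent d = e^-1 mod phi(n)

def encode(message):
    # hash the message and prepend a fixed "padding" string (placeholder only)
    padded = b"\xff" * 200 + hashlib.sha256(message).digest()
    return ZZ(int.from_bytes(padded, "big"))  # interpret as an integer smaller than n

def sign(message):
    return power_mod(encode(message), d, n)   # s = pad(H(m))^d mod n

def verify(message, s):
    return power_mod(s, e, n) == encode(message)  # check s^e mod n == pad(H(m))

s = sign(b"hello")
print(verify(b"hello", s))                    # True
```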
So this is textbook RSA; I've added the padding and hashing bit, and this is what everybody does in every protocol on the internet now. It's amazing that this hasn't changed much since 1976 — in fact, the details of the padding and the hashing haven't changed since the 1990s.

To put this in context: we all think RSA is secure, and the reason we think so is that we don't have a good way to compute the secret exponent d unless we know the factorization of n. So RSA is based on the hardness of factoring. Here we have this picture of the rays of mathematics shining down upon a tower of abstraction within computer science — algorithms producing cryptography producing security — and that is what we actually use to secure our computers. Now look at the tower of assumptions. At the highest theoretical level we still don't know whether P is equal to NP or not, and it seems we have very little chance of resolving that anytime soon, or even of resolving where in Impagliazzo's five worlds we actually live — whether you can have cryptography at all. Even setting that aside, the factoring problem itself is probably not NP-hard; if it were NP-hard, the polynomial hierarchy would collapse (factoring sits in NP intersect coNP), which is considered unlikely. Even setting that aside, despite the fact that the RSA problem is about 47 years old, we still don't have a provable equivalence or non-equivalence between the hardness of breaking RSA and factoring — which is amazing: everybody "knows" RSA is based on factoring, and yet this has never been proved, though it could be. And even granting all of that, there are subexponential-time factoring algorithms, which is why RSA keys have to be so large — so much larger than, say, elliptic curve keys. So we're already chipping away at the foundations at the mathematical level.

Then, at a more practical level: when you design your protocols you need to choose specific parameters, and you choose them, hopefully, to evade all known attacks; you have to decide on a padding function and a choice of hash function. Then you embed your cryptographic algorithm into a protocol and hopefully use it in a secure way. When you actually implement it you have to do things like generate random numbers — hopefully you've done that right. And even if you write a beautiful protocol, it gets handed to developers who have to actually implement it, and they read the protocol and make mistakes, and users make mistakes. Finally, at the bottom, physical level, computers are physical devices with physical phenomena like errors. So we have this whole tower of things that can go wrong with our algorithms. This is the perspective I want to take on how to break things: we have an amazing variety of layers of abstraction here.
Okay, here's a cartoon that I feel obligated to show every time I talk about RSA. The first panel is "a crypto nerd's imagination": his laptop's encrypted, let's build a million-dollar cluster to crack it — no good, it's 4096-bit RSA — blast, our evil plan is foiled. Well, we can ask what that actually looks like. (The colors are a bit off — it's just this monitor.)

What it actually looks like in practice, if you wanted to break factorization using the best known algorithm, is the number field sieve. This is an algorithm that dates to the early 1990s and was developed by a number of people; it is a complex algorithm with multiple stages, and it relies on a set of number-theoretic heuristic assumptions. There are a couple of high-performance implementations, and every ten years or so you see a consortium of people get together, pool their computing resources, and do another factorization record. The progress has looked like this: 512-bit RSA was factored in 1999, 768-bit RSA in 2009, and 829-bit RSA in 2020 — that's the current record. A back-of-the-envelope calculation, taking those computation times and plugging them into the asymptotic formula for the number field sieve, says that 1024-bit RSA would be about 500,000 core-years without any algorithmic improvements. That is eminently doable by any large organization, and has probably been doable for more than ten years now. The only reason it hasn't been done by academics is that the handful of people actually working on this problem don't currently have access to that much computing power. That is the only thing keeping it from happening.

I don't know how many of you have done this kind of cryptanalytic work, but has anybody babysat a computation for six months or a year or two? It's a drag. Imagine you have an international team, everybody has a couple of racks of servers, and you're trying to coordinate this thing going back and forth; basically you sit there and watch the output, and then a disk dies and you have to restart the job. Getting the developer time and the sysadmin time — sysadmin time meaning a researcher has to actually sit there and administer the machines they're running on — is a pain, and the computing power costs a lot of money. But that is literally the only thing keeping more progress from happening. People somehow think that factoring is a super well studied problem, but in fact it's studied by five underpaid French people, and that's about it right now.

So I just want to scare you a little bit — this is old news, but the asymptotic running time of the general number field sieve, the best algorithm we have for factoring, is a horrid-looking expression. It has this L function, which appears in every number-theoretic algorithm that uses estimates coming from the prime number theorem. We don't necessarily care about the details, but we've got something exponential in log n to the 1/3 times log log n to the 2/3, and then we've got this horrid-looking constant, the cube root of 64/9.
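For reference, the expression she is describing is the standard heuristic running time of the general number field sieve, written in the usual L-notation:

\[
L_n\!\left[\tfrac{1}{3},\ \sqrt[3]{64/9}\right]
  \;=\; \exp\!\Big(\big(\sqrt[3]{64/9} + o(1)\big)\,(\ln n)^{1/3}(\ln\ln n)^{2/3}\Big),
\qquad \sqrt[3]{64/9}\approx 1.923 .
\]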
This constant has been the same since the early 1990s. It is also the running time for prime-field discrete log, and it was the running time for discrete log over all finite fields from the early 1990s until 2013, when a series of improvements took the running time for small-characteristic fields down to essentially quasi-polynomial time, something like n to the big-O of log n. This was not just a theoretical improvement: people have now computed discrete logarithms in fields of size around 6,000 bits. And that improvement came out of, I think, one major insight plus a series of small improvements.

I just want to say that even a very small improvement in the number field sieve running time would have a large impact on RSA key sizes. For example, there's a version of the algorithm whose constant is the cube root of 32/9 rather than the cube root of 64/9. As far as we know it only applies in special cases, but it makes special 1024-bit integers doable on academic resources alone. And if there were an improvement taking that 1/3 exponent down to something like 1/4, then back of the envelope, depending on what the constant is, it could take out 2048-bit RSA. That's how close the key sizes are to these kinds of algorithmic improvements. It could be that this is fundamentally the true running time of factorization, but honestly, if you look at that expression, does it seem like the proper running time for such a fundamental problem? I don't think so. Anyway, all of this rests on the heads of five underpaid French people. Here's another xkcd cartoon, and I think it captures the state of a lot of cryptanalysis: all of our modern cryptographic infrastructure, balanced on a piece of academic software maintained by five underpaid French people. All right, I just wanted to scare people a little bit.

Okay, here's the other half of the xkcd cartoon I was showing you. Instead of "blast, our evil plan is foiled" — trying to factor RSA the hard way — what actually happens, since this is a security audience, is: his laptop's encrypted; drug him and hit him with this $5 wrench until he tells us the password. That's how you actually break things: you don't break the cryptography outright, you do the easier thing. That is how you actually do threat modeling. Now, I am still a cryptography person, so I don't hit people with $5 wrenches — I hit algorithms with $5 wrenches. There will be no violence in this talk, except to algorithms.

Now I'm going to go back to something I did 12 years ago — maybe you've heard me give a talk about this before; it will only be five minutes and then we'll move on to new stuff — but I want to show you the algorithmic $5 wrench. We just spent a lot of time talking about how complicated factoring is, but it turns out that RSA moduli have some nice algebraic properties. For example, if we have two RSA moduli, where modulus n1 is the product of some prime p times another prime q1, and modulus n2 is the product of that same prime p times another prime q2, you can compute the gcd of n1 and n2 and get out p. The important difference is that the amount of time it takes to compute the gcd of two 1024-bit RSA moduli is something like 15 microseconds,
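A toy Sage illustration of that difference (all values are generated on the spot here; nothing comes from real keys):

```python
# Two RSA moduli that happen to share a prime factor fall to a single gcd.
p  = random_prime(2**1024, lbound=2**1023)     # the shared prime
q1 = random_prime(2**1024, lbound=2**1023)
q2 = random_prime(2**1024, lbound=2**1023)
n1, n2 = p * q1, p * q2                        # two 2048-bit moduli, secretly sharing p

shared = gcd(n1, n2)                           # microseconds, vs. ~500,000 core-years to factor outright
print(shared == p, n1 // shared == q1, n2 // shared == q2)   # all True
```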
whereas our back-of-the-envelope estimate for the time to factor each of these outright was about 500,000 core-years. So this is an important difference. Now, of course, this should never work as a real-world factoring algorithm. Imagine the experiment: we collect a bunch of RSA moduli, each a product of randomly chosen primes, and compute the pairwise gcds of all of them; this should never result in a non-trivial common divisor. The back-of-the-envelope estimate is that you have two randomly chosen primes per RSA modulus, and the prime number theorem tells us there are about 10^150 512-bit primes. So we have a balls-and-bins problem: 2M prime "balls" thrown into roughly 10^150 bins, and the probability of a non-trivial common divisor is a function that doesn't approach anything like a constant until M is about the square root of the number of primes. If you plot it, it's basically zero until that square root, which is pretty close to the number of atoms in the universe. So this says that if you were building a certificate infrastructure for an Internet of Things where every atom had its own certificate, you shouldn't use 1024-bit RSA — but beyond that, you're probably fine.

So here is the research program we'll follow for the rest of the talk, which I've been calling shooting cryptographic fish in a barrel. You think of some mistake that somebody might have made, no matter how unlikely it seems — for example, RSA moduli with shared factors. Then you collect a bunch of data from a bunch of implementations; you can do this by scanning the internet, by checking certificate transparency logs, by setting up a network tap at your institution, those kinds of things. Then you just check whether somebody managed to screw it up. You will find somebody who screwed up — and if you haven't found somebody yet, it's because you haven't collected enough data, so you just keep collecting data until you do. And then you write a paper.

If you do this experiment for RSA keys with common divisors, it turns out to work way better than you would expect. In fact it has continued to work for more than ten years and has factored large numbers of keys. Twelve years ago there were a large number of HTTPS certificates that shared keys, SSH servers whose host keys shared common factors, some PGP users, some smart-card-generated keys in certificates for a Taiwanese citizen smart card deployment, and some Tor relay public keys. Even in 2022, Google set up a project to look for this, among a number of other errors, in certificate transparency logs, and they found a couple thousand keys with common factors. So this is a problem that is not going away.

I just told you this should never happen, and then I told you it happens rampantly on the internet and it's not going away. What we found was widespread random number generation failures on low-resource devices. Nearly all of the keys compromised in this way either had some kind of basic vulnerability in their software or some kind of hardware failure, and the basic pattern is
that they failed to seed the random number generator before generating their keys. If the random number generator had not yet been seeded when the device generated its first RSA prime, and seeding did go through at some point while it was generating the second prime, then the first prime would be the same and the second prime would be different across devices. The most common failure mode was a problem with the Linux random number generator in combination with how OpenSSL was generating RSA keys. The picture is a tower of random number generation: hardware sources of entropy are used to seed the Linux random number generator, which is then used to seed the OpenSSL application random number generator, which is then used to generate RSA keys. Now think about a little network device: it often generates its crypto keys automatically on first boot, and a little Wi-Fi router lacks all of the entropy sources that were assumed by the Linux random number generator at the time. That meant that on many of these devices, the Linux random number generator had not yet been seeded when it was queried by OpenSSL. This was patched in 2012, and in 2022 Jason Donenfeld did a lot of work reworking the design of the Linux random number generator; some of that was rolled back because of performance issues, so I think the progress there is ongoing.
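Here is a toy Sage simulation of that failure mode — this is not any real device's or library's code, just an illustration of how an entropy-starved first prime plus a later-seeded second prime yields keys that share a factor:

```python
# Two "devices" boot with an empty entropy pool, draw their first prime before any
# seeding happens, then receive device-specific entropy before drawing the second.
import hashlib

def prime_from_state(state):
    # deterministic toy "PRNG": stretch the pool contents and take the next prime
    x = ZZ(int.from_bytes(hashlib.sha512(state).digest(), "big"))
    return next_prime(x % 2**511 + 2**511)     # a 512-bit prime determined entirely by `state`

def keygen(boot_entropy, later_entropy):
    p = prime_from_state(boot_entropy)                  # drawn while the pool is still unseeded
    q = prime_from_state(boot_entropy + later_entropy)  # drawn after real entropy arrives
    return p * q

n1 = keygen(b"", b"device 1 entropy")          # both devices start from the same empty pool
n2 = keygen(b"", b"device 2 entropy")
print(gcd(n1, n2) == prime_from_state(b""))    # True: the shared first prime falls right out
```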
But just because this was patched doesn't mean the problem went away. This is a plot from a follow-up paper that some students of mine did several years later. We went through a ton of work to notify more than 60 vendors of this vulnerability; some of them acknowledged it and pushed out patches, and some did not acknowledge it but pushed out patches anyway. The Linux kernel had been patched, so in theory you would think the vulnerability would go away. But if you look at the number of vulnerable hosts over time — starting around 2012 there were regular network scans of HTTPS, so we have some visibility into the long-term trends, with some differences due to different scanning methodologies — the number of vulnerable hosts kept going up. This is in part because manufacturers are introducing newly vulnerable products over time, often because they ship an old kernel version, despite all the disclosure work. Those of you who have tried to do disclosure know how difficult it is. This paper also ended up being an unintentional experiment on disclosure: this is what it looks like to try to disclose to about 60 companies. Half of them never respond at all; some respond and put out a public advisory; and some respond privately but never put out a public advisory of any kind. The disclosure process has changed over the years in that many vendors now have a more formal process and understand that external security researchers find vulnerabilities, but it's still extremely fraught. We have something under disclosure right now, and I think it is much easier to go through an organization like CERT, if they will acknowledge you. So that's one piece of advice I have for you: trying to disclose things to vendors is very difficult.

Okay, that sets the stage for some more recent work I want to talk about. Going back to our picture of textbook RSA signature generation: a signature is the padded message hash raised to the d-th power mod n, and to verify we take the signature to the e-th power and check that it equals the padded message hash. Now, most implementations use the Chinese remainder theorem to implement RSA signature generation. Using the CRT just means doing two modular exponentiations of half the size — one mod p and one mod q — and then combining them to get the answer mod n, which is the signature actually sent over the wire. This is a common implementation optimization; it gives you something like a factor-of-eight improvement in the modular exponentiation time.

There is a classic fault attack against this kind of RSA signature generation. If there's an error in one of these half exponentiations — say the signature generated mod p is correct and the signature generated mod q is incorrect — and the implementation just blindly combines them using the Chinese remainder theorem, then it sends an incorrect signature over the wire. An eavesdropper who can see the signature can take it, raise it to the e-th power, and obtain something that is equal to m mod p but not equal to m mod q. They can then compute the gcd of that difference with n, discover one of the prime factors of n, and factor n that way — and as we've seen, computing gcds is super efficient. So this is a very old and classic vulnerability. The mitigation is to validate RSA signatures after you generate them and before you send them; this is well known and implemented in most common libraries.

The history of this vulnerability goes all the way back to Boneh, DeMillo, and Lipton, who wrote the first paper on fault attacks against cryptography in 1997. Shortly after they published, Lenstra gave an improvement, which is the version I just described. Starting in 1998 there were basically decades of papers showing how to exploit this in the context of side-channel attacks: if you want to evaluate the security of a smart card, you glitch it with a laser or something while it's doing RSA signature generation and then carry out this attack. In that threat model, implementations must validate signatures before they leave the device. This was patched in OpenSSL in 2001, a long time ago. However, in 2015 there was a super cool white paper by Weimer, who did a bunch of network scanning of TLS hosts and found implementations that were generating bad signatures in an untriggered fault mode — he just connected to them via TLS, got bad signatures, and got their secret keys. He observed that most of the cryptographic libraries in common use, with few exceptions, were unprotected against this attack. So in 2015 most of the common cryptographic libraries implemented the countermeasure of validating signatures before sending them.
And then in 2022, some students of mine, some collaborators, and I were looking at the traffic on our networks and found bad signatures that were vulnerable to this attack. The way the exploit works in practice relies on PKCS#1 v1.5 padding. It's a horrible name, but this is still the most common RSA signature padding for TLS 1.2 and below. TLS 1.2 is becoming less common over time, but it is still about half of the TLS traffic on our network at UCSD. TLS 1.3 requires randomized padding for handshake signatures, though I think PKCS#1 v1.5 is still allowed for the signatures on certificates. The important thing about this padding scheme is that it is deterministic. The padded value looks like this: you have your hashed message; you have some metadata that identifies the hash function, which is just a fixed value; you have a zero byte; then you fill the rest of the space, up to a value about the size of your modulus n, with 0xFF bytes (all one bits); and there's a byte identifying the padding type and a leading zero byte to make sure nothing overflows. All of this is deterministic and known, so the hash is the only thing that changes from message to message. In TLS 1.2, the protocol handshake — the key exchange, and the signature the server uses to authenticate itself over that key exchange — is all done in the clear, and if it's using PKCS#1 v1.5 padding, the padding part is entirely deterministic. That means an attacker has the entire value that was used to compute the signature, because the hash is over values that are completely known, and so the gcd computation is completely trivial for an attacker who can view the handshake going over the wire.
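To make the deterministic encoding concrete, here is a small sketch of the PKCS#1 v1.5 signature encoding with SHA-256 (EMSA-PKCS1-v1_5). It runs as plain Python inside Sage; the function name is ours, and everything except the message hash is a fixed public value, which is exactly what makes the padded value predictable to an observer:

```python
# Sketch of the deterministic PKCS#1 v1.5 signature encoding discussed above.
import hashlib

def pkcs1_v1_5_encode(message: bytes, k: int) -> bytes:
    """k = modulus length in bytes (e.g., 256 for a 2048-bit modulus)."""
    digest = hashlib.sha256(message).digest()
    # ASN.1 DigestInfo prefix identifying SHA-256 -- a fixed, public constant
    digest_info = bytes.fromhex("3031300d060960864801650304020105000420") + digest
    ps = b"\xff" * (k - len(digest_info) - 3)      # fill the rest with 0xFF bytes
    return b"\x00\x01" + ps + b"\x00" + digest_info

encoded = pkcs1_v1_5_encode(b"hello", 256)         # only the 32 hash bytes vary per message
```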
So, going back to our research program: we think of a mistake somebody might have made — for example, a fault in their RSA signature generation — we collect a bunch of signatures (and I want to talk a little about the difficulty of doing that), we check for the error, we find someone who screwed up, and then we write a paper.

I mentioned active scanning. Active scanning has been a well-studied aspect of security research for 10 to 15 years now; the first full IPv4 scans were done around 2009-2010, and fast TCP scanning tools were published around 2013, so there are multiple options if you want to scan the entire visible IPv4 space for some protocol. It may cause some difficulty with your institution, depending on their policies and whether they understand this kind of network research, but it's fairly well understood. Something that is interesting, that I have been working on, but that turns out to be much more difficult, is passive network analysis. In theory there are existing tools that can do this: you get your network administrators to agree to give you access to some data at your institution, you split your network traffic off to an analysis machine, and there are software tools like Zeek (formerly known as Bro) that will do TCP flow reconstruction and protocol parsing, and then you pull out the data you're looking for and store and analyze it.

In practice this is much more difficult than it sounds, especially for this kind of research. Of course there are ethical, legal, and privacy issues you may have to deal with — in the US we have IRBs, or whatever the local analog of an institutional review board is. The existing tools either scale poorly — Bro/Zeek is reliable, but you need a large number of machines to cope with large amounts of network traffic — or, in the case of the theoretical academic improvements on it, are unreliable, because it turns out that dealing with all of the TCP and protocol handling is very hard to do at scale. Part of this is that debugging is hard in a real-world environment: you find weird things going on in networks and eliminate them one by one, and it's very painful. And finally, you can't really share this data; it's very privacy-sensitive, and even if you anonymize it, we all know how bad anonymization is. So it's hard to get access to and hard to share, and even when you do have data you can analyze, in a real deployment you may have very limited metadata — you may or may not be able to see things like certificates or SNI or even IP addresses. If you see a weird handshake go over the wire, figuring out what it is is much more difficult when you aren't actively connecting to the machine yourself. So this is difficult, and I put this plot here — it's a plot we actually put in the paper — to show how hard it is. The blue dashed line is the amount of traffic we were seeing on the UCSD network, and the red is my collaborators at CU Boulder; you can see their machine was down a lot of the time, though they sometimes got more data. This was 2022, and in January the entire university got shut down because of the Omicron wave, so you can see there was very little traffic on the network; it came back when things came back, and in the next quarter we actually had much more traffic because there were many more students on campus. So we had real-world logistical issues limiting the amount of data we had access to.

Given all the caveats — this is not a reliable statistical sample — we were still able to collect TLS data on our networks for eight months. We saw almost six billion TLS handshakes; we validated 2.7 billion signatures; of those, about 1,700 were invalid RSA signatures; and of those 1,700 invalid signatures, there were 11 that resulted in a factored RSA public key. These vulnerable signatures corresponded to three distinct certificate keys, all from different domain names belonging to baidu.com — and Baidu is the Google of China, so this is kind of a significant key to have gotten. We were able to talk to some people at Baidu, and what they communicated was that their TLS infrastructure is very similar to the way a lot of large organizations do TLS: they offload signature generation to hardware accelerators, with a front-end library using their custom hardware implementation to do RSA signature generation.
When these signatures were processed back through the library, even though it was based on Go, the way they were using it bypassed the existing software checks for RSA signatures, and it appears that their custom implementation was not validating the signatures before they were sent back through this library path. In fact, the particular pattern of the failures we saw was telling: this is the total number of connections we saw going to Baidu-associated domains using the vulnerable keys, and we saw just a handful of vulnerable signatures over the course of about a week, and then they disappeared. So this appeared to be a flaw present in perhaps a single dying FPGA that then completely died or was removed from the data center, and we never saw the problem again. It's a transient issue, and it's the sort of thing where, if we had made a single connection to baidu.com, we would likely not have seen a vulnerable signature; it was only because we saw this large amount of traffic, presumably from their entire data center, over the course of months that we were able to see this kind of rare flaw.

We also did some active scans to see what we could find there, and in our active scans we found a number of other vulnerable devices, the majority of them Cisco ASA devices. Each column here is a distinct factored key, and you can see that once we saw one bad signature from a device, we essentially saw it in every single scan; some were a bit more transient, going up and down, and some implementations seemed completely transient, but for many of these it looked like the device just completely failed and we never got a valid signature from it again. Some of these were devices that had been loaded with the wildcard certificates for an organization's entire domain, because they were, say, VPN products. The thing that is somewhat unexpected here is the persistence of these errors. What we thought we might find in doing this kind of research was, say, a gamma ray hitting some RAM and causing a sporadic error; instead, what we see is that hardware is dying, and the dying hardware is consistently spewing people's private keys out over the internet. I was not quite expecting how common failing hardware would be — and the hardware failures are so much more common that they outweigh whatever sporadic cosmic-ray-style failures might also be in the data.

I want to put this in a bit more of a historical surveillance context. Go back 15 years and think about organizations that had visibility into network traffic then: this attack would have been ideal for surveilling TLS traffic. Fifteen years ago, when TLS 1.1 and 1.2 dominated the internet, RSA keys were used both for signatures and for key exchange, and you would use the same certificate key either to sign a Diffie-Hellman key exchange or for the client browser to encrypt the premaster secret and send it to the server. So if someone doing this kind of passive network analysis had viewed a single faulty signature that revealed the secret key, they could then go back and decrypt all of the key exchanges that were using RSA encryption, recover the premaster secrets, and use those to decrypt the actual content of the traffic. This would have been extremely convenient.
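To see why that mattered, here is a toy Sage sketch of RSA key transport as used in TLS 1.2 RSA key exchange. Everything here is an illustrative stand-in: real TLS wraps the premaster secret in PKCS#1 v1.5 encryption padding, which is omitted, and the recovery of d via a faulty signature is assumed rather than shown.

```python
# Toy illustration: once d is known, every recorded RSA key exchange opens up.
p = random_prime(2**1024, lbound=2**1023); q = random_prime(2**1024, lbound=2**1023)
n = p * q; e = 65537
d = inverse_mod(e, (p - 1) * (q - 1))

premaster = ZZ.random_element(2**384)          # stand-in for the premaster secret
recorded  = power_mod(premaster, e, n)         # what a passive observer records on the wire

# Later, d is recovered (e.g., from one faulty signature, as in the attack above) ...
print(power_mod(recorded, d, n) == premaster)  # True: the recorded session decrypts
```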
And some of the documents in the Snowden slides, which have been public for more than eleven years now, make it clear that multiple countries were both tracking TLS and had capabilities they were testing against RSA moduli. The fact that most libraries in use on the internet did not patch until 2015 means that until then, probably a large fraction of internet traffic was actually vulnerable to this kind of attack.

But that is the situation for TLS 1.2 and below. If we analyze what an attacker needs to carry out the basic gcd attack I outlined, it is: a bad signature coming from the server, knowledge of the hashed message, and deterministic RSA padding. All of this was true in the TLS 1.2-and-below context, which means the protocol can be attacked passively. For TLS 1.3 we have none of these things: the signatures coming from the server are encrypted, because the entire handshake is encrypted after the first Diffie-Hellman key exchange, so a passive network attacker can't even view the signature; the attacker also doesn't know the hash of the message, because it includes things a passive attacker can't view; and the signature scheme uses non-deterministic padding, so the deterministic-padding requirement fails as well. So TLS 1.3 is fine from this perspective. For SSH, the handshake is done in the clear — you don't switch to encrypted traffic until after you finish the public-key handshake — so the signature is sent in the clear. But the hashed message includes the Diffie-Hellman shared secret, and a passive network attacker sees only the Diffie-Hellman key exchange, not the shared secret, so they can't compute the message hash, can't do the gcd, and can't get the secret key even if a bad signature is sent. However, RSA signing in SSH is often done using PKCS#1 v1.5, so we do have deterministic padding. The SSH protocol was previously thought to be safe against this kind of fault attack, but it turns out that it is not: we can overcome the problem that the hashed message contains the Diffie-Hellman shared secret.

Go back to PKCS#1 v1.5 signature padding: we have this deterministic fixed part, and then the hash of the message, which in SSH the attacker doesn't know — but the hash is relatively small compared to n. So we have a big known prefix and a small unknown hash, and I want to show you how we can exploit that. Here's a toy version. I generate a completely legitimate 2048-bit RSA modulus: a 1024-bit prime p, a 1024-bit prime q, n = pq, and e = 65537. I compute some fixed padding — this is just going to be all one bits — and then I generate a random value to stand in for the hash, since I'm not computing a real hash here. My message is the padding with the hash appended, so if you look at it, it's a bunch of one bits followed by this random hash value. Now suppose I have a bad signature, call it bad_s, which is correct mod p — bad_s raised to the e-th power mod p equals m — but incorrect mod q. I would like to recover p from this bad signature alone. So I compute a value from things an attacker can see publicly: the bad signature, the modulus n (which is the public key), and the fixed padding — essentially the difference between the padded prefix and the bad signature raised to the e-th power. Then I take a bound about the size of the unknown hash, I build a matrix from just these public values, I LLL-reduce it, I construct a degree-two polynomial from the shortest vector in my LLL-reduced lattice, I compute the first root of that polynomial, and I compute the greatest common divisor of that root added to my public value with n — and out comes my factor p. This is lattice-based cryptanalysis, which many people think is very complicated and hard, and I just did it on this screen using Sage. It takes less than a second; in fact, generating the primes p and q is the most expensive part of the entire computation.
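Here is a Sage sketch reconstructing that toy demo. The exact lattice used on screen isn't reproduced in the transcript, so this uses one standard Coppersmith/Howgrave-Graham-style construction for a small unknown hash; the variable names and the specific basis are ours.

```python
# Partial-message fault attack on a CRT RSA signature with a small unknown hash (toy).
hash_bits = 256
p = random_prime(2**1024, lbound=2**1023); q = random_prime(2**1024, lbound=2**1023)
n = p * q; e = 65537
d = inverse_mod(e, (p - 1) * (q - 1))

pad_prefix = (2**1776 - 1) * 2**hash_bits      # fixed, public "all one bits" padding
h = ZZ.random_element(2**hash_bits)            # stand-in for the unknown message hash
m = pad_prefix + h                             # the value that gets signed

s_p = power_mod(m, d % (p - 1), p)             # correct half, mod p
s_q = (power_mod(m, d % (q - 1), q) + 1) % q   # injected fault, mod q
bad_s = crt(s_p, s_q, p, q)

# --- Attacker's view: only n, e, bad_s, and the fixed padding ---
A = (pad_prefix - power_mod(bad_s, e, n)) % n  # A + h is divisible by p
X = 2**hash_bits                               # bound on the unknown hash

# Rows encode the polynomials n, x + A, x*(x + A), all of which vanish mod p at x = h.
B = Matrix(ZZ, [[n, 0,     0    ],
                [A, X,     0    ],
                [0, A * X, X**2 ]])
v = B.LLL()[0]                                 # shortest vector -> a small degree-2 polynomial

R = PolynomialRing(ZZ, 'x'); x = R.gen()
g = sum((v[i] // X**i) * x**i for i in range(3))
for root, _ in g.roots():                      # integer roots of g; one of them is h
    f = gcd(A + root, n)
    if 1 < f < n:
        print(f == p)                          # True: the secret factor is recovered
```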
This is a generalization of Coppersmith's theorem as extended by Howgrave-Graham. The way to remember it is: given half of the most significant bits of p, where p is a prime factor of n of about half the size of n, you can factor n in polynomial time. In our case, we are given something that we only have to add a small value to in order to get a quantity with a large enough common divisor with n. It's a scary-looking theorem, but in its simplest version it really is just that.

So, our research program again: we're going to collect SSH handshake signatures, run the lattice attack, find someone who screwed up, and write a paper. How do we collect SSH data? It turns out that if you look at passive SSH data, at least on our network, it is almost all scanning traffic — a lot of it Russian, a lot of it malware scanning, people trying to brute-force passwords on SSH servers — and not very much real SSH traffic. So we didn't have enough traffic for the passive analysis to be interesting, and we were stuck with active scans. We got some historical scan data from between 2017 and 2020 and then did some of our own scanning, and you can see that RSA is becoming less common; ECDSA is becoming much, much more common among SSH servers. Here's the flowchart of the attack: we have a signature; if it was invalid, we ran the lattice attack; and we were able to get private keys for 189 unique keys, from about 5,000 signatures where the lattice attack succeeded. The lattice computation took about 0.1 seconds for common parameter settings. Interestingly, the affected vendors included many of the same ones we saw for TLS, so they were presumably using the same custom implementations.
The analysis of the impact of this vulnerability is very different for SSH than for TLS. For TLS, we saw that historically this was a really bad problem because of the prevalence of RSA key exchange. For SSH, the keys we were able to compromise are the host keys used to authenticate the server to the client; client authentication is done later, inside the encrypted channel, so this has nothing to do with the security of SSH client implementations. The only thing an attacker can do with this vulnerability is impersonate an SSH server to a client — and of course we all know that when you SSH into a server and see "oh no, the host key has changed," you just go "oh man," delete it, and reconnect. Nobody ever really does SSH host key validation anyway. I'm sure there's some scenario where this is an interesting thing.

I'm almost done here, but I have learned that when I give a talk about these kinds of vulnerabilities I need to end on a positive note, because otherwise everybody gets devastated that everything is broken all the time. So, there is good news. As we've seen, TLS 1.3 has been specifically designed to be less vulnerable to passive cryptanalysis: the handshake is encrypted after the initial key exchange; encrypted Client Hello is being rolled out as well, so that metadata like SNI is hidden; and RSA key exchange has been removed entirely. With passive analysis, basically all you can see is the Diffie-Hellman key exchange and some of the cipher information — you don't even see signatures anymore. If you want data on TLS 1.3, you can still collect it from active scans. And there are mitigations even beyond TLS 1.3. As we've seen, it's very important to validate any signatures before they're sent over the wire; this is the kind of thing you don't think about if you're implementing RSA for the first time, but it is clearly very important, even in a hardware context, because we've seen that even the software implementations around that hardware can end up vulnerable. PKCS#1 v1.5 is known to be a disaster for encryption padding; for signature padding it is provably secure, but this kind of attack shows how fragile it can be. TLS 1.3 uses RSA-PSS for handshake signatures, which is also a good choice. And in general — I love attacking RSA, it's so much fun — I don't think there's any good justification for using RSA in 2024, honestly.
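The validate-before-sending countermeasure she mentions is cheap: one extra check with the public exponent before a signature ever leaves the device. A sketch, continuing the earlier toy Sage examples (the function name is ours, not any particular library's API):

```python
# Verify-before-send: refuse to release a CRT signature that does not check out.
def sign_crt_checked(m, d, p, q, e, n):
    s_p = power_mod(m, d % (p - 1), p)
    s_q = power_mod(m, d % (q - 1), q)
    s = crt(s_p, s_q, p, q)
    if power_mod(s, e, n) != m % n:            # cheap check with the public exponent
        raise RuntimeError("fault detected: do not emit this signature")
    return s
```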
The bigger picture of all of these vulnerabilities is that they cross abstraction boundaries: we take the mathematical properties of the ciphers themselves and then find vulnerabilities in random number generators, or hardware failures, that result in exploitable mathematical structure. This is the kind of thing that's very difficult to detect during routine testing — you can't really test whether your random number generator is working. And despite the fact that I'm hating on RSA a little bit (I do love RSA), I could have given a similar talk about these kinds of mathematical structural issues in Diffie-Hellman or ECDSA, the other things that are commonly used; it's not that RSA is specifically more vulnerable to them. And even though I said RSA should go away, it is likely going to be around for the long haul, because even as we move to quantum-resistant algorithms, the best way to deploy them is in hybrid mode with classical algorithms for the foreseeable future — you never know when somebody's going to come up with a polynomial-time classical algorithm for something like module LWE. I see some sad faces in the audience. It will take a while before we have four decades of confidence in those algorithms.

So, in conclusion: despite the fact that we think our cryptography is super well studied, it turns out that if you look, you can find cryptographic vulnerabilities hiding in plain sight — you just need to look in the right way. This is also an example of how some very simple math can get you some very non-trivial results, and of how our cryptographic primitives can be very fragile to implementation mistakes: one little failure to validate can reveal the TLS keys for one of the largest internet companies in the world. With that, that is all I have — here are the papers I talked about, if you'd like to read any of them. Thank you very much.
Info
Channel: NYUAD Institute
Views: 231
Keywords: NYUAD, nyuadinstitute, Abu Dhabi, United Arab Emirates, Lectures, Talks
Id: fIYOqnWJH08
Length: 63min 4sec (3784 seconds)
Published: Thu Apr 25 2024