Tom Liston, Random Facts About Mersenne Twisters | KringleCon 2020

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and welcome to kringlecon 3. in this talk i'm going to tell you some random facts about something called a mersenne twister but first of all let me introduce myself hi i'm tom liston i'm new this year to counter hack challenges but i'm not at all new to security in fact i'm i'm kind of old i've been around for quite a while and i've done a number of things in the field of security you're listening to me today because you decided to click play on this video everybody makes bad choices every now and again so so don't feel too bad about it but realize that it's on you let's talk a little bit begin all of this let's talk a little bit about uh randomness in software anybody who's ever used a computer for any period of time has had at some point the computer freak out and start doing weird things on them seemingly random things and now while that may have happened it's not because of software software is not random software is by its very nature incredibly deterministic it does exactly what it's programmed to do so what we find is that generating true randomness in software is actually incredibly incredibly hard while you can do some things with hardware that will allow you to access a source of true randomness most operating systems and most computers probably don't have a good way to access a true source of hardware randomness and so when developers are creating programming languages one of the things that they try to do is they try to be very os or very system agnostic so they don't really have any sort of good cross-platform means by which they can generate true randomness or access hardware that would give them access to true randomness so most software languages most programming languages have some sort of pseudo-random number generator well what is a pseudo-random number generator a pseudo-random number generator is pretty much exactly what it sounds like it generates numbers that appear to be random but that actually are very deterministic in nature and that sounds like a pretty perfect match for software which as i said is in and of itself pretty much deterministic the output of a pseudo-random number generator is generally determined by an initial value oftentimes called a seed and that you you can generate that seed value itself somewhat randomly by you know oftentimes people will use the current time of day or the number of milliseconds since the machine started but but here's the thing no matter how randomly you may generate that seed value if you use the same seed value again you're going to get the same string of random numbers out of your pseudo-random number generator remember pseudo random it's an algorithm same input same output most of the time statisticians and people who care about this kind of stuff will not talk about what makes a pseudo-random number generator good they'll talk about the kinds of problems that makes a pseudo net random number generator bad and one of the biggest ones is something called the period of the pseudo random number generator that's the amount of numbers that the pseudo random number generator algorithm can generate before it begins generating the same sequence over and over again some pseudo-random number generation algorithms have a period that's dependent on the seed and that can be really really bad because if you pick the wrong seed value either accidentally or if you happen to choose one of them because that's the number of milliseconds since the machine started off you can end up with some really bad random numbers being generated bad from the perspective of being statistically bad now there's a ton of other tests that get done when they're trying to look at the statistic statistical soundness of a pseudo-random number generator algorithm uh you want your ou uh the the numbers that are generated to be uh distributed uniformly that means that you want this you want big numbers and small numbers all mixed up if you're if you're just if you're creating numbers within a range you don't want uh successive values to be correlated in any in any way so meaning you don't want like if it generates the number seven you don't want it to always after that generate the number 12 or you don't want it to always generate a small number then a big number then a small number than a big number all of these things play into what makes a pseudo-random number generator good and there's a ton of other statistical tests this is not something that i want to get into i hated statistics in college i'm sure you did too if you're a statistician i apologize but hey we're gonna move on so let's talk about the current state of pseudorandom number generators first of all there are a ton of different algorithms that can be used as a pseudo-random number generator uh up until 1997 most of the pseudo-random number generators uh that were in in programming language languages were of a type known as a linear congruential generator those were horribly bad from a statistical perspective and in fact they were so bad that they really couldn't be used uh when you were doing some kind of heavy duty research that required uh random numbers uh to be used in in your research in 1997 makoto matsumoto and takuji nishimura developed a new class of pseudorandom number generation algorithm called the mercen twister we're going to talk a little bit about where it got that name from in a bit but understand that this new type of pseudo-random number generator first of all is now the most widely used implementation of pseudorandom number generators in software and in fact there's a specific version of that algorithm called mt19937 that is absolutely the most widely used version of that mersenne twister algorithm what's nice about the mercen twister is that it avoids all of those statistical pitfalls that keep statisticians up at night it has a first of all it has a huge period the name mercen twister the mercen portion of that comes from the fact that the um period of the pseudorandom number generator can actually be set and it's set based on a class of prime numbers known as mercen primes so for example the period for the mt1993c mercen twister is based on the prime number that is 2 to the 19937 minus 1. now let's just take a look here is that number it's a huge huge period just for comparison that's a number that is about four times 0.3 times 10 to the 5921st power times the number of atoms in the known observable universe that is a big big number and it gives the mercen twister an enormous period that's the amount of numbers that it can generate before it will begin generating the same sequence again so in addition to having a huge period the mercen twister also avoids all of those other or at least most all of those other statistical pitfalls it passes almost every test for statistical randomness and what's really really nice about the mercen twister is it's actually relatively straightforward to be implemented in software and it works relatively quickly probably the most important point though for those in the open source software community is that the mercen twister mt19937 is not encumbered by any kind of patent and because of that it's been adopted in dozens and dozens of different open and closed source projects some of the ones you might be familiar with are things like the language python the random number generation within the language python is based on the mt19937 mercen twister pseudo random number generator as is the pseudorandom number generator in pascal php ruby sage math excel dozens and dozens of other software projects use this pseudo-random number generator the mt19937 pseudo-random number generator as i said is considered to be statistically sound when it comes to the generation of pseudorandom numbers but don't mistake statistically sound for secure and let's talk about why that is what it means to be secure is very different from what it means to be statistically sound in the generation of random numbers what if i told you that simply by having access to a few hundred recent values from a pseudo-random number generator in a language like python you could then predict all of the upcoming values with 100 percent accuracy that sounds crazy but let's talk about how it happens the 32-bit version of the mt19937 pseudorandom number generator keeps track of its current state with an array of 624 32-bit integers that's how it keeps track of where it is and and what it's going to be outputting it goes through that list of those 624 32-bit integers pulls them out one by one and then hands them back to the user as the pseudo-random number after it's been pushed through what's called a tempering function and this tempering function is designed to make the output more statistically well distributed and this is part of the algorithm that was developed back in 1997 the really big important thing here is this everything that that tempering function does is reversible and we're going to talk in a minute about why that's very very important so once it's gone through all of those 624 numbers it's pushed them all out to the user through that tempering function it then uses a different function called the twister function which is actually a linear feedback shift register function and it goes back to then pulling numbers off of that array again pushing them out through that tempering function out to the user now i said that it was important that that tempering function was reversible and here's why if the tempering function is reversible we have the potential to recreate that state array within the mt19937 pseudorandom number generator simply by taking the last 624 random values that we're given running them through a function that reverses the effect of that tempering function and then stuffing those untempered values back into an array of our own we can then use that array in our own mt19937 pseudorandom number generator and start generating random numbers using that and what we'll find is we're generating the same random numbers as the original random number generator we have in effect cloned the random number generator now i know what you're thinking we don't have to hit this at exactly the right time right before that twister function is being used the reason being is because that's a linear congruential shift register kind of function it actually works cyclically on this uh array and we don't have to hit it at right the right time just because of that you're gonna have to trust me because that that's the case because the the mathematics involved require a whole lot more time than i have time to explain here just so you can see how all of this works i've made some python code available to you it actually uses a created class for an mt19937 pseudorandom number generator to clone python's built-in random number generator and displays the content side by side on the screen for you i've given you the github repository for it it's on my github repository you can go and grab the code you can import it as a module and play around with this kind of stuff yourself or if you just run the code itself it'll actually do a demonstration showing you it cloning the uh the output of python's pseudo-random number generator the 32-bit version if you want to use it on your own and and try to clone a random number generator in python or in other languages understand this what you're going to need to figure out is how does that language or that application create other types of values so yeah if it's if it's just giving you the 32-bit integers that's pretty easy to do but what if they're talking about floats how does the application use the output the 32-bit integer output from the mt-19937 pseudorandom number generator how does it use that to create those floats or how does it use it to create numbers that fall within a specific range or how does it use it to create 64-bit integers i will tell you this it takes a little bit of looking and a little bit of examination but you can usually figure out how it's how the the application is using those 32-bit integers to create those other types of numbers remember most programs and especially most programming languages are very concerned with speed and so they're going to use potentially the simplest method possible to take for example 32-bit integers and make a 64-bit integer or take a 32-bit integer and turn it into a floating-point number so given that information giving given that kind of background method of thinking they're going to be doing something as quickly as possible you can usually figure out what it is they're doing so just to wrap things all up remember randomness in computing is a whole lot harder than you'd think remember statistically sound randomness isn't necessarily secure randomness the most widely used pseudorandom number generator out there today mt19937 is actually while it's statistically sound it's horribly insecure it's used in dozens of applications and programming languages python ruby r php excel etc and it's something that is just begging to be exploited so if you have any questions you can email me at t-listing counterhack.com once again up on the screen you'll see the location on github where you can go and download the python code that demonstrates this uh also finally one one final thing is my my twitter handle is at t listin so um thank you very very much for attending the kringle con talk today and i hope you have a wonderful wonderful holiday season thank you
Info
Channel: KringleCon
Views: 3,638
Rating: 4.8032789 out of 5
Keywords: Holiday Hack Challenge, KringleCon, SANS, InfoSec, CTFs, CyberSecurity, Cyber Security
Id: Jo5Nlbqd-Vg
Channel Id: undefined
Length: 17min 57sec (1077 seconds)
Published: Wed Dec 09 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.