The Randomness Problem: How Lava Lamps Protect the Internet

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

How does a hacker find a PRNG seed?

Even if they do, how do they get access to the time at which it iterates?

👍︎︎ 6 👤︎︎ u/[deleted] 📅︎︎ Oct 04 2018 🗫︎ replies

Not sure whether it fits the scope of /r/math perfectly, but I think it is a very interesting video for the average math enthusiast.

👍︎︎ 2 👤︎︎ u/ImJustPassinBy 📅︎︎ Oct 04 2018 🗫︎ replies

Half of my paper at school... was about this

👍︎︎ 2 👤︎︎ u/AroN64 📅︎︎ Oct 04 2018 🗫︎ replies

Captions

SciShow is supported by Brilliant.org. Go to Brilliant.org/SciShow to get 20% off of an annual Premium subscription. [♩INTRO] You jog onto the field for another NFL football game or at least your team’s video game avatars do, which, I mean, let’s face it, is about as close as you’ll get. The virtual ref tosses his virtual coin to see who gets dibs on the ball, and…you lose! Once again, you’re stuck letting the computer receive first. It happens so often it seems unfair. Come to think of it, what if it is? After all, the computer’s following a fixed program. How could it ever give you a fair, random coin toss? The whole point of computers is that they’re predictable! And if those coin tosses weren’t random, how would you ever know? These questions matter for more than just games. Getting computers to behave randomly is crucial for simulations, scientific studies, and cybersecurity. But for all its importance, randomness is surprisingly hard to define, and even harder to produce. Computers just aren’t that good at being random. In fact, most computerized randomness is basically faked; it’s totally predictable. For real, honest-to-goodness randomness, engineers turn to some pretty odd tricks. It turns out a lot of the internet is kept secure by lava lamps, Geiger counters, and the occasional rooftop microphone! There are lots of uses for digital randomness. Think of a pollster who wants to call a representative sample of people, or medical researchers deciding who in their study gets which treatment. For the statistics to work, they have to choose the phone numbers or treatment groups randomly. Another example is predicting the behavior of very complex systems, like weather forecasts or traffic patterns. These systems may not have simple, neat equations that tell you how they’ll develop. Instead, simulations often model the world as a collection of small randomized components. So a traffic simulator might assume that cars randomly show up at a highway entrance at an average rate of 10 per minute, then watch how the simulation proceeds. Randomness is also a critical part of basic network technology. If your laptop sends a packet of data over the wifi network at the same moment as your roommate’s, the packets collide they garble each other, and each computer has to resend. But it wouldn’t help anything if each computer, following the same deterministic program, tried to retransmit at exactly the same time. You’d just get collisions over and over! Instead, they wait a random amount of time after a collision. This type of symmetry breaking is impossible without some form of randomness. But the most common use for randomness is both high-stakes and very necessary: encryption, which requires secret data that an adversary can’t predict. For example, every time you connect to a secure website like an online banking portal, your computer and the server running the site have to agree on encryption keys. The keys are like a more sophisticated version of a secret decoder ring: each computer can scramble its messages so that only someone with the same key can decode it. Obviously, no one should be able to guess the secret key, or they’ll be able to see your bank account information or even steal your money. So your computer and the server base the keys on random numbers to make them nearly impossible for your online nemesis to guess. Randomness is so valuable that when the RAND Corporation published a 600-page book with a million random digits in 1955, it was considered a landmark contribution to science. Now that computers have gotten faster, the RAND book is mostly useful as a source of hilarious Amazon reviews. But getting computers to generate numbers that are actually random is still much harder than you might think. The first problem to solve is how to even define randomness, because our intuitions about what’s random aren’t always great. Take this demonstration from paleontologist and author Stephen Jay Gould. In which image would you guess the dots were placed totally at random? If you guessed the one on the left, you’re not alone. Most people see the clumps in the right image as patterns. But in fact, it’s just the opposite! It’s the even spacing on the left that’s the result of a pattern: the program that generated the dots was modified to forbid them from being too close together. Here’s another brain twist: tossing a coin twenty times is just as likely to produce ten heads and then ten tails as it is to mix them up. Neither sequence is inherently less random. To define randomness properly, the trick is to think not just about particular string of numbers, but about an infinite sequence of them. If a so-called random coin flipper kept producing clumps of heads and clumps of tails forever, then we could definitively say it’s not random. Mathematicians have used the concept of infinite sequences to come up with a few different definitions for randomness, but the simplest one describes randomness as unpredictability. If you were betting on future digits in the sequence, you couldn’t find an algorithm a mathematical procedure for a computer to follow whose predictions would make money in the long run. The other definitions all turn out to be equivalent to this: if a sequence is random, you can’t describe it using some sort of algorithm. Maybe you can start to see the problem here: if by definition, you can’t use a program to describe a random sequence, how are you supposed to write a program to generate one? Not only that, but if you try to build a random number generator, how can you tell if you’ve succeeded? You can’t check a whole infinite sequence for randomness, or check all possible betting algorithms to see if any make money. When it comes to generating random numbers, most of the time, computers make do with pseudo-random number generators, or PRNGs. The numbers they generate aren’t truly random … but they’re close. The algorithms take a seed, some value to start with, generate a number from it, update the seed, and repeat. For instance, mathematician John von Neumann suggested the middle square method: you take the seed, square it, and grab the middle few digits as the next random number. That number also serves as the next seed. Unfortunately, the middle square method tends to start looping through the same cycle of numbers. But computer scientists have designed many more sophisticated algorithms that do better, some with awesome names like “Fortuna” and “the Mersenne Twister.” To check if a pseudo-random number generator is a good approximation of randomness, you can use a combination of theoretical analysis and empirical testing. Theoretical analysis looks not at a particular sequence of numbers, but at the process that produced them. Even if the process wasn’t truly random, mathematicians can sometimes prove statements like “This algorithm will generate at least 14 gazillion digits before the sequence wraps around and starts repeating.” For empirical testing, you just generate some random numbers and see if the statistics look right, things like whether evens and odds are equally common. The stats will always be a bit off, so you can never say for sure that the sequence was randomly generated. But you can at least estimate how likely the stats you’re seeing should be. Under most statistical tests, good pseudo-random numbers are indistinguishable from true randomness… if you don’t know the initial seed. That’s good enough for most purposes. I mean, who cares if your virtual football game isn’t 100% unpredictable? It’s not a big deal for things like medical research or predicting traffic patterns, either, as long as the numbers produce the same statistics as randomness. In fact, sometimes pseudo-randomness has real advantages. A randomized program is easier to troubleshoot if you can make it do all the same things again by providing the same seed. And in some uses of PRNGs, determinism is a core design feature. For example, a remote car key fob works by running a PRNG with a seed known only to it and your car. When you click the unlock button, the fob radios its next random number to the car, which checks that number against the next outputs of the PRNG. To a thief, the numbers look unpredictable. But the system only works because the key and the car produce the same series of numbers from their shared seed. Sometimes, though, PRNGs aren’t enough. Even if the algorithm is flawless, an attacker who somehow figures out the seed can predict all the numbers. Ironically, the website Hacker News once became a poster child for this problem. Back in 2009, a security researcher realized that the site’s random number generator, which assigned IDs to logged-in users, was seeded with the time when the server started up. By triggering a server restart, the attacker was able to narrow down the possible seeds to a small set of numbers. That let him predict other users’ IDs and impersonate them. A similar problem a decade earlier allowed another team of researchers to cheat at online poker. A leaked seed could be especially catastrophic for a PRNG that was used for encryption: if an attacker uncovered the seed, they could guess the encryption keys, then go back and decrypt all the previous messages. So when the stakes are high, people get creative looking for sources of true randomness. The most common place to look is external noise factors that someone could in principle influence, but which the software can’t predict. For example, the website Random.org has a rooftop microphone that captures atmospheric noise, unpredictable radio waves coming mostly from lightning and space. Most computers use more readily available sources of noise, like the timing between mouse clicks or incoming network messages. Some software will even ask you to scoot your mouse around while it’s generating big encryption keys. For the highest-stakes uses, though, you want intrinsic randomness, something that’s guaranteed unpredictable by the laws of physics no matter what the attacker knows or does. The Internet giant Cloudflare, which provides encryption for about 10% of all Internet traffic, points a video camera at a wall of lava lamps in the lobby of its San Francisco office. Meanwhile, in its London office, a camera watches three pendulums hanging from pendulums. Both are examples of chaotic systems: the tiniest inaccuracy in your knowledge of their configuration will send your future predictions wildly off-base. So those camera streams are effectively random. Intrinsic randomness can also come from smaller-scale physics. Some hardware random number generators record tiny fluctuations in the movements of electrons in the circuitry. And Geiger counters trained on decaying hunks of uranium will click at a rate determined by quantum mechanics, which is inherently probabilistic. So, by the strict definition, it’s more likely than not that your football game isn’t truly random. It’s probably flipping a coin based on a simple PRNG seeded with the time you started playing. Even so, losing a bunch of coin flips is probably just a brief unlucky streak; you’d be hard-pressed to show any bias with larger statistical tests. For high-stakes contexts like security, though, rock-solid randomness is essential, and there’s a whole cottage industry for supplying it. Lava lamps may not be the height of cool anymore, but as it turns out, they are still useful for security. Or maybe programmers just really miss the 70s. If you miss 1970s bedroom decor, or even if you don’t, you can still strengthen your programming skills at Brilliant.org/SciShow with the Computer Science Fundamentals course. Even if you know nothing about computer science, the lessons and quizzes build on each other. So it’s impossible to finish a lesson without improving. And if you are a confident computer programmer, the quizzes might still bend your brain a little bit, and you can always work on problems posed by other Brilliant.org premium members. Brilliant is offering the first 200 SciShow viewers to go to Brilliant.org/SciShow a 20% discount off of an annual premium subscription. So check it out! You’ll definitely learn and have fun, and you’ll be helping to support SciShow while you do! So, thanks! [♩OUTRO]

Info

Channel: SciShow

Views: 526,494

Rating: 4.9152589 out of 5

Keywords: SciShow, science, Hank, Green, education, learn, The Randomness Problem: How Lava Lamps Protect the Internet, stefan chin, coin toss, randomness, lava lamp, geiger counter, cybersecurity, random number generator, pseudo random, simulation, packet collision, encryption, encryption keys, RAND, random, unpredictability, PRNG, seed, john von neumann, middle square method, statistics, hacker news, external noise, intrinsic randomness, random.org, chaotic systems, double pendulum, cloudflare

Id: 89EX1NF7eHQ

Channel Id: undefined

Length: 10min 45sec (645 seconds)

Published: Wed Oct 03 2018