How can a
deterministic computer produce genuinely
random numbers? And does it even need to? [MUSIC PLAYING] Computers need to have
access to random numbers. They're used to
encrypt information, deal cards in your game
of virtual Solitaire, simulate unknown variables
like in weather prediction and airplane scheduling,
and so much more. But how are those random
numbers generated? Computers are deterministic, meaning their actions are determined by a prescribed set of instructions. How can they possibly produce a random number? Well, as Donald Knuth put it in The Art of Computer Programming, "In a sense, there is no such thing as a random number. For example, is 2 a random number? Rather, we speak of a sequence of independent random numbers with a specified distribution."

To simulate the roll of a die, you'll want to create a sequence of random numbers x_0, x_1, x_2, x_3, and so on, each of which has a 1/6 chance of being each number between 1 and 6. That's the specified distribution. But how can a deterministic
computer do that? Now, computers can generate sequences of truly random numbers, but they rely on some type of external input from a process which we assume to be random. They inherit the randomness of the world. Random number generators typically measure some random physical phenomenon using an extra piece of hardware, like a Geiger counter, which tabulates background radiation over a short period of time. The sequence will be random because radiation is.
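Most machines today don't need a Geiger counter for this: the operating system itself harvests entropy from unpredictable physical events (interrupt timings, device noise) and exposes it to programs. As a small illustration, here's a sketch in Python (the video names no particular language; Python is used for all examples here) that pulls genuinely random values from the OS entropy source via the standard secrets module:

```python
import secrets

# secrets reads from the operating system's entropy pool, which is
# seeded by unpredictable physical events: the software analogue of
# tabulating background radiation with a Geiger counter.
true_random_bytes = secrets.token_bytes(8)  # 8 bytes of OS-supplied randomness
die_roll = secrets.randbelow(6) + 1         # a genuinely random die roll, 1 to 6

print(true_random_bytes.hex())
print(die_roll)
```

Unlike the pseudo-random generators coming up, running this twice gives different, unreproducible output.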
But these types of number generators have major disadvantages. For one, they require extra hardware. Also, they're not very fast. It takes a while to gather all that external data. And the random number generation is not reproducible. If you're testing a program, you'll likely want to repeatedly use the same set of random inputs to distinguish the features of the randomness from the features of your program, which isn't easy when the randomness derives from physical phenomena. Fortunately, most
of the time, we don't need to produce actually random numbers. They just need to seem random. This is where pseudo-random numbers enter the picture. Sequences of pseudo-random numbers look like sequences of genuinely random numbers and have many of the statistical properties associated with random numbers, but they are actually produced using a deterministic process; that is, they follow a set rule, a pattern.

For example, here's John von Neumann's middle-square algorithm. You start with a number called the seed. Let's say it's a four-digit number like 4,321. As the algorithm's name suggests, you square it and extract the middle four digits. That's the next term in the sequence. Then you just repeat that process: for all n, define x_(n+1) as the middle four digits of x_n squared. If the square has fewer than eight digits, pad it with leading zeros until it's eight digits long and extract the middle four. Starting with a four-digit seed, the middle-square algorithm generates a sequence of random-looking numbers between 0 and 9,999.
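Here's one way to sketch that procedure in Python (the video gives no implementation, so the string-padding approach below is just one natural choice):

```python
def middle_square(seed, count):
    """Generate `count` terms of von Neumann's middle-square sequence.

    Each term is squared, the square is zero-padded on the left to
    eight digits, and the middle four digits become the next term.
    """
    terms = []
    x = seed
    for _ in range(count):
        square = str(x * x).zfill(8)  # pad with leading zeros to 8 digits
        x = int(square[2:6])          # extract the middle four digits
        terms.append(x)
    return terms

# 4,321 squared is 18,671,041, whose middle four digits are 6710.
print(middle_square(4321, 5))  # [6710, 241, 580, 3364, 3164]
```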
This is definitely not the only way to create a random-looking sequence of numbers. The current gold standard for pseudo-random number generators, the Mersenne Twister, is based on the much simpler linear congruential generator. So we'll start there. The linear congruential
generator has four inputs: the modulus m, the multiplier a, the increment c, and the seed, the sequence's starting value. We'll run with the example m = 7,829, a = 378, c = 2,310, and the seed 4,321 again. Multiply the seed by a, add c to that, and then take the whole thing mod m. In other words, you look at the remainder when it's divided by m. In our example, that gives 7,216. Repeat this over and over to get the rest of the sequence: x_(n+1) = (a · x_n + c) mod m. The numbers in the resulting sequence will be between 0 and 7,828, which is m minus 1.
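In code, the update rule is a single line. Here's a Python sketch with the example parameters; the first term it produces is the 7,216 computed above:

```python
def lcg(m, a, c, seed, count):
    """Generate `count` terms of the linear congruential sequence
    x_(n+1) = (a * x_n + c) mod m, starting from `seed`."""
    terms = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m  # the remainder when a*x + c is divided by m
        terms.append(x)
    return terms

# m = 7,829, a = 378, c = 2,310, seed = 4,321, as in the example.
sequence = lcg(7829, 378, 2310, 4321, 6)
print(sequence)  # starts with 7216; every term lies between 0 and 7828
```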
With all these options for creating random-ish sequences, how do you decide whether to use a genuine random number generator or one of the two pseudo-random number generators, the middle-square method or the linear congruential algorithm? Remember that one of the downfalls of generating truly random numbers by measuring some physical phenomenon, for example using a Geiger counter, is that you cannot reproduce the sequence, which can make it difficult to test a program. But both of the pseudo-random number generators start with a seed, a single number which determines the entire sequence. If you want to test a program using the identical sequence of random numbers, you don't need to store the entire sequence. You just need to know the seed.
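To see that benefit concretely, here's a quick Python illustration; fittingly, Python's built-in random module is itself a Mersenne Twister, and seeding it twice with the same value replays the identical sequence:

```python
import random

random.seed(4321)
first_run = [random.randint(1, 6) for _ in range(10)]   # ten die rolls

random.seed(4321)                                       # same seed again
second_run = [random.randint(1, 6) for _ in range(10)]

# The seed determines the entire sequence, so the runs agree exactly.
print(first_run == second_run)  # True
```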
That's a huge benefit that pseudo-random number generators have over random number generators. The sequence is easily replicable. They also don't require any special devices, and they're computationally efficient. The obvious disadvantage is that
pseudo-random numbers are not actually random, but they can
appear more or less random. Here's our sequence from the
middle square algorithm where the numbers range
between 0 and 9,999. And here is our sequence
from the linear congruential generator where the numbers
range between 0 and 7,828. Which one looks
more random to you? But first, what does it mean
for a sequence to be random? Well, there's no
one way to determine if a sequence is random. Instead, there are many
different statistical tests for randomness. For example, you could compute
the running average, or mean, of the numbers. Or you could plot a
histogram of them. For example,
something that would make a sequence
seem less random is if the same pattern of
numbers kept showing up. Well, all sequences of
pseudo-random numbers eventually repeat. In other words, the
sequence cycles. The number of terms in the
sequence before it repeats is called the period. If, for example, you're
generating numbers between 0 and 9,999,
it's impossible to go through more than 10,000
terms in your sequence without repeating a number. And as soon as it
repeats a number, it has to start cycling. That's because pseudo-random
number generators follow a rule,
essentially a formula, by which each number
in the sequence is determined from
the previous number. The sequence can
repeat very quickly. For example, if a sequence produced using von Neumann's middle-square algorithm has the term 2,916, then the next number is 5,030, then 3,009, then 540, and back to 2,916, where it just keeps repeating. This cycle has period four. Or if the term 0 ever shows up in the sequence, then it becomes 0, 0, 0, 0, and so on. It cycles with period 1.
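One way to check these claims in code: iterate the middle-square rule until a value repeats, and measure the distance back to that value's first appearance. (The video doesn't present this procedure; the helper below is just one simple way to measure a period.)

```python
def middle_square_next(x):
    """One step of the middle-square rule on four-digit numbers."""
    return int(str(x * x).zfill(8)[2:6])

def period(seed, step):
    """Iterate `step` until a value repeats, then return the cycle length."""
    seen = {}  # value -> index at which it first appeared
    x, i = seed, 0
    while x not in seen:
        seen[x] = i
        x = step(x)
        i += 1
    return i - seen[x]  # distance back to the first occurrence

print(period(2916, middle_square_next))  # 4: the 2,916 -> 5,030 -> 3,009 -> 540 cycle
print(period(0, middle_square_next))     # 1: once 0 appears, it repeats forever
```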
The pseudo-random sequence of numbers between 0 and 9,999 generated by the middle-square algorithm will always cycle with a period smaller than 4,096. And as we've seen, the period can be much smaller. A sequence generated by the linear congruential generator will also repeat. But if we select good values for m the modulus, a the multiplier, and c the increment, then the period will be m, the largest possible. So we want to pick a big value for m, like 2 to the 32nd kind of big. In general, the linear
congruential generator will run for much longer before repeating than the middle-square algorithm, which is one feature that makes it the preferred pseudo-random number generator of the two. And finally, I want to
mention the distribution of these sequences. So far, we've looked at methods which produce random integers, or simulate them with pseudo-random integers, between 0 and some fixed value, like 9,999. In other words, the distribution is uniform on the integers between 0 and 9,999. But it's much more useful to have a sequence of numbers with the uniform distribution on the interval 0 to 1, which means that for any given term in the sequence, all the numbers between 0 and 1 are equally likely. Actually, since computers only have a certain amount of decimal precision, let's say for a simplified example that only the first four decimal digits matter. The uniform distribution on 0 to 1 then just means the uniform distribution on fractions of the form x over 10,000, where x is between 0 and 10,000. But that's easy to obtain. Just divide the sequence of integers by 10,000.
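That rescaling is a one-line change on top of any of the integer generators. Here it's shown on the middle-square sequence (the pairing is mine; the same division works for the linear congruential output, with m in place of 10,000):

```python
def middle_square_uniform(seed, count):
    """Middle-square integers in 0..9,999, rescaled to fractions in [0, 1)."""
    fractions = []
    x = seed
    for _ in range(count):
        x = int(str(x * x).zfill(8)[2:6])  # next four-digit term
        fractions.append(x / 10000)        # e.g. 6,710 becomes 0.671
    return fractions

print(middle_square_uniform(4321, 3))  # [0.671, 0.0241, 0.058]
```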
This is useful because with a sequence of numbers with the uniform distribution on 0 to 1, we can generate sequences of random numbers according to other distributions using something known as inverse transform sampling. Let's say we want to generate
a sequence of random numbers with the normal
distribution with mean 162 centimeters and standard
deviation 5 centimeters to represent the height
of an American woman. You may have seen this picture. It's the probability
density function. Roughly, it tells you
how likely a randomly selected American woman
is to be that height. The function is
highest around 162, since that's the most likely height, and really tiny at 181, since it's unlikely for
someone to be that tall. Just for comparison, this is
the probability density function of the uniform
distribution on 0, 1. It's flat, since all
points are equally likely. But now, let's talk about
the cumulative distribution function. It tells you how likely you are
to randomly select something below that value. For the uniform distribution,
it's pretty easy. For a randomly selected point, there's a 0% chance it's smaller than 0, a 50% chance it's smaller than 1/2, and a 100% chance it's smaller than 1. This is the cumulative
distribution function for the heights
of American women. The value of the
function at a point x tells you the probability
that a randomly selected woman will be shorter than x. Now, let's say I want
a sequence of numbers which represents the height
of a bunch of American women. In other words, we want
to generate numbers according to this distribution. Fortunately, thanks to
inverse transform sampling, we can do this using our
uniform 0 to 1 sequence. Here's how. The first number in our sequence is 0.0671. So we locate the point on the cumulative distribution for heights whose y-value is 0.0671. In other words, we find the height such that 6.71% of American women are shorter than that. The next number in our sequence is 0.9. So we use the cumulative distribution to locate the height such that 90% of American women are shorter than that, and so on, creating a new sequence.
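Python's standard library happens to ship exactly this operation: statistics.NormalDist.inv_cdf reads the cumulative distribution sideways, turning a uniform 0-to-1 value into a height. A sketch using the video's numbers (mean 162 cm, standard deviation 5 cm):

```python
from statistics import NormalDist

# The distribution of heights from the example: normal, mean 162 cm, sd 5 cm.
heights_dist = NormalDist(mu=162, sigma=5)

# Uniform [0, 1] inputs, starting with the 0.0671 and 0.9 from the transcript.
uniform_samples = [0.0671, 0.9, 0.5]

# inv_cdf(p) returns the height x such that a fraction p of women are
# shorter than x, which is exactly the inverse transform sampling step.
heights = [heights_dist.inv_cdf(p) for p in uniform_samples]

print([round(h, 1) for h in heights])  # the 0.5 input maps to exactly 162.0
```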
Let's check that this makes sense intuitively. Most American women are around this height. That's why the cumulative distribution function grows so quickly there. The steep slope corresponds
to a high probability and shallow slope to
a low probability. We uniformly randomly
select a point on the y-axis and then determine the
point on the x-axis that gives that function value. And more of those
points correspond to places of steep slope
of high probability. Fewer of them correspond to places of shallow slope, of low probability. Somewhat amazingly, this method allows us to transform the
sequence of uniform 0, 1 random or pseudo-random
numbers into a sequence with a different distribution. Even with a deterministic
computer, the truly random and pseudo-random techniques
we explored give us numbers with the uniform
0 to 1 distribution, which, in turn, gives us sequences with other distributions, like
the heights of American women or wind speeds or dice rolls. See you next time
on Infinite Series. Hello. You all had lots of great
responses to our second video on cops and robbers. Humberto Vanegas asked: in the case of the infinitely branching tree, would there be one branch with infinitely many nodes? And in this case, would the robber be chased infinitely long on this branch? That's a great question. So in the case of
the tree that we were talking about in that
episode, none of the branches are infinitely long. They have length 1, length
2, length 3, and so on. So none of them are infinitely long, but they get arbitrarily long. It's kind of a weird distinction. There are infinitely
many branches though. So the tree we
were talking about is a cop-win tree because on any finite branch, the robber will eventually get caught. But if there were an infinitely long branch, then the tree would become robber-win because the robber could just keep going along that infinitely long branch. So that's a great question. And we asked you
guys two questions at the end, the
challenge questions. So we asked for the
cop number and lazy cop number of two graphs. And there were a lot
of great responses, a lot of great proofs. A lot of them were really
different too, which was cool. But our winner is
Leonard Romano. And the answer is that the
graph that looks like a star, which is called the Petersen graph, has cop number and lazy cop number three. They're the same in that case. But the graph that
looks like a three by three grid has cop number two, but lazy cop number three. And so in your
proofs, many of you pointed out that the cop
number has to be less than or equal to the lazy cop number. That was a great insight. All right. See you next time.