[MUSIC PLAYING] DAVID MALAN: So odds are you know
someone who is or perhaps are yourself a computer person. But what does that mean? Well I propose that a
computer person is simply someone who's really good
at computational thinking. Thinking methodically, or more
formally, thinking algorithmically, whereby you take inputs to some
problem, you solve that problem and produce output, but you do so
step by step by step, along the way defining any of the
terms or ideas that you need to use so that anyone
else following along can understand exactly
what it is you're doing. Computer science itself is really
about computational thinking, then. It's not about programming,
with which it's often conflated, although programming
is a wonderfully useful tool with which you can solve problems,
but computer science itself is about solving problems themselves. And it can be in any number
of domains, whether it's software or hardware or artificial
intelligence or machine learning or data science or beyond. That's the wonderful thing
about computer science-- it's just so wonderfully
applicable to other fields. It's all about equipping
you with a mental model, so to speak, a set of
concepts, as well as some practical skills via which
you can bring to bear solutions to problems in other domains entirely. Now what, though, does it
mean to solve a problem? I propose that we think about
problem-solving quite simply as this. If you've got some problem to
be solved, you have an input. And the goal is to solve that problem
and come up with some solution, which we'll call simply our output. And in between those inputs
and outputs hopefully are the step-by-step instructions
via which we can solve that problem. Now how we're going to solve that
problem we'll have to come back to. For now we'll consider it to
be the proverbial black box. So we don't quite know or need to
know right now just how it works, but that it can work, and
we'll come back to this soon. But for now, we do need a way
to represent these inputs, and we do need a way to
represent our outputs. Right now I happen to
be speaking English and you happen to be hearing
English, and so we've sort of tacitly agreed that we'll represent our
words using this particular language. Now computers tend not to use
English, they have their own system, if you will, with what you might
be familiar even if you don't quite speak it yourself, and indeed,
there are different ways you can represent information. For instance, let's consider the
simplest of problems-- for instance, counting the number of people in a room. I could do this old-school. I don't need computers or English, I
can simply use my physical hand and say, I'm starting with zero
people and now I'll count one, and then two, and then
three, and then four, and then five, and then-- I'm out of fingers, but I can at
least employ my other hand, maybe a couple of feet and get
as high as 10, maybe even 20 using this physical system. This is pretty equivalent,
actually, to just keeping score the old-school
way with hash marks. Right now we've not counted
anything, but as soon as I start drawing some lines,
I might represent the number 1, or 2 people in the room, or 3, or 4. Or by convention I can draw a slash just
to make super clear that this is now representing 5, and then I
can do another five of those to count up to 10 and beyond. Now these systems of hashes, as well as this system of counting on my fingers, actually have a name: that of unary notation. Uno implying one, simply signifying
that I only have one digit-- pun not intended-- via
which I can count things. This takes the form of my fingers on my
hand or these hash marks on the screen. But it doesn't get me very
far, because of course on my one hand with five fingers,
I can only count up to five total. And even on the board it feels pretty
inefficient, because for every number higher I want to count,
just continue to accumulate. But suppose we were a little
more clever about this and we thought about this
problem from a different angle. Maybe I could take into account
not just the number of fingers that I've raised, but the
order in which I've raised them or the pattern via which I've
permuted them rather than just taking into account how many of them are up. If I take that approach, maybe I can
actually count even higher than five on just one hand before resorting
to another hand or feet. Now how could I do that? Well instead of counting
up 0 to 1 to 2 to 3 to 4 to 5, why don't I instead start with some patterns? So I'll still start with 0, I'll
still start with 1, but now for 2, I'm not just going to
raise the second finger, I'm instead going to
go from 0 to 1 to 2, putting down that first finger
or thumb, simply representing 2 with this one finger. And when I'm ready to
represent the number 3, I'll bring that finger back up. And so using just two fingers now have
I represented four possible values-- 0, 1, 2, and 3. From 0 to 3, of course,
is four total values, and so I've counted as high as 3 now
using not three fingers, but only two. How, though, can I count higher than 3? Well, I just need to raise
an additional finger. And so let's start at 0, to
1, to 2, to 3, to 4-- whoops-- to 5, to 6, and now to 7. And if it didn't hurt so much, I could
continue counting as high as 8 and 9 and 10 and beyond by simply permuting
my fingers in different ways. So how high can I now
count on this one hand? Using unary notation with fingers just
up or down, I can count as high as 5-- from 0 to 5. But with these same five fingers,
if I instead use not unary, but let's call it binary
notation where we're actually taking into account whether the fingers
are up or down and the pattern thereof, well now it would seem
that each of these fingers can be in one of two possible
states or configurations. They can either be up or they can be
down, or just one of them can be up or can be down, or just one
other can be up or down. So if there's two possible states or
configurations for each of these five fingers, that's like 2 times
2 times 2 times 2 times 2, or 2 to the power of 5 for five fingers,
which is actually 32 possible patterns. Which is to say with my one human hand, now can I count using not unary, but binary from 0 to 31, which is 32 possible values.
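To make that arithmetic concrete, here is a minimal Python sketch of the same idea, in which a list of 1's and 0's stands in for raised and lowered fingers (the encoding here is illustrative, not anything from the lecture itself):

    # Interpret five fingers (1 = raised, 0 = down) as one binary number.
    def fingers_to_number(fingers):
        total = 0
        for finger in fingers:        # leftmost finger is the most significant bit
            total = total * 2 + finger
        return total

    print(fingers_to_number([0, 0, 0, 1, 0]))  # 2
    print(fingers_to_number([1, 1, 1, 1, 1]))  # 31, as high as one hand can count

Now what's the goal at hand? It's quite simply to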
represent information. Because if we have
inputs and outputs, we need to decide a priori how we're
going to represent those values. Now it turns out in computers,
binary is wonderfully well-suited. Because after all, what's the one
physical input to any computer you have, whether it's your
laptop or desktop or these days, even a phone in your pocket? Well odds are, at the end of the day--
or even perhaps earlier in the day-- you need to physically plug that device
into the wall and actually recharge it. And somehow or other there's
electricity or electrons flowing from the wall that are
stored in the battery in your device if it has a battery, and that is the
input that drives your entire machine's operation. And so if that's the
only possible input, it's pretty straightforward to imagine
that you might have your computer plugged in or not,
power is flowing or not, you've got a charge in
your battery or you don't. And so this binary world where something
can be in two possible states-- on or off, plugged in or not-- is wonderfully conducive to now using
binary in the context of computers to represent information. After all, if I want
to represent a number, I might simply turn on the lights. And so my phone here has a light built
in, and right now that light is off, but if I consider this then
to be off, we'll call it a 0. And if I turn this flashlight now on,
I can be said to be representing a 1. And so with a simple light bulb
can I represent two possible values just like my finger can be up and down. But computers don't necessarily
use lots of little light bulbs, they use something else
that's called a transistor. A transistor is a tiny little
switch, and that switch, of course, can be either on or off--
letting electricity in or not. And so when you have a
computer, inside of which is a motherboard and CPU and
bunches of other pieces of hardware, among the underlying components ultimately are these things called transistors. And computers today have millions
of them inside, each of which can be on or off-- so far
more than my five fingers, and that's ultimately how a computer
can go about representing information. So long as it has access to some
physical supply of electricity can it use that electricity
to turn these switches on or off in distinct patterns,
thereby representing 0, 1, 2, 3, 4, all the way up to 31 or even
millions or billions or beyond. And so computers use binary. Computers speak binary-- only
0's and 1's or off and on, because it's so wonderfully
conducive to the physical world on which they're ultimately based. We humans, meanwhile,
tend to communicate not only in English and
other spoken languages, but in a numeric system called decimal. So decimal, dec meaning 10,
has 10 digits at its disposal-- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
whereas binary has just two-- 0 and 1-- and unary, you
might say, has just one-- 1, the hash mark that
I drew on the board. But if I only have 0's and
1's at my disposal, how can I possibly count to 2 or 3 or beyond? Well, we need some way of mapping
these 0's and 1's to the more familiar numbers that we ourselves know. So how can we go about doing this? It's one thing to describe it in
terms of patterns on your fingers, but again, a device just has
switches that it can turn on and off. Well it turns out even if
you don't yet speak binary or have never really
thought about doing so, it's not all that
dissimilar to the system of numbers with which we grew up. In fact, in grade school,
you and I probably grew up learning how to represent
numbers in rather the same way. For instance, consider this-- clearly the number 123. But why is that? It's not inherently the number 123. Really, this is just three
symbols on the screen. We of course immediately
recognize them as 1 and 2 and 3, and we infer from that, oh, this is the
number 123, but why is that exactly? Well if you're like me,
you probably grew up representing numbers
in the following way, even if it's been quite some time
since you thought about it like this. With the symbol here, 1, 2,
and 3, if you're like me, you probably grew up thinking
of this rightmost digit as being in the ones place,
and this middle digit as being in the tens place, and this leftmost digit as being in the hundreds place. And if it were a longer number, you'd have the thousands place and the ten-thousands place and beyond. Now how do we get from 1, 2, 3 to 123? Well, the arithmetic we were taught says
to do 100 times 1, and then 10 times 2, and then 1 times 3, and then
add all three of those products together, giving us, of course, 100
plus 20 plus 3, which of course is 123.
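In Python, one might retrace that grade-school arithmetic like so (just a sketch of the positional logic):

    # 123 is really the symbols 1, 2, 3 weighted by powers of 10.
    digits = [1, 2, 3]
    number = 0
    for digit in digits:
        number = number * 10 + digit   # shift everything one place, then add
    print(number)                      # 123, via 1*100 + 2*10 + 3*1

Now that actually doesn't feel like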
much progress, because we started at 123 and we got to 123, but
that's not quite that. We actually started with
the pattern of symbols-- 1, 2, 3, like a pattern of fingers, and
ultimately ascribed mathematical meaning
they speak binary and therefore only have 0's and 1's at their disposal-- no
2 or 3 or 4 or 5 or 6 or 7 or 8 or 9, they count in exactly the same way,
and they represent inputs and outputs in fundamentally the same way. Consider this. Suppose that I have 3 bits at my
disposal, 3 switches or light bulbs, if you will. And those light bulbs or
switches are all initially off. I claim that I'll represent
those states as 0, 0, 0. And together, those three 0's represent
the decimal number we ourselves know is 0. Now why is that? Well, here if we consider this
to be the ones place as before, but the middle digit we won't
consider being in the tens place, we're going to consider this
as being in the twos place, and this leftmost digit as
being in the fours place. Now why? Well in our decimal
system, there's a reason that we have the ones place, the
tens place, the hundreds place, and beyond-- those are actually all powers of 10. 10 to the 0, 10 to the 1,
10 to the 2, and beyond. Now in binary, bi meaning two, you only
have two digits at your disposal, 0 and 1, and so instead of using powers
of 10, we're going to use powers of 2. And indeed we have. 2 to the 0 is 1; 2 to the 1 is 2, 2
to the 2 is 4, and if we kept going, we'd get 8, 16, 32, 64, and beyond. And so this pattern of symbols
or this pattern of light bulbs or this pattern of switches, all
of which I'm proposing are off, simply represents now the
number we humans know as 0. Why? Well we have 4 times 0 plus
2 times 0 plus 1 times 0, which of course gives me 0 plus 0 plus
0, for a total numeric value of 0. So in binary, 0, 0, 0 or all
three switches off represents 0, and indeed, that's what we would expect. But suppose we instead did this. Suppose that, using binary, and
therefore these same three places-- ones place, twos place,
and fours place and beyond, we had a different pattern of symbols. How, for instance, might we represent 1? Well, if we had a pattern that's
0, 0, and 1 in binary, that's going to translate, of course, to the
decimal number we all know and agree is the number 1. How do I represent the number 2? Well, if I start from
scratch again, I'm going to represent the decimal number we know
as 2 as a pattern that's 0, 1, and 0. A switch that's off, another
that's on, another that's off. Why? 4 times 0 is 0, 2 times
1 is 2, 1 times 0 is 0, and so we're left with a total of 2. And if we want to
represent 3, I don't even need to change the value or state of those other switches; I can simply turn this
one from off to on, and now I'm representing the number 3. If I want to count up to 4, I can
simply start fresh in the fours place, in the twos place, in
the ones place here-- 1, 0, and 0. And then lastly, suppose that
I want to count up even higher. 7 can simply be 1, 1, and 1, because
that gives me a 4, a 2, and a 1. 4 plus 2 is 6, plus 1 is 7.
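The same Python sketch works here, just with powers of 2 instead of powers of 10:

    # The fours place, twos place, and ones place, all turned on.
    bits = [1, 1, 1]
    value = 0
    for bit in bits:
        value = value * 2 + bit
    print(value)              # 7
    print(int("111", 2))      # Python's built-in conversion agrees: 7

But what if I want to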
represent the number 8? To represent the number 8, it would
seem that I need a little more space. And so I might need to introduce
the eights place so that I can have a 1, a 0, a 0, and a 0, the eights place being 2 to the 3, but to do this, I do indeed need more hardware. I need another switch, another light
bulb, and another physical device inside of my computer.
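A couple of lines of Python show the tradeoff: every additional bit doubles how many patterns you can represent (a quick sketch):

    # n bits yield 2**n distinct patterns.
    for n in [3, 4, 8, 32]:
        print(n, "bits give", 2 ** n, "patterns")
    # 3 bits top out at the value 7, so representing 8 takes a fourth bit.

Now fortunately, our computers today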
have millions of these tiny transistors and switches, so it's not all that
problematic to count as high as 8, but the implication is that in order to represent larger and larger values, we do need more physical storage. And indeed, a theme in computer science is exactly that constraint. You can only do so much computationally, perhaps, if you have a finite amount of memory or a finite amount of hardware or switches. And we'll see, then, that there are non-trivial implications for how much data you can represent, and
how many problems, therefore, you can solve. So that's it for numbers. Using just 0's and 1's can we count
as high as we want and represent any number of values so long as we
have enough bits at our disposal. But numbers only get
us so far in computing. After all, it's not just
Excel and calculators with which we use computers. It's of course to send
information textually, and to use letters of the alphabet,
and compose text messages and emails and documents and more, but how,
then, do you represent letters if all you have at the end of
the day are these 0's and 1's underneath the hood, so to speak? Well to do that, we all
simply need to agree, just like we've agreed to
speak here in English for now, on a particular system by which to
represent letters of the alphabet. Now we don't have all that many
choices at hand because if we only have 0's and 1's-- switches that
can be turned on and off-- that doesn't give us
terribly many options. But what we could do as humans is just
collectively agree that you know what? We are going to use a certain pattern
of 0's and 1's to represent capital A. And we're going to use a different
pattern of 0's and 1's to represent capital B. A different
pattern of 0's and 1's to represent C and D and all the
way through Z. And you know what? If we care about
lowercase letters, we'll use slightly different patterns of 0's
and 1's to represent those as well. And this is exactly what
humans did decades ago. We standardized on
something called ASCII-- the American Standard Code
for Information Interchange, which was the result of people
agreeing that we are going to use-- you know what? The number 65 in decimal
to represent capital A, and the number 66 to
represent capital B, and the number 97 to
represent lowercase a, and 98 to represent lowercase b,
and everything before and beyond.
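In Python, that agreement is directly visible via the built-in ord and chr functions (a quick sketch):

    # The agreed-upon codes, via Python's built-in ord (character to number)
    # and chr (number to character).
    print(ord("A"), ord("B"), ord("a"), ord("b"))   # 65 66 97 98
    print(chr(65), chr(97))                         # A a

And so this system, this code, ASCII,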
is simply a collective agreement that whenever you're in the context
of a text-based program and not a numeric-based program,
any patterns of bits, such as those that might
represent 65 in a calculator, should instead be interpreted
in the context of Microsoft Word or an SMS message or iMessage as
actually representing a letter. So how bits are interpreted
is entirely context-dependent. Depending on the program you have
opened, the same pattern of bits might be interpreted in different ways-- as numbers or letters
or even something else. So how, then, do we represent
so many other symbols that aren't just A through Z? Well it turns out that
the American Standard Code for Information
Interchange was indeed fairly skewed toward American
representation of letters, particularly English. But there are so many more characters
with accents and other symbology, let alone punctuation marks,
and in foreign languages, too, are there hundreds if not
thousands of distinct characters that you need to represent
ideally in order to send that text and write that document. But ASCII alone couldn't handle it. Because ASCII tended to use 7 or
maybe in some contexts 8 bits total, and with 8 bits-- or binary digits, if you will-- can you represent how
many possible values-- 2 times 2 times 2 times 2
times 2 times 2 times 2 times 2 for 8 possible bits, each
of which can be a 0 or 1. That only gives you
256 possible letters. Now that's more than enough for 26
letters of the English alphabet, A through Z, both uppercase
and lowercase, but once you start adding in punctuation and
accents and other characters, you very quickly run out. And so the world
produced Unicode instead, which doesn't just use 8
bits or 1 byte, if you will-- 1 byte quite simply meaning 8 bits, but instead introduced a variable
length encoding of letters. And so some letters might
use 8 bits or 1 byte. Other letters might
use 2 bytes or 16 bits. Other letters might
use 3 bytes or even 4. And via 4 possible bytes maximally-- that's 32 bits, and so 2 to the 32 possible patterns-- can you actually represent roughly 4 billion possible values, which is a huge number of values.
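Python can demonstrate that variable-length idea directly via UTF-8, one of Unicode's standard encodings (a quick sketch):

    # Different characters take different numbers of bytes in UTF-8.
    for character in ["A", "é", "我", "😂"]:
        print(character, "takes", len(character.encode("utf-8")), "byte(s)")
    # A takes 1 byte, é takes 2, 我 takes 3, and the emoji takes 4.

But how does the system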
of encoding actually work? Let's consider by way of an example. In both ASCII and
Unicode, ASCII now being a subset of Unicode insofar as
it only uses 1 byte per letter, you might represent A with-- capital A with 65, capital
B with 66, and so forth. And dot-dot-dot implies we've handled
the rest as well and they all happen to be back-to-back-to-back. So suppose in the
context of a text message you happen to receive a pattern of
0's and 1's that if you actually did the math in the ones
place and the twos place and so forth, actually worked out
to be the decimal numbers 72, 73, and 33. Well what text message
have you just received? Of course at the end of the day,
computers only understand binary, and that information is
sent from phone to phone over the internet-- more
on that another time. But those 0's and 1's interpreted in
the context of a text messaging program will be interpreted not
as binary, not as decimal, but probably as ASCII or
more generally Unicode. And so if you've received a text message
that says, 72, 73, 33, well what might that spell? Well if we consider our
excerpt of a chart here, those 72 and 73, of course, translate to an H and an I, and you wouldn't know it from that
preceding chart, but 33 it turns out-- just represents a very
excited exclamation point, because that, too, has a
representation in ASCII and Unicode. Now underneath the hood are patterns
of 0's and 1's, but we don't need to think about things at that level. Indeed, what we've done
here by introducing ASCII, and in turn, Unicode,
is introduce what's called an abstraction, which is a
technique in computer science via which we can think about a problem more
usefully at a higher level as opposed to the lowest level in which
it's actually implemented. After all, it'd be much easier to
receive a message that you view as H-I-! than actually as a pattern
of 0's and 1's that you then have to think about and
translate in your head to the actual message that was intended. So an abstraction is just a way of
taking a fairly low level if not very complicated representation
and thinking about it in a higher, more useful
level that lends itself more readily to communication, or
more generally to problem-solving. Now what about all of those
other characters on the screen? After all, on a typical
board might you have letters and numbers and punctuation,
but also these accented characters, too. Well thanks to Unicode, you indeed have
as many as 4 billion possibilities, and it's no surprise, perhaps,
that proliferating on the internet these days are a little
friendly faces called emojis that you might have received
yourself in text messages, emails, or some other form. Now these emojis here, though
they might look like smiley faces, are actually characters that companies
like Google and Microsoft and Apple have simply decided to
represent pictorially. Underneath the hood, anytime
you receive one of these emojis, you're actually receiving
a pattern of bits-- 0's and 1's that in the
context of an email or text message or some other program,
are being interpreted as and therefore displayed
as a corresponding picture that Google and Microsoft
and Apple and others have decided how to present visually. And indeed, different platforms present
these same patterns of 0's and 1's differently depending on how
the artist at those companies have decided to draw these emoji. Consider this one, for instance-- face
with tears of joy is its formal name, but more technically, this is encoded
by a very specific decimal number. Capital A is a 65, capital B is 66,
what is this face with tears of joy? Well it is the number 128,514.
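The very same chr function reaches it, since emoji are simply characters with larger code points (a sketch):

    # Emoji are just characters with bigger numbers behind them.
    print(chr(128514))                          # face with tears of joy
    print(ord("\N{FACE WITH TEARS OF JOY}"))    # 128514

After all, if you've got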
four billion possibilities, surely we could use
something within that range to represent this face
and any number of others. But underneath the hood,
if you actually looked at a text message in which you
received a face with tears of joy, what you've actually received is this
pattern of 0's and 1's somehow encoded on your phone or your computer. And so in the context of a
calculator or Microsoft Excel might we interpret some
pattern of bits as numbers. And in the context of a
text messaging program or Google Docs or
Microsoft Word might we interpret that same pattern of bits
as instead characters or even emoji. But what if we want to represent
different types of media altogether? After all, the world of computing is not
composed of just numbers and letters. There's also colors and images and
videos and audio files and more, but at the end of the day, if
we only have at our disposal 0's and 1's, how do we go about
representing all of those richer media? Well as before, we humans just
need to agree collectively to represent those media using
0's and 1's in some standard way. Perhaps one of the most familiar
acronyms, even if you've never really thought about what it is that you
might see on a computer, is this one-- RGB. It stands for red, green, and blue. And this refers to the process
by which many computers represent colors on the screen. In fact, what is a color? Well, you might represent the colors
black and white simply by a single bit. 1 might represent black and 0 might
represent white, or vice versa. We in that case only have what's
called 1-bit color, whereby the bit is either on or off implying
black or white or white or black. But with RGB, you actually
use more bits than just one. Conventionally you would
use 8 bits to represent red, 8 bits to represent green,
and 8 bits to represent blue. So 24 bits total, the first 8 of which
or first byte is red, second is green, third is blue. Now what does that mean? Well in order to represent
a color, you simply decide, how much red do
you want to add to the mix? How much green and how much blue? Combining essentially
those wavelengths of light in order to see a disparate
color on the screen. And if you think about red, green, and
blue now as being single bytes each, that means you have a possible
range of values of 0 to 255. After all, if a single byte is 8
bits, and with 8 bits, each of which can be a 0 or 1-- so 2 times 2 times 2
times 2 times 2 times 2 times 2 times 2 gives you 256 values. So if you start from 0, you
can count as high as 255. Suppose that you had 72 as your amount
of red for a color, and 73 for green, and 33 for blue. That is to say in the
context of a text messaging program, this same pattern of bits-- 72, 73, 33 presented as decimal
represented a textual message of HI!. But suppose you were to see
that same pattern of bits instead in a photo-editing program like
Photoshop, or in the context of opening an image on the screen. Well that same pattern of bits,
or in turn, decimal numbers, just represents some amount of
red, some amount of green, and some amount of blue. So 72 is a decent
amount-- a medium amount of red out of 255 total values,
73 is a medium amount of green, and 33 is a little bit of blue. If you combine now all of these
three values in your mind's eye by overlapping them, what you
get is this shade of yellow.
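Here is that same pixel in Python, with the web-style hex notation added as a check (the variable names are illustrative):

    # One pixel's color as three bytes: red, green, and blue, each 0-255.
    red, green, blue = 72, 73, 33
    print((red, green, blue))                   # the same bytes that spelled HI!
    print(f"#{red:02x}{green:02x}{blue:02x}")   # #484921, a dark shade of yellow

Which is to say that if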
you wanted to represent this particular shade
of yellow would you use three bytes whose values
are respectively 72, 73, and 33. And in the context of Photoshop
or some other graphical program would that pattern be interpreted as and
displayed as this color on the screen. Now this color is deliberately
presented here as just 1 square-- 1 dot, if you will, or
more properly, 1 pixel. Because what is an image? Well if you've ever looked really
close on your computer screen or on your phone or even on
your TV, you might actually see all of the thousands
of dots that compose it. This is getting harder to do
on today's modern hardware, because if you have something
like a retina display, that means these dots or pixels
are ever so tiny and ever so close together that it's actually hard
now for the human eye to see them, but they are in fact there. Any image on the screen,
any photo on the screen is really just a pattern
of dots or pixels-- a grid of dots from left
to right, top to bottom. And every one of those dots
or pixels on the screen probably has 24 bits representing
its particular color. Indeed, a picture is essentially
just a pattern of color so small that you don't really
see all of those dots, but you see the net effect of a
beautiful photo or image or something else altogether.
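In code, then, an image is little more than a grid of those three-byte pixels (a minimal sketch with made-up dimensions):

    # A tiny 4x3 "photo": a grid of pixels, each three bytes of color.
    yellow = (72, 73, 33)
    image = [[yellow for _ in range(4)] for _ in range(3)]
    print(len(image) * len(image[0]) * 3, "bytes")   # 36 bytes for 12 pixels
    # A multi-megapixel photo thus needs millions of bytes: hence megabytes.

So consider how we might see these. When Apple or Google or Microsoft or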
any other company that supports emojis presents those emojis on
the screen, we of course see them as images because
Apple and Microsoft and Google have decided what images
shall be displayed when it receives a certain pattern of bits. But how are they storing
those images and how is your Mac or PC or phone
displaying that image to you? Well it's hard to see where those
bits are let alone the pixels in it at this particular size, but if I go
ahead and zoom in and zoom in and zoom in a little more still,
you can begin to see the individual dots or squares or
pixels that compose even this one emoji. And so just to display this smiling
face, this face with tears of joy, do you need 24 bits for this
pixel, 24 bits for this pixel, 24 bits for this pixel, 24 bits for
this pixel, and so on and so forth? So if you've ever noticed that when you
download an image or download a photo or receive one in the mail, odds
are it's not in the order of bytes. It's probably kilobytes
for thousands of bytes, or megabytes for millions of bytes. How in the world does a simple photo
have so many bytes or in turn bits? It's because every one
of the dots in that image takes up some amount of space. So to represent yellow might we use
a certain pattern of 0's and 1's or 3 bytes. To represent black or gray or any number of other colors on the screen might we use different patterns still. Now that's it for images,
but what about videos? A video is really just
a sequence of images being presented to you so quickly
that you and your human eye don't notice that you're really just
watching image after image after image. In fact, it's conventional
in Hollywood and elsewhere to display as many as 24
images or frames per second, or perhaps even 30 frames
or images per second. And so even though they are
in fact distinct images, you are seeing them so quickly,
and the characters on the screen are moving within each
frame or image ever so slightly, that it creates
ultimately the illusion of movement.
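The arithmetic of that illusion fits in a few lines of Python (a sketch, using the 24-frames-per-second convention just mentioned):

    # A video is just images shown quickly, here at 24 frames per second.
    frames_per_second = 24
    seconds = 2
    frames = [f"image {i}" for i in range(frames_per_second * seconds)]
    print(len(frames), "still images make", seconds, "seconds of apparent motion")

And if you think back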
to childhood, you might have had one of those
old-school flip books on which was drawn some kind of
cartoon one frame or image at a time. And if you flipped through
that flip book one page ever so quickly again and
again and again and again, letting the physics of
it all take its toll, you might see a character or the picture
in the book actually appearing to move, but it's not. All of those pages are just images and
all of those images are not moving, but when you see them so fast
and so quickly in succession does it appear to create
that same movement. So these things here,
Animojis in Apple's world, are just videos ultimately that
track your facial movements and such, but all they really are are a
pattern of bits, each set of which represents an image. And each of those images is
displayed to you on your phone so quickly that you see
the illusion of movement. And so here again is an abstraction. What is a video? Well, a video is a collection of images. Well what's an image? An image is a collection of pixels. Well what's a pixel? A pixel is simply some number of
bits representing some amount of red and green and blue. Well what is a bit? It's simply a representation
digitally of something being present, like electricity or not. And so it's much more
powerful now to be operating at this level of abstraction-- talking
about videos for what they are, and not getting into the weeds of the lowest
level that at the end of the day, it's really just some
form of electricity that we call bits, that
we in turn call pixels, that we in turn called images,
that we in turn call videos. And we can operate in a number
of these layers of abstraction in order to represent
inputs to some problem. To create a film, to convert
a film, to display a film might be just one of
the problems to solve. All right, so we now have a
way to represent information, whether it's binary, decimal, or
some other approach altogether. So it's time to solve problems. But how do we go about solving problems? What is inside of this black box? Well that's where our
so-called algorithms are-- step-by-step instructions
for solving some problem. And to solve these
problems, we simply need to decide first what problem to solve. Well let's consider a familiar one,
like that of looking up someone's name and number in a phone book. Now these days, the phone book, of
course, takes a more digital form, but at the end of the day, these
two formats are pretty equivalent. After all, here in my hand is a
phone book with maybe 1,000 pages, on which are bunches of names
and numbers sorted alphabetically from A to Z. On my phone, meanwhile, while I
might have to click an icon instead, do I have contacts that
are similarly sorted from A to Z by first name or last
name, and touching any one of them pulls up its number. So this one, though, is a
little more physical for us to see the solution to a problem,
like finding, say, Mike Smith. Mike Smith may or may not be in
this phone book, but if he is, my goal quite simply is to find him. So let me try this. Let me try opening this
book, and then one step at a time looking down for Mike Smith. And if I don't see him,
move on to the next page. If I still don't see him, move on
to the next page, and next page, and next page, and so
forth, one page at a time. I propose that we consider
first-- is this algorithm correct? Opening the phone book, looking
down, turning page, and repeating. Well, at some point, if
Mike is in this phone book, I am going to reach him, at
which point I can call him. And if Mike Smith is
not in this phone book, I'm eventually not going
to reach him, but I am going to hit the end of
the phone book, at which point I'll presumably stop. But it doesn't feel terribly
efficient, however correct it may be. Why is it not efficient? Well I'm starting at the
beginning of the phone book. I know it's alphabetical,
and yet I'm turning one page at a time through the A's, through
the B's, through the C's and beyond, just waiting and waiting until
I finally reach Mike Smith. Well how might I do this better? Well, I learned in grade
school not only how to count by ones, but
also an account by twos. So instead of 1 page, 2 page, 3
page, 4, why don't instead do 2, 4, 6, 8, 10, 12-- flying
through the phone book. It certainly sounds faster, and it
is, but is that approach correct? Well, I propose that
there could be a problem. I might just get unlucky, and once I do
reach the S section of the phone book, what if by bad luck Mike Smith is
sandwiched between two of those pages? And therefore I hit the T section
and Z section and run out of pages? I might conclude incorrectly
that Mike Smith is not, in fact, in this phone book. But do I have to throw out
the algorithm altogether? Probably not, right? I could be a little clever here
and improve this algorithm, maintaining its efficiency,
but fixing its correctness. After all, if I do see a page on which
there are last names starting with T, well, I can stop and say, we
know Mike is not farther than this because Smith starts with S,
so I can double-back hopefully just one page or few,
and therefore fix what's otherwise an incorrect
approach in this algorithm. In fact, I can be even
smarter than that. As soon as I see someone's name
that starts with S-N instead of S-M, I can similarly flip back
one page and therefore ensure that I can go twice as fast
through most of the phone book, and only once I hit that section do I
have to double-back one or terribly few pages. So I get the overall performance
gains, and I also solve the problem. But honestly, if you even
use this technology anymore, odds are you don't start at the
beginning turning one or two pages at a time. If you're like me, you
probably more instinctively open up roughly to say the middle. You'll look down and you realize,
oh, I haven't found Mike yet because I'm actually
here in the M section. But what do you know about
where you've just jumped? Well Mike Smith is
indeed to the right here, because his name starts
with S. But if I'm in the M section, I do know something else. I know that Mike is not in the
left-hand section of this book-- he is not among the A through M's. And so I can both literally and
figuratively tear this problem in half, throw half of it away, thereby
leaving myself with just, say, 500 pages out of 1,000. So whereas that first algorithm I
took one bite at a time out of it-- one page at a time, the second algorithm I took two bites at a time out of it-- two at a time. Well with this algorithm, I just took a 500-page bite out of the problem all at once, thereby getting closer
to the output to this problem much more quickly. What do I then do? Well, I simply am probably going
to apply that same algorithm again and again and again-- dividing
and conquering this phone book, if you will. Jumping roughly to the middle
now, looking down-- dah, darn it! I ended up in the T section
now, so I've gone too far, but I know Mike is not in the
right-hand side of this book-- he's not among the T's through Z's. So I can again tear the problem
in half, throw that half away, thereby leaving me with just a
quarter of the original problem, say 250 pages, down
from 1,000, finding Mike ever so much more quickly than before. And if I repeat and I repeat and I
repeat, each time actually looking down looking for Mike, I should hopefully
find myself ultimately left with just a single page on which Mike
either is or is not, at which point I can call him or quit.
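That strategy of dividing and conquering is what programmers conventionally call binary search, and it can be sketched in just a few lines of Python, assuming an alphabetically sorted list of names:

    # Binary search over a sorted "phone book" of names.
    def search(names, target):
        low, high = 0, len(names) - 1
        while low <= high:
            middle = (low + high) // 2       # open to the middle
            if names[middle] == target:
                return middle                # found him: call Mike
            elif names[middle] < target:
                low = middle + 1             # tear away the left half
            else:
                high = middle - 1            # tear away the right half
        return -1                            # not in the book: quit

    print(search(["Alice", "Bob", "Mike", "Zoe"], "Mike"))   # 2

So just how much more quickly did I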
find Mike Smith via this algorithm than the first two? Well in a 1,000-page phone book, I might
get unlucky and Mike's not even there, and so in the worst case, I might
look in as many as 1,000 pages. In the second algorithm, I might also
get unlucky and not ever find Mike because he's just not there, and
therefore look at as many as 500 pages. But in this third algorithm
where I'm iteratively dividing and conquering again and
again, splitting the problem in half so I go from a very large input to a
much smaller to an even smaller problem still again and again, how many times
might I split that phone book in half? Well if I start off with roughly
1,000 pages and I split it in half, that gives me 500,
250, 125, and so forth. Turns out that I can split 1,000
pages roughly 10 times in total before I'm left with
just that final page. And so consider the
implications of that. First algorithm-- 1,000 steps, maybe. Second algorithm-- 500 steps, maybe. Third algorithm-- 10 steps, maybe,
to find Mike Smith before I can call or decide he's not there. And that's a pretty powerful technique. And if we extrapolate from this
relatively small example or phone book to maybe a much larger
phone book, say a phone book that a bit nonsensically has
4 billion pages in it, well, how many steps might those
three algorithms take? The first algorithm
might take 4 billion. The second algorithm
might take 2 billion. The third algorithm might
take 4 billion divided by 2 divided by 2 divided by 2-- you can divide 4 billion in half roughly 32 times only, at which point you're left with just that one page.
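Those counts of 10 and 32 are just base-2 logarithms, as Python can confirm (a quick sketch):

    # Halving a phone book until one page remains takes about log2(n) splits.
    import math
    print(math.ceil(math.log2(1_000)))           # 10 splits for 1,000 pages
    print(math.ceil(math.log2(4_000_000_000)))   # 32 splits for 4 billion pages

So 4 billion steps for the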
first algorithm, 2 billion steps for the second algorithm, but
32 steps for the third algorithm. And simply by harnessing this intuition
that we all probably already have and formalizing it in more methodical
form-- in algorithmic form, if you will, can we solve problems
using some of the building blocks that we already
have at our disposal. And indeed, problem-solving
in computer science is all about implementing
these algorithms and applying them to
problems of interest to you. But how do we think about just
how good this algorithm is? It sort of stands to reason that
it has to be minimally correct. But how efficient is it? I've been talking in
terms of absolute numbers. 1,000, 500, or 10, or 4
billion, 2 billion, and 32. But it would be nice to compare
algorithms in a more general way so that I can use the same
searching algorithm, if you will, to find not Mike Smith,
but someone else. Or to search not a phone book, but
Google or search engine more generally. And so computer scientists also
have a more formal technique via which to describe the
efficiency of algorithms. And we might consider these not even
with numbers or formulas, per se, but with simple visual
representations thereof. Here, for instance, is a simple plot. On the x or horizontal axis is
going to be the size of the problem. How many pages are in the phone
book, which we'll generally call n, where n is a number for
a computer scientist. Much like a mathematician
might go with x or y or z, computer scientists might
tend to start counting with n. On the vertical or
y-axis here, we'll say this is going to be the
time to solve a problem, be it a phone book or something else. And that might be seconds
or minutes or page turns or some other quantifiable
unit of measure. So how do I go about describing
that first algorithm where I turn one page at a time? Well I propose that it manifests a
linear relationship, or n, or a line, a straight line on the chart. Why is it a straight line? Well, if Verizon or the local
phone company, for instance, next year adds just one more page to
that phone book, that means for me, it might take me one more
unit of measure of time to find someone like Mike Smith, because
we've gone from 1,000 to 1,001 pages. So this line indeed is straight, or linear. That second algorithm now where
I'm flying through the phone book twice as fast is really fundamentally
the same algorithm or same shape. It just so happens that for
a given size of the problem, it's not going to take
this many seconds n, it's going to take me n
over 2 seconds because I'm doing twice as much work at a time. And so this line, too,
is linear or straight, but we might describe it in terms of n
over 2, where n is the number of pages but I only have to look at half of them
except for maybe that very last one if I double-back, and so its
shape is fundamentally the same. So better, but not fundamentally
better, it would seem. But that third and final algorithm
has a fundamentally disparate shape, depicted here with our green curved line
which has a logarithmic slope to it. So if n is the number of
pages, and the algorithm in use is division and conquering, it turns
out that has a logarithmic slope. Now what does that actually mean? Well it turns out that if Verizon
adds one more page to the phone book next year, that is a drop in the
bucket for that final algorithm where I'm just tearing the problem in half. But more powerfully, if Verizon
doesn't just add one page, perhaps two neighboring towns merge together and form one massive phone book next year, going from
1,000 pages to 2,000 pages all in one big, fat
book, well how many more steps does it take that third
algorithm to search for anyone in it? Just one. Because with 2,000 pages, you can take
a 1,000-page bite out of that problem all in one fell swoop even though
it's twice as big as before. And so even as the size of the problem
gets bigger and bigger and bigger,
the amount of time it takes to solve a problem
using that algorithm only increases ever so slightly. That's not a one-to-one
or a one-to-two ratio, it's instead said to be logarithmic. And this, then, is a
fundamentally different curve-- it's a fundamentally better algorithm in
that the problem can grow exponentially large, and the amount of time
it takes to solve it itself grows far less quickly than that. Now it's one thing to talk
about all these algorithms. At some point we need
to put them to paper, or more technically, program
them into a computer. So how do we go about formalizing these
step-by-step instructions in such a way that all of us can agree that
they are correct, all of us can discuss their efficiency,
and most importantly, all of us can program a device to
execute these steps for us? Well let me propose
that we first implement an algorithm like that of searching a
phone book in what's called pseudocode. Pseudocode is not a formal programming
language, it has no one definition. It generally means using short,
succinct phrases, often in English, to describe your algorithm
step-by-step-by-step in such a way that you yourself understand it, and anyone
reading it can also understand it. Now along the way do you have
to make certain assumptions, because you need to decide at what layer
of abstraction to actually operate. And we'll take a look at
where those opportunities are, but let me propose that we implement this algorithm for searching for Mike Smith, for instance, in the following way.
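The pseudocode shown on screen reads roughly as follows (reconstructed here from the verbs, conditions, and loops this lecture goes on to highlight):

    1   Pick up phone book
    2   Open to middle of phone book
    3   Look at page
    4   If Smith is among names
    5       Call Mike
    6   Else if Smith is earlier in book
    7       Open to middle of left half of book
    8       Go back to step 2
    9   Else if Smith is later in book
    10      Open to middle of right half of book
    11      Go back to step 2
    12  Else
    13      Quit

Now this is an algorithm, or really, a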
program, albeit written in pseudocode. And even though written in
pseudocode or English, it manifests certain constructs
that we're actually going to see in any number of today's
popular programming languages, a programming language being
an English-like language that humans have decided
can be used in a certain way to tell a computer what to do. Now this pseudocode
is just for us humans, but what are some of the
fundamental constructs in it that we'll see in
actual programming languages? Well first highlighted
in yellow here are all of the verbs or actions that more
technically we'll call functions. Statements that tell the computer,
or in this case, human what to do. Pick up, open to, look at, call, open, open, or quit-- all of them calls to action. In the context of a computer
with these same types of verbs, we more traditionally call them functions. What else is lying inside of this program? Well, these, pictured here in yellow. If, else if, else if,
else, well these represent what are called conditions or branches. Forks in the road, so
to speak, via which you make a decision to go this way
or this way or this way or that. But how do you decide
down which road to go? You need what are called
Boolean expressions. Named after mathematician
Boole, these Boolean expressions are short phrases that either
have yes or no answers, or true or false answers, or if you
will, 1 or 0-- on or off answers. Smith is among names,
Smith is earlier in book, Smith is later in book-- each of
them could be asked effectively with a question mark
to which the answer is yes or no, and based
on that answer can you go down any one of those four roads. And then lastly is this-- go back to step 2 or go back to step 2. Highlighted in yellow here's an
example of a loop, a deliberate cycle that gets you to do
something again and again and again until some
condition is no longer true and you eventually quit
or call Mike instead. And so looping in a program
allows you to write less code, but do something again
and again and again just by referring back to some
work that you've already done.
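Here are those same building blocks-- a function, a Boolean expression, a condition, and a loop-- in a few lines of Python (a toy example, not the phone book program itself):

    # A function: it takes input and produces output.
    def is_even(n):
        return n % 2 == 0          # a Boolean expression: true or false

    # A loop: do something again and again.
    for n in range(4):
        if is_even(n):             # a condition: a fork in the road
            print(n, "is even")
        else:                      # the other branch
            print(n, "is odd")

Now these constructs, though we've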
seen them in the context of pseudocode and searching a phone book, can be
found in any number of languages and programs, be it
Python or C++ or Java. And so when it comes to
programming a computer, or more generally solving a
problem with a computer, we'll ultimately express
ourselves in fundamentally the same way, with functions,
conditions, and Boolean expressions, and loops, and many
other constructs still, but we'll do it in a way that's
methodical, we'll do it in a way wherein we've all
agreed how to represent the inputs to that process
and the outputs thereto, and ultimately use that representation
and these layers of abstraction in order to solve our problems. But too much abstraction
is not always a good thing. Sometimes it's both
necessary and quite helpful to get into the weeds of the lower level
implementation details, so to speak. And let's see if we
can't see this together. Take out if you could a pen or
pencil, as well as a sheet of paper, and in just a moment allow me
to program you, if you will. I'll use a bit of verbal
pseudocode to program you to draw a particular
picture on that sheet of paper. The onus is on me to choose an
appropriate level of abstraction so that you're on the
same page, if you will, as I, so that ultimately what I program
is what you implement correctly. Let's try. All right. First, go ahead if you
would and draw a square. All right. And next to that square to the right,
go ahead if you could and draw a circle. To the right of that circle, go
ahead and draw a diagonal line from bottom left to top right. All right, and lastly,
go ahead if you would and draw a triangle to
the right of that line. All right. Now hopefully you're quite
proud of what you just drew, and it's perfectly correct as you
followed my instructions step-by-step. So surely what you drew looks like this? Well that's a square, next to a circle to
the right, a straight line from bottom left to top right, to the
right of which is a triangle. Now with some probability, you
probably didn't draw quite this. Fortunately there's no one there to
see, but where might you have gone wrong or maybe where might I have gone wrong? Well I didn't exactly specify just how
big that square should be, let alone where it should be on the page. And you might have quite reasonably
drawn it right in the center-- maybe the top left or the top right. But if you drew in a place
that I didn't intend, you might not have had enough
room for that actual square unless you flipped
perhaps the paper around. As for the circle, I said
draw it to the right, but I didn't technically say that it
should be just as wide or just as high as that square, so there might
have been a disparity there. As for that straight line, I said
from bottom left to top right. I knew that I meant from
the bottom of the circle to the top right of what
would then be the triangle, but maybe you started from here
and all the way up to here. I was making an assumption,
and that, perhaps, was not necessarily precise enough. And then lastly, the triangle. Pretty sure that I learned
growing up that there's all sorts of different triangles. Some have right angles, some have
acute angles or obtuse angles, and some triangles are even isosceles. Now I happen to intend this one
and you might have drawn just that, but if you did, you
probably got a bit lucky. And unfortunately when
it comes to computing, you don't want to just get lucky
when programming your computer, you want to actually ensure that
what you write is what it does. And in fact, if you've
ever seen on your Mac or PC a spinning beach ball
or hourglass indicating that something is taking quite long,
or the computer freezes or reboots, odds are that's because some
programmer, now like myself, might not have been specific
enough to the computer as to what to do in all circumstances,
and so sometimes unforeseen behavior happens. The computer starts thinking forever. The computer reboots
or crashes or freezes, which is simply the result of
one or more lines of code-- not in pseudocode, but in Java
or C++ or something else-- not having anticipated some user's
input or some particular direction. So maybe it would help
to operate not quite at this high of a level of abstraction. And indeed, all of these
shapes are abstractions. A square, a circle, a line, and a
triangle are abstractions in the sense that we've ascribed words to them
that have some semantic meaning to us, but what really is a square? Well again, if we go
back to grade school, a square is a rectangle, all of
whose sides are of equal length and whose angles are all right angles. But very quickly, my own
eyes start to glaze over when you get into the weeds of that
implementation, and so all of us have agreed since to
just call it a square. And same for circle
and line and triangle, but even then, there are variations. So let's get a bit more
into the weeds this time and let me take an approach
with the second of two pictures, programming you this time at
a much lower level of detail. Go ahead and take out another
sheet of paper or flip over the one that you already have. Toward the top middle of that sheet of
paper, roughly one inch from the top, go ahead and put down the
tip of your pen or pencil. Keeping your pen or pencil
applied to the paper, start to draw from the
top middle of the paper down toward the left at roughly a
45 degree angle for two or so inches and then stop. Keeping the tip of your
pen or pencil on the paper, go ahead and draw another line
perhaps three inches straight down toward the bottom of the
paper, stopping two or so inches shy of the bottom of the paper. Then, if you would,
hook a right, this time moving at a 45-degree angle toward
the bottom-middle of the paper, stopping ultimately one or so
inches from the bottom of the paper. Then bear right yet again, this
time going up at a 45-degree angle upward toward the middle of the paper,
this time extending a couple of inches, and then stopping at the same height
your previous line started at. Now draw a vertical line
about three inches high, stopping exactly where the line you drew two lines ago started. And if you're on the same
page, no pun intended, go ahead and hook a left this time, going back
a 45-degree angle toward the top-middle of the page, hopefully
closing the loop, if you will, and connecting your very first
dot to the last place you stopped. At this point, you hopefully
have a shape on your page that has six total sides. What comes next? Go ahead and from the top-middle of the
page follow the line you already drew down to the left at a 45-degree angle
and stop where you made a corner before-- before you went straight
down on the page. And instead of following
that vertical line, go ahead and go down 45 degrees
to the right a couple of inches, stopping once you're directly
below the dot you initially drew at the top-middle. Then go ahead and do two things
from the point you're now at. Go ahead and draw the opposite line up
and to the right at a 45-degree angle, connecting that line to one of
the corners you previously made. And then from that same initial
point, draw a vertical line down to the bottom-middle of the page
where you had another angle still. Lift up your pen or pencil and take
pride in what you've just drawn, because it is most surely exactly this. Was it? I don't know, because that was quite
painful for me to say in this program because I was operating
at such a low level really with no abstractions other
than line and point and angle. I was instead directing you much like a
paintbrush on a page to go up and down and left and right,
collectively creating what will be an abstraction that I
would dare call a cube, but a cube that I did not name by name. Had I said draw a cube, frankly
we learned from last time that that could have opened
up multiple possibilities. What should the angles be? What should the size be? What should the rotation thereof be? And so an abstraction alone
might not have been enough if I wanted you to draw
precisely this cube. But if I do want you to be ever so
correct and consistent with what I had in mind on the
screen here, I really did need to go down to such
a low level, so to speak, that I was talking in terms of
edges' length and angles therein. And even then, I might
not have been as clear as I might have been-- more
precise measurements and angles and such might have been
ideal so that you ultimately end up precisely with the same thing. So if you veered off-track, not a
problem, my fault more so than yours. But what's the takeaway here? Well at some point it
would be nice not to have to write programs at such a low
level again and again and again. Wouldn't it be nice if just one
of us, the best artist among us could implement code-- write code-- that
implements a cube, that implements a square, a circle,
a line, and a triangle? But when using those drawing
functions, if you will, that someone else has
implemented, you should probably get into the habit of asking
me additional questions. You want a square. Well how long should each edge
be and where should its position be on the page or the canvas? Same goes for circle
or line or triangle-- what are the angles and
lengths and positioning be? And if you can parametrize
your functions in that way-- in other words, write your functions
in such a way that they themselves take input so that what a function does
is at the end of the day produce output subject to those inputs, we
have then solved a problem.
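A parametrized drawing function might look like this in Python, using the built-in turtle module (a sketch; the name draw_square and its parameters are illustrative, not any standard API):

    # A reusable, parametrized drawing function: callers choose size and position.
    import turtle

    def draw_square(x, y, length):
        turtle.penup()
        turtle.goto(x, y)          # position on the canvas is an explicit input
        turtle.pendown()
        for _ in range(4):         # four equal sides, four right angles
            turtle.forward(length)
            turtle.right(90)

    draw_square(0, 0, 100)         # one square, exactly as its inputs specify
    turtle.done()

Not necessarily the one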
problem at hand, but we've solved other people's problems. And indeed, that's what the world
of programming ultimately is-- writing code to solve
problems, and ideally solving those problems
once and only once. And whether it's internally
within your firm or your company, writing code in such a way that
other people-- maybe yourself and your colleagues can then use it--
or in the case of open source software, anyone in the world can use it. So that in an ideal world,
only one of us humans ever need write code in some language to
draw a circle or cube or square or line or triangle, and everyone
else in the world can stand on his or her
shoulders, use that code to build their tool
or product on top it. This, then, was computational thinking. You have inputs, you want outputs,
and in between, you have algorithms. You just first need to decide how to
represent those inputs and outputs, and you have to decide for yourself
at what level of abstraction to work.