Hi, I’m Carrie Anne, and welcome to Crash
Course Computer Science! As we’ve touched on many times in this series,
computers are incredible at storing, organizing, fetching and processing huge volumes of data. That’s perfect for things like e-commerce
websites with millions of items for sale, and for storing billions of health records
for quick access by doctors. But what if we want to use computers not just
to fetch and display data, but to actually make decisions about data? This is the essence of machine learning – algorithms
that give computers the ability to learn from data, and then make predictions and decisions. Computer programs with this ability are extremely
useful in answering questions like: Is an email spam? Does a person’s heart have arrhythmia? What video should YouTube recommend after
this one? While useful, we probably wouldn’t describe
these programs as “intelligent” in the same way we think of human intelligence. So, even though the terms are often interchanged,
most computer scientists would say that machine learning is a set of techniques that sits
inside the even more ambitious goal of Artificial Intelligence, or AI for short.
INTRO
Machine Learning and AI algorithms tend to
be pretty sophisticated. So rather than wading into the mechanics of
how they work, we're going to focus on what the algorithms do conceptually. Let’s start with a simple example: deciding
if a moth is a Luna Moth or an Emperor Moth. This decision process is called classification,
and an algorithm that does it is called a classifier. Although there are techniques that can use
raw data for training – like photos and sounds – many algorithms reduce the complexity
of real world objects and phenomena into what are called features. Features are values that usefully characterize
the things we wish to classify. For our moth example, we’re going to use
two features: “wingspan” and “mass”. In order to train our machine learning classifier
to make good predictions, we’re going to need training data. To get that, we’d send an entomologist out
into a forest to collect data for both luna and emperor moths. These experts can recognize different moths,
so they not only record the feature values, but also label that data with the actual moth
species. This is called labeled data. Because we only have two features, it’s
easy to visualize this data in a scatterplot. Here, I’ve plotted data for 100 Emperor
Moths in red and 100 Luna Moths in blue. We can see that the species make two groupings,
but… there’s some overlap in the middle… so
it’s not entirely obvious how to best separate the two. That’s what machine learning algorithms
do – find optimal separations! I’m just going to eyeball it and say anything
less than 45 millimeters in wingspan is likely to be an Emperor Moth. We can add another division that says additionally
mass must be less than .75 in order for our guess to be Emperor Moth. These lines that chop up the decision space
are called decision boundaries. If we look closely at our data, we can see
that 86 emperor moths would correctly end up inside the emperor decision region, but
14 would end up incorrectly in luna moth territory. On the other hand, 82 luna moths would be
correct, with 18 falling onto the wrong side. A table, like this, showing where a classifier
gets things right and wrong is called a confusion matrix... which probably should have also
been the title of the last two movies in the Matrix Trilogy! Notice that there’s no way for us to draw
lines that give us 100% accuracy. If we lower our wingspan decision boundary,
we misclassify more Emperor moths as Lunas. If we raise it, we misclassify more Luna moths. The job of machine learning algorithms, at
a high level, is to maximize correct classifications while minimizing errors. On our training data, we get 168 moths correct,
and 32 moths wrong, for an average classification accuracy of 84%. Now, using these decision boundaries, if we
go out into the forest and encounter an unknown moth, we can measure its features and plot
it onto our decision space. This is unlabeled data. Our decision boundaries offer a guess as to
what species the moth is. In this case, we’d predict it’s a Luna Moth. This simple approach, of dividing the decision
space up into boxes, can be represented by what’s called a decision tree, which would
look like this pictorially or could be written in code using If-Statements, like this. A machine learning algorithm that produces
decision trees needs to choose what features to divide on…and then for each of those
features, what values to use for the division. Decision Trees are just one basic example
of a machine learning technique. There are hundreds of algorithms in computer
science literature today. And more are being published all the time. A few algorithms even use many decision trees
working together to make a prediction. Computer scientists smugly call those Forests…
because they contain lots of trees. There are also non-tree-based approaches,
like Support Vector Machines, which essentially slice up the decision space using arbitrary
lines. And these don’t have to be straight lines;
they can be polynomials or some other fancy mathematical function. Like before, it’s the machine learning algorithm's
job to figure out the best lines to provide the most accurate decision boundaries. So far, my examples have only had two features,
which is easy enough for a human to figure out. If we add a third feature, let’s say, length
of antennae, then our 2D lines become 3D planes, creating decision boundaries in three dimensions. These planes don’t have to be flat either. Plus, a truly useful classifier would contend
with many different moth species. Now I think you’d agree this is getting
too complicated to figure out by hand… But even this is a very basic example – just three features and five moth species. We can still show it in this 3D scatter plot. Unfortunately, there’s no good way to visualize
four features at once, or twenty features, let alone hundreds or even thousands of features. But that’s what many real-world machine
learning problems face. Can YOU imagine trying to figure out the equation for a hyperplane rippling through a thousand-dimensional decision space? Probably not, but computers, with clever machine
learning algorithms, can… and they do, all day long, on computers at places like Google,
Facebook, Microsoft and Amazon. Techniques like Decision Trees and Support
Vector Machines are strongly rooted in the field of statistics, which has dealt with
making confident decisions, using data, long before computers ever existed. There’s a very large class of widely used
statistical machine learning techniques, but there are also some approaches with no origins
in statistics. Most notable are artificial neural networks,
which were inspired by neurons in our brains! For a primer on biological neurons, check
out our three-part overview here, but basically neurons are cells that process and transmit
messages using electrical and chemical signals. They take one or more inputs from other cells,
process those signals, and then emit their own signal. These form into huge interconnected networks
that are able to process complex information. Just like your brain watching this YouTube
video. Artificial Neurons are very similar. Each takes a series of inputs, combines them,
and emits a signal. Rather than being electrical or chemical signals,
artificial neurons take numbers in, and spit numbers out. They are organized into layers that are connected
by links, forming a network of neurons, hence the name. Let’s return to our moth example to see
how neural nets can be used for classification. Our first layer – the input layer – provides
data from a single moth needing classification. Again, we’ll use mass and wingspan. At the other end, we have an output layer,
with two neurons: one for Emperor Moth and another for Luna Moth. The most excited neuron will be our classification
decision. In between, we have a hidden layer, that transforms
our inputs into outputs, and does the hard work of classification. To see how this is done, let’s zoom into
one neuron in the hidden layer. The first thing a neuron does is multiply
each of its inputs by a specific weight, let’s say 2.8 for its first input, and .1 for its
second input. Then, it sums these weighted inputs together,
which, in this case, is a grand total of 9.74. The neuron then applies a bias to this result
- in other words, it adds or subtracts a fixed value, for example, minus six, for a new value
of 3.74. These bias and input weights are initially
set to random values when a neural network is created. Then, an algorithm goes in, and starts tweaking
all those values to train the neural network, using labeled data for training and testing. This happens over many iterations, gradually
improving accuracy – a process very much like human learning. Finally, neurons have an activation function,
also called a transfer function, that gets applied to the output, performing a final
mathematical modification to the result. For example, limiting the value to a range
from negative one to positive one, or setting any negative values to 0. We’ll use a linear transfer function that
passes the value through unchanged, so 3.74 stays as 3.74. So for our example neuron, given the inputs
.55 and 82, the output would be 3.74. This is just one neuron, but this process
of weighting, summing, biasing and applying an activation function is computed for all
neurons in a layer, and the values propagate forward in the network, one layer at a time. In this example, the output neuron with the
highest value is our decision: Luna Moth. Importantly, the hidden layer doesn’t have
to be just one layer… it can be many layers deep. This is where the term deep learning comes
from. Training these more complicated networks takes
a lot more computation and data. Despite the fact that neural networks were
invented over fifty years ago, deep neural nets have only been practical very recently,
thanks to powerful processors, but even more so, wicked fast GPUs. So, thank you gamers for being so demanding
about silky smooth framerates! A couple of years ago, Google and Facebook
demonstrated deep neural nets that could find faces in photos as well as humans – and
humans are really good at this! It was a huge milestone. Now deep neural nets are driving cars, translating
human speech, diagnosing medical conditions and much more. These algorithms are very sophisticated, but
it’s less clear if they should be described as “intelligent”. They can really only do one thing like classify
moths, find faces, or translate languages. This type of AI is called Weak AI or Narrow
AI. It’s only intelligent at specific tasks. But that doesn’t mean it’s not useful;
I mean, medical devices that can make diagnoses, and cars that can drive themselves are amazing! But do we need those computers to compose
music and look up delicious recipes in their free time? Probably not. Although that would be kinda cool. Truly general-purpose AI, one as smart and
well-rounded as a human, is called Strong AI. No one has demonstrated anything close to
human-level artificial intelligence yet. Some argue it’s impossible, but many people
point to the explosion of digitized knowledge – like Wikipedia articles, web pages, and
YouTube videos – as the perfect kindling for Strong AI. Although you can only watch a maximum of 24
hours of YouTube a day, a computer can watch millions of hours. For example, IBM’s Watson consults and synthesizes
information from 200 million pages of content, including the full text of Wikipedia. While not a Strong AI, Watson is pretty smart,
and it crushed its human competition in Jeopardy way back in 2011. Not only can AIs gobble up huge volumes of
information, but they can also learn over time, often much faster than humans. In 2016, Google debuted AlphaGo, a Narrow
AI that plays the fiendishly complicated board game Go. One of the ways it got so good and able to
beat the very best human players, was by playing clones of itself millions and millions of
times. It learned what worked and what didn’t,
and along the way, discovered successful strategies all by itself. This is called Reinforcement Learning, and
it’s a super powerful approach. In fact, it’s very similar to how humans
learn. People don’t just magically acquire the
ability to walk... it takes thousands of hours of trial and error to figure it out. Computers are now on the cusp of learning
by trial and error, and for many narrow problems, reinforcement learning is already widely used. What will be interesting to see is if these
types of learning techniques can be applied more broadly, to create human-like, Strong
AIs that learn much like how kids learn, but at super accelerated rates. If that happens, there are some pretty big
changes in store for humanity – a topic we’ll revisit later. Thanks for watching. See you next week.
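
As a supplement to the moth classifier discussed in this episode, here is a minimal sketch of the eyeballed decision boundaries written as if-statements, plus the accuracy worked out from the confusion matrix counts quoted above. The function and variable names are made up for illustration, and the unit for mass is left out because the episode doesn't state one. This is a hand-built tree, not the output of a real learning algorithm, which would pick the features and split values itself.

```python
# Hand-drawn decision boundaries from the episode, written as if-statements.
# All names here are illustrative assumptions, not from the episode.

WINGSPAN_BOUNDARY_MM = 45.0  # "less than 45 millimeters in wingspan"
MASS_BOUNDARY = 0.75         # "mass must be less than .75" (unit not stated)


def classify_moth(wingspan_mm: float, mass: float) -> str:
    """Return a species guess for one moth from its two features."""
    if wingspan_mm < WINGSPAN_BOUNDARY_MM:
        if mass < MASS_BOUNDARY:
            return "Emperor"
    # Everything outside the Emperor box is guessed to be a Luna Moth.
    return "Luna"


# Confusion matrix counts quoted in the episode, keyed by (actual, predicted).
confusion = {
    ("Emperor", "Emperor"): 86,
    ("Emperor", "Luna"): 14,
    ("Luna", "Luna"): 82,
    ("Luna", "Emperor"): 18,
}

# Correct classifications sit on the diagonal: actual == predicted.
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
total = sum(confusion.values())
accuracy = correct / total

print(f"{correct} of {total} correct, accuracy {accuracy:.0%}")
# prints: 168 of 200 correct, accuracy 84%
```

Calling classify_moth(40, 0.5) gives "Emperor", while a moth that fails either test, such as classify_moth(50, 0.5), falls through to "Luna".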
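
The single hidden-layer neuron walked through in this episode can also be sketched in a few lines: multiply each input by its weight, sum, add the bias, then apply an activation function. The weights, bias and inputs below are the example values from the episode; the function names, and the choice of a "set negatives to 0" activation for the second call, are assumptions for illustration only. Training, where an algorithm tweaks the weights and biases, is not shown.

```python
# One artificial neuron: weight, sum, bias, activation.
# Function names are illustrative assumptions, not from the episode.


def linear(x: float) -> float:
    """Linear transfer function: passes the value through unchanged."""
    return x


def zero_negatives(x: float) -> float:
    """Activation that sets any negative value to 0."""
    return max(0.0, x)


def neuron_output(inputs, weights, bias, activation=linear):
    """Compute one neuron's output from its inputs."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return activation(weighted_sum + bias)


# Example values from the episode: inputs .55 and 82, weights 2.8 and .1,
# bias of minus six, linear transfer function.
value = neuron_output(inputs=[0.55, 82], weights=[2.8, 0.1], bias=-6.0)
print(round(value, 2))  # prints 3.74

# Hypothetical variation: a bias of -12 pushes the sum negative, so the
# "set negatives to 0" activation clips the output to 0.0.
clipped = neuron_output([0.55, 82], [2.8, 0.1], bias=-12.0, activation=zero_negatives)
print(clipped)  # prints 0.0
```

In a full network, this same computation would run for every neuron in a layer, with each layer's outputs becoming the next layer's inputs.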