Hi! My name is Alejandro Carrillo, and I'm a robotics engineer at an agricultural company. Specifically, my team uses machine learning, robotics, and computer vision to identify the difference between the crops that we eat and the weeds that take nutrients away from them. We're able to remove those weeds without any chemicals. My name is Kate Park, and I work at Tesla Autopilot. I build self-driving cars. Any place where resources could be used more efficiently is a place where technology can play a role. But of course, one of the most impactful applications of AI is self-driving cars.

Have you ever wondered how a computer can
recognize a face, or drive a car? Or maybe you've wondered why it's so hard for a computer to tell
the difference between a dog and a bagel? Well, it all has to do with something called computer
vision: the way machines interpret images. Let's take a look at a simple example of how
computers learn to see. Here are two shapes: an X and an O. At some point you've learned
the names for these shapes, but a computer looking at these images for the first time just
sees a bunch of little squares, called pixels. Each pixel has a numerical value, and for a computer to see, it needs to make sense of these numbers to figure out what is in
the picture. In traditional programming, you could tell the computer to check which
pixels are filled to decide what shape it sees. If the center and corner pixels
are full, then it's an X. If the center and corner pixels are empty, then
it's an O. Traditional programming works great for this kind of thing, but what about asking the
computer to recognize these images? What might the computer think these are? We gave the computer
a strict definition of what an X looks like, but these images don't fill all the
necessary pixels to fit the definition. So the computer doesn't think these are X's at all. In fact, the computer thinks these are O's, because the corners and center pixels are blank, and that fits the definition of an O that we gave it.
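To make that concrete, here is a minimal sketch of this kind of hand-written rule in Python; the five-by-five grids and the classify_shape function are made up for illustration, not code from the video:

```python
# A hand-written rule, assuming a 5x5 grid where 1 = filled pixel, 0 = empty.
def classify_shape(pixels):
    corners = [pixels[0][0], pixels[0][4], pixels[4][0], pixels[4][4]]
    center = pixels[2][2]
    if center == 1 and all(corners):
        return "X"        # center and corners filled
    if center == 0 and not any(corners):
        return "O"        # center and corners empty
    return "not sure"

# A full-size X fills the center and all four corners, so the rule works:
big_x = [[1, 0, 0, 0, 1],
         [0, 1, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 1, 0, 1, 0],
         [1, 0, 0, 0, 1]]
print(classify_shape(big_x))    # -> "X"

# A smaller X drawn near the top leaves the center and corners blank,
# so the strict rule calls it an "O" -- exactly the failure described above.
small_x = [[0, 1, 0, 1, 0],
           [0, 0, 1, 0, 0],
           [0, 1, 0, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
print(classify_shape(small_x))  # -> "O"
```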
In this example, traditional programming only works some of the time, but with machine learning, we can teach the computer how to recognize shapes
no matter their size, symmetry, or rotation. Teaching a computer requires thousands or even
millions of examples of training data, and a whole lot of trial and error. So let's start training!
Here are some simple shapes we can use to train the computer to see. At first, the computer is completely clueless: it makes a totally random guess from a preset group of options, and it guesses wrong. But that's okay, because this is where the computer learns. After it makes a guess, the computer is shown the correct answer. It's like learning with flashcards: sometimes you have to get it wrong before you get it right.
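One very simple way to picture that guess-and-check loop is the sketch below; the tiny flashcards, the weight-nudging rule, and the train function are made-up illustrations rather than what any real vision system does:

```python
import random

# Toy version of the loop: the "computer" starts with random weights (it's
# clueless), guesses a label for each flashcard, is shown the right answer,
# and nudges its weights whenever it guessed wrong.
def guess(weights, pixels):
    score = sum(w * p for w, p in zip(weights, pixels))
    return "X" if score > 0 else "O"

def train(flashcards, rounds=100):
    n = len(flashcards[0][0])
    weights = [random.uniform(-1, 1) for _ in range(n)]   # totally random at first
    for _ in range(rounds):
        for pixels, answer in flashcards:
            if guess(weights, pixels) != answer:           # got it wrong...
                direction = 1 if answer == "X" else -1     # ...so learn from the mistake
                weights = [w + direction * p for w, p in zip(weights, pixels)]
    return weights

# Each flashcard: a flattened 3x3 grid of pixel values plus the correct answer.
flashcards = [
    ([1, 0, 1, 0, 1, 0, 1, 0, 1], "X"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], "O"),
]
weights = train(flashcards)
print(guess(weights, [1, 0, 1, 0, 1, 0, 1, 0, 1]))  # should now guess "X"
```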
With every guess, the computer looks at each pixel and the surrounding pixels. It tries to recognize patterns and make rules to help it guess: for example, if it sees a row of orange pixels next to a row of white pixels, that's probably an edge. If the computer sees two edges oriented a certain way, say at a 90-degree angle, then it's likely to guess that it's looking at a square. It won't get it right every time, but with more trial and error, it will slowly build a more confident guessing algorithm.
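Here's a rough sketch of what a rule like "a row of one color next to a row of another color means an edge" could look like in code; the tiny image and the find_edges function are invented for illustration:

```python
# Scan a small image and flag an edge wherever a pixel's value changes
# between neighboring rows or columns. 1 = orange pixel, 0 = white pixel.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def find_edges(img):
    edges = []
    rows, cols = len(img), len(img[0])
    for r in range(rows - 1):
        for c in range(cols):
            if img[r][c] != img[r + 1][c]:
                edges.append((r, c, "horizontal"))  # color changes between rows
    for r in range(rows):
        for c in range(cols - 1):
            if img[r][c] != img[r][c + 1]:
                edges.append((r, c, "vertical"))    # color changes between columns
    return edges

edges = find_edges(image)
has_horizontal = any(kind == "horizontal" for _, _, kind in edges)
has_vertical = any(kind == "vertical" for _, _, kind in edges)
# Two kinds of edges meeting at roughly 90 degrees is a hint that the
# shape might be a square (or at least a rectangle).
print("Maybe a square!" if has_horizontal and has_vertical else "Not sure.")
```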
Whether it's trying to guess shapes, animals, or any other category, machine learning finds patterns by learning from its mistakes. The training data is used to make a statistical model, which is just a fancy way of saying a guessing machine. When we give it training data, the guessing machine is tuned and optimized to recognize the pictures we gave it, with the hope that it will then be able to recognize new pictures with the same accuracy.
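As a toy illustration of a guessing machine, here is a miniature statistical model (a nearest-neighbor guesser) tuned on a handful of made-up training pictures and then asked about a picture it has never seen; the data and function names are assumptions for the sketch, not from the video:

```python
# A "guessing machine" in miniature: each tiny picture is just a short list
# of pixel values, and the labels are made up for illustration.
training_pictures = [
    ([1, 0, 0, 1], "X"),
    ([1, 1, 1, 1], "O"),
    ([0, 1, 1, 0], "X"),
    ([1, 1, 0, 1], "O"),
]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def guessing_machine(new_picture):
    # Guess the label of the closest training picture.
    best_pixels, best_label = min(training_pictures,
                                  key=lambda item: distance(item[0], new_picture))
    return best_label

# The hope: the model also recognizes pictures it was never trained on.
print(guessing_machine([1, 0, 1, 1]))  # a new picture -> the model's best guess
```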
It may seem easy to tell the difference between an X and an O, or even to categorize basic shapes, but most images aren't that simple. Let's take a look at how computer vision can learn to recognize complex images or scenes, like the ones in the real world.

Most complex images can be broken down into small, simple patterns. For example, an eye is made up of two arcs and some circles inside. A wheel is made up of concentric circles and some radial lines. The way a computer recognizes the patterns in all these pixels is by using a neural network made of many layers. The first layer of neurons takes pixel values as numerical inputs to identify edges. The next few layers of neurons take those edges and try to detect simple shapes, until finally the computer puts it all together to understand the whole image.
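For a rough idea of what such a layered network looks like in code, here is a small convolutional network written with Keras; the layer sizes, the 28x28 input, and the ten output categories are placeholder assumptions, not details from the video:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer builds on the previous one: pixels in, then edges, then simple
# shapes, then a final guess about what the whole image shows.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),          # pixel values as numerical inputs
    layers.Conv2D(8, 3, activation="relu"),   # early layer: picks up edges
    layers.MaxPooling2D(),
    layers.Conv2D(16, 3, activation="relu"),  # later layers: simple shapes and parts
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # puts it all together into a guess
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would need many labeled images, e.g.:
# model.fit(labeled_images, labels, epochs=5)
```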
It can take hundreds of thousands, or even millions, of labeled images to train a computer vision system. But sometimes even that's not enough! Some face recognition systems have trouble even seeing people of color, because the system was primarily trained with photos of white people. Sometimes problems in computer vision are
silly, like when a computer gets confused trying to tell the difference between
these dogs. Oh wait, that's not a dog! But it does kind of look like
a dog. At least this dog. But as society relies on computer vision for real
problems, like detecting diseases in medical imagery, or helping a self-driving car identify
pedestrians, it becomes increasingly important that we all understand how these systems work and
what types of problems they're appropriate for. Computer vision can open up a
miraculous world of possibilities, but a machine is ultimately only as
good as the data used to train it.