How Computer Vision Works

Video Statistics and Information

Captions

Hi! My name is Alejandro Carrillo, and I'm a robotics engineer at an agricultural company. Specifically, my team uses machine learning, robotics, and computer vision to identify the difference between the crops that we eat and the weeds that take nutrients away. We're able to remove those weeds without any chemicals.

My name is Kate Park, and I work at Tesla Autopilot. I build self-driving cars. Any place where resources can be used more efficiently is a place where technology can play a role. But of course, one of the most impactful uses of AI is in self-driving cars.

Have you ever wondered how a computer can recognize a face or drive a car? Or maybe you've wondered why it's so hard for a computer to tell the difference between a dog and a bagel? Well, it all has to do with something called computer vision: the way machines interpret images.

Let's take a look at a simple example of how computers learn to see. Here are two shapes: an X and an O. At some point you've learned the names for these shapes, but a computer looking at these images for the first time just sees a bunch of little squares, called pixels. Each pixel has a numerical value. For a computer to see, it needs to make sense of these numbers to figure out what is in the picture.

In traditional programming, you could tell the computer to check which pixels are filled to decide what shape it sees. If the center and corner pixels are full, then it's an X. If the center and corner pixels are empty, then it's an O. Traditional programming works great for this kind of thing, but what about asking the computer to recognize these images? What might the computer think these are? We gave the computer a strict definition of what an X looks like, but these images don't fill all the necessary pixels to fit the definition.
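The center-and-corners rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the video: the 5x5 grids, the 1/0 pixel encoding, and the function name `classify_by_rule` are all assumptions made for the example.

```python
# Hypothetical 5x5 images: 1 = filled pixel, 0 = empty pixel.
FULL_X = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
FULL_O = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
# A smaller X drawn off-center: it never touches the corners or the center.
SMALL_X = [
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]


def classify_by_rule(grid):
    """Apply the strict rule: only the four corners and the center are checked."""
    last = len(grid) - 1
    mid = len(grid) // 2
    key_pixels = [
        grid[0][0], grid[0][last],
        grid[last][0], grid[last][last],
        grid[mid][mid],
    ]
    if all(key_pixels):
        return "X"  # center and corners are all filled
    if not any(key_pixels):
        return "O"  # center and corners are all empty
    return "unknown"  # the rule has no answer for mixed cases


print(classify_by_rule(FULL_X))   # the rule handles the textbook X
print(classify_by_rule(FULL_O))   # and the textbook O
print(classify_by_rule(SMALL_X))  # but the small X is mislabeled
```

The small X leaves the corners and the center empty, so the fixed rule labels it an O, which is the failure the captions go on to describe.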
So the computer doesn't think these are X's at all. In fact, the computer thinks these are O's, because the corner and center pixels are blank, and that fits the definition of an O that we gave it. In this example, traditional programming only works some of the time, but with machine learning, we can teach the computer how to recognize shapes no matter their size, symmetry, or rotation.

Teaching a computer requires thousands or even millions of examples of training data, and a whole lot of trial and error. So let's start training! Here are some simple shapes we can use to train the computer to see. At first the computer is completely clueless. It makes a totally random guess from a preset group of options, and it guesses wrong. But that's okay, because this is where the computer learns. After it makes a guess, the computer is shown the correct answer. It's like learning with flashcards: sometimes you have to get it wrong before you get it right.

With every guess, the computer looks at each pixel and the surrounding pixels. It tries to recognize patterns and make rules to help it guess. For example, if it sees a row of orange pixels next to a row of white pixels, there's an edge. If the computer sees two edges oriented a certain way, say at a 90-degree angle, then it's likely to guess that it's looking at a square. It won't get it right every time, but with more trial and error, it will slowly build a more confident guessing algorithm.

Whether it's trying to guess shapes, animals, or any other category, machine learning finds patterns by learning from its mistakes. The training data is used to make a statistical model, which is just a fancy way of saying a guessing machine. When we give it training data, the guessing machine is tuned and optimized to recognize the pictures we gave it, with the hope that it will then be able to recognize new pictures with the same accuracy.
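The guess, check, and adjust loop described above can be sketched with a perceptron, one of the simplest learning algorithms. This is a stand-in chosen for brevity, not the method shown in the video. The 3x3 grids, the names `train` and `predict`, and the choice to start from all-zero weights (rather than a random guess) are assumptions made so the example stays small and repeatable.

```python
# Hypothetical 3x3 training images: 1 = filled pixel, 0 = empty pixel.
X_FULL = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
O_FULL = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X_MISSING_CORNER = [[0, 0, 1], [0, 1, 0], [1, 0, 1]]
O_MISSING_SIDE = [[0, 0, 0], [1, 0, 1], [0, 1, 0]]


def flatten(grid):
    """Turn a 2D grid into one flat list of pixel values."""
    return [pixel for row in grid for pixel in row]


def predict(weights, bias, pixels):
    """Guess 'X' when the weighted sum of the pixels is positive, else 'O'."""
    score = bias + sum(w * p for w, p in zip(weights, pixels))
    return "X" if score > 0 else "O"


def train(examples, epochs=5):
    """Perceptron rule: after each wrong guess, nudge weights toward the answer."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in examples:
            if predict(weights, bias, pixels) != label:
                mistakes += 1
                step = 1.0 if label == "X" else -1.0
                weights = [w + step * p for w, p in zip(weights, pixels)]
                bias += step
        mistakes_per_epoch.append(mistakes)
    return weights, bias, mistakes_per_epoch


examples = [
    (flatten(X_FULL), "X"),
    (flatten(O_FULL), "O"),
    (flatten(X_MISSING_CORNER), "X"),
    (flatten(O_MISSING_SIDE), "O"),
]
weights, bias, mistakes_per_epoch = train(examples)
print(mistakes_per_epoch)  # wrong guesses in each pass over the flashcards

# An X the model never saw during training: the center pixel is missing.
X_MISSING_CENTER = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
print(predict(weights, bias, flatten(X_MISSING_CENTER)))
```

On this tiny data set the model makes its mistakes in the first pass and none afterward, and it then labels the unseen X correctly. Real systems need far more examples, as the captions point out.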
It may seem easy to tell the difference between an X and an O, or even to categorize basic shapes, but most images aren't that simple. Let's take a look at how computer vision can learn to recognize complex images or scenes, like ones in the real world.

Most complex images can be broken down into small, simple patterns. For example, an eye is made up of two arcs and some circles inside. A wheel is made up of concentric circles and some radial lines. The way a computer recognizes the patterns in all these pixels is by using a neural network made of many layers. The first layer of neurons takes pixel values as numerical inputs to identify edges. The next few layers of neurons take those edges and try to detect simple shapes, until finally the computer puts it all together to understand what it sees.

It can take hundreds of thousands or even millions of labeled images to train a computer vision system. But sometimes even that's not enough! Some face recognition systems have trouble even seeing people of color, because the system was primarily trained with photos of white people.

Sometimes problems in computer vision are silly, like when a computer gets confused trying to tell the difference between these dogs. Oh wait, that's not a dog! But it does kind of look like a dog. At least this dog.

But as society relies on computer vision for real problems, like detecting diseases in medical imagery or helping a self-driving car identify pedestrians, it becomes increasingly important that we all understand how these systems work and what types of problems they're appropriate for. Computer vision can open up a miraculous world of possibilities, but a machine is ultimately only as good as the data used to train it.
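The layered idea from the captions (pixels first, then edges, then simple shapes) can be illustrated with a hand-written pipeline. This sketch is not a trained neural network: each "layer" here is a fixed rule written by hand, whereas a real network learns its own filters from data. The 6x6 image and the function names are assumptions made for the example.

```python
# Hypothetical 6x6 image of a filled square: 1 = filled pixel, 0 = background.
SQUARE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def edge_layer(image):
    """Layer 1: mark places where neighbouring pixel values differ (edges)."""
    height, width = len(image), len(image[0])
    # vertical[r][c] is 1 when pixel (r, c) differs from its right-hand neighbour.
    vertical = [
        [int(image[r][c] != image[r][c + 1]) for c in range(width - 1)]
        for r in range(height)
    ]
    # horizontal[r][c] is 1 when pixel (r, c) differs from the pixel below it.
    horizontal = [
        [int(image[r][c] != image[r + 1][c]) for c in range(width)]
        for r in range(height - 1)
    ]
    return vertical, horizontal


def corner_layer(vertical, horizontal):
    """Layer 2: a 2x2 block holding both edge directions is treated as a corner."""
    corners = []
    for r in range(len(horizontal)):
        for c in range(len(vertical[0])):
            has_vertical = vertical[r][c] or vertical[r + 1][c]
            has_horizontal = horizontal[r][c] or horizontal[r][c + 1]
            if has_vertical and has_horizontal:
                corners.append((r, c))
    return corners


def shape_layer(corners):
    """Layer 3: put it together. Four corners suggests a square-like shape."""
    return "square-like" if len(corners) == 4 else "unknown"


vertical, horizontal = edge_layer(SQUARE)
corners = corner_layer(vertical, horizontal)
print(corners)               # top-left positions of the 2x2 blocks with corners
print(shape_layer(corners))
```

Each stage only consumes the output of the stage before it, which is the point of the layered design: later layers never look at raw pixels, only at the simpler patterns already found.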

Info

Channel: Code.org

Views: 85,680

Keywords: Code.org, computer science, code, Hour of Code

Id: 2hXG8v8p0KM

Length: 6min 24sec (384 seconds)

Published: Tue Dec 01 2020