Machine Learning & Artificial Intelligence: Crash Course Computer Science #34

Video Statistics and Information

Captions

Hi, I’m Carrie Anne, and welcome to Crash Course Computer Science! As we’ve touched on many times in this series, computers are incredible at storing, organizing, fetching and processing huge volumes of data. That’s perfect for things like e-commerce websites with millions of items for sale, and for storing billions of health records for quick access by doctors. But what if we want to use computers not just to fetch and display data, but to actually make decisions about data? This is the essence of machine learning – algorithms that give computers the ability to learn from data, and then make predictions and decisions. Computer programs with this ability are extremely useful in answering questions like: Is an email spam? Does a person’s heart have arrhythmia? What video should YouTube recommend after this one? While useful, we probably wouldn’t describe these programs as “intelligent” in the same way we think of human intelligence. So, even though the terms are often interchanged, most computer scientists would say that machine learning is a set of techniques that sits inside the even more ambitious goal of Artificial Intelligence, or AI for short. INTRO Machine Learning and AI algorithms tend to be pretty sophisticated. So rather than wading into the mechanics of how they work, we're going to focus on what the algorithms do conceptually. Let’s start with a simple example: deciding if a moth is a Luna Moth or an Emperor Moth. This decision process is called classification, and an algorithm that does it is called a classifier. Although there are techniques that can use raw data for training – like photos and sounds – many algorithms reduce the complexity of real world objects and phenomena into what are called features. Features are values that usefully characterize the things we wish to classify. For our moth example, we’re going to use two features: “wingspan” and “mass”. 
In order to train our machine learning classifier to make good predictions, we’re going to need training data. To get that, we’d send an entomologist out into a forest to collect data for both luna and emperor moths. These experts can recognize different moths, so they not only record the feature values, but also label that data with the actual moth species. This is called labeled data. Because we only have two features, it’s easy to visualize this data in a scatterplot. Here, I’ve plotted data for 100 Emperor Moths in red and 100 Luna Moths in blue. We can see that the species make two groupings, but…. there’s some overlap in the middle… so it’s not entirely obvious how to best separate the two. That’s what machine learning algorithms do – find optimal separations! I’m just going to eyeball it and say anything less than 45 millimeters in wingspan is likely to be an Emperor Moth. We can add another division that says additionally mass must be less than .75 in order for our guess to be Emperor Moth. These lines that chop up the decision space are called decision boundaries. If we look closely at our data, we can see that 86 emperor moths would correctly end up inside the emperor decision region, but 14 would end up incorrectly in luna moth territory. On the other hand, 82 luna moths would be correct, with 18 falling onto the wrong side. A table, like this, showing where a classifier gets things right and wrong is called a confusion matrix... which probably should have also been the title of the last two movies in the Matrix Trilogy! Notice that there’s no way for us to draw lines that give us 100% accuracy. If we lower our wingspan decision boundary, we misclassify more Emperor moths as Lunas. If we raise it, we misclassify more Luna moths. 
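The counts quoted above can be arranged into a confusion matrix directly. The following is a minimal sketch that uses only those four numbers; the dictionary layout and the helper name `per_class_recall` are illustrative assumptions, not something shown in the video.

```python
# Confusion matrix built from the counts quoted in the captions.
# Outer key: the actual species. Inner key: the species the classifier predicted.
confusion = {
    "emperor": {"emperor": 86, "luna": 14},
    "luna": {"emperor": 18, "luna": 82},
}


def per_class_recall(matrix):
    """Fraction of each actual species that was labeled correctly."""
    return {
        actual: row[actual] / sum(row.values())
        for actual, row in matrix.items()
    }


recall = per_class_recall(confusion)

# Correct predictions sit on the diagonal (actual == predicted).
correct = sum(confusion[species][species] for species in confusion)
total = sum(sum(row.values()) for row in confusion.values())
wrong = total - correct

print(recall)
print(correct, wrong)  # 168 32
```

Reading along a row shows where the moths of one species ended up; the diagonal entries are the correct classifications.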
The job of machine learning algorithms, at a high level, is to maximize correct classifications while minimizing errors. On our training data, we get 168 moths correct, and 32 moths wrong, for an average classification accuracy of 84%. Now, using these decision boundaries, if we go out into the forest and encounter an unknown moth, we can measure its features and plot it onto our decision space. This is unlabeled data. Our decision boundaries offer a guess as to what species the moth is. In this case, we’d predict it’s a Luna Moth. This simple approach, of dividing the decision space up into boxes, can be represented by what’s called a decision tree, which would look like this pictorially or could be written in code using If-Statements, like this. A machine learning algorithm that produces decision trees needs to choose what features to divide on…and then for each of those features, what values to use for the division. Decision Trees are just one basic example of a machine learning technique. There are hundreds of algorithms in computer science literature today. And more are being published all the time. A few algorithms even use many decision trees working together to make a prediction. Computer scientists smugly call those Forests… because they contain lots of trees. There are also non-tree-based approaches, like Support Vector Machines, which essentially slice up the decision space using arbitrary lines. And these don’t have to be straight lines; they can be polynomials or some other fancy mathematical function. Like before, it’s the machine learning algorithm's job to figure out the best lines to provide the most accurate decision boundaries. So far, my examples have only had two features, which is easy enough for a human to figure out. If we add a third feature, let’s say, length of antennae, then our 2D lines become 3D planes, creating decision boundaries in three dimensions. These planes don’t have to be straight either. 
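The on-screen code is not part of the captions, so here is a hedged reconstruction of the decision tree as if-statements. It uses the two eyeballed boundaries described earlier (wingspan under 45 millimeters, mass under 0.75). The function name is an assumption, and the captions do not state the unit of mass.

```python
def classify_moth(wingspan_mm, mass):
    """Hand-built decision tree using the two eyeballed decision boundaries.

    wingspan_mm: wingspan in millimeters.
    mass: mass in the same (unstated) unit as the 0.75 threshold in the captions.
    """
    if wingspan_mm < 45:
        if mass < 0.75:
            # Inside the box bounded by both decision boundaries.
            return "Emperor Moth"
    # Everything outside that box falls in Luna Moth territory.
    return "Luna Moth"


# Hypothetical measurements, chosen only to exercise both branches.
print(classify_moth(40, 0.5))   # Emperor Moth
print(classify_moth(60, 1.0))   # Luna Moth
```

A learning algorithm that builds decision trees would choose the features and the threshold values (here 45 and 0.75) from the labeled data rather than having them fixed by hand.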
Plus, a truly useful classifier would contend with many different moth species. Now I think you’d agree this is getting too complicated to figure out by hand… But even this is a very basic example – just three features and five moth species. We can still show it in this 3D scatter plot. Unfortunately, there’s no good way to visualize four features at once, or twenty features, let alone hundreds or even thousands of features. But that’s what many real-world machine learning problems face. Can YOU imagine trying to figure out the equation for a hyperplane rippling through a thousand-dimensional decision space? Probably not, but computers, with clever machine learning algorithms, can… and they do, all day long, on computers at places like Google, Facebook, Microsoft and Amazon. Techniques like Decision Trees and Support Vector Machines are strongly rooted in the field of statistics, which has dealt with making confident decisions, using data, long before computers ever existed. There’s a very large class of widely used statistical machine learning techniques, but there are also some approaches with no origins in statistics. Most notable are artificial neural networks, which were inspired by neurons in our brains! For a primer of biological neurons, check out our three-part overview here, but basically neurons are cells that process and transmit messages using electrical and chemical signals. They take one or more inputs from other cells, process those signals, and then emit their own signal. These form into huge interconnected networks that are able to process complex information. Just like your brain watching this YouTube video. Artificial Neurons are very similar. Each takes a series of inputs, combines them, and emits a signal. Rather than being electrical or chemical signals, artificial neurons take numbers in, and spit numbers out. They are organized into layers that are connected by links, forming a network of neurons, hence the name. 
Let’s return to our moth example to see how neural nets can be used for classification. Our first layer – the input layer – provides data from a single moth needing classification. Again, we’ll use mass and wingspan. At the other end, we have an output layer, with two neurons: one for Emperor Moth and another for Luna Moth. The most excited neuron will be our classification decision. In between, we have a hidden layer, that transforms our inputs into outputs, and does the hard work of classification. To see how this is done, let’s zoom into one neuron in the hidden layer. The first thing a neuron does is multiply each of its inputs by a specific weight, let’s say 2.8 for its first input, and .1 for its second input. Then, it sums these weighted inputs together, which in this case is a grand total of 9.74. The neuron then applies a bias to this result - in other words, it adds or subtracts a fixed value, for example, minus six, for a new value of 3.74. These bias and input weights are initially set to random values when a neural network is created. Then, an algorithm goes in, and starts tweaking all those values to train the neural network, using labeled data for training and testing. This happens over many iterations, gradually improving accuracy – a process very much like human learning. Finally, neurons have an activation function, also called a transfer function, that gets applied to the output, performing a final mathematical modification to the result. For example, limiting the value to a range from negative one to positive one, or setting any negative values to 0. We’ll use a linear transfer function that passes the value through unchanged, so 3.74 stays as 3.74. So for our example neuron, given the inputs .55 and 82, the output would be 3.74. 
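The arithmetic for this single neuron can be checked with a short sketch. The weights (2.8 and 0.1), the bias (minus six) and the inputs (0.55 and 82) come from the captions; the function names are illustrative assumptions. The two extra activation functions are common examples of the behaviors described (squashing into the range negative one to positive one, and zeroing out negatives); the captions do not name them.

```python
import math


def linear(x):
    """Linear transfer function: passes the value through unchanged."""
    return x


def tanh(x):
    """One common function that limits the value to between -1 and 1."""
    return math.tanh(x)


def relu(x):
    """One common function that sets any negative value to 0."""
    return max(0.0, x)


def neuron_output(inputs, weights, bias, activation=linear):
    """Weight each input, sum, add the bias, then apply the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# 2.8 * 0.55 + 0.1 * 82 = 9.74, and 9.74 - 6 = 3.74
output = neuron_output(inputs=[0.55, 82], weights=[2.8, 0.1], bias=-6)
print(round(output, 2))  # 3.74
```

The result is rounded for display because floating-point arithmetic gives a value very close to, but not necessarily exactly, 3.74.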
This is just one neuron, but this process of weighting, summing, biasing and applying an activation function is computed for all neurons in a layer, and the values propagate forward in the network, one layer at a time. In this example, the output neuron with the highest value is our decision: Luna Moth. Importantly, the hidden layer doesn’t have to be just one layer… it can be many layers deep. This is where the term deep learning comes from. Training these more complicated networks takes a lot more computation and data. Despite the fact that neural networks were invented over fifty years ago, deep neural nets have only been practical very recently, thanks to powerful processors, but even more so, wicked fast GPUs. So, thank you gamers for being so demanding about silky smooth framerates! A couple of years ago, Google and Facebook demonstrated deep neural nets that could find faces in photos as well as humans – and humans are really good at this! It was a huge milestone. Now deep neural nets are driving cars, translating human speech, diagnosing medical conditions and much more. These algorithms are very sophisticated, but it’s less clear if they should be described as “intelligent”. They can really only do one thing like classify moths, find faces, or translate languages. This type of AI is called Weak AI or Narrow AI. It’s only intelligent at specific tasks. But that doesn’t mean it’s not useful; I mean medical devices that can make diagnoses, and cars that can drive themselves are amazing! But do we need those computers to compose music and look up delicious recipes in their free time? Probably not. Although that would be kinda cool. Truly general-purpose AI, one as smart and well-rounded as a human, is called Strong AI. No one has demonstrated anything close to human-level artificial intelligence yet. 
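Forward propagation, one layer at a time, can be sketched as follows. Only the first hidden neuron (weights 2.8 and 0.1, bias minus six) and the inputs 0.55 and 82 come from the captions. Every other weight and bias, the hidden layer size of three, and the choice of activation functions are made-up values for illustration; a trained network would learn its own.

```python
def relu(x):
    """Activation that sets negative values to 0."""
    return max(0.0, x)


def identity(x):
    """Linear activation: passes the value through unchanged."""
    return x


def layer_forward(inputs, weights, biases, activation):
    """Compute every neuron in one layer: weight, sum, add bias, activate."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical parameters (only the first hidden neuron matches the captions).
hidden_weights = [[2.8, 0.1], [-1.0, 0.05], [0.5, -0.02]]
hidden_biases = [-6.0, 1.0, 0.5]
output_weights = [[0.3, -0.2, 0.1], [-0.1, 0.4, 0.2]]
output_biases = [0.0, 0.0]
labels = ["Emperor Moth", "Luna Moth"]


def classify(features):
    """Propagate features through the hidden layer, then the output layer."""
    hidden = layer_forward(features, hidden_weights, hidden_biases, relu)
    scores = layer_forward(hidden, output_weights, output_biases, identity)
    # The output neuron with the highest value is the decision.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores


label, scores = classify([0.55, 82])
print(label, scores)
```

With these invented parameters the Luna Moth output neuron happens to score highest, which matches the decision described above, but that agreement is by construction of the example and says nothing about the network shown in the video.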
Some argue it’s impossible, but many people point to the explosion of digitized knowledge – like Wikipedia articles, web pages, and YouTube videos – as the perfect kindling for Strong AI. Although you can only watch a maximum of 24 hours of YouTube a day, a computer can watch millions of hours. For example, IBM’s Watson consults and synthesizes information from 200 million pages of content, including the full text of Wikipedia. While not a Strong AI, Watson is pretty smart, and it crushed its human competition in Jeopardy way back in 2011. Not only can AIs gobble up huge volumes of information, but they can also learn over time, often much faster than humans. In 2016, Google debuted AlphaGo, a Narrow AI that plays the fiendishly complicated board game Go. One of the ways it got so good, and able to beat the very best human players, was by playing clones of itself millions and millions of times. It learned what worked and what didn’t, and along the way, discovered successful strategies all by itself. This is called Reinforcement Learning, and it’s a super powerful approach. In fact, it’s very similar to how humans learn. People don’t just magically acquire the ability to walk... it takes thousands of hours of trial and error to figure it out. Computers are now on the cusp of learning by trial and error, and for many narrow problems, reinforcement learning is already widely used. What will be interesting to see is if these types of learning techniques can be applied more broadly, to create human-like, Strong AIs that learn much like how kids learn, but at super accelerated rates. If that happens, there are some pretty big changes in store for humanity – a topic we’ll revisit later. Thanks for watching. See you next week.

Info

Channel: CrashCourse

Views: 682,886

Rating: 4.940207 out of 5

Keywords: John Green, Hank Green, vlogbrothers, Crash Course, crashcourse, education, computers, computing, computer science, compsci, machine learning, artificial intelligence, ai, deep learning, neural networks, ibm, watson, google, alpha go, siri, alexa, google assistant, self-driving cars, autonomous cars

Id: z-EtmaFJieY

Length: 11min 51sec (711 seconds)

Published: Wed Nov 01 2017