Visual Intro to Machine Learning and Deep Learning

Captions
Hello everybody, welcome. My name is Jay Alammar. I work in venture capital, and I'm a software engineer by training. For the last five years I've been trying to learn as much as I can about machine learning, and I found the best way for me to learn was to write and try to explain concepts, and some of that writing seemed to resonate with people. A lot of the time when I read about machine learning, I feel that it's a little intimidating, and I find that this is the experience of a lot of other people as well. So over the last five years I've been trying to break down concepts into narratives that don't require you to become an expert in calculus or statistics or even programming, because I think sometimes just grasping some of the underlying intuitions can help you build the confidence to continue learning about the topic you want.

It's a stretch to say you'll learn everything you need to know about machine learning and deep learning in this session, but what I aim to do is give you some of the main key concepts that you will come across as you dive deeper into machine learning, to guide your way a little bit: to illuminate the way stations on your way to the top of this mountain that we can call machine learning and deep learning.

So why learn about machine learning? For me, the first reason is that it's super interesting, and one of the best demos I can use to showcase this is a tech demo of a product that came out in 2010. I'd been working as a software person for ten years when this came out, and it absolutely blew me away; I did not know technology was able to do something like this. You get the idea: this is probably the most impressive tech demo I've seen in all my life. This was running offline on an iPhone 4, not connected to the internet; it was doing all this image processing and machine translation on the device. The people who create stuff like this are wizards. When I saw it I thought: I did not study any machine learning in college, but whenever the next chance comes by where I can start to learn about machine learning, I will do that. That chance came in 2015, when TensorFlow was open-sourced, and I thought, okay, this is the time for me to do it.

Language is one of the most interesting applications of machine learning to me, and I talk a lot about it. But I want to show you another demo, which came out one month ago, that's an extension of this. The original app was called Word Lens; Google bought the company, folded the team into Google Translate, and it evolved, you could say, into this feature of Google Assistant called interpreter mode:

"Hi there, how can I help you?" "Hola." "Sorry, I only speak English, but you can select the language you speak." "Español." "Hey Google, be my interpreter." "What language should I interpret to?" "Spanish." "Okay, go ahead. How can I help you?" "I'm looking for a place to have lunch before going to the airport." "What would you like, pizza or salad?" "There's a great place around the corner. You can take the train from there to the airport."

You get the idea; the video goes on for another twenty seconds. Two ideas from science fiction come to my mind when I see something like this.
The first is the Arthur C. Clarke quote that any sufficiently advanced technology is indistinguishable from magic, and this feels like magic to me. And if you've read The Hitchhiker's Guide to the Galaxy, there's the Babel fish, a future technology where you insert a fish into your ear and it translates for you, and here we have a live feature demonstrating exactly that.

So machine learning is super interesting and exciting, but it's also very important. It's going to change how every one of us, our relatives, and our co-workers do our jobs. Automation is happening on a large scale, and it's only going to happen more and more in the future. For me this was one of the motivations to be on the forefront and really understand what is happening, because it will have a direct impact on my livelihood and the livelihoods of everybody around me.

In venture capital, this is one of the main slides, or ideas, that factor into the saying that software is eating the world. This is a figure of the largest 20 companies in the US in 1995, where the largest company is at one hundred percent and then you have the next 19. At that time there were only two technology companies, two software companies: the "Wintel" pair, Microsoft and Intel; you can see them in the center. Twenty years later, the leading five are all technology companies, the GAFA group of Google, Apple, Facebook, and Amazon, and you still have Intel in there; but every other company is also working on and developing software. So software is eating the world, and machine learning is, let's say, the latest suite of methods that enables software to eat the world and eat every industry.

Jobs have disappeared over the years. We used to get into an elevator and there was a person who would drive that elevator; no more. There were suites and offices and floors of accountants and clerks whose jobs are now automated by, say, a spreadsheet. You used to have to talk to a person to reach somebody over the phone; you don't need to do that anymore. So technology, even before machine learning, has been automating and changing the nature of jobs. But that's not always negative; some jobs were never meant for humans, so there are positive implications here as well.

Machine learning has so many different applications that you can spend an entire lifetime learning about it and never find something with much practicality in the real world. So when I wanted to learn machine learning, I wanted applications that have some relation to commerce, things that would have business applications. I asked: what are the machine learning applications with the most commercial use, using dollar amount as a proxy for how important a method is? After looking into it myself for a while, I can tell you, and I think a lot of you have found the same, that there is one application we can call the most commonly used across all commercial applications of machine learning, and that is the concept of prediction. We can think of prediction as a model that takes in a numeric input and gives out a prediction, another number. We can think of this as a simple machine learning model, and that is our first concept; we're going to run through about ten concepts. I call prediction by different names here,
estimating and calculating, because this kind of prediction does not always have to be about things in the future; we just use that word, but you can use estimation or calculation interchangeably. Predicting values based on patterns in other existing values is the most commonly used application of machine learning in practice.

I had a quote about magic, but machine learning is not magic. Let's look at an example. Say three people walk into an ice cream shop: how much would their collective tab be? This is the kind of question you will not find an answer to in a business book, but one way we can try to solve it is by looking at data. We say: let's look at the last three purchases, how many people were in each group, and how much they paid. We had a group of one person who paid ten dollars for ice cream, a group of two people who paid twenty dollars, and a group of four people who paid forty. Now, we have never seen a group of three people. Is there something we can learn from this data set that can give us some sort of answer for three? How much would that be? Thirty, yes, perfect, thank you.

That is the basic idea behind all the hype of machine learning, the thing you intuitively did just now. But let's put some names on it. What you did is you found a number, a magical number, that maps the relationship between these two columns, and then you used it to make predictions using this feature. This is our lingo, language we can use from the simplest prediction model up to Google Translate and Siri and Alexa. The first column, the green column, is a list of features, and then we have labels, which correspond to the values we want to predict. This is called a labeled data set, and the number is called a weight.

This is probably one of the simplest prediction models; it is also the simplest neural network, a neural network of one weight, 10, that just multiplies the feature and outputs its prediction. We can think about this model as looking like this: feed any input at it, it will multiply it and give you a prediction on the other side. If you leave the talk right now, you can go out into the world claiming that you know machine learning, because this is the basic trick at the heart of it. Everything else is just taking it one step further: how to do this with images, how to do this with text, how to clean the data so you have better models. We'll take a few of these steps, hopefully, so you have a little more context when you dig deeper into it.

This is, let's say, the second concept of the talk: the vocabulary of machine learning. We have features, we have labels (the values we want to predict), we have a model that makes predictions, and we have weights. This is language that will take you from the beginning to the end of predictive models.
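To make that concrete, here is a minimal sketch of the one-weight model in plain Python (the variable and function names are mine, not from the talk):

# A one-weight "neural network": multiply the feature by the weight.
weight = 10  # found from the (1, $10), (2, $20), (4, $40) examples

def predict(group_size):
    # group_size is the feature; the return value is the predicted tab
    return weight * group_size

print(predict(3))  # -> 30, the tab we'd expect for a group of three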
That was an easy example; machine translation is a much harder, more difficult task. It's like the example we looked at in the first video, but here the features are sentences and the labels are also sentences, and we use those to predict, or calculate, translations for sentences we've never seen before. The same language applies; the difference is that the models will be a little different. To the best of my knowledge, for the last two or three years these Transformer architectures of neural networks have been the leading models for natural language processing, so I can say with 95% confidence that whatever they use for interpreter mode is a Transformer. For these tougher, more complex language tasks you would use layers, and we'll talk a little more about that, because that's how you handle the complexity: the relationship between the features column and the labels column here is much more complex than just multiplying by 10. A lot of knowledge and understanding has to go in there if you're to find a model that makes that translation. We'll also talk a little about representation: how do you numerically represent words, sentences, or images? You need to do that if you're going to calculate predictions, because at the end you're just multiplying weights by whatever inputs you get; that's the mechanism, if we're to be very mechanical about what happens inside a neural network.

If you step out of this building today, you're faced with a glorious structure. Who knows what that is? I'm not from London, but somebody named that building for me: that's Westminster Abbey. It houses the remains of some very famous people, one of the most famous being Charles Darwin. There's a quote of his: "I have no faith in anything except actual measurement and the rule of three." So he wasn't big on mathematics. The rule of three is the basic idea that if you have a/b = c/d and you know the values of any three of them, you can work out the fourth. That's a little bit of what we did there: with 10/1 = x/3, x comes out to 30. So we had a data set we were able to solve with this, but this is not really what we're doing with machine learning; we have to take it one step further.

Notice the date there, 1882, the year Darwin died. Three years after that, Darwin's cousin, Sir Francis Galton, saw a problem in Darwin's theory of inheritance. He was looking at how children of tall parents tended to have heights closer to the mean of the population, and shorter parents also tended to have children closer to the population mean. This seemed to be a problem for the theory: genes are being passed down, so why is that happening? To study it, he came up with a figure explaining the relationship. These are the heights of the parents, this is the mean height of the population, and there is this tendency of the children's heights to be a little closer to the population mean than their parents'. With this, we can use this line to make a prediction: if we have parents of a given height, we can use the line to estimate the height of their children. This is the basic idea that he called, and we still use the name, regression. This is the basic trick at the center of a lot of machine learning. We say everything is cutting edge, and that's the name of the track, but the central idea is regression, from 1885.
So this is the data set we looked at: a very clean one, where 1 maps to 10, 2 to 20, and 4 to 40. To make predictions we've drawn this line, whose slope is about 10; that's the weight we have. To make a prediction, say we want to predict for 3: we take 3 on the feature axis, draw a line up to where it meets the prediction line, and read off the value of the ice cream purchase there. It's 30. That's the prediction, and that's how we use a prediction line.

But real data is never that neat and clean. Real data always has noise; it goes up and down, there's measurement error. So with regression, what we say is: our line doesn't have to go through all the different points, it just needs to have the least amount of error. With the prediction line that has the least error, we can make useful predictions using the correlation. That's the third concept: with regression, we can predict numeric values using correlations in the existing data.

I wanted an algorithm for thinking about machine learning as a software engineer, and it goes like this. Is there a value that would be useful for you to predict? Then find features that are correlated with it, and then choose and train (we'll talk about what "train" means) a model that maps the features to the labels with the least amount of error. That's the basic principle of regression and how it applies. At the beginning of my trying to learn machine learning, I really wanted these goggles: how can I turn real-world problems into machine learning problems and solve them with machine learning? This is a general algorithm you can use. And notice that it's correlation; we never talk about causation. All of machine learning, at this moment, is just about correlations between the features and the labels.

We have two example models here; each line is a model. Which one do you think is better? Raise your hand if you think the one on the right is the better model. Okay, maybe 30 people. Raise your hand if you think the one on the left is better. Zero. Perfect, you get the idea. The least amount of error is better, and that's concept number four: models with less error tend to produce better predictions. We talked about the lengths of the errors; the average of those lengths is what's called mean absolute error. More commonly you'll find mean squared error, where we square the errors and then average them, and that's where we get the error value that we try to minimize in the training process.
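As a small sketch of those two error measures in NumPy (the prediction values here are made up for illustration):

import numpy as np

labels      = np.array([10., 20., 40.])   # what the groups actually paid
predictions = np.array([12., 18., 41.])   # what some candidate model predicts

errors = predictions - labels
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error: penalizes big misses more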
We are not doomed to just creating random lines and seeing which one has less error. If we're going to end up learning a little about deep learning, the machine learning algorithm we need to talk about is called gradient descent. This is the algorithm that starts out with weights and then successively improves them, finding a model that makes better predictions. Let's break it into two steps. Step one: pick random weights. Step two: keep changing the weights to decrease the error. Then repeat this ten times, a thousand times, five thousand times; sometimes it runs for days, and some models train for months. This is basically training: when we say we're training a model, it's about finding the best weights to decrease the model's error. So: step one, step two, repeat until your error stops decreasing.

Let's take a closer example. We had that problem where a weight of 10 was a pretty good solution. How do we arrive at that number? Step one: choose a random weight, starting from anywhere; let's say 2. Then we calculate three things: the predictions, the error, and something called the gradient. Based on these calculations, specifically the gradient, we get a mathematical signal that tells us: if you want to reduce the error, you'd better nudge this number a little bit up or a little bit down. So we update our weight. The signal we got says increase it, so we increase it a little; our weight is now up to 5 from 2. Then step two: with the new weight, we do the exact same thing, calculate the predictions, the error, and the gradient, and update the weight, which is now 10. We keep doing this over and over again until the error stops decreasing, or improves by less than a certain threshold. This is how this simple model is trained, and this is how Google Translate is trained.

This mathematical signal comes from another person who rests in Westminster Abbey. This is a picture, from a book from 1915, of this person's grave, and this is how it looks today if you were to go and see it. Can anybody guess who that is? That is Isaac Newton, exactly. So this is calculus, and it's 300 meters from where we stand right now. Concept number five is model training: when you hear somebody say "model training," this is all it is, finding the right weights to allow the model to make better predictions, using this simple algorithm.
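Here is a minimal gradient-descent sketch for the one-weight model (the learning rate and step count are my own choices, not from the talk). For a prediction w * x with squared error, the gradient of the error with respect to w is the average of 2 * x * (prediction - label):

import numpy as np

features = np.array([1., 2., 4.])
labels   = np.array([10., 20., 40.])

w = 2.0                # step 1: start from an arbitrary weight
learning_rate = 0.01

for step in range(1000):                        # repeat until error stops improving
    predictions = w * features
    errors = predictions - labels
    gradient = np.mean(2 * features * errors)   # d(MSE)/dw
    w -= learning_rate * gradient               # nudge w against the gradient

print(w)  # converges toward 10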
Let's talk a little bit about tools. Say this is the first step in gradient descent: we have our weight, 2, these are the features we have in our data set, and we know we have our labels here. To calculate a prediction, we just multiply our weight by our features, and we get these predictions. We can do it one by one, but more commonly in machine learning you're multiplying vectors and matrices together, so you calculate everything all at once. These are the predictions that this model with weight 2 would make: a group of one person will probably pay two dollars. We know this is mistaken, but it will improve with time. Now we have our predictions, and we have our actual labels, how much these people actually paid. We just subtract the two, and the result is the amount of error this model has generated, which is another vector. We could take the absolute values and average them, but this is fine for now.

If you had told me five years ago to implement this, I would have written all kinds of loops to multiply the 2 by this array of numbers, and then this array by that array. We have the tools now; we don't need to do it through loops, especially if, like me, you didn't use MATLAB in college. NumPy was the first tool I knew of that could do something like this very conveniently. The way to do this is: we import numpy as np. This is the first tool, a general-purpose tool in the Python ecosystem that a lot of machine learning is built on, and if you end up doing a lot of deep learning, Python is a little bit unavoidable; you can do a bunch with other languages, but it's pretty much the dominant one. We'll use a couple of examples here, nothing too deep, but you can see how convenient this can be. Weight is 2: we just assign a number to a variable. Then we declare the features and the label as NumPy arrays, passing in Python lists. How do we calculate predictions? No looping: you just multiply 2 by the vector. NumPy knows what you want to do through something implicit called broadcasting: it sees that one side is a single value and the other is three rows, and understands that you want to multiply the column element-wise, as if by a column of 2s. That's a clever trick, with some interesting rules, that makes dealing with vectors a lot easier. So we calculated our predictions in only one line, no looping, nothing; extremely convenient. And how do we subtract the two vectors from each other to calculate the error? predictions minus label; that's all there is to it. That's NumPy, the power tool. TensorFlow rests on top of NumPy, so whatever you want to do with machine learning and deep learning, you will always run into NumPy.
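Reconstructed from that walkthrough, a sketch of what the notebook code looks like (the actual published notebook may differ slightly):

import numpy as np

weight = 2                         # a plain Python number
features = np.array([1, 2, 4])     # group sizes
label    = np.array([10, 20, 40])  # what each group actually paid

predictions = weight * features    # broadcasting: [2, 4, 8], no loop needed
error = predictions - label        # [-8, -16, -32]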
The second tool that I think is very important for anybody in machine learning to work with is Jupyter notebooks. There is a URL here for a simple notebook that I've published to GitHub. A Jupyter notebook is basically a way for you to execute code and also document it: it's a JSON file, let's say, with text cells and code cells, and you can download it and run it on your own machine if you have that set up. You can execute each cell in turn; this is the code we just ran through, and if you give a cell the name of a variable, it will output whatever is stored in that variable. There's a link at the top called "Open in Colab," and that's the third and final tool we'll discuss. It's a shortcut so you don't need to install Jupyter and Python and all of these tools on your machine, and we all know how installing environments can take a little too much time sometimes. This is a notebook that runs completely in the cloud, in your browser, no setup: just hit the link, open the notebook, click the blue "Open in Colab" link, sign in with your Google username and password, and then you can execute the cells; Shift+Enter executes a cell, or you can click the play button. That's the third most commonly used tool.

Let's look back at where we've come: we now have five concepts and three of the main tools of machine learning, but we have not talked about applications. We're looking for things that have value in commerce, because these will have value for your company and your job, and we'll talk about maybe four of these applications.

The first one is credit risk scoring. That's asking the question: what credit limit should we grant an applicant? Say this is a bank: how much money should it be comfortable lending to me? We can go back to that algorithm: we have a numeric value we want to predict, and we get a data set that can hopefully help us find an answer. In this case, what could be a useful data set of features and labels? We can say these are previously approved limits of loans that the bank has given before, approved by humans. A feature that is probably correlated with the approved limit is credit score. That's a very simple data set we can use to train a prediction model that grants people loans based just on their credit score. This is very simplistic, but then you build out and build out.

We do what we did before: we graph these, with credit score on the x-axis and the amount of money approved on the y-axis. We've never seen a score of 600 before; what do we do? One thing we've learned so far is to fit a simple line, but if we're using only one weight, we're limiting ourselves to lines that all have to pass through the origin, (0, 0). If we remember the line formula, we don't need to do that; we can get more flexibility by adding a y-intercept. That's what we do by introducing one more weight, so we have two weights that map to a line that does not have to go through (0, 0), and this line is a much better prediction line. This is still regression; in fact it is the very simplest kind, called linear regression, and you can keep taking it to the next level. To predict how big a loan the bank is willing to give me based on my credit score of 600, we do the following. We have two weights, w0 (also called the bias) and w1, and x is the credit score. We apply the line formula: 600 times 27, plus the bias, which in this case is minus 9,000. That gives 7,200, and that's the approved credit limit. The point falls right on the line there.
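A minimal sketch of that computation, using the two weights from the example:

w0 = -9000   # bias / y-intercept
w1 = 27      # slope: approved dollars per credit-score point

def predicted_limit(credit_score):
    # the line formula: w1 * x + w0
    return w1 * credit_score + w0

print(predicted_limit(600))  # -> 7200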
So that's the prediction: linear regression for credit risk scoring, with a very simple single feature column. But is one column good enough? You always hear that you need more and more data to create better predictions, and it would be useful to have another column that says, for example, whether this person paid their previous loans on time or not. We can keep adding more and more features to improve the predictions of our model, and with every column our linear regression model gains more weights. This is concept number six: the more good features we have, the better our model and its predictions can become. Notice the emphasis on good, because you can throw too much data at your model and confuse it, or bias it. We just saw an example in Vincent's previous talk where the data set given to a model produced a racist model, because the data we fed it was inherently racist.

While we're on the topic of banking, let's talk about a second application: fraud detection. Everything in financial technology has to do with fraud detection. The question we ask here is: what is the probability that a specific transaction is fraudulent? Let's see what data set we could use to make this prediction. We have a column of transaction amounts, another column with the merchant code for the specific merchant, and then we have a label. Notice that the label here is a little different: these are all zeros and ones, where zero means non-fraud and one means fraud. These are past transactions that happened on the system and were flagged either as fraud or not fraud, and we want to make this prediction for a new transaction: is it fraud or is it not?

We can do the same thing as before, with one small addition. We'll have weights, and we'll have a model that outputs a numeric value, but then we pass that value through a well-defined mathematical function called the sigmoid, or logistic, function. What it does is squeeze the output into the range between zero and one, and then we can use that as a measure of probability. If we train the model against that data set using a model like this, we can read the output as a probability: the trained model outputs 0.95, so the probability that this transaction is fraudulent, based on all the transactions the model was trained on, is 95%. This is concept number seven. This is what the logistic function looks like: you give it any number, and it maps it to a value between zero and one. It looks like just math, but it helps us squeeze numbers so we can think of them as probabilities, which you will see is very helpful and useful.

This is Stripe; since we talked about fraud detection, this is a little bit of UI showing how fraud detection appears to commercial customers. This is a payment of ten dollars that was blocked due to high risk: it crossed a certain threshold of risk score, and so an application like this flagged it as fraudulent.
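A minimal sketch of that squeezing step (the 0.95 output value is from the talk; the raw score feeding it is made up for illustration):

import numpy as np

def sigmoid(x):
    # the logistic function: squeezes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

raw_score = 2.94           # hypothetical model output for one transaction
print(sigmoid(raw_score))  # ~0.95 -> read as a 95% fraud probability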
Now, I would say the most lucrative machine learning model is probably this next one; it makes more billions than probably any other on the planet. It is how Google makes 85% of its revenue: advertising. This is click prediction, or click-through prediction. We know how Google works: you search for something, you get ads and you get web pages, and if somebody clicks on an ad, Google makes money. So at the very core of Google's business model, they have ads, they have queries, and they have to match the relevant ads to show when somebody searches for a specific query.

Let's take an example. Say we have six campaigns that people have set up on Google AdWords: two for Booking.com, one for London hotels and one for Paris hotels; two Amazon pages, one for phones and one for shoes; and two T-Mobile packages, one post-paid (a contract) and one top-up (prepaid). Say a query comes in: a user has searched for "iphone." Which of these ads should we show them if we want to maximize the probability that somebody clicks, because that's how Google makes money? Just thinking about it ourselves: probably not the first two, because they're not very relevant to an iPhone. It could be the phones, but maybe not the shoes. Somebody searching for an iPhone could want to buy a phone directly, or they could want a phone with a contract and a phone plan with it.

We can flip this into our goggles of how machine learning maps problems. We have features about the query and features about the ad; we input those into a click prediction model; it does the exact same thing as before and outputs a probability through a sigmoid function. That probability here, for example, would be 40%. So what does Google do before showing the results? It says: this is the query; let me score all of my ads against it. The London and Paris hotel ads get one and two percent click probability (and this is a trained model we're talking about; ignore the training process that happens beforehand for now). We have these probabilities for the ads, we select the two highest ones, and we show them. That's how Google made, was it 120 billion dollars in 2018, I think.

We can think about the features like this: these are previous ads shown to previous people, with features about those people, plus columns about the query that was searched, and the label is: did the user click this ad or not? If you have millions and billions of these, you can train models that are very accurate. That's not only how Google makes money; it's also how Facebook makes money, except it's not queries, it's users. Click prediction makes the vast majority of revenue for both of the two tech giants, Google and Facebook. There's a paper from Google about some of the engineering challenges of ad click prediction; it's a very good read, and you can take a picture of the screen and look up the paper if you'd like. It's fascinating, because as straightforward, or simplistic, as I'm trying to make these models sound, from an engineering standpoint this is a fascinating challenge.
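As a sketch of that serving-time step (the 1%, 2%, and 40% values are from the example; the other probabilities are made up for illustration, and the trained model that produced them is assumed):

# Click probabilities a trained model might assign each ad for the query "iphone"
ad_scores = {
    "booking.com - London hotels": 0.01,
    "booking.com - Paris hotels":  0.02,
    "Amazon - phones":             0.40,
    "Amazon - shoes":              0.05,
    "T-Mobile - postpaid plan":    0.25,
    "T-Mobile - prepaid top-up":   0.15,
}

# Show the two ads with the highest predicted click probability
top_two = sorted(ad_scores, key=ad_scores.get, reverse=True)[:2]
print(top_two)  # ['Amazon - phones', 'T-Mobile - postpaid plan']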
So here we are: we now have seven concepts, three tools, and maybe three applications that probably make a few hundred billion dollars a year. Let's talk about one more application, also very lucrative, because the world is not just limited to the tech giants. The question here: if you're a company, let's say a subscription company, and you have a marketing budget, is it a better return on your investment to keep an existing customer or to acquire a new customer? Who says keep an existing customer? Maybe 15 people. Who says get a new customer? Maybe 20. It turns out that keeping an existing customer is about five to ten times cheaper than acquiring a new one, so one of the best marketing activities, in terms of return on investment, is keeping existing customers when you predict they're going to leave the service. This application is called churn prediction: a model to predict when a customer is about to leave the service or not renew a subscription. If you're a phone company anywhere and you have somebody on a contract paying you, say, one or two hundred dollars every month, you'd be wise to pay attention when they start using the service a lot less, maybe using a lot less data, because they're probably transitioning to another provider. In that case you might want a customer service representative to talk to them, see what the problem is, address it, and keep that very delicious subscription revenue coming in. That's what churn prediction is.

This is an interesting UI that I found from a company called Klaviyo, close enough, showing how they visualize churn prediction. It gives you the customer lifetime value of a specific customer who has spent 54 dollars at this store, but it's also predicting that the probability this person has left the service and won't be back is about 96%, and they represent that visually with colors: for the first few months it was yellow, and then it became really high, because this customer hasn't been seen in about six months. Churn prediction is very lucrative for any subscription service or telco; they need this talent, either as individuals or through consulting companies.

How can we think about this problem; what's a general pattern to fit it into what we've discussed so far? We have these five customers, and we have probabilities for their churn, which we have some understanding of how to calculate. Then we set a threshold: say, fifty percent. Anybody over fifty percent I will treat as high probability of churning; lower than 50 and I'll say they won't. That's just a general heuristic. Based on the probabilities and the threshold, these four customers will remain and this one will churn: the model predicts this customer will leave the service. That is a churn prediction model. And with it I snuck the eighth concept up on you, one of the most central ideas in machine learning: classification. If you have a probability score and a threshold, you can do classification.
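A small sketch of that thresholding step (the five probability values are made up for illustration; the talk's example likewise had four customers remain and one churn):

churn_probabilities = [0.08, 0.22, 0.41, 0.96, 0.30]  # one per customer
threshold = 0.5

predictions = ["churn" if p > threshold else "remain"
               for p in churn_probabilities]
print(predictions)  # ['remain', 'remain', 'remain', 'churn', 'remain']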
Classification, in this case, is assigning something one of two classes. For customer data: will this customer churn or remain? That's binary classification. For a transaction: is it fraud or not fraud? That's another classification model. An email message is either spam or not spam. A picture, if you've watched the Silicon Valley show, is either hot dog or not hot dog. With a medical image you can start talking about serious things, like some of the latest research classifying cancer versus not cancer. With text, you can ask whether it talks about a thing positively or negatively, which is sentiment analysis, which is text classification.

With that, let me cover a couple more concepts before we wrap up, which touch on deep learning a little. I might have lied when I said Word Lens is my favorite tech demo of all time; it's probably this one, from two years ago:

"Hi, I'm calling to book a women's haircut for a client. I'm looking for something on May 3rd." "Sure, give me one second." "What time are you looking for?" "Around 12 p.m." "We do not have a 12 p.m. available; the closest we have to that is 1:15." "Do you have anything between 10 a.m. and 12 p.m.?" "Depending on what service she would like. What service is she looking for?" "Just a woman's haircut for now."

This is Google Duplex; I'm sure some of you have seen it. This is a conversation between a human and a machine, and the human does not realize they're talking with a glorified chatbot. I was there at Google I/O. Who knows what the Turing test is? Raise your hand if you think this qualifies as passing the Turing test. I did too: a human talked to a machine and was not able to tell it was a machine. It turns out it doesn't quite; this is a constrained version of the Turing test that this model is able to handle. But it goes to show how machine learning, and natural language processing specifically, is advancing at a ridiculous pace. And this was 2018, only two years ago. This area is one of the most rapidly developing areas of research, and any day now you're going to see something that blows this out of the water.

We can think of Duplex as a model with input and output: you put some words in, you get words out, and you can say the same thing about machine translation models. But what we're oversimplifying here is representation. We can't just pass words, or letters, or ASCII representations, to the model; we have to find some representation that captures the meaning behind the words. This is how these models do it, how Alexa, Siri, and Google Translate do it: the word "king" here is represented by a list of 50 numbers, called an embedding of the word "king." These models represent each word, or each token, as a list of numbers, and you can represent people or sentences or products the same way, as lists of numbers; we'll see how that's done.

To visualize that a little, let me put the numbers in one row and add some color: if they're close to 2 they're red, close to minus 2 they're blue, and close to zero they're white. This is the embedding of "king," this is "man," this is "woman," and you can see a lot of similarity between the embeddings of "man" and "woman." These are the kinds of word embeddings you can get from training a model with an algorithm like word2vec, and just from the similarity between those two rows you can tell that these numbers are capturing some of the meaning behind the words. That's concept number nine; word embeddings are my favorite topic in machine learning, and I gave a talk about them here at QCon last year, which you can find on the blog. Embeddings are basically what we use for language, but we took them out of language and use them for product recommenders. If you have used Airbnb or Spotify or Alibaba or Tinder, these companies have an embedding of you as a user, and they have embeddings of their products: again, just lists of numbers that represent you or the products. And if you multiply any two embeddings together, that gives you a similarity score. That's an incredibly powerful concept that powers a lot of machine translation and product recommendation, and you can read more on the blog.
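A toy sketch of that "multiply two embeddings" idea (the 4-dimensional vectors here are made-up placeholders; real embeddings, like the 50-number one above, are learned):

import numpy as np

# Toy embeddings; real ones come from training, not hand-picked values
king  = np.array([ 0.5,  1.2, -0.3,  0.9])
man   = np.array([ 0.4,  1.0, -0.2,  0.3])
woman = np.array([ 0.3,  1.1, -0.1,  0.4])

# Multiplying two embeddings (a dot product) gives a similarity score:
# the higher the score, the more similar the meanings
print(np.dot(man, woman))
print(np.dot(king, man))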
Our final application and concept is text classification, which builds a little on embeddings. This is a data set of film reviews. It's a labeled data set, but I'm not showing you the labels right now; we'll take a poll. These are all sentences talking about films, and the scores are either one or zero: one if the sentence is positive, zero if it's negative. Who says this first sentence is saying something positive about the film? About 70 percent of you. Who says negative? Nobody, okay. What about the second one: is it positive, is it negative? Okay. And the last one? It's not super clear all the time, right? We think we're better than these models, but when we're labeling these ourselves, as Vincent said, it's not as straightforward as you might think. The actual labels are one, zero, and one: positive, negative, positive. This is an application that machine learning, or deep learning specifically, can help us with, and in the next talk Suzanne will go deeper into everything outside the model, how to collect the data and how to visualize it, doing something like this sentiment analysis using the BERT model, one of the latest cutting-edge natural language processing models.

To use the same lingo from concept number two: this is the input to the model and this is the output; it outputs either one or zero based on a probability score produced by the model. Since this is a very complex task, we can't just calculate it in one go, so these models go through successive layers, and that's why it's called deep learning; the layers are the depth in deep learning. And using concept number nine, the inputs to this model are word embeddings, the specific embeddings of each word in the input. The output is a sentence embedding, a list of numbers (roughly 700 of them; 768 in BERT-base) that captures the entire sentence, which we can then use to train a classifier. That is concept number ten.

You now have the language and vocabulary to think about machine learning and deep learning: you know what features are, what labels are, what embeddings, weights, and layers are. This last slide sums it up. These are some of the most interesting ideas in machine learning, and hopefully when you run into them you will be less intimidated, because trust me, you've got this; the intuition is a lot easier than it might seem if you only look at the math behind it. If you want to do more, I advise you to check out fast.ai, they have beautiful videos, and the Coursera machine learning course is also very good. And you can stick around for Suzanne's talk, where she will go into more depth. Thank you very much.
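As a final sketch of that last step, a logistic-regression-style classifier on top of a sentence embedding (the embedding and weights here are random placeholders; in practice the embedding would come from a model like BERT and the weights from training):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
sentence_embedding = rng.normal(size=768)  # stand-in for a BERT sentence vector
weights = rng.normal(size=768)             # stand-in for trained classifier weights
bias = 0.0

score = sigmoid(sentence_embedding @ weights + bias)  # probability of "positive"
label = 1 if score > 0.5 else 0                       # 1 = positive, 0 = negative
print(score, label)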
Info
Channel: InfoQ
Views: 6,288
Rating: 4.8608694 out of 5
Keywords: Machine Learning, Deep Learning, Data Science, InfoQ, QCon, QCon London, Transcripts
Id: TTEEkMRZAC8
Length: 54min 30sec (3270 seconds)
Published: Mon Aug 17 2020