Lecture 1 "Supervised Learning Setup" -Cornell CS4780 Machine Learning for Decision Making SP17

Captions
Welcome, everybody. Let me remind you of the no-laptop rule: please, no laptops, no handhelds in the lecture theater. Thank you. All right, and let me point out that I got it to work. It turns out the new MacBook Pros have a known problem: they don't work with all projectors. But this is VGA, so this seems to be working. All right, this is good.

One quick thing. I was told that a lot of people like Python; my TAs said so, and apparently they were right. It was close. But one thing I forgot to mention: Julia has the same syntax as MATLAB, so if you know MATLAB, you're all set with Julia. Does that change things? OK, who's in favor of Julia now? Oh yeah, all right. And in comparison, Python? All right, it is still a tie! Well, at least there's some excitement about Julia now. I spent all Christmas vacation on it.

OK, good. Where we left off last time was: what is machine learning? I don't know if you recognize this guy in the background. This was actually my motivation to go into machine learning: I wanted to have my own Terminator to beat up my brother. That was when I was 15 or something ...
you know, still working on it. OK, good. So let me give you a high-level overview of my view of what machine learning is. You all know traditional computer science, or at least something about it. In traditional computer science you have a computer, which is this thing in the middle, and you have some data, which is the input. The data could, for example, be an MP3 file, or some music file or video file, and you want to generate some output. Where the programmer comes in is that the programmer writes a program that takes the data as input and generates the output from it.

So if you want to play an MP3 file, given that you've taken a programming course before, how would you do this? You would look up the specification of an MP3 file on Wikipedia, and then you would sit down over a long weekend and write something, in Python or in C, that takes the file, decodes it, generates the right waveforms, and so on. It's not a big deal: after a couple of days you will have a program that takes that data and generates the right output. That's fine. That's not the kind of problem that I am confronted with as a machine learning person.

The kind of problem that I get is something like this. Someone comes to me from the med school and says: here's an fMRI scan of this person, and I would like to know whether this person has Alzheimer's, yes or no. And the problem is, there's no Wikipedia page to look up how to extract that information from an fMRI scan. So if you're the programmer, at this point you're not too happy. But it turns out it's a very specific type of problem. What I can say is: we can't help you with this particular image yet, but there's something else. I can ask the physician to go back and look at the past files, at fMRI images that he
or she collected maybe five years ago. Within the last five years it has actually become apparent whether the person has Alzheimer's or not. So you can take the past images, which you took from people a long time ago, and annotate them now: it turns out this one did have Alzheimer's, and this one didn't. So essentially I can collect data for which I know what output I want. I say: for this person the answer should be yes, he has Alzheimer's; here's a person who does not have Alzheimer's; et cetera.

And that's exactly when we can do machine learning. Machine learning basically turns things upside down. As input, you put in the data and the output that you would like to have for that data. The machine learning algorithm then crunches this data and outputs a program which, if it were given that data as input, would generate that output. Does that make sense? In some sense we're replacing the programmer. This program can sometimes literally be C code. Often it's not; often it's just some numbers, because that's more efficient, but you can convert it into C or whatever programming language you like. It's usually not very pretty, it's very long and cryptic, but you could go through it and it makes sense.

And then, of course, we do the obvious thing. We go back to the doctor who came to me in the first place. Now I have this program, so I put in the data point that he or she wanted me to analyze, and the program generates an output, and that is my prediction. We call this part here training and this part here testing. Any questions? We will go through this in much, much more detail in the future.

By the way, just one warning: these are the last PowerPoint slides I will give. After this, everything will be on the blackboard. So, I suppose, initially, when
people came up with this, it seemed like a crazy idea, right? Can you even do this? The computer basically programs itself; it seems nuts. But it turns out, over the years, that it really, really works. And what's fascinating is that you can take programs that used to be a lot of work to write, and now you can generate them automatically. But the most fascinating part is that you can generate programs that you wouldn't even begin to know how to write.

Machine learning as an idea actually dates back further, but the first formal definition that I am aware of comes from Tom Mitchell in 1997. Let me write this out: "A computer program A is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." Got that? The short version is: algorithms that improve on some task with experience. That's the definition without the E and P, and it's very simple. Where is something improving? If you go back here: if I give this more data, I can generate a better program, and I would do better. So that's the idea: the more data I give it, the better the program is, and that's why we call it learning.

OK. Before we dive into how to do this, let's walk down memory lane, and let me give you a little bit of history of how this all started and where it came from. Arguably, and there is some dispute, the first algorithm that learned from experience was Samuel's checkers player in 1952. It wasn't really what we would consider machine learning now, but it was the first time someone came up with the idea to
actually learn: it basically stores moves in a database, and as it plays more, it eventually gets better. Checkers has since become a solved problem, but it was a benchmark for AI. At this time there was no mention of machine learning; machine learning did not exist yet. The field was at first called artificial intelligence, and eventually machine learning emerged out of it.

Checkers was a benchmark task for AI for a while, because there was a world champion of checkers who was amazing. He was probably the best human ever to play; there will probably never be someone who plays checkers that well. He was undefeated for over a decade, and no one was ever close to him. The big dream of the AI field was to finally beat this guy. They made many attempts, and he always beat the computer. Eventually, I forget which year exactly it was, they finally had a program that beat him, and he died very shortly thereafter. So I looked into this, and it turns out it's not a coincidence that he died very shortly thereafter. He had terminal cancer. He was already on his deathbed; he was in a hospital, with a few days left, and someone rushed in and said, play this computer program. And when they beat him, it was like, oh yeah, take this, and then he died. I don't know, I feel like that's a dark point in the history of AI. They could have let him die undefeated. Anyway, checkers is now a solved problem: for any given position the optimal move has been worked out, and with perfect play the game is a draw. So it's no longer a challenge. It's not a game in that sense.

What I would call the first machine learning algorithm actually comes from Cornell, in 1957, from Frank Rosenblatt. He invented the perceptron, which is something we will talk
about in detail; I will devote a whole lecture to it. This is an algorithm that is still used today and was really revolutionary at the time, and ultimately it is what an artificial neural network, or deep learning, is today. There's actually very little difference from what he invented back then. His insight was basically ninety percent of the way there. What he was really missing was computational power; he did not have the large computers that we have nowadays. So he was really a visionary, and this is an ingenious algorithm that we will talk about very soon.

One thing you have to keep in mind, and I will not talk about this too much, is that he could only train very, very small networks, because he was very constrained by the speed of computers. This was 1957. Think about it: probably all the computing power in the entire world was less than your phone. So what he mostly analyzed was the single-layer perceptron, which is very limited.

But when this came out there was huge excitement about AI, because this was something new. You had algorithms that could learn from experience; they could play checkers, to some degree, and so on. People were really excited. There was a lot of funding for research, and generals were already dreaming of armies of robots. I mean, that's what generals do, right? They dream about this; we have nightmares about that kind of stuff. But there was a limit to it, and what happened was something called the AI winter. This was the AI boom, the first AI boom: people were really excited about AI, everybody was talking about it, there were articles in the newspapers, and so on. And then came a book by Minsky and
Papert. I don't think they had the intention to kill AI, but they did. They analyzed these algorithms and came up with a very simple data set that everybody could understand: the XOR data set. It's so simple you can draw it in two dimensions. It's just four points, two pluses and two circles. And they showed that the single-layer perceptron can never learn to distinguish between them. It seemed like such a simple thing that everybody understood it: if they can't even learn this, how can they ever become killer robots? There's a disconnect here. So what happened is that funding for AI collapsed drastically, in the United States and in most countries around the world. This is called the AI winter, and people still refer to it. People got really disillusioned.

So what people did is they said: wait a second, we don't do AI, we do machine learning. It's totally different, and you should fund it. It was the same people. That's the sarcastic way of putting it, but it's kind of where it came from. To some degree, though, when a research area dies, there's also a moment for new approaches that before weren't taken seriously because they weren't part of the way people thought. Suddenly new approaches were taken seriously, and that is essentially how machine learning took its place. So on the one hand, if you wanted to get funding, you had to say you were doing machine learning. But on the other hand there was also a real difference, and I would say the difference is twofold. Number one, AI was very
focused on the human; AI basically said, we want to build something like a human. It was very focused on problems that humans think are very hard, and on trying to solve them with algorithms. Machine learning is more the bottom-up approach. You start with a computer, and it's basically what I told you earlier: you try to find a program, to learn something from data. Machine learning per se does not really have the ambition to create humans. It's something like a job, and it can be a very different approach. And that's very freeing, because you can do things where you say, well, that's clearly not what's going on in the brain, but it works really well on the kind of computer hardware that we have, and computer hardware is very different from the human brain.

Another thing that's very important is that machine learning focuses on statistics and optimization, not logic. Much of computer science is traditionally very entrenched in logic. The way you argue about most algorithms is through logic, so that was a natural choice for thinking about AI. If you look at old movies like Star Wars, the way AI is presented is as completely logical beings. C-3PO is always logical, because that's how people thought AI would be. Actually, to be honest, if you read these old papers, some people at the time thought that humans are completely logical, that everything in the brain is logic. Obviously they didn't have many friends. But it seems like ultimately the wrong approach. At some point people invented fuzzy logic, because they realized you have to have uncertainty. Sometimes we just have beliefs about something; you don't have a clear truth. It's, you
know, people really made statements like: is the light on, yes or no? If the light is on, then I don't do anything; if the light is off, then I switch the light on. But sometimes you're not sure about something, and the right approach to that is really statistics. Statisticians have dealt with uncertainty for generations, and before, that wasn't really taken into account. I'm not saying old AI used no statistics at all, but machine learning was definitely a much more statistics- and optimization-based approach.

So there's a very interesting point in 1994, which I would say was a pivotal point, because it was the first time people really realized how powerful the idea is, that it really works. Sometimes researchers are working on things in their own bubble, but occasionally people around the world notice what's going on, and one of those moments was in 1994, when Gerry Tesauro of IBM wrote a backgammon player. I don't know if people have heard of backgammon. It's a very popular game around the world, in almost every country except the United States, where people don't play it. It has a huge following in a large part of the world; it's very competitive, there are world championships, et cetera.

What Gerry did is he wrote a program that basically just checks whether a move is legal, and that's easy. At the time, the way people thought game playing works was to use the minimax algorithm. Those of you who have taken AI know how this works. If you play checkers, say, you go through every possible move that you could make, and you say: if I made that move, what would my opponent do? And if the opponent made that move, what would I do? I would try this move, et cetera. And you
always flip the board, and at the end you ask who looks better now. If my opponent looks better, that's a bad starting position, and if I look better, then it may be good. He did something else instead. He took an artificial neural network, which is basically what Frank Rosenblatt invented in 1957, and he let the neural network make the decision about which move to choose. So he said: here's a list of all possible legal moves, and the neural network chooses which one to play. When he did this, the algorithm made totally random moves. Clearly that didn't work.

But then he did something really cool. He took the network and let the program play against itself. So you had two copies of the same program, and he let them play against each other. At the end one of them wins. They make pretty random moves, but one of them wins, and you tell that one, what you did was good, and the other one, you lost, what you did was bad. So basically you reinforce the network: when you get positive feedback you try to do more of that, and when you get negative feedback you try to do less of that. So he had that feedback loop, and he just let it run. He went home and let the thing play against itself overnight. The next morning he comes back, takes the program, tries to play against it, and he loses. The program beats him. Overnight, just by playing against itself, with no other input, it had learned to play better than he did.

So he goes around the department, around his office, and challenges other people: play my program. And it beats every single one of them. So he gets really excited, and what he does is call the organization that runs the world championship in backgammon, and he says: I have this program, and it can play backgammon really, really well, and I would like
to challenge the world champion. They were curious, and the world champion was interested, so they made an appointment and played against each other. In the meantime, by the way, he kept letting the program play against itself, a couple hundred thousand times, a million times. Then the two had the face-off, and the program won. That took people completely by surprise. People went up to him afterwards and said, that's amazing, how does the program do it? And he said: I don't know, it learned that by itself.

In fact, Gerry was not such a good backgammon player himself. I talked with him about it. He said that afterwards, because he had to talk so much about backgammon, he started to get really into it, but at the time he really wasn't a good player at all. The program was much, much better than he was, without him putting any backgammon knowledge into it. And the amazing thing is that during that face-off with the world champion, his program made a very interesting opening move that the world champion thought was crazy, not a good move at all. But it turned out it was actually very good. So the world champion asked for a copy of the program and started analyzing it, and realized that this opening had been undervalued. It was a new opening that the program had discovered, and it's now in the backgammon books.

After this happened, his bosses at IBM were really excited: this was awesome, but why did you choose backgammon? No one cares about backgammon. So they said, let's do chess. This time they put a big team together, with a lot of resources, and they challenged Garry Kasparov, the world champion at the time and probably one of
the best players in the history of humankind. There were two matches against him. The first time Garry Kasparov won; they challenged him again, and the second time Deep Blue won. In fairness, this was less about learning approaches and more of a traditional minimax AI approach, because they played it really safe, but the part that evaluates whether a position is good was learned.

Now machine learning is everywhere, and you may not even know it. This here is a search engine; it's Yahoo, I don't know if you've ever seen it. If you think about it, what a search engine does is a completely ridiculous problem. We have billions of web pages, you type in three words, and you expect that, based on these three words, the search engine knows which one of these billions of web pages you want. Clearly that cannot work. So how does it work? The reason is that humans have very predictable expectations of what they want to see, and a lot of web pages are not very interesting. And the way this is actually done is that it's learned. What Google and Yahoo and Bing and so on do is employ hundreds and hundreds of people to judge the search results. Whenever you search for something on Google, there's a small chance it gets to one of these people, and what they do is label web pages: this was a good answer, this was not a good answer. Then they put all of this into a big learning algorithm, and the learning algorithm learns what's a good answer and what's a bad one.

Spam filtering is a great example of something where it's really hard to write the program by hand. It's almost impossible, for two reasons. First, what is a spam email? Well, a spam email is an email that you don't want. But what does that
mean? Well, clearly some people want these emails; otherwise nobody would respond to them. For example, emails about stocks or stock options are spam for me, because I'm not really interested in this, but for someone who's actually in finance they are not. You can also train your spam filter so that whenever you get emails from your mother-in-law, it marks them as spam. Whatever you think is spam: it's a very subjective definition. That's one problem.

The second problem is that the moment you write a spam filter by hand, it gets hard. Take, for example, the word "viagra". For a while there was a lot of Viagra spam, so a very simple rule would be: whenever you have the word "viagra" in the email, it's probably spam. It's very rare that people actually email each other about how much they love Viagra. Maybe some people do. But that doesn't work, because the moment you do this, the spammers will just misspell it. They spell it with a one instead of an "i", or something slightly different, and they still get the message across. It's adversarial behavior: people immediately change the way they are spamming, and now you have to rewrite your whole program. The nice thing about learning is that you just take the new data and retrain, and five minutes later you have a new spam filter that is adapted to the new way spam looks. Or you can do online training, which is the way it usually works: every time you get a spam email you train a little bit, so it always stays up to date.

Another example here is news. Google News actually learns what
you're interested in. Apparently I'm interested in Lindsay Lohan. Well, I guess it's not perfect. So it will learn your preferences and suggest news stories to you.

One thing that a lot of people talk about, and there's a lot of research on it here at Cornell, is self-driving cars. I think a big part of why that is possible now is machine learning. A lot of the sensors have been around before, and there have been some new developments, but the fact that you can learn from data how to drive a car safely really was a game changer in this respect. And that's something where I was surprised, by the way. I remember someone asking me, maybe ten years ago, how far away we are from self-driving cars, and I said it's going to take 50 years. And the next year, basically, people started to drive around. The reason is that you get these sudden leaps: suddenly a lot of little problems get solved, you put them together, and suddenly you can do something big.

And when will it stop? Of course we have this carrot dangling in front of us, and that's the human brain. The human brain really is just a big computer; it's just very different hardware. We know humans are really good at learning, and the principles are very similar, though the actual implementations of the algorithms are very different. But the good thing is, we know we are not even close to that, so we can do much more. Some people claim nowadays that deep learning is better than human brains. I completely disagree. Deep nets are very good at very specific things: if you have a very specific task, you can train an algorithm to be better at it. But humans can do so, so many things.
So that's in some sense good news; it keeps me employed. It means there's still a lot to do, a lot of opportunities. The other thing is that machine learning is now being applied, at breakneck speed, in all those various fields. Ten years ago, when I got my PhD and I talked to people outside of computer science and told them I did machine learning, nobody knew what those words meant. I think they thought I was building machines. It was something completely unknown. And now you tell anybody in biology or chemistry that you do machine learning, and they say, oh, OK, we should talk. So suddenly people in fields that have nothing in common use the same algorithms to analyze their data and make predictions, and that's extremely powerful. It's still very hard to design machine learning algorithms. There are not many people who can design new algorithms that are meaningful and useful; that's a small community. But there are many, many people who can use them.

Oh wait, I have a little more. OK, good. So let me talk about the different types of machine learning there are. Very roughly, there are three types: supervised learning, unsupervised learning, and reinforcement learning. This class is really only about the first one; in this class we will talk about supervised learning. Supervised learning is exactly what I told you earlier with the fMRI images. I give you data, and I know the answer for that data, and I would like to have a function that goes from the data to that answer, to that output. Examples are spam filters, search engines, et cetera. Unsupervised learning is when I give you data but I have no labels. I just tell you: here are some MRI images, do you think there's something interesting going on here?
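The difference between supervised and unsupervised data can be sketched in a few lines of Python. This is only an illustration and not something shown in the lecture: the two-number "scans", the labels, and the helper names `squared_distance` and `train_nearest_neighbor` are all made up, and nearest-neighbor lookup is just a stand-in for whatever learning algorithm would really be used.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_nearest_neighbor(
    labeled: List[Tuple[Vector, int]]
) -> Callable[[Vector], int]:
    """Training step: take (input, label) pairs and return a predictor.

    The returned function is the "program" produced from data. Here it
    simply copies the label of the closest training input (an assumed,
    deliberately simple stand-in for a real learning algorithm).
    """
    def predict(x: Vector) -> int:
        _, label = min(labeled, key=lambda pair: squared_distance(pair[0], x))
        return label

    return predict


# Supervised data: every input comes with a known answer (1 = yes, 0 = no).
# The numbers are hypothetical two-feature summaries of a scan.
labeled_scans = [
    ([0.9, 0.8], 1),
    ([0.1, 0.2], 0),
    ([0.8, 0.7], 1),
    ([0.2, 0.1], 0),
]

predict = train_nearest_neighbor(labeled_scans)  # training
new_scan = [0.85, 0.9]
print(predict(new_scan))  # testing: the closest labeled scan has label 1

# Unsupervised data: the same kind of inputs, but with no labels attached.
# There is no "answer" to learn a function toward, only structure to look for.
unlabeled_scans = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.7], [0.2, 0.1]]
```

The point of the sketch is only the shape of the data: in the supervised case the learner receives pairs and returns a function from input to label, while in the unsupervised case there is nothing to return a function toward.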
All right, maybe you can find some patterns that repeat, or something like this. The other class, Machine Learning for Data Science, mostly covers this; we're alternating these two classes. There's also reinforcement learning. Reinforcement learning is when you do something for a while, and at the end you get feedback that says, that was good, or, that was bad. That's what Gerry Tesauro used for backgammon. That's mostly covered in the AI course and in the robotics course. There's a significant overlap in the methods that we're using; it's just that the way the feedback is incorporated is different.

Apparently machine learning is the number one skill that employers wish for these days. Unfortunately I don't remember where I got this from, so: someone claims that on the internet. But there was a list of 12 IT skills that employers can't say no to. These quotes are a little bit old now, but Bill Gates once said, "A breakthrough in machine learning would be worth ten Microsofts." I'm still waiting for that. "Machine learning is the next Internet." "Machine learning is the next hot thing." And so on. "Web rankings today are mostly a matter of machine learning." That's really very true. "Machine learning is going to result in a real revolution." "Machine learning is today's discontinuity." All right, good. That's the last slide, and from now on we will switch to the blackboard. Any questions about the stuff that I just said?

Good. All right, and while I do this, I can hand out notes. I don't like it when I write something on the blackboard, the entire class writes down what I'm saying, and no one pays attention. So what I'm doing is giving you exactly my notes, so I have no information advantage in some sense. You can just pass these back, and feel free to annotate them. There's a downside: you get my notes in my handwriting, and I apologize right now. In a minute you'll know what
I'm talking about. So, good. All right, we only have 15 minutes today, but let me at least get started. This is the headline of the entire class: the entire class is about supervised learning. What I want to do today is formalize the setup. What's the high-level goal, what are we given, and what are we trying to do? Let me write this down in mathematical notation and give you some basic terminology that people use throughout the course. Make sure you familiarize yourself with this, because this is exactly the stuff we'll be using for the entire class.

On a high level, what supervised learning attempts to do is make predictions from data. The setup is that you're given a data set, which we call D, of data points and their labels. So we're given n pairs (x_i, y_i), where x_i is my data and y_i is the output that I would like to generate. We call y_i the label and x_i the feature vector. In terms of spam filtering, x_i would be the email, and y_i would be whether it is spam or not spam. In terms of the stock market, x_i is some features of a stock, and y_i would mean up or down: I try to predict whether the stock goes up or down.

These data points are sampled from some distribution: the pairs (x_i, y_i) are sampled from a distribution P. P is a distribution that we have no access to. It's the distribution that only God herself knows. It's basically where the data comes from. If I had access to this distribution, everything would be easy, but we don't. There is some probability distribution such that, if I picked someone here at random and took an MRI scan of their brain, I would get an MRI scan, and if I did this repeatedly, there would be some distribution of MRI scans that I get from that process. It's some strange distribution that comes
from Mother Nature, okay, and we observe data from it. In terms of emails, this is the distribution from which emails are sampled. Everybody may have a slightly different one, right? Some people sign up to newsletters, etc., so that changes the distribution of the emails that arrive in your inbox. And so what we would like to do is take data from this particular distribution and learn a function that basically goes from x to y. Okay, any questions?

All right, so with this kind of data there are multiple cases. A common case is a time series, but for now let's assume it's not a time series; for now just assume these are i.i.d. So these are drawn i.i.d. Anyone know what i.i.d. stands for? I think all of you together got it right. Who knows it? Raise your hand. Yep, that's right: independent and identically distributed. So they're all from the same distribution P, and basically knowing one of these won't tell you anything about the other ones if you know the distribution. Okay, good.

So what are these x's and what are these y's? Let me just quantify this. These x's come from a space that I call curly X, and the y's come from a space called curly Y. In this class, curly X is typically a d-dimensional space of real numbers. So basically what we're doing is we take our data and we summarize our data in d numbers. It's a vector that has d numbers, and that summarizes everything we know about this particular data instance. If it's an email, you could for example imagine these are the words of the email or something, and I'll show you in a minute how to do this. For now we assume you can always do this, and it turns out you very, very often can. In fact, if you think about it, if you store your data on the computer, what does the computer do? It has a long file of numbers, right? Well, that's a vector of length d. So it's always possible somehow. All right, and we just call this R^d, and it's the d-dimensional feature space, that's what
it's referred to as, and x_i is basically a vector that represents the i-th data sample, so the i-th email or the i-th MRI scan. And y_i... can you read this in the back? Okay, not so good. You know what, I need bigger chalk. Okay, I'll try to write larger. Remind me tomorrow morning to bring bigger chalk. All right, okay, awesome. So y_i is out of the label space curly Y, and that's basically the answer that you would like to know, for example, is this spam or not spam, etc.

Okay, let me give you some examples for what curly Y can be. One is binary classification. In binary classification, curly Y is either {0, 1}, or you can also write it as {-1, +1}. An example here would be email spam filtering: we basically say -1 is spam and +1 is not spam. So you have exactly two options, and for every email it's one or the other. Another example is face detection: you take an image, is that a face, yes or no? Your camera does this, it detects faces. There's a little window that goes over the image and tries to find a face. This is among the most common settings in machine learning.

Another one is multi-class classification. Here, for curly Y, you have multiple classes. I call this K different classes, so that can be 1, 2, 3, 4, 5. Let's say, for example, one option would be that I take articles on the web and I classify them as either sports or politics or gadget reviews or something like that. So you basically have different categories. For example, in Google News, they ask you: are you interested in entertainment, are you interested in politics, are you interested in tech news or something? And so what they do is they go through the newspapers of the world and they classify each article that just came out as
one of K classes, and class 1 may, for example, be sports and class K is politics. Any questions about this? Raise your hand if you're with me. All right, awesome.

Another one is regression. In regression, curly Y is just the real numbers; that's one way of doing regression. So here you don't differentiate between different categories; you try to hit a specific number. For example, I don't know if people know Zillow. Zillow is a web page where you can actually go to any location in the United States, click on any house, and it will tell you how much that house is worth. That's interesting if you want to buy a house: you can look around and check the neighborhoods, how much different houses cost. But it's a relatively hard task. What they try to do is they basically say: how many bathrooms does the house have, what's the square footage, what's the proximity to the schools, what's the ranking of the schools, etc. All these different features are the different dimensions of my vector, and I try to predict how much the house will sell for. Whenever a new house gets sold, I add that to my data and I retrain my algorithm. But the answer could be anything between 0 and, I don't know, a couple million, theoretically, I guess. Another example is if you would like to predict the height of a person or the age of a person.

All right, good. Any questions about binary classification, multi-class classification, regression, etc.? All right, good. Yeah? I don't have more. Does anyone have two or three? You're not allowed to sell them. I will post them on Piazza, how about this, and you can just print out your own. Who does not have one? Raise your hand. Oh, that's terrible. Oh, I see, they're all in the same location. Okay, look on with your neighbor. If someone's really nice, you could just pass it back, and then look on with your neighbors. But it's only three minutes
left anyway. Okay, so that is Y. So basically we have our x and we have our y. y is what you're trying to predict, and it can either be a set of numbers that you're trying to shoot for, -1 and +1 or 0 and 1; it can be K numbers, that's K-class multi-class classification; or it can be real numbers, if you want to do regression.

Now what about x? Let me give you some examples of x. The first one is, for example, patient data. One thing that's really, really big nowadays, actually, well, it won't be anymore, I think, but one part of the Affordable Care Act, I guess, was that there were a lot of incentives for hospitals not to discharge patients that then come back very, very quickly. That's generally not considered in the interest of the patient, right? If you basically get discharged from the hospital but then a week later you have to go back in, well, probably you shouldn't have left the hospital. That usually generates more costs for the insurance. So basically there were penalties for hospitals if the patients go back to the hospital within six weeks, I believe; there was some time period. So a lot of effort was put into predicting if a patient will come back in the next six weeks: is the patient healthy enough that he or she can be six weeks outside on his or her own and do fine? Because then after six weeks you're probably good.

So what you could do is you could basically say my x_i is basically the specifications of this patient. One thing you may want to stick in is a binary feature, 0 or 1: are you male or female? It could make a difference, who knows. Another one is the age: this person is 76 years old, so that's the age in years. Another feature is the height, 183 centimeters. Of course you don't know what that means, but you know,
you're six feet or something, I don't know how many hands, I don't know. So, height in centimeters. And then you basically have some blood pressure and heart rate or whatever. And now you try to predict: will the person come back, yes or no? You just take historical data from people that did come back and did not come back, and you train your algorithm that way.

Another very common one is text documents. Here you have an email, for example, and that doesn't look like a vector; that looks like a string of words. What people do is something called a bag of words, the bag-of-words representation, where you basically say: well, word order is overrated. Who cares about in which order you put the words down? Let's just look at the word occurrences, which words are in the text. It turns out that works surprisingly well. So then you can take a vector that is d-dimensional, where d is the number of words in the English language. I don't know how many words there are in the English language, maybe a million. Then you start with the first word, whatever it is, it doesn't matter, say "ant", and down here you have "zebra". And then for every single word you just say how many times that word occurs in your email. For example, the word "the" you may have five times, and the word "moose" seven times, I don't know, and "zebra" you didn't talk about, etc. So basically you just count the words in your email, and that is a vector of just word counts, and that is a vectorial representation of your email. It turns out that works surprisingly well. Usually you can identify if an email is spam or not spam, or for a news article, if that's politics or sports.
Right, you can detect that very well just in this representation, which makes sense. If you talk about baseball, a pitcher or something, it's probably sports. If you talk about Washington, D.C., or the president or something, it's probably politics. Why don't we leave it right here, and I'll see you again on Monday.
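The bag-of-words idea from the end of the lecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-word vocabulary, the function name `bag_of_words`, and the example email are all hypothetical, and splitting on whitespace is a simplification (the lecture imagines one entry per word in the English language and does not say how the text is tokenized).

```python
from collections import Counter

# Hypothetical toy vocabulary; the lecture imagines one entry per English word.
VOCABULARY = ["ant", "moose", "the", "zebra"]


def bag_of_words(text, vocabulary=VOCABULARY):
    """Return a list of word counts, one entry per vocabulary word."""
    # Lowercase and split on whitespace; word order is thrown away.
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occurred in the text.
    return [counts[word] for word in vocabulary]


# Hypothetical email used only for illustration.
email = "the moose saw the other moose"
x = bag_of_words(email)
print(x)  # one count per vocabulary word, in vocabulary order
```

Words outside the vocabulary ("saw", "other") are simply ignored here, which is one possible design choice; with a full-language vocabulary every word would get its own dimension.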
Info
Channel: Kilian Weinberger
Views: 124,975
Rating: 4.9882007 out of 5
Keywords: machine learning, cs4780, cornell, kilian weinberger
Id: MrLPzBxG95I
Length: 47min 49sec (2869 seconds)
Published: Mon Jul 09 2018