Machine Learning: What is easy, medium, and hard?

Captions
Hi there. Today I want to talk about machine learning and artificial intelligence, and in particular, given the rapidly changing state of the art: what is easy, what's medium, and what's really hard today? What is likely to change in the future, and where were we just a few years ago? That's what I'm going to talk about today.

So: machine learning, easy, medium, and hard. Across nearly every discipline, in scientific research, in industry, and in our everyday lives, machine learning is redefining the art of what is possible, and even what we thought was possible five years ago is being redefined now with tremendous advances. These advances are driven by rapidly increasing volumes of data. Google Images is a vast collection of naturalistic images; the Facebook database holds millions and millions of human faces from different angles and perspectives; add surveillance cameras, Google searches, and so on, and the amount of data is growing at an incredible rate. In addition, you have faster and faster computers, parallel architectures like GPUs, and better data storage: cheaper hard drives and memory, and faster transfer. So you have this combination of big volumes of data and hardware that can handle those volumes, and in the middle you have advancing and emerging mathematical techniques, applied math and statistics being developed in many different disciplines. These algorithms are essentially mining features from big data, letting you pull out relevant information, coherent patterns, and actionable insight. These three things together, big data, big computers, and sharper algorithmic tools, are driving this field forward at an incredible rate. That's what we're going to talk about, through a couple of case studies of what is actually possible.
I really like this xkcd comic, and I'm going to put links to as much of this material as possible so you can go find it and dig in. I'll let you read it for a minute. I think it's fun because in the modern era it is easy to think of machine learning as a magic wand, or a black box: you put in data, you get out awesome predictions and results. Or you hire someone and say, okay, I want machine learning to solve all my problems, go. And it really is easy to set up problems that sound similar but differ enormously in difficulty. Telling whether a picture was taken in a national park is just a matter of checking the photo's location tag, but figuring out whether the photo has a bird in it used to be much, much harder: you'd need a research team and five years. The funny thing is that this comic is from a number of years ago, when that was actually hard. Now, with the incredible pace of progress in image classification, it's basically a solved task for many research teams across the world. That illustrates the point: what used to be very hard, what used to need a research team and five years, is now essentially trivial.

Facial recognition is a huge example. What I always tell my students and collaborators is: if you can take your data and make it look like an image, you get decades of incredible research and algorithms for free, because image recognition and face recognition have been cornerstones of machine learning and machine vision for decades, and many of the most powerful algorithms were developed for mining, classifying, and making decisions based on images. Here you see a selfie, the picture Ellen took with a bunch of celebrities, and some researchers used the Viola-Jones face detector to automatically pull out all the regions in the image that it thought were faces. You can then use a face recognition algorithm to tell, with pretty high accuracy, whose faces they are. For example, the Facebook facial recognition system, which doesn't just put boxes around faces but tells the user who the people actually are, is now approaching human accuracy, which is quite impressive: it took decades for algorithms to get as good as humans at matching faces to people, but now they can do it.

Fun fact: for a long time these algorithms were stuck a few percentage points below human accuracy, and one of the technologies that bridged the gap was estimating the actual three-dimensional, skeletal structure of the face from the 2-D image. As a human, I have a lifetime of experience walking around a three-dimensional world and seeing people from different perspectives, so it's easy for me to estimate the depth of a cheek or a nose, or how deeply someone's eyes sit beneath their forehead. Computers can now estimate that 3-D reconstruction from 2-D images, and that gives them enough information for human-level performance. So this is a problem people have worked on for decades. I don't want to say it's completely solved, but when you hold up your phone and it automatically puts boxes around everyone's face, that's very similar technology, very likely based on the Viola-Jones algorithm.
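The reason the Viola-Jones detector can scan every region of a frame in real time is the integral image, which lets it evaluate simple rectangular ("Haar-like") contrast features in constant time per region. Here is a minimal sketch of that building block; the function names are mine, not from any particular library:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x]; computed once per frame."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of any h-by-w box with top-left corner (y, x), in O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle contrast feature: top half minus bottom half.
    Features like this respond to dark-eyes-over-bright-cheeks patterns."""
    half = h // 2
    return box_sum(ii, y, x, half, w) - box_sum(ii, y + half, x, half, w)
```

A real detector evaluates thousands of such features in a boosted cascade; this only shows why each individual feature is so cheap.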
Image classification used to be a grand-challenge machine learning and machine vision problem. Then, I think in 2012, this landmark dataset, ImageNet, was curated and put online for all the world to use in developing and testing machine learning algorithms for classifying images. What you see here is just a tiny slice of millions and millions of labeled images, images where you know what is actually in the picture. If I zoom in on a little region, you can see there's everything: a picture of an iPod, a person with a headband, every picture you can imagine, all labeled. This massive volume of training data, along with the massive increase in computing power, GPUs, and deep neural networks, produced a quantum leap in the performance of image classification. ImageNet was the landmark dataset, and also the demonstration that put deep learning back on the map. People had been using neural networks for decades, but having this wealth of labeled data (this is called supervised learning: someone tells you what is what, and you learn it) drove a massive change in the performance of deep learning and started it on its exponential trajectory. It's undeniable that deep learning has improved almost every field it has been applied to where there is this kind of wealth of labeled data; it's changing what's possible, in terms of performance metrics, on many, many tasks. The caveat is that you need a tremendous amount of training data. Often we want to learn from minimal amounts of data, and that's still quite hard. Humans and biological systems are experts at one-shot or two-shot learning, where you see a few examples and adapt very quickly, because you have an experiential framework to hang that information on.
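The supervised-learning loop itself is simple; what changed with ImageNet was the scale. As a toy stand-in, here is the same loop, labeled examples in, gradient descent on a classification loss, run on two 2-D blobs instead of millions of images (the data, dimensions, and learning rate are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "labeled dataset": two Gaussian blobs in 2-D standing in for two
# classes of images. Real supervised learning is the same loop at scale.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(+2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

W = np.zeros((2, 2))  # one weight vector per class

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(200):                        # gradient descent on cross-entropy
    p = softmax(X @ W)
    onehot = np.eye(2)[y]
    W -= 0.1 * X.T @ (p - onehot) / len(X)

accuracy = (softmax(X @ W).argmax(axis=1) == y).mean()
```

The gradient step is exactly the supervised signal the text describes: the labels tell the model what is what, and the weights move to reduce the disagreement.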
But computers are getting better when you have these volumes of data. What used to be a grand-challenge problem is now the entry level: image classification is doable because we have big enough computers and big enough datasets. Here's an example of how rich the ImageNet dataset is. Nearly everything you can think of is in there: these are harvesters (John Deere-style machines), these are pictures of broccoli, these are some purple flowers. For pretty much anything you can think of, there are dozens if not hundreds of examples in the ImageNet training set. So to some extent, for anything you will see in the future, you are likely to have seen something similar in the training data. That's one of the catches with deep learning: we often think of it as a really fancy interpolation, but the training data is rich enough that anything you'll see in the future can be interpolated from what you've seen in the past, via this nonlinear, hierarchical interpolation.

Now here's something interesting. I'm pulling this picture from a great CVPR 2017 paper, which I encourage you to go read. The first time I downloaded and used this image, I didn't even realize that this second column, and maybe this fourth column, are actually artificially generated images produced by a deep learning algorithm. These are made by GANs, generative adversarial networks, where one part of the network tries to tell whether it is looking at a real image of a purple flower or a real image of a harvester, and another network tries to generate images that trick that classifier into thinking it generated a real purple flower or tractor. They're adversarially trained against each other.
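The adversarial game can be sketched in one dimension. Real "data" is drawn from one distribution, a generator with a single learnable offset tries to imitate it, and a logistic discriminator tries to tell the two apart. Everything here (the distributions, the one-parameter generator, the learning rates) is a made-up toy, not any published GAN architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Real "data": samples from N(3, 1). The generator shifts noise z ~ N(0, 1)
# by a learnable offset b, so it initially produces N(0, 1).
b = 0.0                    # generator parameter
w, c = 0.0, 0.0            # discriminator parameters (logistic regression)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(5000):
    real = rng.normal(3.0, 1.0, 64)
    fake = rng.normal(0.0, 1.0, 64) + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += 0.05 * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += 0.05 * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: move b so the discriminator scores fakes as real.
    d_fake = sigmoid(w * fake + c)
    b += 0.05 * np.mean((1 - d_fake) * w)
```

After training, the generator's offset b should have drifted toward the real data's mean of 3, the point at which the discriminator can no longer tell real from fake, which is the equilibrium the adversarial game is chasing.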
So you have a classifier, essentially the traditional ImageNet classifier, and a generative network trying to trick it. Remarkably, if you look very closely, you can see these are make-believe images. This really is broccoli, but this is just green blobs in a statistical pattern that looks a heck of a lot like broccoli from where I'm standing, and similarly with the harvesters. What's really interesting is that we can now generate fake images that trick a trained classifier. It totally fooled me the first several times; that's how good these fake broccoli images are, and the purple flowers are even better. These GANs are getting amazing at faking naturalistic images, and they're even getting to the point where you can fake videos. This is becoming a big social topic right now: what if someone can fake a video of a world leader saying something offensive about another country, or saber-rattling? What if you can fake a video of a politician and trick another country into thinking it was really our leader, or their leader? There's a potential for misinformation, when you can fake images and videos, that's really quite worrisome, and it's progressing at a rate that I don't think anybody saw coming. This is science-fiction technology that's now pretty much possible if you have enough training data. A very interesting topic.

Now, I don't want to make it sound like image classification is completely done. There are still things humans can do that machines really can't, because we have intuition. I love showing this picture: blueberry muffin or Chihuahua?
A lot of these Chihuahuas really do look like blueberry muffins (this one tricked me a little), but a trained human looking at this knows for a fact which ones are Chihuahuas and which are muffins. We have a deeper, almost physical intuition about what is what, and that's more subtle than what computers have. I also like "fried chicken or dog"; that's one of my favorites. There are tons of examples like this where you can trick machine learning classifiers into pretty simple and obvious misclassifications.

Overall, though, image classification is pretty good, so what people are moving toward now is richer scene classification: classification of movies, and real-time surveillance of many cameras at many scales. In real time, from a surveillance video, can I track all the cars and pull out their license plates? Can I detect erratic behavior? And what is erratic behavior, anyway? That's hard to quantify. There are outlier events: say I'm on a London Underground platform, everybody rushes in, crosses paths, and goes their separate ways; can I tell if someone left a piece of luggage by a support pillar? That's a very important surveillance problem, very unstructured and very hard. Can you detect something out of the ordinary, something abnormal, something you should focus your attention on and get a human to look at? That's becoming the new difficulty level of machine vision: creating, analyzing, and classifying video is a very hard problem. I think the YouTube database is going to be the next ImageNet. We now have millions, maybe billions, of hours of video content, an even vaster dataset that we really don't have the computational power to fully analyze yet. So that's where we're going to have to catch up in terms of computational power.
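A first-pass version of "flag something out of the ordinary" is just outlier detection on a feature stream. The sketch below uses a made-up per-frame motion signal and a robust threshold; real surveillance systems are vastly more sophisticated, but the pattern, score normality, flag deviations, escalate to a human, is the same:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for a surveillance feature stream, e.g. total motion per frame.
# Normal activity is noisy but stationary; one frame has an unusual spike
# (a bag left behind, someone running against the flow, etc.).
motion = rng.normal(10.0, 1.0, 500)
motion[321] = 25.0   # the anomalous event (planted for this demo)

# Flag anything more than 4 robust standard deviations from the center.
median = np.median(motion)
mad = np.median(np.abs(motion - median))   # median absolute deviation
robust_sigma = 1.4826 * mad                # ~std for Gaussian data
anomalies = np.flatnonzero(np.abs(motion - median) > 4 * robust_sigma)
```

The median/MAD center is used instead of mean/std so the anomaly itself doesn't inflate the threshold it is being tested against.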
This is extremely important for self-driving cars, autonomous platforms, and drone surveillance. Amazon Prime Air is going to be flying drones around, and they need to understand the scene they're operating in; self-driving cars need to understand what's happening around them. So this is, I don't know, medium-hard: we think it's doable, but it's not done yet.

Image captioning is one of my favorite examples. This is really neat, and it's another one of those problems where even a few years ago we would have said: sure, a computer can classify an image (dog, cat, blueberry muffin), but actually producing a coherent sentence describing what it sees in a scene? That seemed like a very hard task maybe five years ago. So it's remarkable how far image captioning has come. This is a great paper, another CVPR paper, from 2015, which I recommend you read, and you can see it does a pretty good job on these images: "man in black shirt playing guitar," "construction worker in orange safety vest working on road," "two young girls playing with Lego toy." Well, I'm not sure whether she's a young girl or not, but it gets the gist: there's a kid playing with Legos. It's actually quite good, and you can read the statistics on how effectively these models can and can't label scenes. There are some comical failures too: "young boy holding a baseball bat" where there clearly isn't one; "cat sitting on a couch" for what looks more like a ferret to me; and my favorite, "a horse standing in the middle of the road," where there's no horse, but there is a road. So it's much better than we thought it would be just five years ago, but still not perfect. That's image classification and scene classification; let me go back here for a minute, because there's a point I like to make.
When I talk about image captioning, classification, and surveillance, the question is: where do we think this is going? Are machines going to keep getting better at classifying scenes? Will they be able to watch a movie, say Iron Man, and write a transcript, essentially reconstruct the script, just by watching the scenes evolve in time? That's something machines can't do now, but we're working toward it. So when you're thinking about easy, medium, and hard, you also have to ask what is possible, and how feasible it is in the near or long term. One of the baseline questions I ask myself is: can an average human do this task? An average human can classify images; they can more or less watch a scene, track a car, and notice if something suspicious happens; an average person can caption an image pretty accurately. None of these require extremely specialized, lifetime-of-training knowledge, so these are things we think computers can probably do, given enough training data, powerful enough computers, and enough sustained effort. You should also ask: are there hundred-billion-dollar companies that care about this problem? For these problems, absolutely, and that's another thing driving progress in these fields: is it impactful and important, is it worth spending real resources and human careers trying to solve? For image captioning and surveillance, absolutely. Now, there are some tasks where it takes a long time for humans to build expert intuition, but we can still train machines to do them if we have enough training data. I think there are some interesting examples of this.
In medical imaging, it takes trained humans a long time to learn what they're looking at in these images, but with enough labeled images you can essentially transfer that human experience into machine classifiers. On the other hand, making real, actionable decisions about how to help a patient improve their health is something that's going to take machines a lot longer to do well. At least, that's what we think now; maybe in five years we'll be blown away by the performance.

Okay, I'm going to switch gears a little. So far, the algorithm has been a kind of brain in a vat: it watches the world and classifies, annotates, and labels things, but it doesn't touch the world; it just observes. Now I want to talk about actual control. I love control theory. If you're going to build a model of a system, you should be trying, or often you are trying, to change the system somehow. For example, Amazon is collecting data and learning how humans decide to buy things and what our preferences are, but eventually they want us to buy more stuff and be happier when we buy it; they're essentially trying to steer the system so that, hopefully, everybody benefits. Changing the behavior of the system is the point: in a self-driving car you're not just classifying scenes, you're moving through the world; a walking robot isn't just classifying objects, it's physically interacting with the world. This is another of those amazing areas where deep learning has taken off. There's a 2013 NIPS paper, "Playing Atari with Deep Reinforcement Learning." Reinforcement learning is a kind of data-driven, experiential control.
You try things, you see what happens, and you build an estimate of the value of different actions at different times; people are now using the wealth of data and deep learning to augment reinforcement learning for dramatically improved performance. This is a classic Atari game, I think it's called Breakout, and it's kind of like Pong: you move a paddle back and forth, bouncing a ball to break all of these rainbow-colored blocks. It's a reasonably challenging game, and after two hours of play, given only the pixels on the screen and its score, the agent is pretty good; I don't think I could play this well after two hours. But the ultimate goal of deep reinforcement learning is not just for the machine to be good; it's those magic transitions where it goes from playing well to discovering the trick, the secret of the game, the cheat. Here, after a couple more hours, you'll see it learn a far more effective strategy: it breaks through one column and then lets the physics of the game destroy all of the blocks at once from above. That's the promise of deep reinforcement learning: if you interact with a system enough, can you actually learn its rules, its physics, and find the deep exploits that make you not just a good player but a master? And we've seen Google DeepMind go beyond Atari to chess and Go, beating the best humans in the world, beating hundreds of years of collected and transferred human knowledge.
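The core loop behind all of these results, try an action, observe the reward, update a value estimate, is easiest to see in tabular Q-learning on a toy problem. This corridor game is invented for illustration; the Atari work replaces the table with a deep network over raw pixels, but the update rule is the same idea:

```python
import random

# A tiny corridor game: states 0..4, start at 0, reward 1 for reaching
# state 4. Actions: 0 = "left", 1 = "right". Tabular Q-learning tries
# actions, observes rewards, and refines an estimate of action values.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2
random.seed(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

for _ in range(200):                       # episodes of experience
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit the current value estimate,
        # sometimes explore a random action.
        a = random.randrange(2) if random.random() < epsilon else \
            (0 if Q[s][0] > Q[s][1] else 1)
        s2, r = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [0 if q[0] > q[1] else 1 for q in Q[:GOAL]]
```

After training, the greedy policy should be "go right" in every state, with the discounted value estimates growing as the agent nears the goal.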
They do it essentially by playing, learning, and adapting their strategy, finding deep exploits in the structure, physics, and symmetries of these games. That's really exciting; deep reinforcement learning is one of the most exciting areas of machine learning research.

Here's another one I love, some great work from ICRA 2009 (again, please read the paper): robotic reinforcement learning, specifically imitation learning. A human demonstrates the task a few times, maybe once or twice (you can read exactly how many), and this machine, which I assume is using video data against the white background to track the positions of the blue cup and the red ball, refines its own attempts. After something like fifteen trials it's doing pretty well, probably about as well as I would after fifteen trials, and after a few dozen trials it has basically learned the physics, refined its control strategy, and settled on a data-driven policy. This kind of imitation learning, reinforcement learning driven by data and computer vision, is really important for a number of applications, especially manufacturing. I saw a talk by someone who works on this, and I think the anecdote was that they had a relative's kid, maybe a twelve-year-old, try the same game, and it took the child about the same number of trials to learn it, maybe a few less, maybe 50 trials instead of 100, but the same order of magnitude as the robotic arm. That's remarkable. Think about automation and manufacturing: the traditional assembly-line model has the same machine doing the same task over and over and over. Turn this bolt, turn this bolt, turn this bolt.
That's all that machine does; it was built to do that forever. But in modern manufacturing, maybe I have more customizable designs, maybe a robot has to do multiple different tasks within its work zone, more like a skilled worker, and you want to be able to teach those robots new tasks quickly and transfer knowledge from one task to another. That's very exciting, and robotics, along with self-driving cars, is driving tremendous research. DARPA has been pushing both self-driving cars and robots, and this is a funny video of just how badly a lot of these robot walking and manipulation tasks still go. We like to watch videos of things like Big Dog (go Google Boston Dynamics' Big Dog and you'll see how great robotics can be), but for every success story of robotic walking or running, there's a lot of failure. These machines don't have physical intuition yet, and they don't have our degree of control authority and actuation. I think that's super important to consider: in the science-fiction future of 20, 30, 40 years from now, it may very well be that machines surpass our ability to pull patterns, actionable decisions, and reasoning out of high-dimensional data in many, many cases, while robotic bodies remain nowhere near as good as biological bodies. I have very sensitive sensation in my skin; I can feel minute pressures, frictions, and temperature changes; I can feel wind; I have, I don't know how many, but a huge number of muscle groups that work in concert, not just one servo joint. That deep complexity of sensing and actuation in biological systems means I can go out into the world, gather experience, be curious, and try things.
Think about when you were a kid with something like the ball-in-a-cup toy: you'd play with it for hours, and you'd learn physics, and you'd also learn your own neuromotor control and sensing systems. Robotic systems are really far away in that sense; they don't have a body to carry them through the world to learn.

Okay, let me change gears again. We've talked about what we think will be possible: what used to be hard is now easy in many cases, and what's medium-hard now we think will be solved in the future. I also want to point out a couple of cases at the other extreme. It's a little risky making a video in a field that changes this fast, but I'm going to go out on a limb and speculate that machine learning is not going to write the next great American novel anytime soon. It may classify scenes and write captions better than we can, but the great American novel requires not only a lifetime of experience but a singular kind of insight, a combination of incentives, experiences, and giftedness that is very rare and can't be replicated; those are far-outlier events of genius, human skill, and intuition. I don't think we're getting that anytime soon. It does raise the question, though: can you generate other types of text, if not the next great American novel? There's a really cool example from Botnik Studios, one I love: they dumped all of Harry Potter into a learning algorithm and had it generate its own Harry Potter fan fiction. It's pretty terrible, and that's no dig against them; it's impressive how good it is, considering I would never have thought it could even get close. It produces complete sentences, and some of them are actually pretty interesting, with a little texture, the right kinds of words and feel; other sentences are just ridiculous.
So I think we're still pretty far from this kind of generative fiction, but people are trying, and what they're able to do is pretty interesting. In a more serious vein, a while back, I guess 2015, some researchers at MIT used machine generation to produce fake academic papers, research papers submitted for academic publication, and actually got a number of them accepted. The scientific peer-reviewed publishing system was pretty embarrassed to be letting through fake papers generated by an algorithm; I encourage you to read about this. It's similar to fake news, fake videos, and fake images: you can generate things that pass at least the preliminary BS detector of professionals in the field, and that's worrisome. Maybe it says more about how little time we actually spend reading the papers we review, but I think it's an interesting case study.

I also want us to think about what is canonically challenging: what are the things machine learning probably can't solve, or at least can't completely nail? There will be fundamental learning limits for some systems, and I think weather prediction is a really interesting example. There's no doubt that machine learning and artificial intelligence can, will, and already do improve our prediction capabilities for dynamical systems like weather and climate. But there are hard information limits, because weather is a chaotic dynamical system. Everyone's heard of the butterfly effect: a butterfly flaps its wings in Texas and causes a tsunami somewhere else in the world a month later. That version is ridiculous, just a soundbite, and I don't think anyone really believes it, but sensitive dependence on initial conditions really does occur in these complex, chaotic dynamical systems.
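Sensitive dependence is easy to demonstrate on the classic Lorenz system. In the sketch below, two copies of the system start one part in 10^8 apart, the kind of error no real measurement could avoid, and are integrated with a crude forward-Euler scheme (chosen for brevity, not accuracy); by the end, the tiny initial error has grown to a macroscopic separation:

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations; crude, but enough
    to exhibit sensitive dependence on initial conditions."""
    x, y, z = state
    return np.array([x + dt * sigma * (y - x),
                     y + dt * (x * (rho - z) - y),
                     z + dt * (x * y - beta * z)])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # a one-part-in-10^8 measurement error

for _ in range(3000):                # integrate both copies forward
    a, b = lorenz_step(a), lorenz_step(b)

separation = np.linalg.norm(a - b)
```

This is the information limit in miniature: no model, learned or otherwise, can out-predict the exponential amplification of its measurement error.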
What that means is: if I don't measure my system perfectly, if there's, say, one percent error in my measurements here, then for my prediction a month from now there may be a fundamental information limit, because that uncertainty amplifies through the dynamical system. I might not be able to tell where a hurricane is going to land a month from now, or whether it will even form. There are information-theoretic limits for these chaotic, uncertain, nonlinear dynamical systems, and that's going to be very hard for machine learning alone to move the needle on. We actually need better measurements: there is hard mathematical theory saying that for some prediction problems you simply need more precise measurements at a finer spatial scale, and then machine learning can definitely help with the modeling.

The last slide I want to show you is about what could go wrong. I've shown you all of these great, almost science-fiction examples: robots, self-driving cars, computers labeling scenes, all this really cool technology. But there are some worrisome parts. We should definitely be thinking about the implications of societal surveillance: when surveillance becomes cheap and easy, when my phone can listen to everything I say, parse it into text, and make decisions based on it, when video surveillance watches everything, how will that change the world? Will it be a better place? There will obviously be some good things about it, and there will be some bad things; that's what all technology is like. Here's a very digestible example. In an Amazon cart, if I buy a calculus book, it might suggest a differential equations book, or a physics textbook (calculus-based physics): it suggests what tends to go along with those objects.
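Those "frequently bought together" suggestions are, at their simplest, just co-occurrence counting over historical carts. This toy sketch (the carts and item names are invented) shows the mechanism, and also why the suggestions can only ever reflect whatever the historical data happens to contain:

```python
from collections import Counter
from itertools import combinations

# Toy purchase history: count how often pairs of items appear in the
# same cart, then recommend the top co-occurring partners of an item.
carts = [
    {"calculus book", "physics book"},
    {"calculus book", "physics book", "notebook"},
    {"calculus book", "differential equations book"},
    {"notebook", "pens"},
]

pair_counts = Counter()
for cart in carts:
    for a, b in combinations(sorted(cart), 2):
        pair_counts[(a, b)] += 1

def recommend(item, k=2):
    """Items most often seen in the same cart as `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]
```

Nothing in this loop knows what the items are; it only mirrors past behavior, which is exactly how problematic suggestions can emerge from problematic history.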
objects. There have been cases where people wanted to buy one good, and the learning algorithm, which is based on data, on historical patterns of shoppers, has suggested the parts to make dangerous devices; in one case the suggestions looked like the ingredients for thermite. That's pretty worrisome: I go to buy art supplies, and the algorithm is effectively telling me, hey, maybe you should make thermite. It's not that the algorithm was malicious or that someone was trying to be destructive; algorithms are only as good as the data you train them on, the historical data, and it's worrisome when that historical data says, hey, make dangerous devices. This is also a huge issue if you are using machine learning for something like hiring. If your corporation has historically made biased hiring decisions, then you don't want to train your algorithm on who the best people to hire were, because you might bake that bias into your future workforce. The same issue arises with machine learning for deciding whom to grant parole, or how to set bail. These are actual case studies happening right now: can we use machine learning to take some of the burden off of overburdened judges in setting bail and parole? But if you have a historically biased set of training data, if one segment of the population has received biased treatment, if history hasn't been fair, then you have to be very careful using that data for future inference and model building, or else you run the risk of baking that bias into the structure of your learning. Blindly trusting the output of these algorithms, trained on historical data, raises huge ethical and societal issues that we have to understand. So that's hard. Among easy, medium, and hard, I really think that's one of the hardest issues we face. Okay, so that's
basically it. I just wanted to give you this high-level, fun overview of what is currently easy, medium, and hard. What is hard now may be easy in the future, and vice versa, but it's undeniable that machine learning and artificial intelligence are changing the world. They're changing what's possible in transportation, in health, in manufacturing, in ad placement, in basically everything where we have big data and hard problems, and we're starting as a community to build these machine learning and artificial intelligence systems. Now, what will move the needle in the future? Better computers and better algorithms, of course; that's a given. More data, and more investment; we're investing tremendously in this field worldwide. This is kind of the space race of our time, with everyone pushing toward better artificial intelligence systems across the board. One of the things that I think will really move the needle is understanding physics. If you remember the facial recognition example, image classification got better when it incorporated the physics that faces are not 2D but live on a three-dimensional skeleton, on a skull. That's an example of physics, and what I'm going to call physics here is the rules of the system. Humans and biological systems are masters at understanding the rules of a system abstractly, hierarchically, and in a fuzzy way, and at using those rules to improve our learning rates and to assimilate new data more rapidly. I think that's going to be a transformative piece of developing machine learning and AI: how do we bake in the physics that we know, how do we teach the computer F = ma and the conservation of mass, energy, and momentum, so that we can get faster learning rates with less data? And how do we extract unknown physics from these systems? If I just showed a machine learning algorithm a video of the
night sky for weeks and weeks, could it learn the planetary orbits? Could it learn that there are stars and planets and that they're different? Could it learn F = ma? Could it do what Newton did? We're not there yet; we're not even close. But extracting those symmetries, conservation laws, and unknown physics from data is something I'm personally very excited about. It's relevant for self-driving cars, it's relevant for walking robots, and I think it's relevant for image classification, because what is an image but a measurement of photons interacting with the physical world? So I think that's going to move the needle as much as volumes of data, better algorithms, and better computers. I hope you're excited; it's going to be an interesting world. You should watch this again in five years and laugh at what I said was going to be hard and how easy it turned out to be. Okay, thank you.
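The sensitive dependence on initial conditions discussed earlier, the reason weather prediction faces hard information limits, can be demonstrated in a few lines. This is a minimal sketch using the standard Lorenz system with a crude Euler integrator; the step size, horizon, and perturbation size are illustrative choices, not anything from the talk:

```python
import numpy as np

# Sketch of sensitive dependence on initial conditions (the "butterfly
# effect") in the Lorenz system: two trajectories that start almost
# identically end up far apart, which is why a tiny measurement error
# can destroy a long-range forecast.

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations (standard parameters)."""
    x, y, z = state
    dxdt = sigma * (y - x)
    dydt = x * (rho - z) - y
    dzdt = x * y - beta * z
    return state + dt * np.array([dxdt, dydt, dzdt])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # perturb one coordinate by 1e-8

for _ in range(3000):                # integrate roughly 30 time units
    a = lorenz_step(a)
    b = lorenz_step(b)

print(f"separation after integration: {np.linalg.norm(a - b):.3f}")
```

The initial separation of 1e-8 grows to the size of the attractor itself, so measuring the initial state "almost" perfectly still leaves the long-term forecast essentially unconstrained.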
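As a toy illustration of extracting physics from data (my own sketch, not a method described in the talk), here is a regression that recovers the gravitational acceleration from noisy measurements of a falling object; the noise level and time grid are arbitrary assumptions:

```python
import numpy as np

# Toy "learning physics from data": simulate a falling object under
# constant gravity, then recover the physical constant by regression.

rng = np.random.default_rng(0)
g = 9.81                        # true acceleration (m/s^2), to be "discovered"
t = np.linspace(0.0, 2.0, 200)  # measurement times (s)
y_true = 0.5 * g * t**2         # distance fallen from rest
y_meas = y_true + rng.normal(scale=0.01, size=t.size)  # noisy sensor data

# Fit a quadratic model y = a*t^2 + b*t + c by least squares; the learned
# leading coefficient a approximates g/2, so 2*a recovers g from data.
a_coef, b_coef, c_coef = np.polyfit(t, y_meas, deg=2)
g_learned = 2.0 * a_coef
print(f"recovered g ~= {g_learned:.2f} m/s^2")
```

The harder open problem the talk points to is doing this without knowing the model form in advance: discovering that a quadratic-in-time law (i.e., constant acceleration) governs the data at all.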
Info
Channel: Steve Brunton
Views: 3,899
Rating: 5 out of 5
Keywords: Machine learning, artificial intelligence, image recognition, face recognition, reinforcement learning, control theory, generative networks, neural networks, deep learning, computer vision, data science
Id: 5ket_A4BpDg
Length: 38min 53sec (2333 seconds)
Published: Mon Jun 11 2018