Artificial Intelligence, the History and Future - with Chris Bishop

Captions
[Music] [Applause] [Music]

Artificial intelligence is a term which, for many people, I think conjures up images of Hollywood movies of killer robots, or perhaps even the more subtle but perhaps more sinister type of artificial intelligence we saw in HAL in the film 2001, in which it becomes increasingly paranoid and eventually kills some of its astronaut masters. That's the world of science fiction. But something very interesting has been happening over the last five years. I think one of the evidence points for this is just some of the books which have appeared; these are all books which have been published in the last couple of years. Just look at some of the titles: Our Final Invention: Artificial Intelligence and the End of the Human Era; Superintelligence: Paths, Dangers, Strategies; The Artificial Intelligence Revolution: Will Artificial Intelligence Serve Us or Replace Us? And even various luminaries have commented on this. Stephen Hawking said the development of full AI could spell the end of the human race. Elon Musk said rampant AI is the biggest existential threat facing mankind. A somewhat pessimistic view of the subject, I would say.

So this evening what I want to do is to look at some of the reasons why we've seen this explosion of interest in artificial intelligence, to look at some of the science behind the technologies, and hopefully to paint a slightly more optimistic view of how AI might help humanity rather than destroy us. One of the things I want to do as well in this talk is to look a little bit at the history of AI, because the history is quite fascinating and it goes back a long way, certainly at least as far as this person, Alan Turing. I'm sure you've all heard of Alan Turing. He was a Cambridge mathematician, famous for many things, but amongst them, in the 1930s he really laid the foundations for modern computer science and in a sense conceptualized what today we call a digital computer. And he asked the following question: could such a machine emulate the
capabilities of the human mind? Could a machine think? Well, he was largely left to theorize, because he didn't have in his day the technology to take this forward. Really the big development that has allowed us to think of artificial intelligence potentially as something of practical value is the development of the digital computer, in particular the modern silicon computer. This is a rather old silicon chip; a modern chip will contain billions of components. These chips are incredibly powerful. My laptop, for example, could probably do several billion operations a second. Here's an example of an operation: take two 16-digit numbers, multiply them together, and get the answer correct to 16 digits. This humble, rather ordinary laptop can probably do several billion, several thousand million, of those per second. So what that machine is doing is slavishly following instructions, in this case the instructions to multiply two numbers together. But back in the 1950s, computer scientists working with machines far less powerful than this laptop wondered whether a set of instructions could be found to program a computer which would cause it to exhibit intelligence.

So this is an example of such a program. This is from 1964; it's called ELIZA, and it was developed at the AI laboratory at MIT. This particular script, this particular version, was a psychotherapist, and today we'd call this a chatbot. You had a little conversation with the computer, and it would use various tricks: it would ask you for your name and it would call you by your name. To many people, at least superficially, at least for a short period of time, it appeared that it was exhibiting something a bit like intelligence. Of course, if you interacted with it for more than a few minutes you realized it was extremely dumb and was not in the slightest bit intelligent, but at least superficially it seemed to mimic intelligence. This field of artificial intelligence was tremendously popular back
in the 1950s and the 1960s. It was founded on the idea that computer scientists would be able to program computers to be intelligent; that is to say, to write a computer program, a set of instructions, such that when the computer followed these instructions, perhaps blindingly fast, billions per second perhaps, it would exhibit intelligence. There was a tremendous amount of hype, rather like there is today, a tremendous amount of excitement about artificial intelligence and its potential. However, that excitement didn't last forever, and probably didn't really last for very long. By the time we reached the 1970s things began to change, and one landmark moment in the history of artificial intelligence was the publication of a report by Lighthill in 1973, in which he presented a very pessimistic prognosis for this field. That led to an almost complete cessation of funding, first in the UK and then elsewhere, and led to what some people have called the AI winter, where artificial intelligence became very unfashionable. Now that report was a seminal report in the field, and shortly after its publication the BBC televised a debate from this very lecture theater, from the Royal Institution. I'm going to play a short extract from that debate. It is hosted by Sir George Porter, who was the president of the Royal Society and also the director of the Royal Institution.

Good evening, and welcome to the Royal Institution. Tonight we are going to enter a world where some of the oldest visions that have stirred man's imagination blend into the latest achievements of his science. Tonight we're going to enter the world of robots, robots like Shakey, developed by the Stanford Research Institute. Shakey is controlled by a large computer; he's directed through a radio antenna. Through a television camera he gets visual feedback from his environment. The box appears on the monitor screen. The computer analyzes the traces which appear on the visual display until it can interpret them as an
object it recognizes. Shakey gets tactile feedback through his feelers. He's able to move boxes with his push bar. He's programmed to solve certain problems that can be contrived in his environment: to choose, say, an alternative route to a certain point when his way has been blocked. Shakey is unquestionably an ingenious product of computer science and engineering, but is he anything more? Is he the forerunner of startling developments which will endow machines with artificial intelligence and enable them to compete with, or even outstrip, the human brain? One man who's pessimistic about the long-term prospects of artificial intelligence is our speaker tonight, Sir James Lighthill, one of Britain's most distinguished scientists. He's Lucasian Professor of Applied Mathematics at Cambridge and has worked in many fields of applied mathematics. He's a former director of the Royal Aircraft Establishment at Farnborough. Last year he compiled a report for the Science Research Council which condemned work on general-purpose robots. Not surprisingly, scientists who've been working on such robots reacted strongly in defense of their field. Three of them are here tonight to challenge Sir James's findings. After they've had their say, the discussion will be open to bring in members of the audience here; with many mathematicians and engineers, computer scientists and psychologists among them, their contribution will be particularly welcome.

It's curious, you know, how hairstyles have changed since 1973. So that was the beginning of the AI winter, and we had a period when it was positively unfashionable to work on AI, or to say you were working on artificial intelligence. Nevertheless, of course, the field of computer science advanced with great rapidity, and there were many exciting developments. One that I'll draw attention to is in the field of computer programs that play chess. In the heyday of old-fashioned artificial intelligence, chess was considered to be one of the pinnacles of human intellectual
achievement: surely if a machine could play chess, everything else, like solving world poverty and global warming or whatever, would be sort of trivial by comparison. Well, it turns out that in 1997 a computer program, a chess machine called Deep Blue, built by IBM, beat the world chess champion Garry Kasparov. Now what's interesting about this is the way in which it worked, because this machine was dedicated to playing chess. It did one thing and one thing only, which was to play chess, and it did it very well. It did it by following a series of instructions to analyze moves and responses to moves and to evaluate board positions, and it made use of the very high speed of digital computers: it would analyze literally millions of possible moves and countermoves in order to choose a good move for the machine.

A more recent example, again from IBM, is the machine Watson, which defeated Ken and Brad, who are the two champions at this US television quiz show called Jeopardy. This challenge was conducted on live television, and again it was very impressive. This is a sort of general-knowledge, question-answering type of quiz, and Watson made use of information that it pulled off the internet, including the whole of Wikipedia and much else besides, again tuned by a team of very smart people at IBM over a period of about seven years in order to do this one very specific thing, which was to win at Jeopardy.

And so we have during this period two examples, and there are other examples, of computers doing things which previously only humans could do, being able to do things better than humans, things which appear to be very intellectual in nature, but they're very specific. And of course every time another task was completed by machine to a level greater than humans, people said, well, OK, that wasn't really intelligence after all, that's not really artificial intelligence. Some cynics have actually said artificial intelligence is simply anything that computers can't yet do, so every time we
solve another problem we've not advanced artificial intelligence. And in some ways that's fair, because one of the most extraordinary things about the human brain is not that we can do question answering or that we can play chess, but that we can do all of these things, and we can learn to do new tasks. So there's something rather special that we haven't captured in the examples I've shown so far.

But something very interesting began to happen about five years ago, and it concerns a field which has been around, as you'll see later, since the 1960s: the field of neural networks and the development of so-called deep learning. A little bit later in this talk I'm going to dive into some of the science behind neural networks and deep learning. Deep learning was developed by Geoffrey Hinton at the University of Toronto and other academics and colleagues around the world, and appeared to show great promise. So an attempt was made to see whether this would scale to some sort of real-world problem, and back in 2012 Geoff and others from Toronto collaborated, in this case with Microsoft Research in Redmond, to apply deep learning to the problem of speech recognition. Speech recognition is a tough challenge; it's been around for many years. If you tried a speech recognition system from 10 years ago, you'll know the performance was pretty bad. Could deep learning help with speech recognition? Now at the time there was a whole community of people working on speech recognition, people doing PhDs, going to conferences, publishing papers, but the error rate of speech recognition systems had been pretty much flat for an entire decade. Along came deep learning and immediately produced a 30% reduction in error rate. Now that was dramatic, if you think of all the effort that had gone into this field over the previous 10 years. In this particular example, this is Rick Rashid; he was the founding vice president who set up Microsoft Research worldwide. He's seen here
in Beijing, and he's illustrating the power of deep learning by giving a talk to a Chinese audience in Mandarin. Now he doesn't speak any Mandarin. What's happening is that he's speaking in English, and that is first of all captured and transcribed into text by a deep neural network; that English text is then translated into Mandarin by another deep network; and finally a third deep network is taking that Mandarin and synthesizing speech, but using samples of Rick's own voice, so even though it's Mandarin it still sounds a little bit like Rick. So that was a sort of seminal moment. These deep neural networks were not programmed to do this; they learned how to do it. They learned from data, so it's a very different approach to artificial intelligence. This was a particularly moving occasion, because there were about 5,000 Chinese students in the audience, and some of the students were in tears at the thought that the language barriers might at last come crashing down.

So that was interesting, but what's particularly interesting is that this one technique of deep neural networks seems capable of solving many different tasks. This is a British startup called DeepMind, and in 2014 they applied deep neural networks, with a technique called reinforcement learning, to again tackle some games, not chess this time but some old Atari games, about 50 of them. The neural network learns to play these different games by effectively a process of trial and error: it makes random moves, it sees how its score is doing, tries different things, and gradually learns over a period of many, many games how to play the game successfully. In about half of the games it achieved human-level performance. Now what's interesting is that it wasn't programmed to solve a specific game: the exact same architecture, the exact same software, was able to learn to solve or play a whole variety of different games. So this is much more like the capabilities of the brain: the ability to learn and the
flexibility to learn new problems. DeepMind was acquired by Google, and in 2016, earlier this year, they used much the same technique, deep neural networks and reinforcement learning, to tackle a very much harder game, the game of Go. This is a deceptively simple-looking game involving black and white stones on a simple grid of squares, but the number of combinations of moves is very much larger than in chess, so from the computational point of view this is a very much harder problem than chess. Therefore building a machine that achieves human-level performance had proven to be very much harder than with chess, and it was thought to be at least another decade before that was achieved. So it was very surprising when, only this year, AlphaGo, as the program was known, beat one of the world's leading Go players, probably about a decade earlier than people had anticipated.

So that's game playing, that's speech recognition, but much the same technique of deep learning can be applied to very different fields. Here's another historically very important example. This is called ImageNet. This is a data set of the order of a million images classified according to many thousands of different categories, and the goal is for a machine to take an image and to assign it to the correct category. So it needs to take the top-left image as its input, and the output has to be the label judo and not the label oceanfront, for example. Well, that's a tough problem, and people had been working on this for a number of years. Then they applied deep learning, and the immediate effect was to halve the error rate compared to any previous technique, so again a very dramatic improvement in performance. And back in 2015 a deep neural network developed by Microsoft Research was applied to this data set and achieved the same error rate that a human makes. Now I should say that one of the reasons the neural network is as good as a human is that it's better than humans at distinguishing 57 different varieties of mushroom, but it
also makes mistakes that we think of as rather silly and that perhaps no human would make. But nevertheless it's remarkable that this same architecture, the same concept, can be applied to this very different domain and achieve human-level performance. And so it's really that spectrum of different successes in many different domains that has underpinned this explosion of excitement around artificial intelligence.

So I thought I'd spend a little bit of time now looking at neural networks and deep learning, and tell you a little bit about some of the science and some of the technology behind these successes. Neural networks, as the name suggests, are inspired, albeit loosely, by the human brain, and in particular by the neurons in the brain. The interesting part of the brain is these electrically active cells called neurons, and here's a photomicrograph showing some of the neurons, which have been stained. They are very complex structures with lots of branches; they make lots of connections with each other, and they send electrical signals to each other and thereby process information.

Now the human brain is often described as the most complex object in the known universe, and I just want to give you a little feeling for just how extraordinary the brain is. This is a picture of South America, and outlined in red is the Amazon rainforest. In spite of our best attempts to destroy it, it's nevertheless still absolutely enormous, and you can see a picture of the UK for scale. I used to have a picture of Europe, but I've had to update it recently, unfortunately. Now what's interesting is that the number of neurons in the brain is of the same order as the number of trees in the Amazon rainforest. But that's not really the interesting part. The really interesting part of the brain is not the neurons but the connections between the neurons. These are called the synapses, and each neuron makes perhaps 10,000 connections with other neurons. And so the number of synapses in the brain, and the synapses are thought to be the seat of learning, the number of synapses in the brain is of the same order as the number of leaves on the trees of the Amazon rainforest. So the brain is truly extraordinary. We certainly don't understand how the brain works, but our limited knowledge of how the brain works has been inspirational in developing a technology called neural networks.

So here are two neurons. The neuron on the left, if it's stimulated in an appropriate way, can fire: it can send an electrical impulse down this cable, the axon. That axon makes connections, called synapses, with other neurons, and can stimulate those other neurons themselves to fire or can inhibit them from firing. The strengths of those synapses can change as a result of the operation of the brain, as a result of processing information. So the brain has the capability to learn as a result of, effectively, the data that it sees, the inputs that it receives.

Going back as far as the 1940s, people began to build mathematical models of neurons and synapses and learning in the brain. There are some very sophisticated models, but the ones that interest us are extremely simple ones, and we can describe them by this little picture. The dots down the left-hand side represent inputs; you can think of those as inputs from other neurons. They're combined together to cause the neuron here, labeled y, either to fire or not to fire, and the connections between them, labeled w, represent the strengths of those synapses. This little model can be expressed mathematically, and this is the only equation in the entire talk, I promise you. The equation captures this very nicely: it says that you take each of those inputs x_i and you multiply by a weight, a strength w_i, that can be positive or negative, and you add them all up, you add up all those weighted inputs, and you pass them through this function sigma. And sigma just says that if the total
combination of inputs is positive, the output is a 1, or if it's negative, the output is a 0. So you could imagine this little neuron being a little classifier. Imagine those inputs being something that's extracted from an image, and perhaps the output says whether or not there's a face in the image. So our goal is to have this neuron output a 1 if there is a face and output a 0 if there isn't a face. We could imagine adjusting all those little weights, those parameters, the synapse strengths if you like, using lots of examples of images of faces and images of not-faces, until we have tuned those parameters in such a way that the system has learned to solve that particular task. That very simple mathematical model of a neuron is called a perceptron, and there was a lot of excitement, and a lot of hype again, around these very early neural networks back in the 1960s.

This is one of the pioneers of this field. This is Frank Rosenblatt, who in the late 1950s and early 1960s did a lot of work, both theoretical and experimental, on perceptrons. What's interesting, of course, is that he didn't have access to wonderful machines like this laptop; he couldn't just program these in software, so he had to build analog hardware instantiations of perceptrons. Here he is in front of some symbols; at the back there's a triangle, a circle, and a square. That shows a typical problem that the perceptron would be asked to solve: could it be trained to tell the difference between a circle and a triangle? Now the input to the perceptron was this box on the desk. This is an array of 20 by 20 cadmium sulfide photocells, photosensitive or light-sensitive resistors, which formed effectively a very primitive digital camera; these are like the pixels of a very slow, very low-resolution camera. And so here's a typical experimental setup. In this case it's going to try to distinguish between some letters of the alphabet. This is, as I said, a very poor quality camera, so we
need some very powerful lights shining on the object, and there's a lens that focuses the image onto those photocells. The output of that camera then goes into this big rack of equipment, and what you see here are effectively the synapses of these neuron models. In the rack here, in this person's hand, each of those cylindrical objects is a combination of an electric motor and a rotary resistor, or potentiometer, so by a purely electrical process the electric motor can change the resistance value. The value of that resistance represents the strength of that synapse. Rosenblatt invented something called the perceptron algorithm, which was a mathematical procedure by which those motors could adjust the strengths of the synapses in response to various inputs in order that the system could learn.

So let's say we're distinguishing between triangles and squares. You present a triangle as input; if the output is a triangle, that's fine. If the system makes a mistake and outputs a square, then the algorithm makes little adjustments to all of those synapses to make the output closer to the desired value. Now you present another image. You present a square; if the output is a square, that's fine; if it isn't, we make some adjustments to the synapses. And so that perceptron learning algorithm allowed the system to learn by seeing lots of examples of each class, to gradually improve in performance, and hopefully to solve the problem.

Now there was a lot of excitement about these perceptrons, because it turns out that they could actually learn to solve things like distinguishing shapes and letters of the alphabet and so on, and for the day that was remarkable. But this wasn't just an empirical result. Rosenblatt also proved a theorem: he showed mathematically that if the perceptron was capable of solving a problem, that is to say if there existed a setting of all of those resistors such that the system would solve the problem, then the perceptron was guaranteed to find that
solution. So people got very excited about this. I'll just show you a little bit more of the structure of the perceptron. What you see on the right there are these racks of potentiometers; on the left, this jungle of wiring. This jungle of wiring looks like it's just random, and the reason is that it is just random. These are the inputs to those neurons; they're called features, and they're just little combinations of those pixels. We've got a 20 by 20 grid of photocells, the pixels, and each of the neurons would combine some inputs, so-called features, which were just little combinations of subsets of those pixels, combined together in some fixed way chosen by the designer of the perceptron. There are lots of ways of choosing these, and lots of research was done; one way of choosing them is just to take random subsets of those pixels and combine them, and that's what this random-looking wiring is. The reason this was interesting is that even though you'd randomized the inputs, the system could still learn to unscramble them and solve the problem. So that was sort of remarkable, and what's even more remarkable is that you could take a system which has learned, which has been trained to solve a problem, go in with a pair of wire cutters and cut 10% of the wires, and its performance would degrade a little bit but it would continue to work. It's a little bit like, you know, going down the pub and having a few too many beers: a few extra neurons die, and the next day you can still function, maybe not quite as well as previously. They call that graceful degradation. It's a property which I'm pretty sure my laptop doesn't have: if I started cutting wires in my laptop, very soon it would just stop working completely. So again this is a little bit more brain-like in some ways. That generated a tremendous amount of excitement.

So let me just summarize what's going on here in a little picture. The nodes or units, or the neurons if you like, on
the left, the left-hand column, represent in the case of a perceptron those pixels, the original raw pixels, and the dots down the middle are what we call the features. Each of those dots would be some combination; it might be just the sum of a randomly chosen subset of those pixels, and that's represented by those green connections. So those green connections correspond to that jungle of random-looking wiring, and that's fixed: it's chosen by the designer at the outset and then it's fixed; it doesn't change during learning. Then what we have is this red layer, and the red connections represent those resistors. These are the adjustable parameters in this perceptron. The neurons or nodes on the right-hand side again take combinations of some subsets of the features, but this time the strengths of the combinations are learned; those are the adjustable parameters. So what you see is we have a layered structure, and in fact again this is reminiscent of the brain: if you think of visual processing, in the brain it occurs through a series of layers of neurons. An important thing is that only one of these layers is actually adaptive; only one layer changes during learning.

Now perceptrons were interesting because they could learn to solve problems, and that was really exciting. But sometimes you would give it a very similar problem which looked just as easy and it would fail to learn. So what was going on? Sometimes they worked, sometimes they didn't. Well, these two computer scientists, Minsky and Papert, analyzed perceptrons mathematically, and they showed that there are some very severe limitations to the capabilities of perceptrons: they are very limited in what they can do, and that limitation arises because there is only a single layer of adaptation. They published this in this famous book called Perceptrons, and it's often said that the publication of this book led to a loss of interest in this alternative approach to artificial intelligence, where, rather than programming the
computer to be intelligent, here the system is learning to be intelligent. This book, of course, is a piece of mathematics; people checked it and it was correct, so it was hard to refute. But the proof applied only to systems which have a single layer of adaptive connections. At the end of the book they conjectured that even if you had more than one layer, similar results would apply; they conjectured these neural networks were never really going to be very useful. That part was pure conjecture.

So there we have the perceptron with a single layer of adaptation. The field of neural networks had been very exciting in the 1960s and had gone into abeyance, and people had lost interest as a result of the mathematical discussion of perceptrons and their limitations. Then something very interesting happened, which was the discovery of algorithms, different from the perceptron learning algorithm, which would allow networks having more than one layer of adaptation to be trained, techniques like so-called error backpropagation. So people could now apply multi-layered systems to various problems and see if they worked, and they discovered that these systems were very much more powerful than the single-layer perceptrons. Now for various technical reasons it turns out that you could really only train systems with usually at most a couple of layers, but nevertheless those systems were very powerful. They were capable of solving lots of problems that hitherto had been impossible, and led to the second wave of excitement around neural networks in the late 1980s and 1990s.

Now I began my career as a physicist. I did a PhD in quantum field theory and then I went off and worked on the fusion program, so at this time I was at Culham Laboratory in Oxfordshire as a theoretical plasma physicist working on nuclear fusion. I read about the discovery of the backpropagation algorithm and these two-layer neural networks and their ability, their brain-like ability, to learn to solve problems, and it
reminded me of, you know, HAL the computer, artificial intelligence. I thought this is tremendously exciting. My sense was that we were at the dawn of a new era. This was so exciting that I was actually going to change fields and change career, and I did that by taking neural networks and applying them to data that we were gathering from experiments. This is the inside of the JET tokamak, the world's largest tokamak. It's down in Oxfordshire, and it's operated to contain a hydrogen plasma at about 200 million degrees. The outside of this is bristling with all kinds of diagnostics, lasers and magnetic measurements and so on, so for the day we had a huge amount of data, and we could analyze it in all sorts of interesting ways. And so I set about applying neural networks to analyzing this data, and I became so excited about this that I changed fields: I left physics and actually moved into the field of computer science.

So those are the two-layer networks. They were very powerful, but they were a long way short of achieving human-level performance on some of the tasks of the kind that I've talked about. These systems were deployed in practical applications and they were very useful, but I think it's fair to say they remained reasonably niche. What happened then is other techniques came along. There's a technique called support vector machines that was very popular, that achieved slightly better performance than these neural networks, and so for the second time neural networks sort of went into decline. People lost interest a little bit and moved on to other techniques for so-called machine learning.

And then about five years ago there was something of a breakthrough. Over a number of years people like Hinton, who himself had been pivotal in the development of backpropagation and the two-layer neural networks of the 1980s, discovered how to train networks having more than two layers, in fact having many layers. And so the term deep learning refers to systems which have many, many layers of processing.
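
As a rough illustration of the single-layer limitation and the extra power of a second layer described in the talk, here is a minimal Python sketch. It is not from the lecture: the bias term, the AND and XOR toy problems, the hand-picked two-layer weights, and all function names are assumptions chosen for illustration. The sketch trains a single adaptive unit with the perceptron learning rule, which settles on AND but never on XOR, and then shows two layers of the same units computing XOR. The two-layer weights are set by hand here, not learned by backpropagation.

```python
def step(total):
    """The talk's sigma: output 1 if the weighted sum is positive, else 0."""
    return 1 if total > 0 else 0


def unit(weights, bias, inputs):
    """One model neuron: weighted sum of the inputs passed through step.
    The bias is an added assumption; the talk's equation has weights only."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(examples, epochs=100, rate=1.0):
    """Perceptron learning rule: adjust the weights only after a mistake.
    Returns (weights, bias, converged)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - unit(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += rate * error
        if mistakes == 0:  # a full pass with no errors: problem solved
            return weights, bias, True
    return weights, bias, False


# Toy problems (illustrative assumptions, not examples from the talk).
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def two_layer_xor(inputs):
    """Two layers of units, with hand-picked weights, computing XOR."""
    h_or = unit([1.0, 1.0], -0.5, inputs)    # fires if either input is on
    h_and = unit([1.0, 1.0], -1.5, inputs)   # fires only if both are on
    return unit([1.0, -1.0], -0.5, (h_or, h_and))  # "or" but not "and"


_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
xor_outputs = [two_layer_xor(x) for x, _ in XOR]

print("single layer learns AND:", and_converged)
print("single layer learns XOR:", xor_converged)
print("two-layer outputs on XOR inputs:", xor_outputs)
```

The single unit fails on XOR because no one set of weights separates its two classes, which is the kind of limitation Minsky and Papert analyzed; adding a second layer removes that restriction.
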
this makes them extremely powerful because if you think about a task such as taking an image and then describing what's going on in that image in English language that's something which isn't going to be done in two simple steps there'll be some very low level processing discovering edges in the image discovering combinations of edges to make corners discovering relationships between corners that make shapes discovering how those shapes combine together to make objects like faces looking at shapes of faces whether somebody's smiling or not completing those and using those to generate words and combining those into sentences eventually generating language that describes what's going on in that image that's many many many layers of processing and so we need to be able to train networks that are deep that have many many layers of processing and so really this is the breakthrough that underpins the new excitement in the field of artificial intelligence so here's an actual deep neural network this is one that's used for image processing for example taking images and labeling them according to the objects that are present in the image and these blocks represent groups of nodes or neurons so in one of those blocks there are many layers each layer has a whole grid of units and the units make connections to patches or receptive fields in the previous block so you can see the structure is pretty complex but that whole system is adaptive and that whole system is trained on large data sets and so now we have neural networks containing thousands or even millions of adjustable parameters trained on millions or sometimes even billions of examples of data points so that's a modern deep neural network and I like this this is a fairly geeky magazine called Wired if you're in the IT business you'll know Wired magazine if you're not you perhaps won't but this is the front cover of Wired magazine for I'm guessing June of this year it just says the end of code soon we won't
program computers we'll train them so behind all the if you like the hype the excitement around artificial intelligence there really is a very fundamental transformation happening in the field of computer science that's a transformation from programming a computer directly to solve a problem that is a human or team of humans devising a set of instructions such that when the computer follows those instructions it solves a particular task and instead doing something very different writing a set of instructions or computer program which allows the computer to learn and then train the computer to solve the task by using large amounts of data so I view that as a very fundamental shift in the nature of computation there's something else going on as well and I illustrate it with this slide I asked some colleagues just to write the word uncertainty what you see of course is the tremendous variability in human handwriting you can see from this example why it's so difficult to program a computer directly to do something like recognize human handwriting there's tremendous variability if you think of a little rule which describes the shape of the letter e you'll very quickly find an exception to that rule and so you can write another rule that captures the exceptions but there'll be exceptions to the exception there's this combinatoric explosion of possibilities that's really what defeated old-fashioned AI back in the 1950s and 60s plus of course the lack of fast computers there's something else going on as well and in a sense it's complementary to this idea of learning so we're seeing a shift in computation I think a revolution in computation between software which is written by humans and software which is learned from data so there's something else going on not only are we seeing a transformation in computation from software which is written by hand to software which is learned from data but we're seeing a shift from software which is based on logic that is
everything is zero or one is deterministic to software which deals with uncertainty it quantifies uncertainty it deals with if you like shades of grey and ambiguity so I'm going to show you a little demonstration and this demonstration was actually designed really to illustrate this idea of uncertainty and to show you a modern view of machine learning so I've shown you what I would think of as a traditional view of machine learning adjusting these synapses adjusting these parameters in a neural network to bring it closer and closer to the desired performance but there's a very different view of machine learning and it shows you the critical role played by uncertainty so this is an example that will be very familiar to many of you it's what we call a recommendation system in this case it's going to recommend films or movies to people so this is a huge table each column of the table represents a different movie and each row of the table is a different person and our goal is to recommend movies to somebody which we think they might enjoy watching now in a real system we would certainly make use of features so it makes use of features of the film for example its length and its genre is it a comedy or an action-adventure or whatever and who are the actors and so on we'll also make use of features of the user their age their gender their geographical location other things we might know about them and those are certainly very helpful in matching movies to people though for the purpose of this demo let's ignore those features all we know are the ratings which people have given to movies and so we know that a certain person has watched a particular movie and they like that movie that's represented by the ticks in these boxes so where a particular person has watched a particular movie and they've given it a positive rating because sometimes people watch a movie and they don't like it and so they give it a negative rating and so those are the crosses now this is
essentially a big table it might have ten thousand movies and ten million people so this is an enormous table and it's mostly empty we don't have very many ratings and our goal effectively is to fill in the blanks so where a person has not yet watched a movie we want to predict will they like the movie or will they not so I'm going to show you a demonstration of a system which solves exactly that task it's based on machine learning and this is a little demonstration system although the actual technology behind this is used in real systems and in this case we've chosen a couple of hundred movies and the system has already done a certain amount of learning based on the ratings of a few tens of thousands of people on these two hundred movies now what it's going to do is make recommendations for me so to learn about my preferences now I wasn't one of the people in the original dataset so it knows nothing about me at this point so what I need to do is watch a movie and decide whether I like it or not so let's say I've watched this movie and let's suppose I do like it okay so what it's doing now is it's reordering the other movies according to whether it thinks I'll like them or not now the vertical position on the screen is irrelevant they're just spread out vertically so you can see them what matters is the horizontal position so if a movie is close to the right hand edge of the white region that is close to that green region then the system is very confident that I will like the movie and it measures that confidence using probability so it assigns a high probability to my liking that movie conversely if it positions the movies on the left-hand side of the white region towards that red edge it's very confident that I won't like it and if it's in the middle if it's at 50/50 it's really very unsure what you see is that most of the movies are clustered around the middle there's a lot of white space down the right there's a lot of white space down the left and that's not
surprising the only thing it knows about me is that I like that one movie that's all it knows about me so it hasn't had much data from me to learn and so it's really very unsure about most of this so let's pick another example let's suppose I don't like this one so now what we see is the movies are spreading out some of them are moving towards the right where it's more confident that I will like them some are moving to the left where it's more confident that I won't like them and this if you like is the modern view of machine learning it's the reduction in the uncertainty of the system as a result of seeing the data and so I can carry on I can pick another one let's say I like this one pick another one suppose I don't like that one so now it's seen four examples so now you see a very different picture you see a lot of movies clustered down the right-hand side very confident I should go and watch those ones down the left-hand side pretty confident I won't like them most of the white space is now in the middle there are a few that it's really quite unsure about The Sound of Music for some reason but nevertheless you can see that it has learned from data through a reduction in uncertainty so that I think is the modern view of machine learning I'm going to use this demo to illustrate something else as well which I think is really very powerful and I think it illustrates it very nicely which is the concept of information so there's a whole field called information theory it was invented by Shannon back in the 1940s and he provided a mathematical basis for the concept of information and that really is foundational in modern computer science and information technology and I can illustrate that by going to one of these movies down the right-hand side so it's right over on the right hand side it's pretty confident that I will like this movie so let's suppose I watch it and indeed I do like it
so watch what happens watch carefully as I let go of the mouse button okay actually I'll pick another one so it's confident that I'll like this so let's say I do like it watch what happens a tiny change the reason is that there was very little surprise there's very little surprise in that data so there's very little information and if I pick another one here that it's very confident I'll like let's suppose that I don't like this movie again watch what happens as I let go of the mouse button okay so this time we see a dramatic change it's now got rather confused again a lot of things have gone back to the middle so there was a high degree of surprise it was really confident that I was going to like the movie and I said I didn't that was very surprising and Shannon defined information as the degree of surprise now what's interesting is that this is I think a very nice illustration of the difference between data and information because in every case the amount of data is the same it's one bit or one binary digit in order to say that I like a movie or I don't like the movie I can encode it as a 0 or a 1 so each of those what is it seven ratings I provided so far is represented by one bit of data the amount of data is the same but the amount of information is very different so if it's a movie on the right hand side that I like the amount of information goes to 0 at the right hand side when it becomes certain that I like it and I do like it then the information goes to 0 and the amount of information goes logarithmically to infinity as we go across to the left hand side so that's a very nice illustration of the distinction between data and information ok and that sort of example we call collaborative filtering because people are collaborating together to help each other work out which movies they're going to like so this quantification of uncertainty is based on a branch of mathematics that goes back certainly 350 years called probability theory the actual
mathematical equations of probability are deceptively simple it's a very beautiful and very elegant theory and it's just a way of putting numbers behind uncertainty in a way that's very consistent so probability is really the calculus of uncertainty now there are two kinds of probability so if you were taught probability in school you were probably taught probability as the limit of an infinite number of trials so the probability that a coin will land heads if it's a fair coin is 0.5 or 50% what that means in precise terms is if you flip the coin a number of times and you measure the fraction of times it lands heads then if you take the limit you flip more and more times you flip an infinite number of times that fraction will fluctuate around but it'll eventually converge to a value and that value is the probability we call that the frequentist notion of probability it's the frequency with which something occurs there's another view of probability which we call the Bayesian view which in a sense is more general because it encompasses the notion of frequency but it applies also to things like one-off events unrepeatable events if we want to ask what's the probability that the moon was once part of the earth compared to being a separate body that was captured by the Earth's gravitational field we can't sort of repeat the origin of the universe millions of times and see which fraction of the times you know and so on it doesn't make sense it's a one-off event but we're using probabilities to describe uncertainty and it's interesting that we use the same terminology and the reason we use the same terminology is that if you try to ascribe numbers to quantify uncertainty those numbers obey very simple equations and those equations are exactly the same equations as are obeyed by frequencies of things like coin flips and so we use the same terminology we call it probability this is a much more general definition I'll try to illustrate it
with this example so here's a coin but it's a bent coin so if I flip the coin it's going to land it might land concave side up okay concave side up more often than it lands concave side down let's suppose I don't know if that's right according to the physics let's suppose it is so imagine that I flip this coin an infinite number of times look at the fraction of times it lands concave side up that'll be 60% or 0.6 that's the probability of landing concave side up so that's a frequentist probability but let's suppose that one side of the coin is heads and the other is tails but imagine that you don't know which side is heads you don't know whether the concave side is heads or the convex side is but suppose I ask you to take a bet bet 5 pounds according to whether it's heads or tails which way should you bet well from your point of view it's symmetrical you don't know which side is heads or tails so even though you know it's going to land concave side up more often than concave side down because you don't know which is heads and which is tails it's symmetrical and so if you're acting rationally you would bet according to a probability of 0.5 it doesn't mean that you believe that in the limit of an infinite number of trials it will land heads 50% of the time you believe that it will either land heads 60 percent of the time or it will land heads 40 percent of the time but you don't know which so in a sense the frequency with which it lands concave side up in that case is a bit like a frequentist probability but this uncertainty over which side is heads or tails it's a one-off event one side is heads or the other is heads it's not a repeatable event it's a one-off thing you just don't know which it is that's like this quantification of uncertainty this Bayesian view of probability now at this point you might be thinking well I'm making a lot of fuss here because we've just made a tiny change instead of having zeros and ones we've now got numbers between 0 and 1 it
seems like a very small change I'll just give you one little illustration of the fact that probabilities can behave in very peculiar ways so this is not a trivial change at all this is really quite I think quite significant so here's a little example here's a bus and let's suppose that the bus is longer than the car and there's a bicycle and suppose the car is longer than the bicycle then if the bus is longer than the car and the car is longer than the bicycle I think you'll all agree that the bus must be longer than the bicycle does anybody not agree with that good we call that mathematicians call that property transitivity so these if you like deterministic numbers these certain numbers these lengths of objects behave in this transitive way but if we now go to uncertain quantities we discover that they can be non transitive and this is extremely peculiar and it's not just a theoretical thing these are non transitive dice and they're very easy to construct they're just regular dice the only unusual thing about them is the choice of numbers on the faces so these are unbiased dice they come down on each of the six sides with equal probability but the choice of numbers is a bit unusual and a particular number only appears on one of the dice and so you can never have a draw so if you roll one die against another one of them will always come up with a higher number we'll call that the winner and what you discover is that if you let's say roll the red die against the yellow die then 2/3 of the time the number on the yellow die will be bigger than the number on the red die so the yellow die actually just has threes so always comes up with a three the red die has a couple of sixes and four twos so 2/3 of the time it rolls a two and one third of the time it rolls a six so 2/3 of the time yellow beats red okay that's fine 2/3 of the time likewise purple will beat yellow 2/3 of the time green will beat purple roll the higher number and here's the amazing thing two-thirds of
the time red will beat green so you can equip yourself with a set of non transitive dice I sell these at very reasonable rates by the way and you can have a very profitable evening down the pub with your friends because you show them these dice and you say you examine them to your heart's content now you pick whichever you like and I'll pick one of course you pick the next one in the sequence you say let's do the best of eleven or best of 15 throws bet 5 pounds after a reasonably large number of throws it's very very likely that you'll win the bet so they get a bit cross and they want the die that you've just used and of course you pick a different one and so on and you'll always win so this is just one of many many examples of the fact that probabilities behave in very unusual and very peculiar ways so we've seen the idea that artificial intelligence is being revolutionized by learning from data and that learning from data happens through the quantification of uncertainty so learning from data is one of the key ideas so these algorithms things like deep neural networks are one of the foundations of this revolution another of the foundations obviously is the data the explosion that we're seeing in data is one of the things that's enabling this revolution the amount of data in the world is doubling on a very short timescale probably less than a year and so we're seeing a tremendous growth in the amount of data that can be used to fuel machine learning there's a third ingredient as well and that's computation so these techniques are very hungry for computer power so today we use neural networks with millions of adjustable parameters trained on millions or billions of data points using extremely powerful computers and these computers live increasingly in what are called data centers so here's a picture of a data center this particular one is a Microsoft data center somewhere in North America what you see are these low buildings with no windows
and inside racks and racks and racks full of computers and storage and networking so these are really the world's most powerful computers these days now these data centers are the foundation of what we call cloud computing the idea that computing is now increasingly centralized in these data centers and accessed anytime anyplace from any device and the trend is growth in cloud computing and many companies and including in particular Microsoft are expanding these data centers this data center if you look closely at the top of the picture you'll see some bulldozers and some construction work going on because this data center is being expanded and if we fly around a hundred and eighty degrees and look from the other end you'll get some idea of the scale of expansion so this particular data center is obviously increasing in size by an enormous factor and so all around the world new data centers are being constructed all the time we're seeing this tremendous growth in the capacity of these data centers now the last few weeks have been very interesting there have been some very interesting announcements in the last few weeks and in particular the announcement by Microsoft of the world's first exascale supercomputer and it's based on a technology called FPGA or field programmable gate array so the way to understand what this is is think of it as a hardware chip but where the architecture of the hardware can be changed using software so it's a very flexible kind of chip the chip itself is not as powerful or as fast as a fixed architecture chip like a central processing unit in this laptop but it's very flexible and so we can change the architecture and run and try out lots of different kinds of algorithms and neural networks and so on and so in these data centers as well as the regular computation we've been deploying these field programmable gate arrays on a very large scale to the point where a couple of weeks ago we announced the world's first exascale AI supercomputer so an exascale
machine can do an exa-op per second that's a billion giga operations per second or a million million million operations per second I'm sure this won't end up being the fastest computer there's more to come so this is just an extraordinary growth in processing power coupled with data coupled with these new algorithms is driving all of the excitement around machine learning and artificial intelligence what's this being used for well many many many many different things one example of course is personal assistants many organizations are working on large companies are working on developing personal digital assistants this is Microsoft's this one's called Cortana and these technologies are at a very early stage of development but I'm very confident the next decade will see the capabilities of these types of assistant advance very very rapidly there are many many other applications of machine learning and the technology continues to advance at a tremendous pace so again just in the last week or so an announcement in this case again by Microsoft of the achievement of human parity in speech recognition so this is an automatic speech recognition system which achieves the same error rate at the word level as a human transcriber when humans transcribe speech they make a few errors so do the machines the error rate is now the same what else will we be using this for will we be building killer robots that wipe us all out well there are actually many many more useful things we can do with this and I'll just briefly tell you about a research project that we're looking at in Cambridge at Microsoft Research in Cambridge called Project InnerEye this is using these machine learning and artificial intelligence techniques to look at the treatment of cancer so what we see on the left is a cross-section of an MRI scan of a brain tumor very nasty brain tumor and the radiologist is using a mouse and looking at this image and segmenting the tumor that is defining the boundary of the tumor by hand
in order that this can be used for radiation therapy planning firing in x-rays and radiation to try to kill the tumor and do the minimum damage to the surrounding tissue so that's being done by hand and is a very time-consuming process but we can use these machine learning techniques to speed this up and also to improve the accuracy and reproducibility of this so a little bit of human intervention now to provide some initial segmentation and after about 10 seconds or so the segmentation is complete and it's more accurate or more reliable than the human segmentation this is a nice example of artificial intelligence being used in partnership with humans so it's the case today despite all the advances I've talked about it's the case today and I think it will be the case for quite some time to come that the capabilities of machines are different from and complementary to the capabilities of humans so here the radiologist with the experience of looking at these images many different images from many different patients for many years has built up a good qualitative understanding of the nature of the tumor how it should be treated what the computer is good at is this three dimensional segmentation defining accurately and reproducibly which of those three dimensional pixels or voxels is tumor and which is normal tissue I started my talk with a rather gloomy outlook of killer robots I think that belongs firmly in the world of Hollywood but nevertheless this is a very powerful technology it's a very general purpose technology and as it's deployed and I'm sure it will be deployed in many ways which are of enormous benefit to society helping us as a species tackle some of the tremendous problems that we face in the 21st century but we must of course expect a few bumps on the road and to help us think about issues around privacy and security around the implications of this transformation to the world of solutions which are learned from data again we just announced in the last
week or two the formation of the partnership on AI where some of the leading organizations working on artificial intelligence at large scale have come together to work together to see how artificial intelligence can best be used for the benefit of society and finally if you're worried about killer robots I think we'll always remain in control thank you very much [Applause] [Music] [Applause]
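The non transitive dice described in the talk can be checked by counting outcomes. What follows is a minimal sketch, not the speaker's own material. The talk only states the faces of the red die (two sixes and four twos) and the yellow die (all threes); the purple and green faces below are an assumption, taken from the well-known Efron dice, chosen because they reproduce the two-thirds cycle described and keep every number on exactly one die. The names DICE and win_probability are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Faces of each die. Red and yellow follow the talk; purple and green are
# ASSUMED (Efron's dice) so that the described cycle holds.
DICE = {
    "red": (2, 2, 2, 2, 6, 6),
    "yellow": (3, 3, 3, 3, 3, 3),
    "purple": (0, 0, 4, 4, 4, 4),   # assumption, not stated in the talk
    "green": (1, 1, 1, 5, 5, 5),    # assumption, not stated in the talk
}


def win_probability(a: str, b: str) -> Fraction:
    """Exact probability that die `a` rolls a higher number than die `b`."""
    # Enumerate all 36 equally likely face pairs and count the wins for `a`.
    wins = sum(1 for x, y in product(DICE[a], DICE[b]) if x > y)
    return Fraction(wins, len(DICE[a]) * len(DICE[b]))


# The cycle from the talk: each winner is in turn beaten by the next die.
CYCLE = [("yellow", "red"), ("purple", "yellow"),
         ("green", "purple"), ("red", "green")]

if __name__ == "__main__":
    for winner, loser in CYCLE:
        print(f"{winner} beats {loser} with probability "
              f"{win_probability(winner, loser)}")
```

Because no number appears on two different dice, a draw is impossible, so each losing die wins the remaining one third of the time. Whichever die the friend picks, the previous one in the cycle is the better choice.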
Info
Channel: The Royal Institution
Views: 361,706
Rating: 4.7655921 out of 5
Keywords: Ri, Royal Institution, chris bishop, artificial intelligence, science, computing, computer science, machine learning, neural networks, algorithsm
Id: 8FHBh_OmdsM
Length: 61min 22sec (3682 seconds)
Published: Wed May 17 2017