Conference: Jensen Huang (NVIDIA) and Ilya Sutskever (OpenAI) on AI Today and the Vision of the Future

JENSEN HUANG: Ilya, unbelievable. Today is the day after GPT-4. It's great to have you here; I'm delighted to have you. I've known you a long time, and I have a mental memory of the journey and of the seminal work you have done: starting at the University of Toronto, the co-invention of AlexNet with Alex Krizhevsky and Geoff Hinton that led to the Big Bang of modern artificial intelligence; the career that took you out here to the Bay Area; the founding of OpenAI; GPT-1, -2, and -3; and then, of course, ChatGPT, the AI heard around the world. This is the incredible resume of a young computer scientist. I just want to go back to the beginning and ask you about deep learning. What was your intuition around deep learning? Why did you know it was going to work? Did you have any intuition that it would lead to this kind of success?

ILYA SUTSKEVER: First of all, thank you so much for all the kind words. A lot has changed thanks to the incredible power of deep learning. My personal starting point was that I was interested in artificial intelligence for a whole variety of reasons, starting from an intuitive appreciation of its impact. I also had a lot of curiosity about what consciousness is, what the human experience is, and it felt like progress in artificial intelligence would help with that. The next step was, well, back then, starting in 2002-2003, it seemed like learning was the thing that humans can do that computers can't do at all. In 2002-2003 computers could not learn anything, and it wasn't even clear that it was possible in theory. So I thought that making progress in machine learning would lead to the greatest progress in AI. Then I started to look around for what was out there, and nothing seemed too promising. But to my great luck, Geoff Hinton was a professor at my university, and I was able to find him, and he was working on neural networks. It immediately made sense, because neural networks had the property that, by learning, we are automatically programming parallel computers. Back then the parallel computers were small, but the promise was that if you could somehow figure out how learning in neural networks works, then you can program small parallel computers from data. It was also similar enough to the brain, and the brain works, so you had these several factors going for it. Now, it wasn't clear how to get it to work, but of all the things that existed, it seemed to have by far the greatest long-term promise.

JENSEN HUANG: At the time you first started working with deep learning and neural networks, what was the scale of the networks? What was the scale of computing at that moment?

ILYA SUTSKEVER: An interesting thing to note is that the importance of scale wasn't realized back then. People would train neural networks with 50 neurons, 100 neurons; several hundred neurons would be a big neural network. A million parameters would be considered very large. We would run our models on unoptimized CPU code, because we were a bunch of researchers who didn't know about BLAS. We used Matlab, because Matlab was optimized, and we'd just experiment: what is even the right question to ask? You'd try to find interesting phenomena, interesting observations; you could do this small thing, you could do that small thing. Geoff Hinton was really excited about training neural nets on small little digits, both for classification and for generating them, so the beginnings of generative models were right there. But the question was: of all this cool stuff floating around, what really gets traction? It wasn't obvious that this was the right question back then.
JENSEN HUANG: But in hindsight it turned out to be the right question. Now, the year of AlexNet was 2012, and you and Alex were working on AlexNet for some time before then. At what point was it clear to you that you wanted to build a computer-vision-oriented neural network, that ImageNet was the right data set to go for, and to go for the computer vision contest?

ILYA SUTSKEVER: I can talk about the context there. Probably two years before that, it became clear to me that supervised learning is what would get us the traction, and I can explain precisely why. It wasn't just an intuition; it was, I would argue, an irrefutable argument, which went like this: if your neural network is deep and large, then it could be configured to solve a hard task. That's the key phrase: deep and large. People weren't looking at large neural networks; people were maybe studying a little bit of depth, but most of the machine learning field wasn't even looking at neural networks at all. They were looking at all kinds of Bayesian models and kernel methods, which are theoretically elegant methods with the property that they actually can't represent a good solution no matter how you configure them, whereas a large and deep neural network can represent a good solution to the problem. To find a good solution you need a big data set and a lot of compute to actually do the work. We had also done advance work: we had worked on optimization for a little bit. It was clear that optimization was a bottleneck, and there was a breakthrough by another grad student in Geoff Hinton's lab called James Martens. He came up with an optimization method, different from the ones we are using now, some second-order method, and the point is that it proved we could train those neural networks, because before that we didn't even know we could train them. So if you can train them, you make it big, you find the data, and you will succeed. Then the next question is: what data? The ImageNet data set back then seemed unbelievably difficult, but it was clear that if we were to train a large convolutional neural network on this data set, it must succeed, if you just have the compute.

JENSEN HUANG: Right at that time our paths intersected, and somehow you had the observation that a GPU could be useful. At that time we were a couple of generations into our CUDA GPUs; I think it was the GTX 580 generation. You had the insight that the GPU could actually be useful for training your neural network models. How did that day start? You never told me that moment.

ILYA SUTSKEVER: The GPUs appeared in our Toronto lab thanks to Geoff. He said we should try these GPUs, and we started experimenting with them, and it was a lot of fun, but it was unclear what to use them for exactly; where were you going to get the real traction? But then, with the existence of the ImageNet data set, it became very clear that the convolutional neural network is such a great fit for the GPU that it should be possible to make it go unbelievably fast, and therefore train something completely unprecedented in terms of its size. That's how it happened. And very fortunately, Alex Krizhevsky really loved programming GPUs, and he was able to write really fast convolutional kernels and train the neural network on the ImageNet data set. That led to the result.

JENSEN HUANG: It shocked the world. It broke the record in computer vision by such a wide margin that it was a clear discontinuity.
ILYA SUTSKEVER: Yes, and there is another bit of context there. It's not so much about breaking the record; I think there's a different way to phrase it. That data set was so obviously hard and so obviously out of reach of anything. People were making some progress with classical techniques, but this thing was so much better, on a data set which was so obviously hard. It was not just some competition; it was a benchmark which was so obviously difficult, so obviously out of reach, and which so obviously had the property that if you did a good job, that would be amazing.

JENSEN HUANG: The Big Bang of AI. Fast forward to now: you came out to the valley, you started OpenAI with some friends, and you're the chief scientist. What was the first initial idea about what to work on at OpenAI? You worked on several things, and some of the trails of inventions can be seen leading up to the ChatGPT moment. What was the initial inspiration? How would you approach intelligence from that moment?

ILYA SUTSKEVER: Obviously, when we started it wasn't one hundred percent clear how to proceed, and the field was also very different compared to the way it is right now. Right now we are already used to these amazing artifacts, these amazing neural nets doing incredible things, and everyone is so excited. But back in 2015 and early 2016, when we were starting out, the whole thing seemed pretty crazy. There were so many fewer researchers, maybe between a hundred and a thousand times fewer people in the field compared to now. Back then you had something like a hundred people, most of them working at Google or DeepMind, and that was that. Then there were people picking up the skills, but it was very, very scarce. We had two big initial ideas at the start of OpenAI that had a lot of staying power; they stayed with us to this day, and I'll describe them now. The first big idea, one which I was especially excited about very early on, is the idea of unsupervised learning through compression. Some context: today we take it for granted that unsupervised learning is this easy thing, you just pre-train on everything and it all does exactly as you'd expect. In 2016, unsupervised learning was an unsolved problem in machine learning that no one had any insight, any clue, as to what to do about. Yann LeCun would go around giving talks saying that unsupervised learning was this grand challenge. And I really believed that really good compression of the data would lead to unsupervised learning. Now, compression is not language that was commonly used to describe what is really being done, until recently, when it suddenly became apparent to many people that those GPTs actually compress the training data. You may recall the Ted Chiang New Yorker article which also alluded to this. But there is a real mathematical sense in which training these autoregressive generative models compresses the data, and intuitively you can see why that should work: if you compress the data really well, you must extract all the hidden secrets which exist in it. Therefore, that is the key. That was the first idea we were really excited about, and it led to quite a few works at OpenAI, including the sentiment neuron, which I'll mention very briefly. This work might not be well known outside of the machine learning field, but it was very influential, especially on our thinking. In this work we trained a neural network; back then it was not a Transformer, it was before the Transformer: a small recurrent neural network, an LSTM.

JENSEN HUANG: Sequence work that you'd done; some of it was your own work.

ILYA SUTSKEVER: Yes, the same LSTM with a few twists, trained to predict the next token, the next character, in Amazon reviews.
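Ilya's claim that training an autoregressive model compresses the data has a precise form: a model's cross-entropy on a string is, via arithmetic coding, essentially the number of bits needed to encode that string. Here is a minimal sketch of that link, using a toy corpus and a bigram character model of my own choosing, not anything from the interview:

```python
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat. the cat ate."

# Fit a bigram character model: P(next | current) from counts,
# with add-one smoothing over the corpus alphabet.
alphabet = sorted(set(corpus))
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def prob(prev, nxt):
    c = counts[prev]
    return (c[nxt] + 1) / (sum(c.values()) + len(alphabet))

# Code length of the corpus under the model: sum of -log2 P(next | prev).
# An arithmetic coder would emit almost exactly this many bits.
bits = sum(-math.log2(prob(a, b)) for a, b in zip(corpus, corpus[1:]))
bits_uniform = (len(corpus) - 1) * math.log2(len(alphabet))

print(f"model: {bits:.1f} bits vs uniform coding: {bits_uniform:.1f} bits")
```

The better the model predicts each next character, the fewer bits the encoding needs; in a GPT, a large neural network plays the role of this bigram table.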
ILYA SUTSKEVER: And we discovered that if you predict the next character well enough, there will be a neuron inside that LSTM that corresponds to the review's sentiment. That was really cool, because it showed some traction for unsupervised learning, and it validated the idea that really good next-character prediction, next-something prediction, compression, has the property that it discovers the secrets in the data. That's what we see with these GPT models: you train them, and people say it's just statistical correlation, but at this point it should be clear to anyone that it is more than that.

JENSEN HUANG: That observation also, for me intuitively, opened up the whole world of where to get the data for unsupervised learning, because I do have a whole lot of data. If I can just make you predict the next character, and I know what the ground truth is, I know what the answer is, then I can train a neural network model with that. That observation, and masking and other approaches, opened my mind about where the world would get all the data for unsupervised learning.

ILYA SUTSKEVER: Well, I would phrase it a little differently. I would say that within unsupervised learning, the hard part has been less around where you get the data from, though that part is there as well, especially now; it was more about why you should do it in the first place, why you should bother. The hard part was to realize that training these neural nets to predict the next token is a worthwhile goal at all: that it would learn a representation, that it would be able to understand. It would pick up grammar, yes, but it just wasn't obvious it would go further, so people weren't doing it. But the sentiment neuron work, and I want to call out Alec Radford as the person really responsible for many of the advances there, this was before GPT-1; it was the precursor to GPT-1, and it influenced our thinking a lot. Then the Transformer came out, and we immediately went, oh my God, this is the thing, and we trained GPT-1.

JENSEN HUANG: Now, along the way you've always believed that scaling will improve the performance of these models: larger networks, deeper networks, more training data. There was a very important paper that OpenAI wrote about the scaling laws and the relationship between loss and the size of the model and the size of the data set. When Transformers came out, they gave us the opportunity to train very, very large models in a very reasonable amount of time. Did the intuition about the scaling laws come first, or did you see the evidence in GPT-1 through -3 first?

ILYA SUTSKEVER: The intuition. The way I'd phrase it is that I had a very strong belief that bigger is better, and that one of the goals we had at OpenAI was to figure out how to use scale correctly. There was a lot of belief at OpenAI about scale from the very beginning. The question was what to use it for precisely, because right now we're talking about the GPTs, but there is another very important line of work which I haven't mentioned: the second big idea. I think now is a good time to make a detour to it, and that's reinforcement learning. That clearly seemed important as well; what do you do with it? The first really big project done inside OpenAI was our effort at solving a real-time strategy game. For context, a real-time strategy game is like a competitive sport: you need to be smart, you need to have a quick reaction time, there's teamwork, and you're competing against another team. It's pretty involved, and there is a whole competitive league for that game. The game is called Dota 2.
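Returning for a moment to the scaling-laws paper Jensen mentioned: the reported relationship between loss and model size takes the form of a power law plus an irreducible term, roughly L(N) = a*N^(-b) + c. The sketch below shows how such an exponent can be recovered from measurements; the data points are synthetic and the constants are illustrative, not OpenAI's actual fits:

```python
import math
import random

random.seed(0)
# Synthetic (parameter count, loss) points following L(N) = a*N**-b + c plus noise.
a, b, c = 400.0, 0.3, 1.7            # illustrative constants, not real measurements
sizes = [10 ** p for p in range(6, 11)]
losses = [a * n ** -b + c + random.gauss(0, 0.01) for n in sizes]

# With the irreducible loss c known (or grid-searched), the fit is linear in log space:
#   log(L - c) = log(a) - b * log(N)
xs = [math.log(n) for n in sizes]
ys = [math.log(l - c) for l in losses]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b_hat = -slope
print(f"fitted scaling exponent b = {b_hat:.3f}")
```

The same log-log regression is how a smooth loss-versus-size trend lets you extrapolate before committing compute to a larger training run.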
ILYA SUTSKEVER: We trained a reinforcement learning agent to play against itself, with the goal of reaching a level where it could compete against the best players in the world. That was a major undertaking as well, and a very different line of work: reinforcement learning.

JENSEN HUANG: I remember the day you announced that work. This is, by the way, what I was asking about earlier: there's a large body of work that has come out of OpenAI, and some of it seemed like detours, but as you're explaining now, those seeming detours really led up to the important work we're now talking about, ChatGPT.

ILYA SUTSKEVER: Yes. There has been real convergence, where the GPTs produce the foundation, and the reinforcement learning from Dota morphed into reinforcement learning from human feedback. That combination gave us ChatGPT.

JENSEN HUANG: There's a misunderstanding that ChatGPT is in itself just one giant large language model; there's a system around it that's fairly complicated. Could you explain briefly for the audience the fine-tuning of it, the reinforcement learning of it, the various surrounding systems that allow you to keep it on rails, give it knowledge, and so forth?

ILYA SUTSKEVER: I can. The way to think about it is that when we train a large neural network to accurately predict the next word in lots of different texts from the internet, we are learning a world model. It may look on the surface like we are just learning statistical correlations in text, but it turns out that in order to learn the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text. This text is actually a projection of the world: there is a world out there, and it has a projection onto this text. And so what the neural network is learning is more and more aspects of the world, of people, of the human condition, their hopes, dreams, and motivations, their interactions, and the situations we are in. The neural network learns a compressed, abstract, usable representation of that. This is what's being learned from accurately predicting the next word. Furthermore, the more accurate you are at predicting the next word, the higher the fidelity, the more resolution you get in this process. That's what the pre-training stage does. But what this does not do is specify the desired behavior that we wish our neural network to exhibit. You see, a language model really tries to answer the following question: if I had some random piece of text on the internet, which starts with some prefix, some prompt, what would it complete to? But this is different from: I want an assistant which will be truthful, which will be helpful, which will follow certain rules and not violate them. That requires additional training. This is where the fine-tuning and the reinforcement learning from human teachers, and other forms of AI assistance, come in. It's not just reinforcement learning from human teachers; it's also reinforcement learning from human and AI collaboration: our teachers are working together with an AI to teach our AI to behave. Here we are not teaching it new knowledge; that is not what's happening. We are communicating to it what it is that we want it to be. This process, the second stage, is also extremely important. The better we do the second stage, the more useful, the more reliable this neural network will be. So the second stage is extremely important too, in addition to the first stage of: learn everything, learn as much as you can about the world from the projection of the world, which is text.
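The second stage Ilya describes, communicating what we want rather than teaching new knowledge, is commonly implemented by fitting a reward model to human preference comparisons (a Bradley-Terry objective) and then optimizing the policy against that reward. Below is a toy sketch of just the preference-fitting step; the feature vectors standing in for completions, and the pairs themselves, are invented for illustration:

```python
import math

# Toy "completions" as 3-d feature vectors, in human preference pairs of
# (preferred, rejected). All numbers are invented for illustration.
pairs = [
    ((1.0, 0.2, 0.0), (0.1, 0.9, 0.5)),
    ((0.8, 0.1, 0.1), (0.2, 0.8, 0.4)),
    ((0.9, 0.3, 0.2), (0.0, 1.0, 0.6)),
]

w = [0.0, 0.0, 0.0]                      # reward model parameters
reward = lambda x: sum(wi * xi for wi, xi in zip(w, x))

# Bradley-Terry objective: maximize log sigmoid(r(preferred) - r(rejected)).
for _ in range(200):
    for good, bad in pairs:
        p = 1 / (1 + math.exp(-(reward(good) - reward(bad))))
        for i in range(3):               # gradient ascent on the log-likelihood
            w[i] += 0.1 * (1 - p) * (good[i] - bad[i])

margins = [reward(g) - reward(b) for g, b in pairs]
print("preference margins:", [round(m, 2) for m in margins])
```

In a real RLHF pipeline the linear reward would be a neural network over completions, and a policy-gradient step against this learned reward replaces the hand-labeled Dota-style game score.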
JENSEN HUANG: Now, you can fine-tune it, you can instruct it to perform certain things. Can you instruct it to not perform certain things, so that you can give it guardrails against certain types of behavior, give it some kind of bounding box so that it doesn't wander out of that bounding box and do things that are unsafe or otherwise?

ILYA SUTSKEVER: Yes. This second stage of training is indeed where we communicate to the neural network anything we want, which includes the bounding box. The better we do this training, the higher the fidelity with which we communicate this bounding box, and so with constant research and innovation on improving this fidelity, it becomes more and more reliable and precise in the way in which it follows the intended instructions.

JENSEN HUANG: ChatGPT came out just a few months ago: the fastest growing application in the history of humanity. There are lots of interpretations of why, but some things are clear. It is the easiest application anyone has ever created for anyone to use. It performs tasks that are beyond people's expectations. Anyone can use it; there are no instruction sets, there are no wrong ways to use it. You just use it, and if your prompts are ambiguous, the conversation refines the ambiguity until your intents are understood by the AI. The impact is clearly remarkable. Now, yesterday, this is the day after GPT-4. Just a few months later, the performance of GPT-4 in many areas is astounding: SAT scores, GRE scores, bar exams, the number of tests it's able to perform at very capable human levels. What were the major differences between ChatGPT and GPT-4 that led to its improvements in these areas?

ILYA SUTSKEVER: GPT-4 is a pretty substantial improvement on top of ChatGPT across very many dimensions. We trained GPT-4, I would say, more than six months ago, maybe eight months ago; I don't remember exactly. The first big difference between ChatGPT and GPT-4, and perhaps the most important difference, is that the base on top of which GPT-4 is built predicts the next word with greater accuracy. This is really important, because the better a neural network can predict the next word in text, the more it understands it. This claim is now perhaps accepted by many, but it might still not be completely intuitive as to why. So I'd like to take a small detour and give an analogy that will hopefully clarify why more accurate prediction of the next word leads to more understanding, real understanding. Consider an example: say you read a detective novel. Complicated plot, a storyline, different characters, lots of events, mysteries, clues; it's unclear. Then, at the last page of the book, the detective has gathered all the clues and all the people, and says: okay, I'm going to reveal the identity of whoever committed the crime, and that person's name is... Predict that word.

JENSEN HUANG: Predict that word, exactly. My goodness.

ILYA SUTSKEVER: Right. Now, there are many different words, but by predicting those words better and better and better, the understanding of the text keeps on increasing. GPT-4 predicts the next word better.

JENSEN HUANG: People say that deep learning won't lead to reasoning. But in order to predict that next word, to figure out, from all of the agents that were there and all of their strengths, weaknesses, intentions, and the context, who the murderer was: that requires some amount of reasoning, a fair amount of reasoning.
JENSEN HUANG: So how is it that it's able to learn reasoning? And one of the things I was going to ask you: of all the tests taken between ChatGPT and GPT-4, there were some tests that GPT-3 or ChatGPT was already very good at, some tests it was not as good at but GPT-4 was much better at, and some tests that neither is good at yet. Some of it has to do with reasoning, it seems: maybe in calculus it wasn't able to break the problem down into its reasonable steps and solve it, yet in some areas it seems to demonstrate reasoning skills. So, in predicting the next word, is it learning reasoning? And what are the limitations of GPT-4 that would enhance its ability to reason even further?

ILYA SUTSKEVER: You know, reasoning isn't this super well-defined concept, but we can try to define it anyway: it's when you're able to somehow think about something a little bit more and get a better answer because of your reasoning. And I'd say that there may be some kind of limitation in our neural nets which could be addressed by, for example, asking the neural network to think out loud. This has proven to be extremely effective for reasoning. But I think it also remains to be seen just how far the basic neural network will go; I think we have yet to fully tap its potential. There is definitely some sense in which reasoning is still not quite at the level of some of the other capabilities of the neural network, and we would like the reasoning capabilities to be higher. I think it's fairly likely that business as usual will improve the reasoning capabilities of the neural network; I wouldn't confidently rule out that possibility.

JENSEN HUANG: One of the things that is really cool is that you can ask ChatGPT a question, but before it answers the question, tell it: first tell me what you know, and then answer the question. Usually when somebody answers a question, if they give you the foundational knowledge they have, or the foundational assumptions they're making, before they answer, that really improves the believability of the answer. It's also demonstrating some level of reasoning. So it seems to me that ChatGPT has this inherent capability embedded in it.

ILYA SUTSKEVER: To some degree. One way to think about what's happening now is that these neural networks have a lot of these capabilities; they're just not quite very reliable. In fact, you could say that reliability is currently the single biggest obstacle to these neural networks being truly useful. If these neural networks sometimes still hallucinate a little bit, or make some mistakes which are unexpected, which you wouldn't expect a person to make, it is this kind of unreliability that makes them substantially less useful. But I think that perhaps with a little bit more research, with the current ideas that we have, and perhaps a few more ambitious research plans, we will be able to achieve higher reliability as well. That will be truly useful; that will allow us to have very accurate, very precise guardrails.

JENSEN HUANG: That's right, and it will make it ask for clarification where it's unsure, or say that it doesn't know something when it doesn't know, and do so extremely reliably.

ILYA SUTSKEVER: So I'd say that these are the real bottlenecks. It's not about whether it exhibits some particular capability, but how reliably.

JENSEN HUANG: Exactly. Speaking of factualness and faithfulness, of hallucination: I saw in one of the videos a demonstration that links to a Wikipedia page.
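The behavior Jensen and Ilya want here, a model that says "I don't know" rather than guess, is often framed as selective prediction: answer only when confidence clears a threshold, trading coverage for accuracy. A toy sketch with invented confidence scores (none of these numbers come from any real model):

```python
# Selective prediction: answer only when the model's top-class probability
# clears a threshold, otherwise abstain ("I don't know").
predictions = [  # (top-class probability, was the top class correct?) - invented
    (0.99, True), (0.95, True), (0.90, True), (0.85, False),
    (0.70, True), (0.60, False), (0.55, False), (0.51, False),
]

def selective_accuracy(threshold):
    answered = [ok for p, ok in predictions if p >= threshold]
    coverage = len(answered) / len(predictions)
    accuracy = sum(answered) / len(answered) if answered else 1.0
    return coverage, accuracy

for t in (0.0, 0.8, 0.95):
    cov, acc = selective_accuracy(t)
    print(f"threshold {t:.2f}: answers {cov:.0%} of questions, {acc:.0%} correct")
```

This only works to the extent the model's confidence is calibrated, which is exactly the reliability property the conversation is pointing at.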
JENSEN HUANG: Has retrieval capability been included in GPT-4? Is it able to retrieve information from a factual place that could augment its response to you?

ILYA SUTSKEVER: The current GPT-4, as released, does not have a built-in retrieval capability. It is just a really, really good next-word predictor, which can also consume images, by the way; we haven't spoken about that yet, but it is really good at images. It is then fine-tuned with data and various reinforcement learning variants to behave in a particular way. It wouldn't surprise me if some of the people who have access could request GPT-4 to make some queries and then populate the results inside the context, because the context duration of GPT-4 is quite a bit longer now. So in short, although GPT-4 does not support built-in retrieval, it is completely correct that it will get better with retrieval.

JENSEN HUANG: Multimodality: GPT-4 has the ability to learn from text and images and respond to input from text and images. First of all, the foundation of multimodal learning is of course that Transformers have made it possible for us to learn from multimodal, tokenized text and images. But at the foundational level, help us understand how multimodality enhances the understanding of the world beyond text by itself. My understanding is that when you do multimodal learning, even when the input is just a text prompt, the text understanding could actually be enhanced. Tell us about multimodality at the foundation: why it's so important, what the major breakthrough was, and the characteristic differences as a result.

ILYA SUTSKEVER: There are two dimensions to multimodality, two reasons why it is interesting. The first reason is a little bit humble: multimodality is useful. It is useful for a neural network to see, vision in particular, because the world is very visual. Human beings are very visual animals; I believe that a third of the human cortex is dedicated to vision. And so, by not having vision, the usefulness of our neural networks, though still considerable, is not as big as it could be. It is a very simple usefulness argument: it is simply useful to see, and GPT-4 can see quite well. There is a second reason for vision, which is that we learn more about the world by learning from images in addition to learning from text. That is also a powerful argument, though it is not as clear-cut as it may seem. I'll give you an example, or rather, before giving the example, I'll make a general comment: a human being gets to hear about one billion words in their entire life.

JENSEN HUANG: Only one billion words? That's amazing. That's not a lot. Does that include my own words in my own head?

ILYA SUTSKEVER: Make it two billion, but you see what I mean. You can see that, because a billion seconds is about thirty years, we don't get to hear more than a few words a second, and we're asleep half the time. So a couple of billion words is the total we get in our entire life. It becomes really important for us to get as many sources of information as we can, and we absolutely learn a lot more from vision. The same argument holds true for our neural networks as well, except for the fact that the neural network can learn from so many more words. Things which are hard to learn about the world from a few billion words of text may become easier from trillions of words. And I'll give you an example: consider colors. Surely one needs to see to understand colors, and yet the text-only neural networks, which have never seen a single photon in their entire life, if you ask them which colors are more similar to each other, will know that red is more similar to orange than to blue, and that blue is more similar to purple than to yellow. How does that happen?
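Ilya's color question has a simple distributional illustration: even crude co-occurrence counts from text place red nearer orange than blue under cosine similarity. The counts below are invented for illustration, not taken from any real corpus:

```python
import math

# Invented co-occurrence counts of color words with context words
# (fire, fruit, sky, water), standing in for distributional statistics
# a language model would absorb from text.
vecs = {
    "red":    [9, 6, 1, 0],
    "orange": [7, 8, 1, 1],
    "blue":   [0, 1, 9, 8],
    "purple": [2, 2, 7, 6],
    "yellow": [5, 7, 2, 1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

sim = lambda a, b: cosine(vecs[a], vecs[b])
print(f"red~orange {sim('red', 'orange'):.2f} vs red~blue {sim('red', 'blue'):.2f}")
```

The point is that visual structure leaves a statistical shadow in text, which is exactly the "leak" mechanism Ilya describes next.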
answer is that information about the world, even visual information, slowly leaks in through text. Slowly, not as quickly, but when you have a lot of text you can still learn a lot. Of course, once you also add vision and learning about the world from vision, you will learn additional things which are not captured in text. But I would not say that it is binary, that there are things which are impossible to learn from text only. I think this is more of an exchange rate, and in particular, if you are like a human being and you want to learn from a billion words, or a hundred million words, then of course the other sources of information become far more important.

Yeah. So beyond what you learn from images, is there a sensibility that would suggest that if we wanted to understand the construction of the world, as in, the arm is connected to the shoulder, the elbow is connected, that somehow these things move, the animation of the world, the physics of the world, if I wanted to learn that as well, can I just watch videos and learn that?

Yes.

And if I wanted to augment all of that, for example, the meaning of the word "great": said one way it's sarcastic, said another way it's enthusiastic, and there are many words like that, "that's sick" or "I'm sick," depending on how people say it. Would audio also make a contribution to the learning of the model, and could we put that to good use soon?

Yes, I think it's definitely the case. What can we say about audio? It's useful, an additional source of information, probably not as much as images or video, but there is a case to be made for the usefulness of audio as well, both on the recognition side and on the production side.

In the context of the scores
that I saw, the thing that was really interesting was the data that you published showing which tests were performed well by GPT-3 and which tests performed substantially better with GPT-4. How did multimodality contribute to those tests, do you think?

In a pretty straightforward way: any time there was a test where, to understand the problem, you need to look at a diagram. For example, there is a math competition for high school students called the AMC 12, and presumably many of the problems there have a diagram. GPT-3.5 does quite badly on that test. GPT-4 with text only does, I don't remember exactly, maybe something like a 2 to 20 percent success rate, but when you add vision it jumps to a 40 percent success rate. So the vision is really doing a lot of work; the vision is extremely good. And I think being able to reason visually as well, and communicate visually, will also be very powerful, very nice things which go beyond just learning about the world. There are several things: you can learn about the world, you can then reason about the world visually, and you can communicate visually, where in the future, perhaps in some future version, if you ask your neural net, "hey, explain this to me," rather than just producing four paragraphs it will produce a little diagram which clearly conveys to you exactly what you need to know.

That's incredible. You know, one of the things that you said earlier about an AI generating tests to train another AI: there was a paper written, and I don't completely know whether it's factual or not, claiming that there is a total of somewhere between 4 trillion and something like 20 trillion useful language tokens that the
world will be able to train on over some period of time, and that we will eventually run out of tokens to train on. First of all, I wonder if you feel the same way, and secondarily, whether an AI generating its own data could be used to train the AI itself, which you could argue is a little circular. We train our brain with generated data all the time, by self-reflection, by working through a problem in our head, and I guess neuroscientists suggest that during sleep we do a fair amount of developing our neurons. How do you see this area of synthetic data generation? Is that going to be an important part of the future of training AI, of the AI teaching itself?

Well, I wouldn't underestimate the data that exists out there. I think there is probably more data than people realize. As to your second question, it certainly remains a possibility; it remains to be seen.

Yeah. It really does seem that one of these days our AIs, when we're not using them, might be generating either adversarial content for themselves to learn from, or imagining and solving problems that they can then go off and improve themselves on. Tell us whatever you can about where we are now and what you think the not-too-distant future will be, pick your horizon, a year or two: where do you think this whole language model area will be, and some of the areas that you're most excited about?

Predictions are hard, and it's a bit difficult to say things which are too specific, but I think it's safe to assume that progress will continue and that we will keep on seeing systems which astound us in the things that they can do. The current frontiers will be centered around reliability, around getting the system to a point where it can be trusted, where you can really trust what it
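The "running out of tokens" concern above is at bottom a budget calculation. The sketch below uses the 4 to 20 trillion range mentioned in the conversation; the per-run token count is a hypothetical illustration, not a real training figure, and the assumption that each run consumes fresh, non-overlapping data is a deliberate simplification.

```python
# Rough sketch of the token-budget arithmetic behind the "running out of
# tokens" concern. The 4T-20T range is the estimate cited in the
# conversation; the 2-trillion-token run below is a hypothetical example.

LOW_ESTIMATE = 4e12    # 4 trillion usable language tokens
HIGH_ESTIMATE = 20e12  # 20 trillion usable language tokens

def runs_until_exhausted(tokens_per_run, budget):
    """How many from-scratch training runs a fixed token budget allows,
    assuming each run consumes fresh, non-overlapping data."""
    return budget / tokens_per_run

# If a hypothetical frontier run consumed 2 trillion tokens:
for budget in (LOW_ESTIMATE, HIGH_ESTIMATE):
    print(runs_until_exhausted(2e12, budget))  # 2.0, then 10.0
```

Even the optimistic end of the range supports only a handful of ever-larger runs under these assumptions, which is why synthetic data generation comes up as a possible way around the ceiling.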
produces, really get to a point where, if it doesn't understand something, it asks for clarification, says that it doesn't know something, says that it needs more information. I think those are perhaps the areas where improvement will lead to the biggest impact on the usefulness of those systems, because right now that's really what stands in the way. Say you ask a neural net to summarize some long document, and you get a summary. Are you sure that some important detail wasn't omitted? It's still a useful summary, but it's a different story when you know that all the important points have been covered. In particular, it's okay if there is some ambiguity, that's fine, but if a point is clearly important, such that anyone else who saw it would say "this is really important," then the neural network should also recognize that reliably. That's when you know. The same goes for the guardrails, and the same for its ability to clearly follow the intent of the user, of its operator. So I think we'll see a lot of that in the next two years.

That's terrific, because the progress in those two areas will make this technology trusted by people and ready to be applied to so many things. I was thinking that was going to be the last question, but I did have another one, sorry about that. So, ChatGPT to GPT-4: when you first started using GPT-4, what are some of the skills that it demonstrated that surprised even you?

Well, there were lots of really cool things that it demonstrated which were quite cool and surprising. It was quite good, so I'll mention two. Let's see, I'm just trying to think about the best way to go about it. The short answer is that the level of its reliability was surprising. Where the previous neural networks, if you asked them a question, sometimes they might misunderstand something in a kind of silly way, with GPT-4 that stopped
happening. Its ability to solve math problems became far greater. You could really say, do the derivation, a long, complicated derivation, convert the units and so on, and that was really cool. Like many people noticed, it works through proofs; it's pretty amazing. Not all proofs, naturally, but quite a few. Another example: many people noticed that it has the ability to produce poems where every word starts with the same letter. It follows instructions really, really clearly, not perfectly still, but much better than before.

And on the vision side, I really love how it can explain jokes. It can explain memes: you show it a meme and ask it why it's funny, and it will tell you, and it will be correct. The vision part is like it's really actually seeing it, when you can ask questions, follow-up questions, about some complicated image with a complicated diagram and get an explanation. That's really cool.

But overall, I will say, to take a step back, I've been in this business for quite some time, actually almost exactly twenty years, and the thing which I find most surprising is that it actually works. It turned out to be the same little thing all along, which is no longer little; it's a lot more serious and much more intense. But it's the same neural network, just larger, trained on maybe larger datasets, in different ways, with the same fundamental training algorithm. So it's like, wow, I would say this is what I find the most surprising. Whenever I take a step back I go, how is it possible that those conceptual ideas, that the brain has neurons, so maybe artificial neurons are just as good, and so maybe we just need to train them somehow with some learning algorithm, turned out to be so incredibly correct? That would be the
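The poem constraint mentioned above, every word starting with the same letter, is easy to verify programmatically. A minimal checker might look like this; the sample lines are invented for illustration, not model output.

```python
import string

def is_alliterative(text):
    """Return True if every alphabetic word in `text` starts with the same
    letter (case-insensitive). Surrounding punctuation is stripped, and
    empty text does not qualify."""
    words = [w.strip(string.punctuation) for w in text.split()]
    first_letters = {w[0].lower() for w in words if w and w[0].isalpha()}
    return len(first_letters) == 1

print(is_alliterative("Silent seas swallow sinking ships slowly"))  # True
print(is_alliterative("Red is more similar to orange"))             # False
```

A checker like this is the kind of automatic constraint verification that makes instruction-following easy to measure, even when the quality of the poem itself is not.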
biggest surprise, I'd say.

In the ten years that we've known each other, the models that you've trained and the amount of data you've trained on have grown by about a million times, and no one in the world of computer science would have believed that the amount of computation done in that ten years' time would be a million times larger, or that you would dedicate your career to go do that. You've done many more, and your body of work is incredible, but two seminal works stand out: the co-invention of AlexNet, that early work, and now GPT at OpenAI. It is truly remarkable what you've accomplished. It's great to catch up with you again, Ilya, my good friend, and it is quite an amazing moment. Today's talk, the way you break down the problem and describe it, is one of the best beyond-PhD descriptions of the state of the art of large language models. I really appreciate it. It's great to see you. Congratulations.

Thank you so much. Yeah, thank you, I had so much fun. Thank you.
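The "million times in ten years" figure above implies a striking compound growth rate, which a quick calculation makes concrete. This is a sketch of the implied rate, not an official statistic.

```python
# A 1,000,000x increase over 10 years implies an annual compound growth
# factor of 10**(6/10), i.e. roughly quadrupling every year.

growth_factor = 1_000_000 ** (1 / 10)
print(round(growth_factor, 2))        # 3.98

# Check: compounding that factor for 10 years recovers the million-fold growth.
print(round(growth_factor ** 10))     # 1000000
```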
Info
Channel: Mind Cathedral
Views: 15,138
Id: ZZ0atq2yYJw
Length: 53min 4sec (3184 seconds)
Published: Thu Mar 23 2023