Mapping GPT revealed something strange...

Captions
[Music] To me the difference feels like language models start with this highly abstract language representation, and the system as a whole can try to predict the next token with greater and greater accuracy. And so the difference, it seems, is that the adversarial inputs for us tend to look a lot different than the adversarial examples for LLMs; once you try to go outside of this sphere of what is meaningful to humans, the possibilities grow exponentially.

I was recently in Toronto, a beautiful city, to film with Cohere, and those videos will come out very shortly. But around the same time someone shared a paper on our Discord server called "What's the Magic Word? A Control Theory of LLM Prompting", by Aman Bhargava and Cameron Witkowski. What these guys did is think about a language model as a dynamical system and use the lens of control theory to think about the space of reachability. Why is this important? Well, we think that language models think in language space, this abstract language space, but they don't; they actually think using the shoggoth, in this very high-resolution token space, and it's just this horrible, hairy, gnarly mess. No one has created any firewalls for large language models yet; when companies publish their language models you just have an API and you just send tokens up. And I always had the misconception that RLHF, or these forms of fine-tuning or preference steering using human feedback, significantly reduce the reachability space, because in language models we do the pre-training, which is distribution matching, and then we do RLHF, which is mode seeking, which essentially chops down the reachable space given a prompt by snipping off all of those trajectories. Turns out I'm wrong: the reachability space is much larger than I thought it was, and this is one of the things that they point out in their paper. And we kind of knew this, right, because we can do adversarial attacks on these language models. People have observed that if you use human social engineering tricks on them, like "oh, I'll tip you $500", then it'll do a bit better. But then there's this whole other perceptual layer, I guess you could call it, where there's this chaotic regime of adversarial prompts, kind of like hypnosis, kind of like magic, where if you give it these very strange, very inhuman-looking prompts, that will steer it to making a certain output extremely likely. And so to me it feels really similar to digging into magic and the human perceptual system, just with LLMs, where we're learning about basically the shape, the nature, of these language models in terms of how they interact with the world and how their dynamics really work.

"Attendants, please prepare the cabin for descent."

For as long as I can remember, the thing I've wanted more than anything else is to figure it all out. I've never shied away from the big questions: why are we here, what are we all doing, what is this thing we call life that we are all experiencing and one and the same a part of? While these questions are all 30,000 feet in the air, one thing that drew me back down to Earth was the field of engineering, and when I graduated high school this had a very strong appeal, because in engineering you can design systems, you can design real, operable things that you can work with and understand how they work. And so through engineering
perhaps you can begin to investigate and understand the intricacies of our world. That's my hope, at least. So throughout my studies I majored in robotics, and very soon I was drawn to the idea of intelligence, because intelligence seems to underlie so much of our world, so much of the design process of engineering itself. But what is intelligence, and how can we understand it?

It's a question of systems design, really, where we're trying to figure out: okay, we're humans, we've been in civilization for some time and we've sort of figured out how to cooperate with each other. We obviously have challenges with that, we're not perfect by any means, but when it comes to adding language models to the mix I think it could go both ways. We could have a world where language models just make us much dumber, much less capable, and maybe make for a worse world. But I think that if we think carefully and really understand what's going on with the language models, if we can get a fundamental understanding of them one way or another, then there's much more hope that maybe we could make a world where language models not only make us smarter but make our world substantially better, and perhaps lead us towards some greater enlightenment and an ability to cooperate much better than before.

Do you think language models are intelligent? That's a great question. I think that they're able to simulate intelligence. One of the really interesting things I'm starting to see now is that we are building software abstractions and controllers on top of language models. We've been talking about doing this for years, because at the end of the day we have this idea that we can have this big foundation model and it does all of the things: it's multimodal, it knows how to reason. And the fact of the matter is that's not really true; we need to control them. Initially we're seeing frameworks that allow you to do things like prompt injection, but the next step is thinking of controllers, using control theory to think about these large language models. Anyway, I really hope you enjoy the conversation today. These guys are fascinated not only with controlling language models but also with things like AGI, general intelligence, and collective intelligence. It was a really interesting conversation, and if you stick around to the end you can also hear about the institute that they've set up around AGI technology. Enjoy the show.

So my name is Aman, I'm a PhD student at Caltech studying computation and neural systems. Recently we released this paper called "What's the Magic Word? A Control Theory of LLM Prompting", and did that over the last summer with Cameron here. I was here for my undergrad at the University of Toronto doing engineering science, I specialized in machine intelligence, and I've been bouncing around between doing machine learning applied to computational biology, theoretical neuroscience, and most recently getting back into the LLM space, as well as trying to study collective intelligence: how very simple machines can come together to produce a very complicated and beautiful system as a whole.

Amazing, and Cameron? Yeah, so my name is Cameron Witkowski. I went to undergrad here, I did engineering science as well, I majored in the robotics engineering option, and now I'm a grad student pursuing a master's in electrical and computer engineering, advised by Steven Brown and Kevin Dr. I'm really interested in the
deep questions of intelligence, and right now I'm pursuing research related to morphogenesis and computational models of it. Like Aman mentioned, last summer I went down to Caltech and we wrote this paper on prompt engineering, well, a control theory of prompt engineering, and I'm excited to get into it.

You folks have just written an incredibly interesting paper. It was shared in our Discord server, and I saw your presentation, Aman; we'll share a clip of that in the introduction. I was intrigued by it straight away. What you're doing is talking about control theory in respect of large language models. Can you explain what that is?

Yeah, so I guess I'll get started with control theory. Back in the day, the late 1800s, this guy Maxwell observed that people were making these engines and putting these things called governors on them, where if your machine was experiencing varying loads you wanted the engine to still go at the same rate. There's this flyball governor, which is this sort of hand-tuned thing that you put on top of the engine to try to make sure that it'll be consistent, that it'll do what you want, that it'll be going at a consistent speed. People were hand-tuning these things, and obviously the engines were working, but it wasn't very rigorous, it wasn't very robust, and we didn't have many guarantees as to how it would end up working in practice. So what Maxwell did was formalize the notion of feedback control, where if you have this system, even if it's quite complicated, as it turns out, if you feed back the output of the system into a controller, compute some error metric, and try to correct for that at every moment in time, it turns out to be a much easier problem to solve from an engineering perspective than trying to make a perfect system that just does the right thing off the bat. This idea of feedback was really powerful and sort of gave birth to modern control theory, and as it turned out, that was a really powerful way to look at building systems, controlling them, and doing engineering on them, so that they could be robust, do what we want, and so that we could predict them.

And so when it comes to LLM control theory, what we saw is that we're kind of at a similar place with language models, where we have these engines, these language models that are very powerful. They can do a lot, they seem to exhibit many interesting attributes of intelligence, and there's a lot of utility there for people to build further systems on top of them, and people are already doing that. But right now it's this hand-tuned, handcrafted prompt engineering that's going on, where it's really hard to get at the fundamentals of what exactly it means to control an LLM system and how you might do it; at this point it's very heuristic. So we saw that as an opportunity to try to figure out what a control theory for LLMs would look like, which hopefully, if we can do it right, will give birth to all of these really useful engineering insights and also fundamental insights as to the nature of LLM systems, so that we can better control them, make them reliable and robust, and be able to do engineering on them in a more principled manner than we're currently able to. So that's the general direction and the motivation for our control theory of language models. Yeah, that's absolutely fascinating.
I mean, for many years I've been thinking that we need to have some kind of a controller for a large language model, but I guess I'm interested, first of all, in what the differences are between large language models and something like a steam engine. And also, with a steam engine you might be optimizing the efficiency or the performance or the speed or something like that; what is it that we are trying to make better with a large language model?

So first off we'll talk about the differences between large language models and other types of systems that you might want to control. Typically you might first be introduced to control theory in the context of, say, trying to control an engine or something else where the states can be represented by a set of real numbers of fixed size, so perhaps an x and a y coordinate that you're trying to control, or a position and a velocity; these are the common types of systems and scenarios that show up in control theory. The first major difference with an LLM is that the token space, the state space of the system, is discrete. Because we're dealing with tokens, we're dealing with words, we're not operating in the space of real numbers anymore, and this introduces some complications and complexities when dealing with control theory. The second thing that's really significant is that each time an LLM generates a token, or a user inputs a token, that state space actually expands: it grows by one token. This is very interesting and unique to LLM systems. On the one hand this can be exploited to try to get the LLMs to engage in reasoning or chain of thought, or take a winding path to the answer you actually want them to output, but of course this makes it very difficult for a control theory, because with each new token you add, the space of possible sentences grows exponentially, and in language models the vocabulary size is on the order of 50,000 to 100,000, so this grows extremely quickly. These are some of the challenges. And with a control theory of, say, engines, you're trying to optimize the efficiency; it's a good question what you're trying to optimize for language models. I think this is definitely a direction for future research. Aman, do you have any thoughts on this?

Yeah, I think the thing that we saw was that even very simple questions about how these LLMs operate, the input-output relationships, when you start to treat them just as a system, where maybe there's some imposed input like a system prompt and then you get to pick a subset of those tokens, when you start to treat it like that and you ask a really simple question like: let's say that I want it to generate a specific string. We're not going to be trying to use it to do some intelligent information processing, I just want to see, can I make it do something? What we found, and what motivated us to do this, is that we really had no idea when it would be possible, or if it was generally possible, to make it do anything we want. Can we just make an LLM system generate any output we desire? If the answer is yes, which seems probable if you get to have a lot of tokens in your input that you control, it seems reasonable that you'd be able to get it to output a wide variety of at least reasonable English sentences, or linguistically valid sentences. But the question that we had was: okay, if you have a finite budget for that, would you be able to get it to do anything?
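Since this part of the discussion hinges on the token-sequence state growing at every step, here is one minimal way to write it down. The notation below is an illustrative paraphrase on my part, not the paper's exact formalism:

```latex
% An LLM viewed as a discrete-time dynamical system whose state is the whole token
% sequence so far; V is the vocabulary (|V| roughly 5*10^4 to 10^5) and \oplus is
% concatenation. Each step appends exactly one token, so the state length grows by one.
\[
x_t \in V^{\,t}, \qquad x_{t+1} = x_t \oplus w_t
\]
\[
w_t =
\begin{cases}
u_t \in V, & \text{a control / user token (prompting)}\\[2pt]
\text{a sample from } P_\theta(\cdot \mid x_t), & \text{autoregressive generation}
\end{cases}
\]
% The number of distinct states of length t is |V|^t, which is the exponential
% blow-up in reachable sequences described above.
```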
And what budget of tokens, how many tokens do you have to be able to control, if you want to be able to make the system do whatever you want? That was the initial motivation, where it was like, yeah, there are all these high and mighty questions of how do we make these systems do what we want in an alignment sense, how do we make them do what we want in the sense of cooperating towards some information processing objective. But we realized that these really simple questions, just: okay, you have an input that you get to partially control and you're trying to make it do something, that question was completely unanswered. And we were sort of taking bets on it; I think Cameron was the one who started to make bets, he was like, I bet ten bucks that we can get this done, we can make it emit this output within five tokens. That was really the initial motivation, where I was like, even the feedforward dynamics of this system are really mysterious, and getting a grip on those seems like a really strong way to start building up a fundamental control theory and a really strong understanding of these LLM systems. In control theory, at least, when you start to really deeply understand just a single system with its own dynamics, how the input-output relationships work, what the reachable sets look like, how controllable it is, then when it comes to building more complicated systems, where maybe you have a more complicated objective or interacting systems, really understanding the fundamentals makes that way easier. The example in classical control theory is that if you couple a bunch of linear controllers and linear systems together, what you get is just one bigger linear system, and all of the same results apply. So what we were hoping is that by starting to answer this really simple question of okay, how much can we control this, what does the reachability of these LLMs look like, we could really build that up. To me it feels like we're kind of doing our homework, where in engineering we had to take all these classes in control, and that was sort of our homework to be able to go out into the world, and if it ever comes time to build some electromechanical system and get a PID controller in there, now we've done our homework, so we have a sense of what to expect and how we could do engineering on it. That's really where I feel like it's at, and I think this is a really promising way to try to get a fundamental understanding of what's going on with these language model systems.

Amazing. So in a second we're going to introduce this concept of reachability, but I've thought about this, because I've had a couple of days to reflect on it, and my intuitions just seem a little bit mixed up. I've interviewed Nicholas Carlini, for example, and he's done lots of work on adversarial examples and writing algorithms to find adversarial examples, and we know that neural networks are not robust: you can quite easily perturb, let's say, an input image in a vision model, and if it's a classifier you can make it pretty much say anything with a very small perturbation. That's kind of the same thing as what you mean by reachability; it's this idea of reaching into the state space and making it do something quite weird, outside of what you would expect. Now for some reason I had the intuition, and I now think I'm wrong,
that LLMs, I didn't think they had this problem, but they do have this problem. And you introduced this really interesting, I guess it started out as a thought experiment and you coded it into a game, the Roger Federer game; I think that's quite instructive, so can you tell us about that?

Yeah, for sure. One of the earliest examples that we were thinking about was just a simple example where you have this state sequence that's imposed, you don't get to pick it, and it says "Roger Federer is the", and then the next thing that you want the LLM to generate is the word "greatest", so you want it to say "Roger Federer is the greatest", and you're trying to pick a prompt that comes before that which will steer the system so that it'll output that. So we're basically asking the question: is this word in the reachable set of outputs, given that we have some finite control over the input? The goal of the game is, for one, to get it to actually output the right answer, which is "greatest", which is a fairly reasonable English thing to say. The metric that we use to grade how well you're doing is basically how efficiently you're able to do control, where in the original control theory this idea of efficient or optimal control is really important; you have this linear quadratic regulator idea where you're like, I only have a finite energy budget for the signal I put in. Similarly with language models, what we're interested in is the minimal length of the control input that will successfully steer the model to what you want it to do. And it turns out that the game is actually very challenging, at least with this GPT-2 model, which is the one that we're using right now, since it's just running on a desktop at my home. So yeah, there's this game that you can play, we can link it, where you get to put in a prompt to the system and it'll come back to you and say okay, you got the answer right or you got the answer wrong, as well as basically your error on that, so your cross-entropy loss on getting the correct, desired output. The game is to get the shortest prompt that will steer the model to the desired output, and it's actually quite challenging with GPT-2, where I think only a few people, including Cameron and my friend Michael Zelinger, who made this thing called FAANG Check, which is a resume checker that uses language models to predict your probability of getting into a FAANG company, I think those two were the only people who actually ended up getting it right, and it turns out to be very difficult. So that game was a codified, interactive version of our initial motivation for this, where it was like, wow, this really simple question that seems like it should have an easy answer, and if there is an easy answer I'd love to know, but the simple question really leads to a problem that's quite difficult to solve and that we have really poor insight on, and we're just trying to get that insight together to understand what's going on there.

Yeah, and just to jump off that point as well, I think one of the reasons why this game in particular is difficult is because we're using GPT-2, and "Roger Federer is the blank", right, you would think "greatest" would be rated pretty high, but GPT-2, I guess because it's trained on lots of fill-in-the-blank tasks, tends to output just a set of underscores quite often.
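To make the scoring in that game concrete, here is a minimal sketch of how a candidate prompt could be graded against GPT-2 using the Hugging Face transformers library. The prompt text, the choice of " greatest" as the target, and the helper name are illustrative assumptions, not the authors' implementation:

```python
# Sketch of grading the "Roger Federer is the ___" game with GPT-2.
# Assumes: Hugging Face transformers + PyTorch; scoring details are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def score_prompt(control_prompt: str,
                 imposed_text: str = "Roger Federer is the",
                 target: str = " greatest") -> tuple[float, bool]:
    """Return (cross-entropy of the target token, whether it is the argmax next token)."""
    context_ids = tokenizer.encode(control_prompt + imposed_text, return_tensors="pt")
    target_id = tokenizer.encode(target)[0]          # first token of the desired output
    with torch.no_grad():
        logits = model(context_ids).logits[0, -1]    # next-token logits after the context
    log_probs = torch.log_softmax(logits, dim=-1)
    ce = -log_probs[target_id].item()                # lower cross-entropy = better prompt
    is_argmax = logits.argmax().item() == target_id
    return ce, is_argmax

# Shorter control prompts that still force " greatest" to the top score better in the game.
print(score_prompt("GOAT tennis: "))
```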
To comment on your intuition you mentioned before, on whether language models have this adversarial property: one thing that was really interesting when we were doing some of our initial work was this technique of soft prompting. Soft prompting, instead of selecting discrete tokens with which we want to adversarially change the model's behavior, modifies the embedding vectors directly, so you have a lot more fine-grained control over the outputs. And it turns out that when you soft prompt, when you adversarially attack not the tokens themselves but the embedding vectors, you can send the cross-entropy loss straight to zero for whatever token you want, with a very tiny adjustment in these embedding vectors. This is very interesting; it points to the fact that the real challenge with controllability is not necessarily that there aren't adversarial inputs for language models, but that it's very hard to search this exponential space of discrete prompts.

Yeah, so I guess there are many degrees of freedom in any deep learning model, it's a very high-dimensional model, and I'm trying to understand my intuition. It's trained with a softmax, for example, and certainly when you do temperature sampling you're only going to get the top few tokens; if you look at the distribution, the probability mass is almost certainly on this one or this one and then it tails off very quickly. I assume that inductive bias was quite deliberate, really, to increase the statistical tractability of the model. But underneath that, in the embedding space, it's not a shell at all; there's some low-level surface of embeddings and you can traverse it, right?

So initially you might think that this embedding space is a very rich representation of the meaning of different words, and certainly if you do word2vec or take a PCA analysis of the embedding vectors for any large language model, you'll find something that roughly corresponds to meaning: words that mean similar things are attached more closely together. But this opens the question: if you were to interpolate between two similar words and take the embedding vector that is halfway between, would you get the halfway-in-between word, or would you get something that's nonsense? I think what you find through these kinds of soft prompting experiments, by directly manipulating the embedding vectors, is that the embedding space is actually extremely non-convex, in the sense that by interpolating you don't just get an average value between the two of them. I don't know if this is the best thing to get into, but one of the techniques we were trying to use is this technique called Gumbel-Softmax: instead of a discrete search over the token space, you can use this trick, which is kind of like the reparameterization trick for variational autoencoders but works for a categorical distribution, and it essentially works by kind of interpolating between embeddings. But it actually was very difficult to get to converge and did not come even close to rivaling the performance of GCG.

My intuition is that when you take a data point off the manifold, because these neural networks do learn a manifold of language, I thought it would cause some kind of mode collapse, it would just cause the network to become chaotic and go crazy.
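As a rough illustration of the soft prompting idea described above (optimizing continuous "virtual token" embeddings rather than discrete tokens), a minimal sketch against GPT-2 might look like the following. The target string, the budget of eight virtual tokens, and the hyperparameters are assumptions for illustration, not the setup used in the paper:

```python
# Sketch of soft prompting: optimize continuous prompt embeddings so the desired
# next token becomes near-certain. Assumes Hugging Face transformers + PyTorch.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)                         # only the soft prompt is trained

imposed_ids = tokenizer.encode("Roger Federer is the", return_tensors="pt")
target_id = tokenizer.encode(" greatest")[0]

with torch.no_grad():
    imposed_embeds = model.transformer.wte(imposed_ids)   # (1, T, d) fixed-text embeddings

soft_prompt = torch.nn.Parameter(
    0.02 * torch.randn(1, 8, imposed_embeds.size(-1))     # 8 learnable virtual tokens
)
optim = torch.optim.Adam([soft_prompt], lr=1e-2)

for step in range(200):
    inputs_embeds = torch.cat([soft_prompt, imposed_embeds], dim=1)
    logits = model(inputs_embeds=inputs_embeds).logits[0, -1]
    loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0),
                                             torch.tensor([target_id]))
    optim.zero_grad()
    loss.backward()
    optim.step()

print(f"final cross-entropy on ' greatest': {loss.item():.4f}")  # typically driven near zero
```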
But apparently that's not the case. Can it recover? It's almost like if you put a bunch of tokens in which are just really weird, and then you just carry on, the language model recovers, it finds coherence again, and then it just carries on.

Yeah, I think it's honestly a really hard question to answer, where in different regimes we notice different things. Basically these prompt optimization algorithms all work in the same way: you're trying to maximize the likelihood of some desired string and you're able to modify some input, and depending on how you choose that, you can do the optimization so that the model will output some gibberish. And it seems like, depending on the model and depending on the sampling techniques, I've seen it go both ways, where sometimes it'll recover after that and start generating reasonable, coherent text, and other times it seems like it'll continue to generate some random stuff, it'll kind of be in this out-of-distribution mode. That's one of the reasons that I think studying these adversarial examples, as well as this control theory stuff, is really important: if you have a system in the real world where tokens are coming in and you're actually processing them from real users, you don't have total control, the user is the one who's giving the control input, and you want to make sure that your system is robust to that. There are a lot of really complicated interactions, as it turns out, between, for instance, the tokenizer and the incoming strings, where due to this prompt optimization sometimes it'll come out with a sequence of tokens that, if you convert it to a string and then convert it back to tokens, will actually be very different. We ran into this with the game, where I was like, oh, I'm going to cheat at this game, I want to be the top prompter, so I'm just going to use some of the algorithms that we had from our GitHub repository, the magic words GitHub repository, to optimize these prompts, but then when you convert the result back to a string it turns out not to work as well. So yeah, I think that answering that question and seeing when it is that the model will actually be able to recover, is it a function of how big the model is, are bigger models better at recovering, or is it the case that bigger models are maybe more controllable, maybe you can shift these models into this weird, sorry, I just bumped the mic, this sort of out-of-distribution regime where they're generating seemingly random output based on seemingly random input. I think that question is really important and is one that is well addressed through considering them as systems, which is sort of the thesis of this paper, and we're trying to get a grip on what exactly the case is: is it going to be able to recover, is that a consistent behavior or is it not? There's this sort of weird recurrence relationship between the prompt, the stuff that the language model generates, and the stuff that's generated in the future, where in effect you're able to pick a prompt and then the language model will generate some more text, but then that text becomes part of the prompt as well.
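The tokenizer round-trip issue mentioned a moment ago (an optimized token sequence that changes when detokenized and re-tokenized) is easy to check directly. This is a small assumed illustration, not code from the magic words repository:

```python
# Sketch of the tokens -> string -> tokens round-trip check: prompt optimizers pick
# raw token ids, but pasting the decoded string back in may re-tokenize differently.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def survives_round_trip(token_ids: list[int]) -> bool:
    text = tokenizer.decode(token_ids)
    return tokenizer.encode(text) == token_ids

# An optimizer is free to pick adjacent tokens that the BPE merges would never
# produce from raw text, e.g. " great" followed by "est" rather than " greatest".
candidate = tokenizer.encode(" great") + tokenizer.encode("est")
print(survives_round_trip(candidate))   # often False for optimizer-chosen sequences
```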
So it seems like maybe there could be these sort of degenerate states where, if you start with this seed of chaos, it'll basically branch out, and the future strings that it generates are going to prompt it into being more and more chaotic. That's basically stability analysis or sensitivity analysis, and there's all this rich vocabulary and all of these people who have spent basically hundreds of years thinking about these concepts for both discrete and continuous dynamical systems that we get to build on top of, and basically use their insights to understand what stability really means. We can just draw those definitions in and apply them to our generalized form of a system, a language model system. I think that's why the control-theoretic aspect is exciting: you can actually ask these questions in a very concrete and reasonable way. And the best part is that people haven't really been using these ideas, or this vocabulary, to describe the questions that we're trying to answer, and so for most of these things, if you just spin up a small GPU and test some stuff out with a seven-billion-parameter model, you're actually doing new research, and it's actually useful research in my opinion, where you're getting a sense of the control-theoretic properties of language models. To me that felt like the most exciting thing here; the open questions are the most exciting part of the paper to me. We've taken a stab at the empirical study of controllability by sampling these WikiText sequences and seeing if we can control the next token or the next few tokens, as well as some theoretical results on self-attention and its controllability, but then all of these open questions emerge just because we're now framing it as a system, and people for hundreds of years have been thinking really deeply about how you understand systems when they're used in the real world and you have this sort of finite control over them.

Yeah, that's really interesting. I suppose I'm pointing out the obvious here, but these are autoregressive models, so the answer gets fed back into the prompt and then we rinse and repeat, which means you can model them as dynamical systems, and that is in stark contrast to something like a vision classifier, where there's just an input and an output and that's the end. So now you can get the system into this kind of corrupted state where you get divergence and decoherence, and as you said, that could be analyzed with stability analysis. I find that fascinating. But we should just go back quickly to your Roger Federer example, because I'm interested in the different ways that we could go about this. The humans were kind of using language, and language is a bunch of memetically shared cognitive tools, and they were saying things like "basketballs are great", "Joe Bloggs is great", "Roger Federer is great", and it wasn't very elegant, but it worked. Another approach that you spoke about is you could just make a Python program and try a neighborhood greedy search, one token at a time, so we find the nearest token and then the second nearest token until we find the adversarial attack. Or we could do a low-level gradient search and find something really weird and wonderful; there might be some esoteric characters that just make it go bananas. But these are three very different levels of talking to a language model.
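For the "greedy search, one token at a time" approach mentioned just above, a rough sketch (my own illustration, not the algorithm from the authors' magic words repository) could look like this, again using GPT-2 and the Roger Federer target:

```python
# Sketch of a greedy, one-token-at-a-time prompt search: at each step, try a pool of
# candidate tokens in the next prompt slot and keep whichever most lowers the target's
# cross-entropy. The candidate-pool size and budget are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

imposed = tokenizer.encode("Roger Federer is the")
target_id = tokenizer.encode(" greatest")[0]

def target_ce(prompt_ids: list[int]) -> float:
    ids = torch.tensor([prompt_ids + imposed])
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return -torch.log_softmax(logits, dim=-1)[target_id].item()

prompt: list[int] = []
candidates = torch.randperm(len(tokenizer))[:512].tolist()  # random subsample of the vocab
for _ in range(5):                                          # budget of 5 control tokens
    best = min(candidates, key=lambda tok: target_ce(prompt + [tok]))
    prompt.append(best)
    print(repr(tokenizer.decode(prompt)), target_ce(prompt))
```

A full gradient-guided search such as GCG replaces the random candidate pool with tokens ranked by the gradient of the loss with respect to the input embeddings, which is what finds the "weird and wonderful" esoteric prompts described here.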
The word on the street is that language models are a new form of programming, where you can just say what you want to do using English and so on, and language models certainly seem to incorporate that structure, but the language models themselves are just an inscrutable set of neurons, weights and matrices and so on. So there's a kind of higher-resolution shoggoth going on underneath the covers.

That's more or less the picture I have. We have this interface where we can speak to the language model using language, and if we set up a conversation with a language model where we have different labels, you know, ChatGPT says this, Cameron says this, and you engage in a conversation, then because it has seen enough conversations in its training data it's able to play along just fine. What's going on under the hood, of course, like you say, is very inscrutable; it's very difficult to really probe and understand. There are certain techniques in the interpretability literature, but I don't think we're even remotely close to having a complete understanding of how these systems work. That's one of the reasons why I think control theory is a great way to break in and see what's going on, because if you just look at the system's input and output characteristics, you can really gain a lot of insight into the nature of these systems. One guiding principle in my life, doing engineering and trying to learn about the world, has been this very popular quote by Richard Feynman: "What I cannot create, I do not understand." And yet today we find ourselves in this situation with language models where we have these incredibly complex systems we built, and yet we can't really get into them. So to extend this to today, what I would say is: what I cannot control, I do not understand.

The way I think about it is, it's almost like you want the language model to be a high-level, controlled, robust interface, and it's almost like we're all Marvel characters who can give secret hidden codes. Imagine if I could just control your behavior through telepathy; anyone can do that with a language model, they can just put weird tokens in and manipulate its behavior, and there's nothing stopping you, there's no firewall.

I feel like this kind of harkens back to why we called the paper "What's the Magic Word", where the initial reason was just that it's almost like the LLM is asking you, if you want it to do something, what's the magic word? What's this key, this weird control prompt, that will just make it do the right thing? But more generally, I used to be into magic when I was a kid; I had a job at a restaurant doing card tricks for the patrons while they waited for their food. What magic is, basically, is playing tricks on the human perceptual system, where there are all of these inductive biases that the human perceptual system has. For instance, if I move something and I look at it, you will naturally tend to follow my gaze, and what is moving is generally more salient, so then I can do something over here with my other hand, like take something out of my pocket, and then when I display it they'll be like, oh my God, where did that come from? What we're discovering, I think, is a sort of similar thing with language models, where, for one, people have observed that if
you use human social engineering tricks on them, like "oh, I'll tip you $500", then it'll do a bit better. But then there's this whole other perceptual layer, I guess you could call it, where there's this chaotic regime of adversarial prompts, kind of like hypnosis, kind of like magic, where if you give it these very strange, very inhuman-looking prompts, that will steer it to just making a certain output extremely likely. So to me it feels really similar to digging into magic and the human perceptual system, just with LLMs, where we're learning about basically the shape, the nature, of these language models in terms of how they interact with the world and how their dynamics really work. And I think it's very sensible that the control-theoretic perspective would be useful for this, where in classical control theory, trying to control these systems actually taught us a lot about the nature of systems, both linear and nonlinear, and I think we have a very similar opportunity here, where we're really discovering what the nature of these language models is in terms of control. These questions don't emerge quite as naturally, and don't have quite as natural an answer, when you're just thinking about them as a probability distribution over text; thinking about them in terms of being systems that have inputs and outputs and these trajectories and the like actually really does change the kinds of questions that you end up being able to answer, and the kind of understanding that you get about the nature of the system itself, which to me is one of the most exciting things.

Yeah, that's so interesting, the magic example. I think we think that we are robust, but we're not; maybe we're System 2 robust, but we're not System 1 robust. If you look in the animal kingdom there are so many examples, like a hen: if you make the right kind of clucking noise, the mother will think that you're the chick. So it's really weird, actually. And Keith gave me this example, I think it was from science fiction, that there's a hypothetical image, and if you look at the image, every single person goes into a coma. What's interesting about that is it's a kind of population-level adversarial example rather than an individual adversarial example. But then it gets into the question of, well, how can we use this control-theoretic approach to robustify models? Because we're talking about building agentic LLMs, and part of the thing I'm trying to get my head around is that in this particular case we had a very clear cost function, a specific thing, but what would it mean to robustify language models in general?

So one of the things that came up in our literature review was this idea that when you're trying to control these discrete stochastic dynamical systems, one concept that can be quite useful is that you might have a set of outputs that you want to reach, or a set of outputs that you want to avoid, so an avoid set and basically a desirable set. When you frame it like that, I think the robustification comes from the fact that, let's say you have a set of outputs you really don't want the language model to emit; you might think, okay, well, I'll just fine-tune it so that it decreases the likelihood, the prior likelihood basically, of those sequences. And the issue with that, I
think, and the thing that the control-theoretic perspective brings in, is the fact that when you have a finite, even a small, control prompt, some extra tokens that you get to inject, it turns out that even very unlikely next tokens can be made to be the most likely next token just by inputting these new tokens. So even if you did hypothetically fine-tune the model so that this avoid set was assigned very low probability, it seems like if you don't incorporate some aspect of, maybe, stochastically trying to search for these adversarial examples, and having this sort of minimax thing where you have one system that's trying to elicit the output and one system that is trying to fine-tune the model to make it less likely, or optimize another part of the prompt that is supposed to steer it away from these outputs, basically the insight, I think, is that you really have to be careful to consider the fact that you're giving the outside world some amount of control over the system, some amount of control over the context, and planning around that is actually very non-trivial and is not really well managed, I don't think, through the classical view of just cross-entropy loss and just treating it like a probability distribution.

Something else that fascinates me is the divide between focusing on the model versus complexifying the software which controls it. Right now, for example, we have language models, and there's this kind of base training, and then there's fine-tuning and there's RLHF and there are variations of that, and then we build these software APIs that are just trying to abstract away the complexity, so they will do dynamic prompt construction for multi-step tool use, and it goes on and on; there will be frameworks for doing agentic LLMs, and there just seems to be a bit of a divergence here. The reason I'm asking the question is: does it make sense to robustify and fix the problem in the model, or does it make sense to almost increase the flexibility of the model and fix it in the software layer?

I think one of the insights from our paper is that solely focusing on the model itself, like Aman was just saying, as soon as you give the outside world control over the model, in the sense of being able to input whatever kind of text they want, it becomes very difficult to really prevent adversarial attacks and prevent jailbreaks, and that's why you see jailbreaks keep coming up. I think if you were to involve some sort of robustness in a software layer, that might be more feasible; at least I can't immediately picture ways around it, although of course if I were a hacker I could probably find some loophole, there's usually some loophole you can find. But if there is some way of fielding the prompt messages, for instance, a user gives you a prompt and first you check: is this a reasonable thing that a human being would say in conversation, or is this something that I've never seen before in the entire history of the internet? The latter maybe is a prompt injection, maybe is something devious, or maybe is computer science research. So yeah, it's definitely not an easy problem, but the good thing is that there are multiple approaches to it.

Very cool. So we're going to go on to the more galaxy-brain stuff in a second, but before we move off the paper, can you just talk more formally about
what you showed in the paper? Yeah, definitely. There were two main parts of the paper, well, I guess three. For one, we tried to formalize what an LLM system really is at a mathematical level, and what we were trying to do there was take advantage of the original control theory's very abstract picture of a system, where you have an input space, a state space, an output space, and some dynamics going on inside of it. In our case we parameterize those dynamics with an LLM, and our input space and our state spaces were basically the set of all possible token sequences from the vocabulary set of this model. That was the first part: we basically transferred over a lot of the notions of reachability and controllability for LLM systems from the original control theory, where you can define them in terms of really abstract notions: you have sets for the state space, the input space, and the output space, you have some dynamics, and in terms of those sets you can define reachability and control.

The next thing we did was try to look inside the model. We were thinking it would be really nice, like in control theory, if we could have a really good understanding of the components of the system and how controllable those individual pieces were. So we looked at a single self-attention head and tried to think about it through a matrix-algebraic perspective, to really break down what the relationship is when, let's say, you have a subset of the tokens you get to control, a subset that's fixed, and you're trying to get the output, the output representations, to be a certain value, where all of these, in the case of a self-attention head, are just vector representations of tokens. What we found there was that it actually is possible to do some fairly simple matrix algebra manipulations to decompose the output of a self-attention head into one component that arises from the imposed input and another component that arises from the control input, and assuming that those two are bounded, you can actually derive that there is this geometry that looks like a bubble around the default output, the output you'd get if you didn't have any control input. There's a sort of bubble of reachable space that scales with the number of control input tokens that you're able to use. We thought that was really exciting, because for one, I didn't really expect that you'd be able to do proofs on these very complicated, high-dimensional machine learning or deep learning systems like a self-attention head, but it also gave us some insight to say that okay, we actually have this really concrete relationship between the number of control input tokens, the magnitudes that you're able to input into the system, and the output reachable set that is at your disposal.

That was the second part, and then the last part was some empirical experiments, where we said okay, let's just sample a bunch of strings from Wikipedia. The strings were between 8 and 32 tokens, and those were basically our imposed state sequences, and we asked the question: can we get it to output the correct next token, the real next Wikipedia token?
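As a rough, hedged paraphrase of the reachability and controllability notions being discussed (the notation below is mine, chosen to match the conversation rather than copied from the paper):

```latex
% Reachable set under a prompt-length budget: the outputs you can force to be the
% argmax next token by prepending at most k control tokens u to the imposed state x_0.
% \oplus denotes concatenation of the control prompt with the imposed tokens.
\[
R_k(x_0) \;=\; \Big\{\, y \in V \;\Big|\; \exists\, u \in V^{\le k} :\;
y \,=\, \arg\max_{v \in V}\; P_\theta\big(v \mid u \oplus x_0\big) \Big\}
\]
% k-epsilon controllability, read statistically: over a distribution of imposed states
% and desired outputs (e.g. WikiText snippets and their true next tokens), all but an
% epsilon fraction are steerable within the k-token budget.
\[
\Pr_{(x_0,\,y)}\big[\, y \in R_k(x_0) \,\big] \;\ge\; 1 - \varepsilon
\]
```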
How many control input tokens does it take for that to happen? It turned out that you could get that done about 97% of the time, steering the model to the correct output within 10 tokens of control input, which is reasonable: we'd expect that the model should be able to be steered towards reasonable, true English sentences that were more than likely in the training data set. What we did next was try to figure out, if you sample the top 75 most likely tokens according to the model based on this fixed input, can you steer those to be the most likely token, basically the argmax of the probability distribution? What we found there is that at least 89% of the time we were able to find these optimal control inputs, less than 10 tokens long, that would steer the model to do that. And then the last thing we did was say, okay, let's see what would happen if we just randomly picked a token from the vocabulary, so this is everything from regular English to numbers to Cyrillic characters to Chinese characters. What if we just randomly sampled those and tried to see how many tokens it would take to steer that to being the argmax of the probability distribution? We found that about 46% of the time we were able to make that random next token the most likely next token using a prompt of length 10 or less. The curves are there in our paper that describe, as you have an increasing budget for these tokens, how much of the time we were able to steer it to the right output. That's basically the k-ε controllability metric, which lets us get this sort of statistical picture of controllability and renders it practical to empirically estimate for these complicated systems. So those are really the main results, and the surprising thing about the last one was that a lot of the time even really unlikely next tokens were able to be steered to be the most likely using a really short prompt, which gets at both the basically chaotic nature, the complexity, of language as a system, as well as the fact that the prior likelihood picture, the cross-entropy loss picture, doesn't quite get at the controllability question of: when you do have the ability to input tokens into the context, what happens then? Those are really the main results, and to me the really exciting part was the open questions, where it was like, oh, now that we're using this vocabulary, now that we've formalized these LLMs as systems, it's really easy to ask these additional questions about the nature of the systems and their steerability and controllability, especially with feedback or chain of thought or agents or all of these other ideas. So yeah, that was basically the paper.

Yeah, and it's really making me update my intuitions. I'm thinking about the bias-variance trade-off, and I'm thinking that the reason we build these inductive priors is to constrain the model intentionally, to make it statistically tractable, to reduce the size of the hypothesis class. But what you're saying is making me think that statistical tractability and flexibility are not necessarily the same thing. It seems that the model must maintain a degree of flexibility; I mean, it makes sense, you have to be flexible in order to be a successful model, but that creates a kind of adversarial attack surface.
The way I think about this is, the model should be like the interstate freeway of language: all of the major roads should be carved out, and there should be side roads and so on. That's the way I visualized the model. But the model's not like that; there are actually all of these little slip roads, and you can push the cars off into the slip roads, but you need the slip roads, because perhaps you couldn't train the model without them.

Yeah, I think that's a really good analogy. Thinking about pushing cars off the road into this space where they perhaps aren't used to being, and what happens next, this is a case where the language model can enter some of these mode-collapse-type regimes and you can get kind of weird outputs. It was surprising that you can get the least likely token, with just a specific input, to be the most likely next token, but if we treat language as this kind of road or map structure, then it kind of makes sense that once you get off the map, once you enter this kind of regime that is completely unexplored, and there are actually plenty of regimes like this, again because the space is exponential in the number of tokens, it's growing so incredibly fast, it's very easy to find pockets that the model has never seen before and that maybe no human on Earth has seen or will ever see again.

You guys are really interested in collective intelligence and biomimetic intelligence and biologically plausible intelligence, and this is a matter very close to my heart. What are you interested in specifically in that field?

Yeah, so I guess when I first got into machine learning it was from watching this Google DeepMind video where they were using reinforcement learning to teach this virtual avatar how to run really fast, and I thought that was fascinating, because it was like, okay, instead of traditional programming you just have this neural network that optimizes itself according to some objective. The thing that was intriguing to me about that was that the feedforward dynamics of a neural network aren't that complicated: you have these synapses, you have this sort of gated action potential function. The thing that was weird to me was, how does every neuron know how to change its weights? How does each neuron, which is independently not that smart, know what to do? That led me down the theoretical neuroscience route for some time, where I was trying to figure out, okay, what do these learning rules look like that don't have to use the chain rule, use backpropagation, to update their weights? I did that for a while and then realized that the question of supervised learning was not necessarily the most interesting question to be asking, where it seems like the lion's share of what makes us really interesting as humans, in our cognition, seems to be associated with the cortex and this kind of predictive coding module that we have, which lets us make these really rich, abstract representations of reality to understand what's going on; we sort of hallucinate this internal model of the world. The interesting thing to me about the cortex was that you have this structure that's pretty flat and pretty homogeneous throughout, you know, there are differences in different regions, but at the end of the day it's very similar, and in fact if you lose a sense, like if you
lose your vision, that region is often repurposed for other things. So it seems like there should exist, the brain is kind of this existence proof that there should exist, a rule set that, if you apply it everywhere in this system, in this sort of layer on the outside of the brain, then the emergent property of that system is that you'll get this really robust and rich representation of the world that is very predictive of subsequent sensory input. And I think that the collective intelligence aspect of that is really important. There's one way to go in machine learning where you say, okay, we're going to make this monolithic pile of matrix algebra and we're going to train it through backpropagation and gradient descent and the Adam optimizer and all of that, and we're going to make it do some prediction task. But at the end of the day every computation has to be implemented in physical reality, and when we make the abstraction and just say, oh, it's just a bunch of math, we'll just have a GPU run it, it kind of abstracts away the fact that at the end of the day you have real physical objects that need to do computation and share information. In the maximum-efficiency, maximum-scalability limit, it seems like what you'd end up having is a very similar sort of distributed structure, where you can't really easily separate memory from computation. I think there's a quote from an MIT professor that says that Turing's initial mistake was saying that the head of the Turing machine was separate from the tape, and I think that's true, where in reality, in brains, in real computing systems, the matter that composes the memory and the matter that composes the computation is really one and the same. The brain is obviously this really great proof that okay, there are relatively simple rules, implementable with these biological neurons, that if you just implement them everywhere will get you this really beautiful convergence and emergent property of intelligence. That really drove me for a long time in theoretical neuroscience and then, more recently, in trying to build these distributed systems of artificial intelligences. The dream that I was trying to pursue before we started this control theory thing was, okay, well, what if I just had a bunch of really small LLMs that everybody in the world could host, and they could communicate with this sort of low-bandwidth communication using just tokens, just text, over the regular internet? What if it was possible to engineer a system whose emergent property was that it would actually be this really capable collective, where maybe GPT-7 can be owned by everyone instead of just being behind closed doors in a data center? Right now we're sort of using these insane engineering feats, Nvidia interconnects and these really high-bandwidth connections between massive racks in a data center that take a ton of energy, to get this really great result of modern language models. What if we could have a system that was a bit more like the brain, a bit more decentralized, and really leverage this insight that it should be possible? This existence proof keeps coming back to me, where it's like, okay, it should be possible. And that is originally what led me to the control theory stuff, where it just turned out to be really hard, where we didn't
And that is originally what led me to the control theory stuff, where it just turned out to be really hard, because we didn't have a great understanding of what happens if we treat these LLMs as systems rather than just big piles of matrix algebra that we're trying to distribute over many GPUs. If you treat them as systems that are coupled together, interacting in this networked fashion, how do we really understand that? Is it even possible to prompt them to do the right thing? When is it possible? How long do the prompts need to be? That led us down this route. But yeah, definitely the collective intelligence thing was a big motivation for me to get this working. And there's this neural cellular automata work; I know you talked with Michael Levin, who was the last author on that, and we worked with Alexander Mordvintsev on it. It's a really great demonstration that if you just optimize these basically small MLPs with local interaction to satisfy some objective, like re-forming the gecko or lizard in their paper, you actually can do that with backpropagation through time.
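For readers who haven't seen that work, here is a minimal sketch of the kind of update rule being described, in the spirit of "Growing Neural Cellular Automata": every cell runs the same tiny network on its own state plus a filtered view of its neighbours, and training unrolls the rule for many steps and backpropagates through time toward a target image. The channel count, fire rate, and training loop here are illustrative assumptions, not the paper's exact settings.

```python
# Minimal neural cellular automaton: one shared per-cell rule, applied
# everywhere on the grid, trained by backpropagation through time.
import torch
import torch.nn as nn
import torch.nn.functional as F

CHANNELS = 16  # RGBA + hidden state per cell (illustrative)

class NCA(nn.Module):
    def __init__(self, channels=CHANNELS, hidden=128):
        super().__init__()
        # Perception: identity + Sobel filters, applied depthwise per channel.
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8
        ident = torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
        kernels = torch.stack([ident, sobel_x, sobel_x.t()])
        self.register_buffer("filters",
                             kernels.repeat(channels, 1, 1).unsqueeze(1))
        # The local "rule": a per-cell MLP implemented with 1x1 convolutions.
        self.rule = nn.Sequential(
            nn.Conv2d(3 * channels, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 1, bias=False))
        nn.init.zeros_(self.rule[-1].weight)  # start as "do nothing" dynamics

    def forward(self, state, fire_rate=0.5):
        perception = F.conv2d(state, self.filters, padding=1,
                              groups=state.shape[1])
        delta = self.rule(perception)
        # Stochastic update mask: cells fire asynchronously and independently.
        mask = (torch.rand_like(state[:, :1]) < fire_rate).float()
        return state + delta * mask

def train_step(model, optimizer, target_rgba, steps=64):
    """target_rgba: hypothetical (1, 4, H, W) image the CA should grow into."""
    h, w = target_rgba.shape[-2:]
    state = torch.zeros(1, CHANNELS, h, w)
    state[:, 3:, h // 2, w // 2] = 1.0       # single seed cell in the middle
    for _ in range(steps):                    # unroll the CA...
        state = model(state)
    loss = F.mse_loss(state[:, :4], target_rgba)
    optimizer.zero_grad(); loss.backward(); optimizer.step()  # ...then BPTT
    return loss.item()
```

A typical driver would be something like `torch.optim.Adam(NCA().parameters(), lr=2e-3)` called on `train_step` in a loop. The point the speakers are making carries over directly: the rule is tiny and purely local, and the interesting structure is a property of the trained collective, not of any single cell.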
So I thought it would be really cool if we could try to engineer information-processing systems that did this, not just morphogenesis systems but information-processing systems that operate in this way. As a graduate of engineering science I had to take a bunch of digital logic courses, and when you have a very simple, basically local state machine with local connectivity, it's really easy to imagine implementing that as a custom chip and reaching, as Beff Jezos puts it, the thermodynamic limit of AI. That really excited me, so I built a sort of demo of it where I was trying to do visual information processing on really sparsified video: basically predictive coding of sorts, or active inference I guess, on this incoming stream of really sparsified video, trying to predict what would happen next, and it turned out to work quite well. So then I thought, well, why can't we do that with language models? As you mentioned, there are all these slip roads, where if you prompt the model just right you can enter this really weird, different regime, and this exponentially large prompt space is a really handy way to try to control them. Fine-tuning is great, but what if we could just prompt them into interacting in a way that would lead to the emergent property of basically being one larger language model that could predict the next token really, really well? That initial motivation led to the control theory stuff, and I think it's probably the right way to go for the field. If we want to leverage maximal computation towards our objectives, the Bitter Lesson by Richard Sutton suggests that we should aim for systems where you can just slap on more and more compute, where there's a relatively simple procedure you follow to turn more compute into progress on your objectives; that's probably the way to go for making advances in AI. And if we can have this decentralized, networked system: I took a distributed systems course while I was here that was really great, and it taught how to make databases distributed over many servers, where the emergent properties they wanted were robustness, consistency, and availability. If we could have something similar to that, something radically scalable that can be run by regular people who don't need to own their own GPU cluster (which is maybe illegal in the future, when the US government says you can only have this many petaflops), basically, that was the real motivation for what I call the "language game" project, and that's something we're continuing to work on. But yeah, that kind of led to this control theory thing, where we were just like, we really need to get a grip on what these look like as systems as we start to build these more and more complicated, networked, distributed, beautiful emergent systems that hopefully will be hyper-capable in the future.

This is all music to my ears. I'm a huge fan of externalist thought in cognitive science, and even though I love the work from Jeff Hawkins, you were talking about the neocortex, even then I would say that a lot of cognition happens outside of the brain; we're not islands. Actually, I was just thinking that maybe a better analogy than the interstate freeway might be Star Trek Voyager, where there were these wormhole networks and hidden corridors, and you could get into these little slipstreams and go to different parts of the universe. When I was interviewing Philip Ball, who wrote the book How Life Works, he was trying to understand what the mechanisms are: self-organization, multiscale information sharing, emergentism. It's really fascinating. So how can we introduce some of these concepts into the next generation of AI?

Yeah, this is one of the things I'm certainly most excited about, because I see life as this interconnected, multiscale interplay of exploitation and exploration. Those are two terms from the reinforcement learning literature, but I mean them in a much more general sense, because at each stage of life we're either going out into the world to get something, to do something, to try something new, or, at the next stage, coming back in, going home, reflecting, going over our insights. It's this ebb and flow, going out and coming back in, and I see this kind of pattern emerge across many different aspects of machine learning and artificial intelligence work, in the sense that a lot of the algorithms we have now are convergent: they're objective-driven, we establish a loss function, we say these are the rules it should follow, it's going to update according to this equation, we set the system running, it learns from data, and we have a final product. On the flip side there's what Ken Stanley works on, more exploratory evolutionary algorithms or open-ended algorithms, and that's the other side of things. I think some of the most interesting work to be done is in how these two sides interconnect: how can we lay down strict, rigid rules which, when they are followed, generate novelty, creativity, and organization in a way that is not predetermined but almost fractal and infinite in its complexity? And are those rules defined already? Do they exist in the world? Are we guided by them? Are there principles like that which we can come to? Or are we the authors of our own fates, each agents whose actions choose our path in life? I think these are the
directions I'm really interested in. To connect this to my research: one thing I'm focused on now for my thesis project is morphogenesis. This connects to the Mordvintsev paper as well, except what I'm really interested in is how structure emerges, how different cells actually connect together. In that paper, for instance, each of the cells sat on a fixed grid, but in our bodies there's actually quite a sophisticated protein expression network which governs how cells adhere together. Certain gene regulation pathways will turn on cadherins, which cause cells to attach to one another, and elsewhere cells can detach and be transported all around the embryo. I think understanding this process more deeply could shed light not only on structure formation and problems in biology in general, but maybe on deeper, more general problems of structure learning, because we might think of embryology as quite disconnected from machine intelligence or artificial intelligence, but every single brain is formed in the same way, and that's through development.

Yeah, I'm also a disciple of Kenneth Stanley; he's absolutely incredible. Everyone at home needs to read his book Why Greatness Cannot Be Planned. So in the natural world, it's so interesting: we have this kind of self-organization, and we have multiscale information sharing, but we also have canalization, which is that you actually see a kind of convergence of structures and forms, reused almost as modules in the system. But then there's always the question of how we create something like this. Is it simply a matter of complexity? Do you need to go down to the microscopic scale to reproduce it, or could we reproduce it another way? And if we did reproduce it, the catch-22 is that when you impute directedness onto a system, it loses its intelligence, because to me intelligence is divergence. It's exactly as you were saying: it's this tapestry of discovering problems, solutions, new problems, new solutions, and it goes on forever with no end. Any attempt by us to control it, well, it's a bit like the Bitter Lesson: Sutton said any human design, any attempt to steer it, makes it convergent. We can do something like the Game of Life from John Conway, and incredible, beautiful structure emerges from that, but whenever we try to steer it with our own will, it seems to corrupt it as well.

Yeah, I think the analogy to biology is really useful here, and so is the canalization you mentioned. You have this reuse of structures across cells, for instance: they all have similar machinery to do gene expression, and they have the same genetic code underlying that gene expression, with maybe differences in cell state, but at the end of the day it's the same machinery. I used to do a bit of protein engineering with language models (that's actually how I learned about transformers and built my first transformers), and I think the analogy is really strong. Cells know how to read this genetic code, this language of the genetic code, and they all use that canalized ability, distributed across all of them, to locally solve the problem of what this specific cell is supposed to do to support the overall function of the organism. And similarly, I think the hope with
these language models is that now we have these language-based models, or LLMs, that have a similar sort of understanding of language. They are able to really constrain the probability distribution, to understand which sequences of text are reasonable English and what they might want to generate. And the exciting thing to me is that we can do a similar sort of evolutionary search to the one we currently do when trying to find protein sequences in protein engineering with language models, where every computer in this network of systems has this canalized ability to understand language, if you will, and locally just needs to solve the problem of what this particular node should do to support the function of the system. That might be to explore, that might be to exploit, that might be any number of things, and the discovery of that is really helped by the fact that we do have strong language models that can predict English, or text, very well, because they're able to explore this space. And in the limit, there's this good regulator theorem that we talked about before, which says that any system that does optimal control over another system must necessarily model that system, so in the limit it seems like the best prompt optimizers may end up being language models. Already in our study we were using this GCG algorithm, which leverages a language model to compute gradients and guide this local stochastic search over prompts.
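Roughly, the greedy coordinate gradient (GCG) idea mentioned here is: take the gradient of the adversarial loss with respect to the one-hot encoding of each control token, use it to shortlist promising single-token substitutions, then score a few of those exactly and keep the best. The sketch below is a simplified, untuned illustration of that loop, not the authors' code; "gpt2" and all the sizes are stand-in assumptions.

```python
# Simplified sketch of one GCG-style step: gradients w.r.t. one-hot token
# choices rank candidate substitutions; candidates are then scored exactly.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
model.requires_grad_(False)   # we only need gradients w.r.t. the one-hot input

def gcg_step(adv_ids, prompt_ids, target_ids, top_k=64, n_candidates=32):
    """One coordinate-descent step over the control-token block `adv_ids`."""
    embed = model.get_input_embeddings()
    one_hot = torch.zeros(len(adv_ids), embed.num_embeddings)
    one_hot[torch.arange(len(adv_ids)), torch.tensor(adv_ids)] = 1.0
    one_hot.requires_grad_(True)
    adv_embeds = one_hot @ embed.weight          # differentiable in token choice
    rest_embeds = embed(torch.tensor(prompt_ids + target_ids))
    inputs = torch.cat([adv_embeds, rest_embeds]).unsqueeze(0)
    logits = model(inputs_embeds=inputs).logits
    tgt_start = len(adv_ids) + len(prompt_ids)
    loss = F.cross_entropy(logits[0, tgt_start - 1:-1], torch.tensor(target_ids))
    loss.backward()
    # Per position, the most promising replacements under a linear approximation.
    candidates = (-one_hot.grad).topk(top_k, dim=1).indices

    def exact_loss(ids):
        full = torch.tensor([ids + prompt_ids + target_ids])
        with torch.no_grad():
            lg = model(full).logits
        return F.cross_entropy(lg[0, tgt_start - 1:-1],
                               torch.tensor(target_ids)).item()

    best_ids, best_loss = list(adv_ids), loss.item()
    for _ in range(n_candidates):                 # sample and score token swaps
        pos = torch.randint(len(adv_ids), (1,)).item()
        new_ids = list(adv_ids)
        new_ids[pos] = candidates[pos, torch.randint(top_k, (1,))].item()
        l = exact_loss(new_ids)
        if l < best_loss:
            best_ids, best_loss = new_ids, l
    return best_ids, best_loss
```

Iterating steps like this is the basic search loop; the real algorithm evaluates many candidate swaps per step in parallel and is considerably more careful than this sketch.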
What I'm basically trying to get at is that there are a lot of really interesting similarities that can be drawn from what we know about the structure and function of biological systems. There's this local control objective, or maybe information-processing objective, that must be met by every cell, by every compute node in this network of language models, and if we could understand what that is, what that even means from the perspective of systems and control and computation, I think that's a really promising way to make progress on this dream. To me it seems like it would be great to have GPT-7 not just owned by one entity but operated by the world, where we could all have a say in what goes into it, how it's used, and what it should be doing, and could all benefit from its excellent ability to compute, to predict what will happen next, and basically to perform intelligent operations on data. So yeah, I think this is a really exciting area to be working on.

Amazing. We're nearly at time, but we'll do two quick-fire questions. You've both just started the Society for the Pursuit of AGI; can you tell us about that?

Absolutely. The Society for the Pursuit of AGI is a student organization, currently operating at the University of Toronto and at Caltech, and we're essentially a crucible for new ideas. If you think of university research labs as pursuing relatively safe bets that could be publishable, and industry research labs as pursuing relatively safe bets that might turn a profit one day in some new product or system, the society is for the Hail Marys, for the wild bets, for the crazy stuff, for the really innovative stuff. To use the analogy of the highway network, we're trying to go off the beaten path. We really believe that the bottleneck in AI progress right now is not so much compute, not so much algorithms; it's conceptual. We need better ideas about intelligence, about life, about what this whole thing is that we're all experiencing and how we can gain deeper insight into it. Not only do I think that a deeper understanding will help us create better systems, but it'll also give us confidence that the systems we're developing will be beneficial to humanity and not harmful, and I think that will only come with knowledge, with first-principles understanding. That's why one of the things we're trying to do is make our club very interdisciplinary. Rather than having machine learning be this kind of echo chamber among engineers and computer scientists, with maybe a dash of philosophy and neuroscience, it would be really nice to open the conversation to people in other fields who may have a really unique insight into the phenomenon of intelligence. Perhaps behavioral economics can offer some insights, or political science; these are fields that are currently underappreciated but may have useful ideas. And maybe even people in the arts, who create (maybe they don't design systems so much as re-represent things that we know and understand), could have an interesting voice as well.

Very cool, and the final question. First of all, I just want to say to both of you: thank you for doing this great work. Your paper is one of the most interesting I've seen in the LLM space in recent history, and it was shared and loved by many of the folks on our Discord server. But that brings me to another point, which is that you didn't get into ICLR, and from my perspective I'm shocked, because this is really interesting work with great utility from both a practical and a theoretical perspective. Feel free to have a good rant about reviewer number two.

No, I mean, I wish you'd just keep talking like that; it really soothes the burn of reviewer number two. But no, the review system, to be honest, I'm still trying to get my head around it; I'm an early-career researcher trying to learn how it works. In ICLR's defense, we had this bug with the submission of our rebuttal. We had submitted the revision to our paper, and then fifteen minutes before the deadline Cameron and I were both getting this timeout error (he was in Toronto, I was in California), and so they didn't end up actually reading our rebuttal. We had sent it in and they said, oh, we'll post it for you, and then they said, oh, it was posted late, so we can't read that. Still, the review process definitely gave us a lot of really useful insights. The second two results that we talked about, the top-75 controllability and the random controllability, both came from trying to address reviewer comments. So what I'm trying to do, at least, is take as much of the good from that process as I can: figure out how to take advantage of getting insight from people in the field about what they're looking for, what they think is interesting, and what they think would improve the work, use that, and overall just figure out how to navigate this peer review system. I think it also made it feel better that the Mamba paper was also rejected from
ICLR, which, sorry, I know. Yeah, it's crazy. Yeah, it was crazy to me as well. But it's definitely a challenge, and after staying up for forty hours to get this done, it was like, it would have been nice if they could at least have looked at our paper, just seen the work that we did. But it's definitely good to learn from these things, and I guess we've also learned the lesson not to submit in the last fifteen minutes and to do it in advance. But yeah, thank you so much for your kind words about the paper; that means a lot. We'll surely continue to make this better, and we have a lot of exciting plans for how we're going to keep merging the empirical and theoretical sides of the equation to make some hopefully really impactful work that can help people build better systems and not suffer so much under the load of prompt engineering. So yeah, thank you very much.

Amazing. Well, guys, it's been a pleasure and an honor to have you on the show, so just keep doing the great work, and hopefully we'll get you on again.

Yeah, thank you so much for your time and for the opportunity to come and talk. It's been an amazing opportunity. It's really unbelievable to be sitting here in front of these cameras after watching the show so many times and listening to so many of the podcasts, and now to be speaking on it. It's just unbelievable, so thank you.

Amazing, thanks so much, guys. Awesome, okay, it's a wrap. [Music]
Info
Channel: Machine Learning Street Talk
Views: 141,543
Id: Bpgloy1dDn0
Length: 69min 14sec (4154 seconds)
Published: Fri May 24 2024