AIDAN 00:00
It was quite extraordinary, quite extraordinarily convenient: that simply by scraping more data off the web - not necessarily clean data, messy data, just web data, you're taking in everything and there's tons of junk out there - but taking in a very noisy, messy, massive dataset, making the model bigger, throwing some more chips at it, what came out the other side was something that understood language in a way I personally thought we were decades from. CRAIG 00:55
We're talking this week to Aidan Gomez, who helped develop the transformer algorithm,
which lies at the heart of generative AI and powers large language models, such as GPT-4.
Aidan now leads a startup, Cohere, a platform that offers users access to pre-built LLMs, as well as allowing users to create their own LLMs. But first, I want to give a shout out to our
sponsor, and encourage anyone with a business to take advantage of a deal from Oracle, which
is offering a full NetSuite implementation with no down payment and no interest for six months.
NetSuite is a cloud-based business management software for enterprise resource planning,
financial management, customer relationship management, and e-commerce. To take advantage of the offer, go to netsuite.com/eyeonai. Now let's get back to Aidan. AIDAN 02:20
I'm Aidan. I am the CEO and co-founder of Cohere. I started the company with Nick [Frosst] and Ivan [Zhang] about three and a half, four years ago. Before that, I was kind of the perpetual intern at Google Brain, during my undergrad and then later my PhD. I started down in the Bay Area, in Mountain View. AIDAN
I was part of the team that created the transformer. And it was incredibly exciting.
You know, it took the world by storm, I think, certainly, to my surprise, and I think everyone on
the team was quite taken aback by its popularity. Before Google, and also during Google, I was an undergrad at U of T [University of Toronto]. I grew up in rural Ontario, Canada, in a maple forest. And so I'm the world's most Canadian man. Yeah, that's me. CRAIG 03:25
And so you were at U of T studying with Geoff Hinton? I guess he was probably kind of retired from teaching by then. AIDAN 03:38
He was definitely not teaching, but he was still at the university. This is before the Vector Institute was created. And so, yeah - I didn't really get into deep learning until after second year. And when I started looking into it, I became obsessed, and I was just reading papers night and day. I would fall asleep with a research paper sitting on my bedside; in between sets at the gym, you know, I'd have a stack of papers that I was reading through. And I kept seeing this name. And his affiliation was U of T, which was where I was. And so I reached out to Geoff - this is before Google. I'd been reading his papers; at that point I was studying, you know, ReLUs and MLPs, just the simplest pieces of the deep learning stack. And I was like, you know, why do you have these functions that are
just flat and then up? I think that they should be periodic. And so I emailed him with an idea
being like, Hey, why did you make this decision? I think they should be periodic. There
should be some regularity and it should be bounded so that you know, it doesn't
go to infinity if we get a large input. And to my surprise, he responded, and
he actually explained the decision. And so that was pretty amazing. That was
my first interaction with Geoff. And then when I came back from Google in Mountain View to Toronto, Geoff said, hey, come work with me in the Toronto Brain office. And that was where I met my co-founder, Nick. CRAIG 05:31
So you worked on the transformer algorithm with a team in Mountain View at Google. Google Brain, was it? Yeah, Google Brain. So can you explain that periodic versus stable idea? Which algorithm were you talking about? AIDAN 05:54
Yeah, I mean, it's not very important, because I was wrong. It doesn't really matter. I think it's more just to Geoff's credit, the fact that he responded to a second-year undergrad with, you know, a wacky idea, earnestly. This guy was literally at the top of his field, yet took time for me.
It's interesting. So, in deep learning and neural networks, we have these neurons, and these neurons fire. There's some function that determines their firing: generally some threshold below which they don't fire, they stay dormant, and above it they fire. And when they're firing, they fire linearly, proportional to the input intensity they're getting. So if the input intensity is high, the output intensity is high. But that leads to potentially unstable behavior: if, for whatever reason, there's some sort of blow-up, some burst of signal coming in, then you'll get a huge burst out, and that'll propagate and make things more and more noisy. That leads to instability; it makes training complicated. So my proposal was, instead of firing linearly proportional to your inputs, have some sort of predictable, regular, periodic pattern, like a sine wave or something, so that you always know your output is bounded between some values. But that has not taken off, and we've since solved the training instability, the blow-ups, that type of thing. So that was just my first email to Geoff, I think six months into my study of deep learning.
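To make the contrast concrete, here is a rough sketch in Python of the two behaviors he's describing - a standard ReLU-style activation versus a bounded, periodic alternative (an illustration of the idea, not the actual proposal he emailed):

```python
import numpy as np

def relu(x):
    # Standard activation: dormant below the threshold, then firing
    # linearly in proportion to the input, so a huge input produces
    # a huge, potentially destabilizing output.
    return np.maximum(0.0, x)

def periodic_activation(x):
    # A bounded, periodic alternative: no matter how large the input,
    # the output stays within [-1, 1].
    return np.sin(x)

x = np.array([-2.0, 0.5, 100.0])
print(relu(x))                 # [  0.    0.5  100. ] - unbounded
print(periodic_activation(x))  # always within [-1, 1]
```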
CRAIG 08:02
Wow. That's impressive. And from a maple forest. AIDAN 08:11
Yeah, I love that. But going back - CRAIG 08:13
Then at Google Brain, what was the project that you were working on? What was the initial idea that led to the transformer? AIDAN 08:30
So I was on the infrastructure side. The original idea, the project I joined Google for - I was working with Lukasz Kaiser, and I think Lukasz operates half a decade to a decade ahead of his time, constantly. The project that I joined for was actually this paper called 'One Model To Learn Them All.' And the idea was: we're going to take every single dataset that machine learning researchers have compiled, and we're going to put it into one model. And that means it needs to be multimodal, because we have datasets for images, for audio, video, you know, text, everything. And so what we wanted to do was throw all the modalities in, as well as out. So you can consume video and, let's say, describe the video, or you can consume audio and transcribe it. But you can also take in some text and then produce audio, or you can just describe the video that you want and video comes out the other side. So it's fully multimodal on both the input and output side, and we just train on everything - truly everything we've come across. AIDAN
This now sounds kind of familiar, right? Because this is sort of the project roadmap that we're on right now with these large language models: we're throwing everything we have, the entire internet, at them, and now we're starting to add in every modality that we can. So that was what I joined for; that was a different project altogether. To support that project, we built this piece of software, this piece of infrastructure, because that model was going to be huge, and the data pipelines were going to have to be extraordinarily complex, so we needed something to suit that. What we did was create this library called Tensor2Tensor. It could distribute across arbitrary numbers of GPUs, like thousands and thousands and thousands. And it was very focused on autoregressive modeling, which is the type of modeling that the transformer is. AIDAN
And so at that time, I was sitting next to Noam [Shazeer], who was fiddling with autoregressive models, and in particular attention-based models. He was really interested in attention. And then we heard about a team over in Translate, which was being led by Jakob [Uszkoreit], which was also interested in attention-based autoregressive models. And so Lukasz convinced Noam and Jakob to come over and build it on our stack, build it on Tensor2Tensor. And they did. And over the next, I think, ten weeks, it was just a sprint to build this model. And the intensity just ramped up and ramped up, because the results we were getting were extraordinary. AIDAN
So I think this was - it wasn't the first, but it was one of the very early, extremely successful scaling projects: hyper-scalable architectures, massive data, massive model sizes, and massive GPU clusters just leading to extremely high performance. CRAIG 11:55
First of all, Tensor2Tensor - that's a framework or an orchestration layer? AIDAN 12:04
Yeah. So it was built on top of TensorFlow at the time, but it was basically just a library to support large distributed model training. It had all the latest tricks and hacks - learning rate schedules, initialization techniques - all this stuff built in. And so it let us experiment really rapidly. If I'm being honest, Tensor2Tensor was a mess. It was crazy, just all over the place. It supported everything; we were throwing every new paper that came out into it. It was a little bit chaotic, and there exist far, far better systems nowadays. But back then it did the job. It did the job; we were able to move insanely fast. And so I'm quite proud of it. CRAIG 13:13
And you were - attention was already something that was being talked about. A couple of questions about that process. What was your role? I mean, I'm a journalist; I imagine you guys sitting next to each other furiously coding. I mean, were you coding? Or is it more that you're in a room with a whiteboard trying to figure out the architecture, or is it something else? AIDAN 13:51
There was a lot of whiteboarding and diagrams, conceptually structuring these building blocks and putting them together, thinking about the architecture itself - there was a lot of that. And that was mainly done by Noam, Ashish, Niki, and Jakob. For me - I wasn't sleeping. I was working, like, 14-hour days: coding, building up the infrastructure, making it more robust, running experiments. So it was very much hands-on coding, and no one was sleeping. Everyone was just hacking, experimenting, running little tweaks, little ablations to see: if I add this, what changes? If I remove it, if I tweak it? Every single one of us was just messing with everything and trying to figure out what was the optimal configuration. And so that's how we got to that finished product. CRAIG 14:57
Yeah, and certainly the result now is leading to automatic code generation. Were you using any tools to speed up the writing of the code? AIDAN 15:14
At that time? Nothing existed. Truly, nothing existed. It was all - you wrote it yourself. Yeah. That came later, and it was powered by transformers. CRAIG 15:35
I've read the paper and certainly talked to a lot of people about transformers and their progeny. But can you explain, in as simple terms as you can muster, what the transformer algorithm is and what it does? And I'm just curious, too: if you were to send me the transformer algorithm, sort of the basic algorithm, is it a million lines of code? Is it 20 lines of code? I'm just curious what it looks like. AIDAN 16:21
Yeah, nowadays it's probably closer to 20 lines of code. Extremely, extremely simple. I think a big part of the beauty of the model, the architecture, was the fact that it was just so simple. It is among the simplest architectures that were going around at the time. It's built from the most basic layer, the layer that has existed for, I don't know how many years now, maybe over half a century. The basic layer is called an MLP; that's just what it's called, MLP. And really, the transformer - it's a simplification, but it's just some MLPs stacked on top of each other, plus an attention. CRAIG 17:20
NLP? You're saying, like, natural language processing? M? No? Okay. AIDAN 17:25
Yeah, yeah. The name doesn't matter: multi-layer perceptron. CRAIG 17:33
Multi-layer perceptron sounds like a deep neural net. But - AIDAN 17:38
Totally, yeah, that's the fundamental unit. And before transformers, there were these very complicated LSTM architectures, with gates and all of these confusing bits and bobs that made it work. With the transformer, all of that was torn away, and the layer became MLPs plus one attention. That was it. And it was beautiful that you could carve away so much stuff and leave something so simple that performed so well, that was so scalable. So the architecture is not this hyper-complex beast. It's actually just a very simple, scalable, compute-saturating thing.
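As a rough illustration of that simplicity, here is a sketch of a transformer block in Python - not the original code, and with details like layer norm, multiple heads, and positional encodings omitted:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Each position builds a query, key, and value vector, then mixes in
    # information from the positions it attends to.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

def mlp(X, W1, W2):
    # The half-century-old basic layer: a multi-layer perceptron.
    return np.maximum(0.0, X @ W1) @ W2

def transformer_block(X, p):
    # The core of the layer Aidan describes: an attention plus an MLP,
    # with residual connections around each.
    X = X + attention(X, p["Wq"], p["Wk"], p["Wv"])
    X = X + mlp(X, p["W1"], p["W2"])
    return X
```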
CRAIG 18:38
Well, explain what it does. So you have the multi-layer perceptron as the base. How do you create attention? AIDAN 18:53
How do you create attention? Yeah. So attention is
this idea that you want to relate parts of a sequence to other parts. It's a fundamental property: if you have a sequence of things, things in a list, in an order, there are going to be relationships between those things. Obviously that appears in language very, very strongly. You have adjectives, which are tied to nouns, and tons and tons of structures like this. And since we were developing this explicitly for language, we wanted the model to be able to represent those relationships quite easily. That's what attention does. Attention says: for this word in this sentence, I'm going to learn which other words in the sequence it's related to. And so for the sentence 'the brown dog,' you're going to want to learn that 'brown' refers to 'dog,' and maybe 'the' refers to 'dog.' You want to model those relationships, and attention enables you to do just that. And it's not that simple: it's not just that the model is learning adjective-noun relationships. It's learning far more complex stuff that we probably don't even have a language to describe, that we just do intuitively in our heads. That attention layer is the fundamental unit of learning relationships in sequences. And it turns out to be extraordinarily powerful.
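To make that concrete with his own example, here is a toy illustration with hypothetical numbers: learned attention weights for 'the brown dog' might look something like this, with each word distributing its attention over the others.

```python
# Hypothetical attention weights for "the brown dog".
# Row i is word i's attention distribution over all the words; each row
# sums to 1. "brown" puts most of its weight on "dog", and so does "the".
attention_weights = {
    "the":   {"the": 0.10, "brown": 0.15, "dog": 0.75},
    "brown": {"the": 0.05, "brown": 0.15, "dog": 0.80},
    "dog":   {"the": 0.30, "brown": 0.50, "dog": 0.20},
}
```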
CRAIG 20:37
And how then does that scale? Because I've spoken to Ilya [Sutskever] on the podcast, and he talks about seeing the paper and, like, the next day implementing it in what they were doing, which led to the GPT models. How does that scale into the large language models that we see today? AIDAN 21:12
In its earliest form, it was a very naive
scaling: just take the model and make it bigger. And the way that you do that is you add more neurons to the network, you add more layers, so it becomes a much taller, much more deeply stacked model. And you take a much larger dataset than the one we were considering, a much, much larger model than the one we were considering, and a much larger pool of compute. You plug those all together, and what came out the other side, I think, shocked virtually everyone. It was quite extraordinary, quite extraordinarily convenient: that simply by scraping more data off the web - not necessarily clean data, messy data, just web data, taking in everything, and there's tons of junk out there - but taking in a very noisy, messy, massive dataset, making the model bigger, throwing some more chips at it, what came out the other side was something that understood language in a way I personally thought we were decades from. Yeah, it was quite an extraordinarily convenient and exciting reality. CRAIG 22:42
So that led to BERT, is that right? AIDAN 22:49
That in particular - BERT predated... or maybe I have them in the wrong order. There's some order: there's GPT-1, which was the first of these scaled-up large language model papers; I think BERT predated GPT-1, I think. But BERT is a different thing, kind of a different beast. Instead of learning to generate language, it learns to represent it. And that's a subtle distinction. Now we're all paying attention to the generate side, because it's so visceral, right? You can talk to these things and they can write back to you. There's a very visceral human reaction to something that can speak to you. AIDAN
But there's another side to this whole thing, which is representing language in a numerical form. And that's extremely important; it's hard to overstate how significant that is. That was the first killer application of transformers. It was integrated into Google Search, and Google themselves described it as the most significant advance in search quality in - I think it was two decades, 20 years, basically Google's entire lifespan. So that was amazing. We got a model, a program, that was capable of representing language to be used downstream for applications like search and classification, et cetera - extremely, extremely faithfully, in a very high-utility way, in a way that boosted performance beyond what we really expected across pretty much any task you threw at it. Any time you wanted to use language for some downstream thing, putting a BERT model there, taking the representations from it, and running with those representations, you beat state of the art; you outperformed everyone else. So maybe BERT was the first seed of this idea: we can take a transformer, we can set it against a very simple task on a very diverse set of data, and what comes out is something that seems to get language - it just seems to get it. That's if I'm right that it predated GPT-1; I'm not sure that's true.
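A sketch of what 'taking the representations and running with them' can look like downstream - illustrative only, with embed() standing in for whatever BERT-style model produces the vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query, documents, embed):
    # embed() maps text to a fixed-length vector. Search then reduces to
    # comparing the query's representation against each document's.
    q = embed(query)
    scored = [(cosine_similarity(q, embed(doc)), doc) for doc in documents]
    return max(scored)[1]  # the document whose representation is closest
```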
CRAIG 25:36
You'll forgive me, I want to get to Cohere. But I'm a layman; my audience is somewhere in between me and you - they're fairly sophisticated. So you've got 20 lines of code. You feed it some data, let's say a sentence. Within the neurons - or the perceptrons of the multi-layer perceptron - it's relating one piece of data, one word, to another word. How is it doing that? Is it by feeding huge volumes of data that it begins to see patterns? Or within that 20 lines of code, is something incredible happening? Is it possible to explain that? AIDAN 26:46
I think it's maybe one line of code that leads to that behavior; the other 19 are support. I would say the one line is the objective. It's what you're asking the model to do with the data you're feeding through, this hyper-complex pool of data. And what does it mean to feed it through? Well, what you're actually doing, in the generative case - this is the GPT-style case - is asking it, given all the words up to a point in a sentence, to predict the next one. And that sounds simple. It sounds like stuff we've had for a while, like tab autocomplete. But no: that objective is horrendously complex. Because on the internet there are examples of translation, right? Like these forums online where people teach each other how to speak different languages, and someone asks, hey, how do I say 'the brown dog' in Spanish? Stop. And then the person responds: oh, you say it by - I don't know how to speak Spanish, but whatever it is, right? And so if you ask your model to model this, the only way for it to accurately model this is to know how to speak Spanish, because it's seeing the English part - hey, how do I translate 'the brown dog' into Spanish - stop, and now it needs to produce the Spanish translation. And so you can see how, just organically, by learning to generate sequences in order, you're forced to learn extremely complex behaviors: translation, classification, writing code. You know, at the top of a piece of code you'll have a function signature, a comment, a docstring saying: this function does X, Y, and Z, it takes these inputs of this structure and outputs the following. And then if you're going to model that code, you have to learn to program, because you're just given a function signature and a docstring that humans wrote for other humans to read. So I think one of the most beautiful things that falls out of this is that with this very, very simple structure - here's a ton of data, learn to generate it, learn to predict the next token - you think you're asking the model to do something quite simple and minimal. The reality is you're asking it to do an extraordinarily complex set of tasks. You're asking it to understand our culture, our language, the interactions between us. You're asking it to understand that data at the deepest level. And so what you get out the other side is a model that, you know, roughly does understand, does have the capacity to do all that stuff, does understand our culture. I think that's another one of these beautiful simplicities. Such a simple objective - pick the next word - and what falls out of that, what you're actually asking it to do, is so extraordinary.
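That one-line objective is conventionally implemented as next-token cross-entropy; a minimal sketch of the idea, assuming model() returns a probability distribution over the vocabulary:

```python
import numpy as np

def next_token_loss(model, tokens):
    # tokens: a list of integer token ids. For each position, the model
    # sees all the words up to that point and is penalized when it
    # assigns low probability to the actual next word. Minimizing this
    # is the whole training objective.
    loss = 0.0
    for t in range(1, len(tokens)):
        probs = model(tokens[:t])          # distribution over vocabulary
        loss -= np.log(probs[tokens[t]])   # surprise at the true next token
    return loss / (len(tokens) - 1)
```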
CRAIG 30:15
And when you're - so there were, what, five of you working side by side? How many people were working on the project? Weren't there five or six names on the paper? I think there were eight. Yeah. But in any case, was there a moment - or did you know, going in, just from whiteboarding, that, wow, this could work? Or was there a moment when you were running tests and began to see these extraordinary results and knew you were on to something amazing? AIDAN 31:01
Yeah, there were definitely moments where someone would come running over from their desk and be like, yo, come look. And they had just run the eval, and it was state of the art, beat everything that came before. And then we would all be like, next - okay, let's keep pushing. And the funny thing is, it came together so quickly. It was really over the span of three months. This wasn't a year-long effort or anything like that; it was just a super fast iteration pace. I don't know if there was a moment. I really don't think anyone fully grasped the significance, and that's mostly because the significance wasn't there at the time. The significance came from the fact that people adopted it. They could have adopted something else; they could have leaned into something entirely different. They chose the transformer, for whatever sort of memetic effects led to that. But they chose the transformer, they started investing - the community started investing tons of time in building infrastructure and support, all the way down to the hardware level, for this particular architecture. And that enabled us - us being the entire AI community - to consolidate on one architecture. And so I've said this before, and I feel quite confident almost everyone on the paper would agree: it could have been another model. Frankly, it could have been another model. The transformer just had the best support, and then the community reinforced that. The community made some sort of decision to consolidate on this architecture and really invest in it, and they made it a success. It could quite easily have been another architecture that similarly scaled up well and saturated compute well. CRAIG 33:20
You think there are other architectures out there that just haven't been discovered or explored, that could lead to such dramatic results? AIDAN 33:34
Absolutely - unequivocally, I think, definitely. They exist; they're out there. And with enough work and effort, maybe we could flip to another architecture. But we've already done half a decade of infrastructure development and software support and, you know, writing highly optimized kernels for the hardware, for transformers. So there's this resistance to move, and it would take a lot of community willpower to move away from the transformer. The only thing that would motivate that is some new substantial breakthrough at the architecture level. So I don't see that happening. But I also don't make the claim that the transformer architecture is somehow divine. Clearly, you need certain pieces. CRAIG 34:34
I mean, right. But presumably these large language models themselves could at some point suggest other architectures. AIDAN 34:48
Yeah, people have wanted to use models in that sort of feedback loop. I think that's definitely - we're already starting to see chip architectures being decided by models. And so the chips train the model, and the model, you know, decides the next generation of the chip, and there's this feedback loop. CRAIG 35:20
Who's doing that? AIDAN 35:23
Google, mostly - their v4 and v5 TPU chips were model-placed, model-designed. Yeah. So I think that's exciting. It happens on a super slow timescale, because it just takes so long to actually fabricate chips, push them out, verify them. So that happens at too slow a timescale. The stuff that you're describing, the architecture search projects - I would say those have actually, surprisingly, been quite low yield. And that's probably because humans have spent so much time on neural net architectures. They've explored that space so thoroughly, and done a pretty compelling job of it. And so when we threw models at it, the gains were always marginal, or they rediscovered stuff that we had discovered previously and kind of missed, and they just brought it to light, surfaced it again. So people have kind of tried that, but it seems like architecture space has actually been saturated. Or perhaps the methods used - this was also at Google - perhaps the methods used weren't the right ones. It's hard to say. But there was an effort to try to get models to produce new model architectures and have this self-improving feedback loop, and I would say that it largely fell flat. CRAIG 37:05
So you went then from Google - well, tell me about how you started Cohere. AIDAN 37:15
Yeah, so I spent the better part of three years bouncing around. I was in Mountain View for the transformer. Then I went to Toronto, and Geoff said, hey, come hang out at Google in Toronto. Then I graduated from undergrad and went to Oxford for my PhD. Jakob from the transformer paper had actually decided to leave Mountain View and go back home to Berlin, and he was like, yo, I'm going to set up a Brain office in Berlin. And I was like, hey, that's pretty close, like a 40-minute flight from London - let's work together. So then I was on a plane every two weeks to Berlin to see Jakob and work there. And eventually I just realized there was a revolution that had kind of been promised. Back when I was in Mountain View, just after we had released the transformer paper publicly, Noam immediately started working on language modeling and scaling the models up - he was actually deeply involved in the GPT-1 paper; he was helping OpenAI with it. And then I went back to Toronto, and I got an email from Lukasz, like, hey, have a look at this. In that email there was a Wikipedia article, and the title was 'The Transformer.' And I was like, oh hey, a Wikipedia article on this. I kept reading down, and then it was about a Japanese punk band, with these members, and this member had left, and I was just like, what the fuck, Lukasz, what is this? And he was like, the transformer wrote this - I just put in 'The Transformer' as the title, and it wrote everything else. And I was like, you're kidding. It was surreal. It was like, you know, you went to bed one night and models could barely spell, and you woke up the next morning and they were writing as fluently as a human - such a plausible story about a Japanese punk band called The Transformer. And I think that was the moment I realized: okay, in product space this unlocks something categorically different, something extraordinary. AIDAN
And I thought it was going to happen.
And I waited and I waited, and, you know, I was in my PhD, putting out new research and improving fundamental methods. And after three years there, nothing had changed; the world was the same. And Nick and Ivan, my co-founders - I think we all felt the same disappointment. Nothing had changed. We saw something magical three years ago, and nothing had changed; no one was talking about it. And so eventually that disappointment turned into resolve to do it ourselves. We decided, okay, let's leave, and let's go build Cohere to bring this to the world. This is before GPT-3, just after GPT-2, in 2019. And back then the mission was really just: (a) this is the most amazing technology that humans have ever created - let's model the web, let's build a model of the entire internet - and (b) let's put it into the hands of every single developer on Earth, let's inject it into every single product and create a new generation of magical product experiences. So that was really the seed. Yeah. CRAIG 41:29
And then, so Cohere is, at its core, a large language model? Or a suite of models for different vertical tasks? Describe what it is and how people use it. AIDAN 41:53
So at its core, yeah - we're an intelligence factory, building these big models and making them as usable and as useful as possible. There's a suite of models. We have both sides of that coin I was describing before: the generative and the representation sides - so both BERT-style representation and GPT-style generation. We have both of those, and we build them in-house. AIDAN
The way that we bring them to the world is that we partner with enterprises, and we solve what are really some of today's largest blockers for adoption, which are privacy blockers and data compliance blockers. If you're really going to put these large language models into useful applications at the forefront of your product, they're going to be touching data that's the most sensitive - user data, right, people's private data. And so there's a very, very high security bar. For us, one of the benefits of being independent - our competitors are mostly bound to one cloud provider; there's exclusivity there - being independent means we can play with everyone. And the enterprises that use us don't get vendor lock-in, so they're not trapped in one cloud provider. They can bounce between them, and we can deploy wherever they go. AIDAN
So for Cohere, one of our core efforts right now is making it so that these models can be deployed on any cloud provider, in situations where the data is the most sensitive, because that enables the most interesting and impactful applications. Otherwise, you kind of get what I've been seeing a lot of recently, which is superficial deployments of these models - not real, not product-changing, not fundamental shifts in infrastructure, but more like: here's my product, and I'm just tacking it on to the side, like a bolted-on experience. I think that makes a lot of sense, given the fact that this year everyone just kind of woke up, and so it's going to take a while to actually replace that with the thing that we want. So it makes sense. AIDAN
But really, the piece that's blocking this is the fact that there's not a lot of trust in some of our competitors, due to the fact that in the past they've trained on their users' data and they've disintermediated people. And so for us, we want to regain that trust and be the trusted partner for enterprises to actually bring large language models in, in a truly transformative way. So I think right now there's a product transformation that's kind of simmering under the water, because the whole world just woke up. Every single company now is trying to figure out: what does this mean? What does this technology mean for my product, my experience? What are my users, the consumers, going to expect from me? How do I not get left in the dust by my competitors, who are going to reinvent their product on the back of this technology? So they're starting to do the work. AIDAN
In 18 months, product space is going to look completely different, because right now everything is shifting behind the scenes. And so for Cohere, we really want to power that transformation and be a trusted partner to the largest enterprises and the best developers on Earth. CRAIG 45:49
And enterprises span the gamut of industrial verticals? Or are you focused on one industry? AIDAN 46:02
It's totally, totally horizontal. So it impacts
everything. I think you're going to be doing your banking with a conversational agent; you're going to be doing your shopping with a conversational agent. I think it's really hard to think of a particular vertical or industry that doesn't need to be changed by this, because consumer expectations are going to shift: when I show up to a new product, there's going to be an interface that I expect, which is language. It's an interface-level change. In the same way that if you're a product or a service you have to have a mobile app, because everyone's on their phones and that's how they want to interact with products and services - in the same way that the mobile transition led to everyone having to support the interface the consumer expected - everyone is going to have to support conversation and dialogue with an intelligent agent as an interface onto their products and services. So there's this resurfacing of product space that is literally happening right now. CRAIG 47:11
Is there an example, without naming names, that you can give that you think is going to blow everybody away? AIDAN 47:21
I mean, it's no secret that we're starting to see some very compelling assistant-like offerings. There were the promises of Siri and Google Assistant and Alexa that came ten years ago, or whatever it was, and those fell flat; I think the technology truly just was not there to support it. There is now the possibility of a truly general assistant. We actually have the technological bedrock to support that. It's emerged recently - it's a fairly recent development that that has been unlocked as a thing you could possibly build. Yeah. CRAIG 48:19
You know, I talked to Ilya about RLHF, reinforcement learning from human feedback - his way of kind of guiding the model toward more grounded responses. But I've talked to other people who say that's still speculative and takes a lot of time, and they're using vector databases, loading vector databases with authoritative data, and then the language model in effect is just the mouthpiece. It's not calling up the answers from its accumulated knowledge; it's referring to this vector database. How do you guys deal with hallucinations? AIDAN 49:25
Yeah - there's someone, Sara Hooker, at Cohere. She said this before, and I really like it: you have to distinguish between the hallucinations that you want, which are like creativity, and the hallucinations that you don't want. It's great when it hallucinates a story or a new joke - you want that, and so you don't want to beat that ability, that capability, out of the model. At the same time, you need ways to control it. So, for instance, if you're doing knowledge gathering or research, you definitely don't want anything made up. There's almost zero tolerance for hallucination. AIDAN
And so you kind of want a gradient, or a parameter that you can set, which might be the creativity parameter. And I think that's becoming increasingly possible.
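One common knob of this kind in today's systems is sampling temperature - an illustration of the idea, not a parameter Aidan names here:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0):
    # Low temperature makes the model pick its most likely token almost
    # deterministically (good for factual answers); high temperature
    # flattens the distribution, allowing more surprising, "creative",
    # hallucination-prone choices (good for stories and jokes).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)
```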
Another really good way to get models to be more truthful is to actually force them to cite their work. So there's Patrick Lewis - he was the first author, at Meta,
on creating RAG; it's called retrieval-augmented generation. And so that's this idea that you have a model, and you have an external knowledge base, or maybe multiple external knowledge bases - maybe one's Google, one's your private emails, one's whatever. And what the model can do is go out and query these sources. So it can say, hey, the user just asked me about this; I think I should query Google. And then it gets back some documents from Google, or it gets back from your email whatever emails it's looking for. And now that it can read those, it can generate a response, and it can cite back to them. It can say: you asked me this, and I think this is the answer, because of this sentence inside of this document or this webpage. By forcing the model to learn to cite its sources, you get two things. One is that you can actually verify it, right? You can check that it's telling the truth. You click into that link, you read the thing, and you can say it lied - or you can say, oh no, it's right, that checks out. So one is you get it to cite sources. The other is that you reinforce into the model the behavior of not making claims without grounds for those claims. And so it starts to learn the scenarios where - you know, when I'm writing stories, I don't really need to cite sources; I just need to write, the user is happy and content, and I get a good reward. And in the scenario of, I'm doing research on a topic, can you tell me about X, it starts to learn: okay, shit, in this case I need a very rigorous bibliography, I need to be able to tie everything back, and if I mess up - if the user clicks through and sees an error or a hallucination - I'm going to get super strong negative feedback. And so it learns to differentiate between these scenarios. So I really do believe retrieval augmentation, along with human feedback, is going to be one of the key pieces of making these models more reliable, more grounded.
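A minimal sketch of the retrieval-augmented loop he describes - the function names and sources here are illustrative placeholders, not Cohere's or Meta's actual API:

```python
def answer_with_citations(question, sources, generate):
    # sources: dict mapping a source name (e.g. "google", "email") to a
    # search function returning documents relevant to a query.
    # generate: a language-model call that drafts an answer from a prompt.
    documents = []
    for name, search in sources.items():
        for doc in search(question):
            documents.append((name, doc))

    # The model reads the retrieved documents and must tie each claim
    # back to a numbered document, so the user can click through and verify.
    prompt = "Answer the question, citing the documents you rely on.\n"
    for i, (name, doc) in enumerate(documents):
        prompt += f"[{i}] ({name}) {doc}\n"
    prompt += f"Question: {question}\n"
    return generate(prompt)
```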
CRAIG 53:07
That's fascinating. I'm coming up to an hour - can I ask a few more questions? Yeah. I've got to ask this: the public release of ChatGPT has set off this debate about how dangerous these models can be. To everyone's surprise, Geoff has gone public saying some really dire things, which - you know, I don't know him like you do, but I've known him for a while, and it surprises me; I've never heard him speak that darkly about something. Do you have a view on that? That's one question. And then the other is this debate about sentience or self-awareness. I mean, you've had your fingers in the brains of these things. Do you think that sentience or self-awareness could really emerge? Or do you think that, you know, these are bits of code and it's all an illusion? AIDAN 54:40
There's a lot to say - we need another hour or two together to properly represent my beliefs around that question. On the first part, about Geoff: Geoff went through the same thing I think many of us in the field went through, where our timelines got pulled forward massively. You know, we thought we'd have models that could write compelling English in a few decades, and then suddenly it shows up a year later. And that throws you into this state of shock and uncertainty; you're quite caught off guard. He's spoken about this publicly, I think - that sentiment of surprise at the progress and rate of change. I remember having conversations with him myself where both of us were kind of like: these people who talk about AGI - what nonsense, haha. This was back when models could barely spell. But then you get surprised and shocked, and your uncertainty blows up. And sometimes that can have the effect of: okay, anything's possible; oh my God, I was so far off on that, so now I'm shooting my uncertainty up across everything - anything could be possible, a superintelligent god, okay, maybe even that. So, like a lot of folks, we're all reckoning with that and recalibrating, adjusting our own timelines and understandings of progress and pace. AIDAN
Geoff is extraordinarily thoughtful, and he's been thinking about this since at least the beginning of Cohere - so for at least the last three and a half years he's been thinking about this very, very deeply. So I think people should take him very seriously. I think there will be a lot of sensationalism and a lot of extrapolation from what he's saying. But if you actually listen to what he's actually saying, it's quite measured. He's saying: I'm highly uncertain about what can happen, and that means we should take this stuff seriously, because we just don't have certain bounds; we don't have certainty around the future. And so we should be taking all the different possibilities quite seriously - not saying that they're likely to happen, just saying that they can't be ruled out yet, so let's take them very seriously. I think there's a lot of journalistic text and headlines and clickbait and nonsense. But if you actually listen to Geoff, I think his take is quite measured and reasonable. CRAIG 57:47
And actually, I'd love to have you back on to talk at length about these things. But on the idea of sentience, or the illusion of sentience - I mean, you know more than almost anybody, having built these models, both what they're capable of and what's behind their expressions. It's a philosophical question about what sentience or consciousness is - whether our consciousness is just an emergent property of the neural activity of our brain, and largely an illusion. What would you say to all of that? AIDAN 58:54
Yeah, I would say I don't place a divinity on humanity. I think that consciousness is in the brain, and it is a physical process. And maybe - maybe consciousness is what computing feels like, what processing feels like. And if that's the case, it's really hard to argue that that same phenomenon couldn't be present in silicon. I think there has to be a leap, right, to say that the circuits in our brain, because they're human or because they're biological, have some sort of fundamental distinction. I think you really have to take a leap of faith there. And so, just being pragmatic and reductive - again, we need two hours to discuss this more completely - but just as a scientist, I think it'd be really hard for me to say there's no way these machines could become sentient. I just can't construct an argument for that. CRAIG 1:00:27
Yeah. Well, let's leave it there. But -
Can I get a promise that you'll come back, you know, in a few months, and we can go deep on that subject? AIDAN 1:00:43
Yeah, I'd love to. CRAIG 1:00:46
Okay. Aidan, this has been really fascinating; I'm delighted. And I'm sure you heard: at the MIT Tech Review conference, somebody asked Geoff - he was on virtually from the UK - whether he would divest himself of Cohere. And he said no, he's going to stay invested. So, yeah, that's a funny question. Okay, great. Well, I really appreciate your time, and we'll talk again. CRAIG
That's it for this episode. I want to thank Aidan
for his time. I also want to remind you to check out NetSuite, Oracle's business management software for enterprise resource planning, financial management, customer relationship management, and e-commerce, among other things. Go to netsuite.com/eyeonai to take advantage of this offer. CRAIG
And remember: the singularity may not be near, but AI is about to change your world. So pay attention.