GPT-4o, AI overviews and our multimodal future

Video Statistics and Information

Captions
Hello and welcome to Mixture of Experts, I'm your host Tim Hwang. Each week, Mixture of Experts brings together a world-class team of researchers, product experts, engineers and more to debate and distill down the biggest news of the week in AI. Today on the show: the OpenAI and Google showdown of the week. Who's up, who's down, who's cool, who's cringe, what matters, and what was just hype? We're going to talk about the huge wave of announcements coming out of both companies this week and what it means for the industry as a whole.

For panelists today, I'm supported by an incredible panel: two veterans who have joined the show before, and a new contestant who has joined the ring. First off, Shobhit Varshney, who is the senior partner consulting on AI for US, Canada and LATAM. Welcome back to the show.

Thanks for having me back, Tim. Love this.

Yeah, definitely glad to have you here. Chris Hay, who is a distinguished engineer and the CTO of customer transformation. Chris, welcome back.

Hey, nice to be back.

Yeah, glad to have you back. And joining us for the first time is Brian Casey, who is the director of digital marketing, and who has promised a 90-minute monologue on AI and search summaries, which I don't know if we're going to get to, but we're going to let him have his say. Brian, welcome to the show.

We'll have to suffer through Shobhit and Chris for a little bit, and then we'll get to the monologue. But thank you for that.

Yeah, exactly. Well, great, let's just go ahead and jump right into it. Obviously there were a huge number of announcements this week. OpenAI came out of the gate with its raft of announcements, Google I/O is going on and they did their set of announcements, and really more things were debuted and promised than we're going to have the chance to cover on this episode. But from my point of view, and I want to use this as a way of organizing the episode, there were three big themes coming out of Google and OpenAI this week, which we'll take in turn and use to make sense of everything.

The first is multimodality. Both companies are obsessed with their models taking video input and being able to make sense of it, and going from image to audio, text to audio, and I want to talk a little bit about that. The second is latency and cost. Everybody touted the fact that their models are going to be cheaper and way faster, and from the outside you might say, well, things just get faster and cheaper, but what's happening here could have a huge impact on downstream uses of AI, so I want to talk about that dimension and what it means. And finally, as I've already previewed a little bit, Google made a big announcement that is almost literally going to be many people's very first experience with LLMs in full production: Google announced that going forward, first in the US market and then globally, users of Google Search will start seeing AI summaries at the top of their search results. That's a huge change; we're going to talk about what it means, and whether it's good is, I think, a really big question. So, looking forward to diving into it.

All right, let's talk about multimodal first. There were two showcase demos, from Google and from OpenAI, and both of them got at roughly the same thing: in the future you're going to open up your phone, turn on your camera, wave it around, and your AI will be responding in real time. So Shobhit, I want to bring you in, because you were the one who flagged this, saying we should really talk about it. The big question I'm left with is: where do we think this is all going? It's a really cool feature, but what kind of products do we think it's really going to unlock? Maybe we'll start there, but this topic goes in all different directions, so I'll give you the floor to start.

So Monday and Tuesday were just phenomenal inflection points for the industry. Getting to a point where an AI can make sense of all these different modalities is an insanely tough problem. We've been at this for a while and we've not gotten it right; we spent all this time trying to create pipelines for each piece, speech to text, then understanding, then text back out, and it takes a while to get all that processing done. The fact that in 2024 we are able to do this, what a time to be alive, man. I just feel that we are finally getting to a point where your phone becomes an extension of your eyes and your hearing, and that has a profound impact on some of the workflows in our daily lives. Now, at IBM I focus a lot more on enterprises, so I'll give you an enterprise view of how these technologies are actually going to make a difference, or not. In both cases, Gemini and OpenAI's 4o, and by the way, for me the "o" in 4o does not stand for "omni"; for me, 4o means "oh my God, it was really that good."

So we're getting to a point with certain workflows we do with enterprises, like transferring knowledge from one person to another, where usually you're looking at a screen and you have a bunch of "here is what I did, here is how I solved for it." We used to spend a lot of time trying to capture all of that. That's what happened on the desktop with classic BPO processes; these are billions of dollars of work, right?

And I want to pause you there, because I'm curious if you can explain, since this is not my world, and I'm sure for a lot of listeners it isn't their world either: how did it used to be done? If you're trying to automate a bunch of these workflows, is it just people writing scripts for every single task? I'm just curious what it looks like.

Yeah, so Tim, let's pick a more concrete example. Say you are outsourcing a particular piece of work, and you have finance documents coming in: you're comparing them against other things, you're finding errors, you're going back and sending an email, things of that nature. We used to spend a lot of time documenting the current process, and then we'd look at that 7-step or 29-step process and say: I'm going to call an API here, I'm going to write some scripts there. And all kinds of issues used to happen along the way, unhappy paths and so on and so forth. So the whole process used to be codified in some level of code, and then it's deterministic: it does one thing, in one particular flow, really well, and you cannot interrupt it. You can't just barge in and say, "no, no, this is not what I wanted, can you do something else?" So we're now finally getting to a point where that knowledge work, the work that used to get done in a process, will start getting automated significantly with the announcements from both Google and OpenAI.
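To make the "old way" concrete, here is a minimal sketch of the kind of deterministic, hard-coded flow being described. All names, thresholds and steps are invented for illustration; the point is that the logic is a fixed script with one happy path and no way to interrupt or redirect it mid-flow.

```python
from dataclasses import dataclass

@dataclass
class InvoiceFields:
    vendor_email: str
    total: float
    ocr_confidence: float  # quality score from a classic OCR extraction

def process_invoice(fields: InvoiceFields, contract_total: float) -> str:
    """One rigid, deterministic path; every deviation is an 'unhappy path'."""
    if fields.ocr_confidence < 0.85:    # OCR misread? A human takes over.
        return "routed_to_human"
    if fields.total != contract_total:  # any mismatch also ends the flow
        return "routed_to_human"
    # (imagine the "your invoice is paid" email being sent here)
    return "paid"

print(process_invoice(InvoiceFields("ap@vendor.com", 100.0, 0.92), 100.0))
```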
So far, people would solve it as a step-by-step decision flowchart, but now we're at a paradigm shift where I can interrupt in the middle of it and say, "hey, see what's on my desktop and figure it out." I've been playing around with OpenAI's 4o, its ability to go look at a video of a screen and things of that nature, and it's pretty outstanding. We're coming to a point where inference is happening so quickly that you can actually bring these models into your workflows. Earlier it would just take so long, it was very clunky, it was very expensive, so you couldn't really justify adding AI into those workflows; you would do labor arbitrage or things of that nature instead of trying to automate. So infusing AI into these kinds of workflows, into the entire process, is a phenomenal unlock.

One of my clients is a big CPG company, and as we walk the aisles they do things like planograms, where you're looking at a picture of a shelf. These consumer product goods companies give us a particular format in which they want the different chips and drinks and so on to be kept, and some of the labels are turned around or in a different place, so you have to audit and ask: am I placing things on the shelf the way the consumer product goods company wanted? That's called a planogram. Earlier, we used to take pictures and a human would go in, note things, and say, yes, I have enough of the bottles in the right order. Then we started taking pictures and analyzing them, and you run into real-world issues: you don't have enough space to back up and take the picture, or you go to the next aisle and the lighting is very different, and so on. So AI never quite scaled there. Now, for the first time, with models like Gemini and others, I can just walk past the shelf, capture a video, and feed the whole five-minute video in; with context lengths of two million plus tokens, it can actually ingest it all and tell me the number missing. Those kinds of things that were very, very difficult for us earlier are becoming a piece of cake.

The big question here is: how do I make sure that the phenomenal AI stuff we're seeing is grounded in my enterprise, in my data, my planogram style, my processes, my documents, and not getting knowledge from elsewhere? In all the demos, one thing I was missing was: how do I make it go down the particular path that I want? If the answer is not quite right, how do I control it? So I think a lot about how to bring this to my enterprise clients and deliver value for them. Those are some of the open questions.
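For a concrete sense of what that planogram-audit workflow might look like, here is a minimal sketch using the file-upload flow of Google's google-generativeai Python SDK as it stood around the Gemini 1.5 release. The video file, prompt wording and planogram details are invented for illustration; this is a sketch of the idea, not the client's actual system.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the walk-past video of the shelf (hypothetical file name).
video = genai.upload_file(path="aisle_walkthrough.mp4")
while video.state.name == "PROCESSING":  # wait for server-side processing
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    video,
    "Here is our planogram: row 1 should hold 12 cola bottles, row 2 "
    "eight bags of chips, all labels facing out. Audit the shelf in this "
    "video and list anything missing, misplaced, or turned around.",
])
print(response.text)
```

The long context window is what makes this viable: the whole video goes in as one input, rather than being chopped into frames for a bespoke vision pipeline.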
Chris, I totally... I do want to get into that, but I see Chris coming off mute, so I don't want to break his roll. Chris, do you have a view on this, or do you disagree? Are you like, "ah, it's actually not that impressive"?

Google Glass is back, baby. Yeah, no, I think multimodality is a huge thing, and Shobhit covered it correctly. There are so many use cases in the enterprise, but also in consumer-based scenarios, and one of the things we really need to think about is that we've been working with LLMs for so long now, which has been great, but the 2D text space isn't enough for generative AI. We want to be able to interact in real time, we want to be able to interact with audio. You can take that to things like contact centers, where you want to be able to transcribe that audio, you want AIs to be able to respond back in a human way, and you want to chat with the assistants like you saw in the OpenAI demo. You don't want to be sitting there going, "well, my conversation is going to be as fast as my fingers can type"; you want to be able to say, "hey, what do you think about this? What about that?" And you want to imagine new scenarios: what does this model look like, what does this image look like, tell me what this is. You want to be able to interact with the world around you, and to do that you need multimodal models. And that's why, in the Google demo where she picked up the glasses again, I jokingly said Google Glass is back, but it really is. If you're having a retail shopping experience and you want to know the price of a mobile phone, for example, you're not going to want to stop, get your phone out, and type, type, type; you just want to interact with an assistant there and then, or see the price in your glasses.

I give the mobile phone example for a reason, which is that the price I pay for a mobile phone isn't the same price you would pay, because it's all contract rates. If I want to find out how much I'm paying for that phone, it takes an advisor like 20 minutes, because they have to look up your contract details, they have to look up what the phone is, and then they do a deal. In a world of multimodality, where you've got something like glasses on, it can recognize the object, it knows who you are, it can go and look up the price of the phone for you, and then answer questions that are not generic but specific to you and your contract.

Exactly. That is where multimodality is going to start to come in, it kind of sounds like. Yeah, totally. Chris, if I have you right... this is one of the questions I want to pitch to both of you, Shobhit and Chris. My mind goes directly back to Google Glass; the bar where the guy got beat up for wearing Google Glass years ago was around the corner from where I used to live in San Francisco. There's just been this dream, and obviously all the OpenAI demos, and Google demos for that matter, are very consumer: you're walking around with your glasses, looking at the world, getting prices, that kind of thing. This has been a long-standing Silicon Valley dream, and it's been very hard to achieve. One thing I want to run by you, and the answer might just be "both" or "we don't know," is whether you're more bullish on the B2B side or the B2C side. Because I hear what Shobhit's saying and I can see why enterprises really get a huge bonus from this sort of thing. And it's really funny to me, because there's one point of view where everybody's talking about the consumer use case, but the actual near-term impact may be more on the enterprise side. I don't know if you buy that, or if you really are like, "this is the era of Google Glass, it's back, baby."

So I can start first, Tim. We have been working with Apple Vision quite a bit at IBM with our clients, and a lot of those are enterprise use cases in a very controlled environment. Things break in the consumer world because you don't have a controlled environment; corner cases happen a lot. In an enterprise setting, if I'm wearing my Vision Pro for two hours at a stretch, say I'm a mechanic and I'm fixing things, that's a place where I need additional input and I can't go look at other things; I can't pick up my cell phone and work on it, because I'm underneath, fixing something, in the middle of it. In those use cases, because the environment is very controlled, I can do AI with higher accuracy, it's repeatable, and I can start trusting the answers because I have enough data coming out of it. You're not trying to solve every problem. So I think we'll see a higher uptake of these devices there. I love the Ray-Ban glasses from Meta as well, great for doing something quick when you don't want to switch devices, but I think we're moving to a point where enterprises will deliver these at scale, the tech gets better, and adoption then comes over on the B2C side. In consumer goods we'll have multiple attempts at this, like we had with Google Glass; it'll take a few attempts to get it right. On the enterprise side we will learn and make the models a lot better, and there's an insane amount of value that we're delivering to our clients with Apple Vision Pro today in enterprise settings. I think it's going to follow that path.

Totally, yeah. And it's actually interesting, I hadn't really thought about this until now: the phone is almost not as big a competitor in the enterprise setting. The example Chris gave was literally "is this multimodal device faster than using my phone in that interaction," which is a real competition; but someone like a mechanic can't just pull out their phone. Chris, any final thoughts on this? Then I want to move us to our next topic.

Yeah, I was just going to give another use case scenario. I often think of things like the oil rig example, a real enterprise space where you're wandering around and you have to do safety checks on various things. In the days before the mobile phone, or before the tablet, what they would have to do is go look at the part, do the visual inspection, and then walk back to a PC to fill that in; these days you do it with a tablet on the rig. But then you actually need to find the component you're going to look at, you have to do the defect analysis, you want to take pictures of it, you need the geolocation of where that part is so the next person can find it, you want to see the notes they had on it before, and then you've got to fill in the safety form; they have to fill in a ton of forms. So there's a whole set of information, and if you just think about AI, with even your phone or glasses, being able to look at that part, have the notes contextualized in that geospatial space, fill in that form, and do an analysis, it has a huge impact on enterprise cases. Multimodality in that sense probably has a bigger impact on the enterprise cases than the consumer spaces, even today, and I think that's something we really need to think about.

The other one, and again, I know you wanted this to be quick, Tim, is that the clue in generative AI is the generative part. I can create images, I can create audio, I can create music, things that don't exist today, and with the text part of something like an LLM I can create new creative stuff: I can create DevOps pipelines, Dockerfiles, whatever. So there comes a point where I want to visualize the thing that I create; I don't want to be copying and pasting from one system to another. That's no different from the oil rig scenario. As I start to imagine new business processes, new pipelines, new tech processes, I want to have the real-time visualization of that at the same time, or be able to interact with it. That's why multimodality is really important, probably more so in the enterprise space.

Yeah, that's right. Some of the experiments you're seeing with dynamic visualization generation are just very cool, because you can say, "here's how I want to interact with the data," and the system generates it right on the fly, which I think is very exciting.

All right, so next up I want to talk about latency and cost. This is another big trend. It was very interesting that both companies went out of their way to say, "we've got this offering and it's way cheaper for everybody," which suggests to me that these big competitors in AI all recognize that your per-token cost is going to be a huge bar to getting the technology more widely distributed. Certainly one of the ways they sold 4o was that it was cheaper and as good as GPT-4; everybody was kind of like, okay, why do I pay for Pro anymore if I'm going to get this for free? And Google's bid, of course, was Gemini 1.5 Flash, which again is: it's going to be cheaper and faster. Chris, you threw this topic out, so I'll let you have the first say, but the main question I'm left with is: what are the downstream impacts? For someone who's not paying close attention to AI, is this just a matter of things getting cheaper, or are these economics actually changing how the technology is going to be rolled out?

I think latency, smaller models, and tokens are probably one of the most interesting challenges we have today. Think about GPT-4: everybody was saying, "oh, that's a 1.8-trillion-parameter model," or whatever it is. That's great, but the problem with these large models is that every layer you have in the neural network adds time to get a response back, and not only time but cost. If you look at the demo that OpenAI did, what was really cool about it was that when you were speaking to the assistant, it was answering pretty much instantly, and that is the really important part. In previous demos, if you were having a voice interaction, you'd be stitching together three different pipelines: you do speech-to-text, then you run that through the model, and then you do text-to-speech on the way back. So you're getting latency, latency, latency before you get a response, and because that timing is not in the 300-millisecond range, it was too long for a human being to interact with; you got this massive pause.
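To make the latency arithmetic concrete, here is a toy sketch of the two architectures; the function names and timings are placeholders rather than measurements. The stitched pipeline pays three sequential latencies, while a natively multimodal model pays one.

```python
import time

def timed(label: str, seconds: float) -> None:
    """Stand-in for a real network call; sleeps to simulate latency."""
    time.sleep(seconds)
    print(f"{label}: {seconds:.1f}s")

# Old approach: three models stitched together, latencies add up.
def voice_turn_pipeline() -> None:
    timed("speech-to-text", 0.4)   # transcribe the user's audio
    timed("LLM response", 0.9)     # run the transcript through the model
    timed("text-to-speech", 0.5)   # synthesize the reply audio
    # Total ~1.8s: well past the ~300ms humans expect in conversation.

# New approach: one natively multimodal model, audio in, audio out.
def voice_turn_multimodal() -> None:
    timed("audio-to-audio model", 0.3)  # single call, one latency budget

voice_turn_pipeline()
voice_turn_multimodal()
```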
So latency, and the tokens per second, becomes the most important thing if you want to interact with models quickly and have those conversations. That's also partly why multimodality is really important: if I can do this in one model, I'm not jumping between pipelines all the time. The smaller you can make the model, the faster it's going to be. And if you look at the GPT-4o model, I don't know if you've played with just the text mode, but it is lightning fast when it comes back.

Very fast, yeah, and noticeably so. It feels like every time I'm in there, there are these improvements.

Right, and this is the trade-off you're making: reasoning versus speed of the model. As we move into agentic platforms, as we move into multimodality, you need that latency to be super sharp, because you're not going to be waiting all the time. There will be scenarios where you want to move back to a bigger model, and that's fine, but you're going to be paying the cost, and that cost is the price of the tokens in the first place but also the speed of the response. This is the push and pull that model creators are going to be playing with all of the time. If you can get a similar result from a smaller, faster, cheaper model, you're going to go for that; in the cases where you can't, you may need to go to the larger model to reason. So this is really important.

Totally, yeah. There's a bunch of things to say there. One thing you've pointed out clearly is that this makes conversation possible: you and I can have a conversation in part because we have low latency, is the way to think about it, and now that we're reaching human-like parity on latency, these models can finally converse in a certain way. The other thought is that there's almost a thinking-fast-and-slow dynamic, where the models can be faster but they're not as good at reasoning, and then there's this deep-thinking mode which is slower in some ways.

So Tim, the way we are helping enterprise clients think about this: there's a split, two ways of looking at applying gen AI in the industry right now. One is at the use case level, where you're looking at the whole workflow end to end, seven different steps. The other is looking at it at the subtask level. Let me pick an example and walk you through it. Say I have an invoice that comes in: I'm pulling something out of it, I'm making sure it's as per the contract, and I'm sending you an email saying your invoice is paid; some sort of flow like that. Say it's seven steps, very simplified. I'm going to pull things from the backend systems using APIs. Step number three, I'm going to call a fraud detection model that has been working great for three years. Step number four, I'm extracting things from the paper invoice that came in; that extraction I used to do with OCR at 85% accuracy, with humans handling the overflow. At that point we take a pause. We have reason to believe that LLMs today can look at an image and extract this with higher accuracy; say we get up to 94%, so that's nine points higher accuracy in pulling things out. So we pause there and say: let's create a set of constraints for step number four to find the right LLM. One constraint could be latency, like we just discussed: how quickly do I need the result, or can this take 30 seconds and I'll be okay with it? A second could be cost: if I'm doing this a thousand times, I have a cost envelope to work with versus a human doing it; if I'm doing it a million times, I can invest a little more if I get accuracy out of it, so the ROI becomes important. Then you're looking at security constraints: does this data have any PHI or PII that really can't go out to the cloud, so I have to bring things closer? Or is this military-grade secret that has to stay on-prem? So you come up with a list of five or six constraints, and that lets you decide what kind of LLM will actually check off all of those constraints, and then you start comparing and bringing it in.

So the split we're seeing in the market is this. On one side, with LLM agents and these multimodal models, they're trying to accomplish the entire workflow end to end, like you saw with Google returning the shoes: it takes an image of them, goes and looks in your Gmail to find the receipt, starts the return, and gives you a QR code, with the whole return process done. It just figured out how to create the entire end-to-end workflow. But where the enterprises are still focused is more at the subtask level, where we're saying: this step, step number four, is worth switching, and I have enough evals before and after, I have enough metrics to understand it, and I can control and audit it much better. From an enterprise perspective, with these end-to-end multimodal models it'll be difficult for us to explain, to the SEC for example, why we rejected somebody's benefits on a credit card, things of that nature. So I think in the enterprise world we're going to go down the path of: let me define the process, I'm going to pick small models, to Chris's point, to do each piece better, and then eventually move toward making sure that framework, the evals and all of that, can be applied to end-to-end multimodal models.
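A minimal sketch of the constraint checklist described above, applied to picking a model for one workflow step; the candidate models, numbers and constraint values are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float       # measured on your own before/after eval set
    latency_s: float      # p95 seconds per call
    cost_per_call: float  # dollars
    deployment: str       # "on_prem" | "private_cloud" | "public_api"

# Constraints for step 4 (invoice field extraction) of the workflow.
MIN_ACCURACY = 0.94   # must clearly beat the 85% OCR baseline
MAX_LATENCY_S = 30.0  # batch step: 30 seconds is acceptable
MAX_COST = 0.01       # cost envelope at ~1M invoices per year
ALLOWED = {"on_prem", "private_cloud"}  # invoices contain PII

def meets_constraints(m: Candidate) -> bool:
    return (m.accuracy >= MIN_ACCURACY
            and m.latency_s <= MAX_LATENCY_S
            and m.cost_per_call <= MAX_COST
            and m.deployment in ALLOWED)

candidates = [
    Candidate("big-multimodal-api", 0.96, 4.0, 0.020, "public_api"),
    Candidate("small-vision-llm", 0.94, 2.5, 0.004, "private_cloud"),
]
viable = [m.name for m in candidates if meets_constraints(m)]
print(viable)  # -> ['small-vision-llm']
```

The value of writing the constraints down like this is that the choice becomes auditable: you can show exactly why one model was swapped in for one subtask.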
I do want to bring in Brian here, to release the Brian on this conversation, because I'm curious about the marketer view on all this. There's one point of view which says: yes, yes, Chris and Shobhit, this is all nerd stuff, latency and cost and speed and whatever; the big thing is that you can actually talk to these AIs. One really big thing that came out of the OpenAI announcements was that they're going to use this latency work largely to create a feature that feels a lot more human and lifelike than typing and chatting with an AI. I'm curious what you think about that move. Is it ultimately going to help the adoption of AI, or is it just a weird sci-fi thing that OpenAI wants to do? And if you've got any thoughts on how it impacts the enterprise as well: do companies suddenly say, "oh, I understand this now, it's like the AI from Her, I can buy this"? It's interesting to think about the surface part of this, because it will really have a big impact on the market; the technical advances are driving the marketing of this.

I do think, when you look at some of the initial reviews of, I want to say, the Humane Pin and the Rabbit, I remember one of the scenarios being demoed: I think he was looking at a car and asking a question about it, and the whole interaction took like 20 seconds, and then he showed that he could do the whole thing on his phone in the same amount of time. The thing I was thinking about while watching that was: he just did like 50 steps on his phone, and that was awful, as opposed to just pushing a button and asking a question. It was very clear that the UX of just asking the question and looking at the thing was a way better experience than pushing the 50 buttons on your phone, but the 50 buttons still won, just because it was faster than dealing with the latency of where we were before.

It reminded me a lot of the way I remember hearing Spotify talk, early on, about how they thought about latency, and the things they did to make the first 15 seconds of a song land, essentially so that it felt like a file you had on your device. From their perspective, if it felt like every song you wanted to listen to was buffering, as opposed to sitting on your device, you were never going to adopt the thing, because it's a horrible experience relative to just having the file locally. So they put in all this work so that it felt the same, and that wound up being a huge part of how the technology and the product ended up getting adopted.

I do think a lot of the stuff we're doing is almost, I don't want to say back office, but enterprise processes around how people do operational things. Still, there are plenty of ways people are thinking about doing more with agents in customer experience, whether it's support interactions or bots on the site, and you can clearly imagine that playing a bigger role going forward. If it feels like every time you ask a question you're waiting 20 seconds for a response, the person on the other end of that interaction is just getting madder and madder the entire time; whereas the more it feels like you're talking to a person who is responding as fast as you're talking, the more likely it is that people will accept that as an interaction model. So I think that latency, making it feel, to your point, like the zero latency you get with a human being, is a necessary condition for a lot of these interaction models, and it's going to be super important going forward. And when I think about the Spotify thing, people are going to do interesting things to solve for the first 15 seconds of an interaction, as opposed to the entire interaction. There was a lot of talk about the OpenAI model responding with "sure," or some space-filling entry point, so it could catch up with the rest of the dialogue. I think people will prioritize that a lot, because it'll matter a lot.

I love the idea that, to save cost, OpenAI basically delivers the really fast model for the first few turns of the conversation, so it feels like you're having a nice flowing conversation, and then once it has built confidence it falls back to the slower model that has better results. You're like, "oh, this person is a good conversationalist, but they're also smart too." That's kind of what they're trying to do by playing with model delivery. So, we've got to talk about search, but Chris, I saw you go off mute; do you want to do a final quick hit on the question of latency before we move on?

No, I was just going to build on what Brian was saying there, and what you were saying, Tim. I totally agree. It was always doing this "hey" and then repeating the question, so I wonder whether, underneath the hood, there's a much smaller classifier model that is just doing that "hey" piece, and then, as you say, a slightly larger model actually analyzing the real thing. So I do wonder if there are two small models, or a small model and a slightly larger model, in between there for that interaction, which is super interesting. But maybe the thing I wanted to add is that we don't have that voice model in our hands today; we only have the text model. So I wonder, once we get out of the demo environment, maybe in three weeks' time or whatever, when we have that model, whether it's going to be super annoying that every time we ask a question it goes "hey" and repeats the question back. It's cool for a demo, but I wonder if it will actually be super annoying in two weeks' time.
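Nobody outside OpenAI knows how the demo is actually wired, but the cascade Chris is guessing at, a tiny model that acknowledges instantly while a larger model composes the real answer, is easy to sketch. Everything below, model behavior and timings included, is hypothetical.

```python
import asyncio

async def small_model_ack(question: str) -> str:
    """Hypothetical tiny, low-latency model: instant acknowledgment."""
    await asyncio.sleep(0.1)  # ~100ms: feels immediate to the user
    return f"Sure, {question.rstrip('?')}? Let me take a look."

async def large_model_answer(question: str) -> str:
    """Hypothetical larger model: slower, but does the real reasoning."""
    await asyncio.sleep(1.5)  # the latency the acknowledgment is hiding
    return "Here's the full answer..."

async def voice_turn(question: str) -> None:
    # Kick off the slow answer immediately; speak the ack while it runs.
    answer_task = asyncio.create_task(large_model_answer(question))
    print(await small_model_ack(question))  # user hears this right away
    print(await answer_task)                # real answer lands when ready

asyncio.run(voice_turn("What's the price of this phone?"))
```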
All right, the last topic, which we've got a few minutes for, and this is Brian's big moment. So Brian, get yourself ready. Chris, you can get yourself ready too, because apparently Brian is going to blow our eyebrows off with his rant.

Everyone else can leave the meeting, yeah.

So the setup for this is that Google announced that AI-generated overviews will be rolling out to US users, and then to everybody, in the near future. Two things to set you up, Brian. The first is: this is what we've been talking about, right? Is AI going to replace search? Here it is, consuming the preeminent search engine; we're here, this is happening. The second is that I'm a little nostalgic, as someone who grew up with Google. The ten blue links, the search engine, are a big part of how I experienced and grew up with the web, and this seems like a big shift in how we interact with the web as a whole. So I want you to talk first about what you think it means for the market, and how you think it's going to change the economy of the web.

Yeah, so I follow two communities pretty closely online: the tech community, and, as somebody who works in marketing, my SEO community, and they have very different reactions to what's going on. On your first question, though, whether this is the equivalent of swallowing the web: what's funny is that from the minute ChatGPT arrived on the scene, people were proclaiming the death of search. For what it's worth, if you've worked in marketing or on the internet for a while, people have proclaimed the death of search as an annual event for the last 25 years, so this is just par for the course on some level. But what's interesting to me is that you had this product, ChatGPT, the fastest-growing consumer product ever, 100 million users faster than anybody else, and it speedran the growth cycle that usually takes years or decades; well, maybe not decades, but it takes a long time for most consumer companies to do what they did. The interesting thing about that is that if it was going to totally disrupt search, you would have expected it to happen sooner than with other products on a slower growth trajectory. But that didn't happen. As somebody who watches their search traffic super closely: there's been no chaotic drop; people have continued to use search engines.

One of the reasons, I think, is that people actually misunderstood ChatGPT and Google as competitors with one another. Google and OpenAI probably are competitors on some level, but I don't know that those two products are. And the reason I was thinking about that is: if ChatGPT didn't disrupt Google within the time frame we've had so far, the question is why not. You could have a couple of different hypotheses. One: the form factor wasn't right; it wasn't text that was going to do it, we needed Scarlett Johansson on your phone, and that's the thing that's going to do it, so maybe they're leaning into that thought process a little bit. You could say it was hallucinations: the content is just not accurate. You could say it's learned consumer behavior: people have been using this stuff for 20 years, and it takes a while to get them to do something different. You could point to Google's advantages in distribution: they're on the phone, they've got browsers, and it's really hard to get that level of penetration. I think all of those play some role, but my biggest belief is that it's actually impossible to separate Google from the internet itself. Google is kind of like the operating system for the web, so to disrupt Google you're not just disrupting search; you have to disrupt the internet, and that turns out to be an incredibly high bar, because you're not only dealing with search, you're dealing with the capabilities of every single website that sits on the other end of the internet, whether it's banks or airlines or retail, whatever it is. That's an enormous amount of capability built up there. So I look at that and say: for as much as this technology has brought to the table, it hasn't done that thing yet, and because it hasn't, there hasn't been some dramatic shift.

The thing that Google Search is not good at, though, and I think you see it a little bit in how they described what they think the utility of AI Overviews will be, is complex, multi-part questions. If you're doing anything from a buying decision for a large enterprise product to planning your kid's birthday party, you're going to have to do like 25 queries along the way, and you've just accepted and internalized that you have to do those 25 queries.

Right, basically, search is one-shot: you say it, and then responses come back. Sorry, go ahead.

Yeah, exactly. So the way I was thinking about LLMs is that they're kind of like SQL for the internet, in a way: you can ask a much more complicated question, and you can actually describe how you want the output to look. "I want to compare these three products on these three dimensions; go get me all this data." That would have been 40 queries at one point, but now you can do it in one, and search is terrible at that right now; you have to go cherry-pick each one of those data points. The interesting thing is that that's also maybe the most valuable kind of query to a user, because you save 30 minutes. So I think Google looks at that and says: if we cede that particular space of complex queries to some other platform, that's a long-term risk for us. And if it's a long-term risk for them, it ends up being a long-term risk for the web. So I actually think it was incredibly important that Google bring this type of capability to the web, even if it ends up being a little disruptive from a publisher's perspective, because it at least preserves some of the dynamic we have now, of the web still being an important thing. To your point, I have present and past nostalgia for it; I think it's important that it continues to evolve, if we all want the web to persist as a healthy, dynamic place.

Yeah, for sure; that's a great take on it. Google always used to say, "look, we measure our success by how fast we get you off our website," and what you're pointing out, Brian, which I think is very true, is that what they never said was that there's this whole set of queries they never surface, where you really have to keep searching. That ends up being the search volume of the future that everybody wants to capture.

Well, Brian, I think we also had a little intervention from AI, the thumbs-up thing we were joking about before the show.

Yeah, that's my ranking for the worst AI feature of all time.

It'll make up the thumbnail on the video. That's right, exactly.

Well, great. We've got just a few minutes left. Shobhit, Chris, any final parting shots on this topic?

Sure. I'm very bullish; I think AI Overviews have a lot of future, as long as there's a good mechanism for incorporating feedback and making it hyper-personalized. Take a simple query: I want to go have dinner tonight, and say I tell you I'm looking for a Thai restaurant. If I go on OpenTable or Yelp or Google and try to find that, there's a particular way in which I think through it; the filters I apply are very different from how Chris would do it. So, the way I make a decision: if somebody's making that decision for me, great. The reason TikTok works so much better than Netflix, on average: I was listening to a video by Scott, and he mentioned that we spend about 155 minutes a week browsing Netflix on average in the US, something of that nature, a pretty insane amount of time, whereas TikTok has completely taken that paradox of choice away from you. When you go on TikTok, the video they have picked for you is backed by so many data points, signals from how you interact every few seconds of viewing. They have hyper-personalized it based on how you interact with things; they're not asking you to pick a channel or make a choice, they're just showing you the next, next, next thing in the sequence, hence the stickiness. They've understood the brains of teenagers, and of that demographic, really, really well. I think that's the direction Google will go: it'll start hyper-personalizing based on all the content. If they're reading my email and finding out where the receipt for my shoes is, they know what I actually ended up ordering at the restaurant I went to. With that full feedback loop coming into the Google ecosystem, I think it's going to be brilliant if they get to the point where they just make a prediction about which restaurant is going to work for me, given everything they know about me.

That's right. Yeah, I mean, the future is that they just book it for you, and a car shows up, and you get in, and it takes you someplace, right?

They'll send a confirmation from your email. Exactly right.

Chris, 30 seconds; you've got the last word.

Thirty seconds: search is going to be a commodity, and I think as we enter the AI assistant era...

I dare you, yeah.

...it will be a commodity, because we are going to interact with search via these assistants. It's going to be Siri on my phone, enhanced by AI technology; it's going to be Android and Gemini's version over there. We are not going to be interacting with Google Search the way we do today, with browsers; that is going to be commoditized, and we're going to be dealing with these Her-like assistants who go and fetch those queries for us. So I think that's going to be upended, and at the heart of it is going to be latency and multimodality, as we said. I think they've got to pivot, or they're going to be disrupted.

Yeah, I was going to say: if that happens, what's interesting is that all of the advantage Google has actually vanishes, and then it's an even playing field against every other LLM, which is a very interesting market situation at that point. I'm going to pick that up next week; that's a very, very good topic, and we should get more into it.

Great, well, we're at time. Shobhit, Chris, thanks for joining us on the show again. Brian, we hope to see you again sometime. And to all of you out there in radio land: if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we'll see you next week for more Mixture of Experts.
Info
Channel: IBM Technology
Views: 17,278
Keywords: IBM, IBM Cloud, gpt, gpt-4o, multimodal, google, google gemini, ai, artificial intelligence
Id: T6DGGHlkYa0
Length: 41min 4sec (2464 seconds)
Published: Fri May 17 2024