How to Build an AI Copilot for Your Application

Video Statistics and Information

Captions
So today I want to talk about building AI copilots, which is probably on the mind of a lot of people building applications: how do you add an AI copilot to your application? I've got 15 minutes, and I want you to take away three things: why you would want to do this; what exactly an AI copilot is, beyond just another text chatbot; and how you should do it, including tips and tricks for what to care about and watch out for once you get into production.

First of all, I think we're entering an AI copilot era in software. You see this most prominently in many of the most dominant software applications, where adding an AI copilot has become the dominant strategic initiative for this year and potentially next. Microsoft 365 Copilot is a huge initiative by Microsoft; they have suggested it could generate something like $20 or $30 billion in incremental revenue. It's priced at $30 a seat, and if you imagine that penetrating their customer base, it's not just a little Clippy 2.0, it's going to be a big part of their business. Shopify is another example: Sidekick, not yet on the market, will let Shopify users more easily complete a number of tasks within the product. And Salesforce, at their Dreamforce conference, just announced their next-generation Einstein Copilot, again at a $50 upsell. So there's huge economic potential for these companies to build AI assistants into their products.

Now, some of you may ask: is this Clippy 2.0? We all saw those things. I think everybody should keep in mind that AI seems to be a tipping-point product.
By that I mean: if it's capable enough, it's absolutely amazing, and if it's not capable enough, it's darn annoying. If you read the room, both here at this conference and across the industry, the capabilities are obviously going to get quite amazing over the next few years, and as a result I think these AI copilots are going to infuse themselves into every single application.

So what do these copilots look like? I think there are three dominant modalities you should be thinking about. The first is chat. This is the indisputable king and the table-stakes user interface: it lets you interact and ask follow-up questions, and it's the interface we have all known from birth. Here's an example of what that looks like in the next-generation GitHub Copilot: not just tab-completion, but conversing and taking actions in a conversational way.

The second common modality is commands; think of these as copilot commands. An example is a product called Hex, which does text-to-SQL; they not only generate SQL but also fix SQL errors. In the context of an application, can you automate a complicated workflow? Can you take something that's harder to debug or act on and automate it in a one-shot rather than multi-turn way?

The third style of copilot interaction, which I think we're just seeing the beginnings of, is the autonomous agent pattern, where an AI copilot can react to events and fully automate workflows. The example here is again from the next generation of GitHub Copilot, centered on your pull requests.
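The three modalities described above can be sketched as a small routing layer. This is an illustrative sketch only, not anything from the talk; all names and handler signatures here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional

class Modality(Enum):
    CHAT = auto()      # multi-turn conversation with follow-up questions
    COMMAND = auto()   # one-shot, workflow-scoped action (e.g. "fix this SQL")
    AGENT = auto()     # event-driven, fully automated (e.g. PR opened -> review)

@dataclass
class CopilotRequest:
    modality: Modality
    user_input: str                        # text for CHAT/COMMAND; event payload for AGENT
    history: Optional[list] = None         # only CHAT carries conversation state

def route(request: CopilotRequest,
          handlers: dict) -> str:
    """Dispatch a request to the handler registered for its modality."""
    return handlers[request.modality](request)
```

The point of separating the modalities is that all three can share the same back end while presenting very different user experiences.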
As soon as you open a pull request, you get a detailed summary, you get code review, and you can potentially even get an issue opened and a pull request for free. These kinds of proactive alerts and proactive capabilities are, I think, the next generation of copilots.

So how do users benefit? First, AI copilots answer questions. Users don't know everything about the product or the domain, so just answering basic questions is itself a huge unlock. Second, they assist: they let you work more efficiently, especially on complicated tasks, by quickly generating output. That could be text-to-SQL, or, if you look at what Vercel just did with v0.dev, text-to-UI for front ends. These copilots sit right next to you, understand your application and your domain, and let you move quicker. The final one, which is the frontier, is automation. As the capabilities of these LLMs get better, and as LLMs get attached to APIs and to planning and reasoning systems that take multiple steps of action, I think this automation category will become increasingly common, even if it's exposed through a command or a chat interface.

Okay, so that's the why and the what. How do we build these systems? I'm one of the founders of a startup called Continual; we're building an AI copilot platform for applications that want an embedded AI copilot, and we've seen a number of companies try to implement this: the struggles of doing it, what the core components are, and what you have to be aware of.
The nice thing, I would say, is that over the last year there has been an emerging standardization of the architecture of a copilot system, a convergence on the building blocks: if you want to build an AI copilot, how do you build it up from the foundations of your existing application?

At the bottom you have your application, which really consists of two things. First, your APIs: your REST API, your GraphQL API, or your gRPC API. These are the interface both for the LLM to understand what your application knows and for it to take action within the context of your application. Second, data sources. These can sit behind your APIs in some cases, but often they're more nebulous things you need to search more fluidly: your knowledge base, your documentation, the domain you operate in (legal, say, or electrical design), or user data that you want to index in a more fine-grained way than your APIs expose today.

On top of your application, its APIs, and its data sources sits the heart, the brain, of the copilot system. There are many ways to implement this technically: I've talked to folks at LinkedIn who are implementing it on top of Ray, a lot of people are building on top of LangChain, and then there's Continual, which is what we're working on, an all-in-one platform laser-focused on enabling the full lifecycle of a production AI copilot. There are a number of components to this.
At the top level there's task orchestration. What I mean by this: when a user issues a request, unlike with ChatGPT, where the LLM is pretty much the only thing that responds (plugins change that a little), within a copilot you really need to ground the answer of your AI assistant in the domain of your application. It needs to deeply understand your application: your product documentation, your knowledge base, the data sitting behind your APIs, the system of record for your business, your CRM, whatever it is you're adding a copilot to. It also needs to plan: given a user request, how do I figure out what the user wants, what information I need to answer the question, what actions I can take within the system, what I should show back to the user, and whether I need to ask the user for additional information or to confirm an action I'm proposing? There's a whole complex reasoning and planning process happening at that top level, and I'll talk about different ways to do that in a second.

Behind that, what is it planning over? Typically calls to a few different subsystems. The first and most obvious is the models themselves, the large language models. Those could come from OpenAI and other major external providers, or they could be internal models, for instance Llama 2 and other open-source models that you may have fine-tuned for particular problems in your domain. The second is what we call datasets, though you could call them indexes or vector stores: knowledge that you encode into a semantic representation so you can look it up and enrich the LLM's context, because the LLM may not have been trained on it, and that is often the case.
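The dataset/vector-store idea above is, at its core, semantic lookup: embed documents, embed the query, and return the nearest documents to stuff into the model's context. Here is a minimal self-contained sketch; the bag-of-words `embed` is a deliberately toy stand-in for a real embedding model, which would be an external API or local encoder.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding". A real system would call an
    # embedding model here; this stand-in just counts words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query: the 'dataset'
    lookup that enriches the LLM's context before it answers."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

In production the same shape holds, but the embeddings come from a model and the sort is replaced by an approximate nearest-neighbor index.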
These large language models, even GPT-4, really are pretty naive about most applications; they don't understand the incredible subtlety of most applications or niche domains. That's especially true for internal applications: for a public application they may have read the public docs, though those are probably out of date, while for an internal application they probably have no clue what the application even is. So you need to ground them in the domain of your application. The third is plugins or extensions: how you call out to your external systems and your APIs. And the final one is workflows. You could maybe subsume this under plugins, but often there are specific workflows, user journeys, or tasks that your user needs to complete which you want to guide; you don't want a wild reasoning process, you want to guide the model through it. Those are the four building blocks the task orchestrator needs to deal with.

Below that, and this is critical both pre-production and post-production, are three core ideas: evaluation (how is this thing doing?), monitoring (what's going on?), and feedback (how do you get feedback from both your end users and your internal team, so you can train the system to do a great job?).

On top of this engine, you can power all of the user experiences we just discussed: copilot chat, the sidebar conversational experience in the front end of your application; commands inline with particular workflows (say, "write this job posting" in your HRIS system); or automations.
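The orchestration layer described above, planning over models, datasets, plugins, and workflows, can be sketched roughly as follows. This is a simplified illustration, not the talk's actual architecture; every callable here is a placeholder for a real subsystem.

```python
class Orchestrator:
    """Minimal task-orchestration sketch: ground the request via
    retrieval, ask a model to plan, then either invoke a plugin
    (a call into the application's APIs) or answer directly."""

    def __init__(self, llm, retriever, plugins: dict):
        self.llm = llm              # e.g. a hosted or fine-tuned model
        self.retriever = retriever  # semantic index over docs/user data
        self.plugins = plugins      # named calls into the app's APIs

    def handle(self, user_request: str) -> str:
        context = self.retriever(user_request)           # ground in the domain
        plan = self.llm(f"plan: {user_request} | context: {context}")
        if plan in self.plugins:                         # planner chose an action
            result = self.plugins[plan](user_request)
            return self.llm(f"summarize action result: {result}")
        return self.llm(f"answer: {user_request} | context: {context}")
```

A real orchestrator would add the evaluation, monitoring, and feedback hooks mentioned above around each of these calls.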
Reacting to events and automating workflows on behalf of the user can often be done through the exact same copilot system.

Okay, so what are the top challenges in production? That all looks great from a pseudo-architecture perspective, but what are the actual challenges? There are a lot, but I'd say primarily three. One is cost: there's always an ROI calculation, and these models are not free. The second is capability: what can you actually achieve? The third is latency. In my experience, the last two are by far the most important. The quality of your copilot is essentially capability minus latency: the user experience is how much it can achieve in what period of time, and whether that feels delightful or miserable. That line is very fine, which is why you have to be laser-focused on it.

So what are some tips to address these, at a very high level? Let's talk first about latency, starting at a simple level, level one. First, stream responses. You've all seen this with ChatGPT: it makes a huge difference when the first token comes quickly. Second, and relatedly, respond concisely. If responses get too verbose, they take a long time and become annoying, and that also applies to internal monologue and reasoning: if you do too much chain-of-thought reasoning (we've all gotten excited about step-by-step prompting for performance), that brings a latency trade-off, so ask how you can get capability with low latency. Using smaller models is another tip: GPT-3.5, Claude Instant, or Llama 2 7B are super fast compared to, for instance, GPT-4, so you have to trade quality against latency.
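The streaming tip above is worth seeing in miniature. The sketch below contrasts a batch response with a streamed one; `emit` is a hypothetical callback that pushes a token to the UI (in practice a server-sent event or WebSocket frame).

```python
def respond_batch(tokens) -> str:
    # The user sees nothing until the entire answer is generated.
    return "".join(tokens)

def respond_streaming(tokens, emit) -> str:
    """Push each token to the UI as soon as it is produced. Perceived
    latency becomes time-to-first-token rather than total generation
    time, which is why streaming feels so much faster."""
    out = []
    for t in tokens:
        emit(t)          # e.g. write a server-sent event to the client
        out.append(t)
    return "".join(out)
```

Both functions return the same final text; only the user's perception of latency differs.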
Then the other one, which is very important, is to do clever things in the UI: mask your retrieval-augmentation (RAG) and plugin latency behind progress updates and UI elements.

Then there are more advanced things you can do. One: retrieval is critical, and there are different ways to do it. A big lesson we've learned is to do retrieval up front, super fast; it's possible to make it very fast. There are a lot of fun ideas around hypothetical document embeddings (HyDE) and generating queries from the LLM, but as soon as you introduce an LLM call up front, that's latency before the first response token. So make your retrieval system as fast as humanly possible; that will help a lot. Also, if you're taking actions, try to do as much planning as possible in one shot, so you can plan once and then start executing. If every planned action means waiting for a while, that can really hurt the user experience. Techniques like ReAct and Reflexion seem great, and they do improve performance, but they introduce significant latency because of the way they reason: they reflect over their process and observations, writing out a bunch of tokens before responding, and every time you do that, it's latency, which again trades off against the user experience. Finally, if you're really advanced, you can consider fine-tuning for planning and task execution, separating the planning models from the response models. That can yield very good performance and a much faster planning process; there's an open-source example of this called Gorilla.

With one minute left, I want to talk about capability. What can you do on capability? Start simple; that's an obvious one.
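The one-shot planning idea above can be sketched in a few lines. This is an illustrative simplification: the `llm` and `tools` callables are hypothetical stand-ins, and the comma-separated plan format is an assumption for the sketch, not a real protocol.

```python
def plan_then_execute(llm, tools: dict, request: str) -> list:
    """One-shot planning: a single model call emits the whole step list,
    then every step executes without further model round-trips.
    Interleaved schemes (ReAct-style) call the model between steps,
    paying generation latency for every action."""
    plan = llm(f"plan steps for: {request}")      # e.g. "search,summarize"
    return [tools[step.strip()](request) for step in plan.split(",")]
```

The latency win is structural: one model call up front instead of one per step, at the cost of not being able to revise the plan mid-execution.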
Be honest with yourself about the capability and latency of your prototypes: don't delude yourself into thinking you have good capability or good latency when you don't. Then, carefully design your plugin APIs: limit the number of requests and the number of parameters, limit the response, and expose search as a key concept, because users don't know user IDs, they only know user names. There's a ton you can do on quality, and it all starts with evaluation, so definitely set up evaluation up front. Harder things include carefully tuning your RAG system; there are lots of ways to do that (multiple embeddings, re-ranking, smart chunking) that can significantly improve how you look up external information. And you can train custom models for unique capabilities; that can often be a night-and-day difference for your particular application. If you want to learn more about this, come over to the Continual booth; I'm happy to talk about exploration and training for these kinds of copilots.

Finally, last slide: just be ambitious. If you're building a copilot, AI copilots can, and I would argue must, be amazing to really show their true potential. So rather than dismissing them as not good enough in the moment, try to be more ambitious and make it amazing. It's definitely coming, and I'd be happy to chat with anybody about that. Thanks.
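The "smart chunking" tip from the quality discussion above can be illustrated with a minimal sketch; the size and overlap values are illustrative assumptions, not numbers from the talk.

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split a document into overlapping chunks before embedding.
    The overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighbouring chunks."""
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]
```

Tuning chunk size and overlap (and chunking on semantic boundaries like headings or paragraphs rather than raw character counts) is one of the cheaper ways to improve RAG quality.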
Info
Channel: Anyscale
Views: 2,294
Id: n0zZX2bVcro
Length: 15min 23sec (923 seconds)
Published: Thu Oct 12 2023