Improve the factuality of your generative AI apps by grounding responses in your data

Video Statistics and Information

Captions
Good morning everyone, thanks for joining us. This session is about factuality and grounding. My name is Louis Leo, I'm a product manager for Gemini, and I also work on grounding and RAG. My colleague Tom is going to come on stage later to talk a little more about the RAG products we have been working on.

We have a packed agenda today. I'm going to talk a little bit about why we think grounding is important, why factuality matters to a lot of users, and some of the motivation behind why we are developing something very specific in this area. I'm going to introduce a new way of doing grounding with Google Search, which was one of the highlights in yesterday's keynote. Then my colleague Tom is going to talk about the different RAG options you can pick from on Vertex AI. We're also going to show you how to do grounding with your own data using Vertex AI Search, and we have a cool check grounding API that you can use in your applications.

So let's dive right in and talk about grounding and factuality. Grounding is absolutely essential for generative AI applications. You have probably seen many papers from Google DeepMind about the different approaches and techniques being introduced at the model level and the tool level to improve the factuality of model output. We work very closely with the DeepMind folks. We literally sit together, day by day, to take those novel approaches and techniques, experiment with them quickly, test them quickly, and figure out what are the best things we can bring to market and bring to Vertex.

A lot of us know that these models are based on probability most of the time, and today's large language models work so well that it's really hard for users and enterprises to figure out whether something was generated by a computer or actually curated by a
human. This is why we see so many challenges for enterprises deploying these models to millions of users: trust is the key here. We acknowledge that this is an industry-wide challenge, and we're working really hard to crack it.

When we look at grounding, there are three key aspects to it. The first is improving factuality, that is, improving the quality of model responses. The second is access to up-to-date information. The third is connecting an answer to its source by providing citations and URLs, so that developers can build more trust with their users. We will try to cover all three topics throughout the session, and you will see that we are building products around these three principles, so we can make model responses more trustworthy, helpful, and factual.

Vertex AI has a suite of solutions for grounding. We're happy to introduce a new way of doing grounding with public knowledge using Google Search, and when you're looking at proprietary data, your confidential, sensitive business data, you can use our Vertex AI Search solutions to build your own customized solution on top. We're going to go through each one of those blocks in detail throughout the session.

So let me talk about grounding with Google Search, a new way of doing grounding with public knowledge. We all know LLMs can make things up and harm user trust. One of the things we often ask is "what's the weather like?", or "who won the Oscar this year?". We know the model has a knowledge cutoff, because it takes a snapshot of knowledge at a particular point in time, so the model is either going to punt and say "I don't know", or it's going to give you something from back in 2023. This is just one example: when models start making up facts, it hurts user trust, and we see developers become very hesitant to actually deploy them
for many other use cases.

So we're happy to say that today we are bringing grounding with Google Search to the Gemini family. We believe Google Search is one of the most trusted sources of factual information, and by grounding our state-of-the-art models with Google Search we can really combine the power of Google AI with a deep understanding of world knowledge.

The integration is very simple for developers. You can go to our UI, AI Studio, where there's a toggle called Enable Grounding. When you enable this feature, by default we give you access to Google Search as a source, and when you send a request to the LLM, we return a grounded response together with the citation sources, to help you reduce hallucination and to help end users verify the evidence. I also want to emphasize that this is a feature we make available out of the box, and it's not only a feature in the UI: we're making it available to all developers, so you can access it through the API.

In our testing, we have seen that this feature helps developers build trust with their communities and their users. By giving the model access to an evolving network of knowledge that reflects real-world changes, the LLM response also comes with more explainable citations and URLs, so the user can really click through and figure out what's going on.

One of the benefits of Google Search is that it gives you different perspectives. Unlike an LLM, which gives you a very direct answer, A or B, when you do a Google Search you see many results on a single page, and you can pick the information that fits your needs. We think it's really powerful to bring these different perspectives into the LLM's answer. And users can stay within their app; they don't have to go to a
different app, or open their browser and go to Google Search, do a search, and then come back to their original journey.

I'll use this example: who won the Oscar for Best Actress this year? The ungrounded Gemini tells you who the winner was back in 2023. When we turn on grounding, it tells you that Emma Stone is the winner, and gives you the citation URL for that.

We're very happy to be working closely with a few pilot customers in this space. Uber Eats is developing new experiences and new interactions with Google Search grounding. I'm a heavy user of Uber Eats; I order a lot of food, because I've been very busy with the conference and the Gemini launch. One problem I always have is that I get so tired of a particular food that I keep ordering, and I say, let's try something new. But when I look at new menus and new restaurants, I don't know what those foods are, and I literally have to go outside the app, go to Google Search, search for that food, and look at some pictures before I understand what I'm ordering. With this, we enable a new experience where users can read descriptions of these foods and look at the pictures without ever leaving the app.

Another partner we're working very closely with is Poe, a very cool platform that lets developers build and share their own chatbots. One of the things they're making available to their users is building chatbots that connect to the internet, which in this case means Google-Search-grounded Gemini, so developers can build bots for different tailored use cases. What you're looking at on the screen is a chatbot they built just for sports. These are just two examples we're working on with customers at an early stage, and I hope they are inspiring; we believe that with grounding with Google Search you can do a lot more than that.
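As mentioned, the Enable Grounding toggle in AI Studio has an API equivalent. As a rough sketch, the body of a `generateContent` request to a Gemini model on Vertex AI with Google Search grounding enabled might look like the following. The field names here are assumptions based on the public Gemini API surface at the time, so check the current documentation before relying on them:

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Who won the Oscar for Best Actress this year?" }]
    }
  ],
  "tools": [
    { "googleSearchRetrieval": {} }
  ]
}
```

The grounded response then carries grounding metadata on the candidate listing the web sources that were used, which is what lets an application render citations next to the answer.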
So with that, I'm going to hand over to my colleague Tom to talk about RAG.

Thanks for joining us today. I want to start by talking about the options you have for running your own RAG on Vertex; we actually have multiple options available. At the top, with Vertex AI Search, we have a fully managed solution: you can use Vertex AI Search without having to worry about things like indexing, ranking, how to chunk and parse documents, or which LLM to call; all of this is built directly into Vertex AI Search. We are also seeing customers, for example Charles Schwab, who talked about this yesterday, use the search capabilities of Vertex AI Search but then build their own answer-generation capabilities and their own custom logic on top.

The next layer I want to talk about is the set of core components for RAG and search that we offer. In a session yesterday we covered the various components we have for document understanding and parsing, the embedding API, the ranking API, and Vector Search. Specifically for grounding, we think the grounded generation API and the check grounding API are extremely important, and I'm going to talk about them in the next section.

You also have to think about orchestration: where does my RAG application actually run? For this we have two offerings that we are announcing this week. One is Vertex AI Reasoning Engine, a system designed for running your own RAG and agent workflows, and the second is the Vertex AI RAG API, which specifically targets the RAG use case for customers who want to build it themselves.

So, starting with Reasoning Engine: with Reasoning Engine you can deploy your own LangChain code very easily on Vertex to implement RAG
applications, from simple RAG applications all the way to more complex agent-like workflows for things like customer support. Basically, you define the functions and the tools in the workflow, and then Reasoning Engine, together with an LLM, will actually execute that LangChain code. If you look at the code example on the right-hand side, Reasoning Engine makes it very easy to deploy your LangChain agent: you just give it the agent, specify a few dependencies, and then you can run a query immediately. It's very easy to get started, so for customers who are already familiar with the LangChain components and the LangChain framework, I think this is a very good way to start. This is launching this week in public preview for everyone to try.

The next one I want to talk about is the RAG API on Vertex. This is really aimed at customers who want to build their own RAG application. You can customize the settings that are important to make your RAG application work: the chunk size, the chunk overlap, the embedding model to use, where to deploy it, and so on. It's also very scalable; we have seen it scale across millions and millions of documents very easily. You can use a large set of LlamaIndex connectors to get data into the system, which makes data integration very easy. Getting started with the RAG API is also very simple: you select the corpus, set a few simple configuration parameters, and then you can already run the RAG API. And if you want to do more customization, the RAG API of course lets you do that.
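To make the chunk-size and chunk-overlap settings concrete, here is a toy illustration of fixed-size chunking with overlap. This is not the RAG API itself, just a sketch of the ingestion concept it configures for you; the function name and defaults are made up for illustration:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `chunk_size` characters, where consecutive
    chunks share `overlap` characters so context isn't lost at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

A retriever then embeds each chunk with the chosen embedding model and indexes the vectors; the overlap trades a little index size for better recall on facts that straddle a chunk boundary.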
The next section I want to talk about is the grounded generation API. With the grounded generation API, we make it very simple for you to create grounded answers that are really factual, based on the information that you provide. It's very simple: you provide instructions and the query, what we all know as the LLM prompt; you specify the sources you want to query; you can give it custom facts, custom information that you have retrieved; you set the configuration; and you get a grounded answer, including citations, with a grounding score. This is now available in private preview.

For the grounded generation API we are using a fine-tuned Gemini model that is specifically aimed at optimizing performance for the grounding use case; that model helps reduce hallucination and provides citations much more reliably. We make it very easy for you to add your own sources. Today you can give it text chunks that you retrieved, which could have come from Vector Search, or you can connect it to a Vertex AI Search data store. We are also working on making it easy to plug in your own retrieval engines, your own search engines; that could be an Elasticsearch engine or a SQL database. As part of the grounded generation API, we've also worked very hard to make the prompt-to-search-query generation work well, optimized and improved specifically for the grounding use case.

So let's look at a very specific example; I hope you can read it from afar. On the left we can see the API request. You specify the question; the API also supports multi-turn conversations, with multiple questions from the user. Then you specify the different sources you want to use: text chunks that you have retrieved, or a Vertex AI Search data store.
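As a rough sketch of the request shape just described, a grounded-generation call might look something like this. The field names are illustrative assumptions rather than the exact schema, since the API is in private preview; the fact text and URI are invented for the example:

```json
{
  "contents": [
    { "role": "user", "parts": [{ "text": "How is the global economy doing?" }] }
  ],
  "groundingSources": [
    {
      "inlineSource": {
        "groundingFacts": [
          {
            "factText": "Global GDP growth is projected at 3.1% this year.",
            "attributes": { "uri": "https://example.com/economy-report" }
          }
        ]
      }
    },
    { "searchSource": { "servingConfig": "projects/.../servingConfigs/default_search" } }
  ]
}
```

The response then contains the generated answer, an overall grounding score, and per-sentence citations pointing back into the supplied facts or data-store documents.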
On the right-hand side you can see the answer being generated, in this case about how the global economy is doing. You can see a grounding score that tells you how grounded the answer is in the given facts, and for each of the sentences in the answer you also get a citation, to help the user connect the answer to the underlying sources, dig deeper into those sources, and verify the claims made in the answer.

The last section I want to talk about is check grounding. If we look at the traditional RAG workflow, it's pretty simple: you retrieve and rank some information, which gives you some facts; you put them into a prompt for an answer-generation component like a large language model; and you get an answer. That's basically RAG. But we don't think RAG should stop there, and that's why we are introducing the check grounding API, because it's really important to understand whether the generated answer is actually grounded in the given facts and can be attributed to the underlying sources, so that you can generate citations.

Check grounding is now available in public preview. What it does is basically this: you provide it with a given answer and the underlying facts, and it checks, for each of the claims made in the answer, whether that claim is actually supported by the given facts. It supports up to 400,000 tokens of input, so it handles very large fact bases and context windows.

One key use case we see with our customers is using it in an online flow, to generate citations or to highlight potentially ungrounded claims to the user. In the Gemini app, previously known as Bard, this was known as the double-check feature; that's essentially grounding against the web search corpus, so it's a very analogous use case. Another use case that we also use a lot
internally, for model development, is the offline validation use case. If you think about model tuning, prompt engineering, model configuration, or testing your RAG application end to end, then the grounding, the factuality, of the answer is a really important aspect to optimize, and with the check grounding API you can run this kind of evaluation completely automated, end to end.

We also have experimental support for detecting contradictions: not just unsupported claims, but also contradictions, with citations to the specific facts that contradict a given claim.

So here I wanted to look at a more concrete example. In my prompt I asked, "tell me about the movie Inception", provided some facts to the model, and generated an answer. But what I did, as you can see in green and in red, is add some pieces (the red ones) that are actually incorrect statements, to test check grounding. With check grounding, I give it the set of facts on the left-hand side together with the answer, and the response tells me directly, for each of the given claims, whether it is actually grounded in a given fact or not. Here it correctly identified the claims that Inception won 15 Oscars and made more than 950 million in revenue at the box office, and flagged very specifically that those are incorrect claims. You can also see that the overall grounding score is 54, because it found a considerable number of unsupported claims in this response.

We think this is very useful for customers who want to build their own RAG applications and make sure they build a more trustworthy experience for the end user, connecting them to the sources and highlighting potentially ungrounded or incorrect statements in the LLM response.
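The per-claim check described above can be illustrated with a toy version: split the answer into claims, mark each claim supported if some single fact covers enough of its words, and report the supported fraction as a grounding score. This is only a sketch of the idea, using naive word overlap instead of the model-based checking the real API performs; all names and the threshold are made up:

```python
def check_grounding(answer: str, facts: list[str], threshold: float = 0.6):
    """Toy grounding check: a claim (sentence) counts as supported if at
    least `threshold` of its words appear in a single fact."""
    claims = [c.strip() for c in answer.split(".") if c.strip()]
    results = []
    for claim in claims:
        words = set(claim.lower().split())
        supported = any(
            len(words & set(fact.lower().split())) / len(words) >= threshold
            for fact in facts
        )
        results.append((claim, supported))
    # Overall score: fraction of claims that found support in some fact.
    score = sum(s for _, s in results) / len(results)
    return score, results
```

A real system would use sentence segmentation, claim extraction, and an entailment model rather than word overlap, but the output shape (per-claim verdicts plus an aggregate score) mirrors what the transcript describes.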
So, to wrap it up for today: we talked about a wide variety of grounding solutions that we offer on Vertex. First, there is grounding for Gemini based on Google Search, which is now available in public preview. Then, specifically for grounding in private data, there is the grounded generation API, which is in private preview. For RAG orchestration we have two offerings: the RAG API, specifically for the RAG use case, and Reasoning Engine, to deploy your LangChain agents very easily on Vertex, now available in public preview. And last but not least, check grounding, to help you build more grounded, more factual experiences in your RAG development. Thank you everyone, and I'd also appreciate any feedback you want to provide for this session.
Info
Channel: Google Cloud
Views: 2,267
Id: iVT5s5mD6EI
Length: 20min 48sec (1248 seconds)
Published: Mon Jul 01 2024