Making your Enterprise GenAI Ready and GenAI Enterprise Ready

Captions
This session is about making generative AI enterprise ready, and making your enterprise GenAI ready. To level set: we distinguish between predictive and generative AI. Predictive AI is what we have been doing all along with numeric AI; NLP then evolved into generative AI, giving us the ability to hold a conversation, and we have since moved on to embeddings, different types of enterprise and semantic search, and code generation. Today we will look briefly at how to make generative AI enterprise ready, covering issues around data quality and governance, model bias, explainability, security, privacy, scalability, and performance, and, on the other hand, how to make an enterprise, an organization, AI ready in terms of data infrastructure, best practices and patterns, talent, upskilling, and so on. One of the key things we will cover is a business value checklist you can take to your customers, or use within your own organization, to determine whether you can derive business value from GenAI. How can you generate business value from GenAI (pun intended)? You first have to de-risk the decision path: understand the pitfalls and the issues, then commit to a path. Once you are down a path, you should be able to pivot, deal with the ramifications of that path, and decide whether to maintain course or change course. Essentially, you need a platform that allows you to do that. The key considerations for generative AI can be summarized as five aspects for the business and five for the technical side, so let's jump in.

The first is responsible AI, which we believe in very strongly. When you send an LLM a request, a prompt, you get back a result; with some of Google's models, such as PaLM, you also get safety attributes back. They tell you how harmful or obnoxious the response is, and you can threshold on them: if you are building an R-rated application or a PG-rated application, you can make that decision and move forward. We also provide a content moderation API, because if you send garbage into the language model, guess what you are going to get back. We also need to be careful about what tooling we have for bias detection and balancing; that is part of governance, evaluating your data to determine how biased it is before it goes into training. Additional responsibility considerations are the ethical implications, the regulatory landscape, and the impact of GenAI on the workforce.

So let me ask you: what are your biggest challenges in making generative AI enterprise ready? Any takers? I am sure you are all, in some shape or form, thinking about or implementing GenAI for your organization, because if an organization does not mount some form of response to the GenAI revolution, its lunch will be eaten by a competitor. Yes: making it more deterministic, so it does not change so much in production. Thank you. Robustness to adversarial outputs, awesome. Data privacy, access, security: access to the models, to the data, to the code you generate. Yes. And making sure the security perimeter is actually secure when you go outside of it and come back. Exactly. These are some of the tough challenges, which brings us to one of the key topics: privacy and security. (That was a planted question, by the way. I'm kidding, it wasn't, but it is exactly the point I want to address.)
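The safety-attribute thresholding described earlier can be sketched in a few lines. This is a hypothetical illustration: the attribute names and the 0-to-1 score scale are assumptions for the sake of the example, not the actual schema of any particular API.

```python
# Hypothetical sketch: threshold on safety attributes returned alongside
# an LLM response. Attribute names and the 0.0-1.0 scale are assumptions;
# check your provider's actual response schema.

def passes_safety_policy(safety_attributes: dict, thresholds: dict) -> bool:
    """Return True only if every scored attribute is at or below the
    ceiling configured for this application."""
    for category, score in safety_attributes.items():
        ceiling = thresholds.get(category, 0.5)  # default ceiling
        if score > ceiling:
            return False
    return True

# A stricter "PG-rated" policy versus a more permissive "R-rated" one.
pg_policy = {"Violent": 0.1, "Insult": 0.1, "Profanity": 0.1}
permissive_policy = {"Violent": 0.8, "Insult": 0.8, "Profanity": 0.8}

attrs = {"Violent": 0.3, "Insult": 0.05, "Profanity": 0.0}
print(passes_safety_policy(attrs, pg_policy))          # blocked
print(passes_safety_policy(attrs, permissive_policy))  # allowed
```

The same response can pass one application's policy and fail another's; the model call does not change, only the threshold configuration does.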
When you are looking at the secure, private set of assets you need to manage (your models, your data, your code) within your cloud perimeter, you need to make sure your content remains secure and is not leaked. Whether you are building a conversational agent, building some kind of search capability, or fine-tuning your models in any shape or form, you do not want your data, your code, or your models to end up somewhere else or leaked. Within the Google Cloud perimeter these assets are safe. A call to the LLM goes through an encrypted pipe in both directions, and there is a zero retention policy: nothing is saved, and your data will not be trained on. In fact, when you want to tune a model, training some additional customization into it using techniques such as adapter tuning or low-rank adaptation (LoRA), we bring the model weights to your platform, your actual perimeter, your project, and train there. As a result there is no leakage; your model never even goes to the central hosting area, and everything stays secure. The weights of the model you just trained or fine-tuned are stored in your Cloud Storage, and at inference time those same weights are loaded from your Cloud Storage. All of this can be under a customer-managed encryption key (CMEK), so you hold the key.

In terms of cost, prices are public and everybody is dropping prices left and right, so check them out; it is almost a moot point.

Domain specificity is one of my big points. With many models on offer, you need to be able to fail fast in training or tuning them, so we provide the ability to tune domain-specific models. Say you go on the platform and decide to train on a medical diagnostics dataset, just inputs and outputs: symptoms, and a classification of the actual diagnosis. You create a model, and that model is a fine-tuned version; not fully fine-tuned, but adapter tuned or based on low-rank adaptation. It then sits in your model registry, a pipeline is automatically generated for you, and the model is deployed as a Vertex endpoint you can hit. When you ask such a model something, you get a very specific answer: the model shown here is the one I trained, and I am getting a very specific answer. Ask a general LLM these kinds of things and it usually rambles: "I'm so sorry you're feeling bad, you should go see a doctor," when all you are saying is "please just do the classification, I want to move on." The task could be a job description, a skills description, a financial-services mapping to a portfolio, any simple classification task; you do not want the model to ramble on. One of the things tuning provides is focus on the downstream task and on the output format you want.

In terms of price, I always call these "lemonade models." Why? Because tuning one cost about nine bucks if I were paying out of my own pocket, with a thousand rows of training data. They are extremely low-barrier, exploratory things you can do.

(My apologies for the acceleration here; I am on 1.25x. If we were on YouTube, that is the speed you would be hearing me at; if you want, you can reconfigure me and I will go back to 1x.) Next, organizational culture: to make an enterprise generative-AI ready, you need to talk about data infrastructure, model deployment, development, talent and skills, and organizational culture.
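The tuning preparation described above, input/output pairs for a downstream classification like the diagnostics example, boils down to assembling clean training rows. A minimal sketch follows; the JSONL shape with `input_text`/`output_text` fields matches the text-model tuning format as I understand it at the time of the talk, but treat the field names as an assumption and check the current docs, and note the symptom/diagnosis rows are made up for illustration.

```python
import json

# Illustrative input/output pairs (made-up rows) for supervised tuning.
examples = [
    {"input_text": "Symptoms: fever, dry cough, fatigue",
     "output_text": "influenza"},
    {"input_text": "Symptoms: sneezing, itchy eyes, runny nose",
     "output_text": "allergic rhinitis"},
]

def to_jsonl(rows):
    # One JSON object per line; the tuning pipeline would read this
    # from Cloud Storage (e.g. a gs:// path you provide).
    return "\n".join(json.dumps(row) for row in rows)

jsonl = to_jsonl(examples)
print(jsonl.splitlines()[0])
```

The point is that the data must be clean and representative of the output distribution you want; the rest (pipeline generation, endpoint deployment) is handled by the platform.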
There is a whole bunch of material I have for you on the last of these, leveraging patterns and best practices. In fact, I like you all so much that I wrote a blog describing in detail everything that will not fit in 30 minutes; I will give you a QR code for it at the end, and it has some cool stuff you will want to see.

The flip side of the coin is that you want to make the generative AI you are using enterprise ready, not only make the enterprise generative-AI ready. It is super important to have that bi-directional maturity, and there is a whole lot to talk about that we are beginning to get into: data quality, model bias and fairness, explainability, security, privacy, scalability. I will skip the obvious ones and touch on the less obvious ones. Of course, as mentioned, your data, your model, and your code are private and your own when you use these capabilities. The other thing is the tools you want to use: you want them across a spectrum, where if you do not have much ML knowledge you can use things out of the box, and if you do have machine learning knowledge you can leverage that expertise to tune a model and really get into changing and reconfiguring things.

Now, really important from a skills perspective: we all need to understand the spectrum of model customization, because it will come in handy. There was a point where everybody came to us saying, "we want our own pre-trained models." Really? You want your own factory to build cars? Maybe you can afford it, and if so, great, but maybe you want to try tuning models first. It is important to understand the distinction, and to communicate it to stakeholders: the difference between pre-training, fine-tuning, up-training, prompt tuning, and parameter-efficient fine-tuning (PEFT) in its various flavors, such as adapter tuning and low-rank adaptation.

The blog I mentioned discusses this in terms of a set of verbs. The first is "prompt it," which is trivial, so let's move on. Then there is the in-context learning (ICL) aspect: people add various chain-of-thought, tree-of-thought, and graph-of-thought notions, depending on how much of the process you want to automate. With chain of thought you just break the process into steps; with tree of thought you have a tree structure that can programmatically influence the next step; graph of thought is just more complicated. Details and papers are all in the blog.

The next verb is to "RAG it." Everybody knows about retrieval-augmented generation; there is another flavor called FLARE, which is almost a look-ahead. Here is what happens in retrieval-augmented generation. The language model was trained, say, four months ago, and I ask it a trivial question: "can you give me an itinerary for a bike ride today in San Francisco?" The model has no idea where you are, what the coordinates are, or what the temperature is; you have to give it that information. So retrieval augmentation retrieves the data: go to an API, get the weather, get the location, get the coordinates, and provide all of it to the model in that same context window so it can do something with it.
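The bike-ride example above can be sketched as a retrieval step that feeds fresh facts into the context window. The two retrieval functions here are stand-ins for real weather and geocoding APIs, and the returned values are invented for illustration:

```python
# Minimal RAG sketch: fetch facts the model could not know (weather,
# coordinates) and place them in the same context window as the question.

def get_weather(city: str) -> str:      # stand-in for a weather API call
    return "18C, clear skies"

def get_coordinates(city: str) -> str:  # stand-in for a geocoding API call
    return "37.77N, 122.42W"

def build_rag_prompt(question: str, city: str) -> str:
    context = (
        f"City: {city}\n"
        f"Coordinates: {get_coordinates(city)}\n"
        f"Current weather: {get_weather(city)}\n"
    )
    # The model answers from retrieved context, not stale training data.
    return f"Use only this context:\n{context}\nQuestion: {question}"

prompt = build_rag_prompt(
    "Give me an itinerary for a bike ride today.", "San Francisco")
print(prompt)
```

Everything the model needs that postdates its training cut-off travels inside the prompt; nothing else changes about the model call.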
A lot of people say this decreases hallucination, and maybe it does, because the model now supposedly knows something it was completely oblivious to. But please be careful: the data has to be current, otherwise the model will not merely hallucinate, it will lie. At some point in time the weather was such-and-such, but that is no longer the case; the data has to be current when you retrieve it, which is super important from a governance perspective. This is where the FLARE look-ahead comes in: it says, "I am going to look ahead, and I know you are going to gather this data; is this data valid for today, for your question today?" So it is worth exploring variations of retrieval-augmented generation such as FLARE, which give you that forward-looking vision: you anticipate an answer and then look it up to see whether it holds, considering the possibilities, rather than just retrieving and passing things to that magic box we call the LLM.

The next verb is to tune the model, and tuning, as I said earlier, is super important when you want to change the output format or teach the model a little bit of new material. If the model knew how to dance, and knew four dances, you are just teaching it another dance; you are not teaching it to play soccer. For that, for a completely different set of knowledge-based tasks, you need full fine-tuning. Tuning requires preparation, and the "prepped" part requires you to have at least input-output pairs: clean data, representative of the distribution you want, to provide to the model for tuning.

The next aspect: once you actually get a result, before you send it back to your hapless user, go and check whether it is grounded. Use the grounding mechanism your service provides; in this case, Google provides grounding services. The grounding service is very simple: you do a search and see whether you can get results that are commensurate with whatever the language model is telling you. You compare the two and determine whether you can ground the answer in reality. Is there a citation for it? Is it an actual paper on arXiv, or is it something the model made up? So you go out and search, and you compare that with the output; that is the grounding capability.

There is another mechanism called ReAct, for reasoning and action. It is basically a planning mechanism: it is almost like trying to prove a theorem, but without getting mathematical, you are trying to plan a set of tasks, a project whose steps have dependencies on one another. ReAct helps you gather information, reason about what the next steps should be, and take action based on where you are today. If the project is slipping, you cannot go to the next phase; you have to throw resources at it, which is actually a bad idea according to Fred Brooks. Sorry, that was a complete hallucination; let me backtrack on that one, do a grounding, and confirm that if a project is going awry, you do not throw people at it. Correction there, sorry. And finally, you make sure everything goes back through a responsible-AI layer, so that security, privacy, and zero retention are all in play as you return a safe, vetted, grounded output.

That is the storyline. We need to know these things exist and checklist them for an enterprise application. It is no longer just the fun and games of going to Bard and asking for whatever; you are going to build an application that is impactful for the business, and some of these vertical tooling capabilities are going to be extremely important.
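The search-and-compare grounding check described above can be illustrated with a toy overlap test. Real grounding services are far more sophisticated (entailment models, citation matching); this sketch only shows the shape of the comparison between a model's claim and retrieved snippets:

```python
# Toy grounding check: accept a claim only if enough of its content
# words appear in at least one retrieved search snippet.

def is_grounded(claim: str, snippets: list, min_overlap: float = 0.5) -> bool:
    claim_words = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    if not claim_words:
        return False
    for snippet in snippets:
        snippet_words = {w.lower().strip(".,") for w in snippet.split()}
        overlap = len(claim_words & snippet_words) / len(claim_words)
        if overlap >= min_overlap:
            return True  # at least one snippet corroborates the claim
    return False

snippets = ["The Golden Gate Bridge opened in 1937 in San Francisco."]
print(is_grounded("The Golden Gate Bridge opened in 1937.", snippets))
print(is_grounded("The Eiffel Tower opened in 1889.", snippets))
```

A claim that the search results cannot corroborate fails the check and should not be returned to the user as-is.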
This is where Ray and Anyscale, very good Google partners, come in: Ray gives us the ability to do distributed parallel computation at various scales and sizes, which is one of the ways you can start training, tuning, and delivering on infrastructure. Typically, you can either create new products and services, or increase the performance and optimization of your existing products and services; in some cases you can build completely new business models. But you need a platform on which to run this, so the horizontal ML lifecycle support needs to be there. On Google Cloud, Vertex AI is the name of the umbrella set of capabilities: we have a set of AI agents, out-of-the-box search and conversation capabilities, and the AI platform itself. At the heart of it is the Model Garden, where you have open-source models like Falcon (which I really like), Llama 2 (which is not really open source, but permissively licensed), and of course our own first-party models that we have pioneered: PaLM 2 in its various flavors and sizes, such as text-bison. We have t-shirt sizes of models, and we like to give them animal names like Gecko, Bison, and Unicorn. The Model Garden also carries third-party models from our partners, companies like AI21 Labs, Anthropic, Cohere, and others. And of course the MLOps capabilities and the responsible-AI tooling I talked about are there as well. This, in and of itself, comprises the platform on which you build at enterprise scale, beyond the gimmicks, beyond "let's have some fun with generative AI": if you want to build industrial-scale applications, these are the kinds of things you need. Notice that the Model Garden comprises about a hundred models: foundation models, task-specific models, and open-source models, as mentioned. These models also allow you to perform computation on the ML platform, incorporating open-source frameworks.

In fact, we like Ray so much that we have built Ray on Vertex AI; there is a preview we announced at Next, our conference, a couple of weeks ago. You can spin up Vertex AI and run Ray right there, natively on Vertex. The only change is that you specify the Vertex Ray capability in ray.init; the rest is whatever you would do as usual. It is a low-friction way for Ray users to use their existing code out of the box, and for new users to run the same paradigm on a scalable platform such as Google Cloud. (On open-source Ray, by the way: I have a talk at three o'clock going into the nitty-gritty technical details of making Ray enterprise scale. That talk is not about Ray on Vertex; it is about taking open-source Ray and making it enterprise ready. There is a whole bunch of work you would do yourself in open-source Ray that Ray on Vertex just does for you right off the bat.) There is native integration with Colab Enterprise, with Vertex AI training, with the model registry in Vertex AI, and with prediction and inferencing, plus integration with BigQuery, logging, and more to come; we are very excited about Ray on Vertex. Looking quickly at the Colab (I promise not too much detail): you basically specify your cluster, add the Vertex Ray capability to ray.init along with the runtime environment, and then go do whatever you need to do, as you have always done with Ray.
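The "one change in ray.init" point can be sketched as follows. The `vertex_ray://` address scheme and the resource-name format below reflect my reading of the preview described in the talk; treat the exact strings, and the `vertex_ray` import name, as assumptions and check the current Ray-on-Vertex documentation before relying on them:

```python
# Sketch: point ray.init at a Vertex AI Ray cluster instead of a local one.

def vertex_ray_address(project: str, region: str, cluster: str) -> str:
    # Full resource name of a persistent Ray cluster on Vertex AI
    # (format assumed; verify against current docs).
    return (f"vertex_ray://projects/{project}/locations/{region}"
            f"/persistentResources/{cluster}")

def run_on_vertex() -> None:  # not executed here; requires GCP access
    import ray
    # import vertex_ray  # registers the vertex_ray:// scheme (assumed name)
    ray.init(
        address=vertex_ray_address("my-project", "us-central1", "my-cluster"),
        runtime_env={"pip": ["numpy"]},
    )
    # ...then use Ray exactly as you always have (@ray.remote tasks, etc.)

print(vertex_ray_address("my-project", "us-central1", "my-cluster"))
```

Everything after the init call is ordinary Ray code, which is the low-friction property the talk emphasizes.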
So nothing super special: it just takes care of a lot of things you would ordinarily have to do yourselves. That is the upside of Ray on Vertex AI.

Back to the story and the third technical aspect: integration with back-end enterprise systems. You all know about retrieval-augmented generation now; what I covered earlier was high level, and this is a bit more detailed (you will find the full details in the blog I put up for you). For RAG, or some kind of look-ahead on retrieval-augmented generation, you are doing vector embeddings and vector search, but remember that this is just for unstructured data. You can combine it with your normal SQL, or, if you have a graph database, with graph queries: do the SQL and the vector search, combine the results, and give them back to the language model. It is super important not to get caught up in "hey, I have a vector database"; everybody who has any data will soon ship a vector-database version of it, which is a very natural evolution of what we are seeing. In terms of tuning and serving, you will see a combination of foundation models, parameter-efficient fine-tuning, and full fine-tuning capabilities; you can read all about this in the blog, no need for more detail here. The other thing is access to back-end enterprise systems, which is super important: you have to be able to access these older enterprise systems. To make GenAI enterprise ready, there has to be an integration with back-end systems. For example, if you have the more canonical systems like SAP Ariba or Salesforce, you want to be able to tap into them. I will not go further into it at this point, because we do not have much time.
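The hybrid retrieval described above, structured SQL combined with vector-style search over unstructured text, can be sketched with SQLite and a toy similarity function standing in for a real embedding model; the supplier table and documents are invented for illustration:

```python
import sqlite3
from collections import Counter

def similarity(a: str, b: str) -> float:
    # Crude bag-of-words stand-in for cosine similarity over embeddings.
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    shared = sum((wa & wb).values())
    return shared / max(len(a.split()), len(b.split()))

# Unstructured documents, searched by similarity (the "vector" side).
docs = ["Supplier Acme ships industrial valves within two weeks.",
        "Supplier Globex specializes in office furniture."]

# Structured data, queried with plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (name TEXT, price REAL)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?)",
                 [("Acme", 120.0), ("Globex", 95.0)])

query = "industrial valves supplier"
structured = conn.execute(
    "SELECT name, price FROM suppliers ORDER BY price").fetchall()
unstructured = max(docs, key=lambda d: similarity(query, d))

# Both result sets go back to the language model in one context window.
context = f"Prices: {structured}\nRelevant doc: {unstructured}"
print(context)
```

The design point is that neither source replaces the other: the SQL answers the structured part of the question, the similarity search surfaces the relevant unstructured text, and the model sees both.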
I was going to show you a demo, but instead I will just show you the architecture; the demo is a conversational agent, and it does not really show what is happening in the back end. All it does is attempt a procurement: it looks something up in the back-end system, translates human language into SQL, generates code, and goes to another system to see whether there is another vendor, a new supplier, getting the prices for you. Then you can interrogate the contract (not "negotiate," I used the wrong word, interrogate): you can ask the contract questions, such as "it's a new supplier; can I ask it to do different things?", and you get responses back about what is in that contract. Finally, through the conversational agent, you are interacting with back-end legacy systems that are clunky and have huge user interfaces that are not very intuitive, so having a conversational interface to them is extremely effective. And of course, under the covers we are doing all of these things: the RAG, the grounding, and so on.

In terms of infrastructure, you have everything you would need on Google Cloud, both TPUs and GPUs. We announced a very strong partnership with NVIDIA at Next, and NVIDIA is building its next-generation DGX platform on Google Cloud. And to unblock you for enterprise-scale production workloads, there is a program called Built with Google AI: if you, or a company you work with, would like to work with us, we have a program where we unblock you and enable you to succeed with these types of capabilities.

With that, we are a few minutes early, so time for questions, if you have any. [Applause]

Q: [Audience question, partly inaudible, about whether data leaves the perimeter when using a shared foundation model.] Yes, correct, but when inference happens, the weights are loaded in our serving capability; the data does go out over a secure pipe and comes back, with zero retention, so nobody sees the data. Did the tree fall in the forest if nobody was looking? It's Schrödinger's cat, I have no idea. But yes, correct, because the alternative is that you can host your own foundation model: you can build and host your own foundation model on the platform as well, if you wish. It is just very expensive. Any other questions?

Q: [Audience question about whether this helps ISVs onboard their data.] This is to help you build on the Google platform: you leverage the Google platform to build your applications as an ISV. You could have your data elsewhere and just make calls to the foundation models, for example; you do not need to bring all your data. Those are the various options available to you as an ISV.

Q: [Audience question about on-prem deployment.] We do have GDC Edge, and we have various Google-hosted capabilities, but it is primarily connected, ultimately, in the cloud. There are temporary air-gap solutions, say for government or security-sensitive scenarios and organizations, and Anthos is the capability we have announced for that. But the ultimate experience of using generative AI and predictive AI on your AI/ML platform is on Google Cloud; that is where we can serve you better.

All right, thank you very much.
Info
Channel: Anyscale
Views: 2,964
Id: M7cPv4kOC-g
Length: 30min 2sec (1802 seconds)
Published: Thu Oct 12 2023