Generative AI for Solution Architects | LLMs | MLOPs | Azure OpenAI | Prompt Flow

Captions
Hello everyone, welcome to my session, Generative AI for Solution Architects. In today's talk we will cover a lot of ground: what has happened recently with OpenAI and Azure OpenAI, what new tools and platforms have been launched, and how we can use them. We will also talk about the enterprise LLM life cycle, LLM architecture, LLMOps, LLM issues, and LLM applications. As a solution architect, these are most of the things you need to learn, and we will try to cover most of them here. If anyone has a question, feel free to ask at the end, or even throughout the presentation; I'll be happy to answer.

About me: my name is R Yasur. I have been a Microsoft MVP in Artificial Intelligence for the last nine years. I'm based in Montreal, Canada, and in my day-to-day job I'm the head of data quality and analytics at the International Air Transport Association (IATA). I have been working in the data, analytics, and AI domain for the last twelve-plus years, and what I've noticed recently, with ChatGPT, GPT, and LLMs, is that a huge wave of evolution, new products, and new services is coming. Everything started just a year ago, and it's crazy how fast those products are being built. Within the last three to four months, the entire Azure ML and OpenAI landscape, the usage, the processes, the tools, everything has changed. New tools like Prompt Flow and the model catalog were launched recently, along with new ways to quickly create your own LLM model, fine-tune it, or use RAG. Those are what we will mostly talk about today.

So what do we need to learn as solution architects? If we look at the use cases of generative AI for a solution architect, these are the five key things I see. The first one is integrating GenAI models into workflow automation.
You can take GenAI as your small army of workers, or army of agents. You can give them dedicated work that is repetitive and that they know how to do: they are fine-tuned or retrained, or they have their context and prompt flow, so they can repeat that process. That could mean preparing a chatbot, preparing a customer-service agent, or building an automation workflow such as classifying text, classifying emotion, or classifying the different types of asks your customers send through text or email. As a solution architect you can create this type of automation workflow so that your life gets better and you bring more automation into your day-to-day activities.

To do those things, you need to understand the GenAI life cycle: how the process works from development in a sandbox to putting the model into production. We will talk about that as well. If you know the life cycle, you can help your developers and data scientists build those models; without knowing the life cycle it's practically impossible.

The third thing is understanding use cases: where can you use generative AI? One thing you need to know is that generative AI is not the only AI use case. There are so many other types of AI we use in our day-to-day activities: different kinds of NLP, computer vision for image and video processing, and tabular classification, regression, time-series forecasting, and unsupervised clustering. You need to understand exactly where generative AI fits and where you can get the most out of it.

There is a new term these days, LLMOps, which you also need to understand. It is the equivalent of MLOps, and we will show a diagram about that today.
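As a toy illustration of the text-classification automation just described, here is a minimal sketch of routing customer emails with a few-shot prompt. The label set, example emails, and function names are my assumptions, not from the talk, and the actual model call is left out so the prompt-building and answer-parsing logic stands on its own.

```python
# Few-shot email-routing prompt; labels and examples are assumptions.
LABELS = ["billing", "technical_support", "cancellation", "other"]

FEW_SHOT = [
    ("I was charged twice this month.", "billing"),
    ("The app crashes when I log in.", "technical_support"),
    ("Please close my account.", "cancellation"),
]

def build_classification_prompt(email_text: str) -> str:
    """Assemble a prompt asking the model to pick exactly one label."""
    lines = [
        "Classify the customer email into one of: " + ", ".join(LABELS) + ".",
        "Answer with the label only.",
        "",
    ]
    for text, label in FEW_SHOT:
        lines.append(f"Email: {text}\nLabel: {label}\n")
    lines.append(f"Email: {email_text}\nLabel:")
    return "\n".join(lines)

def parse_label(model_output: str) -> str:
    """Map the raw model reply back onto the fixed label set."""
    reply = model_output.strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "other"
```

In a real workflow the prompt would be sent to a deployed chat model, and `parse_label` would normalize the reply before handing it to the next automation step.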
You also need to understand the limitations: where is your GenAI going to fail? Just putting a GenAI model into production doesn't mean it's going to make a lot of money and be a big hit. You need to deal with the different situations your AI will face in the wild. One example: when ChatGPT was launched everyone was excited and it got a lot of hype, but when Google Bard was launched we saw a lot of negative sentiment, "oh, this didn't answer well, it hallucinated". Those are the limitations you need to know. Hallucination is a part of any GenAI solution, so we need to know how to control it, and we will talk about that too.

We will also talk about ways to improve a GenAI model, because it's not going to be perfectly accurate and there is no deterministic answer. If you have a GenAI tool or an LLM-based chatbot and you use the same prompt and ask the same question at different times, chances are high you may get two different answers; it's non-deterministic in that sense. You need to know how to ground your LLM, how to get the best result, and how to give it the best context so that it doesn't give different answers at different times.

Then there's using GenAI in day-to-day activities: what are the use cases and applications, what can you solve with GenAI and what can you not? We will talk about that as well. And the last one is preparing your own version of a GPT model for documentation support; this is something a solution architect is going to use very often.
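To make the non-determinism point concrete: sampling temperature is the main knob. This is a hedged sketch, not the exact settings used in the talk; the request body below follows the general chat-completions shape, and even temperature 0 only makes replies as repeatable as the service allows, not strictly deterministic.

```python
import json

def chat_request_body(question: str, temperature: float) -> str:
    """Build a chat-completions style request body as a JSON string.

    temperature=0.0 asks for near-deterministic replies; higher values
    make the wording more varied from run to run.
    """
    body = {
        "temperature": temperature,
        "messages": [
            {"role": "system",
             "content": "Answer only from the provided context."},
            {"role": "user", "content": question},
        ],
    }
    return json.dumps(body)

# Same question, two sampling regimes.
repeatable = chat_request_body("What is icing in hockey?", 0.0)
creative = chat_request_body("What is icing in hockey?", 0.9)
```

Grounding the system message in your own context, as described above, is the other half of getting consistent answers.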
If you are designing a large-scale application, or you have a lot of components in your architecture, you need to know how the network configurations are connected with each other, how the subscriptions are handled, and what your organizational cloud practices are. If you have the documentation, if you have the wiki, you can create your own ChatGPT-style version over it, share it with your other team members, and they can just follow it. So these are the five top use cases of generative AI for solution architects. Now we will go into more detail, take each of those activities, and try to understand the limitations, the use cases, and how to do everything.

The first thing we will do is look at the Azure OpenAI playground. We will also look at the model catalog, give a quick Prompt Flow review, and look at a quick RAG example, so that we understand what we are going to talk about throughout this presentation. Can you see my screen with the Azure portal? ("Yes, yes, we can.") Perfect, because this is very important; we want you to follow the presentation and the demo as well. The demo is mostly to understand what new tools have been launched and how you, as a solution architect, are going to use them.

I have a question for everyone attending today's call: how many of you have built a ChatGPT or GPT-based application, fine-tuned one, or used Prompt Flow or the model catalog? You can say yes or no, or answer in the chat. I see one, two... okay, that's good; this is what I was expecting, actually. What has happened to most people, including me: in the last few days I had to go and look at what is happening and what has been launched.
New tools are getting launched every single day, which is literally crazy. Since only two people said yes and everyone else said no, let's take a look from the beginning. The first thing you need in order to use Azure OpenAI is an Azure OpenAI instance, and you create it within the Azure portal. So this is our resource group; you can see a lot of things have already been created in it, but to create my own Azure OpenAI instance I'm going to click Create and search for "Azure OpenAI". There it is: Azure OpenAI, the Azure service for OpenAI, is the one we will use. If you click Create, it takes you to the details: this is my subscription, this is the resource group I have chosen, this is the region, this is the name I select (say, "genai-demo"), and I can also select the pricing tier; for this specific subscription it's Standard S0. Once you're satisfied with all the information you can click Next.

One thing to be careful about is the region, because the Azure OpenAI service is not available in all regions. You can see Australia, Canada, US, France, Japan, Sweden, Switzerland, UK, and West Europe have the OpenAI service, but some data centers, such as Singapore or India, don't have it yet. So choose a region which is close to you; in my case I chose a US region. I'm keeping it open to all networks, then Next. You can add some tags if you want to trace cost at an organizational level; for this case we don't need that. Click Next, and once everything is there you will see a complete summary, and then you just click Create.
It will be created right away. We already have an Azure OpenAI service created, "openai-nov-2023", which I created last time, and once you create one, exactly this page comes up: I can see the kind (OpenAI), the standard pricing tier, the endpoint, and everything else. The most important information is under Keys and Endpoint: you will need the key and the endpoint to use this Azure OpenAI resource. But there is no model on it yet; the GPT and turbo models and the text-embedding models we talk about are not here, because we haven't deployed anything so far. That's the first thing.

Second, you also need to know about the price, how much it is costing. If you click Learn more under the pricing tier, you can see what it costs in different currencies and regions, so you get a complete idea of, say, how much deploying text-embedding-ada-002 is going to cost you.

Now let's go back. What happens if you go to the Model deployments section? It takes us to Azure OpenAI Studio. From the studio you can see that four models are already deployed, but if you want to deploy a new model, just click Create new deployment, select which model you want (suppose gpt-35-turbo-16k), select the model version (I pick auto-update to default), give it a name like "genai-demo-model-1" (there are advanced options as well), and for now just click Create. You can see my "genai-demo-model-1" is deployed, so I can use it; this deployment name is what I will need to use this model from my chat application or anything else. It's that easy to deploy a model.

Now, what does this tooling look like? If you have an Azure account, great; and even if you don't, you can create one.
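Once you have the key, the endpoint, and a deployment name, the deployed model can be called over plain REST. This is a hedged stdlib-only sketch: the endpoint, key, deployment name, and API version below are placeholders (the api-version string in particular is an assumption about which preview you have access to), and the network call is kept inside a function so nothing runs on import.

```python
import json
import urllib.request

def score_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the chat-completions URL for an Azure OpenAI deployment."""
    return (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

def ask(endpoint: str, key: str, deployment: str, question: str) -> str:
    """POST a single-question chat request and return the model's reply."""
    body = json.dumps({
        "messages": [{"role": "user", "content": question}],
    }).encode("utf-8")
    req = urllib.request.Request(
        score_url(endpoint, deployment, "2023-07-01-preview"),  # assumed version
        data=body,
        headers={"Content-Type": "application/json", "api-key": key},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]

# Example call (all placeholders):
# ask("https://myoai.openai.azure.com", "MY_KEY", "genai-demo-model-1", "Hello")
```

The official Python SDK wraps exactly this request; seeing the raw URL and the `api-key` header makes it clear why the Keys and Endpoint blade is the first place to look.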
A new account gets $200 of free credit, and then I suggest you create a resource group, create this Azure OpenAI workspace, and go to Azure OpenAI Studio. From there you have a lot of options: you get a playground to chat (I will show some use cases five minutes from now), and you can try the different models I have already deployed, for example the turbo chat model. I can select one and load an example, say "Generate an email": this is the kind of email I want, and if I click, it will use the GPT model to generate a complete email for me.

The text that you see is what we call prompt engineering: we are giving the context, what we are looking for, what we are asking. The context here is headphones priced at a certain dollar amount and available at Best Buy, Target, and amazon.com. That is the information we give to the model so that it can generate the email. That's why prompt engineering is important: the GPT model has no idea what you're looking for until you give it the context and the information, so that it can help with email generation and other writing.

Other than that, you can also see DALL-E for images, all your deployed models, the model information and their limitations, your data files, quotas, and content filters as well. That's what Azure OpenAI Studio looks like. As we will show some more demos, let's go to the Azure Machine Learning platform; we will come back to Azure OpenAI Studio later. If we go back to our resource group, we have an Azure ML workspace, "ml-platform-development", that I have already created.
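The "generate an email" example above is plain prompt engineering: the product facts are the context. Here is a small sketch of how that context can be packed into chat messages; the product details and function name are made up for illustration, not the exact ones in the demo.

```python
def email_prompt(product: str, price: str, retailers: list) -> list:
    """Build chat messages whose context grounds the email the model writes."""
    context = (f"Product: {product}. Price: {price}. "
               f"Available at: {', '.join(retailers)}.")
    return [
        {"role": "system",
         "content": "You write short, friendly marketing emails."},
        {"role": "user",
         "content": f"Using only this context, write a launch email.\n{context}"},
    ]

# The same facts the playground example supplies in its prompt text.
messages = email_prompt("wireless headphones", "$79.99",
                        ["Best Buy", "Target", "amazon.com"])
```

Everything the model "knows" about the product lives in those two messages, which is the whole point the talk makes: without the context, the model has no idea what you want.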
If you don't know how to create a machine learning workspace, it's pretty simple: just like the way we created the Azure OpenAI instance, click Create, select Machine Learning, and go through the process. Going back, we already have "ml-platform-development". This is my Studio web URL, the studio we will go to; the resource also shows other information here, like pricing and networking, that you can look into.

If we go to Azure ML Studio from here, it takes us to the Machine Learning Studio web portal. There is a lot here: the model catalog, Authoring, Assets, Manage, and more. A quick one-minute overview: under Manage are your compute (virtual machines) and your monitoring. Under Assets are the data you upload, the jobs you create, the components, the pipelines, the environments (Docker containers), the models or artifacts you generate, and the models you deploy as an API. The custom machine learning development process is under the Authoring section: you can have your own notebook and run it on your compute, you can use automated machine learning, and you can use the Designer. Plus, you also have Prompt Flow. Prompt Flow is the one we will see a quick demo of today, because the two newest tools Microsoft has launched are Prompt Flow and the model catalog.

First, let's take a look at the model catalog. If you click here, you see there are already so many models, and the LLMs are all here: the Azure OpenAI language models, Meta's Llama 2, and other model versions you can discover from Hugging Face and elsewhere.
The Azure team has made a kind of partnership with Meta and Hugging Face to bring their open-source models into the Azure Machine Learning platform, so you don't need to do all of that manually. If we search, you can find your own version of the GPT models and deploy them easily; take text-embedding-ada-002, you can click it and deploy it as an Azure OpenAI model. Or you can deploy other types of models. Suppose we search "emotion" because we want an emotion-detection model: there are different Hugging Face open-source models here, so click a DistilBERT text-emotion model (it's just a random pick). All the information is there, and if you want more detail it takes you to the Hugging Face page: who has used it in the last month, the details of the training process, how it was trained, what tools were used, and so on. It is already trained; if you want to use it, you just deploy it. Click here, then click real-time endpoint deployment, and boom, it's done.

I have already deployed this exact model by clicking here, because it takes three to five minutes and I didn't want to waste that time. Once you click deploy, it shows up under your endpoints in "ml-platform-development". If you click there once the deployment is done, you can see exactly this model deployed, the live traffic allocation, that it was successfully provisioned, and all the other information. You can even test your deployed model: click Test deployment and it takes you here. What this model does is emotion detection, or emotion classification. There is an example they provided, "i like you. i love you", so if we provide that to our test endpoint, it runs.
It says label 2: happiness or love was label 2 in the training data, which is why it classified the text that way. Okay, it was label 2; now let's say something bad, "I hate you". Now it says label 3, some sort of negative sentiment, and the same for "I am sad". So this model is already built and already trained; you can deploy it in a couple of clicks and use it just as easily. You can also see how much you have consumed, how to connect this model with your Python code (the sample code is right there, for C# as well), the keys you need to integrate it, what the endpoint looks like, and the monitoring. You can see how many times I have used it; today I've used it a few times because of this testing. Every time I use it, it shows how many API calls were received and what the latency looks like; on average it's taking 63 milliseconds to prepare the response. The logs are there too.

One interesting thing: as a solution architect, you need to know how many options you have, how many base models there are, and which base model is ideal for which type of use case, so you can take a base model and plan your development on top of it. That's why this model catalog is a very good opportunity to play with models and identify the limitations of the different types, because one model is not useful for all use cases. Some of them are good for text generation, some for text-to-image, text classification, token classification, or chat completion. You need to know your use case first and what type of model you are going to use, so you can choose the right one. That's the first thing I wanted to talk about.

The second thing is Prompt Flow.
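Calling a real-time endpoint like the emotion model above is just an authenticated POST. A hedged stdlib sketch: the scoring URL and key are placeholders, and the `{"inputs": [...]}` payload shape is my assumption about this particular Hugging Face deployment (the Consume tab of the endpoint shows the exact schema and sample code).

```python
import json
import urllib.request

def build_payload(texts: list) -> bytes:
    """JSON body for the endpoint; the 'inputs' key is an assumed schema."""
    return json.dumps({"inputs": texts}).encode("utf-8")

def classify(scoring_url: str, key: str, texts: list):
    """POST texts to a managed online endpoint and return its JSON reply
    (e.g. labels like LABEL_2 with scores, as seen in the demo)."""
    req = urllib.request.Request(
        scoring_url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example call (placeholders):
# classify("https://<endpoint>.inference.ml.azure.com/score", "MY_KEY",
#          ["i like you. i love you"])
```

The monitoring counters mentioned above (API calls, average latency) tick up with every such request, which is how the 63 ms figure appears on the endpoint's metrics page.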
What is Prompt Flow? As solution architects, we have different types of workflows we want to automate, and Prompt Flow gives us that platform without writing tons of code and orchestration ourselves. It builds on GPT-based tools and orchestration like LangChain and other toolsets, so we don't need to do those things manually. For example, you can create a standard flow, a chat flow, or an evaluation flow; you can bring your own data; and there are prepared examples like Q&A, web classification, and chat with Wikipedia. We will take one: web classification. Click View, and if you click Clone, it recreates that web classification flow. I have already created one as a standard flow, "web-classification-2", so I'm just going to show you. On the right pane you see a complete graph of nodes, and on the left you see the details. It looks complicated and complex, so what does it mean?

Before you use Prompt Flow, there are two things you need to do. First, go to Connections; you need to add your LLM or GPT model here. Go to Connections, click Create, and you can add different types of connectors: Azure Content Safety, Azure Cognitive Search, OpenAI (the normal one, without Azure), Azure OpenAI, and others. If you click Azure OpenAI, since you have already created a model, you select the provider (Azure OpenAI) and your subscription, and then all the Azure OpenAI accounts become visible; this is my "openai-nov-2023", if you remember. Then I need to provide the API key, the API base, the API type, and the other information, so that I can connect that workspace with this Prompt Flow. In this case I have already created it.
The API base is provided, the API version is provided, the resource ID is provided; everything is set. That's the first thing you need to do, which means you create the Azure OpenAI workspace first and then come here and connect it. The second thing you need is a runtime. I already have one; it is the VM where your workflow will run. In this case I'm using a compute instance, Standard_D4s, with the core count and the other details you need. If you don't have one, you will need it to run your workflow; I have already created mine.

Now go back to web classification. As I said, on the right side you see the workflow, and on the left you see the details. This workflow does web classification. What does that mean? I can give it a URL, and the workflow will take that URL and find the content at it: it uses a scraper, an HTML parser, to convert the HTML into text. Then it applies text summarization: it uses my own deployed GPT model with a prompt that says, "Please summarize the following text in one paragraph of 100 words". It also provides some examples on the side: if the text is of this type, the category is App; if it is this type, the category is Channel; if this type, Academic; and something else otherwise. You can see the samples are already given, and then we ask it to do a classification, using those sample examples to condition the LLM first.
Based on the 100-word summary we prepared, the prompt asks: is this text a passage about a movie, an app, academia, a channel, a PDF, a profile, or something else? Whatever result we get, we convert into a dictionary with another Python node, and we get the output at the end. So this is an automation workflow; you can create your own, and all you need to provide is the link. Everything else is a mix of Python code and prompt engineering.

First select the runtime, which we have already done, and take this URL, the URL of the Azure portal. What happens if we run it? Click Run, and you can see it running... done. We provided the URL; then "fetch text content" shows the exact text received from that URL, which is the Microsoft Azure portal information. Then we summarized that passage into 100 words, and this is what the summary looks like. Then we provided the sample information for the classification and gave the prompt to the LLM: tell me, is it a channel, a movie, an app? Our LLM has decided the category is App, and then we got the result from the convert-to-dictionary node. That's cool.

Now what happens if we provide something else, say a movie? Suppose the La La Land Wikipedia page; let's take that movie link and go back. My Azure URL was classified as an app; this is a movie link, so let's run the same thing and see what happens. It should say it's not an app, it's a movie or something else, based on the flow we have built. Okay, it's done. You can see that, using that Python code, it fetched the entire Wikipedia page, nice.
Then it summarized the page into 100 words, which is also nice, and then, using the sample values and the prompt, it did the classification and selected the category Movie. So you can build this type of workflow yourself, and you can connect it with email, text, documents, PDFs, and other kinds of documentation, and apply these automation workflows to make your day-to-day life better, faster, and more efficient. This is another example of a new tool Microsoft has launched, Prompt Flow. And it's not just the flows: you can bring your own data, have your own chat flow, and even deploy those flows and connect them with your web applications and chats, without writing the orchestration yourself; all of that is already there.

That was the second thing I wanted to show. The third thing: let's go back to Azure OpenAI Studio and use your own data. Suppose we want to build a model for hockey. We have a "Hockey For Dummies" document that I have just downloaded, and I want to build a chatbot where anyone can ask any question about hockey that is covered in this document; I want to create a chatbot on my own data. How can I do that? Once you go to the Chat section of Azure OpenAI Studio, you get the playground, and there is a new option, in preview, called Add your data. Click there, then Add data source. In this case I selected Upload files; I need to select the subscription and the Blob storage (I have already created a blob storage), and you also need to select Cognitive Search. I created a Cognitive Search account before; you just need to create it first, get the IDs and everything, and then come back here.
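The fetch, parse, summarize, classify, convert-to-dict flow just described can be sketched end to end in plain Python. The two LLM steps are stubbed out as placeholder functions (marked below) so the plumbing is visible; in the real flow they are Prompt Flow LLM nodes backed by the deployed GPT model.

```python
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Rough stand-in for the flow's 'fetch text content' node: strip tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def summarize(text: str) -> str:
    """Stub for the LLM summarize node (real flow asks for ~100 words)."""
    return text[:500]

def classify(summary: str) -> str:
    """Stub for the LLM classify node; a keyword check instead of a prompt."""
    return "Movie" if "film" in summary.lower() else "App"

def convert_to_dict(url: str, category: str) -> str:
    """Mirror the final Python node: wrap the result as JSON."""
    return json.dumps({"url": url, "category": category})

def run_flow(url: str, html: str) -> str:
    """Chain the nodes the same way the Prompt Flow graph does."""
    text = html_to_text(html)
    return convert_to_dict(url, classify(summarize(text)))
```

Running `run_flow` on a page whose text mentions "film" lands in the Movie category, just like the La La Land example; swapping the two stubs for real LLM calls gives you the demoed flow.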
Select that Cognitive Search workspace here; this is the one I have. Then you need to select an index name, where all the information will be stored, say "genai-demo-3". Add vector search, and select the embedding model; in this case it's the embedding deployment I already have, "embed-2". Select it, acknowledge, and Next. Now I select my own PDF files; I have selected two, "Hockey For Dummies" and a hockey basics document. Upload them, click Next, set the search type to hybrid and vector, acknowledge, Next, Save and close.

A lot of work now happens on the side: it uploads the files, processes the text, creates a vector representation of the text (the embeddings), stores them in Cognitive Search, and then uses them with your chat. We will show two examples: what happens with these sample files and what happens without them. It may take a few seconds... okay, you can see it's almost done; we didn't choose that big a data set, so it's pretty fast. I haven't done any chat yet, I'm not providing any context, I haven't done anything. In that document there is different information, like the ice hockey offside rules; if you go into the hockey basics document, you can see the offside rules, icing the puck, and all the other information, so we will ask about something like that.

It's done, so now we can ask our chat anything about ice hockey. Let's do that: "tell me about ice hockey offside rules"... and I get an error, "the operation must have at least one extension". That's weird; I'm going to start the chat again and check the playground settings and the search resource. If we go there, we can see the two files we uploaded.
A semantic representation has already been created for them, but I'm still getting that "completions operation must have at least one extension" error. Let's remove that data source and create it again. On the side, as we re-add the data source: upload files, the right blob, index name "demo-45", embedding "embed-2". Right before this demo I was playing with it and everything was completely fine; you know how it is, when you are showing something, that's when it starts to break. It will take a few seconds, and then we will be able to chat on that text and do a quick search on the document we selected; and when we remove those values, it should even tell us "I don't know", because it doesn't have the context. Let's try one more time, and if it doesn't work we will show it another time, or I will write a blog and publish it so that you can follow it later; we also still want to cover a lot about LLM limitations and ways to improve our GenAI tools. Okay, this is actually done. We have selected the GPT model, so: "tell me about ice hockey"... and now a settings error about a structured parameter sequence. I'll look into the extension issue afterwards, but let's go back; this was one way of using a GPT model to chat with your own data.

Now let's take a look at how this works, how this RAG, retrieval-augmented generation, works. This is what the workflow looks like. You provide your user question, and it goes into the LLM workflow, which takes your query, the question you are providing, and queries the data: it uses Cognitive Search to query the data.
It gets back the search index results. Cognitive Search is an external service, which is why it sits outside the LLM workflow: the workflow sends the query to Cognitive Search, the search runs there, and the information associated with the matching index entries is fetched and brought back to be added into the prompt. The workflow then creates the prompt from the query, the data, and all the context you are providing, and calls the large language model with that prompt, context, and data. The LLM prepares the response based on all that information and sends the response back to you. That is the end-to-end process. And since the LLM is not trained on your data, it's always good to use retrieval-augmented generation, so that you can supply sample PDFs, text, and other information, making it easier for the LLM to fetch relevant facts and ground its answer in your context. Okay, now let's take a look at what the architecture looks like — as a solution architect you also need to see what kinds of processes are involved in creating an LLM application. This diagram was recently published on the GitHub blog as the architecture of today's LLM applications; I really like it, which is why I'm using it in this presentation. Here is how the workflow goes. First the end user sends a query through the UI — "can you tell me about ice hockey offside rules?". Behind the UI there is a complete LLM workflow, and two things happen: first, the query is copied into your initial prompt; at the same time, the query is converted into an embedding with an embedding model, and you search your own vector database with that embedded query.
That search finds all the other information associated with the query. Based on those results, you may also have a data filter, so that unauthorized data is not copied into the prompt or fed to the model, and then you get the RAG-related information: the extra context your LLM needs to complete your initial prompt. So you are bringing the query and all the associated information together, combining them into your prompt, and using a prompt-optimization tool to limit the prompt — keeping the number of tokens down, including whatever content you are supposed to provide, and applying any other optimization. Then you send it to the LLM API: an OpenAI GPT model such as GPT-4 Turbo or a text-davinci-style model. The LLM uses all the prompt information you have provided to create the response to the query your user asked. You can also add a content classifier: is the LLM giving out anything harmful or negative? The classifier checks whether the response is harmful or offensive, and once you are satisfied with the response the LLM produced from the context you supplied, it becomes the LLM output, which is shown in the UI, and the user sees it. So on ChatGPT we just see the UI — we ask something, we get something — but behind it there is document search, prompt creation, document filtering, the LLM API call, content classification, and output preparation; many things are happening. That is what the architecture of a modern LLM application looks like. Any questions on this so far? No?
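To make this architecture concrete, here is a minimal sketch of the back-end steps just described — embed the query, search a vector store, trim the context to a budget, and screen the output. Everything here (the character-frequency "embedding", the document list, the banned-term classifier) is a toy stand-in for the real services, not any actual SDK:

```python
import math

# Toy "embedding": a character-frequency vector. A real system would call an
# embedding model (e.g. an Azure OpenAI embedding deployment) instead.
def embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Tiny in-memory stand-in for the vector database.
DOCS = [
    "An offside occurs when a player enters the attacking zone before the puck.",
    "Icing is called when the puck crosses the centre line and the goal line.",
    "A hat trick is three goals by one player in a single game.",
]
INDEX = [(embed(d), d) for d in DOCS]

def retrieve(query, top_k=2):
    # Vector search: rank stored chunks by similarity to the embedded query.
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:top_k]]

def build_prompt(query, chunks, max_context_chars=200):
    # Prompt optimisation: keep the retrieved context within a size budget.
    context = ""
    for chunk in chunks:
        if len(context) + len(chunk) > max_context_chars:
            break
        context += chunk + "\n"
    return f"Answer using only this context:\n{context}\nQuestion: {query}"

def is_safe(response, banned=("password", "credit card")):
    # Stand-in content classifier; real systems use a moderation service.
    return not any(term in response.lower() for term in banned)

prompt = build_prompt("tell me about offside rules", retrieve("offside rules"))
```

In a production version, `embed` becomes a call to your embedding deployment, `retrieve` becomes a Cognitive Search query, and `is_safe` becomes a content-safety service — but the shape of the pipeline is the same.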
Okay, let's move ahead. To create an LLM application at the enterprise level, there is an enterprise LLM lifecycle, proposed by Microsoft, in which four things happen — four loops, actually: ideation and exploring, building and augmenting, operationalizing, and managing, one after another. Once you have the business need, you do the ideation and exploration phase: this is the project I want to build, this is the input, this is the output. Then you go into the development phase of building and augmenting: you build your model, do a lot of fine-tuning, prepare everything, and once you're satisfied, you deploy. After deployment you need feedback to see whether the model is doing well; sometimes you need to go back, refine your requirements, or change the APIs and the databases behind the RAG. And managing all of this is itself a loop that we need to follow. That is what the enterprise lifecycle looks like. In a bit more detail, what tasks do we do? The first loop is ideation and exploring: you need to understand the business requirements and choose a use-case-specific foundation model — as I said, not all models fit all use cases, so you need to find out whether a given foundation model is good for text classification, summarization, chat, Q&A, and so on. You also need to prepare some data, create the prompt, and identify limitations — maybe you don't have data, maybe you don't have samples. That is the first part of the enterprise LLM lifecycle: ideation, exploration, and selecting the foundation model. Once you have the foundation model, the initial data, and the prompt, you go into the building and augmenting loop, where you guide the LLM to meet your specific needs. You train the base model on your local data or even real-time data. What do local and real-time data mean here? You can use a RAG-based process to connect your LLM with SQL or non-SQL data injection, so that the prompt is complete and has more context — just as we searched a vector DB earlier to get more context, you can also use SQL or non-SQL searches and queries as extra data-injection tools, so that your prompt carries more information. Then you can fine-tune your LLM toward exactly how you want it to answer — that is what you can teach your LLM here — and you can combine RAG and fine-tuning together to get the optimal outcome from your GPT bot. The third loop: once you are satisfied with how your bot performs, you move into production. You need to deploy the model, monitor it, and incorporate content safety and a CI/CD process. As your LLM is creating content, you never want it to produce anything harmful or negative; you can control that using the Azure AI Content Safety service. You can also use Azure AI Prompt Flow for the deployment — it's very easy to use. Once it's deployed and people are using it in production, you also want to collect the queries and responses for monitoring, so that you can see, say, how many people use it every hour and how many instances you need, what the cost estimate looks like, and whether a different model would make it cheaper. You need that kind of operational loop.
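The SQL data-injection idea mentioned above can be sketched with Python's built-in sqlite3. The flights table, its schema, and the helper names are my own made-up illustration of wiring "real-time" enterprise data into a prompt:

```python
import sqlite3

# Hypothetical flight-status table standing in for an enterprise SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (number TEXT, status TEXT, gate TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("AC101", "On time", "B12"), ("AC202", "Delayed", "C3")],
)

def inject_sql_context(flight_number):
    """Fetch live rows and format them as extra context for the prompt."""
    row = conn.execute(
        "SELECT number, status, gate FROM flights WHERE number = ?",
        (flight_number,),
    ).fetchone()
    if row is None:
        return "No record found."
    return f"Flight {row[0]} is {row[1]} at gate {row[2]}."

def build_prompt(question, flight_number):
    # The context line carries data the base model was never trained on.
    context = inject_sql_context(flight_number)
    return f"Context: {context}\nQuestion: {question}"

print(build_prompt("When does my flight leave?", "AC202"))
```

The same pattern applies to non-SQL sources: anything you can fetch and format at request time can be injected into the prompt alongside vector-search results.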
Then, once your model has been in production for six months, a year, or even just a couple of months, you can detect drift: what types of responses is your model not handling well, where is it lagging the most, where is quality degrading — so that you can fine-tune it again, redo the RAG integration, give it more context, and do more prompt engineering. The last loop of the enterprise lifecycle is managing: you need proper tooling, governance, management, and security, following responsible AI principles. So those are the four parts of the enterprise LLM lifecycle: ideation, building, the operational loop around deployment, and the managing loop of governance, management, and security. It is not as simple as "deploy a GPT model and you're done" — you need to follow this entire process. The next thing I want to talk about is LLMOps. We have heard about MLOps, but what is LLMOps? It is a term that emerged recently: large language model operations — the processes, techniques, and tools for operationalizing, managing, retraining, and deploying a large language model. That entire lifecycle is LLMOps, analogous to MLOps. So let's look at the differences, starting with MLOps. In MLOps you prepare the data — registering data sources, data transformation, data wrangling — then you build and train your model, which can be a classification or regression model: feature engineering, model and algorithm selection, training, testing, hyperparameter tuning. Then you deploy: you validate the model, package it into a Docker container or API, deploy it as an inference API, and store the production inferences in a production data store. The last stage is monitoring: monitoring data drift, the health of the model and the infrastructure, and A/B testing any retrained model; when you are not satisfied with the model's results, you go back and retrain from there. That is the traditional machine learning lifecycle. Now, what does LLMOps look like? Most of it is the same; the changes are in the build-and-train section. In LLMOps you still register the data set, transform the data, and do data wrangling, so your data source is ready for fine-tuning and testing your model. But under build-and-train you don't do feature engineering or train an algorithm from scratch. Instead you do base-model selection first, then prompt engineering, then fine-tuning toward the results you want, then RAG — retrieval-augmented generation, meaning you give the model more text, PDFs, and other sources of context — then hyperparameter tuning, and you repeat that process until you get the best result. After build-and-train, once you're satisfied, you deploy: validate the model, package it, deploy it, expose the inference API, and keep the production data store. And you will still need to detect drift — in which scenarios the model is not performing well, where model health is poor, where responses take too long, whether you need bigger VMs to accelerate inference — plus A/B testing and retraining. So under LLMOps everything is broadly the same; only the build-and-train process differs, and LLMOps and MLOps mostly follow the same framework.
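As a rough sketch of the drift-detection step, one simple approach — my own illustration, not a prescribed Azure mechanism — is to compare a response-quality metric between a baseline window and a recent window of production logs:

```python
def drift_detected(baseline, recent, threshold=0.1):
    """Flag drift when the mean quality score drops by more than `threshold`.

    `baseline` and `recent` are lists of per-response quality scores in [0, 1]
    (e.g. user thumbs-up = 1.0, thumbs-down = 0.0) from production monitoring.
    """
    if not baseline or not recent:
        return False
    base_mean = sum(baseline) / len(baseline)
    recent_mean = sum(recent) / len(recent)
    return (base_mean - recent_mean) > threshold

# First month of production vs. the latest week:
baseline_scores = [1, 1, 0, 1, 1, 1, 0, 1]   # 75% positive
recent_scores = [1, 0, 0, 1, 0, 0, 1, 0]     # 37.5% positive
drift_detected(baseline_scores, recent_scores)  # → True: time to fine-tune or fix RAG
```

Real monitoring would segment scores by query type, so you can see *which* scenarios are degrading, not just that quality dropped overall.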
So why do we need LLMOps — why not just train the model once and be done? You follow the LLMOps lifecycle so that you can bring efficiency: you can build your model fast and deploy it fast. You get scalability: instead of a single pod serving your deployed model, you can have multiple scalable pods so that hundreds of people can use it at the same time, scaling automatically. And you get risk reduction: you can add regulation so your model doesn't give out bad information, add monitoring and drift detection so you understand where your model is lagging, and add transparency — if your model says something, what is the data source behind that result? To incorporate all those best practices — efficiency, scalability, and risk reduction — you need to follow the LLMOps lifecycle. As I said, the components of LLMOps are exploratory data analysis, which is part of data processing — getting, identifying, and preparing the data — then prompt engineering, where you do a lot of experimentation to get the most context to your model; fine-tuning the model; introducing RAG; then model review and governance; inference and serving; and monitoring. Now, how can you make your model smarter and better? There are three key ways. The first is prompting: giving more textual context to the model. The second is fine-tuning: bringing more data and sample information and retraining the model with it. The third is RAG, retrieval-augmented generation: giving more data sources and live information to your model. Prompting, fine-tuning, and RAG. Think of it this way: to answer well, the model needs domain knowledge, which you provide, plus outside knowledge, which it comes with. If you do just prompting, you supply the domain knowledge in the prompt, and the model has fairly little outside information, because it was trained months ago on generic data — that's all it knows — so its knowledge level stays low. If you do RAG on domain knowledge and also bring in outside information — PDFs, text, JSON, and other sources you provide — that is better: the model has more domain information and more outside information to prepare its answer. If you just fine-tune — retraining with some extra data so it learns how to answer and how to organize the output, say whether you expect a JSON-formatted response — it gains domain knowledge but not outside knowledge, because you are not retraining it on outside knowledge. But if you combine them all — fine-tuning, prompting, and RAG — your model has a lot of domain knowledge and a lot of outside knowledge, and that is one of the best ways to make your model smarter. The reason the combination works best: applied together, the model has access to current information through RAG; through fine-tuning you use updated information to shape how you want responses to look, while giving the model new context; and you also provide specific information through RAG.
You also control hallucination by providing the context, so that the model doesn't make up a response on its own; and whatever response the model gives, you can show the user that it came from a particular PDF source, so it is more trustworthy and traceable. That is a key part of making your LLM smarter. We also notice certain issues with LLMs. The first is misinformation: an LLM can easily give a wrong or biased answer. When we don't do enough prompt engineering or provide enough context, it doesn't have sufficient information and can easily answer with bias. Under hallucination we see two types: intrinsic and extrinsic. The model can combine two types of data, information, or sources and create fabricated content — that is intrinsic hallucination. Or it may have been trained on unverified sources or irrelevant information from the web, and it produces extrinsic hallucination: the answer looks bad, but in the training data that information was simply unverified — the model absorbed it and responded with it. We need to deal with both cases: fabricated content created from multiple sources, and unverified content absorbed during training. Hallucination can be reduced by increasing training data — doing more fine-tuning — and by bringing in more contextual references through RAG, PDFs, and other information, so that the model doesn't go outside its scope. And as I have already said, LLMs are non-deterministic: the same prompt at different times can give different answers, so we need to be careful about that. Now, a question we often get: what can you do with an LLM, and what can you not do?
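On the non-determinism point just mentioned: output variation comes from sampling the next token from a probability distribution, and lowering the sampling temperature sharpens that distribution toward a single token, which is why low-temperature settings give more repeatable answers. A toy illustration (the logit values are invented):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Token-sampling distribution: lower temperature sharpens it toward one token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                       # hypothetical next-token scores
hot = softmax_with_temperature(logits, 1.0)    # spread out: answers vary run to run
cold = softmax_with_temperature(logits, 0.05)  # nearly one-hot: answers repeat
```

Most chat APIs expose a temperature parameter for exactly this reason; setting it low is a common way to make evaluations and regression tests more reproducible.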
An LLM is great at summarizing information: if you provide the information, it can summarize it. It also performs much better with augmented context — when you give it the text and the sample information, it really performs well. It can be a great coding assistant, as with GitHub Copilot, since it has seen a lot of code examples. And it can be supported by other software processes: when we use ChatGPT, we are not using the raw LLM alone — in the back end, just as I showed in the architecture, there are vector databases, other APIs, and SQL and non-SQL connections, so the GPT response has more context and gives you more grounded information. So if we use an LLM in our own application, we likewise need to supply more context in the back end — more APIs, PDFs, and text — so that it gives the right information. It can also be a great way to expose your own knowledge base to users: as I said, if you have a web application or a mobile app, you can take your own documentation, manuals, and user policies, build your own GPT-style assistant on them, and expose it to your user base, so they can easily search things like "what are the limitations?" or "how do I configure this?". But what can generative AI not do? First, it cannot be certain about anything — it has the hallucination problem. It will answer you, but if the training was wrong or the prompting was not right, it can give you a bad result, so you need to be careful. It cannot figure out new things on its own: it doesn't know new information, so you need to provide it through RAG, fine-tuning, or other means.
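One way to provide that new knowledge through fine-tuning is to prepare a small training file of example conversations. A minimal sketch, assuming the chat-style JSONL layout that OpenAI documents for fine-tuning (the airline-policy example and its wording are invented):

```python
import json

# Hand-written examples teaching the model the answer style and facts we want.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You answer airline policy questions in one sentence."},
            {"role": "user", "content": "What is the carry-on limit?"},
            {"role": "assistant", "content": "One carry-on bag up to 10 kg is allowed."},
        ]
    },
]

def to_jsonl(records):
    """Serialise training examples, one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

training_file = to_jsonl(examples)
# Upload `training_file` to your fine-tuning service; check your provider's
# documentation for the exact schema it expects.
```

In practice you would want dozens to hundreds of such examples; a single one is shown here only to make the record shape clear.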
It also cannot be certain whether a piece of content was created by generative AI or another GPT model — it has no idea, so you cannot ask GPT "was this passage written by another GPT model?". It cannot properly cite information by itself: it was trained on big text corpora, so it learned weights and biases — the matrices — but it did not learn the citations or sources of that information, and it can easily hallucinate a source. If you ask GPT a question and then ask "do you know the source of this information?", it will just tell you it was provided during training; it doesn't know where the knowledge comes from, so you need to be careful using that information. And one big question we get: can generative AI take your job? Honestly, it cannot — it has no way to proactively learn by itself — but we can use generative AI to automate a lot of manual jobs and processes. That is the scary part; generative AI by itself cannot take your job or anyone else's. As for LLM applications, where can we use LLMs? Writing assistance, for sure: technical and creative writing, general editing, documentation, and programming-sample support. Information retrieval: we have seen Bing using GPT for search-engine support. Conversational recommendations: so many applications these days — mobile apps are adding GPT-based conversational recommendation and document search — plus document summarization and text interpretation. For commercial use cases we see customer support, one of the biggest; machine translation; workflow automation, like the one I showed with Prompt Flow; business management; and medical diagnosis.
If you have a telemedicine system, you can see a lot of companies trying to come up with GPT-based telemedicine chat applications as screening tools. For individual use cases, you can use GPT models for productivity support, Q&A, brainstorming, education, and problem solving — you can have a GPT-based assistant to discuss things with. So those are the use cases we see as solution architects, along with the limitations, the ways to make it better, what the architecture looks like, and some quick demos on GPT. And again, my main message: don't forget that generative AI is not the only AI — it is just one part. There is also NLP, computer vision, and other types of AI — classification, regression, forecasting — so don't focus only on this; focus on the entire suite of AI systems, and that will always serve you well. That's it for my one-hour presentation. Thank you so much. You can connect with me on LinkedIn by following this link.
Info
Channel: Kasam Shaikh
Views: 2,315
Keywords: GenerativeAI, Startups, GenerativeAI-for-Startups, Azure, OpenAI, AzureOpenAI, MVP, dearazure, kasamshaikh, GenerativeAI-for-Solution-Architects, Architects
Id: ErBAVhD81No
Length: 61min 47sec (3707 seconds)
Published: Wed Nov 15 2023