Getting Started with Azure AI Studio's Prompt Flow

Captions
Hello and welcome to another episode of the AI Show. My name is Seth Juarez. We've got a great show today; it's just me, so I don't know how good it could be. Today we're going to look at prompt flow. It's something I haven't been putting off exactly, but I've been trying to make the perfect video about it, and I realized the only perfect video we're going to make is the one we make together. So that's what we're doing today.

First, where is everybody coming from? Say hello; I'd love to greet each and every one of you, wherever you're visiting from. Today's show is all about prompt engineering using Azure AI Studio's prompt flow, a brand-new end-to-end prompt engineering tool. From prompt construction all the way to deployment and monitoring, prompt flow has everything you need and more. Wow, I sounded like a commercial; I'm biased, obviously, because it's something I work on together with a ton of people.

By the way, my desktop was broken for a while. I stayed up late last night and fixed it; I had to replace the motherboard, so I am now on a brand spanking new machine and I don't have everything installed. But this all happens inside a browser, so I'm pretty sure that's fine; we'll find out today. Today we're going to look at the basics of prompt engineering with Azure AI Studio's prompt flow. (By the way, I got my Wacom tablet working again, so this is my actual handwriting.) We'll start first with the order of things.
Here's the order of things, and we're probably going to run out of time, so I may have to continue another week. Number one: the why, what, and how of LLMs; in this case we're using ChatGPT. What is this thing, how does it work, and so on, so that you know what the job is. I think the problem is that most people doing prompt engineering just don't know what the job actually is: what is it that we're doing, and why? There will be some opinions that I'll flag as mine, and then some things that are just the way we do them in prompt flow. Number two: what is the task? We'll talk about that. Number three: a quick intro to prompt flow; I'm going to call it PF from now on. Number four: runtimes plus connections. And number five: we're going to build a prompt flow. My sense is we'll run out of time at that point. Were we to continue, I would show you how to test a prompt flow, bulk test it, and then do evaluation flows. After evaluation flows we would look at metrics, and after that I'd show you how to do deployments and monitoring. But I think we're going to run out of time at number five, so that's what we're up to today.

So, let's see where everybody's coming from. We've got a lot of people here today, and I'm excited to see you. I spent some extra time tweeting and posting on the socials that I'm doing this today, because everyone has been asking for this stuff and I feel like I've neglected it. Hello from Singapore!
Hello, my friend, how are you? Lovely place. Abu Dhabi, welcome; I've always wanted to go to the UAE, and I'm fascinated by the Arabic language, incidentally. San Diego, California, from LinkedIn: the land of my birth. Milwaukee, Wisconsin, you betcha. "Another Microsoft product with 'flow' in the name": it's not a product, it's a feature of Azure AI Studio. Javier from Spain, bienvenido, mi amigo (welcome, my friend); that's my Spanish. New Jersey, the melting pot of the United States; there are so many different kinds of people in New Jersey, it's fantastic. Turkey, welcome. Taiwan, welcome. India, lovely place. Oh, it's Cassie; Cassie needs walk-on music whenever she's here, and I don't know what I should pick. Cassie's awesome. Boston, Massachusetts; Toronto, Canada; South Africa; France; Sheffield, UK. We have a good smattering of people from everywhere. Arizona, Argentina. Paul asks, "How do I protect myself against prompt injection?" Great question. Serbia, Atlanta, Istanbul, Philly: lots of folks.

What I want us to do is ask questions as we go, because it doesn't really matter what I have to say; it only matters what you understand. If you don't understand something I've said, it's not because you're not capable; it's because I'm a bad teacher and I need to explain it a different way, so feel free to speak up. Denver, Colorado, hello, my friend.

So here's where we'll start. Let me begin with this important thing: the what, why, and how of LLMs. I'm going to be brief, because I've done this a number of times with you all.
I want to make sure I repeat it over and over again, because I think there's a general misunderstanding about what these models actually do, and here we're talking about ChatGPT and the model itself. What these models do is take, in the case of ChatGPT, about 4,000 tokens. If you're wondering what a token is: a token is a piece of a word, like a couple of characters. I'll show you. If you go to the OpenAI tokenizer page, you'll see that OpenAI is super nice and provides the tokenizer online. Let's enter a couple of things and see how they're tokenized; look at us, using AI all over the place.

(You can't see this because it's super small; I forgot to put ZoomIt on my new machine. I always make a utils folder, so let me download ZoomIt, dump the executables into the utils folder, accept the license, and now we're ready to go.)

So now you'll see that this text I just spoke is broken up into tokens. For example, notice how the token "'s" (apostrophe-s) repeats a lot, so it becomes its own token. Otherwise, if we had "Seth's" as a single token, we'd also need a separate token for every other possessive word, and there would be too many tokens. So notice that tokens are basically pieces of words; sometimes they're entire words.
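To make the idea concrete, here is a toy greedy longest-match tokenizer over a tiny hand-built vocabulary. This is not OpenAI's actual BPE algorithm or vocabulary; it is a minimal sketch of why a frequent fragment like "'s" earns its own token instead of every possessive word getting a separate vocabulary entry.

```python
# Toy greedy longest-match tokenizer over a hand-built vocabulary.
# NOT OpenAI's real BPE; it just illustrates how text is split into
# reusable sub-word pieces, spaces included.

VOCAB = {"Seth", "cat", "'s", " ", "hat", "the"}

def tokenize(text: str) -> list[str]:
    """Repeatedly peel off the longest vocabulary entry that prefixes text."""
    tokens = []
    while text:
        # Try the longest possible match first.
        for length in range(len(text), 0, -1):
            if text[:length] in VOCAB:
                tokens.append(text[:length])
                text = text[length:]
                break
        else:
            # Unknown character: fall back to a single-char token.
            tokens.append(text[0])
            text = text[1:]
    return tokens

print(tokenize("Seth's hat"))  # → ['Seth', "'s", ' ', 'hat']
```

Note that the space travels with its own token here; real tokenizers typically fold leading spaces into the following word piece, but the principle is the same.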
That includes the spaces, by the way. So that's what I mean by tokens, and hopefully that's pretty clear. Given about 4,000 tokens, what this thing does is put those tokens through a model (we'll draw the model as a little box), and what comes out is just one token. That's what these models do. And you're like, "But Seth, I have seen ChatGPT type a bunch of stuff at me." Well, it turns out that once it has that one token, it appends it to the input, takes those roughly 3,999 other tokens plus the new one, puts them through the model again, and returns another token, and it does this over and over until it hits the maximum number of tokens. That's what these things do; that's it. I know we want to make them feel like they do more, but they do not. The reason they're so surprising is that they generate just the best tokens.

From Ivana: "What happens when it reaches the maximum?" Truly two things can happen: it either reaches the max number of tokens, or it reaches what's called a stop token. The stop token tells it to stop generating, as in "I'm done with this thought"; when it reaches the max token count, it just stops. That's all these things do. But here's what happened: GPT-2 was quite capable, but something funny happened with the number of parameters in GPT-3, 3.5 Turbo, and 4: they started producing ridiculously amazing text. What you also need to know is that the more tokens the model generates, the less context it has from you. This part of the window is you; as generation goes on, your context gets smaller and the generated part gets bigger, so it starts to invent stuff. Which is why, for example, I don't like the term "hallucination."
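The token-at-a-time loop described above can be sketched in a few lines. The "model" below is a stand-in that emits a canned reply (a real LLM returns a probability distribution over its whole vocabulary); the loop structure, the sliding context window, and the two stopping conditions are the point.

```python
# Sketch of the autoregressive generation loop: generate one token,
# feed it back in, repeat until a stop token or the token budget.
# next_token() is a toy deterministic stand-in for the real model.

CONTEXT_WINDOW = 8   # real models use thousands of tokens (e.g. ~4,000)
STOP_TOKEN = "<stop>"

def next_token(tokens: list[str]) -> str:
    """Toy model: echoes a canned reply one token at a time."""
    reply = ["Hello", "world", STOP_TOKEN]
    n_generated = sum(t in reply for t in tokens)
    return reply[min(n_generated, len(reply) - 1)]

def generate(prompt: list[str], max_tokens: int = 16) -> list[str]:
    tokens = list(prompt)
    output = []
    for _ in range(max_tokens):                 # stop 1: max token budget
        tokens = tokens[-CONTEXT_WINDOW:]       # oldest context falls off
        tok = next_token(tokens)
        if tok == STOP_TOKEN:                   # stop 2: stop token
            break
        tokens.append(tok)                      # feed it back in
        output.append(tok)
    return output

print(generate(["Hi"]))  # → ['Hello', 'world']
```

The line `tokens = tokens[-CONTEXT_WINDOW:]` is the "your context gets smaller" effect: as generated tokens accumulate, the oldest user tokens fall out of the window.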
It imbues the model with an agency it neither has nor aspires to. Someone once told me on social media, "the model is lying," and I'm just like: it can't lie. It doesn't know anything; all it wants to do is generate the best tokens. So how are these things trained? Well, we basically — and by "we" I mean the way you say "we did a really good job against the other team last week" about your sports team, because obviously I had nothing to do with it other than playing with it now — looped through, roughly, the entire internet. (This is an example of one way of training; there are others.) We gave the model a sentence from the internet, asked it to predict the next word, compared that against the actual next word, and used the error to push weight updates back down through the model. The way you do this, because the model is a ginormous vector-valued function, is to measure how bad a prediction is with what's called a loss function, which wraps the function we're using to predict. Because of the way derivatives work, there's something called the chain rule; setting a derivative to zero finds a maximum or minimum, and we want to minimize the loss function. So to find the optimal weights, you push the updates backwards through the entire function. That's what training does.

So that's the model. Now we know what this thing is and how it was created: basically a ginormous function that wants to produce the best text for you, because that's what the math has forced it to do. It doesn't have any thoughts; it doesn't have any feelings.
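The loss-minimization idea above can be shown with the smallest possible example: a one-parameter "model" y = w·x, a squared-error loss, and the chain rule giving the derivative used to step the weight downhill. Real training backpropagates through billions of weights, but each weight is nudged the same way.

```python
# Minimal gradient-descent sketch of the training loop described above.
# A one-parameter model y = w * x stands in for the ginormous vector
# function; the chain rule supplies the derivative of the loss.

def loss(w: float, x: float, y_true: float) -> float:
    return (w * x - y_true) ** 2          # squared error

def dloss_dw(w: float, x: float, y_true: float) -> float:
    # chain rule: d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    return 2 * (w * x - y_true) * x

w = 0.0                                   # start with a bad weight
for _ in range(200):
    w -= 0.05 * dloss_dw(w, x=1.0, y_true=3.0)   # step downhill

print(round(w, 3))   # converges to 3.0, where the loss is minimized
```

After 200 steps the weight has converged to the value that makes the prediction match the "actual next word" target, which is all the minimization is doing at scale.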
The reality we need to understand is that what you put into these 4,000 tokens directly influences what the model outputs, because it wants so badly to complete your thought in a way that was optimized a priori. Once the model is built, when you put stuff into it, it isn't learning; it just is what it is. It doesn't remember things from the past: if you're chatting, it doesn't remember; the chat history is maintained on the client side, incidentally. It doesn't update its numbers internally; it literally just pushes the tokens through and produces the next likely token. So if you're worried about whether it's learning from what you're typing, the answer is no. However, whoever is exposing the model to you may capture things before they go to the model and do something with them, but that's a different problem. The model itself doesn't remember; it just has its internal representations, huge vectors, and so on.

So the job becomes — and this is the important part — putting the right things into this prompt such that it yields your desired output. That's the job. And this is where frameworks like LangChain and Semantic Kernel come in, which — again, this is my opinion, so nobody panic; I would love to update my mental model if you're like, "No, Seth, you're just wrong here" — this is why I have a hard time with things that obfuscate this task. They add a layer of ceremony that the task neither requires nor is elucidated by, and that's what worries me about these frameworks: all you're doing at this point in time is text programming. You're creating text to put into the model so that it returns the text you want. That's the job.
I went and learned a bit of LangChain, and I wanted to learn a bit of Semantic Kernel; there are tons of others, by the way, those are just the two more popular ones. I just don't like them, because the task is programming the appropriate text to yield the appropriate response. But when I say things, it takes about 30 seconds for you to hear them, so I'd love for you to tell me where I'm wrong so I can update; I'd rather be wrong among friends than wrong on a stage. So feel free to push back. That's the statement: I don't like LangChain or Semantic Kernel because they obfuscate the task.

So that is the task. Let's do a quick intro to prompt flow, because if the job is to build a prompt, that's what prompt flow is for. Let me show you a couple of things real quick to give you a sense of how these things work. (This is a brand-new machine, so you saw I had to install stuff and log in again.) Okay, I'm back in, and I'm going to use this resource. This is the Azure AI Studio, and these are the models I have right now. I'm going to use GPT-3.5 Turbo just to prove that notion that the prompt is the job. So: "You are an AI assistant that knows all about the Eurovision Song Contest. If asked about anything else, please respond with…" — okay, cool. Then I can say "tell me something." Notice that this part here is the beginning of the prompt; it's called the system message.
I don't know why it's called that, but let me whiteboard it. When you send the prompt — remember, this is the 4K tokens — there's a preamble people call the system message, and then (we'll make this part blue) you have this thing called the user message. But basically, it's all just the prompt. "You are an AI assistant that knows all about the blah," boom. Notice that when I said "tell me something," of course it wants to tell me about the Eurovision Song Contest, because that's what I put into the prompt. If you look at the code for this prompt, you'll see it's a chat API; they're all completion APIs, but chat is special because it structures the chat turns. You can see the system message here, and when I send a chat — let me restart, since I changed the prompt — and type "tell me something" (I don't have to spell it right; it doesn't matter), then in the code you'll see the system message, then "role: user, tell me something," then "role: assistant, blah." What happens in the background is that all of this is assembled into one ginormous prompt that goes into the model. And because all of that text is in the prompt, it really wants to talk about the Eurovision Song Contest; it really does, because that's what I put into the prompt.

"Who won the first one?" Now, NLP is funny, because this is hard for computers: "who won the first one" — what are we even talking about? But because it's all in the prompt, and because of the way Transformers work, the model is able to attend to every message.
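The "assembled into one ginormous prompt" step can be sketched like this. The role markers and the exact template below are illustrative assumptions; each provider uses its own special tokens, but the principle is the same: system message, user turns, and assistant turns all become one stream of tokens.

```python
# Sketch of how a chat request gets flattened into one big prompt.
# The <|role|> markers are made up for illustration; real chat APIs
# use their own internal delimiters, but it's all just the prompt.

def assemble_prompt(system: str, turns: list[dict]) -> str:
    parts = [f"<|system|>\n{system}"]
    for turn in turns:
        parts.append(f"<|{turn['role']}|>\n{turn['content']}")
    parts.append("<|assistant|>\n")          # where generation begins
    return "\n".join(parts)

prompt = assemble_prompt(
    "You are an AI assistant that knows all about the Eurovision Song Contest.",
    [
        {"role": "user", "content": "tell me something"},
        {"role": "assistant", "content": "The first contest was held in 1956."},
        {"role": "user", "content": "who won the first one"},
    ],
)
print(prompt)
```

Because the client resends the whole turn list every time, "who won the first one" lands in the prompt right next to the earlier Eurovision turns, which is what lets the model resolve what "the first one" means.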
The chat messages are written client-side and passed back in every time, so it's able to work out, "oh, who won the first Eurovision Song Contest": it looks like it was won by Switzerland, which is nice. "Did people like it?" Here's a five-dollar word I learned in an NLP class in grad school: "it" is a pleonastic word, which means it can refer to anything — a person, a thing, a place, whatever. But when I said "did people like it," the model got it: yes, the Eurovision Song Contest was well received. How about "have a lot of people watched ESC?" Notice I'm abbreviating to "ESC," and it's like, "oh yeah." This is impressive, because again, it's just very good at doing this.

Now I'm going to show you something it's bad at. "Who won ESC in 2023?" Here's where it breaks down: it says this is happening in the future, but it has already happened. Remember that the model was trained on the internet up to 2021, I think; I'm not sure, don't quote me on that. As far as the model's internal representation is concerned, this hasn't happened yet; but it has. So how do we force this thing to know? Again, if the only thing we can do is put stuff into the prompt so that it does the right thing, then it stands to reason we should put that information into the prompt. Watch: "Eurovision Song Contest 2023." We'll go full Wikipedia and paste it in; I didn't even read it, so I don't know who won. Now when I ask "who won it in 2023," it knows. How does it know? Because we put it in the prompt.
What I just showed you is basically what's called RAG, or retrieval-augmented generation. The idea is: when someone types something in, we intercept what they typed, search for stuff relevant to it, and push that into the prompt so the model has the information. "What should I put on my tacos?" Why is it saying salsa? Or, "what bank should I use?" Why salsa again? Because I told it to: "if asked about anything else, respond with salsa." And this, my friend, is what I like to call a really dumb way of introducing a safety section into the prompt.

By the way, there's a feature here that does the retrieval for you, where you just map it to Cognitive Search, and it's nice, but that's not what we're here for; we're talking about prompt flow. The idea, then, is that you have this prompt you're trying to engineer based on what's being passed in, and your job is to construct that prompt. Notice I just showed you when things work and when they don't, and how to minimally provide some safety mechanisms.

So what I want to build is the thing we showed at Microsoft Build: the outdoor personal-shopper demo. This thing was a hoot to build. As most people don't know, I generally build my own demos, because I like to know how a thing works before I have to explain it. Here's the Contoso Outdoors company, and here's a shopping cart. Let me ask it something.
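The intercept-search-inject loop just described can be sketched end to end. The retrieval below is naive keyword-overlap scoring over an in-memory document list — real systems use a search index or a vector store — but the shape (retrieve, then build the prompt, including the crude "salsa" safety line) is the point.

```python
# Toy retrieval-augmented generation (RAG): intercept the question,
# find relevant snippets, and push them into the prompt. The scoring
# here is naive word overlap; real RAG uses a search index instead.

DOCS = [
    "Sweden won the Eurovision Song Contest 2023, held in Liverpool.",
    "The SummitBreeze jacket is a lightweight waterproof hiking jacket.",
    "The TrekReady hiking boots pair well with any of our jackets.",
]

def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    return (f"Answer using only this context:\n{context}\n"
            f"If asked about anything else, respond with salsa.\n"
            f"Question: {question}")

print(build_prompt("who won eurovision in 2023"))
```

Whatever the user types, the model only ever sees one prompt string; the retrieval step just decides which facts make it into that string.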
"Do you carry any jackets?" The first time it takes about 11 seconds — oh no, that was pretty fast: "Yes, we carry the Summit Breeze jacket and the RainGuard hiking jacket. The Summit Breeze jacket pairs well with the day pack already in your cart, while the RainGuard also pairs well with the TrekReady hiking boots." Notice that it knows what Contoso carries, but it also knows what Jane has in her cart, and it responds with both pieces of information. So the idea now is that we need to do advanced RAG: pull some documents about jackets into the prompt, pull some information about this particular shopping cart, and make it all come together. That's what we're going to build.

The next thing I wanted to talk about is the notion of runtimes plus connections. By the way, this is Azure Machine Learning, and this is prompt flow right here. Click on it and we'll create a new prompt flow. There are two kinds: standard prompt flows (notice some have already been built), and evaluation flows, which are flows that test other flows; we'll get into those probably next time. We're going to create a chat flow, just blank, and we'll call it our Contoso chat store thing. So here's the prompt flow; now let's talk about runtimes and connections. We'll go over to flows and look at runtimes.

Here's a good question: "Can we fine-tune a GPT model and conduct prompt engineering on top of it? Does Azure support fine-tuned models?"
The answer is yes, we do support fine-tuning models, but my sense — again, this is my sense — is that you'll get more juice from the squeeze of prompt engineering than you will from fine-tuning. Fine-tuning is an expensive endeavor in terms of compute and person-power. "Will Azure add Google Bard, or will it be just ChatGPT?" We think OpenAI's models are better, but I'm biased; you'll see in a second that you can use whatever you want with prompt flow, to be quite honest. But why ride a tricycle when you can have a wonderful motorcycle, you know what I'm saying? It's a great question. "Please cover training on custom data": I wouldn't start there. I'd start with prompt engineering, because these models are superbly capable on their own; like I said, get as much juice from the squeeze of prompt engineering before you fine-tune. Fine-tuning basically unfreezes some of the weights, and there's also something called LoRA, a low-rank approximation of the matrices in the original model that gets updated instead; and if you're using something like RLHF, those updates are regularized against the old model, because they tend to go off the rails, is what I've heard.

Another one: "I tried a couple of scenarios others were using LangChain for and was able to do the same stuff with prompts, so I tend to agree with you; I haven't tried more complex ones." Veronica, yes, thank you. I'm not trying to knock LangChain or Semantic Kernel; I just don't think you need that much ceremony for this kind of work.

All right, let's talk about runtimes. Runtimes are environments, and the way I use them (this will hopefully change in the future to be a little more serverless) is this: notice that I have a bunch of runtimes.
Runtime environments are basically Docker containers running on top of compute instances. Here's my compute instance, Sauron, and notice that Sauron has a couple of apps running on it; those apps are prompt flow runtimes. I need to start this machine, because that's something I always forget to do: all my machines stop at 6 p.m. every day, otherwise you could see the bill I would run up. By the way, this is a shared workspace; I work with a lot of folks you can see inside it, including mi jefe, who also makes stuff here — jefe means "boss," yo.

So I'll start this compute instance, and you'll see that the way prompt flow works is that it runs Docker containers. This is why we say that with Azure Machine Learning's (or Azure AI Studio's) prompt flow you can use whatever you want: you can actually go into the compute instance, open the Docker container as it's running, and pip install whatever you like. Anything; it doesn't matter. Unfortunately — or fortunately — it's all Python. Someone's asking about C#: great question, but I don't work on that side of the house, so I don't know. I'd love for us to be able to do prompt flows in C#, but I don't know if that's possible right now; I'll ask.

Okay, so notice that each of my runtimes represents a Docker container running inside this compute instance. For example, our base prompt flow runtimes have LangChain built in; to this prompt flow runtime I added Semantic Kernel and the Cosmos DB API. So that's what runtimes are: they basically represent the testing environment for your prompt flows. For example, here's a good question from Elena: "Could we run prompt flow on Kubernetes?"
In theory, you could run them anywhere. When you're testing and running them in Azure ML, you use a compute instance so you can test them; but testing and evaluating flows is different from deploying them. Deployment of prompt flows is a separate thing, and this is where we start to stray into the difference between data science and development. The reality is that your prompts are brittle: if the text you put into the prompt directly influences the output, then you need to test a series of different prompts and evaluate them against whatever metrics you think are important. So in this instance, Elena, what I'm doing is creating an environment in which I can test prompts and do various and sundry things. Okay, cool; that's what runtimes are.

The next thing I wanted to talk about is the notion of connections, because what your prompt flows are doing is assembling a series of different data sources to then send to the LLM. In this case, these two connection names represent one for Cognitive Search and one custom connection for Cosmos DB, and then we have connections to the actual LLM you're going to call. Because these are separate, in theory you could have your own LLM deployed anywhere and call it via a connection, if that makes sense. And if you want to know what a connection is: a connection is effectively a series of key-value pairs.

"Is it possible to share prompt flows with others, so that multiple people can work on a single prompt flow?" The answer is yes; currently the way you do that is to clone it and work on the clone. We're working on ways to have two people work on one at the same time. "Does it support non-English languages?" Absolutely; the LLM does for sure.
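Since a connection is "effectively a series of key-value pairs," here is a sketch of what two of them might hold: one for an Azure OpenAI endpoint and one for a Cognitive Search service. The field names and values are illustrative assumptions, not prompt flow's exact schema; the point is that nodes look connections up by name and get back a bag of endpoint details and secrets.

```python
# "A connection is effectively a series of key-value pairs."
# Field names below are illustrative assumptions, not the exact
# schema prompt flow stores; secrets would live in a secure store.

connections = {
    "hubert-aoai": {                       # an Azure OpenAI connection
        "type": "AzureOpenAI",
        "api_base": "https://example.openai.azure.com/",
        "api_key": "<secret>",
        "api_version": "2023-05-15",
    },
    "contoso-search": {                    # a Cognitive Search connection
        "type": "CognitiveSearch",
        "endpoint": "https://example.search.windows.net/",
        "api_key": "<secret>",
    },
}

def get_connection(name: str) -> dict:
    """Nodes in a flow look connections up by name at run time."""
    return connections[name]

print(get_connection("hubert-aoai")["type"])  # → AzureOpenAI
```

Because the flow only references the connection by name, you can swap the endpoint behind it — including pointing at your own model deployed elsewhere — without touching the flow itself.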
And as far as you're concerned, this is just text programming; great question. So right now we have this chat flow running against the SK runtime. Before I test it, let me go back and copy the original system message in. So here's the system message; validate and parse the input; point the LLM node at GPT-3.5 Turbo, and notice the connections are already built in. And that's it: what I've done is basically redo the earlier playground example inside a prompt flow. What was the first question I asked? "Tell me something." In theory, this will run inside the runtime, use the hubert-aoai connection, and respond. And you're like, "Well, it didn't respond the same way the other one did." Of course it didn't: it's a stochastic process. Let's see if I can make it match as closely as I can. It looks like the parameters are: max response, 800.
What else do I have in there? Top P is 0.95; this one doesn't have that set, so let's set it to 0.95. Frequency penalty: zero and zero. Nothing else, and the temperature is set at 0.7. Okay, cool. Now let's clear the chat, say "tell me something," and see if this fixes it a bit. Again, about temperature: what the model actually returns is not the next token; it returns a ginormous vector with a probability for every single token in the dictionary, and then picks from that distribution (by default, effectively the biggest). What the temperature does is reshape the distribution so that other possibilities become more or less likely. There you go: now it actually looks exactly the same as the other thing, because it effectively is the other thing. Hopefully this shows that these things aren't imbued with magic — well, they do have a magic, but it's not quite like that.

Now what we're going to do is use my search resources, because remember, we're building that shopping demo. Here's my search service, and notice I have an index with my product documents. If I search, you'll see information about product number one, and so on; if I search "tell me about your jackets," notice it returns jackets to me. This is basically what I have to do, but some searches are a little different: for example, this one has what are called vector embeddings. (I didn't use them in this case, so you'll see the difference right away.) Here's why vector embeddings are useful.
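That temperature remark can be made concrete with a softmax sketch: the model emits a score (logit) per vocabulary token, softmax turns the scores into probabilities, and dividing the scores by the temperature first flattens (T > 1) or sharpens (T < 1) the distribution before sampling.

```python
import math

# Sketch of what temperature does to the output distribution over a
# toy 3-token "vocabulary": low T is near-greedy, high T is flatter.

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # one score per token
cold = softmax_with_temperature(logits, 0.1) # top token dominates
warm = softmax_with_temperature(logits, 2.0) # others become likely too

print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
```

With temperature 0.7, as in the playground settings above, you get something in between: mostly the best token, with occasional variation, which is why two runs of "tell me something" don't match word for word.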
The reason vector embeddings are useful is this: say your search returns a document that's a thousand pages long. You can't fit that into the prompt. So with vector embeddings, when you index your data you break it up into bite-size chunks, chunks that maybe overlap a little. Then, when a question comes in, you convert the question into a vector and search for the text fragments you've indexed using the cosine similarity between the question's embedding and each chunk's embedding. That's what this is.

So now we're going to go back and start adding a question embedding. I'm going to add a node for embedding the question, call it question_embedding, and use an embedding tool with Ada 002. The question becomes: how do I wire the input from what's coming in into here? It turns out it's literally just inputs.question, and notice that the graph changed a little bit. Do you see what's happening?

Now I'll move this up, because it belongs earlier, and add another node to search. Notice that you could use a Vector DB Lookup tool if you want, but I'm going to use a plain Python node instead, and I'll call it retrieve_documentation. What this node needs to do is take the embedding and search for the documents I want inside Azure Cognitive Search. If you remember, I already have a connection that will do this for me. So I need to pass in the question, which is a string; the embedding, which is a list of float; and my search service, which is a cognitive search connection.
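The chunk-and-retrieve idea above is easy to sketch. Here is a minimal, self-contained illustration with made-up three-dimensional vectors (real embeddings like Ada 002's have around 1,500 dimensions): rank the indexed chunks by cosine similarity to the question's embedding and keep the best matches.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for two indexed chunks and one question.
chunks = {
    "jackets-chunk": [0.9, 0.1, 0.0],
    "tents-chunk": [0.1, 0.9, 0.2],
}
question_embedding = [0.8, 0.2, 0.1]

# Retrieval = pick the chunk(s) most similar to the question vector.
best = max(chunks, key=lambda name: cosine_similarity(question_embedding, chunks[name]))
assert best == "jackets-chunk"
```

A real vector index does this same ranking, just over millions of chunks with an approximate-nearest-neighbor structure instead of a linear scan.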
To use that cognitive search connection I need to import the class from prompt flow, something like "from promptflow.tools.connection import CognitiveSearchConnection". Let's see if this works... oh, "no module named promptflow.tools.connection." Okay, maybe I spelled it wrong; maybe it's plural, "connections". All right, validate and parse. Okay, an inner exception, oh my gosh. "Invalid syntax"... let me take this off, add the return, and change the name of this. I think this is right. Oh, the import is misspelled. Thank you! See, this is why it's great to have a bunch of friends here together.

Nice. Okay, so notice that this is starting to look like a thing. So where does the embedding come from? By the way, I want to search with the question string as well as the embedding, because using both is what makes Cognitive Search a little bit better. Another question from chat: can you connect it to something else? Yes, you can connect it to whatever you want; that's the beauty of this thing. So first we get the question: the question comes from the inputs, the embedding comes from the embedding node's output, and the cognitive search connection is this one. Do you see how it's starting to be a thing?

Now we just need some code. I could type it in, but I'm just going to paste it. Validate and parse, and you'll notice the output is not a string anymore; I'll change that. Notice what's happening: I'm using Cognitive Search, creating a search client, looking in the documents index, searching based on the question, returning only the top two documents, passing in the embedding, and returning, for each doc, the ID, the title, the content, and the URL as the results. Okay, now we'll move this up.
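The retrieval node boils down to: query the index with both the question text and its embedding, take the top two hits, and keep only the fields the prompt needs. The actual SearchClient call requires a live Azure Cognitive Search service, but the reshaping step can be sketched as plain Python. The field names (`id`, `title`, `content`, `url`) follow the demo's index; the stand-in results below are made up for illustration.

```python
def format_retrieved_docs(search_results, top=2):
    """Keep only the fields the prompt template will render.
    `search_results` stands in for the iterable of hits returned by a
    search client; each hit behaves like a dict of index fields."""
    docs = []
    for hit in search_results:
        docs.append({
            "id": hit["id"],
            "title": hit["title"],
            "content": hit["content"],
            "url": hit.get("url", ""),
        })
        if len(docs) >= top:  # only the top-N documents fit in the prompt
            break
    return docs

# Stand-in results, as if returned by the cognitive search index.
fake_results = [
    {"id": "17", "title": "RainGuard Hiking Jacket", "content": "Waterproof shell...", "url": "/products/17"},
    {"id": "3", "title": "Summit Breeze Jacket", "content": "Windproof layer...", "url": "/products/3"},
    {"id": "9", "title": "TrailMaster Tent", "content": "Two-person tent...", "url": "/products/9"},
]

docs = format_retrieved_docs(fake_results)
assert len(docs) == 2
assert docs[0]["title"] == "RainGuard Hiking Jacket"
```

Capping at two documents is the same trade-off the demo makes: enough grounding material to answer, without blowing past the prompt's token budget.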
Now we're going to change this so that... here's the fun part, by the way: we're going to change the prompt a little bit. To do that we'll create a new prompt node and call it customer_prompt. This is the thing we're going to vary, because notice that in the LLM node you can put the prompt itself, but we're not going to do that. I'll take this customer prompt, put it in here, and move this up. Let's validate and parse the input... oh, "input" is a reserved word, so we'll call it prompt_text. Cool. And now, instead of putting the prompt here, we'll take it directly from the customer_prompt output. Do you see what's happening? We're starting to get the shape of this thing, which is awesome.

Okay, so here is the prompt. Notice that we're passing the chat history through, and the system message says something like: "You are an AI assistant for Contoso Trek. You are helpful and friendly, and respond only with information about the products and services." That's how we did it before. But now, how do we pull the documentation into the prompt? This is the coolest part. I write "for item in documentation" (this is Jinja, by the way, if you've never seen it) and close it with "endfor". Inside the loop, the catalog number is item.id, the item name is item.title, and the content, the actual document text, is item.content.

Okay, let's parse the input. Notice that now the node wants the chat history, which we get from here; the question, which we get from here; and the documentation. Let's reorder this... what's going on? There we go, that's better; I don't like that ordering. Oh, the documentation isn't the question embedding, it's the retrieve_documentation output. There you go; I was wondering what was going on.

So now we've built this thing, and in theory we should be able to test it. Let me clear this up and ask "tell me about your jackets," the question we asked before, and the prompt flow will run through. A question from chat: are all these nodes running in parallel? Kind of, but not really in this case, because each node's input requires the output of the previous nodes. There you go. Now, this is nice, but the answer is too long. So let me add "please be brief and use emojis" to the prompt. I'm going to pretend that's what the customer asked for, because this is way too long. Let's clear this out and ask "tell me about your jackets" again. Cool. Oh shoot, we're running out of time here, so let me start the walk-off music. All righty, look at this: "Our jackets are perfect for outdoor adventures. We have windproof and waterproof jackets... tell me which specific jackets..." It runs through in about five seconds, which is not bad, and by the way, this is all inside of prompt flow.

But is this grounded? It looks like we need to work on that, so let's look at the actual trace and take some questions. Here's the trace of the run: you can see the question that comes in, you can see the output is this ginormous vector, and here's the content that comes back from retrieve_documentation. So the Summit Breeze jacket and the RainGuard hiking jacket were retrieved, but they're not included in the response, so we've got to change the prompt to make it a little bit better. Let's go to the trace: there's the customer prompt, here's the output, and here's the system message, the actual text we're sending to the model.
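That documentation loop is ordinary Jinja templating. Here is a trimmed sketch of what such a prompt template might look like and how it renders; the product entries and the exact wording are placeholders for illustration, not the demo's real template.

```python
from jinja2 import Template

# A trimmed sketch of a customer prompt template. The {% for %} loop
# injects each retrieved document into the system message.
customer_prompt = Template(
    """system:
You are a helpful and friendly AI assistant for Contoso Trek.
Respond only with information about our products and services.

# Documentation
{% for item in documentation %}
catalog: {{ item.id }}
item: {{ item.title }}
content: {{ item.content }}
{% endfor %}

user:
{{ question }}"""
)

rendered = customer_prompt.render(
    documentation=[
        {"id": "17", "title": "RainGuard Hiking Jacket", "content": "Waterproof shell."},
        {"id": "3", "title": "Summit Breeze Jacket", "content": "Windproof layer."},
    ],
    question="Tell me about your jackets",
)

assert "RainGuard Hiking Jacket" in rendered
assert rendered.count("catalog:") == 2
```

Rendering happens before the LLM call, so the model only ever sees plain text: the grounding documents become part of the system message it is asked to answer from.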
Right, which is nice, and then here's the chat output, which is cool.

All right, hopefully this was helpful. We covered exactly what I thought we would in the time allotted. Next time we'll spend some time actually testing the prompt; I'll show you the testing tools, and then deployment and usage outside the studio. I know this technology tends to feel like a black box, but hopefully, as you've been watching, you've come to realize, "oh, I know what to do to make this work." And if you think about it, if the interface is now language, the opportunities available with this kind of technology are limitless: anything involving language, you can now guide an LLM to produce the right answer. We didn't really get into jailbreaking and related topics; those are other important things to understand with this technology.

Next week... we just started a brand-new year, so things are changing, but next week we have another good show. I don't know exactly what it is; I think we have a guest. In any spare time I have on the show, I'm going to keep going through the bits of prompt flow so people can get a good sense of what this technology is, why it matters, and how you can use it.

Hopefully this was helpful. I'm very excited that so many of you were here and listened. This has been another episode of the AI Show Live. Thank you so much for watching, and hopefully we'll see you next Monday. The show is live every week at 8:30 a.m. Pacific Time, wherever you are, and we're here to answer your questions and make AI a lot more approachable. Thanks for watching, and we'll see you next time. Take care.
Info
Channel: Microsoft Developer
Views: 22,483
Id: vkM_sgaMTsU
Length: 66min 17sec (3977 seconds)
Published: Tue Jul 11 2023