Getting Started with Azure AI Studio's Prompt Flow - Part 2

Captions
Hello and welcome to another episode of the AI Show. My name is Seth Juarez; I work at Microsoft, and I love doing AI stuff in the larger AI community. I thought we'd start with: where's everybody coming from today? Put it in the chat. I'm excited everyone's here. By the way, it's just me today, and I'm going to be going through Prompt Flow, Part 2. I did Part 1 about a month ago and I feel like we didn't finish, so we're going to do that today. Whilst everyone's telling me where they're from, let me share my screen. Boom. Wow, I haven't done this in a while; I don't even know what I'm doing anymore. This is embarrassing. Oh, there it is. Nice. So today we're going to revisit Prompt Flow, a new Azure AI Studio feature. I'm going to turn this down so you can hear me. By that I mean we're going to look at what we built before, and then we're going to continue. Throughout, and I'm going to do this in red because it's important, we're going to answer questions, so have them ready. Cue up your questions; I'm going to answer all the questions about Prompt Flow and building LLM-infused applications. Again, some of this stuff is my opinion, and I will tell you when it's my opinion; other than that, I'll tell you whether Microsoft has built the thing or not. For example, some of my opinions are not very popular. Do you have any unpopular opinions when it comes to this stuff? I'm curious about unpopular opinions. And then, number three: I said we would continue, but I'm hoping we get to evaluation of prompts, because I think that's the part most people kind of elide when I look at this stuff, and that's probably the most important part.
Gosh, I'm terrible at writing; I'm going to write this really nicely because this is the most important part. Oh jeez, this is all terrible. By the way, I'm using my nice Wacom tablet, so I'm going to put this down; I'm running out of space on my tiny desk. I'm going to write this down because it's super important: evaluation. English is so weird. And this, to me, not to understate it: you need to do this stuff. If we're using LLMs to build applications, your copilots, etc., you should really evaluate what you're doing; otherwise you're not being very responsible, in my opinion. IMO. I think Microsoft thinks that too, but again, Microsoft is an amorphous block of a company with hundreds of thousands of people with all different opinions. All right, so where are people coming from? Let's take a look. Oh, look at that: Germany, danke, that's fantastic. Quebec, Canada. I don't understand Quebecois French, to be fair; I don't understand regular French either, and they don't understand me. Any time I'm in France speaking French, I'm like... and they're like... and everything's great. Bangalore, a lovely place, welcome my friend, I'm glad you are here. Redmond, Washington, or maybe it's a different Redmond, I don't know. We have Singapore; The Hague, welcome, The Hague; Costa Rica, I feel like I need to say it with emphasis. I left the reverb on a little bit; it sounds like I'm somewhere more important. Hi everybody, to the thousands gathered here, or dozens; either way. Finland as well. CSU, hello, welcome. From Microsoft: Microsoft has a ton of amazing people that work for them, and I'm just here trying not to bring it all down. Smoky Seattle, yeah, I went outside and I was like, oh shoot.
Someone said smoky Seattle, and I'm pushing totally the wrong button. Is that smoky Seattle? Yes. Jay McCormick, how are you doing, buddy? Argentina, bienvenidos; Bulgaria; Toronto, welcome. ¿Habla turco, por favor? I don't speak Turkish; my mom does, though. She loves Turkey. Scotland, South Africa, the Netherlands, welcome; we've met, it's good to see you again. Philadelphia, from the Greek philia, which means brotherly love. So, some unpopular opinions. Here's a good one: "the AI robot apocalypse is inbound and I like eating, so I'm hoping my fridge is off the ground." Hold on, I used to have buttons for this stuff; where's my clapping? There, it works. I lost my mouse. Okay, well, everyone welcome. Oh, we have someone all the way from Somalia, welcome. "Keep the echo on." Oh man, you know, when quantum computing was making its way through everything, maybe three or four years ago, I figured I could either go the physics route or the computer science route, and never the twain shall meet; they're very different. I went down the physics rabbit hole, and boy, there is just a lot going on there. Okay, so let's remind ourselves where we left off. This is what we built before; I don't even know if it still works. We were building a Contoso store chat that had documents in it. I need to go over here because, by the way, thanks to Cassie; she's awesome, we are on the same team, and we're going to be swapping duties a lot on the AI Show. So let's go to this compute instance, Sauron, and I need to start it if anything's going to work, because what people might not realize is that when Prompt Flow runs, it has to run on some kind of compute. So where does it run? It turns out there are two options in Prompt Flow.
Number one, you can run it on a managed VM; we call those compute instances. Or, number two, you can run it serverless, and it's not that there are no servers, you just think about servers less. I didn't make that joke up, by the way; I just wanted everyone to know that I have the jokes. So those are the two ways: with serverless you put up an endpoint and it runs there, and what I'm doing today is using a compute instance. So when we go to my Prompt Flow, which I wasn't showing you before, I'm going to start the compute environment over again, because to save money it shuts down every evening at six. That's my compute instance; I'm starting it up so that we can work. Then, when I go to the Prompt Flow, you'll see that there is a runtime, and you can see that it's starting up. This CI, compute instance (not to be confused with continuous integration), runs a series of Prompt Flow runtimes. The way you make a Prompt Flow runtime is you have this thing called Runtime, and when you make one, remember how I said there were two different options? That's what's going on. On a compute instance, when you make a runtime you can name it something, pick the compute instance that you want, and then either use the default environments that Prompt Flow ships with, or create customized environments. Those customized environments are built on top of things called environments inside Azure Machine Learning. Everyone's like... hold on, I've got to find it: the "math lady" meme, the woman doing math in her head. There it is. We are going to copy this image and place it in our notes, because we're trying to learn stuff here.
With all this compute environment talk, I'm just going to draw what this means. Inside this box, this compute instance, we basically have a set of containers that are running the Prompt Flow runtime. This is one machine, and in our case it's called Sauron because, you know, I'm a nerd; there's no way around that. I'm basically a huge nerd-slash-geek, I don't know what the difference is, help me out. Oh, and in fact, this is funny, it's literally like reading my mind: "Can one customize the images and packages on those computes?" Reading my mind, Adir. It's like we're reaching through the interwebs and having an understanding of one another. Yeah, so this compute instance is running a series of containers, specifically for Prompt Flow, but it can run other things too; it can run notebooks, for example. I'm moving away from pure machine learning here, but for machine learning we use it as a special compute environment. What happens is, when you create a prompt flow, you know, that cool little flow with all the nodes and stuff, the flow runs, but you have to pick the runtime environment that you want to use. So how do you define those environments? That's what I'm doing here. Oh, wrong window, wrong window, where's the other one? Oh yeah, it's here; the confused-lady meme, she's the one that knows all the math, though, let's be honest. These are the environments you create. Let me zoom in because it's kind of small. Notice they're all running because I turned the instance on, and notice they're all based on this runtime environment. These runtime environments are managed here.
Generally, Azure Machine Learning is used to run machine learning code, and with machine learning code you've heard a data scientist say, "hey, it runs on my machine," and you're like, "okay, well, ship your machine." That's what this is for. So when you create an environment, you are basically capturing your environment. Let me find the one I'm using. Here are my custom environments, and here is pf (which stands for Prompt Flow) ACS Cosmos; this is the one I'm using right now. The way you build these things is you literally just make a Dockerfile. Isn't that crazy? This is a Dockerfile. Now, again, some of this stuff is in private preview, so I'm showing stuff that's almost but not all the way done; we have a current issue with the way we deploy these things where you have to have another environment built on top of that, but for all intents and purposes this is how you build a Prompt Flow environment, using Docker, and you're like, oh yeah, we're not reinventing wheels here; there are wheels that are already built. This is the base Prompt Flow runtime, the stable one that I'm using, and on top of that I am upgrading pip, because I can't stand looking at "you should upgrade pip" on the environments, and then notice I'm installing the Azure Cosmos DB package as well as Azure Search, so that I can do some searching. Here's a good question I want to answer: "Is there a way to utilize Prompt Flow without Cognitive Search?" Yeah, you don't need to use anything in particular; in theory you could just use it, and I'll show you in a second, as long as you have a way to retrieve information and pull it into a flow. So this is how you build these environments. Once this is built, again, I had to build this tiny extra environment on top of the other one, primarily because deployment needs special sauce, but that's going to be fixed.
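The Dockerfile on screen isn't in the transcript, but based on the description (base runtime image, pip upgrade, two Azure SDKs) it would look roughly like this; the base image tag and exact package names here are assumptions, not a copy of the one in the demo:

```dockerfile
# Sketch of a custom Prompt Flow environment, assuming a public
# promptflow-runtime base image; adjust the tag to whatever is current.
FROM mcr.microsoft.com/azureml/promptflow/promptflow-runtime:latest

# Silence the "you should upgrade pip" warning
RUN pip install --upgrade pip

# Extra SDKs the flow's Python nodes will import
RUN pip install azure-cosmos azure-search-documents
```

The point of baking packages into the image rather than pip-installing at run time is exactly the stability Seth mentions: everyone on the team runs against the same frozen environment.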
So now, when I go to Prompt Flow and create a runtime on a compute instance, I can call it a memorable name of course, select my compute instance, and then use my customized environment; you can see the one that I built right there. This is how you build custom environments, so in theory you can put anything you want in them, and the good news is that by doing this you're creating stability for the other people working with you, so they can use the same runtime environment. But I already built it, so let me go to the flows. We are on the Contoso store chat, and we are going to use the ACS Cosmos runtime because that's the one I want right now. You're going to find out that this is going to break. "Tell me about jackets." Yeah, it breaks because I think I used a new version of the ACS and Cosmos packages. Oh wow, this is pretty cool. "What about hiking boots?" It's going to break, and it's going to break in the Cognitive Search node, because a new Cognitive Search API came out, since vector search is in beta. There you go, see? A couple of things wrong: the latest Prompt Flow takes out the inline tool settings; they're now connections. Validate. The other thing that needs fixing: I think the parameter is now "vectors" instead, and it needs to be passed as a list, because you can pass in multiple vectors when you're searching. I'm going to get this to work and then I'll explain each one of these things, by the way. And then there's this. Okay, so we'll save it, clear this out, and try it now to see if it works. Here we go, and while that's happening, let's see some more questions. "Is there a way to utilize Prompt Flow without..." yes, yes. "Would it require an environment to have the vector search approach available to it via a Python package?" I didn't answer that second half.
The only reason you would put pip-install stuff into environments is because you want to use certain code, since these flows are all built using Python currently. For example, LangChain is actually already in the default environment, because we knew people would use it with Prompt Flow. But you can also abstract away your data-fetch operations into REST APIs and then just call those. So that's a good question. "With Prompt Flow, could you play rock-paper-scissors with it?" You actually could; you would have to, but it feels like using a shotgun to kill a fly. That's a really good question. "What tool are you using to draw on the screen?" That would be, thanks to Mark Russinovich, the man with a plan, a tool called ZoomIt; ZoomIt, now with fewer calories and better taste. "In the current iteration of Prompt Flow, what is the best way to incorporate Fabric artifacts? Loving the MongoDB vCore, by the way." Yeah, you can include anything, so long as you have a connection to it and you can access it via code, and you'll see that in a second, because I'm going to pull in Cosmos DB to show you the goodness. "Can you set up testing variations with flow and score it both on perf and pricing, so you can easily choose the best model and prompts for the use case?" Oh man, that's amazing. For perf, yes. For pricing, that's a great idea; the reality, however, is that pricing is per token, so I guess the token count would be a great proxy. That's a really good question. Mark: "Did Seth say 'I don't know what I'm doing'?" Marco, you know it's just... "Can this be attached to a repo?" The answer to that is: almost.
Give me another week or two and I'll have stuff to show you. Yeah, I know, thank you; I'm excited, I'll show you that stuff. "Get the Sysinternals Suite, including ZoomIt, at sysinternals.com, which will redirect to microsoft.com." All right, great questions; let's keep going, because we're working on the stuff here. Notice that when I run this, now it's all working: "We have two jackets, the Summit Breeze Jacket and the Rain Guard Hiking Jacket." Hold on, let's read it like it's a commercial: "The Summit Breeze Jacket is lightweight, windproof, water resistant, and has adjustable cuffs and hood. It's perfect for hiking." Wow, thank you, Prompt Flow, for that. Okay, so now let's get into what this is doing. When the input comes in, the input comes from the question, and because we described this as a chat prompt flow, it automatically has this particular variable called chat_history, and in it, it has whatever I just put in; so if I continue chatting with it, that's how it remembers. These models do not have any state; they don't have any remembrance, even in between tokens, of what the heck you're talking about. Basically the prompt goes in, converted to tokens, chug chug chug chug, a token comes out (I think that's the exact sound it makes in the data center: bloop), just one token, and then it does it in a loop. That's why it looks like it's typing. So that's what's going on. Let's take a look at this particular run. The first thing to look at is that you can actually see the trace of the entire thing, and the first thing that happens is I am doing an embedding of the question.
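The one-token-at-a-time loop described above can be sketched in a few lines. The "model" here is a canned stand-in, purely to make the loop shape visible; a real LLM runs a full forward pass for every single token it emits:

```python
# Toy sketch of autoregressive generation: the model emits ONE token,
# that token is appended, and the loop goes around again until a stop
# token. toy_next_token is a fake model that replays a canned answer.
def toy_next_token(generated: list[str]) -> str:
    canned = ["We", "have", "two", "jackets", "<stop>"]
    return canned[len(generated)] if len(generated) < len(canned) else "<stop>"

def generate(prompt: str) -> str:
    generated: list[str] = []
    while True:
        tok = toy_next_token(generated)  # one token per full model pass
        if tok == "<stop>":
            break
        generated.append(tok)            # output becomes input next time
    return " ".join(generated)
```

Because each iteration is a full pass through the model, the visible "typing" effect is just this loop running, token by token.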
The input is "tell me about your jackets," and I'm using text-embedding-ada-002 via this Azure OpenAI call, and you can see in this trace that the output is this gigantic vector. It's not going to show it to me... oh, there you go, this gigantic vector. What this is doing is effectively projecting my question, "tell me about your hiking jackets," into a different space, and that space is vector space. It's converting the question into x1, x2, blah blah blah, all the way up to however many dimensions there are; it looks like the embedding dimension is 1536. So that's what this is doing, and then what happens is we take this vector and search it against Azure Cognitive Search to retrieve the vectors that are closest to it, and those vectors represent documents. Now, the reason you use vector search for this, and this is important, is because you only have a finite amount of space inside the prompt: four thousand tokens for GPT-3.5 Turbo. The cool thing about vector search is you can do something called chunking, which takes all of your information, chunks it into little bits, and then assigns a vector to each bit, because you take that little bit and push it through the same embedding API. So you get a bunch of vectors. Once you index all your stuff, the index is just like a table: fact chunk and vector, fact chunk and vector, fact chunk and vector. That way, when I ask a question, its vector goes to Azure Cognitive Search and pulls the closest chunks out. Let me un-chat-ify this. So that's what this node does: it embeds the question into vector space, and then the next node goes into Azure Cognitive Search and pulls documents out.
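The chunk-embed-index-retrieve pipeline described above can be shown end to end with toy pieces. The real flow uses text-embedding-ada-002 (1536-dimensional vectors) and Azure Cognitive Search; here a bag-of-words "embedding" and brute-force cosine similarity stand in so the mechanics fit on one screen:

```python
import math

def toy_embed(text: str) -> dict[str, float]:
    # Stand-in for the embedding API: word counts as a sparse vector
    vec: dict[str, float] = {}
    for word in text.lower().split():
        word = word.strip(".,!?")
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The "index": a table of (fact chunk, vector) pairs, exactly as described
chunks = [
    "The Summit Breeze jacket is lightweight and windproof.",
    "Our TrailMaster tent sleeps four people comfortably.",
    "The RainGuard hiking jacket is waterproof with a hood.",
]
index = [(c, toy_embed(c)) for c in chunks]

def retrieve(question: str, k: int = 2) -> list[str]:
    # Embed the question, rank chunks by similarity, keep the top k
    qv = toy_embed(question)
    ranked = sorted(index, key=lambda cv: cosine(qv, cv[1]), reverse=True)
    return [c for c, _ in ranked[:k]]
```

Asking `retrieve("which jacket is waterproof")` surfaces the two jacket chunks and skips the tent; only those top chunks get spent against the 4k-token prompt budget.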
There's a built-in Embedding tool already; see, I'm using the old stuff. And you can see we also have a Vector DB Lookup tool, but I wanted to go fully artisanal, you know: free-range, artisanal, handcrafted prompts. In theory, though, you can also create your own tool. Oh wow: "hi everyone, I believe an AI has showed up. Hello robot overlords, I knew you would arrive sooner or later." I don't know who that is, but I thought it was funny, so I read it. Okay, so you can see those tools are just built in, so you can totally use them; you would basically just drag one on. But I wanted to show you the artisanal way, and also these tools didn't exist when I built this the first time. A question from Marco, my main man: "Do you need to use vector search to make this work, or will it work with semantic search?" It turns out it'll work with just about anything; I'll answer that in a second, great question. You can see all this node is doing is pulling from the documents index in Azure Cognitive Search: it's passing the question text as well as the vectors for searching; we're searching on the content field for the text, but on the embedding field for the vector. So that's what it's doing. Now, the model itself is not learning from this; the model is static. And then, finally, we take this prompt... why is word wrap on by default? Notice what it says: "You are the Contoso Trek assistant. You are helpful and friendly and respond only to questions about products and services. This is what you know." And what I'm doing is iterating through every one of these documents and also putting in the chat history. So when you see this thing running, you can look at each node's inputs and outputs.
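The prompt assembly just described (system message, then each retrieved document, then the chat history, then the new question) is done with a Jinja template inside Prompt Flow; a plain-Python equivalent makes the shape explicit. The field names here are illustrative, not the exact template on screen:

```python
# Sketch of the prompt-assembly step: iterate over retrieved documents
# and prior chat turns, concatenating everything into one prompt string.
SYSTEM = (
    "You are the Contoso Trek assistant. You are helpful and friendly "
    "and respond only to questions about products and services. "
    "This is what you know:"
)

def build_prompt(documents: list[str], chat_history: list[dict], question: str) -> str:
    lines = [SYSTEM]
    for doc in documents:                 # inject every retrieved chunk
        lines.append(f"- {doc}")
    for turn in chat_history:             # replay prior turns verbatim
        lines.append(f"user: {turn['inputs']}")
        lines.append(f"assistant: {turn['outputs']}")
    lines.append(f"user: {question}")     # the new question goes last
    return "\n".join(lines)

prompt = build_prompt(
    ["The Summit Breeze jacket is lightweight."],
    [{"inputs": "hi", "outputs": "Hello! How can I help?"}],
    "tell me about your jackets",
)
```

Everything the model "knows" for this answer is right there in the string; nothing else reaches it.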
For retrieve_documentation, the input is the vector and the output is the content: two documents about hiking jackets. So, to answer the question about searching: no, you don't need to use vector search at all; you can do whatever you want. Effectively, what you're doing is injecting data into the prompt so that the likely answer that comes back is actually grounded in the truth, which is the documents you retrieved. That's a great question, Marco. Another question: "So what would you need to do to test this thing and score its perf against two different models? It'd be nice if you could feed the choice in as a variable when you run the test." That's a great question about how we test the thing; I'll get to that. Another question, which I think is a good one (let me bring it over here, because I'm old): "Which is better, raw document search with vectors, or extracting information to a DB like Cosmos and searching from the DB?" It really is up to you, your infrastructure, and where you store your facts, because, if you recall (and if you watched the first video, where I basically motivated what the heck this is doing), the only thing you can put into these models is a prompt, and if you want to maximize the likelihood of the truth being told at the end, you basically have to put in as much relevant information as you can as the prompt. So it really depends on your infrastructure: where do you store your stuff? It might be in SharePoint; that doesn't mean you can't use it, but you are constrained by the size of the prompt. Another question: "Are you using LangChain with Azure AI Studio here?" Currently I'm not, but you can; it's not out of the realm of possibility. In fact, let me show you another prompt flow, because I want to make sure folks recognize that this is just basically code and you can put whatever you want in here.
Let me open another one. Here's another prompt flow: hlp manufacturing... no, no, not this one. Which one is the one I did? Is it this one? No. I have one that uses LangChain... is it this one? I don't know where it is; I wish I would have labeled it. What's wrong with me? Yes! Here's one using LangChain. Notice that the graph looks a little different, because with LangChain you basically inject the prompt in, so you can see several prompts going into this product_rag node, and product_rag, when you look at it, is using all LangChain stuff. You see that? It's using an LLMChain and then calling chain.run. So really, we don't care what you use at all. The cool thing is that when you run it, you can take a look at the trace to see where things are taking a long time, and you can look at the inputs and the outputs; that's the cool bit of this thing. And, by the way, a little fun fact: the way you can even get sub-traces is you just add the tool decorator. Great question... what were we doing? Okay, so let's take a look at this thing. There's the prompt, and finally this prompt is injected into the LLM, and what you're seeing here is me saying "please be extra brief." It turns out that the last thing you show the LLM, for some reason, it really loves. So you can see: please be extra brief, extra brief. Yeah, "what is this? This is really long." So now, when I run this again: in theory the model takes about six seconds, but when you run it inside Prompt Flow in debug mode it takes a little longer, you know, to assemble all the stuff. Okay, cool; notice that it's extra brief: our jackets are top quality. Look at that.
Look at you being extra brief. "Our jackets are top quality and perfect for outdoor activities like camping and hiking. They are waterproof, windproof" — fire? I don't know about that — "and made with breathable fabric. We have a variety of styles." So, as you go through this, you can again see the whole input and output. There is retrieve_documentation; its output is the jackets stuff. But you'll find there's a flaw in my design, and we'll try to fix it as we go; this is totally demo-ware. Let's go to the trace again: once we build the prompt, this is the prompt that's being built. This is the input to the prompt node, and this is the output: "You are the Contoso Trek AI assistant..." You can see the wall of text we're sending it; notice it's going through all of this stuff, putting in the answer, etc., and that's how you get this output. That's how this works. All right, let's take a look at the questions. This question: "That means before embedding, we can incorporate a translation API to translate the prompt to English by detecting the input language, and then..." Yeah, you can do whatever you want; this is the thing with Prompt Flow. Now, again: unpopular opinion time. I'm not a fan of some of the LLM frameworks, because they obfuscate the actual task, which is creating a prompt. That's all you're doing. In my opinion, again, this is my opinion, all you're creating is text so that the model returns the best response, and I don't like some of these frameworks that hide that very important task. I don't know what you think; it's just my opinion, I've had it for like two months, since I've seen them, and it's not popular even amongst folks that I work with, and that's okay; we all don't have the same opinion.
Here's another question: "Can you show examples of storing answers of the LLM and then reusing/recalling them back into the LLM, agent stuff?" Yes. It turns out that these LLMs, by default, do not store anything at all, so the chat history is stored by the caller. For example, notice that in this case the chat history is being stored by Prompt Flow as an exercise in debugging; when you export this as an API, it is the responsibility of the calling application to store the chat history. In that case, some of these frameworks are good for that, because they have things called memory, etc. So that's a good example. "Even easier, if there's a demo to translate, show the translation." In effect, what you would do (oh, you can't see my face, let me move this out of the way) is build a little language-detector node, and then have another node, once it has the answer, do a translation, and then wire it in here. Yes. "Can more than one person work on the flow currently?" That's Ivana, welcome, bienvenida. Can more than one person work on the flow? Holy cow, I'm running out of time. Yes, but it's not nice right now; it's going to be super nice, give me a couple of weeks. "What's the best way to integrate Prompt Flow with SQL Server or an SAP DB to query data?" Oh, see, love it; these are all great questions. At minimum, if you couldn't do anything else, you would create your own environment that installs the SAP client, so you might have a pip install, and then you would write code to pull the data out. I'll show you in a second how to do that with Cosmos DB, and that's the same idea. We'll delete that step.
"Is there a GitHub repo that contains the export of the prompt flow and the examples that you're showing?" Not yet, but they told me we don't have a show planned for next week, so I might be doing a part three; maybe we can do that. That's coming soon. "Is the chat history sent every time?" Yes, and the reason why is that the chat history takes up valuable tokens (that was my bad), so sometimes you need to think about how much of the chat history to put in there to give the model context, and that's something you handle in the calling application. Let me just show you one that I've already done. Here's an endpoint: hlp manufacturing... okay, cool. This is one that's already in deployment; in this case we have a really cool chat API already built in, so that's not helpful for the demo. Let me do contoso-copilot; this one does not have the helpful chat API, so you would basically pass in an array that has the history. Oh geez, I'm showing you stuff and not even sharing the screen; one second. The way I got to this is I went to Endpoints, then to the contoso-copilot endpoint, and then to Test: you can see that I can test it, and you're basically doing a POST with this payload. When I test it, you'll see it needs to warm up; give it a second, I'm drinking a drink here... unless I broke it. So essentially what's happening is you pass in the history as an array to the actual API call. There it is, thank you, sir. Okay, so that's a good question.
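Since the history is re-sent on every call and tokens are finite, the calling application usually trims it to a budget before each request, as described above. A minimal sketch, with word count standing in for a real tokenizer and the turn shape (`inputs`/`outputs`) assumed for illustration:

```python
# Sketch of the calling application's job: keep the chat history,
# resend it every call, but trim it so it doesn't crowd out the
# 4k-token context window. Word count approximates a tokenizer here.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def trim_history(history: list[dict], budget: int) -> list[dict]:
    """Keep the most recent turns that fit within the token budget."""
    kept: list[dict] = []
    used = 0
    for turn in reversed(history):       # walk newest turns first
        cost = count_tokens(turn["inputs"]) + count_tokens(turn["outputs"])
        if used + cost > budget:
            break
        kept.insert(0, turn)             # restore chronological order
        used += cost
    return kept

history = [
    {"inputs": "hi there", "outputs": "Hello! How can I help you today?"},
    {"inputs": "tell me about your jackets", "outputs": "We have two jackets."},
    {"inputs": "which is waterproof", "outputs": "The RainGuard jacket is."},
]
recent = trim_history(history, budget=12)  # only the newest turn fits
```

The trimmed list is what gets posted as the history array on the next call; older turns simply fall off.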
when I ask this, you're going to see that it just gives me garbage in the response. Let me move myself out of the box so you can see. This is going to respond with garbage... oh wow, I guess not; in this case it isn't. But let me show you something: the model is smart in this particular run. Let's see here: this is the text that gets embedded into vector space, and the search returns essentially random documents, so that's something we need to think about. There's a comment here: "by the way, it's very slow." Three seconds is how long these models take to respond; that's not an us thing, that's just how long these models take to run on GPUs. GPT-4 probably takes 12 to 13 seconds to respond. That's just the nature of these models; I don't think you'll find them any faster, but they're certainly a heck of a lot faster than humans, how about them apples? But you're right, you will definitely see this, and about three seconds is what you can expect. Great question, great observation. I think because of streaming it looks more immediate (I don't know if this one has streaming enabled, actually, but I think we just turned streaming on), because time to first token is not three seconds, and streaming makes it feel a little faster. We'll have to look at that. Again, it's all real here; I'm not going to fake stuff, etc. So let me go back so you can see more. Notice again the important thing: I basically injected a bunch of bogus documents into the prompt, and this is an important thing,
because now we need to figure out how to fix that, so maybe you all can think about it and we can figure it out next time. Okay, so now the question is: how good is this, and how do we test it? That's what bulk tests are for. All right, I'll save it... oh geez, this might be a bug, we'll need to check it out. Did it save? Fine, I'll run it anyway. Let's take a look at the data, because what we need is a dataset for testing this stuff. And what does a dataset mean here? Let's see: contoso-trek-intent-explore. This is where you go for datasets. Here's what these datasets look like, but I've got to make a better one; I thought I already had one. Let's go back: Small Contoso Outdoors, was this me? Yeah, here's an example with a question and some history. Okay, stop; we'll hit play. So how about we make a brand new one for testing and open it up in Visual Studio Code: File, Open Folder, make a new folder called "pf test". Then we create a new file and call it our test JSON file, "test contoso chat". Again, we're working on making this better. Let me double check what we have: looks like we only have a question; eventually we're going to add a customer ID, so let me put that in there now with a value like 2. Now that we have this, I'm going to make a bunch of questions. The other thing that's important is that we need to match the inputs, so we have a customer ID, a question, and then a chat history. Again, it's early, we're going to make this better, but I want to show you how you do it now. So the first question is: tell me about your hiking jackets. How about: do you have any climbing gear?
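(Aside: a test dataset like the one being built above is typically JSON Lines, one example per line, with fields matching the flow's inputs. The exact field names and filename below follow the demo loosely and are illustrative, not a Prompt Flow requirement.)

```python
# Sketch of a bulk-test dataset as JSON Lines: each line is one
# example whose keys match the flow's inputs (customer_id, question,
# chat_history).

import json

examples = [
    {"customer_id": "2", "question": "Tell me about your hiking jackets",
     "chat_history": []},
    {"customer_id": "2", "question": "Do you have any climbing gear?",
     "chat_history": []},
    {"customer_id": "2", "question": "Can you tell me about your selection of tents?",
     "chat_history": []},
    {"customer_id": "2", "question": "Do you have any hiking boots?",
     "chat_history": []},
    {"customer_id": "2", "question": "What gear do you recommend for hiking?",
     "chat_history": []},
]

with open("test_contoso_chat.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```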
What else should we ask? Let's do five, because me making this stuff up in front of you is probably going to be boring. Do you have any climbing gear? What else do I know they have? Can you tell me about your selection of tents? Do you have any hiking boots? And finally: what gear do you recommend for hiking? Oh, I spelled "hiking" wrong, but that's okay. So now I've built a set of questions that I want to ask the model in order to test it. Let's go to the bulk test. Yes, run anyway. We need to upload some new data, and we're going to call it "contoso chat store test". We'll browse, go to Projects, then "pf test", then select the test file and add it. Very nice. Now we hit Next, and we're just going to bulk test without evaluation for now, because I want to show you some things first. Hit Next, Submit, and pick the "prompt flow ACS Cosmos" runtime. What this is going to do is run all of these questions against this particular prompt flow. When we go to View bulk runs, we should start to see this bulk test, and when we go to Outputs, hopefully you'll see some of the answers. Do you see that? So now I've taken this thing that I think works and... wow, this is not a good answer: "catalog mountain item?" Is that the answer? That's not very good. "We have a variety of tents available," okay, cool. "Yes, we have hiking boots," that's helpful. "What do you recommend for hiking?" Hiking shoes, backpack, weather-appropriate gear. Okay, so these answers are not very good, which is something we want to improve. So now the question is this: let's take a look at one of these individual runs to see what
happened. Here you can see the actual inputs and outputs. Here's the question, "do you have any climbing gear", and here's the question embedding. Let's look at the outputs: in retrieve, this is the documentation that was returned for the question. And what was the question again? "Do you have any climbing gear." Oh man, I was getting confused: it looks like it's returning the Alpine tent as climbing gear, and then the Summit climbing backpack. "Do you have any climbing gear?" Not very good. Maybe we just don't have any, and that's what it's suggesting. There are a couple of things wrong with what I'm doing right now, and obviously I built this for demo purposes, but if we want to make it better, the first thing you'll notice is that I'm not doing chunking at all, and that's a problem. If I did chunking, I'd be able to return exactly the bits of information that I wanted. So no chunking is problem number one. Okay, let's go back and examine the other runs. "Do we have any hiking boots?" "Yes, we have hiking boots." "What do you recommend for hiking?" I was hoping for products. Now that I'm looking at this, it's not as good as I would have liked. So let me go back to this Contoso chat store prompt and change it: "You are a store assistant. Only respond to questions about products and services found in the documentation below. This documentation should be used in the response; respond based on the context, but rely primarily on the documentation." Okay, let's see what this does. We'll save it and run the bulk test again. Yes, run anyway. Next. No evaluation for now, because
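(Aside on the chunking problem called out above: a minimal sketch of fixed-size chunking with overlap. Real pipelines usually split on token counts and document structure; this character-based version just illustrates the idea and is not how Azure AI Studio chunks documents.)

```python
# Minimal sketch of fixed-size text chunking with overlap. Overlap
# keeps sentences that straddle a boundary retrievable from either
# neighboring chunk.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` chars
    return chunks

doc = "The Alpine Explorer Tent sleeps four. " * 20
chunks = chunk_text(doc)
```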
we're doing the eyeball test right now. By the way, I'm not ignoring your questions, so let's go back to some of them while it runs. "I think the document has to be parsed properly." Not really; I mean, you can parse it, but it's all just sent as one big chunk to the LLM. Oh, I see what you're saying: chunking. Yep, totally right, I'm not chunking. "I missed the first part." Me too. "How many characters or tokens are recommended for optimal chunking size for accuracy?" That's really something you have to eyeball-test and then properly evaluate. Oh shoot, I've got to play the walk-off music already; that went by really fast. (By the way, I play the walk-off music a couple of minutes early just so I know I need to hurry up.) How many tokens for optimal chunking? I don't know; that's something you have to test and figure out. "Then what do you recommend for hiking?" Stamina, love for the outdoors, and some Kendal Mint Cake. That was awesome. "In this scenario, where the response is correct for most questions, is step one just including few-shot examples to improve the results on climbing gear?" Yes, but you want to do it carefully. You notice how I created a set of questions; imagine that set as your first eyeball test: these are the questions I want it to be able to answer. The problem is, if I give it few-shot examples for climbing gear only... oh, I see what you're saying, yes. You can totally give it examples, and that's what I was starting to do as I refined my prompt a little. I shouldn't have done it that way, though; I should have done it differently, and I'll show you that next week. Great question. And yes, you can totally update the climbing-gear question. "It seems that the standard structure
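(Aside on the few-shot question above: one way to avoid tuning the prompt for a single intent is to assemble examples covering several intents into the system prompt. The wording and function below are illustrative, not the actual Contoso prompt.)

```python
# Sketch of assembling a system prompt with few-shot examples spanning
# more than one intent, so the prompt isn't biased toward one
# question type (e.g. climbing gear only).

FEW_SHOTS = [
    {"question": "Do you have any tents?",
     "answer": "Yes, we carry the Alpine Explorer Tent from our catalog."},
    {"question": "Do you sell kayaks?",
     "answer": "I'm sorry, I don't see kayaks in our documentation."},
]

def build_prompt(documentation: str) -> str:
    lines = [
        "You are a store assistant. Only answer questions about",
        "products found in the documentation below.",
        "",
        "documentation:",
        documentation,
        "",
        "Examples:",
    ]
    for shot in FEW_SHOTS:
        lines.append(f"user: {shot['question']}")
        lines.append(f"assistant: {shot['answer']}")
    return "\n".join(lines)

prompt = build_prompt("Alpine Explorer Tent: a 4-person tent.")
```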
for LLMs is markdown; have you explored HTML-like tags, and does that perform better?" I don't know, and this is the great thing about Prompt Flow: as you've seen, I'm starting to get into evaluation and testing to make sure it's working. I went from eyeball testing with just the chat to bulk testing, and this is where the rubber hits the road of actually making the prompt better. "Could Prompt Flow be used to test LLMs with real-time data?" Yeah, that's exactly what I was doing: when I ran the bulk test, it actually called Azure Cognitive Search and whatever else you put in there. All right, for next time, get your questions in too. Next week we'll continue with evaluation, and I'll show you prompt variants, unless they tell me not to; they didn't tell me what show we were doing, so I'm like, fine, I'll just do another Prompt Flow, numero tres. I kind of started editing a prompt, I think you saw that, but I should have introduced the concept of prompt variants, so you can get a sense, as you're testing things out, of which prompts are better and which are worse, which ones to discard, look at some history, etc. We'll start doing that, and then we'll get into metrics, because there were some excellent questions about what to do about metrics. A question here from Jesse, great questions today by the way: can you please explain what Azure AI Studio brings over using VS Code or PyCharm? Absolutely. The reality is: why not both? I'll talk about that whenever I can, but it's just not quite ready yet. "And comparing model perf, please." Yes, performance is great; it actually should show up for
both. Let me see here... I know you can't see it, so apologies, but it's not showing perf, which stinks. Oh hold on, let me share my screen. Yeah, it's not showing it; we might need to show perf in there, and I don't know why we don't. Great call-out, so we'll add that, hopefully. Okay, next question: when to use Semantic Kernel versus Prompt Flow? It doesn't matter, right? There's a Python version of Semantic Kernel; think of Semantic Kernel the same way you would think of LangChain: it's a prompt orchestrator. The thing that brings value in Prompt Flow is the testing, the evaluation, the history of tests, and then deployment and monitoring afterwards; that's what the full Prompt Flow brings you. Semantic Kernel is awesome, LangChain is awesome; if that's something you like using, great. I'm not a fan of orchestrators that hide the actual task, but that's just me. Last question: how far over the hour are you going? Not a minute more, Janet, SKU number seven. Thanks so much for being with us. This has been another episode of the AI Show. As always, this show is all about you, so if you have any questions, comments, or things that you want to see, please let us know. We'll see you next time, my friends. Thanks so much, thanks for being with us, we'll see you next time.
Info
Channel: Microsoft Developer
Views: 6,398
Keywords: Microsoft, MicrosoftDeveloper, Developer, AIShow, AIShowLive, AI, DataScientist, Beginner, MachineLearning, Azure, AzureAI, promptflow
Id: roy4IFV-nFQ
Length: 67min 14sec (4034 seconds)
Published: Tue Aug 22 2023