#1 Create your own Copilot using Semantic Kernel (Reactor series- GenAI for software developers)


Captions
Hello everyone, and welcome — good morning or good evening, wherever you are joining us from. We are very excited to start the new series at the Reactor today: Generative AI for Software Developers. I'll talk about the series shortly, but first I want to introduce the code of conduct for our events. Microsoft Reactor follows the Microsoft code of conduct, so please take a minute to read it — you can scan the QR code to view it online — and please be respectful in the chat. A quick introduction: my name is Morad, I'm a regional Reactor PM at Microsoft. With me are Jordan, the events and program manager at the Reactor, and Adin, also from the regional Reactor team at Microsoft. The three of us worked hard on the content for this series together with the speakers, and we really hope you enjoy it.

A bit about the Reactor: it is a technical hub for developers, startups, and all kinds of tech professionals. We offer free programming and learning opportunities for anyone who wishes to learn about any tech area, and recently we have been focusing more on AI; you can find our events on our website, which I'll share shortly. The Reactor is powered by Microsoft Learn, a free learning platform from Microsoft where you can find a huge range of learning paths, modules, certifications, and more, which you can take at your own pace, from anywhere, whenever it suits you. You can scan the QR code on the screen — we'll also share it in the chat — and browse all the learning options: learning paths ranging from 30 minutes to 16 hours, certifications to become Microsoft certified, and all the different live and recorded events we have there. You choose your module, the content is organized by category, and it tells you exactly how many minutes each one takes to complete — highly recommended. The next QR code leads to the Learn modules focused exclusively on AI, the AI Learning Hub, where you can find all the AI modules and learning paths offered by Microsoft. And with this QR code you can see today's event and all the upcoming events the Reactor offers worldwide — any time zone, anywhere in the world — and you can subscribe to our events and newsletter to stay updated.

Back to today's series: Generative AI for Software Developers. What's unique about it is that it comes from senior software developers in Microsoft's R&D center, talking about the products and areas they are working on — from them directly to you. Today's session is going to be
focused on how to create your own copilot using Semantic Kernel. The second session, next week, is about Copilot; the next one is Documents as a Service; and the last one will be on LangChain. This is the first part of the series — we will continue it next month — but so far we have those four confirmed events, so I hope you enjoy today's session, and make sure you sign up for next week's session, same day, same time. Next week we are also kicking off another series at the Reactor called Spotlight on Application Innovation and AI; if that is relevant for you, scan the QR code and check our website to find out more — it starts on the 27th of February. So, without further ado, I'm handing over to our speaker, a senior software development engineer at Microsoft, who will introduce today's session. If you have any comments, feel free to post them in the chat. This session is recorded and will be available on the Reactor's YouTube channel — we'll share the link in the chat — and feel free to ask questions; we will take time at the end to answer anything you may have. That's it for me — enjoy, and over to you.

Thank you, and thank you everybody for joining the session. I'm going to talk about Semantic Kernel and how we can use it to build our own copilots. AI — LLMs — have revolutionized the way people interact with computers. Today it's very hard to find someone who is not using an LLM in some capacity, and I think LLMs are the kind of technology we encounter once or twice in a lifetime, one that really disrupts the current technology; the last time I saw something like that was when the first iPhone, the first smartphone, was introduced. LLMs are revolutionary in that way: they provide AI abilities for the masses — everyone is using LLMs and AI, even my parents, even young students in school. For us as programmers it affects us in multiple ways. First, we can be more productive by using AI to find bugs or to help us generate code. Second, we have to adapt: AI is going to be used more and more, and we have to integrate it into our applications in order to stay relevant and compete with other products.

So what are the objectives of this session? First, I'm going to cover the basics of Semantic Kernel: what it is, why we need it, and what it provides. Second, I'm going to show how we can use Semantic Kernel to integrate AI abilities into our applications — it's really easy. We will also look at how we can compose prompts programmatically; I'm not talking about prompts that we as humans type into ChatGPT, but prompts built programmatically with a template engine, where we can provide variables and parameters and manipulate them. And lastly, I'm going to show how easy it is these days to download an LLM — a small one, so it fits in memory — onto your own
machine, or into your cluster, run it, and integrate with it directly through Semantic Kernel. It's that easy. Before we start, a few words about myself: my name is Moaid Hathot, I'm a software engineer at Microsoft, currently working on Azure. Before joining Microsoft I was an Azure MVP. I'm the co-founder of CodeDigest, a local user group here in Israel, the co-founder of NetBond — a community inside Microsoft by .NET developers, for .NET developers — and I'm the author of the Dumpify package, so check it out.

So how can we become AI developers? Today we don't need a PhD. Historically, when we needed to add AI to our applications, we had to have deep AI knowledge, train models on data specific to our needs, maybe have a degree in AI — it was more complex and time consuming. Today it's easy, because LLMs are easy to use, publicly available, and considerably cheap, so everybody can use them. To be an AI developer these days we don't need to reinvent the wheel each time: we can just use an LLM, and we can integrate with LLMs programmatically using tools such as LangChain or Semantic Kernel, as we will see in this session.

Before we start, we do need a bare minimum of knowledge to squeeze every bit of potential out of LLMs, so let's do a really quick introduction to what an LLM is and which bits matter for this session. An LLM is a neural network trained on a huge data set — in OpenAI's case, literally a snapshot of the entire internet. One property of LLMs is their parameters: you can think of parameters as the way an LLM stores and processes information, so the more parameters an LLM has, the more complex the processing it can do. If we have two LLMs and one has many more parameters than the other, that doesn't automatically mean it's better, but there is a correlation between the number of parameters and how complex the processing can be. LLM, of course, stands for large language model. Another property of LLMs is that they are stateless, or have only a short-term memory. We can add context — and we will see how — but basically they are stateless and only know about the data set they were trained on; if we need to provide more data, we have to supply it ourselves, and I'll show how. Lastly, LLMs are probabilistic engines: an LLM tries to figure out the next word expected in the result, and there's a lot of statistics and probability behind that. We won't go into it, but we will see how to change some of these parameters when working with Semantic Kernel. If you look at the chart on the right: in 2019 the GPT-2 model had 1.5 billion parameters, which sounds like a lot, but compare it to 2020, only one year later, when GPT-3 had 175 billion, and to 2023, when GPT-4 reportedly has around 1.4 trillion parameters. So the number of parameters — and how complex a computation an LLM can do — is improving
exponentially. LLMs are getting better, and we are only starting to squeeze out their potential, so the future is really bright. So how do we interact with an LLM? We use natural text, usually in English — other languages work too, but English is the main one, since a huge chunk of the internet is written in English and, as I said, OpenAI's models were trained on a snapshot of the internet. We provide input to the LLM as natural text, and the LLM outputs natural text containing the result. For that to work there's a process called tokenization: the LLM tokenizes the input we provide, and it also tokenizes the output. What is tokenization? A token can be a single character, a fraction of a word, or an entire word, and there is a correlation between how common a word is in the English vocabulary and the number of tokens: a common word is usually represented with fewer tokens. We can see an example of that on OpenAI's Tokenizer page, where we can type words or sentences and it shows how many tokens they use. If I enter a single character, say 'a', it uses one token; even 'abc' is one token. If I add a letter the model wouldn't anticipate after 'abc' — something that is not 'd', say 'abck' — now it's two tokens. And although longer words tend to use more tokens, a long word doesn't necessarily need more: 'difficult' is two tokens even though it has nine characters, 'John' is only one token since it's widely used, but my own name is two tokens. There are different algorithms for creating tokens — this one is specifically for the GPT-3.5 and GPT-4 models; other LLMs may use other algorithms.

Tokens are really important, because LLMs have limits on how many tokens we can send in each prompt and how many they can generate, and billing is also partly based on tokens, so keep track of how many you are using: fewer tokens is usually cheaper, and processing may even be faster in some situations. According to OpenAI, statistically speaking, every 100 tokens is about 75 words. There are libraries like tiktoken — and support you can use with Semantic Kernel — that can take a sentence and tokenize it. It's really easy to use LLMs; we don't need to host one on our own machine, we can use an LLM as a service. For example, we can pay OpenAI and use their models, or Google's Gemini, and Microsoft has the Azure OpenAI Service resource, which I'll show you how to create and use in this session — and of course there are others. So let's see how we can use Azure OpenAI Service. What's great about it is that, in contrast to OpenAI and perhaps other LLM-as-a-service providers, Azure OpenAI Service won't save your data and won't retrain models on it — your data stays confidential. For example, it's really not safe to send confidential data to OpenAI's direct APIs, since
according to their terms they may retrain their LLMs on data provided in user prompts; that won't happen with Azure OpenAI Service, which is really great. Let me show you the Azure portal, where I've already created one. Here we have a resource group I called openai-demo, and in it an Azure OpenAI resource. If we click on it, we can see it's a regular resource — it has an ARM ID, we can deploy it and interact with it — and what's important here is Keys and Endpoints: the endpoint is the address on the internet where we access the resource, and the keys are the secrets we need to authenticate and authorize.

Generally, though, we don't really work with the Azure OpenAI resource in the Azure portal; there's another portal — let me connect to it really quickly — called Azure OpenAI Studio, which we'll use for this demo. The most important menu here is Deployments: deployments are the actual models we have deployed and can interact with. I have three here, and I can create more. Currently the Azure OpenAI resource requires some onboarding — there's a link in my slides you can use afterwards, and it's also in the Azure OpenAI documentation on Microsoft Learn. Here are the models enabled in my subscription; in yours you may have other models enabled, so what you see can differ from this — in fact you will very likely see more models than I do, since I'm using a limited set. We can choose from the GPT-3.5 Turbo models and GPT-4; most of the examples I'll run on GPT-4. After we create a new deployment we can see the differences, for example the rate limit — how many tokens per minute we can send — and there's a really big difference between the GPT-3.5 models and the GPT-4-based models, and also in pricing.

Before we get to pricing, we can test each of our models in the Chat menu. We can write anything — for example 'the sky is' — and on the right choose which model to use; this time I'm using GPT-4. I can also play with the parameters (we'll talk about a few of them later), and when I run it we can see it generated a result. What it generated isn't that important, but the chat menu lets us test things and play with the parameters. What about pricing? Newer models — the ones at the top here — are usually more expensive than older ones. The GPT-3.5 Turbo model is relatively cheap and really fast — way faster than all of the GPT-4 models — although it's less accurate, makes more mistakes, and generally produces weaker results. GPT-4 Turbo is the new one; it's even a bit cheaper than the older GPT-4 and it's fast, but not as fast as GPT-3.5 Turbo. This matters, because you will feel the delays, and the consumers of your application will feel them too. OK, so given that we have the Azure OpenAI resource, how can we interact with it? We have multiple ways. One way is using REST APIs: as with most
resources in Azure, there are REST APIs, and all the other SDKs are built on top of them. We can use the Azure OpenAI SDK, or we can use an abstraction — an orchestrator — like LangChain or Semantic Kernel. So let's talk a bit about Semantic Kernel. Semantic Kernel is an SDK that provides an abstraction: a unified set of APIs we can use to interact with different LLMs, plus features such as writing our own plugins, extending the LLMs, and working with memory and embeddings, which we will talk about. It currently supports C#, Python, and Java. Generally speaking there is parity between the C# and Python features — a few features exist in one and not the other — but Java support is very new and a work in progress: you can use it, but as of today it has fewer features than the other languages, though they are actively improving it.

So let's see a demo of how we can use Semantic Kernel. Getting started is really easy: we just install a NuGet package called Microsoft.SemanticKernel — this one here. After that the APIs are really simple, but first we need to configure the endpoint and the API keys. I'm using environment variables to load them; that's fine for a demo, but environment variables are not the most secure option — they are not encrypted, they are stored in plain text, and any process on your machine can potentially read them. I've created a small class called SKConfig — defined in this project, not part of Semantic Kernel — which I use later to configure Semantic Kernel. It has a few properties: the name of the deployment (the name I chose when deploying my model, so it's important to keep track of it), and the key and endpoint we got from the Azure portal. I've also created an interface I use to run the different examples; what's in it isn't really important, but briefly: in BaseDemo there are two methods, InitializeAsync and RunAsync. In RunAsync I create a while(true) loop, read the prompt with Console.ReadLine each time, and send it to whichever class extends BaseDemo. Nothing here is specific to Semantic Kernel; it's just so we can iterate over the demos easily.

The basic Semantic Kernel setup is really simple: we use the CreateBuilder static method on the Kernel type and then a fluent API, very similar to ASP.NET Core — we just call CreateBuilder and AddAzureOpenAIChatCompletion. There is chat completion and there is text completion; the difference is that with chat completion we can also provide history and context, so the AI can remember things we already talked about. I pass the deployment name, the Azure endpoint, and the Azure OpenAI key — as the method name suggests, this is Azure OpenAI chat completion. If I wanted to use OpenAI directly instead of Azure, I could simply exchange the first line for the second: for the OpenAI APIs we only need the model ID and the API key, no endpoint, because it's the same endpoint for everyone. So it's really easy to move between the two.
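As a reference, here is a minimal sketch of the setup being described, assuming the Semantic Kernel 1.x .NET API; the deployment name, endpoint, and key are placeholders rather than the configuration used in the talk.

```csharp
// Minimal kernel setup sketch (NuGet package: Microsoft.SemanticKernel).
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();

// Azure OpenAI: needs the deployment name, the resource endpoint, and a key.
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4",
    endpoint: "https://my-resource.openai.azure.com/",
    apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!);

// OpenAI directly: only a model id and an API key, no endpoint needed.
// builder.AddOpenAIChatCompletion(
//     modelId: "gpt-4",
//     apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);

Kernel kernel = builder.Build();

// Send a prompt and print the answer (InvokePromptAsync is described next).
var result = await kernel.InvokePromptAsync("Tell us about Microsoft in 10 words or less");
Console.WriteLine(result.GetValue<string>());
```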
Whenever I have user input, I just call a method named InvokePromptAsync to run that prompt against the LLM with the configuration I have — currently everything is left at the defaults. Let's run it: dotnet run, and we need to provide the name of the demo, which is basic 00. Give it a second to compile... here, in orange, we can see it found the class Basic00.cs, and it's asking 'how can I help you' — a string I defined for this demo. We can ask, for example, 'tell us about Microsoft in 10 words or less', and it provides an answer. So we just connected to the Azure OpenAI resource, sent it a prompt, and received a response back. Cool — but that's only the beginning.

Let's try to add functionality. Say I want to implement some sort of ordering system, maybe for a restaurant or for our company store. Here I have a prompt that says 'what product is ordered in this request?'. I take the user prompt — whatever the user has written — and concatenate it into that string manually, and on the next line I tell the LLM 'a product can be coffee, burger, water, shawarma, or beer', so the LLM knows what the possibilities are. Before sending the prompt to InvokePromptAsync, I first build the new prompt, which again is just the user prompt concatenated onto the prompt I showed you. If I run it — this demo is called basic 01 order 0 — and say I want to order a burger, which we know is in the allowed list, the result from the LLM is 'based on your request, the product ordered is a burger'. Really cool. But let's try ordering something that is not in the list, say a pizza. This time it notices that pizza is not one of the allowed products — but sometimes it will just carry on as normal and tell you you've ordered a pizza — and notice also that the format of the result is not consistent. Let's make it more consistent so our applications can interact with it. I'll run the next demo, which has a newer prompt: I simply add at the end 'if the product is not in the list, return not available'. I change the demo name, run it, try burger — which we know is in the list — and it works; then pizza, and we get 'not available'. So it's an improvement.
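A rough sketch of that string-concatenation stage, again assuming the 1.x API; the prompt wording follows the talk, but the BuildOrderPrompt helper is an illustrative name, not the exact code from the demo.

```csharp
// Sketch of the ordering demo: concatenate the user's request into a fixed prompt.
using Microsoft.SemanticKernel;

Kernel kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion("gpt-4", "https://my-resource.openai.azure.com/", "<api-key>")
    .Build();

// Illustrative helper: builds the full prompt from the user's raw input.
static string BuildOrderPrompt(string userPrompt) =>
    $"""
    What product is ordered in this request: {userPrompt}
    A product can be coffee, burger, water, shawarma or beer.
    If the product is not in the list, return "not available".
    """;

var answer = await kernel.InvokePromptAsync(BuildOrderPrompt("I want to order a burger"));
Console.WriteLine(answer.GetValue<string>());
```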
As I said, though, even with that instruction it will sometimes still just give you a product, because our prompt doesn't pin down exactly what to do in every case — so let's make it better still. This time the prompt says: if the product is not in the list, return 'not available', as we saw in the last demo; if the user has requested multiple items, just return 'bad request'; and lastly, return the result in a JSON format. Notice that I can put placeholders inside the prompt, and the LLM understands that it should replace the placeholder with the actual value when generating the JSON. That is very convenient, because it's really easy to deserialize JSON into C#: I can just write a new type that receives the values from the JSON, and interacting with the result becomes much easier. Let's try everything we did so far: if I specify burger, it returns a product of 'burger' — really nice, because it's pure JSON; if I request pizza, it says not available; and if I ask for a burger and a beer, we get bad request. Of course we could tune it further so the results are even easier to parse, but it's working great and it's easy to customize however we want.

The problem is that, as you've seen, these strings are getting hard to maintain, and once we have many variables, string concatenation and interpolation become a pain. Semantic Kernel has a feature that makes this easier, called prompts. It had an older name — in older versions of Semantic Kernel it was called semantic plugins or semantic skills — but in newer versions it's just called prompts. The way it works is that we have a folder — I called it Prompts, which is the current convention — and inside it we have prompt plugins: predefined prompts used with a template engine. Here I have a prompt called Order, and it has two files. The first is skprompt.txt — SK for Semantic Kernel; that's the convention, and both files are expected to exist — and in it I've basically written the hardcoded prompt we saw earlier. I made the list of products a bit easier to maintain (we saw that even on a single line the LLM understood it), and the rest is identical. The second file, config.json, is the configuration for the LLM when this prompt is sent. First there's the description. The description is very important — we'll see later how LLMs use it to understand when and why to invoke this plugin — so provide a good description for both the prompt and its variables. Then there's the list of input variables, where we specify each variable's name, its description, and whether it's required, meaning we can also have optional values or parameters. And then we have the execution configuration.
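For orientation, here is roughly what such a prompt folder might contain; the exact config.json field names vary between Semantic Kernel versions, so treat this as an approximate sketch rather than the files used in the talk.

```
Prompts/Order/skprompt.txt
--------------------------
What product is ordered in this request: {{$order}}
A product can be: coffee, burger, water, shawarma, beer.
If the product is not in the list, return "not available".
If the user requested multiple items, return "bad request".
Answer as JSON, for example: { "product": "burger" }

Prompts/Order/config.json
-------------------------
{
  "schema": 1,
  "description": "Extracts the ordered product from a customer request",
  "execution_settings": {
    "default": { "max_tokens": 256, "temperature": 0.0 }
  },
  "input_variables": [
    { "name": "order", "description": "The user's order request", "is_required": true }
  ]
}
```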
Let's talk about what's important in there. First, max tokens: the maximum number of tokens to generate for this prompt. The tokens generated plus the tokens of the request itself must not exceed the maximum the LLM allows for input and output, so this matters, and it's one way to keep a prompt bounded. Second, temperature. As we said earlier, LLMs use probabilities to predict the next word: there's a list of candidate words, each with some probability, and usually the word with the highest probability is chosen; the candidates and their probabilities change according to the prompt and what has already been generated. Temperature is a value between zero and one. At zero the LLM is consistent — given the same prompt it should return the same result, because it always takes the highest-probability word. A higher temperature lets it pick different words, so running the same prompt several times can give different results; the effect is that the LLM looks more creative. With a high temperature you can get results with less fidelity that look very creative — good for generating poems, not for parsing data we want to keep consistent. There are other parameters here I won't cover in this session — for example top-p, which trims some of the candidates — and you always need to play with these to fine-tune your prompt, but I won't dig deeper into it now.

Given all that, let's see what we can do. In the next demo, prompt 00 order 00, I'm using a new API called CreatePluginFromPromptDirectory: I provide a directory, and Semantic Kernel loads all of the plugins — all of the prompts — in that directory. After that I can pick which plugin to use. In HandlePrompt I build a KernelArguments object — a type defined in Semantic Kernel — and with its indexer I provide all the variables my prompt is going to use; I already know it has an input variable called order, and I give it the user prompt. Later we'll see how Semantic Kernel can call plugins and prompts automatically, but for now I'm showing the deterministic way: we select a plugin or prompt and ask the kernel to call it directly. Behind the scenes, my BaseDemo class keeps a dictionary of the plugins we've loaded; I read the plugin called Order from the prompt directory and pass it to the InvokeAsync API along with the arguments. Let's run it and ask the same questions — the demo is called prompt order 0. If I say I want to order a burger and water, both are in the list but there are multiple products, so we get bad request; if I say only burger, we very quickly get a good result, as expected.
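A minimal sketch of that deterministic flow, assuming the 1.x API; the folder name and order text mirror the talk, everything else is illustrative.

```csharp
// Load a directory of prompt plugins and invoke one of them explicitly.
using Microsoft.SemanticKernel;

Kernel kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion("gpt-4", "https://my-resource.openai.azure.com/", "<api-key>")
    .Build();

// Every subfolder under ./Prompts (skprompt.txt + config.json) becomes one prompt function.
KernelPlugin prompts = kernel.CreatePluginFromPromptDirectory("Prompts");

// Fill the prompt's "order" input variable and call the "Order" prompt deterministically.
var arguments = new KernelArguments { ["order"] = "I want to order a burger" };
FunctionResult result = await kernel.InvokeAsync(prompts["Order"], arguments);
Console.WriteLine(result.GetValue<string>());
```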
So that's a really simple example; let's showcase a more complex one. Here at Microsoft we work a lot with Kusto, and one thing that's usually hard for new developers is understanding the Kusto tables — and it's really the same for SQL tables: if you want to read your logs or your database, extract data, and understand what's happening in your system, you need to know the properties and values of each table, you need to know the schemas, and that's hard for new developers. Also, LLMs — the OpenAI ones specifically — are really good at writing Kusto queries, but the queries they write are generic, not specific to your individual needs or your team. So what I'm trying to build here is a plugin — a small copilot — that helps us write KQL queries producing results specific to our needs and our products.

Everything in this file is almost identical to the last one. What's different is that I'm providing an input variable called input, and I'm invoking the Kusto prompt — the plugin I wrote — with the user's query. Looking inside that prompt: in the config I say that we need to generate a KQL query to retrieve information about my orders, and the variable is called input. The skprompt is a bit more interesting. First, I ask the LLM to generate only syntactically correct Kusto queries that can run as-is in Azure Data Explorer. I also ask it to respect the user's request if the user mentions any timing or filtering requirements. Then, as context, I tell it we only have two tables, Orders and Logs, and I list which properties the Orders table has. This is only an example, but I did the same for one of our products, which has far more complex tables, and it worked great there as well. I also provide extra detail about some of the properties and values — for example, that IsSuccess can only be true or false and that the category can be either fast food or drinks. It's just information I give the LLM so that when it generates a new query it produces something relevant to me and my scenario rather than something generic. I also tell it that the log level is a number, not a string, and give a quick mapping: five is trace, four is information, three is debug, and so on. And here you can see a convention for separating the background information from the actual request: Semantic Kernel supports a templating engine, so I write 'Question:' followed by the input variable — this is the syntax for passing variables, so the user's query is concatenated after 'Question:' — and I leave 'Answer:' empty, so the LLM understands it should provide the answer there.
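A condensed sketch of what such a prompt file could look like; the two tables and the value mappings follow the talk, while the column names and exact wording are illustrative.

```
Prompts/Kusto/skprompt.txt (sketch)
-----------------------------------
Generate only a syntactically correct KQL query that can run as-is in Azure Data Explorer.
Respect any time ranges or filters mentioned by the user.
Available tables: Orders, Logs.
Orders columns include: Timestamp, Product, Category, IsSuccess.
  - IsSuccess is a bool (true or false).
  - Category is either "fast food" or "drinks".
Logs columns include: Timestamp, Level, Message.
  - Level is a number: 5 = trace, 4 = information, 3 = debug, and so on.

Question: {{$input}}
Answer:
```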
Let's go back to the demo and run it again. This time I'll request: 'create a KQL query for counting how many orders we received per product in the last three days'. Ah — it's telling me the plugin name is not correct; it's called prompt 01 0, sorry about that. Now, pasting the request I already wrote above, we can see it is using the Product property, it's using the Orders table, and it's applying the time-based filtering I mentioned. I can also ask it to filter errors — 'create a KQL query for counting failed orders' — and we can see it uses IsSuccess as well. So just by giving it metadata about our tables, it was really easy to customize the results the LLM produces. Imagine the metadata you provide being specific to your SQL or KQL tables: you can very easily create an AI agent or assistant that generates these queries for you — and you could go further and not only generate the KQL but also run it and return the results; I didn't do that here because it's only a demo.

Now let's go back to an older prompt — the first one, basic 00 — and ask about time: 'what is the date today?'. It answers that it doesn't have information about today's date, because it was trained on data up to September 2021. How can we change that? First, Semantic Kernel has a set of built-in plugins we can use; here is the list of supported built-in plugins, and most of them don't have Java support yet but are supported in Python and C#. In this demo I'm calling an API named AddFromType with the TimePlugin. Something I haven't mentioned yet: the kernel object we build with CreateBuilder has a dependency container inside it, so we can register services and resolve services; when we write AddFromType, behind the scenes the kernel uses IServiceCollection and IServiceProvider to register and resolve them. So I register the time plugin — I did not write it, it's built in — and I use the TimeCalculator prompt that I wrote. TimeCalculator is really simple: it has a single input, the user input, nothing special — but look at the actual prompt: I invoke another plugin from inside it. This is how we can do it: each time this prompt executes, it first calls another prompt or plugin. I say that the result of the function Now in the time plugin will be inserted after the word 'Now:', then I add the user input via the input variable, and I expect an answer. Let's run this plugin — it's called prompt 02 datetime 00 — and ask the same question.
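A sketch of nesting a built-in plugin inside a prompt, assuming the 1.x API; TimePlugin ships in Semantic Kernel's core plugins package (flagged experimental at the time, hence the pragma, whose exact id may differ by version), and the template text is illustrative.

```csharp
#pragma warning disable SKEXP0050 // core plugins were marked experimental at the time of the talk
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Plugins.Core;

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion("gpt-4", "https://my-resource.openai.azure.com/", "<api-key>");
builder.Plugins.AddFromType<TimePlugin>();   // registered through the kernel's service container
Kernel kernel = builder.Build();

// The template first calls TimePlugin.Now, then appends the user's question.
const string template =
    """
    Now: {{TimePlugin.Now}}
    Question: {{$input}}
    Answer:
    """;

var result = await kernel.InvokePromptAsync(template,
    new KernelArguments { ["input"] = "What is the date of next Friday?" });
Console.WriteLine(result.GetValue<string>());
```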
'What is the date today?' — and we got a correct result. I can ask 'what is the date of next Friday?' and it gets that right too, and it even shows the calculation: it knows today is the 21st of February and that Friday is two days away, so it can work it out as expected. So it's really easy to add context to our prompts and to call other, nested plugins from within our own plugins.

The next Semantic Kernel feature I want to show is native plugins — in the new version they're simply called plugins, but in older Semantic Kernel versions they were called native functions. The way they work is that we write C# code that runs as a Semantic Kernel plugin. I have a folder called Plugins, and in it a GitHubPlugin class with two methods: TopRepositories and TopRepositoryReadme. Given a handle, I fetch the user's repositories and order them by star count. There's also a description — 'provides the top rated GitHub repositories for a GitHub user handle' — and we can add a description to the input parameter as well. In the actual demo I call AddFromType and provide the GitHubPlugin I wrote. What changes in HandlePrompt is, first, a new configuration object called OpenAIPromptExecutionSettings. OpenAI has this feature — not all LLMs have it, but OpenAI does — where we can provide a list of tools, or functions, and ask the LLM to produce a plan, or a result such that if we execute it with the tools we provided we get the final answer. Semantic Kernel knows about this and supports it, so here we tell Semantic Kernel that we are providing a list of tools and that we want Semantic Kernel and the LLM to invoke the function or plugin automatically — we don't want a description of how to run the query, we want the final result. Secondly, I'm using a new API: because I added the OpenAI chat completion, I can use the chat APIs. As I said, the kernel has a dependency container inside it, and I can call GetRequiredService, as I do here, to get a registered service.

Let's run this plugin — it's called plugins 00 github 00 — and ask 'what are the top rated repositories for my handle'. The line you see printed comes from inside my plugin, so we know it was called. Hmm, it looks like we hit some transient error — we could always add retries — so let's try again. Ah, it's because I literally wrote 'my handle' instead of my actual handle, sorry. Once I type my real name, I get my top-rated repositories, and I can ask it to filter down to three, five, or ten repositories, whatever I want. It was really simple to add this functionality to the prompt.
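A sketch of a native plugin combined with automatic tool calling, assuming the 1.x API; the GitHub lookup is stubbed out, and the names (GitHubPlugin, octocat, endpoint, key) are illustrative rather than the code shown in the demo.

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion("gpt-4", "https://my-resource.openai.azure.com/", "<api-key>");
builder.Plugins.AddFromType<GitHubPlugin>();   // native C# plugin, registered like a service
Kernel kernel = builder.Build();

// Ask the model to call our registered functions on its own instead of describing how to.
var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("What are the top rated repositories for octocat?");

var reply = await chat.GetChatMessageContentAsync(history, settings, kernel);
Console.WriteLine(reply.Content);

public sealed class GitHubPlugin
{
    [KernelFunction, Description("Provides the top rated GitHub repositories for a GitHub user handle")]
    public async Task<string> TopRepositories([Description("The GitHub user handle")] string handle)
    {
        Console.WriteLine($"Fetching repositories for {handle}...");   // proof the plugin was invoked
        await Task.Delay(10);  // a real implementation would call the GitHub API and sort by stars
        return $"sample-repo-1, sample-repo-2 (top repositories for {handle})";
    }
}
```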
So, currently, if we ask it about something we already asked about, it won't remember, because we are not sending the history. To implement something chat-based, rather than a single prompt each time, we have to provide the history. To do that we use the same IChatCompletionService together with a collection called ChatHistory, defined in the Semantic Kernel library: we call AddUserMessage for the user input and AddAssistantMessage each time we get a result from the LLM. Running it again — the demo is called chat 00 history 00 — I ask, 'what is the date next Saturday?'. It didn't get that right; let's try 'what is the date of next Sunday'... still not right, but that's not what I'm trying to show here. What I want to show is this: let's follow up with something that depends on context, for example just 'and the day after that?', and it gives us an answer — first it said the 17th, a Saturday, and now it says the 18th of February. So it had the context and answered according to the context from before. Ignore the fact that it got the actual dates wrong — we can debug that later; what matters in this demo is that it understands the context: it knows all the prompts and results that came before and answers accordingly.

But, as I said, every time we send the history it counts toward the tokens, and if we never clean it up we will hit the maximum token limit, so we need some mechanism to trim it. Here I use a really naive one: after ten prompts, remove the first five. A better solution would be to count the tokens in the history and, past some threshold, drop the oldest entries. The second issue we noticed is that sometimes it takes a long time to answer — for various reasons — but we can use a streaming API to stream the result instead of waiting for the whole thing at once. I use the same IChatCompletionService but call a different API, GetStreamingChatMessageContentsAsync, which returns an IAsyncEnumerable; we just iterate over the stream and print each chunk. Let's see — the demo is called chat 01 stream 00... it's not finding the demo... ah, I had a typo; now it's working. Ask it anything, for example 'tell me about Microsoft', and you can see the result arriving in chunks: we're not waiting for the whole result to be computed before writing it, which feels more fluent and is better for the user, who can start reading before the full answer arrives. So streaming is also built in and supported by Semantic Kernel.
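A sketch of that chat loop with history and streaming, assuming the 1.x API; the trimming rule mirrors the naive approach described, and the endpoint and key are placeholders.

```csharp
using System.Text;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

Kernel kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion("gpt-4", "https://my-resource.openai.azure.com/", "<api-key>")
    .Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();

while (true)
{
    Console.Write("> ");
    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;

    history.AddUserMessage(input);

    // Stream the answer chunk by chunk instead of waiting for the full response.
    var answer = new StringBuilder();
    await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(history, kernel: kernel))
    {
        Console.Write(chunk.Content);
        answer.Append(chunk.Content);
    }
    Console.WriteLine();

    history.AddAssistantMessage(answer.ToString());

    // Naive trimming so the history doesn't grow past the model's token limit.
    while (history.Count > 10) history.RemoveAt(0);
}
```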
So what can we do beyond that — how can we expand the knowledge the kernel has? There's something called embeddings: the ability to take text or data and convert it into a vectorized format. Embeddings are vectors of floating-point numbers, and given two embeddings we can compare them and see how close — how relevant — one is to the other. What I'm using here is a library called Kernel Memory, which is also an open-source library developed by Microsoft. This functionality was historically built into Semantic Kernel, but in newer versions it's better to use the dedicated Kernel Memory library. What Kernel Memory does is let us load text or files into it; it creates embeddings from them — out of the box it supports Word documents, PDFs, Markdown, virtually everything — and then we can ask questions about the documents or files we've stored. Here I build the kernel memory builder and say that I need text embedding generation — the model that creates embeddings for our text. I have a folder called Memories containing files; I iterate over all of them and call Kernel Memory's ImportDocumentAsync API to store them. The backing store can vary: in this example I'm using an in-memory store, though it's easy to use, say, Azure AI Search or another store built for vector search; for this demo the simple in-memory store is enough.

In the Memories folder we have two examples. One is a list of the anime series I'm watching — a 'watched already' list and a 'currently watching' list, and for the ones I'm watching I also note the day of the week I usually watch them. The other is the README.md from my library, copied as-is — I didn't change anything; it's the same Markdown file, images and all, and Kernel Memory knows how to filter that out. Back in the code I've also added a plugin called DynamicMemoryLoad: given a query, it searches whether we have files or memories in our store that are relevant to it, and we can define, for example, a minimum relevance between zero and one. First, let's look at the API we get from the memory store: I can search for documents. Running it quickly — this demo is called memory 00 — I ask, 'what are the anime that I already watched?', and it finds that there's a document called anime-tracking relevant to my query. That's a simple way to find relevant documents, but we can do better: we can ask questions and get answers from the documents automatically. Here I invoke the AskAsync API, ask the same question as before, and we get the result — the shows I've already watched — along with the name of the document used to extract it, so it's easy to trace how the LLM produced the answer from our memories.
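A sketch of that flow using the Microsoft.KernelMemory package in its in-process ("serverless") mode; WithOpenAIDefaults is used here to keep it short (Azure OpenAI builder methods exist as well), the file names and question are illustrative, and the exact method shapes may differ between package versions.

```csharp
using Microsoft.KernelMemory;

// Serverless (in-process) kernel memory with an in-memory store by default.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build<MemoryServerless>();

// Import everything in the Memories folder; embeddings are created automatically.
foreach (var file in Directory.GetFiles("Memories"))
    await memory.ImportDocumentAsync(file, documentId: Path.GetFileNameWithoutExtension(file));

// Ask a question answered from the imported documents, with the sources cited.
var answer = await memory.AskAsync("Which anime have I already watched?");
Console.WriteLine(answer.Result);
foreach (var source in answer.RelevantSources)
    Console.WriteLine($"  source: {source.SourceName}");
```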
Lastly, I try to invoke the DynamicMemoryLoad plugin automatically, without specifying it explicitly, using the same mechanism as before: we can send tools along with our queries, and the LLM — OpenAI specifically — can produce a result using those tools. Running it — this demo is called load memory 02 — and asking the same question, you can see it prints 'loading memory according to...': that's a line I print inside the plugin I wrote, so I can assert that my function is being called, and it is. You can imagine doing the same with documents relevant to your product, your internal documentation, and so on.

The other thing I wanted to show is that sometimes we need a better way to ask the LLM to invoke plugins automatically, and for that we have the notion of planners. There are different planners, but in the current version the main one is the Handlebars planner — named that because behind the scenes it uses the Handlebars templating engine. The way it works is that the planner automatically picks up all of the plugins registered with it, composes a query and sends it to the LLM so it can generate a plan, and after generating the plan it can go ahead and execute it. The plan can be complex: one plugin can call another, and plugins can be chained together, so we can do a lot of cool things here. Let me show it really quickly — the demo is called planner basic 0 — running the same prompt. First it says 'creating a plan' (a string I print), then we can see the plan itself: step one is loading memory from the document store, and step two is outputting the result.
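A sketch of the Handlebars planner flow, assuming the prerelease Semantic Kernel planners package available at the time (the API was experimental, so names and diagnostic ids may have shifted); the goal string and configuration are illustrative.

```csharp
#pragma warning disable SKEXP0060 // the Handlebars planner was experimental at the time of the talk
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Planning.Handlebars;

Kernel kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion("gpt-4", "https://my-resource.openai.azure.com/", "<api-key>")
    .Build();
// ... register here the prompt/native plugins the plan is allowed to use ...

var planner = new HandlebarsPlanner();

// The planner sends the goal plus the registered plugins to the LLM and gets a plan back.
var plan = await planner.CreatePlanAsync(kernel, "Which anime have I already watched?");
Console.WriteLine(plan);              // the generated Handlebars template, i.e. the plan's steps

// Execute the plan; each step invokes one of the registered plugins.
var result = await plan.InvokeAsync(kernel);
Console.WriteLine(result);
```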
And then we get the final result. Because the planner knows all of the registered plugins, it can create a plan for us, and as I said, a plan can be very complex.

The last thing I wanted to show you is running LLMs locally on your own machine. There are multiple ways to do that — there are libraries for it — but what I wanted to show is something much simpler, at least for demos, called LM Studio. LM Studio is freeware. I can search for an LLM — for example Phi, a really small and simple model developed by Microsoft — and you can see we can load it into the GPU and it's only around one and a half gigabytes, so almost any modern machine can load it into memory. What's nice about LM Studio is that I can download the model directly from it — just press download — and once it's downloaded I can load it; here's the list of downloaded LLMs I have, and currently I only have Phi-2. Once it's loaded I can use the chat interface to talk to it: 'tell me in ten or fewer words about Microsoft', and it answers that it's a software and tech giant — short, but valid. What's really cool is that LM Studio can wrap any of its LLMs — and it supports a lot of them — in a server that exposes OpenAI-compatible APIs. Because of that, we can use the same Semantic Kernel APIs to work with any LLM we download to our machine, or to one of our clusters in the cloud. Here we can see it's loaded and serving.

Now we need a small trick. Because it runs locally, we don't really need a model ID or an API key — we know the address and port of the server running that specific LLM on our machine. To wire it up we have two options: write our own text-completion or chat-completion connector, or cheat. By cheating I mean that when we add the OpenAI chat completion in Semantic Kernel, there's an overload where we can pass the HttpClient that performs the requests against the OpenAI endpoint. So I've written a small middleware, an interceptor: every request passes through it, and I change the URI — the target of the request — to the endpoint running locally, which you can see here is port 1234; I keep the same path and query and just send the request to the local server, changing nothing else. Now, running the demo called local 00 lmstudio 00 and asking 'tell me about Microsoft in ten or fewer words', we can see it prints 'intercepting call for LLM' — a line I wrote — and on the server side we can see it received the request. What's interesting is that this also lets us debug and see what's actually being sent behind the scenes.
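A sketch of that redirection trick, assuming the 1.x connector's HttpClient overload; port 1234 is LM Studio's default, and the model id and key are dummies because the local server ignores them.

```csharp
using Microsoft.SemanticKernel;

// Route the OpenAI connector through a handler that retargets requests to the local server.
var httpClient = new HttpClient(new LocalLlmRedirectHandler());

Kernel kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "local-model", apiKey: "not-used", httpClient: httpClient)
    .Build();

var result = await kernel.InvokePromptAsync("Tell me about Microsoft in 10 or fewer words");
Console.WriteLine(result.GetValue<string>());

public sealed class LocalLlmRedirectHandler : DelegatingHandler
{
    public LocalLlmRedirectHandler() : base(new HttpClientHandler()) { }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Console.WriteLine($"Intercepting call for LLM: {request.RequestUri}");
        // Keep the original path and query, but target the local LM Studio server instead.
        request.RequestUri = new Uri(new Uri("http://localhost:1234"), request.RequestUri!.PathAndQuery);
        return base.SendAsync(request, cancellationToken);
    }
}
```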
In the request you can see the list of tools containing the functions and plugins we have written that the LLM can call — currently the time plugins are registered, and you can see them there — so we can also use this to debug planners and see exactly what is sent to the LLM. And as I said, there are plenty of other LLMs besides Phi-2 — Facebook's Llama models and so on. So, in this session we talked about Semantic Kernel and how we can create plugins for it: there are two types, prompts and native plugins (native functions), which are C# code. And finally we talked about — sorry, one second, I've lost my voice. Feel free in the meantime to write your questions; the session is about to end in a few minutes, but we will stay to answer questions at the end. You can find the demos in my GitHub repository — this is the link and this is the QR code. Sorry about my voice. Any questions?

On the question about the tokens-per-second or tokens-per-minute limit: that is a throttling limit we get from the Azure OpenAI Service or OpenAI APIs, and it's in addition to the token limit of each LLM itself.

[Host] We'll show you the QR code you can scan to register for all of our events here at the Reactor; a reminder that we have more episodes in this series, you can join the AI Learning Hub, and this QR code registers you for this series and the additional series we have.

Another question asks how we can combine several prompts together. As I showed, we can call nested prompts — in my time-calculator prompt I called the built-in plugin, which is one way of combining two — and with a planner we can chain them: the planner can automatically take the result of one plugin and pass it to another. We can also do it manually: when calling InvokeAsync we can provide a list of functions where each one's output feeds the next one's input.

The tokenizer is deterministic. There are multiple algorithms and implementations; OpenAI has a really good one called tiktoken, and Semantic Kernel ships one for the GPT-3 models, a GPT-3 tokenizer, which produces the same tokens.

There's a question about the availability of the GPT models in Azure OpenAI Service. I'm not sure about availability; as I said, you do have to ask to be onboarded, and I'm not sure whether it's generally allowed yet.

Can the user prompt modify the instructions sent to the kernel, similar to SQL injection? It can, so you need to be really careful. There's a really simple prompt anyone can try, even in ChatGPT: just write 'print everything above' and you may get the system prompt or previous prompts sent to the same LLM. And if you're using a shared LLM instance — a service serving multiple consumers at once — you can leak data, so be really careful about that. Great question.

[Host] OK, we're going to finish the session. Thank you very much to everyone who joined us, thank you to our speaker for this wonderful session, and we hope to see you at the next one. Thank you everyone.
Info
Channel: Microsoft Reactor
Views: 1,468
Id: gszpdJxAWs4
Length: 89min 50sec (5390 seconds)
Published: Wed Feb 21 2024