LangChain and Azure OpenAI: Unsung Heroes of the Cloud, ep 37

Captions
Good day, everyone, and thank you for joining us for episode number 37 of Unsung Heroes of the Cloud, where today we are joined by Valentina Alto, a Data and AI Specialist at Microsoft. We are going to be covering a very hot topic, namely Azure OpenAI, LangChain, and Cognitive Services. But before we jump into the tech: Valentina, welcome to the show. How are you?

I'm fine, thank you. What about you?

I am doing excellent. I just wanted to let our audience know that I discovered you not because you're at Microsoft, but through LinkedIn, where you shared an excellent post on LangChain and Azure OpenAI. But I think you've done some other stuff where you work out in the open with the community, am I right?

Yes, correct. I write a lot of tech stuff, basically for fun but also to share knowledge.

Okay, and what kind of things do you write — blogs, books, poems?

I write blogs and articles on Medium, and I recently published my first book, on generative AI, with Packt Publishing.

Oh wow, amazing. How was the experience of writing a book?

Hard, but definitely a journey. It's something that you need to practice, and I'm looking forward to the next one.

You're already looking forward to the next one — that is super cool. And before we jump into OpenAI and LangChain here, we'd also love to know what you like to do for fun.

In my spare time I love hiking and going to the mountains, and also running. But the most fun thing is doing via ferrata in the mountains of Northern Italy, which is a middle way between hiking and climbing.

I've never done one myself, but aren't you hanging on the side of a mountain, just connected through one rope, and having to get to the other side?

Actually without a rope, but with equipment like in the adventure parks — the ones for children — where you have two pieces of equipment to secure yourself to the mountain, but without a rope.

Without a rope — but if you fall, you're still connected to the mountain, you're not just falling down?

It's not a 100% secure setup, but you are connected.

Okay. And on the via ferrata, what are some of the coolest or most extreme things you've done?

I've done a section that is called... well, it sounds like what it is. It was probably the longest 20 minutes of my life, but the nice thing about via ferrata is that you cannot go back — the only way is to go ahead, so you have no choice.

I love how you said the nice thing is that you cannot go back. For me, the nice thing is that if I'm hiking up a mountain and I'm done, I can turn around and go back — I think that's a nice option to have.

Yes, there is no exit strategy, so you really have to go ahead.

Okay, cool, that is very cool. And when you mentioned hiking in the mountains in Northern Italy — do you do normal mountains, do you go to the high peaks? What's your strategy?

The highest was about 3,300 meters, but I would love to go higher, especially to Monte Rosa, to the Capanna Margherita, which is around 4,300 if I remember correctly — maybe 4,600.

Don't you suffer from oxygen loss at that elevation?

I will never know if I don't try. So probably yes, and probably I would be sick, but I have to try it to know it.

Okay, cool. With that, why don't we jump into some tech stuff?
I discovered you through an article you wrote around OpenAI, LangChain, and Azure Cognitive Services, and what you wrote was simply amazing. I think you wrote something like 20 or 30 lines of code and you had a full application that used OpenAI and Cognitive Services to scan a document and get text out of a receipt. I was flabbergasted — how did you do that?

Correct. The thing is, nowadays large language models are literally exploding, but the nice thing is that there are so many lightweight frameworks that make it easy to integrate large language models within applications. One of these frameworks is LangChain, and this is the one I'm showing you today. With LangChain we are not only able to integrate large language models within applications, but also to let them communicate and act with external sources, like Azure Cognitive Services, to make those agents multi-modal, so that they receive not only text as input but also other data formats, like images or audio. So maybe we can jump into the deck behind this amazing new world of large language models.

Let's do it.

Okay. Before jumping into the code, let me share with you the main components behind this architecture. First ingredient: large language models. Large language models are deep artificial neural networks that have been trained on a huge and heterogeneous amount of data. This means that they are not task-specific but rather generalistic, so they can adapt themselves depending on user intent. The very nice thing about large language models is that they mimic the way our brains work, meaning that they are made of neural connections, and those connections are extremely efficient at packing information — much more information than we are capable of packing into our own brains. So they can be seen as an extension of our brains: AI tools that are able to plan, execute, and reason together with us.

Before you move on, I do have one question, because one thing you put on the previous slide was that they are extremely efficient at packing knowledge into the network. One thing I've been thinking about as I read about these language models: there's this thing called retrieval-augmented generation, where you don't trust the knowledge in the language model, but instead use the language model to make the output more presentable. The question I have for you is: can we actually trust the knowledge inside the large language model, or how should we think about the knowledge in a large language model?

Well, this is a very tricky question, in the sense that we can distinguish between two kinds of knowledge the model has. One is parametric, and it is the knowledge within the model — the one that you mentioned. The other is non-parametric, and it is the one that we provide as a knowledge base. Can we trust the parametric knowledge of the model? Consider that the model is a probabilistic model that completes the user prompt depending on the statistical distribution of the training set on which it was trained. The thing is, those models can hallucinate: if they do not know the answer, for example, they can give wrong answers. Now, with GPT-4 the technical architecture prevents hallucination better than previous versions of the model did; however, it is still a limitation. This is the reason why it is pivotal to keep humans in the loop while developing applications with large language models, and also to architect prompt-engineering techniques so that we do not let the model hallucinate — for example, by grounding the model to a given knowledge base.
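As a minimal sketch of that grounding idea — stuffing non-parametric knowledge into the prompt so the model answers from it rather than from its parametric memory — something like the following could work. The knowledge snippets, prompt wording, and question are invented for the example, not taken from the talk:

```python
# Grounding sketch: answer only from a supplied knowledge base (non-parametric
# knowledge), instead of trusting the model's parametric memory.
# The snippets and wording below are illustrative placeholders.
knowledge_base = [
    "Invoice #1234 was issued on 2023-06-01.",
    "Invoice #1234 is due on 2023-07-01.",
]

def build_grounded_prompt(question: str) -> str:
    # In a real application the context would come from a retrieval step
    # (vector search, keyword search, etc.) rather than a static list.
    context = "\n".join(knowledge_base)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say 'I don't know.'\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_grounded_prompt("When is invoice #1234 due?"))
```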
So there are many techniques to address the topic of whether we can trust large language models while they produce answers.

Okay, thank you. That actually helped clarify a lot for me.

It's a good question, so thanks for asking. Now, on top of large language models, we also need further tools for our application to be multi-modal. Multi-modal means that we want our application — our intelligent agent — to be able to communicate with external tools to process data beyond language. Those external tools will be Azure Cognitive Services. Cognitive Services are a set of pre-built AI models living within the Azure platform, Microsoft's public cloud, and consumable via REST API. Azure OpenAI is itself a Cognitive Service, meaning that Azure OpenAI models can also be consumed via REST API. So what we will be building today is an agent that, thanks to the power of a large language model, will be able to understand which tools it needs in order to process the user's requests.

The final ingredient will be the AI orchestrator: the framework we need in order to orchestrate our large language models within our application. For this purpose we will use LangChain. It is not the only lightweight framework on the market — we also have Semantic Kernel, developed by Microsoft, or Haystack — but LangChain was probably the first to come to market, it's Python-native, and I think it's amazing, and I'm not the only one. Basically, not only does LangChain make it easier to orchestrate LLM-related components such as memory, plugins, agents, and so on, but it also comes with pre-built classes that already have a structure and a schema into which LLMs can fit and adapt. For example, if we think about prompt engineering, we know that there are several approaches an agent or an application can follow while reasoning about a user request. One of these approaches is called ReAct — reason and act. ReAct is an approach that lets the intelligent agent plan the actions it needs to take in order to pursue the goals it sets according to the user's input; it is a predefined framework. Now, LangChain provides pre-built agents with this framework embedded, meaning that you don't have to write code to set up this framework — and this is exactly what we are going to use, so that we just need a few lines of code rather than a whole notebook of prompt-engineering schemas.

If I may ask a question here: you mentioned that the ReAct pattern is built into LangChain. If we were not to use LangChain, how would I build this myself? Would I have to write multiple prompts that I then feed into GPT, or how should I think about that?

Well, you should think about writing a meaningful meta prompt. The meta prompt is the system message that we give the model, and the model keeps this system message along the whole conversation with the user — it's something that it keeps embedded. We should write a meaningful meta prompt where we explain how we want the model to reason about the user's request. So it's something that can definitely be achieved without LangChain's pre-built classes, but having them pre-built makes it easier to develop these ReAct-style agents.
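As a rough illustration of what such a hand-written meta prompt might look like without LangChain — the tool names and wording below are invented for the example, not the actual prompt LangChain ships:

```python
# A hand-rolled ReAct-style meta prompt (system message), approximating what a
# zero-shot ReAct agent sets up for you. Tool names and wording are illustrative.
REACT_META_PROMPT = """You are an assistant that can use tools to answer requests.

Available tools:
- image_analysis: describes an image given its URL
- form_recognizer: extracts text and fields from a document image

Answer using this loop:
Thought: reason about what to do next
Action: the tool to use, one of [image_analysis, form_recognizer]
Action Input: the input to pass to the tool
Observation: the tool's result (provided back to you)
... (repeat Thought/Action/Observation as needed)
Final Answer: the answer to the original request
"""

# The surrounding application code parses each 'Action' line, calls the real
# tool, appends the result as an 'Observation', and asks the model to continue.
```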
Okay, thank you.

Thanks to you for the question. So, these are the three ingredients that we are going to use. As I said, we are going to build an agent. An agent can be seen as an intelligent application, powered by a large language model, that can also execute actions — and by executing actions I mean the calls the agent is able to make to external tools. Those external tools will be Azure Cognitive Services. And this is the high-level architecture of our application: there will be a user who asks a question to the agent, which is powered by a large language model — in this case a GPT-3 model living within Azure OpenAI. Upon the request, the agent will be able, thanks to the ReAct approach, to plan a set of actions and call Azure Cognitive Services to execute whatever it's not able to do on its own — for example, processing images or doing OCR. Because the thing is, large language models are incredible, but they lack some capabilities that are not achievable without external tools; they need the power to execute. To give them that, we can hand those tools to them, so that they can reach full coverage of data sources and formats. Okay, I think we can now jump to the code, so let me share my notebook.

Just so we can make this clear for everyone in the audience: what are we looking at?

Here we are looking at a Visual Studio Code notebook with a Python interpreter, and what I'm going to do is build an intelligent agent using LangChain. LangChain is a library that can be installed directly via pip install langchain — that is the way you install Python libraries — and once you install LangChain you can import modules and classes from the main root of the package and use them to build powerful agents. LangChain also comes with a series of connectors towards large language models. One of these connectors is towards Azure OpenAI, and we are going to use the Azure OpenAI models' API as our large language model — as the brain of our application. More specifically, we will use text-davinci-003, which is a GPT-3 model.

Okay, cool. Fantastic.

So the first thing we are going to do is set our environment variables. We need two services in Azure. One is an Azure OpenAI instance, where we will create a deployment with a model on top of it — in my case, again, I deployed a GPT-3 model. The other instance we need is a Cognitive Services multi-service instance. The multi-service instance is very useful because with just one key it makes several Cognitive Services available through a single API, and we will see the magic here: LangChain, depending on our request, will be able to understand which tool it needs. We are not going to tell the model "I need you to use Form Recognizer" — it will understand that on its own. It will have a set of tools and it will smartly decide which tool to use depending on our request. As you can see here, I just iterated over my toolkit, and these are the Cognitive Services available to me: Form Recognizer for OCR; speech-to-text and text-to-speech, which convert speech to text and vice versa; and Image Analysis, which is a model that is able to extract metadata from an image and analyze it.

Now let's initialize our agent. The first thing I've done is import the AzureOpenAI class from LangChain, in order to initialize the main component of my agent: the large language model.
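A minimal sketch of what this setup might look like, assuming the mid-2023 LangChain API shown in the demo; the keys, endpoints, and region values are placeholders:

```python
import os

from langchain.llms import AzureOpenAI
from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit

# Azure OpenAI connection (placeholder values -- use your own resource).
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://<your-aoai-resource>.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "<your-aoai-key>"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"

# Cognitive Services multi-service resource: one key/endpoint for all tools.
os.environ["AZURE_COGS_KEY"] = "<your-cognitive-services-key>"
os.environ["AZURE_COGS_ENDPOINT"] = "https://<your-region>.api.cognitive.microsoft.com/"
os.environ["AZURE_COGS_REGION"] = "<your-region>"

# The LLM: a text-davinci-003 (GPT-3) deployment in Azure OpenAI.
llm = AzureOpenAI(deployment_name="text-davinci-003", model_name="text-davinci-003")

# The toolkit exposes Form Recognizer, speech<->text, and Image Analysis as tools.
toolkit = AzureCognitiveServicesToolkit()
for tool in toolkit.get_tools():
    print(tool.name)
```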
As you can see here, I passed two parameters to the class: the deployment name and the model name. In my case I named the deployment after the model name, just to make it easier to remember. And remember that I previously set the environment variables, so the LLM class is reading my Azure OpenAI API key, base endpoint, and the rest of the configuration.

I can also try my LLM here. I mean, I still haven't embedded it in an agent, but I can use it as an endpoint, and I asked it to tell me a joke. Now, the humor is debatable, but, you know, it worked.

I'll tell you, this is my type of humor.

Fantastic. So you can have a conversation with text-davinci-003 if you like, and this is just to try whether the connection with the Azure OpenAI endpoint works — and it does, so we can go ahead. Now we can initialize the agent. The agent will be the application — the actor that will be able to execute our intent. As you can see here, I imported two things from LangChain's agents module: the initialize_agent function and the AgentType, and with just a few lines of code I'm able to configure my agent. Let's go deeper into that. First of all, I want to provide a perimeter to my agent, saying: look, you can only use these tools. I don't want the agent to be free globally — I want it to be grounded to a set of tools, and those tools are the ones I already initialized with the Cognitive Services toolkit. This is another nice thing about LangChain: it already provides a pre-built toolkit towards Azure Cognitive Services, so you don't have to write the connection on your own. Then we need the large language model — we already tried it, and we also saw that it has a very nice sense of humor, so it works. Then we set the type of the agent, and as you can see, this is a very specific type: zero-shot ReAct description. It is a ReAct-approach agent, and zero-shot means that we are not providing the model with a few examples to learn from — we are just providing the model with the prompt, with the request. The nice thing about large language models is that they are very good at working without examples, or with just a few-shot examples; they do not need a lot of data to train on in order to perform well.

If I may — one of the nice things that I see here, and correct me if I'm wrong: we are using a fully Azure-specific example, where our toolkit is Azure Cognitive Services and our LLM is the Azure OpenAI Service. But what I'm seeing from LangChain is that this is sort of abstracted away, and if we were to use another LLM, or another toolkit, we could use exactly the same code — maybe plug out the Azure OpenAI model and insert OpenAI directly, or maybe something from Hugging Face — and this code should remain functional. Is that sort of right?

One hundred percent correct. LangChain made it extremely easy, and it has interoperability with a lot of external sources — the list of connectors is growing every day. The Azure Cognitive Services toolkit is one of the latest releases, but the list is growing literally every day. You mentioned the Hugging Face Hub, but there is also the OpenAI API — not on Azure, but on the public OpenAI platform — and also BERT and T5, so a lot of the open-source models can be integrated within LangChain, as well as toolkits.

Okay, wonderful. Love it.

Me too — that's why I love LangChain. So, by the way, the work is done. I mean, we finished: we have our agent, and our application is ready.
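Continuing the earlier sketch, the agent initialization the demo walks through might look roughly like this, again assuming the mid-2023 LangChain API:

```python
from langchain.agents import initialize_agent, AgentType

# Quick connectivity check: call the LLM directly, outside any agent.
print(llm("Tell me a joke"))

# The agent: grounded to the Cognitive Services tools, reasoning via
# zero-shot ReAct (no few-shot examples, just the tool descriptions).
agent = initialize_agent(
    tools=toolkit.get_tools(),
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # print the Thought/Action/Observation trace
)
```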
Now we have to test it, but the implementation is done — literally a few lines of code. So let's do some experiments here. By the way, I kept verbose=True to show you the thought process of the model, which I believe is fascinating — it's fascinating to see how a large language model "thinks."

I'm curious.

Let's see if it thinks correctly, hopefully. First, I have here a beautiful picture of a dog — a Weimaraner, if I pronounce it correctly — admiring the landscape of a lake among mountains, which I love as a landscape. What I asked the model is: "Generate a funny story targeted to children about the following image." So I have an image and I want to generate a story for, I don't know, some children in elementary school. I gave it a target and I gave it a picture; let's see what the model does. First of all, it takes the action it needs to perform, and it immediately understood that it will need to use the Cognitive Service for images — and this is the input it will use, the picture that I passed as a parameter. Then it observes the caption of the image generated by the Cognitive Service: it understood that there is a dog sitting on the grass by a lake. The object is a Weimaraner — I don't know the pronunciation, but I checked on Google and it's actually the proper breed, so I can confirm that. It also captured the tags that the image service read from the image. And then the thought of the model is that it can now use the objects and tags to create a story — and we have the story, which is this one. I don't know if we want to read through the whole story, but it's indeed impressive.

You essentially wrote one line of code where you provided the prompt and the image, and all the rest just happened. You didn't have to do any of the image analysis, you didn't have to do any advanced prompting to pass along the tags — you just provided the image, and everything happened under the covers. That's pretty amazing.

Exactly — and consider that all of this thought process is thanks to the ReAct approach that is already embedded in the agent type within LangChain. So this is truly powerful.
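In code, that one-line test might look like the following; the image URL is a placeholder, not the one from the demo:

```python
# One call: the agent decides on its own to run Image Analysis first,
# then writes the story from the returned caption, objects, and tags.
story = agent.run(
    "Generate a funny story targeted to children about the following image: "
    "https://example.com/dog-by-the-lake.jpg"  # placeholder URL
)
print(story)
```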
One question I have, though: when I look at your verbose output here, the first action was already calling the Image Analysis service. Is that something — I assume that's LangChain, which knew to take this action because it saw that there was a picture as part of the input? Or am I overthinking it, and is this also provided by the LLM?

Well, I think the answer lies in the next example, because we also have another example with an image, but because of the request that I gave, the model — or better, the agent — understood that in that case the image analyzer wouldn't have been the proper tool to use.

Okay, let's look.

Yes, let's go to the next one. As you can see, it's still an image — in this case a sample invoice, but it's an image, so the data format is the same. And here I'm asking my agent to give me the due date of the following invoice, and it immediately understands that it has to use Form Recognizer and not the image analyzer, even though the data format is the same.

So — and sorry for my continuous questions — is it then sending your first prompt to the LLM to get back which model it should run, or is this all captured within LangChain? Who makes the decision on which Cognitive Service to call?

That is the brain, the engine — the thinking engine behind the agent. The thinking engine understands the user intent, sets the goals to achieve that intent, and calls the tools needed to achieve the goals. When I mentioned that those LLMs can be seen as an extension of our brain, I meant exactly this: it's not just "generate me a fun story," it's more "help me think, help me do common-sense reasoning." This is what the model does in the back end. So in this case we are calling — well, not we, actually: the model is calling Form Recognizer, which extracts the elements of our invoice. Form Recognizer, as I said before, is an OCR tool that is able to capture written — even handwritten — material from documents, and it comes with pre-built templates that can be applied to standard documents, like invoices. And here, as you can see, Form Recognizer extracted a lot of material from the invoice, and the agent is then able to return the correct answer. So imagine you have tons of invoices or contracts or documents where you just need a very small piece of information: these tools can be used as a support for navigating the heterogeneous amount of information that you might have within your enterprise.

Yeah, this is pretty cool. Really, really nice.

I agree with you. And the last example I wanted to show is how this agent can also use more than one tool at once, if we have a multi-modal request for the agent. In this case — and by the way, I would need this so much in my fridge — I have this picture (well, this is not my fridge) and I want my model to describe what I can cook with these ingredients. Very good ingredients but, I mean, not my fridge. And I also want to ask the model to read the recipe aloud for me, so I will invoke two models: one for image recognition and the other for text-to-speech. I can do so by specifying the output file, and then it will call both — the Cognitive Services Image Analysis, generating the response as it did for the story of the dog, and then the Cognitive Services text-to-speech, producing a file that reads my recipe aloud.

"You can make a vegetable omelette with the ingredients in the picture."

I could hear that. Okay, fantastic — so I can make a vegetable omelette with the ingredients in the picture.

So again, the agent was able to discriminate among tools and decide which ones to use depending on my intent, and I think this is extremely powerful, especially now that application and software development is embracing a whole new wave of components — large language models didn't exist in the context of applications six months ago.

Yeah, this is incredibly impressive, because, one, the large language models themselves are very capable: you just pass it some input — here it is, the refrigerator full of vegetables and eggs — and it comes up with a dish, which is impressive by itself. But then chaining this together with image analysis, and chaining it together with text-to-speech, and potentially some other Cognitive Services models — it's just insane what's possible here. And we're talking about a time frame of about six months from when this all started — impressive.
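For reference, the multi-tool request from the fridge example might look like this in code; the prompt wording and image URL are placeholders, and the exact form of the returned audio reference is an assumption:

```python
# Multi-modal request: the agent chains Image Analysis (what's in the fridge?)
# with text-to-speech (read the answer aloud), all from a single prompt.
result = agent.run(
    "What can I cook with the ingredients in this image? "
    "https://example.com/fridge.jpg "  # placeholder URL
    "Please also read the answer aloud and give me the audio file."
)
print(result)  # e.g. the answer plus a path to the generated speech audio
```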
That's impressive, I agree. And, you know, the fact that the list of tools and integrations within these frameworks is expanding day by day lets us hope that those agents will be easily integrated with many external tools, and also with structured data — because in LangChain, for example, you can also build agents that talk with structured data, like CSV files or an Azure SQL database. So I think it's extremely powerful. And again, LangChain is not the only tool available on the market today — there are so many tools, and I think it basically depends on the developer's preference — but from an agnostic point of view, having an AI orchestrator while developing LLM-powered applications is a must. So I think that LangChain and these tools are paving the way to a new era of software development.

Okay, so if I understood what you said correctly: you're not recommending anybody use LangChain per se, but you are recommending people think about an orchestrator — be that LangChain, or Microsoft's Semantic Kernel, or LlamaIndex, or whatever orchestrator they use — because that helps with abstracting away some of the complexity of building all of the chains and the reasoning.

The LLM components are the same — memory, plugins, chains, prompts — so those components just need to be arranged in a nice way, let's say. LangChain helps, Semantic Kernel helps, but in general what we need within our application stack is an AI orchestrator.

Okay, cool. So maybe before we wrap up, I have one final question for you, and it might be a really hard one.

Oh no.

What happened in the past six months has blown me away. Where will we be six months from now?

Oh wow, that's a hard question. Well, what I can see is, again, a huge development in the field of plugins — more and more connections toward the external world, to give the LLMs the power to execute actions. And I'm also seeing new ways of developing large language models, or more generally speaking, large foundation models. One example is Orca, the model announced by Microsoft a couple of weeks ago, I think. The trend that is occurring now is working on the training dataset rather than on the model architecture, to make models lighter — with fewer parameters — while putting more attention into the quality of the dataset. In the case of Orca, the incredible thing is that it has been trained not on text corpora but rather on the reasoning process of GPT-4, so that basically the training dataset of Orca is mainly made of thought processes rather than simple, plain text. I don't know where it will take us, but from the presentation that was publicly released it sounds amazing — it's like magic. The thing is, we will have lighter models with maybe better performance, or at least as good as GPT-4 or other extremely large models.

Okay, interesting. And do you also think that we will have more vertically aligned models — like a vertically aligned model for insurance or financial services, maybe one for healthcare — or do you think we will still have very general models like GPT-4?

I think both ways will be valid in the future, meaning that, now that the architecture of large language models is being validated more and more, probably many companies and industries will start building very vertical LLMs — or LFMs — on their own knowledge bases. So definitely yes.
I think that in the finance industry some companies are already doing that. But I also think that the real power of foundation models is the fact that they are general and adaptable, so I think the trend of general-purpose foundation models will keep existing in the near future.

Okay, cool. I want to thank you so much for coming on — I had a blast discussing this. I know that OpenAI is all the hot craze, but I wasn't expecting what you showed me, because this is simply mind-blowing. And with that, Valentina, I want to thank you, but I also want to thank all of the people who watched up to this point, and we'll see you again in a couple of weeks. Thanks, everyone.

Thanks a lot. Bye.
Info
Channel: Microsoft DevRadio
Views: 4,128
Id: UOg9D0nsYT4
Length: 37min 50sec (2270 seconds)
Published: Thu Jul 13 2023