AutoGen Studio: Build Self-Improving AI Agents With No-Code

Video Statistics and Information

Captions
I'm about to show you videos where two experts discuss multi-agent systems similar to AutoGen. They both highlight something really interesting, and see if you can spot it. "Typically we think of the autonomous car problem as a single-agent problem, but there are going to be lots of autonomous cars, and they're going to need to coordinate with each other in order to really navigate the streets and optimize where they should go." "The web copilot and the work copilot, it could be sort of the universal agent, so to speak, but it needs one important capability, which is it needs to be able to talk to other agents. So I think that one of the key runtimes of our time will be that multi-agent runtime."

I'll explain why I'm showing you these videos in a bit, but let's clarify one thing first: this video is all about AutoGen, especially its latest version, which is designed for non-programmers. I'll walk you through the installation process step by step. I'll show you how to run API models like Mistral and Gemini Pro, as well as open source models directly on your local machine. But also, there's something interesting I discovered while researching AutoGen: there are actually three key factors that most people don't know about, and it's keeping them from really making the most out of AutoGen. I know that in my last video about CrewAI I said "CrewAI allows anyone, literally anyone, even non-programmers, to build their own custom agents," and then a few minutes later I also said "the final step, I'm going to instantiate the crew, I'm going to need a decorator, it handles API exceptions by pausing the scraping process." Yeah, so that was a mistake, and I apologize. This video is not going to be like that, I promise.

Let's begin with factor number one, which I like to call "flipping the script." To get back to the question "Maya, why did you just make me watch those videos?": what I find fascinating about them is that both speakers emphasize a difference between the present moment, which clearly belongs to single intelligent AI agents (for example OpenAI's Code Interpreter), and a not-too-distant future that will, according to these two, belong to collective intelligence. In this future, agents will be aware of each other and capable of communicating with one another. So for you, today, the key question might be, let's say, which prompting technique is right for your task. But the thing is, the questions you'll be asking yourself in, say, a year from now may be very, very different. For example, you might wonder about the best flow of communication for your agents. You'll be looking for answers to questions such as: should one agent manage multiple others, or should they all be equal instead? Platforms such as AutoGen and CrewAI are already paving the way to that future. And it's not just AutoGen: for example, this company allows you to create a chatbot that learns all about your potential interests and then interacts with the chatbots of potential partners to find the right match for you. And as a cherry on top, just a week ago Andrej Karpathy announced something he calls a shift from naive prompt engineering to flow engineering. So as you can see, the buzzwords are actually slowly changing, from "prompting" and "parameter count" to words such as "iterations" and "flow." And yes, AutoGen is quite a big deal; it's flipping the script for everyone.

So how do you use AutoGen? That question brings me to the second factor: architecting AutoGen. The first step is to install AutoGen, so let's do that together in a simple step-by-step tutorial. Open VS Code, and let's begin by setting up a virtual environment. To create a virtual environment, first of all navigate to the correct folder. In your terminal, type python3 -m venv followed by the name of the environment, and let's call it venv; this command will create a folder with that name. In order to activate the environment, you need to type the following: source and then the location of the activate file, which is going to be venv/bin/activate. The name of the virtual environment in brackets confirms that it has been activated. To deactivate the virtual environment once you're done with the project, just type deactivate, but don't deactivate it just yet. Instead, let's install AutoGen Studio first. It's very easy: you just have to type pip install autogenstudio and press enter, and this command will install everything that's necessary for AutoGen Studio to run.

Once you've activated the environment, you can set your environment variable. Let's begin with an OpenAI model, so type export OPENAI_API_KEY in the terminal, and note that those are underscores, not hyphens; that is an important distinction. As a final step, to run AutoGen Studio, type autogenstudio ui --port 8081. If you'd like, you can also add two greater-than symbols followed by log.txt (or you can call it output.txt or whatever you prefer). If you do that, the results will be saved in the file named log.txt; otherwise you'll have to look at your terminal to see what's happening, which can get pretty messy very quickly. Ideally, you should see a screen that looks like this. To see the AutoGen user interface, you have to click on this address while pressing the Command key on Mac or the Windows key on PC. This will open a window displaying the latest version of AutoGen Studio.

As you can see, there's a lot going on, but to keep things simple, let's focus on the Build tab and the sidebar on the left. These four items in the sidebar are the building blocks of AutoGen, and they are the most important concepts to understand. You can think of these four concepts in this way.

Agent: think of it like a person with a set of instructions on how to act when given a task. The system prompt will explain what the agent is good at and what type of goals it's supposed to achieve. But there's another important piece of information that an agent needs: a description. Before I stumbled upon this blog post, I thought that this was just a place where you write things like "you are an expert at writing" or something like that, but it turns out I was actually wrong. This is a place where you define how this agent interacts with other agents in a group flow.

Model: an agent wouldn't be very useful without a brain, and that's where the model comes in. It's pretty straightforward: it's about defining which type of model you're going to use: an OpenAI model, another big model through some API like Mistral or Gemini, or maybe one of the open source models. I'll cover everything in this video.

Skills: skills are tools that make agents smarter, more knowledgeable, more capable. Skills are basically snippets of Python code that give the agent access to some sort of information that's outside of its training knowledge.

So let's draw a line here. What do you get when you combine an agent, a model, and a skill? Well, actually, you just built yourself your own code interpreter: a powerful single agent with access to many different tools that can perform potentially very detailed analysis. And that's great, but it's not really exciting, because we've had Code Interpreter for a while. The crucial piece of AutoGen that's still missing, the one that makes it really exciting, is actually the workflow.

Defining a workflow for your agents is like placing an agent in a society. Maybe you want the agent to act in an introverted way, interacting with one agent at a time. If that's the case, then AutoGen has a perfect solution for that: a two-agent workflow. But what if the agent is a student with a strict professor who only allows the agent to talk when asked? As you can see, there are all kinds of workflows, but they all fall under either hierarchical or non-hierarchical communication. Think about your group of friends: you probably, and hopefully, interact in a non-hierarchical way, where all team members are equal. (This is: "Oh, you can all stop. I can't. I can help one of you at a time. Okay, go first. You go. You go first.") But AutoGen currently offers hierarchical workflows only.
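To make the "skill" concept above concrete: a skill is just a plain Python function. Here's a hedged, offline sketch in the spirit of the salary-vs-inflation skill mentioned later in this video; the function name and formula are my own illustration, not the video's actual skill (which pulls live data from an API).

```python
# A minimal example of what an AutoGen Studio "skill" can look like: an
# ordinary Python function that agents can call. This simplified version
# does the inflation math locally instead of calling an external API.

def required_salary(current_salary: float, inflation_rate_pct: float) -> float:
    """Return the salary needed to keep the same purchasing power
    after one year of the given inflation rate (in percent)."""
    return round(current_salary * (1 + inflation_rate_pct / 100), 2)

print(required_salary(50000, 3.2))  # prints 51600.0
```

In AutoGen Studio you would save a function like this as a skill and attach it to an agent, which can then invoke it while working on a task.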
By default, a chat manager will autonomously select the next speaker. Of course, you can describe the workflow you want to the chat manager, for example: the writing agent goes first, the brand agent is second, then the writing agent again, and finally the publisher agent terminates the workflow and the chat manager receives the output from it. But every time an agent is selected, the selection is done by the chat manager dynamically. Another type of workflow in AutoGen is sequential, or round robin. In this workflow, the communication between agents is linear and fixed, but there's still a chat manager that receives the information at the end. There are also two more workflows in AutoGen: random and manual. In the random workflow, the chat manager randomly selects the next speaker, and in the manual workflow, the chat manager lets you select the next speaker. As of right now, you can choose between default, round robin, and random, but if you're a visual type, you might want to check out the AutoGen node editor. You can do a lot with it, and in particular you get to define one extra workflow, manual. Plus, it's a great way to visualize your workflow and make sure everything is running smoothly.

How can you access other API models? Let's imagine you try to fill out this popup in order to, say, call Gemini Pro or Mistral. Well, it won't work, because you need an OpenAI-compatible endpoint to paste into the base URL field. However, there's a convenient way to overcome this problem: LiteLLM. This project can wrap certain APIs to look like the OpenAI API. So here's how to do that. Open two terminals, one for LiteLLM and another for AutoGen, and name them appropriately; the LiteLLM proxy server will run in parallel with AutoGen. Now install the LiteLLM proxy by running the following command in the terminal: pip install litellm, with proxy in square brackets. If you want to run Gemini specifically, also install the Google generative AI package by running the following command, and make sure that you didn't accidentally deactivate your virtual environment; that happens to me all the time. Next, set the API key as an environment variable, and in the LiteLLM terminal type litellm --model followed by the name of the model, let's say gemini/gemini-pro, or mistral/ and then pick one of the Mistral models (mistral-medium is the best). Once you run the command, you'll see that the server is listening on an address, so copy it. Then head over to that other terminal that you called autogen, and in that terminal run AutoGen Studio with this command. Once you open the studio, head over to Models, create a new one, and give it the right name. Paste the address you previously copied as the base URL, and for the API key simply type sk- plus any key, and you're all set. I'm going to include a link to all the models that LiteLLM supports, so you can pick any of the 100-plus models they offer.

So let's move on to the part that many of you are probably most curious about: how to run open source models locally, and for free, of course. I think I managed to find one model that's small (it has seven billion parameters), it's easy to run, and it actually performs surprisingly well. This model has been hiding in plain sight all this time, and you may have even encountered it without realizing its potential. So, finally, how to run open source models. Step one: download and install the appropriate version of LM Studio for your platform. Once you open it, download a specific model. If your computer has over 32 GB of RAM, don't even waste your time with small models; instead try to download bigger models with a lot of parameters. And if you're not rich in RAM, like me, I recommend checking out this list of small LLMs on Hugging Face that are filtered by their function calling capabilities. I'll explain soon why this matters so much, but for now you'll see that only one of them is available in the latest LM Studio update, so be sure to download that one. Then head over to the Local Server tab and start the server.
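Whether you go through the LiteLLM proxy or LM Studio's local server, the idea is the same: you point AutoGen Studio at a local OpenAI-compatible endpoint. As a hedged sketch, the values you end up pasting into the "create model" form look roughly like this; the dictionary keys below are illustrative labels for the form fields, not AutoGen Studio's exact schema, and the URL is just the address a local proxy typically prints at startup.

```python
# Illustrative model entry for a locally proxied model. Swap base_url for
# whatever address your LiteLLM proxy (or LM Studio's Local Server tab)
# actually reports; the API key only needs to start with "sk-".
litellm_model = {
    "model": "mistral/mistral-medium",  # the model name you passed to litellm
    "base_url": "http://0.0.0.0:8000",  # address printed when the proxy starts
    "api_key": "sk-anything",           # dummy key; the local proxy ignores it
}

print(litellm_model["base_url"])
```

The same pattern works for LM Studio: copy the base URL from its Local Server tab and use any sk- string as the key.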
Copy the base URL, which you can find here on the python tab, and then add a new model in the studio. You can call it whatever you want. Paste the base URL here, and for the API key again type sk- plus whatever, and you're ready to use your local open source models, which is very convenient, because you don't have to pay for anything and you keep your privacy. And let me show you a little trick. Watch this: this is real-time footage of that fine-tuned Trelis Llama 2 model running locally and generating text without GPU acceleration. Now watch what happens when you turn on GPU acceleration by toggling it on and also increasing the number of CPU threads. I mean, the improvement is just incredible.

So let's take a step back for a minute. Remember how I told you that there are actually three factors that are really important in order to get the most out of AutoGen? Well, I still didn't get to the last and most important one, a factor I call "sledgehammer to crack a nut." What do I mean by this cryptic message? Well, I've noticed that many people think about AutoGen in a very narrow way: they try to define agents, then write code that calls an API, save that code as a skill, and then try to build teams of agents with these skills. I think this approach is common when users focus on building smart agents. However, AutoGen is not just about agents; its full potential actually lies somewhere else. It lies in building systems. Designing agents in AutoGen and giving them skills to make API calls is like using a sledgehammer to crack a nut. Instead of actually writing skills yourself, why not make your agents write skills for themselves? In fact, all the skills that I have accumulated were made, tested, and optimized by the agents themselves, like this one that fetches Hacker News posts and comments, or this one that receives data about inflation through a Bureau of Labor Statistics API and then calculates how much your salary needs to increase in order to keep up with inflation. And of course, as always, I'll include a link to these skills together with agent prompts and other things that you might find useful.

This was all possible thanks to a type of agent system that I called ARO. So let's focus for a minute on the system. To be more precise, this workflow consists of three agents. The architect, or problem solver, is the first agent: it receives the prompt and then writes some sort of first draft. Once the draft is done, it's forwarded to the reviewer. The reviewer will form a critical judgment on whatever has been received from the architect, and if it's code, it's going to execute the code and spot problems and bugs. Then it gives feedback back to the architect, and this process is iterated until the reviewer has no more feedback. Then the optimizer receives input from the reviewer and checks if there's a way to optimize the text or the code, whatever the input is. If there is a suggestion, it gets sent back to the architect to rewrite the solution, and the process repeats again. So why is this system so powerful? Well, if a skill makes an agent smarter, and a well-designed system of agents is capable of building new skills for itself autonomously, shouldn't this mean that, in a way, AutoGen allows users to build self-improving agents? And also, how efficient are these self-improving agents? I'll reveal that soon, but first let's build the system together.

I'm going to run AutoGen Studio a little bit differently this time, and I'll explain why. So type this. Why would you do that? Well, this will run the app, but it will also create a new folder, and you want to create a new file in this folder called .env. In this file you'll save all of the sensitive information, like API keys or secrets. Here's why: let's say you ask agents to write a script that scrapes YouTube transcripts, and you need an API key for that. Your prompt will go to the first agent, the architect, and the output will be some code. But then the code gets to the reviewer, and the reviewer starts executing it. Well, guess what: if no specific API key gets found, the reviewer is going to establish that the bug is the lack of an API key. And does the rest of the code make sense? Who knows; it's impossible to tell, because the agent couldn't execute it.

Let's get back to the project. You want to add a new model, and I'm going to rename it, because OpenAI recently updated their model to gpt-4-turbo-preview. You can also test whether the model has been successfully deployed by clicking here, and I see the confirmation that it has. Indeed, if it didn't deploy successfully, you'd get an error message here, so you'd be able to copy it and debug it. Now I'm going to add those agents. The architect: the description will specify the workflow briefly, and in the system prompt, together with the details, I'll just add that this agent never says TERMINATE; it never decides when the conversation is over. The reviewer: same process. Don't forget to assign the model to each of the agents. And finally the optimizer, the last agent: other than the fact that it has its own unique description and instructions, I'm going to add that this agent decides when the conversation is over and writes TERMINATE to end it. Now that I'm done with the agents, I'll add the workflow, specifically a group workflow with the three agents that I've previously created. Other than defining the workflow, you need to define another agent: this fourth one will be the manager that actually controls the workflow, so you really have to be careful here and write precise prompts. You can also select what type of flow you want in the dropdown menu, and you need to assign this agent a model. As a reminder, I need the architect and the reviewer to communicate with each other iteratively, and the optimizer to be the final agent in the workflow, so I'll specify that in the description and system prompt, and I'll select auto as the workflow. Once I'm done with that, I'll be able to use this workflow: download it as a JSON file, copy it, or just use it in the playground.

Overall, I was able to generate skills with Mistral Medium and GPT-4 Turbo preview. Both models were given two identical tasks and five attempts per task, so each model had 10 attempts in total. The first task was to build a skill that fetches YouTube titles and the number of views on the most popular video made by a specific YouTuber. The second task was writing a skill that makes API calls to Pexels and fetches a video or a photo. Mistral was able to write a skill one out of 10 times; on the last, tenth attempt, things went so badly that Mistral decided to offer 669 files as a result. On the other hand, GPT-4 was able to write a skill six out of 10 times, but it did have its lazy moments, when I had to copy-paste smaller incomplete snippets of code in order to have complete code. Meanwhile, the open source models that I can run are too small to actually generate any skills that I would find useful. But I was wondering: can they use the tools that are assigned to them? Maybe asking Mistral or Gemini to generate some skills and then having open source models use them isn't a bad idea.

Well, this brings me to a very uncomfortable fact, and I think I talked a lot about this in my CrewAI video, where I tested 15 different small open source models; you might want to check out how that adventure went. If you've already tried to use small models with less than 13 billion parameters for multi-agent systems, you might have also noticed that the results are sometimes far from impressive. So I was wondering: why are small open source models struggling to use skills? Other than the most obvious reason, which is limited cognition, most open source models just aren't fine-tuned for function calling. So what does that mean? Function calling means that a model can interact with certain external functionalities or tools to perform specific tasks, and this goes beyond just generating text based on its training. For example, ChatGPT can do more than just chat, thanks to function calling: if you ask it to create a graph or generate an image, it can call on specific functions, kind of like using special tools or skills, to do that. So what's the solution? Some small open source models are fine-tuned on datasets that contain samples of function calling, such as this particular dataset, and you can find a list of models that can do function calling on Hugging Face's website; that's the list I previously mentioned when I showed you how to use LM Studio.

So for this video I tried out three small open source models. First, there's the previously mentioned fine-tuned Trelis Llama 2. I also tried Airoboros (I think that's how it's pronounced) 2.2.1, which some of you recommended. And finally, I found another fine-tuned Llama model by simply typing "function calling" into the LM Studio search bar. All of these models have examples of function calling, among other things, in their datasets. But what happened when I added a skill to test them? I added a simple skill that fetches prices of various products through Klarna, a fintech company. I asked my team of agents to fetch the price of a specific type of Crocs and then to write a call to action, some sort of marketing headline, that includes the price of that product. So, would they be able to recognize that they have this tool at their disposal and actually use it? Let's see how that went.

First, CodeLlama. I will include this experience even though I couldn't really run this model successfully, because I wanted to show you a scenario you might actually encounter, and if you do, don't get too frustrated; I guess it's very common. I downloaded this model, I think, on January 28th and updated LM Studio and AutoGen the following day. So when I tried to run the model on January 30th, I encountered an error that's apparently quite common among Discord users who also updated their AutoGen and LM Studio: the model is no longer compatible with LM Studio, and there are no more available files for download. You can still find files by clicking on the "show all" tab, but LM Studio warns you that they probably won't work.

Next, Airoboros 7 billion. I was able to run it smoothly, but this model didn't really perform that well when I gave it access to that skill. The final result was supposed to be a headline with a fetched price, but instead it was some sort of code. Overall, it looked as if some of the agents involved in the workflow were aware that there is a tool, but they didn't really know how to fetch the prices; they didn't know how to execute the code, so to say.

And finally, there's Trelis Llama 2. This model had a very promising start when I began experimenting with it, even without the skills; other than the chat manager occasionally getting sidetracked, it seemed to perform really well. So let's get back to the skills. I'm happy to say that every time I gave this model a task to fetch a price and put it in some sort of headline, it was capable of using that Klarna skill it had at its disposal. Now, I wouldn't call the results perfect, because the model would either put the price of the wrong product in the headline, or it would fetch the right price but wouldn't wrap it in the headline. I feel like with a little bit of optimization and troubleshooting I could get this model to do exactly what I wanted it to do. This is an open source model that I highly recommend for multi-agent systems. And here's why I'm really excited: Trelis Research offers even more models that can do function calling; they're just unfortunately not available through LM Studio right now. Hopefully that will change soon and I'll get to test more open source models.

Before I go, I want to leave you with one more thought. According to this research that was published last year, all systems of agents can be classified into the following categories. There are systems of agents that help us humans automate boring, repetitive tasks, kind of like what you would build with AutoGen. Then there are teams of agents that are trained on a lot of scientific data, and these teams of agents can lead to scientific innovation; there is some evidence that this might already be happening. And finally, researchers believe that AGI won't be created by training models that are so large that they eventually somehow gain general intelligence. Instead, they believe that AGI will be a big society of intelligent, self-improving agents that eventually gains general intelligence. So that's it for today. I really hope you enjoyed this video that I made for you, and see you in the next one.
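The architect, reviewer, and optimizer workflow described in the video can be sketched as a plain Python loop. The three "agents" below are stub functions standing in for LLM calls (their names and canned responses are my own placeholders, not AutoGen code); the point is only to make the control flow concrete: the reviewer keeps sending feedback to the architect until satisfied, then the optimizer gets a pass and may send the draft back for one more rewrite before terminating.

```python
# Framework-free sketch of the architect -> reviewer -> optimizer loop.
# The stub agents return fixed answers so the control flow is easy to follow.

def architect(task, feedback=None):
    # In the real system, an LLM writes a first draft or a revision here.
    return f"draft for {task!r}" + (f" (revised: {feedback})" if feedback else "")

def reviewer(draft, round_no):
    # Pretend the reviewer finds one issue on the first round only.
    return "add error handling" if round_no == 0 else None

def optimizer(draft):
    # Pretend the optimizer has no further suggestion (None = TERMINATE).
    return None

def aro_loop(task, max_rounds=5):
    draft = architect(task)
    for round_no in range(max_rounds):
        feedback = reviewer(draft, round_no)
        if feedback is None:                     # reviewer is satisfied
            suggestion = optimizer(draft)
            if suggestion is None:               # optimizer is satisfied: done
                return draft
            draft = architect(task, suggestion)  # rewrite, then review again
        else:
            draft = architect(task, feedback)    # revise per reviewer feedback
    return draft

print(aro_loop("scrape YouTube transcripts"))
```

Swapping the stubs for real model calls (and real code execution in the reviewer) gives the self-improving behavior the video describes.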
Info
Channel: Maya Akim
Views: 62,584
Keywords: autogen, autogenstudio, ai, microsoft, ai agents, autonomous agents, llm, open source, open source models, huggingface, function calling, mistral, openai, gemini, llama 2, machine learning, agi, agents
Id: byPbxEH5V8E
Length: 27min 5sec (1625 seconds)
Published: Mon Feb 12 2024