In this video, I'll demonstrate methods to run
autogen agents for free and share insights on how to avoid some common pitfalls. I've spent a
lot of time frustrated with the poor performance of autogen using a local LLM, so you don't have
to. Why autogen? Because with autonomous agents, you can build everything you can imagine. For
instance, autogen applications have already reached the capability to manage most tasks
typically done by customer support staff, which includes responding to chats and email inquiries.
If you're interested in learning more about these specific use cases, I encourage you to check
out the videos linked in the description below. However, there is a significant downside.
Developers, in particular, may face high daily costs. Every task performed by an agent,
as well as the selection of the next agent in a group chat, requires tokens. The simplest method
to cut down on expenses is to opt for a more basic model. Merely switching from GPT-4 to GPT-3.5
Turbo can slash costs dramatically. But it gets even better. We can also utilize autogen with
free open-source alternatives such as Llama 2. In an ideal scenario, we would
just tweak the configuration, and everything would operate seamlessly
like clockwork. However, in reality, things don't work quite that smoothly. But
no worries, I'll guide you through setting up everything from scratch and reveal secrets to
running autogen locally with ease and efficiency. So, as always, we start by
creating a virtual environment to separate the dependencies from each other.
Then we activate the virtual environment and install autogen with 'pip install
pyautogen'. Next, we create a new file, 'app.py', and start by importing the
dependencies, that is, autogen. We create a new configuration for the language
model. Specifically, we want to start with the GPT-3.5 Turbo model, which is
very fast and will first show us how the group chat basically works.
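In code, that might look roughly like this; a minimal sketch assuming the pyautogen v0.2-style API, with a placeholder API key you'd replace with your own:

```python
import autogen

# LLM configuration for GPT-3.5 Turbo. The key below is a placeholder;
# supply your own OpenAI API key (or load it from an environment variable).
llm_config = {
    "config_list": [
        {
            "model": "gpt-3.5-turbo",
            "api_key": "sk-...",  # placeholder
        }
    ],
}
```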
Okay, now that we have the configuration set up, let's explore the next step.
We will first create the assistant agents,
Bob and Alice. Bob tells jokes while Alice criticizes them. Additionally, we will include a
user proxy, which is essential for facilitating interactions between the automated system and
human users. Creating an assistant with autogen is relatively straightforward. It requires just
a name, system message, and a configuration. This simplicity enables quick setup and deployment of
automated assistants for various applications. We copy Bob and make an Alice
out of it, a little bit like in the Bible. Alice's task is to criticize Bob's
jokes, a little bit like in real life. She can also terminate the chat.
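The exact system messages aren't shown on screen, so the wording here is illustrative; a sketch of the two assistants:

```python
# Bob tells jokes.
bob = autogen.AssistantAgent(
    name="Bob",
    system_message="You are Bob, a comedian. Tell short jokes.",
    llm_config=llm_config,
)

# Alice is a copy of Bob with a different role: she criticizes jokes
# and may end the conversation.
alice = autogen.AssistantAgent(
    name="Alice",
    system_message=(
        "You are Alice, a critic. Criticize the joke you receive, "
        "then say TERMINATE."
    ),
    llm_config=llm_config,
)
```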
Next, we create the user proxy, which is responsible for communication between
us as developers and the autogen agents. We specify the new mandatory
attribute, code execution config, and state that Docker should not be used at
all. We assign a termination check, which we quickly write: it ensures that if
one of the agents says the word 'terminate,' the group chat ends immediately.
We simply check whether the word 'terminate' appears in the message. Next, we
set the human input mode to 'NEVER,' which tells the user proxy that no
inquiries should come back to us and that the entire chat should run
independently.
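Put together, the user proxy might look like this; the lambda mirrors the termination check just described:

```python
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # never route questions back to us
    code_execution_config={"use_docker": False},  # don't use Docker at all
    # End the chat as soon as 'terminate' appears in a message.
    is_termination_msg=lambda msg: "terminate" in (msg.get("content") or "").lower(),
)
```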
Just a quick interruption to revisit our blueprint. In the next step, we will
establish the group chat and the group chat manager. The manager is key to
overseeing the interaction dynamics among the agents within the group chat,
ensuring a smooth and coherent conversation flow. Then we create our group
chat. We assign the agents Bob, Alice, and the user proxy to it, and start
with an empty message array. And we create our manager, who leads the group
chat. We assign the group chat to it, and here too we must specify the same
code execution config. In our case this doesn't matter, because we don't want
any code to be written. We assign the LLM configuration again, and here too we
specify that the group chat should be terminated if the word 'terminate'
appears within a message. Then we can initiate the group chat: we pass it the
manager and set the objective of the chat, 'Tell a joke.'
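Assembled, this step might look roughly as follows, reusing the agents and llm_config from above; again a sketch against the pyautogen v0.2-style API:

```python
group_chat = autogen.GroupChat(
    agents=[bob, alice, user_proxy],
    messages=[],  # start with an empty message history
)

manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    code_execution_config={"use_docker": False},  # same as on the user proxy
    # Terminate the group chat if 'terminate' appears within a message.
    is_termination_msg=lambda msg: "terminate" in (msg.get("content") or "").lower(),
)

# Kick off the chat with our objective.
user_proxy.initiate_chat(manager, message="Tell a joke.")
```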
Let's try that out. We see that Bob has already told a joke, and Alice has
criticized it and responded with 'terminate.' Before we can use a local
language model, we must, of course, make it available first. A very nice and
simple way is to use LM Studio.
We can easily download LM Studio for Mac, Windows, or Linux. Once we have
installed the program, we can download language models directly to our device
via an easy-to-use interface, and once downloaded, we can test them in a chat
to see whether they meet our quality requirements. In this case, we use
Mistral 7B Instruct and ask it to tell a joke. We see that it performs this
task brilliantly. But the exciting part is not using the language models here
in the chat, but using them with autogen. For this, LM Studio provides an API
endpoint that we can use with the same syntax as the OpenAI chat completions
API. Let's try this out with cURL: we simply copy the example request, and we
see that the request is answered with a correct response.
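If you'd rather test the endpoint from Python than with cURL, an equivalent request might look like this; the URL is LM Studio's usual default and is an assumption here, so check what the app actually shows you:

```python
import json
import urllib.request

# Assumption: LM Studio's local server is running at its default address.
url = "http://localhost:1234/v1/chat/completions"
payload = {
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "temperature": 0.7,
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())
    print(reply["choices"][0]["message"]["content"])
```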
We then copy the URL of the endpoint and can start using it in autogen. For
this, we create a new configuration. It has the same shape as our
configuration above: we specify a model, in this case our local Llama 2 model,
and give it the base URL that we copied to the clipboard earlier.
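A sketch of such a local configuration. The model name here is hypothetical (use whatever you loaded in LM Studio), recent pyautogen versions call the endpoint field base_url (older ones used api_base), and LM Studio ignores the API key, though the client still requires a non-empty value:

```python
local_llm_config = {
    "config_list": [
        {
            "model": "llama-2-7b-chat",  # hypothetical; match the model loaded in LM Studio
            "base_url": "http://localhost:1234/v1",  # the endpoint URL we copied
            "api_key": "lm-studio",  # ignored locally, but must not be empty
        }
    ],
    "cache_seed": None,  # disable the response cache while experimenting
}
```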
In the next step, we switch to Llama 2 in LM Studio. We start the server again
and assign our local config to our agents so that they use the Llama 2
language model and no longer GPT. Alright, now we can start the group chat
again.
We clear the cache once and keep our fingers crossed. Unfortunately, we immediately see
an error message: autogen had trouble identifying the next agent, because the
language model returned 'Bob' with exclamation marks and a smiley as the next
candidate, and autogen obviously cannot process this. And here we come to our
trick. Instead of letting autogen decide independently who the next speaker
is, we can simply use a rotating procedure, round-robin. Now each agent takes
its turn, and there is no selection step anymore.
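In pyautogen, the GroupChat constructor accepts a speaker_selection_method parameter for exactly this purpose (in recent versions), so the change is one line:

```python
group_chat = autogen.GroupChat(
    agents=[bob, alice, user_proxy],
    messages=[],
    speaker_selection_method="round_robin",  # rotate speakers instead of asking the LLM
)
```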
We try this, and we see that the error message is gone. Bob tells a joke, and
Alice responds to it. Unfortunately, Alice now responds directly with
'terminate.' This brings us to the next important point.
The local language models are not as powerful as GPT-3.5 or GPT-4. Therefore,
we often have to adapt the prompt so that it is precise enough for these
7-billion-parameter models to cope with. That means we adjust the system
message and try our luck again.
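The adjusted prompt isn't shown verbatim in the video; one possible, more explicit system message for Alice might be:

```python
alice = autogen.AssistantAgent(
    name="Alice",
    system_message=(
        "You are Alice, a joke critic. When you receive a joke, first write "
        "one or two sentences criticizing it. Only after you have written "
        "your criticism, end your message with the word TERMINATE."
    ),
    llm_config=local_llm_config,  # the local Llama 2 config from above
)
```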
We ask again to tell a joke. Bob tells the joke, and we are curious what Alice says. And Alice
actually criticized the joke and then ended it. Wonderful! That was exactly the result we wanted,
and we see that we were able to switch our group chat completely from GPT to
Llama 2, so no further costs will be incurred. If you found this video
helpful, you're going
to love our detailed video about building an autogen-powered customer support agent. Simply
click on the video link to dive in and enjoy.