In this video, I'll demonstrate methods to run
autogen agents for free and share insights on how to avoid some common pitfalls. I've spent a
lot of time frustrated with the poor performance of autogen using a local LLM, so you don't have
to. Why autogen? Because with autonomous agents, you can build everything you can imagine. For
instance, autogen applications have already reached the capability to manage most tasks
typically done by customer support staff, which includes responding to chats and email inquiries.
If you're interested in learning more about these specific use cases, I encourage you to check
out the videos linked in the description below. However, there is a significant downside.
Developers, in particular, may face high daily costs. Every task performed by an agent,
as well as the selection of the next agent in a group chat, requires tokens. The simplest method
to cut down on expenses is to opt for a more basic model. Merely switching from GPT-4 to GPT-3.5
Turbo can slash costs dramatically. But it gets even better. We can also utilize autogen with
free open-source alternatives such as Llama 2. In an ideal scenario, we would
just tweak the configuration, and everything would operate seamlessly
like clockwork. However, in reality, things don't work quite that smoothly. But
no worries, I'll guide you through setting up everything from scratch and reveal secrets to
running autogen locally with ease and efficiency. So, as always, we start by
creating a virtual environment to separate the dependencies from each other.
Then we activate the virtual environment and install autogen with 'pip install
pyautogen'. Next, we create a new file, 'app.py', and start by importing the
dependencies, that is, autogen. We create a new configuration for the language
model. Specifically, we want to start with the GPT-3.5 Turbo model, which is
very fast and will first show us how the group chat basically works.
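In code, that might look roughly like this; a minimal sketch assuming the pyautogen v0.2-style API, with a placeholder API key you'd replace with your own:

```python
import autogen

# LLM configuration for GPT-3.5 Turbo. The key below is a placeholder;
# supply your own OpenAI API key (or load it from an environment variable).
llm_config = {
    "config_list": [
        {
            "model": "gpt-3.5-turbo",
            "api_key": "sk-...",  # placeholder
        }
    ],
}
```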
Okay, now that we have the configuration set up, let's explore the next step.
We will first create the assistant agents,
Bob and Alice. Bob tells jokes while Alice criticizes them. Additionally, we will include a
user proxy, which is essential for facilitating interactions between the automated system and
human users. Creating an assistant with autogen is relatively straightforward. It requires just
a name, system message, and a configuration. This simplicity enables quick setup and deployment of
automated assistants for various applications. We copy Bob and make an Alice
out of it, a little bit like in the Bible. Alice's task is to criticize Bob's
jokes, a little bit like in real life. She can also terminate the chat.
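The exact system messages aren't shown on screen, so the wording here is illustrative; a sketch of the two assistants:

```python
# Bob tells jokes.
bob = autogen.AssistantAgent(
    name="Bob",
    system_message="You are Bob, a comedian. Tell short jokes.",
    llm_config=llm_config,
)

# Alice is a copy of Bob with a different role: she criticizes jokes
# and may end the conversation.
alice = autogen.AssistantAgent(
    name="Alice",
    system_message=(
        "You are Alice, a critic. Criticize the joke you receive, "
        "then say TERMINATE."
    ),
    llm_config=llm_config,
)
```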
Next, we create the user proxy, which is responsible for communication between
us as developers and the autogen agents. We specify the new mandatory
attribute, code execution config, and state that Docker should not be used at
all. We assign a termination check, which we quickly write: it ensures that if
one of the agents says the word 'terminate,' the group chat ends immediately.
We simply check whether the word 'terminate' appears in the message. Next, we
set the human input mode to 'NEVER,' which tells the user proxy that no
inquiries should come back to us and that the entire chat should run
independently.
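Put together, the user proxy might look like this; the lambda mirrors the termination check just described:

```python
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # never route questions back to us
    code_execution_config={"use_docker": False},  # don't use Docker at all
    # End the chat as soon as 'terminate' appears in a message.
    is_termination_msg=lambda msg: "terminate" in (msg.get("content") or "").lower(),
)
```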
Just a quick interruption to revisit our blueprint. In the next step, we will
establish the group chat and the group chat manager. The manager is key to
overseeing the interaction dynamics among the agents within the group chat,
ensuring a smooth and coherent conversation flow. Then we create our group
chat. We assign the agents Bob, Alice, and the user proxy to it, and start
with an empty message array. And we create our manager, who leads the group
chat. We assign the group chat to it, and here too we must specify the same
code execution config. In our case this doesn't matter, because we don't want
any code to be written. We assign the LLM configuration again, and here too we
specify that the group chat should be terminated if the word 'terminate'
appears within a message. Then we can initiate the group chat: we pass it the
manager and set the objective of the chat, 'Tell a joke.'
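Assembled, this step might look roughly as follows, reusing the agents and llm_config from above; again a sketch against the pyautogen v0.2-style API:

```python
group_chat = autogen.GroupChat(
    agents=[bob, alice, user_proxy],
    messages=[],  # start with an empty message history
)

manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    code_execution_config={"use_docker": False},  # same as on the user proxy
    # Terminate the group chat if 'terminate' appears within a message.
    is_termination_msg=lambda msg: "terminate" in (msg.get("content") or "").lower(),
)

# Kick off the chat with our objective.
user_proxy.initiate_chat(manager, message="Tell a joke.")
```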
Let's try that out. We see that Bob has already told a joke, and Alice has
criticized it and responded with 'terminate.' Before we can use a local
language model, we must, of course, make it available first. A very nice and
simple way is to use LM Studio.
We can easily download LM Studio for Mac, Windows, or Linux. Once we have
installed the program, we can download language models directly to our device
via an easy-to-use interface, and once downloaded, we can test them in a chat
to see whether they meet our quality requirements. In this case, we use
Mistral 7B Instruct and ask it to tell a joke. We see that it performs this
task brilliantly. But the exciting part is not using the language models here
in the chat, but using them with autogen. For this, LM Studio provides an API
endpoint that we can use with the same syntax as the OpenAI chat completions
API. Let's try this out with cURL: we simply copy the example request, and we
see that the request is answered with a correct response.
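If you'd rather test the endpoint from Python than with cURL, an equivalent request might look like this; the URL is LM Studio's usual default and is an assumption here, so check what the app actually shows you:

```python
import json
import urllib.request

# Assumption: LM Studio's local server is running at its default address.
url = "http://localhost:1234/v1/chat/completions"
payload = {
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "temperature": 0.7,
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())
    print(reply["choices"][0]["message"]["content"])
```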
We then copy the URL of the endpoint and can start using it in autogen. For
this, we create a new configuration. It has the same shape as our
configuration above: we specify a model, in this case our local Llama 2 model,
and give it the base URL that we copied to the clipboard earlier.
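A sketch of such a local configuration. The model name here is hypothetical (use whatever you loaded in LM Studio), recent pyautogen versions call the endpoint field base_url (older ones used api_base), and LM Studio ignores the API key, though the client still requires a non-empty value:

```python
local_llm_config = {
    "config_list": [
        {
            "model": "llama-2-7b-chat",  # hypothetical; match the model loaded in LM Studio
            "base_url": "http://localhost:1234/v1",  # the endpoint URL we copied
            "api_key": "lm-studio",  # ignored locally, but must not be empty
        }
    ],
    "cache_seed": None,  # disable the response cache while experimenting
}
```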
In the next step, we switch to Llama 2 in LM Studio. We start the server again
and assign our local config to our agents so that they use the Llama 2
language model and no longer GPT. Alright, now we can start the group chat
again.
We clear the cache once and keep our fingers crossed. Unfortunately, we immediately see
an error message: autogen had trouble identifying the next agent, because the
language model returned 'Bob' with exclamation marks and a smiley as the next
candidate, and autogen obviously cannot process this. And here we come to our
trick. Instead of letting autogen decide independently who the next speaker
is, we can simply use a rotating procedure, round-robin. Now each agent takes
its turn, and there is no selection step anymore.
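In pyautogen, the GroupChat constructor accepts a speaker_selection_method parameter for exactly this purpose (in recent versions), so the change is one line:

```python
group_chat = autogen.GroupChat(
    agents=[bob, alice, user_proxy],
    messages=[],
    speaker_selection_method="round_robin",  # rotate speakers instead of asking the LLM
)
```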
We try this, and we see that the error message is gone. Bob tells a joke, and
Alice responds to it. Unfortunately, Alice now responds directly with
'terminate.' This brings us to the next important point.
The local language models are not as powerful as GPT-3.5 or GPT-4. Therefore,
we often have to adapt the prompt so that it is precise enough for these
7-billion-parameter models to cope with. That means we adjust the system
message and try our luck again.
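The adjusted prompt isn't shown verbatim in the video; one possible, more explicit system message for Alice might be:

```python
alice = autogen.AssistantAgent(
    name="Alice",
    system_message=(
        "You are Alice, a joke critic. When you receive a joke, first write "
        "one or two sentences criticizing it. Only after you have written "
        "your criticism, end your message with the word TERMINATE."
    ),
    llm_config=local_llm_config,  # the local Llama 2 config from above
)
```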
We ask again to tell a joke. Bob tells the joke, and we are curious what Alice says. And Alice
actually criticized the joke and then ended it. Wonderful! That was exactly the result we wanted,
and we see that we were able to switch our group chat completely from GPT to
Llama 2, so no further costs will be incurred. If you found this video
helpful, you're going
to love our detailed video about building an autogen-powered customer support agent. Simply
click on the video link to dive in and enjoy.