It is clear that in a few years, businesses
will hire agencies composed entirely of AI agents. For example, an AI lab called Cognition recently released their first AI software engineer, named Devin, which outperforms anything else we have ever seen on the SWE-bench benchmark. It can train its own AI, learn unfamiliar
technologies, contribute to production repos, and even complete some side hustles on Upwork. But what many people don't realize is that
this comparison on their chart is made between standard large language models, like Claude
or GPT-4, while Devin has access to additional tools like a terminal, code editor, and even
its own browser. So, all it is, is literally just a cleverly
prompted LLM with a bunch of tools, and this lab has already received more than $20 million in funding. Personally, I don't believe they're heading
in the right direction and I'll explain exactly why later, but what this really shows is that
we're merely scratching the surface of what's possible here. So in this video I will share with you my entire experience developing custom AI agent systems over the last year for companies of all sizes, ranging from small firms with 5 employees to entire corporations with 30,000+ people. In fact, by the end of this video you will
be able to build your own fully functional Social Media Marketing Agency that will generate
ad copy, create ad images with DALL-E 3, and reliably post them on Facebook. Here is the game plan. We'll start with an overview of the new AI Agent Developer role and what it entails. Next, we'll unravel what AI Agents truly are. After that, we'll take a tour of the most popular
AI agent frameworks at your disposal. Then, I'll be pulling back the curtain on
my own framework, giving you an insider's perspective on how it works and how you can
leverage it in your own projects. And finally, we'll get hands-on as we build
a fully functional SMMA, ready to take on new clients and generate profits. This will be a comprehensive guide, highlighting
my entire process from start to finish. So make yourself comfortable and let's dive
in. First, let me define this new AI Agent Developer
role and why I believe it will be one of the most in-demand skills in 2024. Well, numerous studies and industry experts
predict that we're headed towards full labor automation in the next decade. While I totally agree with this projection,
I don't think it will be a self-driven process. As AI models become increasingly intelligent,
they're certainly going to gain a broader understanding of the world. However, they will never know how a specific
company operates internally, simply because such data is rarely made public. As we saw in 2023, businesses don't just want
to incorporate standard LLMs into their processes. They want to customize them and at least enrich
them with their own proprietary data. The reason I believe labs like Cognition will soon fail is that they lack customization. To fully automate a company like Google, you
need more than just a super-intelligent AI developer. You need to make sure that this developer has access to all the necessary tools, infrastructure, and internal knowledge before it can actually
perform any tasks. This is where AI Agent Developers come in. So, an AI Agent Developer is someone who fine-tunes
AI agents based on internal business processes. As an AI Agent Developer, my primary responsibility
is to equip AI with all the necessary resources and ensure it knows how and when to use them
in production. The primary skills required for an AI agent
developer role can vary significantly from project to project. This topic deserves its own video, so if you're interested, please let me know in the comments. Soon I'll walk you through exactly how to
accomplish all of this. But for now we need to understand what AI
agents truly are. A lot of people say that AI Agents are just
instructions, knowledge, and actions. And sure, that's sort of true, but that's not exactly what AI agents are; that's how we make AI agents. In fact, AI Agents are much more than that. Let me explain. To answer what AI agents truly are, we need
to unpack the difference between standard 1.0 AI automations and more sophisticated
2.0 AI agent-based applications. Picture a straightforward customer support
automation where an LLM must label each incoming email and must respond to it, pulling some
additional context from a vector database. Does this sound like an agent or mere AI automation? You may have noticed that it doesn't quite
feel like an agent, right? But why? It has knowledge from your vector database,
and has instructions on how to respond, and it performs an action by attaching a label. The distinction lies in the fact that
I said that it must generate a label and must answer each email. You see, the fundamental difference between
automations and agents is that agents possess decision-making capabilities. In 1.0 AI automations, every single procedure,
like context retrieval, response generation, and labeling, is hardcoded into the backend logic. This means that it literally cannot deviate
from this logic, no matter what. If the automation is tasked with responding
to emails, it cannot neglect to respond. And while all this rigidity works well for
certain use cases, it completely fails as soon as some unexpected circumstances arise. Imagine for example if your customer support
mailbox receives an inquiry about a potential partnership with your platform. If this scenario wasn't accounted for, a 1.0
AI automation would handle it like any other support inquiry, potentially causing a missed
opportunity. On the contrary, 2.0 AI Agent-Based applications
have a different approach. While they still equip the agent with the
necessary tools, context, and instructions, they grant the agent the autonomy to decide how to utilize these tools by itself. Instead of feeding your context into the prompt
on every request, you empower the agent to retrieve it only when it's needed. This flexibility means that the agent can
adapt accordingly. So in our previous example, the agent would
recognize that it's dealing with an inquiry outside of its expertise and then it could
use other available tools if possible. For example, it could reach out to your human support agent, or it could send a notification in Slack.
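To make the contrast concrete, here is a rough sketch of the two approaches; none of this is from an actual codebase, and the helper functions are hypothetical placeholders:

```python
# A rough sketch contrasting the two approaches; all helper functions here are
# hypothetical placeholders, not code from this video.

def retrieve_context(email: str) -> str:
    return "relevant snippets from the vector database"  # placeholder

def apply_label(email: str, label: str) -> None:
    print(f"Label applied: {label}")  # placeholder

def send_reply(email: str, reply: str) -> None:
    print(f"Replying to customer: {reply}")  # placeholder

def notify_slack(message: str) -> None:
    print(f"Slack notification: {message}")  # placeholder

# 1.0 AI automation: every step is hardcoded, so it MUST label and MUST reply.
def handle_email_automation(email: str, llm) -> None:
    context = retrieve_context(email)                       # always retrieves context
    apply_label(email, llm(f"Label this email: {email}"))   # always labels
    send_reply(email, llm(f"Context: {context}\n\nReply to: {email}"))  # always replies

# 2.0 AI agent: the same resources are exposed as tools, and the model decides
# which of them to call (or skip) for each incoming email. For a partnership
# inquiry, it could skip the reply entirely and call notify_slack instead.
support_agent_tools = [retrieve_context, apply_label, send_reply, notify_slack]
```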
Overall, what AI agents truly are is a new
way of thinking about how to apply AI in various applications. It's a paradigm shift rather than a simple
technique. In my agency, we all began with simple 1.0 AI automations, but as my clients saw the tangible benefits they offered, they yearned for more: more advanced capabilities and automation of increasingly complex tasks. Over time, we reached a stage where I wouldn't even call it automation anymore. It was more akin to outsourcing, as some of the processes we automated literally required multiple people to carry them out manually. And the performance was never the same. Now, having said all this, where do Agent
Swarms come in? To truly grasp the concept of agent swarms,
it's crucial to understand that all intelligence is environment-dependent. For instance, I might excel when it comes
to programming, but I'm utterly lost when it comes to cooking. I would not last a day as a cook, even at McDonald's. I basically just eat meat and nothing else. This applies to both AI agents and your own employees. You can't assign 10 different roles to even the smartest person in the world. Likewise, even when we reach GPT-100, I would still
not recommend assigning so many different responsibilities to a single agent. Firstly, removing all unnecessary information for a given process simply saves you tokens. And secondly, even if GPT-100 wouldn't get confused handling 10 different roles, the users of such a system certainly would. So, what agent swarms really allow you to
do is separate responsibilities for different environments, just like in real world organizations. This results in 3 main benefits:
First, it dramatically reduces hallucinations. I found that after you add 7 to 10 tools to
a single GPT-4 agent, it starts to get confused. But when you split these tools across multiple agents, you almost completely eradicate this problem. Secondly, you can outsource much more complex tasks. The longer the sequence of your agents is, the more tasks they can handle without direct supervision. And lastly, it makes the whole system much easier to scale. You see, most of my clients don't stop at a single AI Agent and often try to automate increasingly complex processes over time. So when the need arises, instead of adjusting your existing system and then debugging it all over again, you can simply add another agent and leave all the previous agents as they are. In fact, this last problem of scaling is so
common among my clients that this week we are releasing the first-of-its-kind AI Agents
as a Service subscription. Basically, if you are a business owner you
can now pay us a fixed fee per month and we will develop as many AI agents as you need,
but we will work on them one at a time. Our goal is to provide a flexible and scalable
solution that grows with your needs. So if you are interested, you can apply right
now using the link below at a temporarily discounted price. However, if you're inclined to take on this
journey by yourself, that's perfectly fine too, because next, I'm going to walk you through
my entire process from start to end. But before we get into the nitty-gritty, let's
start with a brief overview of all the multi-agent frameworks at your disposal. The first project is the one you've probably
heard of, called AutoGen by Microsoft. The main feature of AutoGen is multi-agent
chats. It was developed as a research experiment
and was quite groundbreaking at the time. However, the problem with AutoGen is that
it has extremely limited conversational patterns that are super hard to customize. If you look at its code, in AutoGen the next
speaker is determined with an extra call to the model that emulates role play between
the agents. Let me just read it to you "Read the above
conversation. Then select the next speaker from agent names;
only return the role." I mean, not only is this extremely inefficient,
but it also makes the whole system absolutely uncontrollable. A lot of people report that agents constantly
hallucinate because there is no clear separation of concerns when it comes to tool execution. One agent might write the code, but because
it needs to be executed by user proxy, or some other agent, it often results in hallucinations,
which is a huge problem in production. The next framework that has recently been
getting a ton of attention is called CrewAI. CrewAI was developed as a side project and
it introduces the concept of "process" into agent communication. This provides some semblance of control over
the communication flow. However, just like in AutoGen, the conversation
flows are extremely limited, offering only sequential or hierarchical options. In the sequential process, basically all your agents communicate with each other one by one. And in the hierarchical process, there is one manager agent that communicates with everyone else. Obviously, this is not how real organizations
are structured. For example, can you imagine Sundar Pichai
manually instructing a QA tester who tested this amazing new sign-in screen? Additionally, in CrewAI, the manager agent
is hardcoded for you, which for some reason people find cool. However, imagine if you want this agent to
first search the web for additional context before deciding who it should speak to next. Try doing that in CrewAI. The biggest problem with CrewAI, however,
is that it is built on top of LangChain, which was released before any function-calling
models. This means that there is no automatic type
checking or error correction when it comes to tool execution. The descriptions for these tools are also extremely limited. Recently, CrewAI introduced a way to overcome this by extending a base tool class; however, this process is not as straightforward as it could have been. The goal, backstory, role, and tasks
that you need to define when you are creating your crew are simply prompt templates that
also take away control from you as a developer. Without these prompt templates, CrewAI
simply would not be able to function. The only advantage of CrewAI is that you can
use it with open-source models. Now, personally, I would never utilize any of these frameworks in production for my clients, which is why I developed my own framework
called Agency Swarm. In this framework, there isn't a single hard-coded
prompt. It's easily customizable with uniform communication
flows, and it is extremely reliable in production because it provides automatic type checking
and validation for all tools with the Instructor library. It is the thinnest possible wrapper around OpenAI's Assistants API, meaning that you have full control over all your agents. So whether you add a manager agent, define
goals, processes, or not, whether you create a sequential or hierarchical flow or even
combine both with a communication tree that is 50 levels in depth, I don't care, it is
still going to work. Your agents will determine who to communicate
with next, based on their own descriptions and nothing else. But you are probably wondering, why the Assistants API for AI agent development? Well, that's a good question, because if you
look at all the previous OpenAI endpoints, you'll find the Assistants API isn't significantly
different. However, it was a game-changer for me as an agent
developer. And the reason for this is state management. You see, with the Assistants API, you can attach instructions, knowledge, and actions directly to each new agent. This not only allows you to separate various responsibilities, but also to scale your system seamlessly without having to worry about any underlying data management or about your agents confusing each other's tools like in
other frameworks. Agent state management is the primary reason
why Agency Swarm is fully based on the OpenAI Assistants API, and to answer your other question,
no, we are not currently planning to support any open-source models. If costs are a concern, simply use GPT-3.5, which is much better than any LLM that you can run locally unless you have a $10,000 PC. If data privacy is a concern, you can use my
framework with Azure OpenAI, which doesn't even share data with OpenAI itself. To get started creating your agent swarms
using my framework, you need to understand 3 essential entities: Agents, Tools, and Agencies. Agents are essentially wrappers around assistants in the Assistants API. They include numerous methods that simplify
the agent creation process. For instance, instead of manually uploading
all your files and adding their IDs when creating an assistant, you can just specify the folder
path. The system will automatically attach all files
from that folder to your assistant. It also stores all your agent settings in
a settings.json file. Therefore, if your agent's configuration changes,
the system will automatically update your existing assistant on OpenAI the next time
you run it, rather than creating a new one. The most commonly used parameters when creating
an agent are name, description, instructions, model, and tools. These are all self-explanatory. There are no preset templates for goals, processes,
backstories, etc., so you simply include them all in the instructions. Additional parameters include files_folder,
schemas_folder, and tools_folder. As I said, all files from your files folder
will be automatically indexed and uploaded to OpenAI. All tools from your tools folder will be attached
to an assistant as well, and all OpenAPI schemas from your schemas folder will automatically be converted into tools, allowing your agents to easily call third-party APIs. Additional properties, api_params and api_headers, are also available if your API requires authentication.
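Putting those parameters together, a minimal agent definition might look something like this (the name, description, instructions text, and folder paths are placeholder assumptions, not from this video):

```python
from agency_swarm import Agent

ad_copy_agent = Agent(
    name="Ad Copy Agent",
    description="Writes ad copy and headlines for Facebook campaigns.",
    instructions="You write concise, engaging ad copy based on the client's brief.",
    files_folder="./files",        # everything here is uploaded and attached automatically
    tools_folder="./tools",        # BaseTool classes here are attached automatically
    schemas_folder="./schemas",    # OpenAPI schemas here are converted into tools
    model="gpt-4-1106-preview",    # example model name
)
```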
However, I do recommend creating all your
tools from scratch using Instructor, as it gives you more control. I previously posted a detailed tutorial on
Instructor, which includes a brief conversation with its creator, Jason Liu. Check it out if you're interested. In essence, Instructor allows you to integrate
a data validation library, Pydantic, with function calls. This ensures that all agent inputs make sense
before any actions are executed, minimizing production errors. For instance, if you have a number division
tool, you can verify that the divisor is not zero. If it is, the agent will see the error and
automatically correct itself before executing any logic. To begin creating tools in Agency Swarm with
Instructor, create a class that extends BaseTool, add your class properties, and
implement the run method. Remember, the agent uses the docstring and
all field descriptions to understand when and how to use your tool. For our number division tool, the docstring
should state that this tool divides two numbers and describe the parameters accordingly. Next, define your execution logic within the
run method. You can access all defined fields through
the self object. To make some fields optional, use the Optional
type from Python's typing module. To define available values for your agent,
use a literal or enumerator type. There are also many tricks you can use. For instance, you can add a chain_of_thought
parameter inside the tool to save on token costs and latency, instead of using a chain
of thought prompt globally. To add your validation logic, use field or
model validators from Pydantic. In this division tool example, it makes sense
to add a field validator that checks if the divisor is not zero, returning an error if
Because tools are arguably the most important
part of any AI Agent-based system, I created this custom GPT to help you get started faster. For example, if I need a tool that searches
the web with Serp API, it instantly generates a BaseTool with parameters like query as a
string and num_results as an integer, including all relevant descriptions. You can find the link to this tool on our
Discord. The final component of the Agency Swarm framework
is the Agency itself, which is essentially a collection of agents that can communicate
with one another. When initializing your agency, you add an
Agency chart that establishes communication flows between your agents. In contrast to other frameworks, communication
flows in Agency Swarm are uniform, meaning they can be defined in any way you want. If you place any agents in the top-level list
inside the agency chart, these agents can communicate with the user. If you add agents together inside a second-level
list, these agents can communicate with one another. To create a basic sequential flow, add a CEO
agent to the top-level list, then create a second-level list with a CEO, developer, and
virtual assistant. In this flow, the user communicates with the
CEO, who then communicates with the developer and the virtual assistant. If you prefer a hierarchical flow, place the
agents in two separate second-level lists with the CEO. Remember, communication flows are directional. In the previous example, the CEO can initiate
communication with the developer, who can respond, but the developer cannot initiate
communication with the CEO, much like in real organizations. If you still want the developer to assign
tasks to the CEO, simply add another list with the developer first and the CEO second.
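As a rough sketch (the agent variable names here are mine, not from the video), the charts I just described might look like this:

```python
from agency_swarm import Agency, Agent

# Hypothetical agents for illustration only.
ceo = Agent(name="CEO", description="Coordinates the work and talks to the user.")
dev = Agent(name="Developer", description="Writes code.")
va = Agent(name="Virtual Assistant", description="Handles routine tasks.")

agency = Agency([
    ceo,          # top-level: the CEO can communicate with the user
    [ceo, dev],   # the CEO can initiate communication with the Developer
    [dev, va],    # the Developer can initiate communication with the VA (sequential chain)
    # For a hierarchical flow, you would instead use [ceo, dev] and [ceo, va].
    # Flows are directional: add [dev, ceo] if the Developer should also be able
    # to initiate communication with the CEO.
])
```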
I always recommend starting with as few agents as possible and adding more only once they are working as expected. Advanced parameters inside the Agency class
like async_mode, threads_callbacks, and settings_callbacks are useful when deploying your swarms on various
backends. Be sure to check our documentation for more
information. When it comes to running your agency, you
have 3 options: the Gradio interface with the demo_gradio method, the terminal version with the run_demo method, or get_completion, which is similar to previous chat completions APIs.
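Assuming an agency object like the one from the earlier sketch, these three options look roughly like this:

```python
# Three ways to run the same agency object (a sketch, using the agency from the snippet above).
agency.demo_gradio()                 # 1) web interface in the browser via Gradio
agency.run_demo()                    # 2) interactive demo in the terminal
response = agency.get_completion("Create an ad for our new product.")  # 3) programmatic use
print(response)
```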
Now, let's create our own social media marketing
agency together to demonstrate the entire process from start to finish. Alright, for those who are new here, please
install Agency Swarm using the command 'pip install agency-swarm'. To get started quickly, I usually run the 'agency-swarm genesis' command. This will activate the Genesis agency, which
will create all your agents for you. It doesn't get everything right just yet,
but it does speed up the process significantly. In my prompt, I'm going to specify that I
need a Facebook marketing agency that generates ad copy, creates images with DALL-E 3, and
posts them on Facebook. As you can see, we now have our initial agency
structure with three agents: the ad copy agent, image creator agent, and Facebook manager
agent. I really like how the genesis agency has divided
these responsibilities among three different agent roles. However, I'd like to adjust the communication
flows a bit and adopt a sequential flow, so I will instruct the genesis CEO accordingly. Now we have a sequential agency structure
with three communication levels. We can tell it to proceed with the creation
of the agents. This process takes some time, so I'll skip
this part and return when we're ready to fine-tune our agents. After all our agents have been created, you
can see that the CEO tells me that I can run this agency with the python agency.py command. All the folders for my agents and tools are
displayed on the left. The next step is to test and fine-tune all
these tools. We'll start with the image generator agent. The Genesis Agency has created one tool for
this agent called ImageGenerator. It's impressive how close this tool is to
what I planned to implement myself. It uses OpenAI to generate an image with a
simple prompt, taking ad_copy, theme, and specific requirements and inserting them into
a prompt template. Yes, AI has learned to prompt itself. However, there's an issue: it uses an outdated
OpenAI package version with the Da Vinci Codex model, which is designed for code generation. Let's fix this now together. First, I'll load a new OpenAI client with
a convenience method from the Agency Swarm util module. I'll also increase the timeout, because image generation can take some time. After that, I'll adjust the API call to use the new DALL-E 3 model, and then set the timeout back to the default. There's one more thing we have to do: we
have to ensure that other agents can use this image when posting the ad. So, I'm going to create a new 'save image'
method that will save this image locally. But here is the kicker - I don't want my agents
to pass this image path to each other because any hallucinations could cause issues. Instead, I'll save this path to a shared state. Essentially, shared state allows you to share
certain variables across all agents in any tool. Instead of having the agent manually pass
the image path to another agent, you can save it in one tool and access it in another. You can also use it to perform validation logic across various agents, which I'll show you soon.
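Pieced together, the adjusted tool might look roughly like this; the helper names, such as get_openai_client and the _shared_state accessor, and the exact prompt are my best-guess assumptions rather than the exact code from the video:

```python
import base64
from pydantic import Field
from agency_swarm.tools import BaseTool
from agency_swarm.util import get_openai_client  # convenience method mentioned above

class ImageGenerator(BaseTool):
    """Generates an ad image with DALL-E 3 and saves it locally for other agents to use."""
    ad_copy: str = Field(..., description="The ad copy to illustrate.")
    theme: str = Field(..., description="The visual theme of the image.")
    specific_requirements: str = Field("", description="Additional requirements for the image.")

    def run(self) -> str:
        client = get_openai_client()
        client.timeout = 120  # increase the timeout; image generation can take a while
        response = client.images.generate(
            model="dall-e-3",
            prompt=f"Ad image for: {self.ad_copy}. Theme: {self.theme}. {self.specific_requirements}",
            response_format="b64_json",
        )
        client.timeout = 90  # back to the default

        # Save the image locally and share its path instead of passing it between agents.
        path = "ad_image.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(response.data[0].b64_json))
        self._shared_state.set("image_path", path)
        return f"Image saved to {path}."
```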
Now we are ready to test this tool. You can do this by adding a simple 'if name
equals main' statement at the end, then initializing the tool with some example parameters. Then you can print the result of the run method. Don't forget to load the environment with
your OpenAI key by calling 'load_dotenv' at the top.
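The test harness at the bottom of the tool file might look like this (the example parameter values are mine):

```python
from dotenv import load_dotenv
load_dotenv()  # at the top of the file, so the OpenAI key is available

if __name__ == "__main__":
    # Quick manual test of the ImageGenerator tool above with example parameters.
    tool = ImageGenerator(
        ad_copy="Revolutionize your business with AI.",
        theme="futuristic",
        specific_requirements="clean, professional look",
    )
    print(tool.run())
```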
As you can see, we have an image generated and saved locally, as expected. This means we can now proceed with adjusting
the next tool, the AdCopyGenerator tool within the ad copy agent. This tool is also very similar to my personal
design. I'll adjust the prompt a bit and save the
results into the shared state. Moving on to the Facebook Manager agent, Genesis
Agency created two tools for us: the Ad Performance Monitor tool and the Ad Scheduler and Poster
tool. While these tools are quite close, creating
an ad on Facebook requires a few more steps. Specifically, we need to first create a campaign
and an ad set before we can post the ad. I will use the tool creator custom GPT to request
two additional tools, 'Ad Campaign Starter', and 'Ad Set Creator'. To run these tools, we first need to install
the Facebook Business SDK, which you can do with this pip command. Next, we need to create our Facebook app. Go to the Facebook developer website, click
"Create App", select "Other" for the use case, then "Business" for the app type. Add your app name and click "Create App". Then click on "Add product" and add "marketing
API". Go to "App settings", copy your App ID, App
secret, and insert them into the environment file. Now we have to get our access token by visiting
the Facebook Graph API Explorer and adding the appropriate permissions. After that, copy the token and put it into the .env file.
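For reference, initializing the SDK from those environment variables inside a tool might look roughly like this (the variable names are placeholders, not necessarily the ones I used):

```python
import os
from dotenv import load_dotenv
from facebook_business.api import FacebookAdsApi

load_dotenv()

# Initialize the Facebook Business SDK from the credentials in the .env file.
# The variable names (FB_APP_ID, etc.) are illustrative; use whatever names you chose.
FacebookAdsApi.init(
    app_id=os.getenv("FB_APP_ID"),
    app_secret=os.getenv("FB_APP_SECRET"),
    access_token=os.getenv("FB_ACCESS_TOKEN"),
)
```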
Working with the Facebook API can be challenging
as it's known to be one of the more complex APIs out there. I won't delve into the details of how I fine-tuned
these tools. The process is the same: adjust, test, and
repeat until they work as expected. As you can see in the AdCreator tool, we're
actually utilizing the ad copy, ad headline, and image path from the shared state that
we saved earlier. I have also included a model validator that
checks the presence of all these necessary parameters. If one of the parameters is not defined, the
system throws a value error and instructs the agent on which tool needs to be used first. This approach significantly enhances the reliability
of the entire system, as it ensures that the Facebook ad manager agent cannot post any
ads until all the required steps, like image generation, have been completed.
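A sketch of that validator, assuming the same shared-state keys as in the earlier image tool (the tool and key names here are illustrative):

```python
from pydantic import model_validator
from agency_swarm.tools import BaseTool

class AdCreator(BaseTool):
    """Posts the ad on Facebook using the previously generated copy, headline, and image."""

    @model_validator(mode="after")
    def check_required_steps(self):
        # Refuse to run until the other agents have produced the required artifacts.
        required = [
            ("ad_copy", "AdCopyGenerator"),
            ("ad_headline", "AdCopyGenerator"),
            ("image_path", "ImageGenerator"),
        ]
        for key, tool_name in required:
            if not self._shared_state.get(key):
                raise ValueError(f"'{key}' is missing. Please run the {tool_name} tool first.")
        return self

    def run(self) -> str:
        # ... create the ad on Facebook using the shared-state values ...
        return "Ad created."
```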
After successfully testing all our tools, the final step is to refine the instructions. It is a good practice to include how specifically
your agents should communicate with each other. I would also recommend specifying an exact
step-by-step process for them to follow. Lastly, I decided to make a few adjustments
to our communication flows. I'd like to establish a direct line of communication
with our Facebook Manager agent, so I'll include it in the top-level list. Also, I'll allow our CEO agent to communicate
directly with both the Facebook Manager and the Image Generator agents.
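With those adjustments, the agency chart in agency.py might look roughly like this (the variable names are assumed; the agents are the ones Genesis created for us):

```python
# Sketch of the adjusted chart; ceo, ad_copy_agent, image_generator, and
# facebook_manager are assumed to be Agent objects defined earlier in agency.py.
agency = Agency([
    ceo,                                  # the user can talk to the CEO
    facebook_manager,                     # direct line of communication with the Facebook Manager
    [ceo, ad_copy_agent],                 # CEO -> Ad Copy agent
    [ceo, image_generator],               # CEO -> Image Generator agent
    [ceo, facebook_manager],              # CEO -> Facebook Manager agent
    [ad_copy_agent, image_generator],     # sequential flow from the original chart
    [image_generator, facebook_manager],
])
```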
Now that we've made these adjustments, we're ready to run our agency. It is as simple as running the 'python agency.py'
command and opening the provided Gradio interface link in your browser. Let's see how it works. I'll kindly ask for an advertisement to be
created for my AI development agency, Arsen AI. The CEO then instructs the ad copy agent,
which promptly provides a clear headline and ad copy for my agency, stating, "Revolutionize
your business with AI." Next, the CEO commands the image generator
agent to create an image for the ad copy, resulting in a futuristic visual for our campaign. Finally, the CEO directs the FacebookManager
Agent to commence the campaign using the campaign starter tool. It then creates an ad set and executes the
ad creation function, posting this ad on Facebook. You can now see this newly generated Facebook
ad, complete with ad copy, headline, and image, live on my Facebook account. Impressive, right? But what if you want to analyze your campaign's
performance? You can do this by directly messaging the
Facebook Manager agent, as it was included in the top-level list. It uses the AdPerformanceMonitor tool and
informs me that there is currently no data as it takes some time for an ad to reach its
audience. In conclusion, I'd like to briefly share my
roadmap for this framework. First, I plan to establish multi-agency communication. This feature will allow the integration of
multiple agencies for super complex use cases. Next, we'll focus on enhancing the Genesis agency. With multi-agency communication, the Genesis
agency will be able to test other agencies during their creation. The goal is to reach a point where there's
almost no need to modify tools or instructions for simple agencies like the one we've just
created. And lastly, we will continue to regularly
update this framework to include the latest releases from the OpenAI Assistants API. With upcoming features like memory and web
browsing, the possibilities are exciting to say the least. So, stay connected with us on Discord. We're always on the lookout for new talent. If you're interested and you have previous
experience with this framework, you can apply through our job postings channel. Thank you for watching and don't forget to
like and subscribe.