The Complete Guide to Building AI Agents for Beginners

Captions

It is clear that in a few years, businesses will hire agencies composed entirely of AI agents. For example, the AI lab Cognition recently released their first AI software engineer, named Devin, which outperforms anything else we have seen on the SWE-bench benchmark. It can train its own AI, learn unfamiliar technologies, contribute to production repos, and even complete some side hustles on Upwork. But what many people don't realize is that the comparison on their chart is made against standard large language models, like Claude or GPT-4, while Devin has access to additional tools like a terminal, a code editor, and even its own browser. So all it is, literally, is a cleverly prompted LLM with a bunch of tools, and this lab has already received more than 20 million dollars in funding. Personally, I don't believe they're heading in the right direction, and I'll explain exactly why later, but what this really shows is that we're merely scratching the surface of what's possible here. So in this video I will share with you my entire experience from the last year of developing custom AI agent systems for companies of all sizes, ranging from small firms with 5 employees to entire corporations with 30,000+ people. In fact, by the end of this video you will be able to build your own fully functional social media marketing agency that will generate ad copy, create ad images with DALL-E 3, and reliably post them on Facebook. Here is the game plan. We'll start with an overview of the new AI Agent Developer role and what it entails. Next, we'll unravel what AI agents truly are. After that, we'll take a tour of the most popular AI agent frameworks at your disposal. Then I'll be pulling back the curtain on my own framework, giving you an insider's perspective on how it works and how you can leverage it in your own projects. And finally, we'll get hands-on as we build a fully functional SMMA, ready to take on new clients and generate profits.
This will be a comprehensive guide, highlighting my entire process from start to finish. So make yourself comfortable and let's dive in. First, let me define this new AI Agent Developer role and why I believe it will be one of the most in-demand skills in 2024. Numerous studies and industry experts predict that we're headed towards full labor automation in the next decade. While I totally agree with this projection, I don't think it will be a self-driven process. As AI models become increasingly intelligent, they're certainly going to gain a broader understanding of the world. However, they will never know how a specific company operates internally, simply because such data is rarely made public. As we saw in 2023, businesses don't just want to incorporate standard LLMs into their processes. They want to customize them and at least enrich them with their own data. The reason I believe labs like Cognition will soon fail is that they lack customization. To fully automate a company like Google, you need more than just a super-intelligent AI developer. You need to make sure that this developer has access to all the necessary tools, infrastructure, and internal knowledge before it can actually perform any tasks. This is where AI Agent Developers come in. An AI Agent Developer is someone who fine-tunes AI agents based on internal business processes. As an AI Agent Developer, my primary responsibility is to equip AI with all the necessary resources and ensure it knows how and when to use them in production. The primary skills required for an AI Agent Developer role can vary significantly from project to project. This topic deserves its own separate video, so if you're interested, please let me know in the comments. Soon I'll walk you through exactly how to accomplish all of this. But for now we need to understand what AI agents truly are. A lot of people say that AI agents are just instructions, knowledge, and actions.
And sure, that's sort of true, but that's not exactly what AI agents are; that's how we make AI agents. In fact, AI agents are much more than that. Let me explain. To answer what AI agents truly are, we need to unpack the difference between standard 1.0 AI automations and more sophisticated 2.0 AI agent-based applications. Picture a straightforward customer support automation where an LLM must label each incoming email and must respond to it, pulling some additional context from a vector database. Does this sound like an agent or a mere AI automation? You may have noticed that it doesn't quite feel like an agent, right? But why? It has knowledge from your vector database, it has instructions on how to respond, and it performs the action of attaching a label. The distinction lies in the fact that I said it must generate a label and must answer each email. You see, the fundamental difference between automations and agents is that agents possess decision-making capabilities. In 1.0 AI automations, every single procedure, like context retrieval, response generation, and labeling, is hardcoded into the backend logic. This means that it literally cannot deviate from this logic, no matter what. If the automation is tasked with responding to emails, it cannot neglect to respond. And while all this rigidity works well for certain use cases, it completely fails as soon as some unexpected circumstances arise. Imagine, for example, that your customer support mailbox receives an inquiry about a potential partnership with your platform. If this scenario wasn't accounted for, a 1.0 AI automation would handle it like any other support inquiry, potentially causing a missed opportunity. By contrast, 2.0 AI agent-based applications take a different approach. While they still equip the agent with the necessary tools, context, and instructions, they grant the agent the autonomy to decide how to use these tools by itself.
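To make the contrast concrete, here is a minimal Python sketch of the two styles. It is only an illustration under stated assumptions: every name in it (`lookup_context`, `automation_v1`, `agent_v2`, `choose_tool`, and the two tools) is hypothetical, and `choose_tool` is a keyword rule standing in for the decision a real LLM would make.

```python
from typing import Callable, Dict


def lookup_context(email: str) -> str:
    """Stand-in for a vector database lookup (hypothetical)."""
    return "Refunds are processed within 5 business days."


def automation_v1(email: str) -> Dict[str, str]:
    """1.0 automation: every step is hardcoded and always runs."""
    context = lookup_context(email)
    return {"label": "support", "reply": f"Thanks for writing in. {context}"}


def reply_with_context(email: str) -> str:
    """Tool: answer the email using retrieved context."""
    return f"Thanks for writing in. {lookup_context(email)}"


def escalate_to_human(email: str) -> str:
    """Tool: hand the email over to a person."""
    return "Forwarded to a human teammate."


# 2.0 agent: the tools are available, but no fixed order is enforced.
TOOLS: Dict[str, Callable[[str], str]] = {
    "reply_with_context": reply_with_context,
    "escalate_to_human": escalate_to_human,
}


def choose_tool(email: str) -> str:
    """Placeholder for the model's decision; a real agent asks an LLM."""
    if "partnership" in email.lower():
        return "escalate_to_human"
    return "reply_with_context"


def agent_v2(email: str) -> Dict[str, str]:
    """2.0 agent: decide which tool fits, then call only that tool."""
    tool_name = choose_tool(email)
    return {"tool": tool_name, "result": TOOLS[tool_name](email)}
```

With a partnership inquiry, `automation_v1` still labels the email as support and replies, while `agent_v2` picks the escalation tool and never runs the context lookup.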
Instead of feeding your context into the prompt on every request, you empower the agent to retrieve it only when it's needed. This flexibility means that the agent can adapt accordingly. So in our previous example, the agent would recognize that it's dealing with an inquiry outside of its expertise, and then it could use other available tools if possible. For example, it could reach out to your human support agent, or it could send a notification in Slack. Overall, what AI agents truly are is a new way of thinking about how to apply AI in various applications. It's a paradigm shift rather than a simple technique. In my agency we began with simple 1.0 AI automations, but as my clients saw the tangible benefits they offered, they yearned for more: more advanced capabilities and automation of increasingly complex tasks. Over time, we reached a stage where I wouldn't even call it automation anymore. It was more akin to outsourcing, as some of the processes we automated literally required multiple people to carry them out manually. And the performance was never the same. Now, having said all this, where do agent swarms come in? To truly grasp the concept of agent swarms, it's crucial to understand that all intelligence is environment-dependent. For instance, I might excel when it comes to programming, but I'm utterly lost when it comes to cooking. I would not last a day as a cook, even at McDonald's. I basically just eat meat and nothing else. This applies to both AI agents and your own employees. You can't assign 10 different roles to even the smartest person in the world. Likewise, even if we reach GPT-100, I would still not recommend assigning so many different responsibilities to a single agent. Firstly, removing all unnecessary information from a given process simply saves you tokens. And secondly, even if GPT-100 would not get confused handling 10 different roles, the users of such a system certainly would.
So, what agent swarms really allow you to do is separate responsibilities for different environments, just like in real-world organizations. This results in 3 main benefits. First, it dramatically reduces hallucinations. I found that after you add 7 to 10 tools to a single GPT-4 agent, it starts to get confused. But when you split these tools across multiple agents, you almost completely eradicate this problem. Secondly, you can outsource much more complex tasks, because the longer the sequence of your agents is, the more tasks they can handle without direct supervision. And lastly, it makes the whole system much easier to scale. You see, most of my clients don't stop at a single AI agent and often try to automate increasingly complex processes over time. So when the need arises, instead of adjusting your existing system and then debugging it all over again, you can simply add another agent and leave all the previous agents as they are. In fact, this last problem of scaling is so common among my clients that this week we are releasing the first-of-its-kind AI Agents as a Service subscription. Basically, if you are a business owner, you can now pay us a fixed fee per month and we will develop as many AI agents as you need, but we will work on them one at a time. Our goal is to provide a flexible and scalable solution that grows with your needs. So if you are interested, you can apply right now using the link below at a temporarily discounted price. However, if you're inclined to take on this journey by yourself, that's perfectly fine too, because next I'm going to walk you through my entire process from start to end. But before we get into the nitty-gritty, let's start with a brief overview of all the multi-agent frameworks at your disposal. The first project is the one you've probably heard of, called AutoGen, by Microsoft. The main feature of AutoGen is multi-agent chats. It was developed as a research experiment and was quite groundbreaking at the time.
However, the problem with AutoGen is that it has extremely limited conversational patterns that are super hard to customize. If you look at its code, in AutoGen the next speaker is determined with an extra call to the model that emulates role play between the agents. Let me just read it to you: "Read the above conversation. Then select the next speaker from agent names; only return the role." I mean, not only is this extremely inefficient, but it also makes the whole system absolutely uncontrollable. A lot of people report that agents constantly hallucinate because there is no clear separation of concerns when it comes to tool execution. One agent might write the code, but because it needs to be executed by the user proxy or some other agent, it often results in hallucinations, which is a huge problem in production. The next framework, which has recently been getting a ton of attention, is called CrewAI. CrewAI was developed as a side project, and it introduces the concept of a "process" into agent communication. This provides some semblance of control over the communication flow. However, just like in AutoGen, the conversation flows are extremely limited, offering only sequential or hierarchical options. In the sequential process, basically all your agents communicate with each other one by one. And in the hierarchical process, there is one manager agent that communicates with everyone else. Obviously, this is not how real organizations are structured. For example, can you imagine Sundar Pichai manually instructing a QA tester who tested this amazing new sign-in screen? Additionally, in CrewAI the manager agent is hardcoded for you, which for some reason people find cool. However, imagine that you want this agent to first search the web for additional context before deciding who it should speak to next. Try doing that in CrewAI. The biggest problem with CrewAI, however, is that it is built on top of LangChain, which was released before any function-calling models.
This means that there is no automatic type checking or error correction when it comes to tool execution. The descriptions for these tools are also extremely limited. Recently CrewAI introduced a way to overcome this by extending a base tool class; however, this process is definitely not as straightforward as it could have been. The goal, the backstory, the role, and the tasks that you need to define when you are creating your crew are simply prompt templates that also take away control from you as a developer. Without these prompt templates, CrewAI simply would not be able to function. The only advantage of CrewAI is that you can use it with open-source models. Now, personally, I would never use any of these frameworks in production for my clients, which is why I developed my own framework called Agency Swarm. In this framework, there isn't a single hard-coded prompt. It's easily customizable with uniform communication flows, and it is extremely reliable in production because it provides automatic type checking and validation for all tools with the Instructor library. It is the thinnest possible wrapper around OpenAI's Assistants API, meaning that you have full control over all your agents. So whether you add a manager agent, define goals and processes or not, whether you create a sequential or hierarchical flow or even combine both with a communication tree that is 50 levels deep, I don't care, it is still going to work. Your agents will determine who to communicate with next based on their own descriptions and nothing else. But you are probably wondering, why the Assistants API for AI agent development? Well, that's a good question, because if you look at all the previous OpenAI endpoints, you'll find the Assistants API isn't significantly different. However, it was a game-changer for me as an agent developer. And the reason for this is state management. You see, with the Assistants API, you can attach instructions, knowledge, and actions directly to each new agent.
This not only allows you to separate various responsibilities, but also to scale your system seamlessly without having to worry about any underlying data management or about your agents confusing each other's tools, like in other frameworks. Agent state management is the primary reason why Agency Swarm is fully based on the OpenAI Assistants API, and to answer your other question: no, we are not currently planning to support any open-source models. If costs are a concern, simply use GPT-3, which is much better than any LLM that you can run locally unless you have a $10,000 PC. If data privacy is a concern, you can use my framework with Azure OpenAI, which doesn't even share data with OpenAI itself. To get started creating your agent swarms using my framework, you need to understand 3 essential entities: Agents, Tools, and Agencies. Agents are essentially wrappers around assistants in the Assistants API. They include numerous methods that simplify the agent creation process. For instance, instead of manually uploading all your files and adding their IDs when creating an assistant, you can just specify the folder path. The system will automatically attach all files from that folder to your assistant. It also stores all your agent settings in a settings.json file. Therefore, if your agent's configuration changes, the system will automatically update your existing assistant on OpenAI the next time you run it, rather than creating a new one. The most commonly used parameters when creating an agent are name, description, instructions, model, and tools. These are all self-explanatory. There are no preset templates for goals, processes, backstories, etc., so you simply include them all in the instructions. Additional parameters include files_folder, schemas_folder, and tools_folder. As I said, all files from your files folder will be automatically indexed and uploaded to OpenAI.
All tools from your tools folder will be attached to an assistant as well, and all OpenAPI schemas from your schemas folder will automatically be converted into tools, allowing your agents to easily call third-party APIs. Additional properties api_params and api_headers are also available if your API requires authentication. However, I do recommend creating all your tools from scratch using Instructor, as it gives you more control. I previously posted a detailed tutorial on Instructor, which includes a brief conversation with its creator, Jason Liu. Check it out if you're interested. In essence, Instructor allows you to integrate a data validation library, Pydantic, with function calls. This ensures that all agent inputs make sense before any actions are executed, minimizing production errors. For instance, if you have a number division tool, you can verify that the divisor is not zero. If it is, the agent will see the error and automatically correct itself before executing any logic. To begin creating tools in Agency Swarm with Instructor, create a class that extends BaseTool, add your class properties, and implement the run method. Remember, the agent uses the docstring and all field descriptions to understand when and how to use your tool. For our number division tool, the docstring should state that this tool divides two numbers and describe the parameters accordingly. Next, define your execution logic within the run method. You can access all defined fields through the self object. To make some fields optional, use the Optional type from Python's typing module. To define the available values for your agent, use a Literal or enum type. There are also many tricks you can use. For instance, you can add a chain_of_thought parameter inside the tool to save on token costs and latency, instead of using a chain-of-thought prompt globally. To add your validation logic, use field or model validators from Pydantic.
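The shape of such a tool can be sketched without any dependencies. This is not Agency Swarm's actual code: `ToolStandIn` is a hypothetical stand-in for the framework's base tool class, a plain dataclass stands in for Pydantic fields, and `__post_init__` stands in for a Pydantic field validator. The point is only the order of events: inputs are validated when the tool is constructed, before `run` can execute anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class ToolStandIn(ABC):
    """Hypothetical stand-in for a framework base tool class."""

    @abstractmethod
    def run(self) -> str:
        """Execute the tool and return a text result for the agent."""


@dataclass
class DivideNumbers(ToolStandIn):
    """Divides two numbers and returns the quotient."""

    numerator: float                # the number to be divided
    denominator: float              # the number to divide by; must not be zero
    decimals: Optional[int] = None  # optional: round the result to this many places

    def __post_init__(self) -> None:
        # Validation happens at construction time, before run() is reachable.
        # The error message is what the agent would read and react to.
        if self.denominator == 0:
            raise ValueError("denominator must not be zero; pass a non-zero value.")

    def run(self) -> str:
        # All fields are available through self.
        result = self.numerator / self.denominator
        if self.decimals is not None:
            result = round(result, self.decimals)
        return str(result)
```

Calling `DivideNumbers(numerator=1, denominator=0)` raises a `ValueError` immediately, so the division itself never runs with bad input.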
In this division tool example, it makes sense to add a field validator that checks if the divisor is not zero, returning an error if it is. Because tools are arguably the most important part of any AI Agent based system, I created this custom GPT to help you get started faster. For example, if I need a tool that searches the web with Serp API, it instantly generates a BaseTool with parameters like query as a string and num_results as an integer, including all relevant descriptions. You can find the link to this tool on our Discord. The final component of the Agency Swarm framework is the Agency itself, which is essentially a collection of agents that can communicate with one another. When initializing your agency, you add an Agency chart that establishes communication flows between your agents. In contrast to other frameworks, communication flows in Agency Swarm are uniform, meaning they can be defined in any way you want. If you place any agents in the top-level list inside the agency chart, these agents can communicate with the user. If you add agents together inside a second-level list, these agents can communicate with one another. To create a basic sequential flow, add a CEO agent to the top-level list, then create a second-level list with a CEO, developer, and virtual assistant. In this flow, the user communicates with the CEO, who then communicates with the developer and the virtual assistant. If you prefer a hierarchical flow, place the agents in two separate second-level lists with the CEO. Remember, communication flows are directional. In the previous example, the CEO can initiate communication with the developer, who can respond, but the developer cannot initiate communication with the CEO, much like in real organizations. If you still want the developer to assign tasks to the CEO, simply add another list with the developer first and the CEO second. 
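The chart rules above can be illustrated with a small sketch. It uses agent names as plain strings and a hypothetical `parse_chart` helper, and it assumes that each agent in a second-level list can initiate communication with the agent that follows it. That is my reading of the description, not the framework's actual parsing code.

```python
from typing import Dict, List, Set, Tuple, Union

# A chart is a list whose items are either an agent name (top level)
# or a list of agent names (second level).
Chart = List[Union[str, List[str]]]


def parse_chart(chart: Chart) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Return (agents the user can talk to, who can initiate with whom)."""
    user_facing: List[str] = []
    can_initiate: Dict[str, Set[str]] = {}
    for node in chart:
        if isinstance(node, str):
            # Top-level entry: this agent can communicate with the user.
            user_facing.append(node)
        else:
            # Second-level list: flows are directional, left to right.
            for sender, recipient in zip(node, node[1:]):
                can_initiate.setdefault(sender, set()).add(recipient)
    return user_facing, can_initiate


sequential = ["ceo", ["ceo", "developer", "virtual_assistant"]]
hierarchical = ["ceo", ["ceo", "developer"], ["ceo", "virtual_assistant"]]
```

Under this assumption, `parse_chart(hierarchical)` lets the CEO initiate with both other agents, while the developer cannot initiate with the CEO unless a list such as `["developer", "ceo"]` is added.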
I always recommend starting with as few agents as possible and adding more only once they are working as expected. Advanced parameters inside the Agency class, like async mode, threads_callbacks, and settings_callbacks, are useful when deploying your swarms on various backends. Be sure to check our documentation for more information. When it comes to running your agency, you have 3 options: the Gradio interface with the demo_gradio method, the terminal version with the run_demo method, or get_completion, which is similar to the previous chat completions APIs. Now, let's create our own social media marketing agency together to demonstrate the entire process from start to finish. Alright, for those who are new here, please install Agency Swarm using the command 'pip install agency-swarm'. To get started quickly, I usually run the 'agency-swarm genesis' command. This will activate the Genesis agency, which will create all your agents for you. It doesn't get everything right just yet, but it does speed up the process significantly. In my prompt, I'm going to specify that I need a Facebook marketing agency that generates ad copy, creates images with DALL-E 3, and posts them on Facebook. As you can see, we now have our initial agency structure with three agents: the ad copy agent, the image creator agent, and the Facebook manager agent. I really like how the Genesis agency has divided these responsibilities among three different agent roles. However, I'd like to adjust the communication flows a bit and adopt a sequential flow, so I will instruct the Genesis CEO accordingly. Now we have a sequential agency structure with three communication levels. We can tell it to proceed with the creation of the agents. This process takes some time, so I'll skip this part and return when we're ready to fine-tune our agents. After all our agents have been created, you can see that the CEO tells me that I can run this agency with the 'python agency.py' command.
All the folders for my agents and tools are displayed on the left. The next step is to test and fine-tune all these tools. We'll start with the image generator agent. The Genesis agency has created one tool for this agent called ImageGenerator. It's impressive how close this tool is to what I planned to implement myself. It uses OpenAI to generate an image with a simple prompt, taking ad_copy, theme, and specific requirements and inserting them into a prompt template. Yes, AI has learned to prompt itself. However, there's an issue: it uses an outdated OpenAI package version with the davinci-codex model, which is designed for code generation. Let's fix this together now. First, I'll load a new OpenAI client with a convenience method from Agency Swarm's util module. I'll also increase the timeout, because image generation can take some time. After that, I'll adjust the API call to use the new DALL-E 3 model, and then set the timeout back to the default. There's one more thing we have to do: we have to ensure that other agents can use this image when posting the ad. So I'm going to create a new 'save image' method that will save this image locally. But here is the kicker: I don't want my agents to pass this image path to each other, because any hallucinations could cause issues. Instead, I'll save this path to a shared state. Essentially, shared state allows you to share certain variables across all agents in any tool. Instead of having the agent manually pass the image path to another agent, you can save it in one tool and access it in another. You can also use it to perform validation logic across various agents, which I'll show you soon. Now we are ready to test this tool. You can do this by adding a simple 'if __name__ == "__main__"' statement at the end, then initializing the tool with some example parameters. Then you can print the result of the run method. Don't forget to load the environment with your OpenAI key by calling 'load_dotenv' at the top.
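The shared-state idea and the local test pattern can be sketched in plain Python. Everything here is hypothetical and simplified: `SharedState`, `ImageGenerator`, and `AdCreator` are stand-ins written for this illustration, the image "generation" is a placeholder that only builds a file name, and the real framework's shared-state interface may look different.

```python
from typing import Any, Dict, Optional


class SharedState:
    """Hypothetical key-value store visible to every tool."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._data.get(key, default)


shared_state = SharedState()


class ImageGenerator:
    """Generates an ad image and records where it was saved."""

    def __init__(self, ad_copy: str, theme: str) -> None:
        self.ad_copy = ad_copy
        self.theme = theme

    def run(self) -> str:
        # Placeholder: a real tool would call the image API and write the file.
        image_path = f"ad_image_{self.theme}.png"
        # The path goes into shared state, so no agent has to repeat it.
        shared_state.set("image_path", image_path)
        return "Image generated and saved."


class AdCreator:
    """Creates the ad, reading the image path from shared state."""

    def __init__(self) -> None:
        # Cross-tool validation: refuse to run until the image exists,
        # and tell the agent which tool to use first.
        if shared_state.get("image_path") is None:
            raise ValueError("No image found. Run ImageGenerator first.")

    def run(self) -> str:
        return f"Ad created with image {shared_state.get('image_path')}"


if __name__ == "__main__":
    # Quick local test: initialize each tool with example values and print run().
    print(ImageGenerator(ad_copy="Revolutionize your business with AI.", theme="futuristic").run())
    print(AdCreator().run())
```

Because the path never travels through a message between agents, a hallucinated or mistyped path cannot reach the posting step.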
As you can see, we have an image generated and saved locally, as expected. This means we can now proceed with adjusting the next tool, the AdCopyGenerator tool within the ad copy agent. This tool is also very similar to my personal design. I'll adjust the prompt a bit and save the results into the shared state. Moving on to the Facebook Manager agent, the Genesis agency created two tools for us: the Ad Performance Monitor tool and the Ad Scheduler and Poster tool. While these tools are quite close, creating an ad on Facebook requires a few more steps. Specifically, we need to first create a campaign and an ad set before we can post the ad. I will use the tool creator custom GPT to request two additional tools, 'Ad Campaign Starter' and 'Ad Set Creator'. To run these tools, we first need to install the Facebook Business SDK, which you can do with this pip command. Next, we need to create our Facebook app. Go to the Facebook developer website, click "Create App", select "Other" for the use case, then "Business" for the app type. Add your app name and click "Create App". Then click on "Add product" and add "Marketing API". Go to "App settings", copy your App ID and App secret, and insert them into the environment file. Now we have to get our access token by visiting the Facebook API Explorer website and adding the appropriate permissions. After that, copy it and put it into the .env file. Working with the Facebook API can be challenging, as it's known to be one of the more complex APIs out there. I won't delve into the details of how I fine-tuned these tools. The process is the same: adjust, test, and repeat until they work as expected. As you can see in the AdCreator tool, we're actually using the ad copy, ad headline, and image path from the shared state that we saved earlier. I have also included a model validator that checks the presence of all these necessary parameters.
If one of the parameters is not defined, the system throws a value error and instructs the agent on which tool needs to be used first. This approach significantly enhances the reliability of the entire system, as it ensures that the Facebook Manager agent cannot post any ads until all the required steps, like image generation, have been completed. After successfully testing all our tools, the final step is to refine the instructions. It is good practice to specify how your agents should communicate with each other. I would also recommend specifying an exact step-by-step process for them to follow. Lastly, I decided to make a few adjustments to our communication flows. I'd like to establish a direct line of communication with our Facebook Manager agent, so I'll include it in the top-level list. Also, I'll allow our CEO agent to communicate directly with both the Facebook Manager and the Image Generator agents. Now that we've made these adjustments, we're ready to run our agency. It is as simple as running the 'python agency.py' command and opening the provided Gradio interface link in your browser. Let's see how it works. I'll kindly ask for an advertisement to be created for my AI development agency, Arsen AI. The CEO then instructs the ad copy agent, which promptly provides a clear headline and ad copy for my agency, stating, "Revolutionize your business with AI." Next, the CEO commands the image generator agent to create an image for the ad copy, resulting in a futuristic visual for our campaign. Finally, the CEO directs the Facebook Manager agent to commence the campaign using the campaign starter tool. It then creates an ad set and executes the ad creation function, posting this ad on Facebook. You can now see this newly generated Facebook ad, complete with ad copy, headline, and image, live on my Facebook account. Impressive, right? But what if you want to analyze your campaign's performance?
You can do this by directly messaging the Facebook Manager agent, as it was included in the top-level list. It uses the AdPerformanceMonitor tool and informs me that there is currently no data, as it takes some time for an ad to reach its audience. In conclusion, I'd like to briefly share my roadmap for this framework. First, I plan to establish multi-agency communication. This feature will allow the integration of multiple agencies for super complex use cases. Next, we'll focus on enhancing the Genesis agency. With multi-agency communication, the Genesis agency will be able to test other agencies during their creation. The goal is to reach a point where there's almost no need to modify tools or instructions for simple agencies like the one we've just created. And lastly, we will continue to regularly update this framework to include the latest releases from the OpenAI Assistants API. With upcoming features like memory and web browsing, the possibilities are exciting, to say the least. So stay connected with us on Discord. We're always on the lookout for new talent. If you're interested and you have previous experience with this framework, you can apply through our job postings channel. Thank you for watching, and don't forget to like and subscribe.

Info

Channel: VRSEN

Views: 19,308

Keywords: ai development, ai agents, ai for business, machine learning, artificial intelligence, ai agent development, ai tutorial, ai agency, ai tools, ai, openai, autogen, open ai, agency swarm, ai automation agency, ai automation, crew ai, assistants api, agent framworks, chatgpt, agi, openai assistants api, ai business, custom gpt, agent swarms, AI agents, OpenAI Assistants API, AI agency, AI tools, AI tutorials, AI for business, AI integration, AI

Id: MOyl58VF2ak

Length: 28min 43sec (1723 seconds)

Published: Wed Mar 20 2024