In this video, I am releasing my own
open source implementation of Devin, an actually usable AI software engineer called
David. Shoutout to my Slavic roots. But what does actually usable mean? Well, three things.
First of all, you don't have to wait for God knows how long on a waitlist and then guess how
the heck this thing works under the hood, because you will have full access to the source
code files of this agent. This means that you can fine tune it for whatever use case you want.
Secondly, unlike other implementations like SWE-agent by Princeton, this agent will be trained to solve real-world coding challenges, because if someone is trained only to solve pull request issues on GitHub, they're probably not in the right mind.
And lastly, this will be an agentic system,
unlike Devin, which is more of a coding assistant.
Okay, so here is the agency that we will be
building in this video. It consists of three agents, the planner agent, David, and the browsing
agent. I'm going to try to create a website that runs the Game of Life from one of their demos,
and because this is a relatively simple task, I'm actually not going to use the planner agent.
I'm going to send this task directly to David.
As you can see, David first creates a plan just
like Devin, and then proceeds with executing the first function to check the current directory. It
then creates the first file which is index.html. You can see the source code of this file below.
Next, it creates a style.css.
And finally, it proceeds with implementing
the game logic inside the game.js file. The way this FileWriter tool works
is actually quite interesting.
As you can see, the agent simply passes
the requirements for the file to write, and then the tool automatically generates this
file and writes it locally. I'll explain why I implemented it this way in just a bit, but for
now, as you can see, David says that our Game of Life has been successfully created, and
we can now test it by running the index.html file. Let's see how it works. Perfect!
So as you can see, it does indeed work.
But now, I want this agent to actually make
the game full screen, like in Devin's demo, and also remove the buttons. So I'm going to send this as the next message.
Awesome, as you can see it then modifies the
index.html file and makes it full screen. Then it adjusts the style.css and finally it
also adjusts the JavaScript file with our game's logic. Let's try to reload this website.
Cool! Yeah, so now we get this full screen game of life website, just as I requested. Now let
me show you how to import this agent and use it in your own projects. But first, let's
actually take a look at this demo video, because there are certainly some things
that you might not have noticed.
Excited to introduce you to Devin, the
first AI software engineer. From now on, Devin is in the driver's seat.
First Devin makes a step-by-step plan of how to tackle the problem. After
that it builds the whole project using all the same tools that a human software engineer
would use. Devin has its own command line, its own code editor, and even its own browser.
In this case Devin decides to use the browser to pull up API documentation so that it can read up
and learn how to plug into each of these APIs.
Do you guys notice anything in this video? They showed the terminal, the code editor, and the browser really well, which are just three very basic tools that you can create yourself in an hour or so.
But they didn't really show the last window, which
is the chat interface itself, right? And if we look at this chat interface, you might notice that
they're actually prompting this agent while it's working on the program. So look, yeah, it runs
into an unexpected error. Look at what happens next. Scott tells this agent to make sure
to use the right model names. This is not even the worst part. Take a look at this. If we go to
one of the other videos that they have released, for example, where Devin trains another AI,
which is something I did with my framework around two months ago, on the first day that
I released it, you can see that in this demo, they actually have timestamps near the
messages. And if you look at these timestamps, the amount of time that it took Devin to
perform this task is actually quite extreme.
So the first message was sent to Devin at 2:49
pm, and at the end of this video the task was completed at around 4:31 pm. So it took Devin almost two hours to complete this simple task. And that's why I believe on their main video there are actually
no timestamps on the chat interface. They have either edited them out or they adjusted
the chat interface just for this demo, because the amount of time that it took Devin to
complete this task was so substantial that no one would ever consider hiring this agent again.
So now let's dive in. The first step is to pull my Agency Swarm Lab repo, where, by the time you watch this video, you should be able to find the exact agency that we will develop right now together.
You can do this with a simple git clone command. After you have pulled this repo you need to
install Docker. The reason we need Docker for this specific video is because we don't really
want our developer agents to modify any local files on our own computers. To install Docker
you can go to the official Docker documentation from the readme of my repository and install
Docker desktop for whichever platform you use.
The process is pretty straightforward, so I'm not going to go over how to do this. Simply follow all the installation steps and then navigate back into the
Agency Swarm lab repository. Open your terminal and now we are finally ready to get going. The
first step after you have installed Docker is to build the container. In this repository you will
find three new Dockerfiles. The first, called simply Dockerfile, is the simplest version
that you can build in a very short time. It pulls from a pre-built Python container and installs
the requirements without any extra packages. This means that you can use this container for simpler
use cases that do not require web browsing. However, for use cases where web browsing is
required, like the one we're going to do now, you do have to build a slightly different container.
This container does take a bit more time to build because it installs Selenium and a Chrome browser.
You will find two versions of the container that install Selenium. Both of them are named dockerfile.browsing, and the last part of the name indicates the architecture of the system that you use.
So for any users on Apple Silicon Macs, you need to build the dockerfile.browsing.arm64 container. However, for any users on Windows systems or on older Intel-based Macs, you need to build the AMD64
container. The way you do this is by running a docker build command that you can find also
in the readme of this repository. This command is located on step 2. Simply copy this command
and replace the path to the Dockerfile after the -f flag. Since I'm running on Apple Silicon,
I'm going to use dockerfile.browsing.arm64. Simply hit enter and this should start the build
process. My container was already built before, so it took only a few seconds. However, if
you're building this container from scratch, it might take up to 10 minutes or even more.
After that, we are ready to run this container. To do this you can simply copy the next command
which is docker run, with some additional flags. Make sure to add your OpenAI API key after the -e flag. Then you can simply copy and run this command in the terminal as well.
So now we are inside our docker container which means that we can finally start coding. Inside
your docker container you will find all the files from my agency swarm lab repository. This
is because we are actually mounting the volume from your current directory, with all the agencies, into the Docker container. This means that if the developer agent modifies any files from the agency
swarm lab repository which is now located in the app directory inside the container, those changes
will also be reflected on your host system.
Don't worry, however, because the agent will
not be able to modify any files outside of this directory on your computer. Now, finally,
let me show you how to import this new pre-made agent called David from Agency Swarm. In the
latest release, I have added a new agency-swarm import-agent command. As you can see,
this command takes two arguments, the name of the agent and the destination where to copy the
agent source files. For the name of the agent, we currently have only two agents available.
The first one is the browsing agent from one of the previous videos that I have released on my
channel, and the second one is David. So let's try to import David. All you have to do is run the agency-swarm import-agent --name David command. This will copy David into your local directory. Now
you can see David here on the right inside our agency swarm lab repository. The best part about
this new method is that you now have full access to the source code of your agent. For those who
have been following me before you might know that the goal of my framework is to provide you as a
developer with full control over your systems. So, unlike in the previous implementation where
we used to import agents using a simple import statement in Python, now all the source files will
be copied directly into your local directory. This means that you can really fine-tune this agent
for your specific purpose. So whether you need to add new files, schemas, or tools, or modify existing tools and instructions, you can easily do that for your specific use case. This is exactly what we're going to do in this video.
However, to speed up the process, we're actually going to use the Genesis agency, which already has this capability to import
your agent source files locally. Let's see how it works. I'm gonna use this prompt that I've
made earlier and say that I need three agents, a planner agent, a developer agent, David, and
a browsing agent. I am also including specific communication flows between those agents. So the
structure of our agency will look like this. We will have a planner agent that communicates with
the browsing agent and David. David can also communicate with the browsing agent.
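As a reference, here is a minimal sketch of what that communication structure looks like as an Agency Swarm chart. The variable names are hypothetical, and the agents are assumed to be instantiated already:

```python
from agency_swarm import Agency

# planner, david, and browsing_agent are assumed to be Agent instances
agency = Agency([
    planner,                    # entry point: the planner talks to the user
    [planner, david],           # the planner can initiate chats with David
    [planner, browsing_agent],  # the planner can initiate chats with the browsing agent
    [david, browsing_agent],    # David can initiate chats with the browsing agent
])
```

If you come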
back to the Cognition Labs demo, we can see that Devin actually creates a plan by itself. In my opinion,
it's not the best way to do this. In my agency, I'm actually going to task the planner agent with
creating a plan, the browsing agent with pulling up the latest documentation, and then David
executing tasks based on this documentation. Additionally, instead of using a planning
prompt globally like they do in Cognition, what we're going to do is define special chain of
thought parameters inside each of the functions.
So for any functions that have a significant
impact on the development of the program, we will have a special chain of thought
parameter. This parameter allows the agent to plan its actions only when executing this tool,
which significantly saves you on token costs and latency without sacrificing much accuracy in terms
of performance.
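To make that concrete, here is a minimal sketch of such a tool. ExecuteCommand and its fields are hypothetical names; the pattern of declaring chain_of_thought as just another required parameter is the technique being described:

```python
import subprocess

from agency_swarm.tools import BaseTool
from pydantic import Field

class ExecuteCommand(BaseTool):  # hypothetical tool name
    """Executes a shell command inside the container."""
    # Declared first, so the model reasons about this specific call
    # before producing the remaining arguments; no global planning prompt needed.
    chain_of_thought: str = Field(
        ..., description="Think step-by-step about how this command affects the project."
    )
    command: str = Field(..., description="The shell command to execute.")

    def run(self):
        result = subprocess.run(self.command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr
```

Quick update from the future, guys. Ever since I made the walkthrough of this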
project, I actually modified the file writer tool. Now, instead of taking the source code directly
from the main agent and then writing it to a file, it takes requirements, details, file dependencies,
library dependencies, and other information from the agent. Then, it uses a special prompt,
inserting all of this information into the message, and runs it using another GPT-4
model until it generates complete and bug-free code.
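Here is a rough sketch of that idea. The field names and the prompt are simplified, and the retry loop the tool actually runs until the code is complete is omitted, so treat this as an illustration rather than the exact implementation:

```python
from agency_swarm.tools import BaseTool
from openai import OpenAI
from pydantic import Field

class FileWriter(BaseTool):
    """Generates a file from requirements with a separate GPT-4 call and writes it locally."""
    file_path: str = Field(..., description="Path of the file to create.")
    requirements: str = Field(..., description="Detailed requirements for the file.")
    details: str = Field("", description="File and library dependencies, plus other context.")

    def run(self):
        client = OpenAI()
        prompt = (
            f"Write the complete contents of {self.file_path}.\n"
            f"Requirements:\n{self.requirements}\n"
            f"Details:\n{self.details}\n"
            "Return only the full file contents, with no placeholders."
        )
        response = client.chat.completions.create(
            model="gpt-4-0125-preview",
            messages=[{"role": "user", "content": prompt}],
        )
        # Write whatever the dedicated model produced straight to disk.
        with open(self.file_path, "w") as f:
            f.write(response.choices[0].message.content)
        return f"Successfully wrote {self.file_path}."
```

The reason I decided to adjust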
this tool this way is that the GPT-4 model is still lazy. This is a known issue that a lot of OpenAI users have been encountering. OpenAI has even released the new 0125 GPT-4 model (gpt-4-0125-preview) that aims
to reduce the cases of laziness. And although I've noticed that the cases of laziness with these
models are indeed reduced when the model generates a message, it can still be extremely lazy when
generating function calls. Now let's finally proceed with the development of this agency.
All I'm going to do is simply send this prompt, and this should activate the Genesis agency, which will start planning out all of those agents for me.
The process does take some time depending on your
requirements, so I'm going to scroll through this part and come back to you later. Amazing! So after the Genesis agency has created our Code Solutions agency, you can see that it tells
me that I can navigate to the agency folder and start it using the following command. All
of our agent folders have also been imported accordingly. Let's move our terminal somewhere
on the left and now proceed with adjusting those agents. The instructions for the planner agent
essentially will tell it to first break down the tasks into manageable subtasks and then
direct them to different agents accordingly.
For example, we'll be directing any internet
browsing and information retrieval tasks to the browsing agent and any coding tasks to David.
After that, I'm also instructing this agent to continue conversations with both agents until
the task has been fully executed. This is one of my favorite lines that I like to include in any
manager or planner agent's instructions, because this allows you to achieve pretty good results
without constantly re-prompting your agency like in Cognition's demo. Now, I'm also going to
adjust the browsing agent's instructions. Inside the browsing agent's instructions, I'm also going
to include a predefined step-by-step process.
Some of the instructions are already
defined for you, and as I said before, you have full access to these files, so you
can modify them in any way you want. I found that these work well as the base for the agent
and provide it with enough context to complete most general web browsing tasks. However, to
fine-tune this agent on your specific process, I still recommend including this primary instruction
section with a specific step-by-step process.
In the case of this agency, I want the browsing
agent to first navigate into Google and find the required documentation. After that, I want
it to analyze this web page and make sure that it actually contains the information necessary
to complete the task. And then on step three, I'm telling it to extract the required information
from the web page as a file using the export file tool. If you look at the source code of this export file tool, I'm actually quite proud of its implementation. The way the export file tool
works is by exporting the entire web page as PDF and then uploading it as a file to OpenAI.
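A rough sketch of that mechanism might look like the following, assuming Selenium's Chrome driver (which exposes Page.printToPDF through the DevTools protocol) and the OpenAI Python SDK; get_driver is a hypothetical helper, and the driver-management details are omitted:

```python
import base64

from agency_swarm.tools import BaseTool
from openai import OpenAI

class ExportFile(BaseTool):  # simplified sketch of the tool
    """Exports the current web page as a PDF and uploads it to OpenAI for retrieval."""

    def run(self):
        driver = get_driver()  # hypothetical helper returning the shared Selenium driver
        # Render the current page to PDF via the Chrome DevTools protocol.
        pdf = driver.execute_cdp_cmd("Page.printToPDF", {})
        with open("exported_page.pdf", "wb") as f:
            f.write(base64.b64decode(pdf["data"]))
        # Upload the PDF so assistants can search it with the retrieval tool.
        file = OpenAI().files.create(file=open("exported_page.pdf", "rb"), purpose="assistants")
        return f"Page exported successfully. File ID: {file.id}"
```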
This allows your agents to analyze this file using a retrieval tool. So, the general idea is
that the browsing agent will export documentation pages as files and then send those file IDs to
the planner agent. The planner agent will then, once all of the necessary files are collected,
send them in the message files parameter to David. This means that David will then be able to analyze
each documentation using the retrieval tool to extract the necessary information in order to
complete the task. However, please note that in my instructions I'm actually saying that the browsing
agent should send the file ID back to the user. This is important because of the way my framework
is structured. Your agents, while communicating with one another, will actually think that they
were prompted by the user rather than by another agent. Additionally, inside the BrowsingAgent.py
file, I want to add a new response validator function. This function is a new addition to
my framework, and it allows you to check the response of the agent before proceeding with
returning it to the user or to another agent.
The way you define this function is simply by
overriding the response validator method. And then, inside this method, you can include
additional validation logic based on your requirements. So in the case of this agency, I'm
actually going to check if the response from the browsing agent contains the file ID and if not,
I'm going to instruct this agent to continue searching for documentation until the file ID is
found. This is another way to make your systems more reliable in production.
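As a minimal sketch, assuming OpenAI file IDs always carry the file- prefix, the browsing agent's validator might look like this:

```python
from agency_swarm import Agent

class BrowsingAgent(Agent):
    def response_validator(self, message: str) -> str:
        # Raising here sends the error text back to the agent,
        # which then retries instead of returning a bad response.
        if "file-" not in message:
            raise ValueError(
                "The response does not contain a file ID. Continue searching "
                "for the documentation and export it with the export file tool."
            )
        return message
```

For the David agent,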
I have already made a response validator based on the specific issues I've been encountering. For
example, I'm checking if David returns a code snippet and then if it does, I'm telling it to
never do that and instead run code and test it locally. Additionally, I'm using an LLM validator
from the instructor library. This function allows you to check the response of the agent using
another large language model. So here I'm checking if the message from David actually indicates
that the code was executed because oftentimes I've noticed that this agent forgets or does not
want to execute code locally. If this happens, the llm_validator function from the Instructor library will return an error. The agent will then see this error and correct itself accordingly.
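As a sketch of how this fits together (the exact llm_validator signature varies between instructor versions, so check the one you have installed; the statement text here is illustrative):

```python
from typing import Annotated

import instructor
from instructor import llm_validator
from openai import OpenAI
from pydantic import BaseModel, BeforeValidator

client = instructor.from_openai(OpenAI())

class ExecutionCheck(BaseModel):
    # llm_validator asks another model whether the statement holds for this value;
    # if it does not, pydantic raises a ValidationError with the model's reasoning.
    message: Annotated[
        str,
        BeforeValidator(llm_validator(
            "the message indicates that the code was actually executed locally",
            client=client,
        )),
    ]

# Inside David's response_validator, validating the response is then one line:
# ExecutionCheck(message=message)  # raises if the check fails, triggering a retry
```

To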
adjust the number of attempts for validation, you can adjust the validation attempts parameter.
Currently, it's set to 1, which means that if validation fails more than once, it's just
going to continue the conversation. However, you can set it to 2, 3 or even 50, which I
certainly do not recommend because it's going to take a lot of your tokens, but it's nice
that you have the flexibility to do so. I'm personally going to set it to 2 for David in this
specific scenario. Awesome. That's literally it. That's all it takes to create an agency similar
to the one that Cognition Labs have shown in their video. However, as I said, this agency will
not require nearly as many instructions. Now, all you have to do is simply set the server
name to 0.0.0.0 because we're running inside a Docker container and then run this file.
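For reference, the end of the generated agency.py might look roughly like this, assuming demo_gradio forwards keyword arguments to Gradio's launch(); check the file generated in your own repo:

```python
# hypothetical tail of agency.py
if __name__ == "__main__":
    # 0.0.0.0 makes the Gradio server reachable from outside the Docker container
    agency.demo_gradio(server_name="0.0.0.0")
```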
Each imported agent might have some additional requirements defined in the requirements.txt
file. For example, inside the browsing agent's requirements, we also need to install Selenium,
WebDriver Manager, and Selenium Stealth.
The way you can do this is simply by running pip install -r requirements.txt inside the browsing agent folder.
After our requirements are installed, we can simply run python agency.py command. This will
print a local URL, which you can simply click on, and this will take you to a Gradio interface where you can interact with your new agency.
Okay, so it's me again from the future. I just
wanted to test this agency with the updated FileWriter tool. I'm going to send the prompt
from their main demo video, which is to benchmark Llama 2 on three different API providers. The Planner
agent also now has one additional tool which is the Create Plan tool. I added this tool for the
exact same reason I modified the File Writer tool, just to have a bit more control. And as you
can see, this tool generates a structured plan with all the agent names and task descriptions
accordingly. So then the Planner agent sends the first task to the Browsing agent, which is to find
the documentation for those API providers. The Browsing agent then opens the first page, which
is the Google search results, and navigates to the first link, which is docs.together.ai.
As you can see, this page contains the exact documentation that we need, so it then exports it using the export file tool, and now it should proceed with searching for the next page,
which is the documentation for Replicate. Again, you obviously can't see the browsing window in the same interface as with Devin. However, in my opinion, you don't need to see all of those extra windows, like the terminal, the IDE, and the browser, for the agent if you're developing an
actual AI software engineer rather than an AI coding assistant. In the meantime, as you can
see, the browsing agent now searches for the final Perplexity API documentation. Cool, so it
also navigates to docs.perplexity.ai and exports this page as a file. It then sends all of those
file IDs back to the planner agent. The planner agent then sends the first task to David, which
is to read these documentation files. It also provides them in the message files parameter.
David successfully read the documentation for Together and Replicate, however, it seems like
it encountered an issue with Perplexity API documentation. I suppose this is because this API
documentation page actually has multiple tabs, which is quite challenging for the browsing
agent at the moment, so it did not export the necessary page. Nevertheless, the Planner agent
then sends the next task, which is to create the script using the extracted documentation.
David then writes the script in the message, which is one of the hallucinations that
I mentioned before. However, luckily, because we have this response validator, you
can see that the Planner agent immediately tells David to never return code snippets
and actually use the file writer tool. So David then proceeds with reading the current
directory and creating the benchmark_llama2.py script. It then tries to execute this script,
but unfortunately it runs into a few issues.
So it seems like the execution has unfortunately stopped, but what I'm going to do next is just tell David to continue executing the script
and debugging it until it works correctly. Okay, so it seems like the agents ran into an issue that they couldn't resolve by themselves. The error seems to stem from the fact that the retrieval tool in the OpenAI Assistants API is currently f***ed, and unfortunately, it does
not extract all of the relevant information from the files. I'm hoping that OpenAI will release
the browsing tool very soon, and additionally, I'm hoping that the GPT-4 vision model will also
be available in the Assistants API directly very soon, because both of those enhancements should
significantly improve the performance of any browsing tasks. For now, what I'm going to do is
just send some example requests for all of those APIs. Okay, so as you can see, the script has
now been adjusted. However, David is being lazy again, and it doesn't want to execute files.
Sometimes this happens with GPT-4, unfortunately. I'm hoping that OpenAI will resolve this
issue soon. But what I'm gonna do for now is just send the message to David directly and
just tell it to execute the script. Awesome!
So as you can see, finally, we got this script
to work. So no errors were reported and the script execution is finally successful. You
can see all of the responses here. However, it did not benchmark those providers; it just queried them via HTTP requests. So let's now instruct David to actually benchmark the times
for these requests. So finally, as you can see, the agent added the execution logic to the script
and then executed it again and provided me with the execution times for each API. Seems like
Perplexity was the fastest with 0.78 seconds per request, while Replicate took around a second, and the Together API executed in almost two seconds.
So now let me show you how I would approach fine-tuning this agency to make it actually autonomous for your specific task. First of all, I would
definitely start with the agency manifesto. Right now it's not comprehensive enough. In
the agency manifesto, I would add additional context about the environment, about the specific
process that I'm trying to execute, and maybe even specific links for the browsing agent to read,
or even documentation that David must use in order to generate the code. Then I would proceed
with adjusting individual agents' instructions, and also including notes for the specific
process you're trying to automate. After that, you should modify your tools' logic. In this
scenario, we actually didn't use shared state, but shared state can be extremely useful if
you want your agents to perform functions only in a certain order. For example, if your
process requires running certain commands, what you could do is ensure that the agent actually executed those commands before proceeding to write certain files. The way you would do this
is by saving certain states into the shared state parameter. So here you could save, for example,
the command into the shared state. And then in other functions, you would be able to check
if the agent actually executed this command.
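Here is a minimal sketch of that pattern with two hypothetical tools, assuming the shared state that Agency Swarm exposes to tools via self._shared_state:

```python
import subprocess

from agency_swarm.tools import BaseTool
from pydantic import Field

class RunSetupCommand(BaseTool):  # hypothetical tool
    """Runs a setup command and records it in the shared state."""
    command: str = Field(..., description="The command to execute.")

    def run(self):
        subprocess.run(self.command, shell=True, check=True)
        # Record the command so other tools can verify that it ran.
        self._shared_state.set("setup_command", self.command)
        return "Command executed."

class WriteProjectFile(BaseTool):  # hypothetical tool
    """Writes a file, but only after the setup command has been executed."""
    file_path: str = Field(..., description="Path of the file to write.")
    content: str = Field(..., description="Contents of the file.")

    def run(self):
        # Enforce the ordering: refuse to write before the command has run.
        if self._shared_state.get("setup_command", None) is None:
            raise ValueError("Run the setup command before writing any files.")
        with open(self.file_path, "w") as f:
            f.write(self.content)
        return f"Wrote {self.file_path}."
```

If the check fails, the raised error is fed back to the agent, which can then call the tools in the correct order.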
So yeah, that's it for this video. Thank you
for watching and don't forget to subscribe.