In this video, I am releasing my own
open source implementation of Devin, an actually usable AI software engineer called
David. Shoutout to my Slavic roots. But what does actually usable mean? Well, three things.
First of all, you don't have to wait for God knows how long on a waitlist and then guess how
the heck this thing works under the hood, because you will have full access to the source
code files of this agent. This means that you can fine tune it for whatever use case you want.
Secondly, unlike other implementations like SWE-agent by Princeton, this agent will be trained to solve real-world coding challenges, because if someone is trained only to solve pull request issues on GitHub, they're probably not in the right mind.
And lastly, this will be an agentic system,
unlike Devin, which is more of a coding assistant.
Okay, so here is the agency that we will be
building in this video. It consists of three agents, the planner agent, David, and the browsing
agent. I'm going to try to create a website that runs the Game of Life from one of their demos,
and because this is a relatively simple task, I'm actually not going to use the planner agent.
I'm going to send this task directly to David.
As you can see, David first creates a plan just
like Devin, and then proceeds with executing the first function to check the current directory. It
then creates the first file which is index.html. You can see the source code of this file below.
Next, it creates a style.css.
And finally, it proceeds with implementing
the game logic inside the game.js file. The way this FileWriter tool works
is actually quite interesting.
As you can see, the agent simply passes
the requirements for the file to write, and then the tool automatically generates this
file and writes it locally. I'll explain why I implemented it this way in just a bit, but for
now, as you can see, David says that our Game of Life has been successfully created, and
we can now test it by running the index.html file. Let's see how it works. Perfect!
So as you can see, it does indeed work.
But now, I want this agent to actually make
the game full screen, like in Devin's demo, and also remove the buttons. So I'm going to send this as the next message.
Awesome, as you can see it then modifies the
index.html file and makes it full screen. Then it adjusts the style.css and finally it
also adjusts the JavaScript file with our game's logic. Let's try to reload this website.
Cool! Yeah, so now we get this full screen game of life website, just as I requested. Now let
me show you how to import this agent and use it in your own projects. But first, let's
actually take a look at this demo video, because there are certainly some things
that you might not have noticed.
Excited to introduce you to Devin, the
first AI software engineer. From now on, Devin is in the driver's seat.
First Devin makes a step-by-step plan of how to tackle the problem. After
that it builds the whole project using all the same tools that a human software engineer
would use. Devin has its own command line, its own code editor, and even its own browser.
In this case Devin decides to use the browser to pull up API documentation so that it can read up
and learn how to plug into each of these APIs.
Do you guys notice anything in this video? They showed the terminal, the code editor, and the browser really well, which are just three very basic tools that you can create yourself in an hour or so.
But they didn't really show the last window, which
is the chat interface itself, right? And if we look at this chat interface, you might notice that
they're actually prompting this agent while it's working on the program. So look, yeah, it runs
into an unexpected error. Look at what happens next. Scott tells this agent to make sure
to use the right model names. This is not even the worst part. Take a look at this. If we go to
one of the other videos that they have released, for example, where Devin trains another AI,
which is something I did with my framework around two months ago, on the first day that
I released it, you can see that in this demo, they actually have timestamps near the
messages. And if you look at these timestamps, the amount of time that it took Devin to
perform this task is actually quite extreme.
So the first message was sent to Devin at 2:49
pm, and at the end of this video the task was completed at around 4:31 pm. So it took Devin almost two hours to complete this simple task. And that's why I believe on their main video there are actually
no timestamps on the chat interface. They have either edited them out or they adjusted
the chat interface just for this demo, because the amount of time that it took Devin to
complete this task was so substantial that no one would ever consider hiring this agent again.
So now let's dive in. The first step is to pull my Agency Swarm Lab repo, where, by the time you watch this video, you should be able to find the exact agency that we will develop right now together.
You can do this with a simple git clone command. After you have pulled this repo you need to
install Docker. The reason we need Docker for this specific video is because we don't really
want our developer agents to modify any local files on our own computers. To install Docker
you can go to the official Docker documentation from the readme of my repository and install
Docker desktop for whichever platform you use.
The process is pretty straightforward, so I'm not going to go over how to do this. Simply follow all the installation steps and then navigate back into the
Agency Swarm lab repository. Open your terminal and now we are finally ready to get going. The
first step after you have installed Docker is to build the container. In this repository you will
find three new Dockerfiles. The first, called simply Dockerfile, is the simplest version
that you can build in a very short time. It pulls from a pre-built Python container and installs
the requirements without any extra packages. This means that you can use this container for simpler
use cases that do not require web browsing. However, for use cases where web browsing is
required, like the one we're going to do now, you do have to build a slightly different container.
This container does take a bit more time to build because it installs Selenium and a Chrome browser.
You will find two versions of the container that install Selenium. Both of them are named dockerfile.browsing, and the last part of the name indicates the architecture of the system that you use.
So for any users on Apple Silicon Macs, you need to build the dockerfile.browsing.arm64 container. However, for any users on Windows systems or on older Intel-based Macs, you need to build the AMD64
container. The way you do this is by running a docker build command that you can find also
in the readme of this repository. This command is located on step 2. Simply copy this command
and replace the path to the Dockerfile after the -f flag. Since I'm running on Apple Silicon,
I'm going to use dockerfile.browsing.arm64. Simply hit enter and this should start the build
process. My container was already built before, so it took only a few seconds. However, if
you're building this container from scratch, it might take up to 10 minutes or even more.
After that, we are ready to run this container. To do this you can simply copy the next command
which is docker run, with some additional flags. Make sure to add your OpenAI API key after the -e flag. Then you can simply copy and run this command in the terminal as well.
So now we are inside our docker container which means that we can finally start coding. Inside
your docker container you will find all the files from my agency swarm lab repository. This
is because we are actually mounting the volume from your current directory, with all the agencies, into the Docker container. This means that if the developer agent modifies any files from the agency
swarm lab repository which is now located in the app directory inside the container, those changes
will also be reflected on your host system.
Don't worry, however, because the agent will
not be able to modify any files outside of this directory on your computer. Now, finally,
let me show you how to import this new pre-made agent called David from Agency Swarm. In the
latest release, I have added a new agency-swarm import-agent command. As you can see,
this command takes two arguments, the name of the agent and the destination where to copy the
agent source files. For the name of the agent, we currently have only two agents available.
The first one is the browsing agent from one of the previous videos that I have released on my
channel, and the second one is David. So let's try to import David. All you have to do is run the agency-swarm import-agent --name David command. This will copy David into your local directory. Now
you can see David here on the right inside our agency swarm lab repository. The best part about
this new method is that you now have full access to the source code of your agent. For those who
have been following me before you might know that the goal of my framework is to provide you as a
developer with full control over your systems. So, unlike in the previous implementation where
we used to import agents using a simple import statement in Python, now all the source files will
be copied directly into your local directory. This means that you can really fine-tune this agent
for your specific purpose. So whether you need to add new files, schemas, or tools, or modify existing tools and instructions, you can easily do that for your specific use case. This is exactly what we're going to do in this video.
However, to speed up the process, we're actually going to use the Genesis agency, which already has this capability to import
your agent source files locally. Let's see how it works. I'm gonna use this prompt that I've
made earlier and say that I need three agents, a planner agent, a developer agent, David, and
a browsing agent. I am also including specific communication flows between those agents. So the
structure of our agency will look like this. We will have a planner agent that communicates with
the browsing agent and David. David can also communicate with the browsing agent.
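As a reference, here is a minimal sketch of what that communication structure looks like as an Agency Swarm chart. The variable names are hypothetical, and the agents are assumed to be instantiated already:

```python
from agency_swarm import Agency

# planner, david, and browsing_agent are assumed to be Agent instances
agency = Agency([
    planner,                    # entry point: the planner talks to the user
    [planner, david],           # the planner can initiate chats with David
    [planner, browsing_agent],  # the planner can initiate chats with the browsing agent
    [david, browsing_agent],    # David can initiate chats with the browsing agent
])
```

If you come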
back to the Cognition Labs demo, we can see that Devin actually creates a plan by itself. In my opinion,
it's not the best way to do this. In my agency, I'm actually going to task the planner agent with
creating a plan, the browsing agent with pulling up the latest documentation, and then David
executing tasks based on this documentation. Additionally, instead of using a planning
prompt globally like they do in Cognition, what we're going to do is define special chain of
thought parameters inside each of the functions.
So for any functions that have a significant
impact on the development of the program, we will have a special chain of thought
parameter. This parameter allows the agent to plan its actions only when executing this tool,
which significantly saves you on token costs and latency without sacrificing much accuracy in terms
of performance.
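To make that concrete, here is a minimal sketch of such a tool. ExecuteCommand and its fields are hypothetical names; the pattern of declaring chain_of_thought as just another required parameter is the technique being described:

```python
import subprocess

from agency_swarm.tools import BaseTool
from pydantic import Field

class ExecuteCommand(BaseTool):  # hypothetical tool name
    """Executes a shell command inside the container."""
    # Declared first, so the model reasons about this specific call
    # before producing the remaining arguments; no global planning prompt needed.
    chain_of_thought: str = Field(
        ..., description="Think step-by-step about how this command affects the project."
    )
    command: str = Field(..., description="The shell command to execute.")

    def run(self):
        result = subprocess.run(self.command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr
```

Quick update from the future, guys. Ever since I made the walkthrough of this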
project, I actually modified the file writer tool. Now, instead of taking the source code directly
from the main agent and then writing it to a file, it takes requirements, details, file dependencies,
library dependencies, and other information from the agent. Then, it uses a special prompt,
inserting all of this information into the message, and runs it using another GPT-4
model until it generates complete and bug-free code.
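Here is a rough sketch of that idea. The field names and the prompt are simplified, and the retry loop the tool actually runs until the code is complete is omitted, so treat this as an illustration rather than the exact implementation:

```python
from agency_swarm.tools import BaseTool
from openai import OpenAI
from pydantic import Field

class FileWriter(BaseTool):
    """Generates a file from requirements with a separate GPT-4 call and writes it locally."""
    file_path: str = Field(..., description="Path of the file to create.")
    requirements: str = Field(..., description="Detailed requirements for the file.")
    details: str = Field("", description="File and library dependencies, plus other context.")

    def run(self):
        client = OpenAI()
        prompt = (
            f"Write the complete contents of {self.file_path}.\n"
            f"Requirements:\n{self.requirements}\n"
            f"Details:\n{self.details}\n"
            "Return only the full file contents, with no placeholders."
        )
        response = client.chat.completions.create(
            model="gpt-4-0125-preview",
            messages=[{"role": "user", "content": prompt}],
        )
        # Write whatever the dedicated model produced straight to disk.
        with open(self.file_path, "w") as f:
            f.write(response.choices[0].message.content)
        return f"Successfully wrote {self.file_path}."
```

The reason I decided to adjust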
this tool this way is that the GPT-4 model is still lazy. This is a known issue that a lot of OpenAI users have been encountering. OpenAI has even released the new 0125 GPT-4 model (gpt-4-0125-preview) that aims
to reduce the cases of laziness. And although I've noticed that the cases of laziness with these
models are indeed reduced when the model generates a message, it can still be extremely lazy when
generating function calls. Now let's finally proceed with the development of this agency.
All I'm going to do is simply send this prompt, and this should activate the Genesis agency, which will start planning out all of those agents for me.
The process does take some time depending on your
requirements, so I'm going to scroll through this part and come back to you later. Amazing! So after the Genesis agency has created our Code Solutions agency, you can see that it tells
me that I can navigate to the agency folder and start it using the following command. All
of our agent folders have also been imported accordingly. Let's move our terminal somewhere
on the left and now proceed with adjusting those agents. The instructions for the planner agent
essentially will tell it to first break down the tasks into manageable subtasks and then
direct them to different agents accordingly.
For example, we'll be directing any internet
browsing and information retrieval tasks to the browsing agent and any coding tasks to David.
After that, I'm also instructing this agent to continue conversations with both agents until
the task has been fully executed. This is one of my favorite lines that I like to include in any
manager or planner agent's instructions, because this allows you to achieve pretty good results
without constantly re-prompting your agency like in Cognition's demo. Now, I'm also going to
adjust the browsing agent's instructions. Inside the browsing agent's instructions, I'm also going
to include a predefined step-by-step process.
Some of the instructions are already
defined for you, and as I said before, you have full access to these files, so you
can modify them in any way you want. I found that these work well as the base for the agent
and provide it with enough context to complete most general web browsing tasks. However, to
fine-tune this agent on your specific process, I still recommend including this primary instruction
section with a specific step-by-step process.
In the case of this agency, I want the browsing
agent to first navigate into Google and find the required documentation. After that, I want
it to analyze this web page and make sure that it actually contains the information necessary
to complete the task. And then on step three, I'm telling it to extract the required information
from the web page as a file using the export file tool. If you look at the source code of this export file tool, I'm actually quite proud of its implementation. The way the export file tool
works is by exporting the entire web page as PDF and then uploading it as a file to OpenAI.
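A rough sketch of that mechanism might look like the following, assuming Selenium's Chrome driver (which exposes Page.printToPDF through the DevTools protocol) and the OpenAI Python SDK; get_driver is a hypothetical helper, and the driver-management details are omitted:

```python
import base64

from agency_swarm.tools import BaseTool
from openai import OpenAI

class ExportFile(BaseTool):  # simplified sketch of the tool
    """Exports the current web page as a PDF and uploads it to OpenAI for retrieval."""

    def run(self):
        driver = get_driver()  # hypothetical helper returning the shared Selenium driver
        # Render the current page to PDF via the Chrome DevTools protocol.
        pdf = driver.execute_cdp_cmd("Page.printToPDF", {})
        with open("exported_page.pdf", "wb") as f:
            f.write(base64.b64decode(pdf["data"]))
        # Upload the PDF so assistants can search it with the retrieval tool.
        file = OpenAI().files.create(file=open("exported_page.pdf", "rb"), purpose="assistants")
        return f"Page exported successfully. File ID: {file.id}"
```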
This allows your agents to analyze this file using a retrieval tool. So, the general idea is
that the browsing agent will export documentation pages as files and then send those file IDs to
the planner agent. The planner agent will then, once all of the necessary files are collected,
send them in the message files parameter to David. This means that David will then be able to analyze
each documentation using the retrieval tool to extract the necessary information in order to
complete the task. However, please note that in my instructions I'm actually saying that the browsing
agent should send the file ID back to the user. This is important because of the way my framework
is structured. Your agents, while communicating with one another, will actually think that they
were prompted by the user rather than by another agent. Additionally, inside the BrowsingAgent.py
file, I want to add a new response validator function. This function is a new addition to
my framework, and it allows you to check the response of the agent before proceeding with
returning it to the user or to another agent.
The way you define this function is simply by
overriding the response validator method. And then, inside this method, you can include
additional validation logic based on your requirements. So in the case of this agency, I'm
actually going to check if the response from the browsing agent contains the file ID and if not,
I'm going to instruct this agent to continue searching for documentation until the file ID is
found. This is another way to make your systems more reliable in production.
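As a minimal sketch, assuming OpenAI file IDs always carry the file- prefix, the browsing agent's validator might look like this:

```python
from agency_swarm import Agent

class BrowsingAgent(Agent):
    def response_validator(self, message: str) -> str:
        # Raising here sends the error text back to the agent,
        # which then retries instead of returning a bad response.
        if "file-" not in message:
            raise ValueError(
                "The response does not contain a file ID. Continue searching "
                "for the documentation and export it with the export file tool."
            )
        return message
```

For the David agent,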
I have already made a response validator based on the specific issues I've been encountering. For
example, I'm checking if David returns a code snippet and then if it does, I'm telling it to
never do that and instead run code and test it locally. Additionally, I'm using an LLM validator
from the instructor library. This function allows you to check the response of the agent using
another large language model. So here I'm checking if the message from David actually indicates
that the code was executed because oftentimes I've noticed that this agent forgets or does not
want to execute code locally. If this happens, the llm_validator function from the Instructor library will return an error. The agent will then see this error and correct itself accordingly.
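As a sketch of how this fits together (the exact llm_validator signature varies between instructor versions, so check the one you have installed; the statement text here is illustrative):

```python
from typing import Annotated

import instructor
from instructor import llm_validator
from openai import OpenAI
from pydantic import BaseModel, BeforeValidator

client = instructor.from_openai(OpenAI())

class ExecutionCheck(BaseModel):
    # llm_validator asks another model whether the statement holds for this value;
    # if it does not, pydantic raises a ValidationError with the model's reasoning.
    message: Annotated[
        str,
        BeforeValidator(llm_validator(
            "the message indicates that the code was actually executed locally",
            client=client,
        )),
    ]

# Inside David's response_validator, validating the response is then one line:
# ExecutionCheck(message=message)  # raises if the check fails, triggering a retry
```

To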
adjust the number of attempts for validation, you can adjust the validation attempts parameter.
Currently, it's set to 1, which means that if validation fails more than once, it's just
going to continue the conversation. However, you can set it to 2, 3 or even 50, which I
certainly do not recommend because it's going to take a lot of your tokens, but it's nice
that you have the flexibility to do so. I'm personally going to set it to 2 for David in this
specific scenario. Awesome. That's literally it. That's all it takes to create an agency similar
to the one that Cognition Labs have shown in their video. However, as I said, this agency will
not require nearly as many instructions. Now, all you have to do is simply set the server
name to 0.0.0.0 because we're running inside a Docker container and then run this file.
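For reference, the end of the generated agency.py might look roughly like this, assuming demo_gradio forwards keyword arguments to Gradio's launch(); check the file generated in your own repo:

```python
# hypothetical tail of agency.py
if __name__ == "__main__":
    # 0.0.0.0 makes the Gradio server reachable from outside the Docker container
    agency.demo_gradio(server_name="0.0.0.0")
```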
Each imported agent might have some additional requirements defined in the requirements.txt
file. For example, inside the browsing agent's requirements, we also need to install Selenium,
WebDriver Manager, and Selenium Stealth.
The way you can do this is simply by running pip install -r requirements.txt inside the browsing agent folder.
After our requirements are installed, we can simply run python agency.py command. This will
print a local URL, which you can simply click on, and this will take you to a Gradio interface where you can interact with your new agency.
Okay, so it's me again from the future. I just
wanted to test this agency with the updated FileWriter tool. I'm going to send the prompt
from their main demo video, which is to benchmark Llama 2 on three different API providers. The Planner
agent also now has one additional tool which is the Create Plan tool. I added this tool for the
exact same reason I modified the File Writer tool, just to have a bit more control. And as you
can see, this tool generates a structured plan with all the agent names and task descriptions
accordingly. So then the Planner agent sends the first task to the Browsing agent, which is to find
the documentation for those API providers. The Browsing agent then opens the first page, which
is the Google search results, and navigates to the first link, which is docs.together.ai.
As you can see, this page contains the exact documentation that we need, so it then exports it using the export file tool, and now it should proceed with searching for the next page,
which is the documentation for Replicate. Again, you obviously can't see the browsing window in the same interface as with Devin. However, in my opinion, you don't need to see all of those extra windows, like the terminal, the IDE, and the browser, for the agent if you're developing an
actual AI software engineer rather than an AI coding assistant. In the meantime, as you can
see, the browsing agent now searches for the final Perplexity API documentation. Cool, so it
also navigates to docs.perplexity.ai and exports this page as a file. It then sends all of those
file IDs back to the planner agent. The planner agent then sends the first task to David, which
is to read these documentation files. It also provides them in the message files parameter.
David successfully read the documentation for Together and Replicate, however, it seems like
it encountered an issue with Perplexity API documentation. I suppose this is because this API
documentation page actually has multiple tabs, which is quite challenging for the browsing
agent at the moment, so it did not export the necessary page. Nevertheless, the Planner agent
then sends the next task, which is to create the script using the extracted documentation.
David then writes the script in the message, which is one of the hallucinations that
I mentioned before. However, luckily, because we have this response validator, you
can see that the Planner agent immediately tells David to never return code snippets
and actually use the file writer tool. So David then proceeds with reading the current
directory and creating the benchmark_llama2.py script. It then tries to execute this script,
but unfortunately it runs into a few issues.
So it seems like the execution has unfortunately stopped, but what I'm going to do next is just tell David to continue executing the script
and debugging it until it works correctly. Okay, so it seems like the agents ran into an issue that they couldn't resolve by themselves. The error seems to stem from the fact that the retrieval tool in the OpenAI Assistants API is currently f***ed, and unfortunately, it does
not extract all of the relevant information from the files. I'm hoping that OpenAI will release
the browsing tool very soon, and additionally, I'm hoping that the GPT-4 vision model will also
be available in the Assistants API directly very soon, because both of those enhancements should
significantly improve the performance of any browsing tasks. For now, what I'm going to do is
just send some example requests for all of those APIs. Okay, so as you can see, the script has
now been adjusted. However, David is being lazy again, and it doesn't want to execute files.
Sometimes this happens with GPT-4, unfortunately. I'm hoping that OpenAI will resolve this
issue soon. But what I'm gonna do for now is just send the message to David directly and
just tell it to execute the script. Awesome!
So as you can see, finally, we got this script
to work. So no errors were reported and the script execution is finally successful. You
can see all of the responses here. However, it did not benchmark those providers; it just queried them via HTTP requests. So let's now instruct David to actually benchmark the times
for these requests. So finally, as you can see, the agent added the execution logic to the script
and then executed it again and provided me with the execution times for each API. Seems like
Perplexity was the fastest with 0.78 seconds per request, while Replicate took around a second, and the Together API executed in almost two seconds.
So now let me show you how I would approach fine-tuning this agency to make it actually autonomous for your specific task. First of all, I would
definitely start with the agency manifesto. Right now it's not comprehensive enough. In
the agency manifesto, I would add additional context about the environment, about the specific
process that I'm trying to execute, and maybe even specific links for the browsing agent to read,
or even documentation that David must use in order to generate the code. Then I would proceed
with adjusting individual agents' instructions, and also including notes for the specific
process you're trying to automate. After that, you should modify your tools' logic. In this
scenario, we actually didn't use shared state, but shared state can be extremely useful if
you want your agents to perform functions only in a certain order. For example, if your
process requires running certain commands, what you could do is ensure that the agent actually executed those commands before proceeding to write certain files. The way you would do this
is by saving certain states into the shared state parameter. So here you could save, for example,
the command into the shared state. And then in other functions, you would be able to check
if the agent actually executed this command.
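Here is a minimal sketch of that pattern with two hypothetical tools, assuming the shared state that Agency Swarm exposes to tools via self._shared_state:

```python
import subprocess

from agency_swarm.tools import BaseTool
from pydantic import Field

class RunSetupCommand(BaseTool):  # hypothetical tool
    """Runs a setup command and records it in the shared state."""
    command: str = Field(..., description="The command to execute.")

    def run(self):
        subprocess.run(self.command, shell=True, check=True)
        # Record the command so other tools can verify that it ran.
        self._shared_state.set("setup_command", self.command)
        return "Command executed."

class WriteProjectFile(BaseTool):  # hypothetical tool
    """Writes a file, but only after the setup command has been executed."""
    file_path: str = Field(..., description="Path of the file to write.")
    content: str = Field(..., description="Contents of the file.")

    def run(self):
        # Enforce the ordering: refuse to write before the command has run.
        if self._shared_state.get("setup_command", None) is None:
            raise ValueError("Run the setup command before writing any files.")
        with open(self.file_path, "w") as f:
            f.write(self.content)
        return f"Wrote {self.file_path}."
```

If the check fails, the raised error is fed back to the agent, which can then call the tools in the correct order.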
So yeah, that's it for this video. Thank you
for watching and don't forget to subscribe.