Introducing Devid, AI Software Engineer You Can Actually Use

Captions
In this video, I'm releasing my own open-source implementation of Devin: an actually usable AI software engineer called Devid. Shoutout to my Slavic roots. But what does "actually usable" mean? Three things. First, you don't have to wait on a waitlist for who knows how long and then guess how the heck this thing works under the hood, because you get full access to the agent's source files. This means you can fine-tune it for whatever use case you want. Second, unlike other implementations such as SWE-agent by Princeton, this agent is built to solve real-world coding challenges, because if someone is trained to solve pull-request issues on GitHub, they're probably not in the right mind. And lastly, this will be an agentic system, unlike Devin, which is more of a coding assistant.

Okay, so here is the agency that we'll be building in this video. It consists of three agents: the planner agent, Devid, and the browsing agent. I'm going to try to create a website that runs the Game of Life, from one of their demos, and because this is a relatively simple task, I'm actually not going to use the planner agent. I'm going to send this task directly to Devid. As you can see, Devid first creates a plan, just like Devin, and then proceeds with executing the first function to check the current directory. It then creates the first file, index.html; you can see the source code of this file below. Next, it creates style.css. And finally, it proceeds with implementing the game logic inside the game.js file.

The way this FileWriter tool works is actually quite interesting. As you can see, the agent simply passes the requirements for the file to write, and then the tool automatically generates the file and writes it locally. I'll explain why I implemented it this way in just a bit. For now, as you can see, Devid says that our Game of Life has been successfully created, and we can now test it by running the index.html file. Let's see how it works. Perfect! It does indeed work. But now I want this agent to make the game full screen, like in Devin's demo, and also remove the buttons, so I'm going to send that in the next message. Awesome: as you can see, it modifies the index.html file to make it full screen, then adjusts style.css, and finally adjusts the JavaScript file with our game's logic. Let's reload the website. Cool! Now we get this full-screen Game of Life website, just as I requested.

Now let me show you how to import this agent and use it in your own projects. But first, let's actually take a look at the Devin demo video, because there are certainly some things that you might not have noticed:

"Excited to introduce you to Devin, the first AI software engineer. From now on, Devin is in the driver's seat. First, Devin makes a step-by-step plan of how to tackle the problem. After that, it builds the whole project using all the same tools that a human software engineer would use. Devin has its own command line, its own code editor, and even its own browser. In this case, Devin decides to use the browser to pull up API documentation so that it can read up and learn how to plug into each of these APIs."

Do you guys notice anything in this video?
They showed the terminal, the code editor, and the browser really well, which are just three very basic tools that you could build yourself in an hour or so. But they didn't really show the last window, which is the chat interface itself, right? And if we look at this chat interface, you might notice that they're actually prompting the agent while it's working on the program. Look: it runs into an unexpected error, and look at what happens next. Scott tells the agent to make sure to use the right model names. And this is not even the worst part.

Take a look at this. In one of the other videos they released, where Devin trains another AI (which is what I did with my framework around two months ago, on the first day I released it), you can see that the demo actually has timestamps next to the messages. And if you look at these timestamps, the amount of time it took Devin to perform the task is actually quite extreme. The first message was sent to Devin at 2:49 pm, and at the end of the video the task was completed at around 4:31 pm. So it took Devin roughly two hours to complete this simple task. That's why I believe there are no timestamps on the chat interface in their main video: they either edited them out or adjusted the chat interface just for this demo, because the amount of time it took Devin to complete the task was so substantial that no one would ever consider hiring this agent again.

So now let's dive in. The first step is to pull my agency-swarm-lab repo, where by the time you watch this video you should be able to find the exact agency that we'll develop right now together. You can do this with a simple git clone command. After you have pulled this repo, you need to install Docker. The reason we need Docker for this specific video is that we don't really want our developer agents to modify any local files on our own computers. To install Docker, follow the link to the official Docker documentation from the readme of my repository and install Docker Desktop for whichever platform you use. The process is pretty straightforward, so I'm not going to go over it here. Simply follow all the installation steps, navigate back into the agency-swarm-lab repository, open your terminal, and now we are finally ready to get going.

The first step after you have installed Docker is to build the container. In this repository you will find three new Dockerfiles. The first, called simply Dockerfile, is the simplest version and builds in a very short time: it pulls a pre-built Python image and installs the requirements without any extra packages. This means you can use this container for simpler use cases that do not require web browsing. However, for use cases where web browsing is required, like the one we're going to do now, you do have to build a slightly different container. This container takes a bit more time to build because it installs Selenium and a Chrome browser. You will find two versions of the Selenium container. Both are named Dockerfile.browsing, and the last part of the name is the architecture of the system you use: users on Apple Silicon need to build the Dockerfile.browsing.arm64 container, while users on Windows or on Intel-based Macs need to build the amd64 container.
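For reference, here is roughly what that simplest Dockerfile boils down to, based on the description above. This is a minimal sketch; the exact base image tag and file layout in the repo may differ.

    # Minimal, non-browsing image: a pre-built Python base plus the project
    # requirements, no Selenium or Chrome (base image tag is an assumption).
    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    CMD ["bash"]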
To build the container, run the docker build command, which you can also find in the readme of this repository, under step 2. Simply copy the command and replace the path to the Dockerfile after the -f flag. Since I'm running on Apple Silicon, I'm going to use Dockerfile.browsing.arm64. Hit enter, and this should start the build process. My container was already built before, so it took only a few seconds; if you're building this container from scratch, it might take up to 10 minutes or even more. After that, we are ready to run the container. To do this, copy the next command, docker run, along with some additional flags, and make sure to add your OpenAI key after the -e flag. Then run this command in the terminal as well.

So now we are inside our Docker container, which means we can finally start coding. Inside the container you will find all the files from my agency-swarm-lab repository. This is because we mount your current directory, with all the agencies, as a volume inside the container. This means that if the developer agent modifies any files from the agency-swarm-lab repository, which is now located in the /app directory inside the container, those changes will also be reflected on your host system. Don't worry, however: the agent will not be able to modify any files outside of this directory on your computer.

Now, finally, let me show you how to import this new pre-made agent, Devid, from Agency Swarm. In the latest release, I have added a new agency-swarm import-agent command. As you can see, this command takes two arguments: the name of the agent and the destination where the agent's source files are copied. For the name, we currently have only two agents available. The first is the browsing agent from one of the previous videos on my channel, and the second is Devid. So let's try to import Devid. All you have to do is run the agency-swarm import-agent --name Devid command. This will copy Devid into your local directory, and you can now see Devid here on the right, inside our agency-swarm-lab repository.

The best part about this new method is that you now have full access to your agent's source code. Those of you who have been following me for a while know that the goal of my framework is to provide you, as a developer, with full control over your systems. So, unlike the previous implementation, where we used to import agents with a simple Python import statement, all the source files are now copied directly into your local directory. This means you can really fine-tune this agent for your specific purpose: whether you need to add new files, schemas, or tools, or modify existing tools and instructions, you can easily do that for your use case. This is exactly what we're going to do in this video.

However, to speed up the process, we're actually going to use the Genesis agency, which also already has the capability to import your agent source files locally. Let's see how it works. I'm going to use this prompt that I've made earlier and say that I need three agents: a planner agent, a developer agent (Devid), and a browsing agent. I am also including specific communication flows between those agents. (The setup commands from this section are recapped below.)
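For reference, the setup steps above boil down to roughly this command sequence. This is a hedged reconstruction, not copied from the readme: the Dockerfile path, image tag, and the exact import-agent flag spellings are assumptions, so check the repository's readme for the canonical versions.

    # 1. Build the browsing container (Apple Silicon variant shown):
    docker build -f Dockerfile.browsing.arm64 -t agency-swarm .

    # 2. Run it, passing your OpenAI key after the -e flag; the current
    #    directory is mounted as /app so changes are reflected on the host:
    docker run -it -e OPENAI_API_KEY=<your-key> -v "$(pwd)":/app agency-swarm

    # 3. Inside the container, copy Devid's source files into your project:
    agency-swarm import-agent --name "Devid" --destination "./"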
So the structure of our agency will look like this: we will have a planner agent that communicates with the browsing agent and Devid, and Devid can also communicate with the browsing agent. If you go back to the Cognition Labs demo, you can see that Devin actually creates a plan by itself. In my opinion, that's not the best way to do this. In my agency, I'm going to task the planner agent with creating a plan, the browsing agent with pulling up the latest documentation, and then Devid with executing tasks based on this documentation.

Additionally, instead of using a planning prompt globally like they do at Cognition, what we're going to do is define special chain-of-thought parameters inside each of the functions. So any function that has a significant impact on the development of the program will have a special chain-of-thought parameter. This parameter allows the agent to plan its actions only when executing that particular tool, which significantly saves you on token costs and latency without sacrificing much accuracy.

Quick update from the future, guys. Since I made the walkthrough of this project, I have actually modified the FileWriter tool. Now, instead of taking the source code directly from the main agent and then writing it to a file, it takes requirements, details, file dependencies, library dependencies, and other information from the agent. Then it inserts all of this information into a special prompt and runs it through another GPT-4 model until that model generates complete, bug-free code. The reason I decided to adjust this tool this way is that the GPT-4 model is still being lazy. This is a known issue that a lot of OpenAI users have been encountering, and OpenAI has even released the new 0125 GPT-4 model, which aims to reduce the cases of laziness. And although I've noticed that the cases of laziness are indeed reduced when the model generates a message, it can still be extremely lazy when generating function calls. (A rough sketch of the updated tool is at the end of this section.)

Now let's finally proceed with the development of this agency. All I'm going to do is simply send this prompt, which should activate the Genesis agency and start planning out all of those agents for me. The process does take some time depending on your requirements, so I'm going to scroll through this part and come back to you later. Amazing! After the Genesis agency has created our Code Solutions agency, you can see that it tells me I can navigate to the agency folder and start it using the following command. All of our agent folders have also been imported accordingly. Let's move our terminal somewhere on the left and now proceed with adjusting those agents.

The instructions for the planner agent essentially tell it to first break tasks down into manageable subtasks and then direct them to the appropriate agents: any internet browsing and information-retrieval tasks go to the browsing agent, and any coding tasks go to Devid. After that, I'm also instructing this agent to continue conversations with both agents until the task has been fully executed. This is one of my favorite lines to include in any manager or planner agent's instructions, because it allows you to achieve pretty good results without constantly re-prompting your agency, like in Cognition's demo.
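Before moving on to the individual agents, here is a rough sketch of the updated FileWriter tool described above, combining both ideas from this section: a chain-of-thought parameter on the tool itself, and a second GPT-4 call that turns requirements into code. The field names, prompt wording, and model name are illustrative assumptions, not the actual tool source.

    from agency_swarm.tools import BaseTool
    from pydantic import Field
    from openai import OpenAI

    client = OpenAI()

    class FileWriter(BaseTool):
        """Generates and writes a file based on requirements from the agent."""
        # Chain-of-thought parameter: the agent plans here, only when it
        # actually calls this tool, instead of via a global planning prompt.
        chain_of_thought: str = Field(
            ..., description="Think step by step about how to structure the file."
        )
        file_path: str = Field(..., description="Path of the file to create.")
        requirements: str = Field(
            ..., description="Requirements, details, and file/library dependencies."
        )

        def run(self):
            # A separate GPT-4 call turns the requirements into complete code.
            response = client.chat.completions.create(
                model="gpt-4-turbo-preview",
                messages=[
                    {"role": "system",
                     "content": "Write complete, bug-free code. No placeholders."},
                    {"role": "user",
                     "content": f"File: {self.file_path}\n"
                                f"Requirements:\n{self.requirements}"},
                ],
            )
            with open(self.file_path, "w") as f:
                f.write(response.choices[0].message.content)
            return f"Wrote {self.file_path}"

The real tool also re-runs the generation until the code is complete; this sketch does a single pass for brevity.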
Now, I'm also going to adjust the browsing agent's instructions. Inside the browsing agent's instructions, I'm also going to include a predefined step-by-step process. Some of the instructions are already defined for you, and as I said before, you have full access to these files, so you can modify them in any way you want. I've found that these work well as the base for the agent and provide it with enough context to complete most general web-browsing tasks. However, to fine-tune this agent on your specific process, I still recommend including this primary-instructions section with a specific step-by-step process. In the case of this agency, I want the browsing agent to first navigate to Google and find the required documentation. After that, I want it to analyze the web page and make sure that it actually contains the information necessary to complete the task. And then, on step three, I'm telling it to extract the required information from the web page as a file, using the ExportFile tool.

If you look at the source code of this ExportFile tool, I'm actually quite proud of its implementation. It works by exporting the entire web page as a PDF and then uploading it as a file to OpenAI. This allows your agents to analyze the file using the retrieval tool. So the general idea is that the browsing agent will export documentation pages as files and then send those file IDs to the planner agent. The planner agent will then, once all of the necessary files are collected, send them in the message_files parameter to Devid. This means that Devid will be able to analyze each document using the retrieval tool and extract the information needed to complete the task. However, please note that in my instructions I'm actually saying that the browsing agent should send the file ID back to the user. This is important because of the way my framework is structured: your agents, while communicating with one another, actually think that they were prompted by the user rather than by another agent.

Additionally, inside the BrowsingAgent.py file, I want to add a new response validator function. This function is a new addition to my framework, and it allows you to check the response of the agent before returning it to the user or to another agent. You define it simply by overriding the response_validator method, and inside this method you can include additional validation logic based on your requirements. In the case of this agency, I'm going to check whether the response from the browsing agent contains a file ID, and if not, instruct the agent to continue searching for documentation until a file ID is found. This is another way to make your systems more reliable in production.

For the Devid agent, I have already made a response validator based on the specific issues I've been encountering. For example, I check whether Devid returns a code snippet, and if it does, I tell it to never do that and instead run the code and test it locally. Additionally, I'm using the llm_validator from the Instructor library, which allows you to check the response of the agent using another large language model. Here I'm checking whether the message from Devid actually indicates that the code was executed, because I've often noticed that this agent forgets, or does not want, to execute code locally. If this happens, the llm_validator raises an error; the agent then sees this error and corrects itself accordingly. A minimal sketch of both validators is below.
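Here is a minimal sketch of the two validators described above: the plain check that rejects code snippets, and the LLM-based check run by another model. The llm_validator call signature follows recent Instructor versions and may differ in yours, and the error messages are illustrative.

    import instructor
    from openai import OpenAI
    from instructor import llm_validator
    from agency_swarm import Agent

    client = instructor.from_openai(OpenAI())

    class Devid(Agent):
        def response_validator(self, message: str) -> str:
            # Plain-Python check: never let the agent answer with a raw snippet.
            if "```" in message:
                raise ValueError(
                    "Do not return code snippets. Write the code to a file "
                    "with the FileWriter tool, then execute and test it locally."
                )
            # LLM-based check: another model verifies the code was actually
            # run. Raises ValueError on failure, which is fed back to the agent.
            llm_validator(
                statement="The message confirms that the code was executed locally.",
                client=client,
            )(message)
            return message

The browsing agent's validator follows the same pattern, except it checks the response for a file ID and tells the agent to keep searching if none is found.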
To adjust the number of attempts for validation, you can set the validation_attempts parameter. Currently it's set to 1, which means that if validation fails more than once, the conversation just continues. However, you can set it to 2, 3, or even 50, which I certainly do not recommend because it's going to take a lot of your tokens, but it's nice that you have the flexibility to do so. I'm personally going to set it to 2 for Devid in this specific scenario.

Awesome. That's literally it. That's all it takes to create an agency similar to the one that Cognition Labs have shown in their video. However, as I said, this agency will not require nearly as many instructions. Now, all you have to do is set the server name to 0.0.0.0, because we're running inside a Docker container, and then run this file. Each imported agent might have some additional requirements defined in its requirements.txt file; for example, the browsing agent also needs Selenium, WebDriver Manager, and Selenium Stealth. You can install these simply by running pip install -r requirements.txt inside the browsing agent folder. After our requirements are installed, we can simply run the python agency.py command. This will print a local URL, which you can simply click on, and this will take you to a Gradio interface where you can interact with your new agency. The agency.py file itself looks roughly like the sketch below.
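For reference, here is roughly what agency.py looks like. The agent import paths are hypothetical (use the folder names Genesis generated for you), and forwarding server_name through demo_gradio is an assumption about the framework's Gradio wrapper; the communication flows match the structure described earlier.

    from agency_swarm import Agency
    # Hypothetical import paths; match them to the generated agent folders.
    from PlannerAgent import PlannerAgent
    from Devid import Devid
    from BrowsingAgent import BrowsingAgent

    planner = PlannerAgent()
    devid = Devid()
    browser = BrowsingAgent()

    agency = Agency([
        planner,             # entry point: the user talks to the planner
        [planner, devid],    # the planner can message Devid
        [planner, browser],  # the planner can message the browsing agent
        [devid, browser],    # Devid can message the browsing agent
    ])

    if __name__ == "__main__":
        # 0.0.0.0 so the interface is reachable from outside the container.
        agency.demo_gradio(server_name="0.0.0.0")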
Okay, so it's me again from the future. I just wanted to test this agency with the updated FileWriter tool. I'm going to send the prompt from their main demo video, which is to benchmark Llama 2 on three different API providers. The planner agent also now has one additional tool, the CreatePlan tool. I added this tool for the exact same reason I modified the FileWriter tool: just to have a bit more control. And as you can see, this tool generates a structured plan with all the agent names and task descriptions accordingly. The planner agent then sends the first task to the browsing agent, which is to find the documentation for those API providers. The browsing agent opens the first page, the Google search results, and navigates to the first link, docs.together.ai. As you can see, this page contains the exact documentation that we need, so it exports it using the ExportFile tool and proceeds with searching for the next page, the documentation for Replicate.

Again, you obviously can't see the browsing window in the same interface, like for Devin. However, in my opinion, you don't need to see all of those extra windows, like the terminal, IDE, and browser, if you're developing an actual AI software engineer rather than an AI coding assistant. In the meantime, as you can see, the browsing agent searches for the final Perplexity API documentation. Cool, so it also navigates to docs.perplexity.ai and exports this page as a file. It then sends all of those file IDs back to the planner agent, and the planner agent sends the first task to Devid, which is to read these documentation files. It also provides them in the message_files parameter.

Devid successfully read the documentation for Together and Replicate; however, it seems to have encountered an issue with the Perplexity API documentation. I suppose this is because that documentation page actually has multiple tabs, which is quite challenging for the browsing agent at the moment, so it did not export the necessary page. Nevertheless, the planner agent then sends the next task, which is to create the script using the extracted documentation. Devid then writes the script in the message, which is one of the hallucinations that I mentioned before. However, luckily, because we have this response validator, you can see that Devid is immediately told to never return code snippets and to actually use the FileWriter tool. So Devid then proceeds with reading the current directory and creating the benchmark_llama2.py script. It then tries to execute this script, but unfortunately it runs into a few issues.

So it seems like the execution has stopped, unfortunately, but what I'm going to do next is just tell Devid to continue executing the script and debugging it until it works correctly. Okay, so it seems like the agents ran into an issue that they couldn't resolve by themselves. The error seems to stem from the fact that the retrieval tool in the OpenAI Assistants API is currently f***ed, and unfortunately it does not extract all of the relevant information from the files. I'm hoping that OpenAI will release the browsing tool very soon, and additionally, I'm hoping that the GPT-4 vision model will also be available in the Assistants API directly very soon, because both of those enhancements should significantly improve the performance of any browsing tasks. For now, what I'm going to do is just send some example requests for all of those APIs.

Okay, so as you can see, the script has now been adjusted. However, Devid is now being lazy again and doesn't want to execute files. Sometimes this happens with GPT-4, unfortunately; I'm hoping that OpenAI will resolve this issue soon. But what I'm going to do for now is just send a message to Devid directly and tell it to execute the script. Awesome! So as you can see, we finally got this script to work: no errors were reported, and the script execution succeeded. You can see all of the responses here. However, it did not benchmark those providers; it just queried them via HTTP requests. So let's now instruct Devid to actually benchmark the times for these requests. Finally, as you can see, the agent added the benchmarking logic to the script, executed it again, and provided me with the execution times for each API. It seems Perplexity was the fastest at 0.78 seconds per request, Replicate took around a second, and the Together API executed in almost two seconds.

So now let me show you how I would approach fine-tuning this agency to make it actually autonomous for your specific task. First of all, I would definitely start with the agency manifesto; right now it's not comprehensive enough. In the agency manifesto, I would add additional context about the environment and about the specific process that I'm trying to execute, and maybe even specific links for the browsing agent to read, or documentation that Devid must use in order to generate the code.
Then I would proceed with adjusting the individual agents' instructions, and also include notes for the specific process you're trying to automate. After that, you should modify your tools' logic. In this scenario we didn't actually use shared state, but shared state can be extremely useful if you want your agents to perform functions only in a certain order. For example, if your process requires running certain commands, you could ensure that the agent actually executed those commands before proceeding to write certain files. The way you would do this is by saving certain values into the shared state: one function could save, for example, the command into the shared state, and other functions could then check whether the agent actually executed that command. A minimal sketch of this pattern follows.
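To make that concrete, here is a minimal sketch of the shared-state pattern, assuming hypothetical tool and key names; only the shared state itself comes from the framework.

    import subprocess
    from agency_swarm.tools import BaseTool
    from pydantic import Field

    class RunCommand(BaseTool):
        command: str = Field(..., description="Shell command to execute.")

        def run(self):
            result = subprocess.run(
                self.command, shell=True, capture_output=True, text=True
            )
            # Record that a command was run so other tools can check for it.
            self._shared_state.set("last_command", self.command)
            return result.stdout or result.stderr

    class WriteFile(BaseTool):
        file_path: str = Field(..., description="Path of the file to write.")
        content: str = Field(..., description="Content to write.")

        def run(self):
            # Enforce ordering: refuse to write before the setup command ran.
            if self._shared_state.get("last_command", None) is None:
                raise ValueError("Run the required command before writing files.")
            with open(self.file_path, "w") as f:
                f.write(self.content)
            return f"Wrote {self.file_path}"

So yeah, that's it for this video. Thank you for watching, and don't forget to subscribe.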
Info
Channel: VRSEN
Views: 14,177
Keywords: AI software engineer, Devin AI, software development AI, AI programming, AI code automation, machine learning, AI for developers, artificial intelligence, coding assistant, developer tools AI, AI in software engineering, code automation, AI developer tools, ai, openai, open ai, ai agents, agency swarm, ai agency, Devin, coding, agent swarms, openai assistants api, crew ai, autogen
Id: BEpDRj9H3zE
Length: 25min 0sec (1500 seconds)
Published: Thu Apr 11 2024