NEW AGENTLESS AI Software Development

Video Statistics and Information

Captions
Hello community. Today we go agentless for software development. Up until now, the whole AI research community has built autonomous agents to perform end-to-end software development tasks. We equip those agents with tool use: they can run commands, get feedback from the environment, and plan their future actions. That is the pure definition of what an AI agent is. Those agent-based approaches, Devin, or OpenDevin as the open-source alternative, SWE-agent, or AutoCodeRover, all of those systems are agent-based. And here, for example, is a system that I like: Aider. As of June 2024, Aider is state-of-the-art for both SWE-bench and SWE-bench Lite, and you see the performance here if you use Claude Opus as your LLM for SWE-agent, or GPT-4, or you go with the Amazon Q Developer agent, or you go with Devin, and then you see the performance, for example, of Aider. Plus, I do like it when intelligent people say: hey, it is interactive, not agentic. Plus, of course, it is open source under an Apache 2 license, you can run it on a MacBook with Claude 3.5 Sonnet, and it is already compatible with DeepSeek Coder V2. What a nice approach.

But now we have some researchers from the University of Illinois and they say: hey, the complexity of those agent-based approaches, together with the very real limitations of current LLMs that we keep discovering (the hallucinations and so on), raises a simple question for us. Do we really have to employ complex autonomous software agents for software engineering, like we had to two years ago? Do we still have to deploy agents for this task now that we have more advanced technology at hand? Imagine a world where, for software development, we would not need agents. Yes, of course, the whole industry sector that is pushing and selling AI agents for software development will say: hey, wait a minute, no, we invested millions in their development. But we are a research channel; we ask what is possible with the technology and what the latest research shows. And maybe the latest research shows us that an open-source system without agents is outperforming all the other open-source agentic solutions.

And here we have it: July 1st, 2024, "Agentless: Demystifying LLM-based Software Engineering Agents". They develop Agentless, a simplified approach to solving software development problems using LLMs without the complexity of autonomous agents. And you know what, before we dive into anything, let's look at the benchmark data. I want to see the performance of these systems before we look at how the system does it. So what is the result of this new open-source approach to software development? We start with the tools and then the supporting LLM: open-source Aider, AutoCodeRover, SWE-agent, OpenDevin, plus RAG systems, and then in the last line you see the new Agentless. For the LLM we go with GPT-4o plus Claude 3, or only GPT-4, or only Claude, you choose whatever you like. Now let's look at the performance, the percentage of resolved tasks on this software development benchmark, and you see that of all the open-source tools, the very last line, Agentless with 27.3%, has better performance than all the other open-source tools that use agents. Now this is interesting. The authors also argue that their cost structure is attractive: whether you deploy Claude 3 or GPT-4o (both are compatible), their cost is roughly one tenth of other open-source systems. So, interesting; maybe this is something we should have a look at.
Now let's be clear: if we go to the proprietary, very expensive business solutions, we have Alibaba, we have IBM, we have ByteDance, we have Amazon Q Developer. If we look at those proprietary, closed systems alongside GPT-4o plus Claude 3.5, we see that the current Alibaba system, the Lingma agent, is best in class with 33% resolved tasks. But you know what, if we now compare this to the very last line, Agentless with 27.3%, you see there is a difference, but the difference is not that monumental, because look, we just need to go down one, two, three, four steps and the new Agentless system is already ranked fifth in this list, including both the open-source and the proprietary business solutions. So I think that is not bad for the first try of a no-agent system in software development.

Now, maybe some of those 300 tasks in this benchmark dataset are not really well specified, so the researchers sat down and cleaned the dataset, going from 300 tasks down to 252 problems, and they call this new benchmark dataset SWE-bench Lite-S (maybe the S stands for smaller). They ran the tests again, so you have the classical Lite here and the corrected, smaller Lite-S there. If you look again at the tools and the LLMs, you see the performance increases: the best system, Alibaba, goes from 33% to 34.5%, but Agentless also improves, from 27.3% to 28.1%. The rank stays the same, though: it is fifth in both cases. So not bad for the first appearance of Agentless, and if you look at this particular sheet, you might decide to give Agentless a try for your task, in your environment, with your particular configuration.

Now, for my green grasshoppers: what is SWE-bench, what does this benchmark do? There is a paper by Princeton University that I would recommend, the version from April 5th, 2024, if you want to know more about this benchmark set. And you remember I also showed you the RAG solution for software development; just to give you a short explanation, let's look at the performance. If we go with RAG as the tool and use Claude 3, we get 4.3%; with GPT-4, 2%; and further down, with Claude 2, 3%. All of this is below the threshold of 5%, so if you look at this performance data, you might have second thoughts about the classical RAG configuration. Okay, I just wanted to show you how it works, and it is exactly what you expect: it retrieves relevant information from a database, a set of documents, whatever you have available, with a file content retrieval mechanism. A simple BM25 algorithm, a very old friend: the system identifies the files in the codebase most relevant to the software issue at hand. Then we have patch file generation: once the relevant files are identified and retrieved, the LLM directly generates a patch file, and this patch file contains the code changes needed to fix the issues found in the retrieved files. And this is the way such a RAG system works.
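If you want to see roughly what this retrieve-then-patch baseline looks like in code, here is a minimal sketch. To be clear, this is not the official SWE-bench RAG harness: `llm_complete` is a hypothetical stand-in for whatever LLM client you use, tokenization is plain whitespace splitting, and only Python files are indexed.

```python
# Minimal sketch of a BM25 retrieve-then-patch baseline (not the official
# SWE-bench RAG harness). llm_complete is a hypothetical stand-in for an
# LLM client of your choice.
from pathlib import Path
from rank_bm25 import BM25Okapi  # pip install rank-bm25


def llm_complete(prompt: str) -> str:
    """Hypothetical LLM call; replace with your own client (GPT-4o, Claude, ...)."""
    raise NotImplementedError


def retrieve_files(repo_root: str, issue_text: str, top_k: int = 3) -> list[Path]:
    """Rank repository files against the issue description with BM25."""
    files = [p for p in Path(repo_root).rglob("*.py") if p.is_file()]
    corpus = [p.read_text(errors="ignore").split() for p in files]
    bm25 = BM25Okapi(corpus)
    scores = bm25.get_scores(issue_text.split())
    ranked = sorted(zip(scores, files), key=lambda pair: pair[0], reverse=True)
    return [f for _, f in ranked[:top_k]]


def generate_patch(repo_root: str, issue_text: str) -> str:
    """Single-shot prompt: issue plus retrieved files in, unified diff out."""
    context = "".join(
        f"\n### {f}\n{f.read_text(errors='ignore')}"
        for f in retrieve_files(repo_root, issue_text)
    )
    prompt = (
        "You are given a GitHub issue and the most relevant files from the repository.\n"
        f"Issue:\n{issue_text}\n\nFiles:{context}\n\n"
        "Return a unified diff that fixes the issue."
    )
    return llm_complete(prompt)
```

The point to notice is that there is no loop and no feedback from the environment: the model sees the retrieved files once and must emit a correct patch in a single shot, which goes some way toward explaining why the resolve rates stay below 5%.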
So now that we have had a look at the system and the performance data, I was interested in how it works. How is it possible not to use a complex agent structure and still have an agentless system deliver this performance? Now, you're not going to believe this, but the main idea is simple. There are just two aspects: we have to identify where the problem in the code is, which is the localization phase, and then we just have to fix that code problem, which is the repair phase. Easy: localize it, repair it.

The localization phase, to give you a little bit more detail, involves a hierarchical process over a tree structure to determine the exact location within the code that requires a change. The LLM scans the entire codebase and ranks the files by the likelihood of containing errors related to the reported issue. The next step is to drill down to specific classes and functions within those files: the LLM reviews the structure inside each file, the classes and the functions. The final step is to pinpoint the exact lines of code, or code blocks, where changes should be made, or as we can say, where we have to repair the code. Then the LLM generates multiple potential fixes, or patches, for the bugs we discovered; the remaining patches are ranked based on their effectiveness, and the top-ranked patch is selected and applied to the codebase. As easy as that.

Now, if you want to see it in the original visualization, it looks a little more complicated, but it is in principle exactly the same, so let's walk through it on the original figure. The authors say: step one, we begin our hierarchical localization process by turning the project codebase into a tree-like structure format that shows the relative location of each file. Step two, using this repo structure format along with the original issue description, we ask the LLM to localize and rank the top N most suspicious files that need editing. Step three, we provide a skeleton for each file, meaning a list of the declaration headers of the classes and functions, and ask the LLM to output a specific list of classes and functions that we should examine more closely to fix the bug. Step four, we ask the LLM to finalize a smaller set of edit locations. Then the repair phase starts: step five, we provide the code snippets at these edit locations together with the issue description and ask the LLM to sample multiple patches to solve the issue. Step six, we perform some simple filtering to remove any patches with syntax errors and then use, for example, majority voting to rank the remaining patches; Agentless then selects the top-ranked patch as the final patch for submission. Sounds easy.

The code is there for you, it is open source, and you can try it out immediately. And you know what, there is one sentence from this preprint I would like to show you. The authors say that their work highlights the currently overlooked potential of simple techniques in autonomous software development. It also allows us to interpret what is going on, because if you have agents that run 30, 40, 50 turns, you cannot really follow exactly what is happening in the software development, but with Agentless you can. Therefore I highly recommend the GitHub repository OpenAutoCoder/Agentless, updated two days ago. Have a look: as you can see, a beautiful structure with setup, localization, repair, artifacts, whatever you like, under an MIT license. Great.
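And if you want to see roughly how those six steps fit together in code, here is a minimal sketch of the localize-then-repair flow as I read it from the paper. This is my paraphrase, not the official Agentless implementation: the `llm` helper is a hypothetical stand-in for your model of choice, the prompts are heavily abbreviated, and the syntax check is left as a stub.

```python
# Sketch of the "localize, then repair" flow described in the paper
# (a paraphrase, not the official Agentless code). llm() is a hypothetical
# stand-in for whichever model you plug in.
import ast
from collections import Counter
from pathlib import Path


def llm(prompt: str, n: int = 1) -> list[str]:
    """Hypothetical LLM call returning n sampled completions."""
    raise NotImplementedError


def repo_tree(root: str) -> str:
    """Step 1: render the repository as a tree-like list of file locations."""
    return "\n".join(str(p.relative_to(root)) for p in sorted(Path(root).rglob("*.py")))


def file_skeleton(path: Path) -> str:
    """Step 3: keep only the declaration headers of classes and functions."""
    tree = ast.parse(path.read_text(errors="ignore"))
    return "\n".join(
        f"line {node.lineno}: {type(node).__name__} {node.name}"
        for node in ast.walk(tree)
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    )


def localize(root: str, issue: str, top_n: int = 3) -> str:
    """Steps 2-4: suspicious files -> classes/functions -> final edit locations."""
    files = llm(f"Issue:\n{issue}\n\nRepo structure:\n{repo_tree(root)}\n\n"
                f"List the {top_n} most suspicious files, one per line.")[0]
    names = [line.strip() for line in files.splitlines() if line.strip()]
    skeletons = "\n\n".join(f"{n}:\n{file_skeleton(Path(root, n))}" for n in names)
    return llm(f"Issue:\n{issue}\n\nFile skeletons:\n{skeletons}\n\n"
               "Return the exact edit locations (file, class/function, line range).")[0]


def repair(root: str, issue: str, locations: str, samples: int = 8) -> str:
    """Steps 5-6: sample patches, drop broken ones, pick the majority answer."""
    # In the paper, the code snippets at these locations are included in the prompt.
    patches = llm(f"Issue:\n{issue}\n\nEdit locations:\n{locations}\n\n"
                  "Produce a patch that resolves the issue.", n=samples)
    valid = [p for p in patches if is_syntactically_valid(root, p)]
    # Majority voting: the patch generated most often is submitted.
    return Counter(valid).most_common(1)[0][0] if valid else patches[0]


def is_syntactically_valid(root: str, patch: str) -> bool:
    """Stub: apply the patch in memory and ast.parse() each touched file."""
    return True
```

Notice there is no agent loop anywhere: each step is a single, fixed LLM call whose output feeds the next one, which is exactly what makes the whole run cheap to execute and easy to inspect.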
Plus, since this is a new paradigm (I too am used to using an agent for all of this, you know, Devin and all of that), the authors come and say: let's think about this from a more abstract point of view. Do we really need agents? Yes, two years ago we had to use agents to find the solution, because our technology was not as advanced as it is today. And they make three points I would like to stress. I quote the authors of the study: the disparity between human and current LLM abilities leads to the following limitations of agent-based approaches.

First, complex tool usage. To utilize tools, current agent-based approaches apply an abstraction layer between the agent and the environment, for example mapping real actions to API calls so that agents can use tools by outputting an API call instruction. Those API call specifications require careful design regarding the input and output format, and this can lead to incorrect and imprecise tool usage, especially if you have a really complex action space. Given the iterative nature of agent-based approaches, where the current action depends on the previous turns, any incorrect or imprecise definition or use of a tool will reduce performance and incur additional costs.

Second, they state a lack of control in decision planning. In addition to using tools, the agent decides the current action to take based on the previous actions taken and the feedback provided by the environment, but often with minimal checks to ensure the action taken makes real sense. Due to the very large possible action space and feedback responses, it can be extremely easy for autonomous agents to become confused in this huge action space and perform suboptimal exploration, and I have seen this too. Furthermore, they say, to solve an issue an AI agent in software development can take upwards of 30 or 40 turns, which makes it extremely difficult both to understand the decisions made by the agent and to debug it. And this is true: if you have ever started to debug those agent systems, LangChain and so on, you will find that you can spend a lot of time on exactly this point.

The final point I would like to quote from the authors is a limited ability to self-reflect. Those agents tend to take in all the information, all the feedback from the environment, but they do not really know how to filter out incorrect, irrelevant, or misleading information. A common step in agent-based approaches is to reproduce an issue with a minimal test case; however, this reproduced test case may not always be the correct or precise test case for this particular problem. And this means that a single incorrect step can easily amplify and negatively affect all future decisions taken by the AI agent in the area of software engineering.

So I think these are three valid, powerful points. Maybe, now that we have the technology, we can reduce AI agent use and cut back on agent complexity, because whenever we reduce the complexity of a system, it becomes less prone to mistakes and errors, and therefore it is so much easier for us to debug the whole system. So I think this is an interesting study, with very new ideas presented in this preprint. The code is open source: go there, experience it yourself, for your case, for your particular task, in your computing environment, and give it a try. If the data the researchers present here, especially the benchmark data, are solid, this is the best open-source system that we have today for software development. So I think this sounds really interesting, and from the idea of why they implemented it and how they implemented it, I think it is really important to at least be brave enough to break with our tradition of AI agents and explore a new technology. I hope you found this interesting, I hope you have some new thoughts about software engineering, and it would be great to see you in my next video.
Info
Channel: code_your_own_AI
Views: 2,067
Keywords: artificial intelligence, AI models, LLM, VLM, VLA, Multi-modal model, explanatory video, RAG, multi-AI, multi-agent, Fine-tune, Pre-train, RLHF
Id: SoFepHI6sQ0
Length: 19min 30sec (1170 seconds)
Published: Thu Jul 11 2024