World's FIRST AGI SOFTWARE ENGINEER Just SHOCKED The ENTIRE INDUSTRY! (FULLY Autonomous AI AGENT)

Video Statistics and Information

Captions
So there was a recent announcement from a company called Cognition Labs, and they announced something that actually did shock the entire industry: the first AI software engineer. You can see right here it says: "Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser. When evaluated on the SWE-bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.8% assisted. Check out what Devin can do in the thread below." So let's take a look at some of the demo videos they showed us, because honestly it is quite impressive what this can really do.

Hey, I'm Scott from Cognition AI, and today I'm really excited to introduce you to Devin, the first AI software engineer. Let me show you an example of Devin in action. I'm going to ask Devin to benchmark the performance of Llama on a couple of different API providers. From now on, Devin is in the driver's seat. First, Devin makes a step-by-step plan of how to tackle the problem. After that, it builds a whole project using all the same tools that a human software engineer would use. Devin has its own command line, its own code editor, and even its own browser. In this case, Devin decides to use the browser to pull up API documentation so that it can read up and learn how to plug into each of these APIs. Here Devin runs into an unexpected error. Devin actually decides to add a debugging print statement, reruns the code with the debugging print statement, and then uses the error in the logs to figure out how to fix the bug. Finally, Devin decides to build
and deploy a website with full styling as the visualization. You can see the website here. All of this is possible today because of the advancements that we've made in both reasoning and long-term planning. It's a really hard problem and we've only just started, but we're super excited about the progress that we've made so far. In the meantime, if you'd like to try out Devin on your own real-world tasks, send us a request below and we'd be happy to [unclear] Devin.

So that was a rather impressive short demo, and there are many more in the video that I'm going to be showing you. What this is showing us is that we are moving into a very interesting part of society where these AI technologies are going to keep sprouting up. Companies and investors are going to keep building these platforms where you can do absolutely amazing things, and we're likely to see shakeups to the entire industry. Now, with Devin being the world's first AI software engineer, I think the advancement is not to be understated here. This is an autonomous AI agent that is able to do a variety of software engineering tasks to a very good degree, and it is really going to impact the sector moving forward if things continue at this pace. We saw in this first demo that it was even able to pull up the browser to read the API documentation to understand what's going on, and then do the debugging so it could fix the issue by itself. So it was really cool to see firsthand how this AI system was able to solve an issue on its own, kind of look around, browse around, and then manage to complete the task without human assistance. Right now the benchmarks are state-of-the-art, meaning that this is currently the best we have, and in the future it's likely to improve. So I wonder what the future will look like in just a couple of years, because my oh my, things are moving quickly. Now there are also some other demos that show just how good this system is, and I'm going to show you guys those right now,
because Devin is able to do a lot of things that I didn't think it was able to do. One of the key things that makes this so cool is the long-term planning and reasoning breakthrough that they've had, which I'll talk about later and which is a key factor in why this is really effective. So let's take a look, first of all, at how Devin can actually do tasks on Upwork.

Hey, I'm Walden, one of the developers here at Cognition AI. We were playing around with whether or not Devin could start a side hustle on Upwork. So here's an actual real job from Upwork where the client wants to set up this computer vision model, which actually looks quite interesting. It seems very difficult to set up. I'm not sure how I would start doing this, but you give the task to Devin and ask Devin to figure it out, and things just kick off. Devin immediately goes ahead, and you can see it starts setting up the repo. It actually runs into some issues here with the versioning, so if you watch how Devin deals with it, Devin's actually updating the code to make these things work. It continues with loading and importing packages. You can see that it actually downloads images from the internet to run through the model. You can see here that some issues come up; however, Devin knows how to handle these things. Devin kind of pushes through, and if you look closely, Devin's actually doing print-line debugging here, where Devin is adding these statements to track where the data flows. Devin continues to do this until Devin understands how everything's working, and then updates the code with the fixes after removing the print-line statements. Devin continues this pattern of fixing code and running it again until it runs the image model across all these roads across the world. We can ask for a report from Devin, at which point Devin sends over some sample images of roads with damage marked out and a nice txt file explaining Devin's work and the different kinds of
outputs of the model. Good job, Devin.

So you can see here that this was a quick demo of how it can do certain tasks on Upwork. Once again, the reason I stated that this is going to shake up the industry is that if it's able to do many different Upwork tasks, then it could potentially shake up the gig economy by removing jobs that many people used to do. So this is something that is shaking up the industry across various sectors. One thing that I was really impressed with from this demo: you can see right here that it actually has a little planner, and as this person is running the demo you can see live what this AI is planning to do. Now, one thing that is, I wouldn't say frightening but rather intriguing, is that many people did state that these AI systems wouldn't get long-term planning, because it would be, I guess you could say, rather dangerous if the AI wanted to do something that we wouldn't be aware of. And I think in this example, of course, long-term planning is vital to achieving any goal that humans usually pursue. You can see right here the plan is to clone the repository from the provided GitHub link, read the documentation, install the necessary dependencies, run the models, and review the output to verify the accuracy and understand the result. Next, here we have a demo of it being able to do unfamiliar tasks.

Hey everyone, my name is Sarah, and I'm going to show you how Devin, our AI software engineer, can autonomously learn from a blog post. Within a few minutes, Devin successfully generated this desktop background image for me with my name on it. All I had to do was send this blog post in a message to Devin. From there, Devin actually does all the work for me, starting with reading this blog post and figuring out how to run the code. In a couple of minutes, Devin's actually made a lot of progress, and if we jump to the middle here you can see that Devin's been able to find and fix
some edge cases and bugs that the blog post did not cover for me. And if we jump to the end, we can see that Devin sends me the final result, which I love. I also got two bonus images, here and here. So let me know if you guys see anything hidden in the...

Once again, another really impressive demo of Devin being able to do unfamiliar tasks, which shows that this LLM can learn. It's just really fascinating to see how all of these AI agents are being built by various startups. Now here's another demo in which Devin adds a feature to an open-source repository.

Hey, I'm Tony, an engineer at Cognition. I helped build Devin, and now Devin helps me too. Today at work I wanted to run a bunch of commands at once and be able to track their status on one screen. I found an open-source tool named impro to do this. Here it is right here. Looks like it all finished, but the status is way too vague. I don't know which ones failed; they all just say "done". I really want to improve the UX here, but I'm not familiar with the code at all, so I had Devin, my AI software engineer, help me. Looks like this person right here had the same issue, so all I gave Devin was the link to the issue and asked Devin to fix it. You can see me make the request right here on the left. Let's see what Devin did. On the right, we can track Devin's work and watch Devin jump from tool to tool. First, Devin clones the repository using the shell, then reads the README in an editor to learn how to set up the code, then goes back to the shell to install the required dependencies. Devin also opens up a web browser to take a look at the issue. Now Devin starts coding. At some point, Devin even opens up some Rust documentation to debug a compiler error. Finally, Devin finishes the task and reports a summary of the changes that were made. Let's see if the changes worked. I have Devin's code right here. Looks like it worked: the third command succeeded, and I can even see the status codes. Here's all the code that Devin wrote for this change. Thanks,
Devin.

We also have another really cool demo of Devin making the Game of Life, by someone else who works at Cognition.

Hi, I'm Adan, and today I felt like playing the Game of Life, so I asked Devin to implement it for me. Devin started by creating a new React application using the shell, and then it started writing some code through its editor. After that, it deployed the app through Netlify. Let's check it out. That seems nice, but there are a lot more features which I want to add, so let's ask Devin to do this one at a time. I want the word "Devin" to be written on the initialization screen instead of it being random. Then I want the word to be slightly bigger and the frame rate to be faster. I also want him to fix a bug where the screen freezes after 3 seconds. Let's see the progress Devin has made so far. You can see the diff, and the last diff shows that Devin just fixed the bug where the screen gets frozen after 3 seconds. This seems reasonable to me, so let's move on. Next, I want Devin to increase the frame rate after 10 seconds and also to make the website responsive to different window sizes. I also want to make it interactive, so that when I click my mouse somewhere it should spawn a new block. Let's check out what Devin has made so far. It started with "Devin", which is what we asked for, and when I click something it creates a new block as well. That's fun. Let's play around with it. Well, there goes my evening.

There was also another really cool application where Devin was able to fine-tune its own models, and I found this one to be really fascinating, because when I read a recent research paper about LLMs being able to fine-tune their own models, they actually weren't able to. So this isn't an iteration on that, but I just find it funny how recent updates have shown that certain things can't be done and then, boom, out of nowhere two days later someone else makes it happen.

Hey guys, today I'm going to show you an AI training an AI. So here we're going to
take the QLoRA repo, which is a fine-tuning method for quantized large language models. We're going to feed this repo to our agent Devin, and all we have to ask Devin is to fine-tune a 7B Llama model. Devin clones the repo, figures out how to run it using the README, sets up all of the requirements using pip, looks through all the scripts, and is able to start running the training job. There are a few hiccups where Devin runs into some CUDA issues, which is to be expected with open-source repos, but it's not a problem. Devin looks at the NVIDIA environment and figures out how to reinstall the packages to make it work. After a few more runs to figure out the correct model names, Devin successfully gets the training run working. Here we see training proceeding smoothly. Loss is going down, and after a few steps it looks pretty good. I tell Devin to wait as the training job runs. After about an hour, I come back and ask Devin, "Hey, how's the training going?" Devin helps me look: a few hundred steps are done now and everything is still proceeding smoothly. Looks great. Thanks, Devin, for helping me set up my training run.

See, I think this is a rather fascinating thing, that an AI system is able to train its own models. I mean, the implications here are staggering. There are tons of different applications, but there is one more thing that I want to show you guys from the Cognition team, and then I want to get into some of the really cool details, like the inner workings behind Devin, to give you guys a quicker update.

Hey, I'm Andrew, an engineer at Cognition, and I wanted to share a pretty amazing experience I had with Devin. I maintain this big open-source repository which contains a lot of different algorithms used for competitive programming. A lot of people use it, and a few weeks ago my friend texted me that there was actually a bug in one of the implementations. The implementation wasn't quite right when the inputs weren't relatively prime. I kind of glossed over that
case when I was implementing it, so I never really thought about it. So I implemented a quick fix, and then I thought that I should test it, but I never really got around to writing any test cases. So I thought, if I don't want to do it, I should just ask Devin to do it instead. So I gave Devin the repository and asked Devin to just check it out and start working on it. Devin found the right repository, checked it out, and found all the files that are in the repository. Then I told Devin what test case I wanted: I just told Devin, these are the inputs, and then try checking for these conditions. Devin wrote the test without too much trouble. Devin just looked around to understand what exactly the test should look like and what exactly the interfaces were. With this, Devin ran the tests and ran into a quick hiccup, which was a compile error, but Devin is able to solve those very effectively and just added an extra include to fix that, and then was done writing this initial test. So then I asked Devin to expand the test a little bit. Instead of just testing this one input, I wanted Devin to test on all inputs, so kind of the brute-force testing strategy. I use this a lot in my tests, and I just wanted Devin to implement it so that I didn't have to worry about it. So Devin went and rewrote the test function to use four nested for loops, but this time, after Devin ran the tests, Devin actually found a test failure. If the code were correct, there could be an error in the test, but the tests seemed pretty reasonable, so there probably shouldn't be a failure. So Devin went and tried to debug the program for me. Devin here actually added a print statement to debug the outputs and the inputs to the failing test, reran the tests, and found which case was wrong. In this case, these are the inputs, and then the return value was actually negative 9, and the code I'm running
actually should never return negative values. Devin realized this and went looking in the code that we're trying to test, and actually added this line of code, that if extra is less than zero, extra plus-equals something, in order to make sure that the return value was actually non-negative. After fixing this, Devin reran the tests, and now I can be confident that my code is correct, and I have tests to prove it.

There was also something really fascinating that I did see, which was of course the funding. They said, "We are well funded, including a $21 million Series A led by Founders Fund," and that they're grateful for the support from the industry leaders that they do have. What's crazy also is that they said, "By solving reasoning, we can unlock new possibilities in a wide range of disciplines. Code is just the beginning." So it seems that whatever this company is building with Devin isn't just going to be about software. If they've been able to solve the issue of AI being able to reason effectively and having this agent autonomously do tasks like this on the internet, I think they can definitely gain a huge market share in the autonomous agent sector, which is going to be booming in 2024 and beyond. So I think that this company is very well poised to take a decent-sized market share, because there aren't really any other tools that we've seen demoed like this that perform that well.

Now, in addition, we can also see the SWE-bench benchmark, and they talked about how it achieved very good results, being the state-of-the-art model. The performance of Devin on the SWE-bench benchmark is especially impressive when you compare it to the previous state-of-the-art models. Achieving a 13.86% resolution rate for real-world GitHub issues in open-source projects is a pretty notable accomplishment considering the complexity and
variety of problems that can occur in such projects. What actually stands out with Devin is that it significantly outperforms the previous models, even when those models were given additional information such as the exact files to edit. This suggests that Devin has a more robust understanding of code and the context in which it operates, which allows it to autonomously navigate and fix issues within a codebase without explicit directions. Moreover, Devin's ability to perform unassisted is a key differentiator. Being evaluated on a random 25% subset of the dataset indicates that the AI's performance is not tailored to specific types of problems but is rather generally applicable, which is a desirable trait for an autonomous AI system meant for real-world application. What they also stated, which was rather fascinating too, was that they are going to have a technical report which will provide greater insight into the methods and technologies that enable Devin's advanced capabilities. It's likely going to be eagerly anticipated by the AI community and the software development community, and these results will be a signal of how increasingly valuable AI is going to become for the software engineering community.

Now, one thing that they also talk about is the secret technique. It states here: "Exactly how Cognition AI made this breakthrough, and in such a short time, is something of a mystery, at least to outsiders. Wu declines to say much about the technology's underpinnings other than that his team found unique ways to combine large language models such as OpenAI's GPT-4 with reinforcement learning techniques. 'It's obviously something that people in this space have thought about for a long time,' he says. 'It's very dependent on the models and the approach and getting things to align just right.'" It seems that they made significant strides in this particular area by combining some of the best techniques
that we know in AI. Now of course they're hinting at a proprietary blend of technologies or methodologies that they've pioneered, which could be the core of their breakthrough. The specific details of how these technologies are integrated and leveraged to achieve such breakthroughs are kept under wraps, adding an element of mystery and, of course, protecting trade secrets. Reinforcement learning is a powerful method in AI where algorithms learn to make decisions by receiving rewards or penalties for the actions they take, rather like training a pet. When combined with large language models like GPT-4, which already have a strong understanding of human language, the potential is there to create an AI that can improve itself through iterative processes, potentially at a rate and efficiency that is unprecedented. The approach of aligning the models and getting things to align just right suggests a delicate balance and a fine-tuning process that could have taken substantial time and experimentation to perfect. It's quite a tantalizing peek into the kind of advanced AI development happening within Cognition AI, possibly heralding a new era of software engineering tools that could revolutionize the industry.

Now, Andrej Karpathy actually does talk about this, and his tweet outlines an intriguing parallel between the evolution of autonomous driving and the automation of software engineering. It's a compelling analogy that tracks the incremental steps of AI's increasing involvement and sophistication in task completion. His progression for software engineering automation with AI begins with basic assistance and moves towards more complex and integrated functions. This trajectory indicates a future where AI handles more of the day-to-day coding tasks, enabling developers to focus on higher-level design and problem solving. Devin represents a leap in this evolution, coordinating multiple development tools and acting with more autonomy. This actually
does suggest that human oversight will shift to a higher abstraction level, and that is particularly interesting. It implies that future software engineers may operate more like managers or architects, guiding the AI's strategy rather than writing every line of code. Karpathy also touches on an essential aspect of AI integration into software engineering, which is user interface design. The interaction between humans and AI must be seamless and intuitive, allowing developers to efficiently guide and correct the AI. It's not just about making the AI smarter; it's about designing environments where AI and humans work together effectively. His closing remarks also underscore the significant changes ahead for the field of software engineering. As tools like Devin become more capable, the role of a software engineer will likely transform, emphasizing supervision and high-level conceptual work over traditional coding. It's an exciting prospect, and one that carries a promise of increased productivity and the potential to tackle more complex problems than ever before.

Overall, what we've seen here today was the very first in possibly a long line of autonomous AI agent software developers. This is the first and very likely won't be the last. We know that many other companies are working on this, and they've all had very similar breakthroughs. It will be interesting to see what other companies come to market with their products now that Cognition has come out with Devin. It seems like they are leading the race in terms of what we are expecting when we look at autonomous AI agents, especially in the coding space. So with that being said, what do you think the future of this is? Are you excited, or are you someone who is a little bit more on the pessimistic side of what's to come? Either way, your thoughts and opinions are appreciated, and if you did find value in the video, do not forget to leave a comment down below and subscribe for future updates.
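
Andrew's demo in the captions describes a brute-force testing strategy: loop over every small input with four nested for loops, compare against an obviously correct reference, and make sure the result is never negative when the inputs are not relatively prime. The captions never name the algorithm or show the code, so the following is only a minimal sketch of that idea under stated assumptions. It assumes, purely for illustration, a two-congruence Chinese remainder routine; `ext_gcd`, `crt`, `brute_force`, and `check_all` are hypothetical names, not taken from the repository in the video.

```python
def ext_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y


def crt(a1, m1, a2, m2):
    """Hypothetical routine under test (an assumption, not the video's code).

    Solves x = a1 (mod m1) and x = a2 (mod m2) for positive m1, m2, which
    need not be relatively prime. Returns the smallest non-negative
    solution, or None if no solution exists.
    """
    g, p, _ = ext_gcd(m1, m2)
    if (a2 - a1) % g != 0:
        return None  # the two congruences contradict each other
    lcm = m1 // g * m2
    # Write x = a1 + m1*k and solve for k modulo m2 // g.
    k = ((a2 - a1) // g * p) % (m2 // g)
    # Python's % is already non-negative for a positive modulus. In a
    # language with a truncating remainder (for example C++), this is the
    # place where an explicit "if x < 0: x += lcm" would be needed.
    return (a1 + m1 * k) % lcm


def brute_force(a1, m1, a2, m2):
    """Slow reference answer: try every candidate in turn."""
    for x in range(m1 * m2):
        if x % m1 == a1 and x % m2 == a2:
            return x
    return None


def check_all(limit):
    """Compare crt against brute_force on every small input."""
    checked = 0
    for m1 in range(1, limit + 1):
        for m2 in range(1, limit + 1):
            for a1 in range(m1):
                for a2 in range(m2):
                    got = crt(a1, m1, a2, m2)
                    want = brute_force(a1, m1, a2, m2)
                    assert got == want, (a1, m1, a2, m2, got, want)
                    checked += 1
    return checked
```

Calling `check_all(8)` runs every combination of moduli up to 8, including the pairs that are not relatively prime, and raises an assertion naming the failing inputs if any result disagrees with the reference.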
Info
Channel: TheAIGRID
Views: 164,039
Id: L8C7_X2PD-Q
Length: 22min 4sec (1324 seconds)
Published: Tue Mar 12 2024