Will Devin AI Take Your Job?

Captions
If you've been on Twitter or YouTube over the last week, you've definitely heard of Devin, the brand new AI tool that supposedly acts and works just like a software engineer. A lot of people are worried that this is going to be the thing that takes over your job as a software engineer, and there are a lot of really impressive claims that Devin is making. But how true are they actually, and how impressive is this AI tool? I've gone through, I've done the research, read the papers, and looked at all the different claims that they're making, and I really think Devin is not nearly as impressive or scary as people are making it out to be. In this video I want to talk about what Devin is, what it actually can accomplish, and some of the things that it really cannot do.

Welcome back to Web Dev Simplified. My name is Kyle, and my job is to simplify the web for you so you can start building your dream project sooner. Today we're going to be talking about Cognition Labs' newest AI, which is Devin. This is pretty much a brand new company that really hasn't released anything at all before releasing this Devin AI. They put out a blog article, which I'm going to link in the description of this video, and this blog article goes through quite a few different things about Devin: what it's capable of and what it can do. It's really showcasing all of the best-case scenarios for Devin, because they want this to look as good as possible. That's because most of the time what these AI companies are trying to accomplish is actually getting tons and tons of funding. If we scroll to the top of this page, you can see that they've already raised $21 million in funding, pretty much immediately from announcing this. So really the goal of these types of blog articles and all this information is to drum up as much hype as possible to get as much funding as possible into these particular AIs, so they want it to look as good as
possible on paper.

Now there are a few different things I want to talk about in this video that are specifically the things people are most scared of. If we scroll down to Devin's capabilities, there's a bunch of different videos that we can go through that talk about the different things Devin can do, and I want to focus on some of the main ones and why they're maybe not as scary as you think. The first one here is that Devin can learn how to use unfamiliar technology. This one is scary to a lot of people because the AI essentially can teach itself using existing blog articles, videos, documentation, and so on, which sounds really scary, but honestly, when we deep dive into this, it's not that bad. Another thing that we want to talk about is how it can actually find and fix bugs for you autonomously, which is very misleading compared to what they actually do in the video. Again, I'll dive deeper into why this is not nearly as scary as they make it out to be, especially based on the video that they show you. And then finally, if we go down a little bit further, we can see that Devin is actually able to accomplish real-world jobs on Upwork, which is again something that's really scary for people because it's essentially replacing jobs that people could do. But again, this may not be as scary as you think it is.

Now if we scroll all the way down to the bottom here, you may see this chart. This is probably something you've seen if you've heard people talk about Devin, and essentially it's saying that Devin is able to accomplish 13.86% of GitHub issues. That's how a lot of people present it, but essentially it's just using this SWE-bench, which is essentially a paper, a benchmark for testing AI against GitHub issues. If we go to the actual site for this, you'll notice that this is actually much less of a scary thing than people think. They may think that, okay, it can solve essentially, what is it, 13.8% of all GitHub issues, but really what this does is it takes just 12 GitHub repositories.
If we scroll all the way down here, you can see it's 12 popular Python repositories, and it's only pulling 2,300 different pull request issues. The way that this works is it takes 2,300 issues and the associated pull request that was generated for each issue. Each of these pull requests has unit tests that were written for it, and in order to be considered passing, all the AI model has to do is write code that passes the unit tests that were written to go along with that PR. It doesn't actually mean that the code is 100% correct or that it does things exactly like it's supposed to. It just has to pass those unit tests, which is generally a good indication that the code is most likely correct.

Now if we look at an example of one of the issues that is used inside of this data set, you'll see that this is an issue for some Python library where new lines were being added in the wrong places. You'll notice something really important about this: the issue is very well documented. You can see here is exactly what I searched for, here is exactly what's happening. You can see the expected behavior, the observed behavior, how to reproduce this, and all the different stuff with versioning and configuration files. I mean, this is an incredibly well-written issue, much better than the issues in 99% of GitHub repositories out there, and this is actually a recurring theme between pretty much all these different GitHub issues that are tested: they have very good documentation on the issue side of things.

Now if we look at the pull request that was submitted by an actual user (this is not generated by AI), you'll notice that the number of files changed was 10. It's not a huge amount of data that was changed. If we go all the way down to the tests, you can see that this person wrote a few different test cases inside of here. If we look at a few of these different tests, you can see there are just a couple of tests that are being written and modified, so this is
essentially what the data is being tested on: these two or three different test cases that were added or modified. So really, as long as the AI model is able to write some code that passes these tests, that's the only thing that's being checked. In general that's a pretty good indicator that they were able to solve the problem, and it's still impressive that they're able to solve essentially 13% of these different problems.

But another thing to worry about here is, if we scroll down, you'll notice that Devin was evaluated on a random 25% subset of the data. I'm not sure why they decided to go with only 25% of the data instead of doing 100% of the data. It makes me a little bit concerned, because there's no way for us to actually test with Devin right now since it's a closed-off system that's currently not open to the public. It's a little bit scary for me to think maybe they randomly chose 25% until they got a 25% that gave them this good number for their announcement to try to raise money. They could have just continually tested a random 25% until they landed on a random 25% that gave them the best possible number, because obviously some issues are going to be easier than others to solve. So it's a little bit strange they didn't do it with 100%. I don't know if there were certain resource constraints, or if there was a different data set they used, or what it was, but it would be much more comforting to see that they did this on 100% of the data instead of only 25% of it, especially because, like I said, there are only 2,300 issues, so doing 25% versus 100% is not that big of a difference. So if you see this type of chart being thrown around where it's like "they can solve 14% of all issues on GitHub," that's very misleading. It's 14% of issues in a very small subset across a very few select repositories that have very good documentation and very good issue support.

Now, the other things I talked about. One thing is that this AI can
learn for itself. This is the video where they mention specifically that the AI can actually learn from blog articles and resources out there. In this particular video, this person is pasting in a link to a blog article and telling Devin, hey, this blog article says that it can do X, Y, and Z, and it even mentions in the blog article a script that you can use to do this. That's what they tell Devin, and they say, hey, can you set this up and generate images for me with these specific criteria? If we go over to that blog article, at the very bottom you'll see that it has this try-it-yourself section, and it even has a link to a GitHub repository with that script. If we open that up, you can see right here is the GitHub repository with all the information you need to be able to set this up. It even tells you the exact code you need to use, and obviously it has the script files and everything. So essentially all the code to do this is already written; it's just giving you instructions on how to get set up with that. Devin's not really writing too much custom code. It's mostly following the instructions that are set up in this blog article and in this GitHub repository, and it's able to generate these things based on the code that's already been written by other people.

I noticed something really specific about the prompt they give it. In the prompt they specifically say, here's the blog article, and they mention that there's a script in the blog article that is supposed to be used to generate those things. So they're specifically telling this AI, hey, look for this script inside this blog article. It may have ignored everything else in the blog article, gone straight to the script, and looked at this actual GitHub repository with all the information and code to be able to do what it needs to do. So it says that it can teach itself based on these different things, and sure, there may be some degree of that to it, but the fact that they had to
specifically prompt it, telling it where the script was, telling it where the blog article was, and having that blog article pretty much already have all the code inside of it, makes me a little bit leery of saying that it can really learn for itself in all situations. It seems rather limited in its capabilities in this regard, at least based on this particular video example.

Now the next one that I think is kind of scary for a lot of people is that this AI is able to find and fix bugs in your specific code. If you go through and watch this video, you'll realize that it's really not finding and fixing these bugs for you. Essentially what happens is this guy wrote some particular code to do something inside of his repository, but he didn't want to write any test cases for that code, so there are no tests at all for this code. He comes to Devin and he says, hey Devin, I would like you to write a test for this code. He's specifically asking in the prompt, I would like you to write tests for this particular code, and it's going to write out that test case. What happens is that he goes back and forth a couple of times with Devin, asking it to write more and more tests based on more specific things, and finally Devin writes a test that actually fails. Devin isn't necessarily finding this bug per se. He's telling it to write tests, then Devin is going ahead and writing out these tests, and in the process of writing out the tests that this developer specifically told Devin to write, it is then finding that these tests do not pass.

Now the cool thing about Devin in this regard, which I will give it credit for, is that when it finds this bug in the code, essentially when it says, hey, this test does not pass, it actually goes through and finds where that bug is in the code to make the test pass, and it's able to solve the bug, which is essentially one line of code that needs to be added. As you can see right here on
line 36, Devin adds this one single line of code, and that essentially fixes the bug. So Devin is able to go through, it's writing these tests, and it's finding the bug. It's really cool, but as you can see, it's not just looking at a code repository and saying, hey, I found the bug for you. Instead it's a very step-by-step process of, hey, write these tests for me; this test failed, so obviously there must be a bug. It's a very cherry-picked example, and they're really blowing it out of proportion a little bit with the language they're using. It's not necessarily finding bugs in your code. It's just writing these tests and, through that process, happens to stumble upon the bug.

Now the last one I want to talk about is honestly the one that is probably the most impressive, and that is that Devin is able to accomplish work on Upwork. If we look at this particular Upwork task, obviously they very much cherry-picked it. They chose the one thing that is obviously going to work for them. There are probably hundreds of Upwork examples that do not work for them, but this one is very simple, because essentially all that this person is asking is, hey, all I want you to do is take this model that already exists, an AI model specifically, and implement it and use it for me. In this video the person goes through and tells Devin, hey, here's this model that I want you to implement and start using, and it goes through and implements that model, starts using the information from it, and ends up generating some results.

Now one important thing to note about pretty much everything that Devin is doing is that it's not particularly fast. This Upwork example, I think, took about two, maybe three hours to actually accomplish, and a lot of these other things are taking an hour or two hours to run through and generate this code. So it's not like ChatGPT or Copilot or something like that, where it's really quickly
giving you responses. This is a relatively slow process, and it might be very iterative, where you're working directly with it, trying to help prompt it along. That's another reason why I think that you shouldn't really be super worried. While it can do these really cool things where it generates some code based on different GitHub repositories or articles, which is really cool to see, it's something that still requires technical knowledge in order to use. If I were to give my wife this tool and tell her, hey, you can use this to solve Upwork problems or something like that, she would maybe be able to solve some really simple things, but as soon as Devin ran into a snag or didn't really know what to do, she would obviously be completely underwater and not know where to go, because she doesn't have that technical background. So you still need those problem-solving and technical skills in order to actually use a tool like this.

And I keep using the word tool because really this is a tool. This is something that software engineers and developers are going to be able to use to speed up their coding workflow, maybe make certain things easier for them, and maybe make some tedious tasks not be something that you need to do manually. Things like AI autocomplete, ChatGPT, and Copilot have made doing certain things in coding a lot easier. They haven't replaced your job; they've just modified how you work and made certain things easier. I think Devin is just another example of a tool that's going to make working in programming a little bit easier. It's going to clean up certain things for you and make certain learnings a little bit easier. But the actual knowledge of being a developer, where you need to think about how to solve real-world problems, develop custom solutions to complex problems, and just be a problem solver, that is something that AI is really not capable of replacing currently, and something I don't think it'll be able to replace in the future. These
tools are really cool and they have a lot of potential, but really their potential is to empower you as a developer and not to replace you.

Now don't get me wrong, I think these tools are really impressive and really cool, but if you're worried about Devin replacing your job, you really don't have to worry about it. Knowing how to think like a programmer is the core skill you have as a developer. Being able to write out code for certain things is not your core skill; it's your ability to problem solve and so on that these AI tools really struggle with and are probably never going to be able to replicate.

Now with that said, I really hope you enjoyed this video, and have a good day.
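
The worry voiced in the captions above about scoring only a random 25% of the benchmark can be illustrated with a small simulation. This is a rough sketch, not how SWE-bench or Devin was actually evaluated: the 10% resolve rate, the `outcomes` list, and the helper names `resolved_rate` and `subset_rates` are all made up for illustration. Only the roughly 2,300 issues and the 25% subset size come from the video.

```python
import random


def resolved_rate(outcomes):
    """Fraction of issues counted as resolved (True = the PR's unit tests passed)."""
    return sum(outcomes) / len(outcomes)


def subset_rates(outcomes, fraction=0.25, draws=1000, seed=0):
    """Score many random subsets of the benchmark and return each subset's rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    size = round(len(outcomes) * fraction)
    return [resolved_rate(rng.sample(outcomes, size)) for _ in range(draws)]


# Hypothetical benchmark: 2,300 issues, of which a made-up 10% are resolved.
TOTAL_ISSUES = 2300
outcomes = [i < 230 for i in range(TOTAL_ISSUES)]

rates = subset_rates(outcomes)
print(f"full set:       {resolved_rate(outcomes):.2%}")
print(f"subset average: {sum(rates) / len(rates):.2%}")
print(f"worst subset:   {min(rates):.2%}")
print(f"best subset:    {max(rates):.2%}")
```

Running it shows how far the best and worst 25% subsets drift from the full-set score, which is why a single subset result is harder to trust than a run over all of the issues.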
Info
Channel: Web Dev Simplified
Views: 242,026
Keywords: webdevsimplified, devin, devin ai, devin software ai, devin ai software, devin software, devin engineer, devin artificial intelligence
Id: eZJx65ATvs0
Length: 12min 35sec (755 seconds)
Published: Tue Mar 19 2024