AutoGPT: A Real First Test

Captions
AutoGPT: depending on who you ask, it is either going to be the downfall of civilization or our savior. You'd figure that if this is actually the dawning of the AGI era, it will be able to do something useful, like help me shop for a bike. That's right, I'm going to give AutoGPT its first test run. Clean install, I haven't touched it yet, and here's what I want to do.

I thought to myself, what would be a great use case for something that can access the internet and has the ability to problem-solve and gather more information? This actually fits the bill. I'm not trying to turn a hundred dollars into a thousand. I'm not trying to destroy the world. What's a use case that somewhat matters to me and is actually doable by one of these early recursing LLM monstrosities?

So here's my game plan. I'm going to install a fresh AutoGPT and ask it to help me find the best electric bicycle for me. I'm going to say I live in Wisconsin and I have a five-mile commute to work. I have a budget of eight hundred dollars, I'm open to getting a used bike, and it has to be available for purchase now, none of these concept Kickstarter things. If there's a space for it, I'll probably put in factors to consider and suggested items to look for, such as total price, a specification comparison on range and speed, a feature comparison (how many speeds does it have, is there a manual throttle), and then things to judge how legitimate the seller is and the reviews of the bike: how good is it?

As for the output, what I always try to create when I'm buying something is a matrix or a table of features side by side. So I'm going to ask AutoGPT to give me a table of the top bike options with comparison columns and a link to where I can actually purchase. I think this will be an excellent demonstration of its capabilities if it can pull this off and be a power comparison shopper. You need to buy a dishwasher, you want to get the best webcam: this is something I do all the time. I'm a power shopper, and this would be huge to gather the
information so that I can make the decision.

So here I am on the GitHub page for AutoGPT. It is saying to download the stable release, but let's see if we can get some step-by-step instructions. Looks like we may be using Pinecone for a vector database, and here's how to configure the API keys. Ah, excellent: installation. Step one, make sure you have all the requirements listed above. VS Code plus Dev Container: VS Code, check. Dev Containers, check. Docker, check. Python, check. OpenAI API key, check. We'll see if we need Pinecone.

Back to the installation. Clone the repository, and restart, and accept, and run this command. Docker, check. Run this command, change the directory, install dependencies. Find the .env.template, make a copy of the template, open the .env file, and find where to put in the OpenAI API key. Okay, let's run this bad boy. And an error. Run PowerShell as an administrator and run the pip command again, and that looks better. So let's run the magic command.

"Enter the name of your AI and its role below." This is ShoppingGPT, and your role is "a bot that autonomously does research to find the best electric bicycle for sale to meet my needs." Goal one: find the best electric bicycles for sale that are a good fit for a five-mile commute to work in all weather conditions and that cost less than eight hundred dollars. When calculating cost, consider any tax or shipping to Wisconsin. Compare specifications such as speed, range, number of gears, and manual throttle. Evaluate reviews and ratings of each bike. Evaluate the reputation of the seller based on buyer reviews and other sources.

And it's off. I thought I was going to be able to put in goal number six, which would have requested the format for the output table, but this thing is off to the races. I'm kind of curious what amount of feedback it's going to give as it goes along. Oh, there are the thoughts: "I will start by doing a Google search." "I need to ensure that I am considering all relevant factors." Enter y to authorize the command. Ah, I probably
should put it in manual mode, but this will be cool to see how it goes along. I can do y -5 to run the next five commands. Let's see what it does. Authorized. [Music]

It's happening. Oh, Tom's Guide is in there, bicycleguider.com. This is where I'd be doing a lot of searching, because you can get a lot of nice electric bikes but you don't want to overpay, and many of them are expensive. An electric bike would be cool, but I don't want to spend, you know, more than a thousand dollars on one.

Oh, what is this? It's actually firing up browser windows. I'm going to close that out for it, maybe give it a hand. Let's pop this over to the side. It looks like it's going to be using... whoa, "AutoGPT analyzing page." Okay, it's being controlled by automated test software. Okay, that's pretty cool. I did not know what to expect when it did this. I thought it would be a headless kind of access of Chrome, but I just closed out the window. One summary to the memory, so it's gathering information, and it's got five chunks on it. I'm really curious what this is going to come out like. I'm also curious what 6.8 megabytes of data it just downloaded.

Thoughts: "I will evaluate the specifications of each electric bike listed in the article." So it just grabbed one. I wonder if it just grabbed an article for best bikes for commuting. "I will evaluate the reviews and ratings of each bike." "Ensure that I am considering all relevant factors when evaluating." "I'll write the names of the electric bicycles that meet the specified criteria in a file." Color me impressed. This is kind of cool. I'm curious what the end result is, but I'm wondering how much prompt engineering has to go into this. Now that I know how, let me send this on its way: yes, you're clear for at least five more.

I did see that this is hitting the OpenAI API back end, but I believe it should be using GPT-3 or 3.5, which is pretty cheap to run. I didn't know quite what to expect. Now that I know that I only have the five commands for it
speech mode, that was cool. I did see where it could take ElevenLabs and actually have a speech mode. I might play with that later.

Ah, it looks like it's pulling more information. This is really interesting. It's cooking for a while, if you will, so maybe in future experiments I'm going to just say yes for, you know, 30 commands and walk away from it. Of course, the first few times it's pretty exciting. Then again, I'm the guy who, when I got my first front-loading washing machine, sat there on the floor and watched the whole cycle go through. That's what I get for being an engineer.

What else is it doing? I like the thought, reasoning, plan, and criticism. Criticism: "I need to ensure that I'm considering all relevant factors." So it's browsing the website. I don't see... did I lose the Chrome window that it popped open, or is it doing it somewhere else?

It looks like I didn't necessarily need to use Docker, since I was running it straight up locally. It didn't look like I had to set up Pinecone or any vector database; it's doing some sort of storage locally. And here it goes again. Is that the same site? I'm going to close that out for it, just because. Maybe I should do that to really give it a run for the money, but I don't think I'm interfering with it. It is scraping the sites, exactly the sort of resources that I'd be looking at. Look at those guys. Oh, I wonder if it's pulling the contents... oh, it is pulling the contents without clearing that notification. That is interesting.

Perhaps I'll rerun this, knowing that I only have the five directives, if it doesn't come out with a really cool result. I'm really glad that I sat down to experiment with this. It was not that challenging to execute. For your benefit, I was going to just slog through whatever requirements there were, but it was only a handful of commands and one restart. Actually, that was for Docker, and I don't think I need Docker. So this is really interesting. I'm going to keep this running for a bit, keep giving it approvals,
and let's see what it comes out with in the end.

Tip of the hat to whoever created this, and I'll look back up who that is: Significant Gravitas. AutoGPT, AutoGPT plugins, AutoGPT benchmarks. Interesting. Ah, a performance benchmark that compares the performance of AutoGPTs. AutoGPT plugin template: so it looks like they're recreating the ChatGPT plug-in concept with it. I look forward to potentially playing with that; this may have much more flexibility and access. I did actually get access to the plugins feature for it. So I'm going to let this cook, and we'll see at the end what all it comes out with. [Music]

Now I will jump in here to say I think that it has gotten the information and it's looking to start evaluating it, and there seem to be some errors: "As a large language model, I can't evaluate reviews and ratings." So let's see if it can pull the information that it has accumulated through its research on the websites and get it into a format that the LLM will be able to process. It is interesting how much time and processing it's taking.

My takeaway on this is that it doesn't seem to be slowed down in browsing by these annoying requests for notifications or pop-ups trying to harvest your email. I wonder, in its Google results, if it's skipping blue links and getting to the real meat of products.

This is interesting: it looks like it's evaluating code, potentially to do some calculations on the metrics. Here's an interesting thing: not only has it identified tasks and spun off sub-agents, but it is actually naming them, like "the electric bike evaluator." We're getting close to the end, because I'm seeing the actual final recommendations, and I think it's just gathering some additional info for presentation. The more I see things going through, the more I wonder if I'm going to end up actually buying one of these options.

Okay, here's the moment of truth. It is getting late, so what I'm going to do is let this thing rip. I've confirmed with the API that it's not hitting it too hard.
I'll let you know what it is at the end, but I'm going to let this run up to 200 additional commands, and I am going to go to bed.

Quick update: we are hours in here, six hours later, and it is doing repetitive commands, and "the object is invalid" on a lot of these. So I don't know what's going on. I'm going to let it cook a bit longer and come back to it.

Okay, so what happened here? I left this running for six hours, and it was going along fine, maybe looking like it was doing some repetitive work, but it was still gaining information. And then it got into this death loop. It was trying to do a Google search and it hit a "JSON object is invalid" error, but it just kept doing the exact same query that it was trying to do, and this went on for a long time. How long did it go on? Well, when I walked in on it six or seven hours later, it really had been trucking along. So it got into kind of a loop.

So is that it? I have no output; I had to Ctrl+C to break out of the program. So was it all a waste? Not necessarily. Here we have the AutoGPT workspace folder, which is where the text files are kept, kind of the working memory for AutoGPT. If we look under here, we have the Swagtron reviews, and that one, Ancheer, that's a specific bike that we'll see here. Here's the evaluation, review URLs, electric bikes text. If we look in here, multiple models are being held, so results are coming back and they're being crunched together so they can fit within one prompt for future processing. Models being queued up, that we see here. So this is the working memory for it. Here are some reviews from websites that it pulled, because I did request those. Here are the review URLs that it was going down.

But the real money is here: the evaluation. This is really the output that it was working on. I think I saw it getting close to this before I walked away, coming up with this verbiage: "After evaluating the specifications of each electric bicycle, the best fit for the user based on provided criteria would be," drum roll, the Ancheer Power Plus electric mountain bike. This guy doesn't look bad. It's got disc brakes. Ah, we'll get into the specifications, or I don't need to dig into them; it says right here, "the bike has a top speed of 20 miles an hour and 50 miles of range." That's impressive, "more than enough to cover a five-mile commute to work." I would agree. Wow. So yeah, that sounds like a great answer from the specifications. We have different options, different prices, and reasons why they were promising and why they may have gotten edged out by the final answer. So this is what AutoGPT was trying to build up to; it just got a little off the rails.

So how do I sum up this experience? Well, the setup was super easy. I was surprised; I was expecting things to be a lot more painful. I think some of the requirements, like Docker, were not necessary. A great thing about AutoGPT is that a majority of it runs locally, other than the accessing of the LLM. For example, the web searches and the like are all done on your desktop with a browser, essentially automating what you would be doing. Yes, you have to utilize the powerful LLMs, GPT-3.5, that are in the cloud through the API, but that's kind of a given right now. If you had something else, you could plug it right in; I know it has the flexibility to do so. So I'm really impressed with how local and accessible this is, and free, other than any API usage.

It was also pretty cool that there is no database necessary. I was thinking that I might need, say, Pinecone as a vector database, something to serve as a memory, but out of the box, with no setup on my part, this was utilizing local text files for storage. That was super impressive. So overall, the setup was as painless as you could expect, outside of, for example, the single-click GUI install with GPT4All-J from my previous video, which was impressively packaged up to make it available for anyone. All in all, the installation was quite painless and step by step, so full stars for that one.

What were my first impressions on this?
Well, the system was running a lot more steps and running a lot slower than I expected; that's what struck me. I thought this was going to be flying through loops and maybe churn through 25 times. I thought it was a pretty contained prompt.

I was impressed with the browser usage, which was run by DevTools locally. You saw how it would fire up the web page and just navigate it, grab the content, and chunk it together. I was blown away and surprised by how, at one point, it ran code to calculate the comparison criteria. It actually generated a script and executed it to get a comparison matrix, I believe.

I was also struck by how it was thinking things through. I was really interested in this thoughts, reasoning, plan, criticism prompt structure that the system utilizes, especially the criticism. For example: "I need to make sure that I'm evaluating the reputation of the seller based on reliable sources." That is a solid check-yourself kind of intelligence that's baked into this, and I assume that's part of the key to success.

So, a really critical question: what was the API usage? This was quite striking. Bottom line, it hit the API a lot, and I do not think those failed loops at the end counted. It hit the API 2,500 times in this one run. Fortunately, I did have a purchase cap on that, thank you very much. It cost a total of $6.67. I'm glad that wasn't a penny lower; that would have been ominous. ChaosGPT. The reason it was held to a reasonable cost, despite getting absolutely hammered on the API, was that it utilizes GPT-3.5 Turbo as well as text-embedding-ada-002, so using those lighter-weight models kept the cost down.

So what were the results? Did it meet the criteria? Did AutoGPT understand the assignment? Well, I've got to give it a yes: it found a bike that met my criteria. Or did it? Let's double-check this and give it a sanity check, something that I don't think a lot of folks do when they generate these impressive early tool sets: follow up and see what the
quality of the output is. It's showing that this runs a 250-watt hub motor, but what about the speed and the range that we were promised? The estimated range is somewhere between 8 and 18 miles, which is far different from the 50 miles that AutoGPT had found in its travels. And how about that promised 20-mile-an-hour top speed? Well, it looks like it is actually more like 15 miles an hour. So did it meet the criteria? I'll call it a yes, yet again. It did find a bike that met my criteria on price range, and I didn't really specify the speed, but the feature set is great. I think this is an excellent candidate.

Was it worth it to run AutoGPT on this mission, to try and be a power shopper for me and do real comparison work? Well, the time to set it up wasn't all that bad, actually. Now what about hard cashy monies, those API costs? Well, it was less than seven dollars to run it, so it definitely wasn't a total deal breaker. But was it worth it? I was really surprised by the amount that it hit the API for this simple kind of query. To be sensible about it: what is your time worth? If you could send AutoGPT off for however long it needs, 12 hours, to run in the background and get you useful, actionable information that's well considered, and it costs you a few bucks, it may well be worth it.

To help answer this question of whether it's worth it, let's do a head-to-head challenge. Even if my time was worth fifty dollars an hour, if I could get the kind of information I did in, say, seven minutes, it would be a more worthwhile expenditure of my time to just do the research instead of outsourcing it to AutoGPT. So let's put seven minutes on the clock, and I'll see what kind of results I can find for an eight-hundred-dollar electric bike that meets my criteria. Okay, so let's start the timer. [Music] And time.

All right, that really drove it home for me. In seven minutes I came up with a great number of options. Look how cool the Hover-1 is. I would still have to look at reviews, but there is
just a massive selection that I came up with quickly. And unfortunately, one of the easiest ways to do it is to head on over to Amazon, even if you don't buy from there, and find out what is good. The bottom line is I found a lot of great options. I didn't filter it down to the one best one, but I am excited by a lot more of these options, potentially, than the one that AutoGPT was able to find me for six dollars and six-plus hours of searching. So maybe for the next experiment I will take all of these very quickly gathered shortlist items, put them into AutoGPT, and say, "Hey, of these options, go search, compare, give me a table, and find out what the best answer is based on reviews and how well it fits."

So now that I've had my first experience, how can I do better? As always, look first to your prompt. I felt like the system was searching in a rather narrow and repetitive way, so maybe I can phrase it to look through a broader selection of sources. Potentially I could prioritize more what I am interested in. Maybe if I had said, "Hey, get me an e-bike for commuting five miles," it would have done more to find different options. Were my criteria too restrictive? Did I put too many requirements on it, which led to some of the churn and computation time? I didn't set up any vector database; maybe that would help the system. There's definitely room in the system to reduce repetition and handle those recurring cycles better, but this is a young system. It will get there.

And I've seen it mentioned online, and I really felt the need for it: the results were almost there, and we need a "wrap it up, buddy" button. When it's churning away, you need some sort of cut-in command to be like, "Hey, stop where you're at and give me your results."

So what are my takeaways from this experience? One of the killer features of AutoGPT is that, yes, it's open: you can modify it, you can contribute. But beyond that, you are in the loop. You're a critical part of priming its directive. You can tweak the behavior
of it, you can set it up with different modules. You have so much input and guidance in its capabilities.

One thing that really surprised me as it was running is that there were errors that came up from all sorts of different things: making web calls, whatever it was doing. You would see that it would try its pattern of thought and then run into an error, but the impressive part was that it didn't stop. On the negative side, as I discussed before, the system currently can really get stuck in a loop, and there's definite growth that can happen in being able to detect and break out of those. It's a very young project that just came on the scene, and already there's huge investment by people contributing to it. It's capable now, and it will only get better.

So, the bottom line: I recommend trying your own task. See how AutoGPT works for what you are interested in. Try it for something that could automate and bring actual value to you, and see if it's up for it. There's a fairly low overhead to getting into it, it runs primarily locally for a lot of the important pieces, you have heavy control over it, and it's easy to experiment with. That's an important part of this age of AI: it's about getting hands-on with the tools to make sure that you stay pertinent moving forward.

Besides, here's a little footnote: I actually found an Ancheer e-bike, a mountain bike, for sale used locally, and I bought it. So when you ask, "Does AutoGPT work?" I have to say it does.
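
The "death loop" described in the video, where the exact same search kept being issued for hours, is the kind of thing a small guard around the command loop might catch. What follows is only a minimal sketch of that idea, not how AutoGPT itself works: the `RepeatGuard` class, its `should_stop` method, and the sample command history are all hypothetical names and data made up for illustration.

```python
from collections import deque


class RepeatGuard:
    """Flags when the same command has been issued several times in a row.

    Hypothetical helper for illustration only; it is not part of AutoGPT.
    """

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        # Keep only the most recent `max_repeats` (command, arguments) pairs.
        self._recent = deque(maxlen=max_repeats)

    def should_stop(self, command: str, arguments: str) -> bool:
        """Record one command; return True once the last `max_repeats` are identical."""
        self._recent.append((command, arguments))
        window_full = len(self._recent) == self.max_repeats
        return window_full and len(set(self._recent)) == 1


# Made-up history that ends in the same query being retried over and over.
guard = RepeatGuard(max_repeats=3)
history = [
    ("google", "best electric bikes under $800"),
    ("browse_website", "https://example.com/reviews"),
    ("google", "electric bike reviews"),
    ("google", "electric bike reviews"),
    ("google", "electric bike reviews"),
]
for step, (command, arguments) in enumerate(history, start=1):
    if guard.should_stop(command, arguments):
        print(f"Step {step}: same command repeated {guard.max_repeats} times, stopping.")
        break
```

A check like this could also be the hook for the "wrap it up, buddy" idea: instead of just stopping, the loop could ask for a final summary of whatever is already in the workspace files.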
Info
Channel: Context Found
Views: 17,944
Keywords: LLM
Id: PyO7IJ1yGjk
Length: 26min 22sec (1582 seconds)
Published: Thu Apr 20 2023