GPT4o + CrewAI: Twice as fast? Half the Cost?

Captions
Hey guys, GPT-4o just dropped this week, so I wanted to put it to the test with CrewAI to see how it performs in the real world. To test this new model, we're going to run three different crews, and as we run them we're going to swap the old and new versions of ChatGPT in and out so we can see how they stack up against each other. Honestly, I was super impressed with the new model, but it didn't perform exactly how I thought it would, so let's dive into this experiment so you can see exactly what I'm talking about.

Like I mentioned, I've created three different crews that we're going to run with the new and old versions of ChatGPT. For each of these crews, I've updated them to use the latest version of CrewAI, so it's going to be the fastest and best version. I've also added a tool called AgentOps so we can track how much each of these crews costs and how fast it runs. To make things interesting, I've ordered these experiments starting from the simplest, where we're just doing normal LLM calls, and then we add in more complexity: next we use tools to access the internet, and then we also start working with embeddings, so we can see how these models perform with crews that get more and more complex.

It's also important to mention: if you want to try out any of these crews yourself, to run the experiment, see the results, or just use the source code for these updated versions of the crews, check out the link in the description below, because I've added the source code there for you guys, completely free to download. So go ahead and check that out right after this video. But enough of that, let's go ahead and
actually start diving into the experiment so we can see how GPT-4o stacks up against the old version. Let's dive in.

Oh, real quick, I just want to point out that I have a free Skool community for AI developers just like you. We have over a thousand members now, where people are talking about the projects they're working on and helping each other out, and we have free weekly coaching calls so I and other members in the group can get you unstuck with whatever problems you're running into. So if you want to meet a ton of other AI developers of all skill levels, click the link in the description below to join the free community. All right, let's get back to the video.

All right, let's dive into the first experiment, where we're going to be running the game builder crew. If you haven't seen this crew before, it takes as input a game you want to build, and then it uses a bunch of different agents to go off and code that game, usually in Python. Let me show you what it looks like in action. I've pulled up the game builder code over here, and this is the updated version, by the way, which means I'm using AgentOps and the latest version of ChatGPT. The way I'm using AgentOps is, you can see, I'm importing it up here and just initializing it, and this is what's going to allow us to track how expensive this run was and how fast it ran. Additionally, to make sure we're using GPT-4o, I updated our agents file, and what you can see right here is that we're using the 4o model. Anytime I change it back and forth, we're updating all of the different agents in our crew to make sure we're using the latest version, so that's how we're hopping back and forth for this experiment. Final thing I wanted to point out: we are using the latest version of CrewAI, like I said, version 0.30.11, and this is how we're adding in AgentOps.

Okay, enough code. Let's look at what this crew is able to do. You'll notice I have an outputs folder, and this is where we put all the different results so you can play with them yourselves. These are the two different outputs for the snake game. You can see they're basically identical, around 160 to 180 lines, so very, very similar. I want to show you me running them. I just changed my directory to that folder, and now I'm just going to run python and the name of the file. When I run it, it takes me over to the game it just built, which is crazy, and now I can actually start playing and controlling it. So you can see, well, yeah, there you go, I don't suck too bad. Once we die, we get a high score and it starts over. So that's the game it built for us.

But now let's dive into the comparison part. How are we going to compare? Once again, we're going to use AgentOps. It's a completely free tool to use, and I definitely recommend trying it out. This walkthrough will show you exactly how to set it up in your own projects; just click "Get started for free". Once you've installed AgentOps in your crews like I showed earlier, here's what you get to see. This is the session drill-down. Over here on the left-hand side I have the GPT-4o version pulled up, and you know it's 4o because if you scroll down to the bottom you can see my model here, and over here on the right you can see that I'm using GPT-4 Turbo.

So let's start doing some comparison so we can see how the crews stacked up against each other. All in all, you can see they had a similar number of events: this one had eight and this one had seven. Now let's look at the important part: is GPT-4o faster than the other one? Right now, at
least for this example, it wasn't. I've run this a bunch of times, and it just so happens that this time the difference was zero. Sometimes I see a 10 to 20% speed improvement, but for whatever reason this time I didn't. The important part, though, is how much these LLMs cost. 4o is definitely cheaper than the other one: it's about 20% cheaper to run 4o compared to Turbo. Most of the cost for these crews comes down to the number of tokens we process. In this case we processed close to 14,000 tokens over here and a little over 20,000 tokens over here. And in case you haven't heard of prompt tokens and completion tokens: prompt tokens are inputs and completion tokens are outputs. So you can see we were processing a lot more information with GPT-4o, and even though we processed an additional 5,000 tokens, we were still much cheaper than Turbo.

So all around, round one does go to 4o. I would have liked to see that time come down a little bit more, but let's see what happens in the next experiment, because this experiment was done completely locally: there was no reaching out to the internet or anything like that. This was strictly LLMs talking to other LLMs. So let's dive into the next experiment, where we're going to give our crews the ability to go off and search the internet.

All right, welcome to the second experiment. This one turns out way better, and you'll see what I mean in just a second. In this comparison we're using the trip planner crew, which is by far my favorite CrewAI example that I've seen built so far. The way it works is you say where you're starting and where you want to go, and it plans an entire trip for you. Let's hop over to the code so you can see exactly what I mean. This is an
example of what I've hardcoded in the example that you can download. Basically, I'm saying I'm going to go from Atlanta, Georgia, over to Croatia during these dates, and here's what I want to see on my vacation. From there we start up a crew and it goes off and runs for us.

Now let me show you the results, because this is pretty cool. We're going to start with Turbo, so this is what Turbo is able to produce for us. It's a nice 7-day itinerary that walks us through what we're going to do every day: morning, afternoon, and evening. Very nice. At the very end it gives us a budget breakdown and then some packing suggestions for while we're there. Very cool, I like it.

But 4o goes even further. What I've noticed 4o starting to do is provide a lot more links. This has happened in a bunch of the different examples that I've tried: it does a much better job of understanding the important items that it needs to pass over to you as the reader. So here's the example. You can see, once again, a 7-day trip with day-by-day suggestions, but it keeps giving us more information. And this isn't just a one-time fluke. I've run this one about 10 times, and every time it does a much better job of giving us more information. 4o seems to be more intelligent, and it's able to just do more. So this is an example: it gives us links for things we need to do and try while we're out and about traveling in Croatia. And you can see, once again, at the very end it gives us a budget breakdown, a packing list, and things like that.

So now that you've seen what it can do, what's actually happening in the speed and cost comparison? Let's hop back over to AgentOps, and this is where my mind was blown. As you can see, 4o had about 20 different events and Turbo had about 23 different events. Now here's what's interesting: we were able
with 4o to run in 2 minutes and 45 seconds, whereas Turbo took almost 5 minutes, which is crazy. I think all in all it was about 40% faster. And when you look at the price comparison down here, this was 36 cents and this was 50 cents, so we're about 25% cheaper and we were running 40% faster. So it's not quite the twice as fast and half the cost that we were told, but it's getting pretty darn close.

The part that I also think is interesting is that we're not comparing exactly apples to apples, because with 4o we were processing a lot more information. All in all, we had about 45,000 tokens, whereas over here we had 34,000. So even though we're doing more work, token input and output wise, we were still much cheaper and faster. This is the kind of result I was expecting to see with 4o, and I'm super impressed with this version. The other part you've got to remember, which is pretty interesting, is that we're also going off and searching the internet and pulling in tons of data from everything we're reading, and with that large amount of text we're pulling in and processing, we're using it a lot more effectively when driving towards our final answer.

So in this example we did LLM plus internet. Let's dive over to our third example, where we're going to do LLM plus internet and start working with embeddings too.

The third example is the stock analysis crew. The way this crew works is we pass in a stock ticker that we want to analyze, and then our crew goes off and does a bunch of fundamental analysis of the stock and gives us some insights and recommendations on what it thinks we should do. Let's look at what's happening and how it works. Per usual, we have a crew that's passing in our agents and our tasks, the usual part, but in
our case, like I said, we're just going to look at the company Tesla. Let's first look at what GPT-4 Turbo was able to do for us. All in all, it gives us a nice executive summary of what it recommends we should do for this stock, and then it does a really good breakdown. The other part that's important to mention is that these crews are not just using historical information: they're going off and searching the internet, and more importantly, they're also looking through the SEC records, so basically the quarterly filings from these companies. We're going off and searching through those, and you can see that's exactly what's happening inside of these reports. Here we're looking through everything that's happened over the past few quarters, and you can see what this crew is recommending we should do. At the very end it's saying, hey, do a hold, basically, if you have a higher risk tolerance. So that's what Turbo told us to do.

Now let's look at what 4o told us to do. Here's the report for this one, and once again I think this is awesome, because as you can see, 4o once again is providing us with URLs and links that it thinks we should be investigating. It's giving us different reports: hey, do you want to see how this stock is trending, the earnings reports, and here are some resources and sources. We didn't change anything inside of our crew; it's just that 4o seems to be a little bit better at anticipating that, yep, this is probably the information you're going to need. And once again it's giving you its strategy and suggestion for what you should do for this stock. So all in all, awesome all around. I love that it came to the same conclusion, it just came at it two different ways, and also that 4o gave us a lot more links along the way so we
can go off and do our own research. But enough of that, let's hop back to the comparison. Once again we're over here in AgentOps: on the left we have 4o and on the right we have Turbo. Once again you can see that 4o beats Turbo on time. This time we barely beat it, but we still beat it. And the important part where we were able to win with 4o is the cost: we're at, what, a 30 to 40% cost reduction. And in this one, once again, we are processing way more tokens. Over here we're at basically 44,000, while over here we're closer to 47,000 tokens. So we're doing more with our information, we cost less, and we're faster. All around, I thought this was awesome, and I hope you guys are now super excited to go off and try 4o on your own. Let me know what you think, too: whether you think these crews are smarter when they're using 4o, and whether you're going to make the switch to only using 4o from here on out.

And that's a wrap for this video, guys. I hope y'all enjoyed seeing this comparison between the new version of ChatGPT and the old one, and I hope you're pumped to start using GPT-4o in your own crews. Like I said earlier, if you want to download the new crews that I created for you guys, which are on the latest version and use AgentOps and 4o, click the link in the description below so you can grab the source code. And if you haven't already joined that free Skool community for AI developers just like yourself, click the link in the description below so you can meet a ton of like-minded people and hop on our weekly coaching calls, where you can get support for your own projects completely for free. I hope you guys have a great day. Definitely check out one of these videos right after this one, and I'll see you on the next one. See you.
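As a rough cross-check of the trip planner numbers quoted in the captions, here is a minimal Python sketch of the percentage arithmetic. The helper name `percent_reduction` and all variable names are hypothetical, and the inputs are only the approximate figures spoken in the video (2 minutes 45 seconds versus "almost 5 minutes", taken here as 300 seconds; 36 cents versus 50 cents; about 45,000 versus 34,000 tokens). They are not values exported from AgentOps.

```python
def percent_reduction(baseline: float, candidate: float) -> float:
    """Return how much smaller `candidate` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


# Approximate trip planner figures as spoken in the video (assumptions, not logs).
turbo_seconds = 5 * 60        # "almost 5 minutes", rounded up to 300 s
gpt4o_seconds = 2 * 60 + 45   # 2 minutes 45 seconds
turbo_cost_usd = 0.50
gpt4o_cost_usd = 0.36
turbo_tokens = 34_000
gpt4o_tokens = 45_000

# Wall-clock time and total cost saved by GPT-4o relative to GPT-4 Turbo.
time_saved_pct = percent_reduction(turbo_seconds, gpt4o_seconds)
cost_saved_pct = percent_reduction(turbo_cost_usd, gpt4o_cost_usd)

# Speedup factor: how many times faster the GPT-4o run finished.
speedup = turbo_seconds / gpt4o_seconds

# Normalize cost by tokens processed, since the two runs did different amounts of work.
turbo_cost_per_1k = turbo_cost_usd / turbo_tokens * 1000
gpt4o_cost_per_1k = gpt4o_cost_usd / gpt4o_tokens * 1000
per_token_saved_pct = percent_reduction(turbo_cost_per_1k, gpt4o_cost_per_1k)

print(f"time saved:           {time_saved_pct:.1f}%")
print(f"cost saved:           {cost_saved_pct:.1f}%")
print(f"speedup factor:       {speedup:.2f}x")
print(f"cost per 1K tokens:   Turbo ${turbo_cost_per_1k:.4f}, 4o ${gpt4o_cost_per_1k:.4f}")
print(f"per-token cost saved: {per_token_saved_pct:.1f}%")
```

With these rounded inputs, the run comes out at about 45% less time (a speedup of roughly 1.8x) and 28% lower total cost, in the same ballpark as the "40% faster" and "25% cheaper" quoted in the video. Normalizing by tokens processed makes the gap look larger, because the 4o run handled more tokens for less money.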
Info
Channel: codewithbrandon
Views: 11,153
Keywords: artificial intelligence, chatgpt 4o, gpt 4o, ai tools, crew ai, ai agent, ai agents, chatgpt 4, artificial intelligence news, best autonomous ai agents, crew ai tutorial, crewai langchain, crew agents, crew assistant, autonomous agents, stock agent, autonomous ai agents, auto gen, autogen tutorial, autogen create ai agents, autogen step by step guide, chatgpt prompts, ai news, gpt 4o openai, chatgpt 4o openai
Id: Z_KB91zbG3c
Length: 13min 29sec (809 seconds)
Published: Mon May 20 2024