Is Claude 3.5 Sonnet Really Better Than ChatGPT?

Video Statistics and Information

Captions
Supposedly, Anthropic has released a model that is better than any model we've seen before, and it's called Claude Sonnet. Now, it seems like every other month we get told that there is a brand-new model and it's way better than the previous one. So in today's video we're actually going to put it to the test. We're going to compare Claude 3.5 Sonnet to GPT-4o, because supposedly, based off their own data, ChatGPT 4o only wins in two different categories. Therefore, let's actually see if Sonnet is better than ChatGPT.

Welcome back, y'all. In today's video we're going to be comparing ChatGPT and Claude to see who is actually better. There's been a lot of hype on this new model called Sonnet, so we're going to see if this is actually any good. Now, I'm in a new place. I just moved across the country, so if my audio doesn't sound amazing, you know, this is all early days for me still. I just moved into my new place, and I'm going to make sure I keep fine-tuning this till it sounds good. I noticed in my previous videos at my old place a lot of people were telling me to turn up the volume, so I turned up the volume. Let me know if it's good now.

In this video we're going to have a side-by-side comparison and actually see which one's better. Therefore, we're going to test two major things: the time it takes for an output, and the quality of the output. Let's go ahead and begin.

Got my stopwatch, we got both open, let's ask our first question. We're going to go for a prompt suggested by Claude, and we'll start with "Generate interview questions." Click. This is a pretty hefty prompt, so I'm going to copy this and paste it over to ChatGPT. Make sure we have 4o selected. I'm going to go ahead and time both here. I'm going to first hit the Claude one. Boom, got that output in around 9.5 seconds. Hit the GPT one. Boom, loading, and we got this output in around 24 seconds. Now, time and speed isn't everything. You know, just because Claude can do it faster doesn't
necessarily mean that the actual quality of the output is better. Let's actually compare the two, the structuring and the output itself. To give context, the prompt basically asks: "Your task is to generate a series of thoughtful, open-ended questions for an interview based on the given context. Avoid yes-or-no questions or those with obvious answers. Instead, focus on questions that encourage reflection, self-assessment, and the sharing of specific examples and anecdotes."

The first major difference that we're currently seeing is the way of structuring the output. Claude opted for just 12 questions, while ChatGPT opted for questions with a general title to give you context of what each question is trying to achieve. Furthermore, we got 16 questions here and 12 over here.

Let's go and cherry-pick some of these questions at random to give us the best idea. I'm going to go ahead and choose seven for both. With Claude we got: "What's your approach to measuring and reporting on marketing ROI? Can you give an example of how you used ROI data to inform future campaign decisions?" Here we have: "How do you stay up to date with the latest marketing trends and technologies? Can you provide an example of how adopting a trend or technology benefited a campaign you worked on?" Both are good, but I will say this one has more utility and more substance in the context of asking a question. To be fair, these are cherry-picks, so we should probably stick to the same topic, although I do like that one from Claude better in this context.

Now, I noticed that their first questions are actually pretty similar, so let's go and try that. "Can you walk me through a multi-channel marketing campaign you've developed and executed? What were the key challenges you faced, and how did you overcome them?" ChatGPT: "Can you walk me through a successful multi-channel marketing campaign you developed and executed? What were the key components, and how did you ensure that they all worked together effectively?" I'm going to be
honest, y'all, I'm going to lean toward Claude in this context, as the question itself seems less loaded than this one. I mean, this one's already biased in the sense that it has to be successful: "What were the key components, and how did you ensure that they all worked together effectively?" Meanwhile, this one's much more general and not really trying to point you in a direction in an interview process. It's more along the lines of: have you run a campaign in the past, and how did you overcome said issues if they did arise?

Now, obviously the point of this video is going prompt for prompt, bar for bar, between Claude and ChatGPT. There are pros and cons to each platform. ChatGPT has a GPT Store, custom instructions, you know, all this different stuff that adds on top of the value for $20. But we're going purely based off prompt to prompt, so let's go and proceed. Let's give it little to no guidance, ask it to do a very specific task, and see which one follows it better.

Now, my next prompt may seem a little random, but its purpose is to prove to us which one follows directions best. What we're going to do is ask it to generate an article on the benefits of going on a walk. Simple enough. For parameters, we're going to say: reach 500 words (we're going to gut-check both), use the word "parks" three times, and end with a poem about park trails and dogs. "Corbin, what is this? What are you doing?" This is to show us which one follows directions more effectively. Let's try, and just for fun we'll time it again.

Let's go, Claude. That took around 18 seconds. Let's go and try ChatGPT. Boom. Now, I'm going to be honest with y'all, I have a feeling Claude is going to be better at following directions. I don't know, we'll see though, because I know ChatGPT can get a little bit off the rails when it comes to requesting very specific things in its outputs. That took around 30 seconds. Time's not a big indicator, I know, I know. I'm just saying some of y'all may like to code faster even if, you
know, the code is not perfect, but you'd rather get the output faster. So let's go actually test both of these.

The first thing for each one to follow is the word count. I'm going to copy Claude's here. We should see a number around 500. 592, okay. Let's try ChatGPT. 657, okay. Not bad, at least we're not under the limit. I did say "reach 500," so it might have just taken the extra liberty to add more words. Not bad so far. Claude technically was closer to 500, but this isn't too bad. I think the bigger one to really see if it follows directions is whether it used the word "parks" three times, and then the poem. So I'm going to copy this, Command-F. One, two, three, four, five: five times. One, two, three, four, five, five times, six times, seven times. It's taking too much liberty here, so I'm actually going to rerun this test real quick. We're going to come back to these specific points, and I'm going to say "use a max of three" or "only three," or "use park three times." Let's just really laser in here so it follows it to a T.

As a side note, in order to edit, you just come up here, hit edit, and click it. All right, let's go and try this again and see if it follows the rules. I'm being very specific here. I said: reach 500 to 550 words, no more, no less; use the word "park" three times and only three times; do not use it more than this. Save, send. Then, based on whichever can follow these directions best, we can deduce that long term, or when using the platform in other contexts, it would follow directions better. Now, that might not always be a good thing. If you're coding and you don't necessarily know the correct output, you may want the language model to have some liberty to give you more context or go down a little bit of a rabbit hole.

So we're going to copy from Claude here. 686: did not follow directions at all. ChatGPT, 690: did not follow directions at all. So both didn't follow the directions in that context. Let's try "parks," which is a little bit more specific. A lot of
times, when it comes to word count, the output is really not that good yet, as it's always kind of limited by max tokens. Let's try this now. Command-F, Command-V. Okay, we got "parks," "parks." Okay, two times. That's less than five times. Let's try here: one, two, three, four, five. Didn't follow perfectly either. This one was a little bit closer, though.

Now, both followed the direction to end with a poem about parks and dogs. This one was called "Trails and Trails," very creative name, Claude, and this one is quite literally called "A Poem About Parks, Trails, and Dogs": "So let us walk both near and far, beneath the trees, beneath the stars. For in the parks where dogs do roam, we find our hearts, we find our home."

I think what we can deduce from this is that neither model follows directions perfectly yet, so it's not really a "choose this one or choose that one." This is just a gut check to see whether we've gotten to the point of following directions to a T. It doesn't seem so. Now, obviously we can make these follow very specific directions to guarantee consistent outputs, but that's more in the context of accessing it through the API and turning down the temperature. Temperature is the level of creativeness. Using it through a user interface like this, we're limited, which means that since it has a higher temperature in this context, it's going to be more creative, and that leads to less following of directions.

Let's try one other question here. I want to see its capabilities when it comes to the user interface and its ability to create stuff internally. What I mean by that is, let's see if it can create an Excel file within Sonnet. "Create me an Excel of the population density of major cities in the US." Hit enter. So this is the answer from Sonnet. Let's try ChatGPT. Yeah, here we go. So there are limitations to the user interface when it comes to Sonnet compared to ChatGPT. If you're not familiar with this, you can actually create Excel sheets, edit Excel sheets, or CSVs or spreadsheets, within ChatGPT. I did a whole video on this, so you can
check it out right there. It does seem limited for Sonnet for this specific use case, and I'm guaranteeing you probably a few other use cases when it comes to the actual user interface itself. As I said earlier, that's why this video is more focused on prompt for prompt rather than these extra features that you might care about if you wanted to use a chatbot like this. I could download this if I wanted to, but I'm not going to.

To end this prompt-to-prompt video, one last little showdown here. You already know, if you're familiar with this channel, why I personally use these, which is to help me code. Let's see what the code looks like. We're just going to go ahead and throw this in. It's not prompted at all. This does have custom instructions; this does have everything that you should have when you're coding with AI language models. If you want to learn how to do that, check out that video. Let's go and find out which one has the better code. And here we go. I think the Joker said that in Batman, the second one, The Dark Knight, pretty sure, right? Was that the ferry situation? I forget what part. I'm pretty sure he says "and here we go." I may be wrong on that.

Okay, we're looking at around 18 seconds for that. Let's try it over here. Boom. This might be a personal thing, I might be biased, to be clear. Oh, hold up, y'all, I'm still timing. Okay, I'm still timing. I may be biased to say this. I've been using ChatGPT to code for the past year and a half, 1.75 years. I do kind of like the UI here, though. I do like it better when it comes to that dark theme and the CSS here. I will admit as well that when using GPT-4o to code, I've noticed that it's not as lazy as it used to be. It will really generate an entire file; its outputs are pretty large now. So that took around 40 seconds. Okay, let's check it out.

So we got both. The front end here, with the JSX of the landing page: it's actually pretty lackluster, the amount of code it put out from Sonnet. I'm actually not too impressed here. I mean, compared to over here, where we have more structuring and more filler text that would be useful in the context of a landing page. So I'm not too impressed here. Coming down to the CSS, let's see what we got. The CSS seems a little bit better here, but I will say that the output from ChatGPT seems to be better. For example, there is no CTA CSS class found here. You wouldn't want to use a default CSS class; of course you'd want a CSS class there. It is also interesting, though, that the directions both took were actually pretty similar. But then again, when referencing a landing page, I bet there is just a one-size-fits-all in this context. One thing that was really solid that ChatGPT did, and that we did not see within Claude, is a whole section dedicated to testimonials, while this one kind of went down a rabbit hole that we didn't really ask for, about bacon flavor or veggie crunch.

So now we can go and play around with this more, and I probably will get more comfortable with this new Sonnet model that is supposedly so good. I've seen a ton of stuff on X: "This is the best," "I can't believe it," "Now I can actually code." What do y'all think? In the comments: are you using Sonnet now? Me personally, I'm probably going to stick with 4o. I'll probably play a little bit with Sonnet, maybe if there's a question I can't really answer within ChatGPT. Let me know what you think. We're back and rolling here. I was moving for the last week. I'll see you in the
next video. These are two videos that YouTube has chosen. It's based off the algo: it did some data research, it saw your clicks, it saw that you stayed on that one video 10 seconds longer than you should have, and now you got these videos. So I'll see you in the next video.
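
The instruction-following gut check in the video (a 500 to 550 word target, and the word "park" used exactly three times) was done by hand with a word counter and Command-F. A minimal Python sketch of the same check is below. The function name `check_article` and its parameters are hypothetical and not from the video, and treating "park" and "parks" as the same keyword is an assumption.

```python
import re


def check_article(text, keyword="park", min_words=500, max_words=550,
                  required_count=3):
    """Report whether text meets a word-count range and an exact keyword count.

    Assumption: the singular and plural forms of the keyword both count,
    and matching is case-insensitive on whole words only.
    """
    # Whitespace-separated tokens, roughly what a word-counter tool reports.
    word_count = len(text.split())

    # Whole-word matches of "park" or "parks", ignoring case.
    pattern = rf"\b{re.escape(keyword)}s?\b"
    keyword_hits = len(re.findall(pattern, text, flags=re.IGNORECASE))

    return {
        "word_count": word_count,
        "word_count_ok": min_words <= word_count <= max_words,
        "keyword_hits": keyword_hits,
        "keyword_ok": keyword_hits == required_count,
    }


if __name__ == "__main__":
    sample = "Parks are great. I walk in the park. Dogs love parks."
    # Small limits here only so the short sample can pass the range check.
    print(check_article(sample, min_words=5, max_words=20))
```

Pasting each model's article into `check_article` with the default limits would flag both failures seen in the video: a word count outside 500 to 550, and a keyword count other than three.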
Info
Channel: Corbin Brown
Views: 5,011
Keywords: chatgpt, openai, ai automation, gpt tutorials, chatgpt education, ai for business, ai service, software business, entrepreneur, start a business, google ai, google gemini, zapier, grok, copilot ai, perplexity ai
Id: Zv8xgyLY51c
Length: 11min 27sec (687 seconds)
Published: Fri Jun 28 2024