When Claude 3.5 Sonnet Became The Better Chatbot

Captions
Draw me a banana, bro. That's a mango. What do you mean that is not a banana? Please, just please, just give me a banana, man. What?

Claude 3.5 Sonnet is actually amazing. Well, not for the reason of drawing mangoes as bananas, but definitely for what people have been using it for online. And have you seen how good those demos are? Black hole and wormhole particle animation: checked. 3D survivorship bias simulation: checked. A* search map visualization: checked. Roomscape: checked. Some crazy obscure animations: checked. Oh wait, the code was stolen for this one. But for a second I thought I was watching the demos at ChatGPT Dev Day; they are just too good to be true. Yet these people are able to use this tool to make some mind-blowing things right now. I just don't know how much of the code is actually uniquely synthesized by Claude after that animation incident. And of course, this is what I get when I try to use it myself. Maybe there's a massive skill issue we're having here.

Just like today's sponsor, which lets you manage multiple types of AIs easily. This innovative platform, called AI/ML API, offers you access to over 100 AI models, and the number of models they offer might as well make other API providers look like a skill issue. It is a platform for developers and startups looking for quality, cost-effective AI solutions without API costs burning a hole in your budget. Whether you need to integrate chat completion or image recognition, use embeddings, or even generate music and videos, this platform has you covered, with all versions of GPT, Claude, Mixtral, Llama 3, Stable Diffusion, and many more models ready to go right out of the box. It is an ideal solution for businesses, startups, and innovation labs aiming to seamlessly integrate AI technologies into their applications via API. They also offer 24/7 customer support on Discord and organize community events such as AI hackathons with real prizes. From basic prototypes to advanced systems, AI/ML API has solved the cost and scalability of AI infrastructure for you. You can get started now with a free plan using the link down in the description to take your projects to the next level with the best AI models. And thank you, AI/ML API, for sponsoring this video.

Anyways, okay, I also tried to have Claude 3.5 draw me a unicorn, but it gave me some Dark Souls type of boss instead. But I mean, I guess at least it's not memorizing that one ASCII unicorn, right?

And out of all these crazy applications, what surprised me the most is that they let everyone use it completely for free. So instead of paywalling features or models, Anthropic allows everyone to access their latest model, which is Sonnet 3.5. It is an upgrade of a model announced back with Claude 3, and just look at how big of a performance jump it had in its 3.5 version. It also beat their largest model, Claude 3 Opus, so maybe we can expect Claude 3.5 Opus to be even crazier in terms of capabilities, and maybe 3.5 Sonnet is just a taste of what is yet to come. While it is free for everyone, the catch is that you only get to send a very limited number of messages. But this does give people a chance to test their latest model, and in the current state of AI this is definitely the superior marketing technique to convert free customers into paid ones.

Like, I was completely sold when I tested out their newest feature, called Artifacts. It is basically a window that displays to the right of the generations and runs whatever code Claude generates. Well, more specifically, only a few languages, but the window is also interactive, which means that you can enter keystrokes and mouse clicks. ChatGPT doesn't have this, by the way.

For example, I asked it to generate a game like Subway Surfers, and without explicitly mentioning the phrase "Subway Surfers", it pretty much made the most basic mechanics on the first try, just with a few bugs. Other than the obstacles appearing from the complete opposite direction, the main bug was that when I tried to jump over an obstacle, it would still be registered as a collision, so the game would just reset. Without looking through the code, I tried twice describing the potential problem: if the playable character jumps, it still hits the obstacle. Initially this did not work, but it did make the obstacles appear from top to bottom. Then, when I asked it to outline what could have caused this and how it could be solved, it still didn't work, but it did now describe the issue fully. Then, after I told it to try again, Claude actually got it: it implemented a state called "is jumping" and fixed the collision. And by the way, this was all done within the span of 5 minutes, which is pretty fast. Also, the reason I chose this custom game was to avoid any existing code online that Sonnet could have just memorized, like Tetris or the Snake game.

And it is not a coincidence that 3.5 Sonnet is so good at coding. Since Claude 3 came out a few months ago, with the crazy long context length they provide and how well it performs on the needle-in-a-haystack test, it has been my top choice, and many others' too, for assisting in writing code. You can shove in your whole codebase and just get reasonable suggestions. After my video on Claude 3, I tried giving Claude 3 Opus some more coding problems that I was working on, ranging from Python to assembly. It could actually understand what code I was writing, and the code it generated actually worked. So they must have trained Sonnet similarly, and the combination of Artifacts with its coding capabilities has really just shone through with Sonnet 3.5.

ChatGPT, on the other hand, did fumble pretty hard, and when your conversation goes past a certain length, it just starts forgetting things. I didn't use the API because I was lazy, but I mean, I didn't use Claude's API either, and the websites are what they provide, so naturally I feel like that's the better comparison. What's even more frustrating is that code execution in ChatGPT isn't really a quality-of-life feature if the code it generates never works in the first place, and even after a few failures it just stops trying. So whatever Anthropic has been cooking with Claude is definitely on point, and with Claude 3.5 Sonnet now being able to run code itself, I too lost another reason for using ChatGPT. If Claude's Artifacts supported LaTeX rendering, I think the entire research and dev community would actually jump ship too.

And if we look at the benchmark numbers, this medium-sized model has beaten their own previous flagship model, Claude 3 Opus, across the board. So basically smaller, better, faster, and cheaper. With GPT-4o only slightly winning in the math-solving department, Sonnet has completely dominated the whole LLM landscape, also in terms of performance, cost, and cost-effectiveness. And remember its 200K context length. On the LMSYS Chatbot Arena, while it does come in second behind GPT-4o on the overall leaderboard, it still tops the coding category and outperforms GPT-4o there. And if you don't trust Chatbot Arena anymore, on a new private leaderboard called SEAL, published by Scale AI, it is now number one in pure instruction following, but interestingly it fell short in preference ranking. One of the speculated reasons is that it is bad at formatting, meaning the visual presentation and readability of its responses. So maybe Claude 3.5 Sonnet retained the strong instruction following but weak one-shot characteristics of Claude 3, which probably has to do with how they did their instruction tuning.

The sudden jump in model capability from Anthropic has also been speculated to come from applying their latest mechanistic interpretability research. I made a video about it; you can check it out. People speculated this probably because Anthropic did the interpretability experiment on a Sonnet model, and now they have released a Sonnet model that improved so much, especially on coding, which ties in with one of the experiment results they showed in that research. I don't know if that's true, so take it with a grain of salt.

Anyways, with these new model launches the praise is usually high across the board, but the downsides are still worth mentioning too, and of course some reality checks are needed. Visual and abstract concepts are still very difficult for the model. As you can see here, it is unable to draw basic images using code; more specifically, I'm talking about how it generates images using SVG code. Maybe the connection between the domain of drawing code and the resulting image is still lacking, but of course it's still not comparable with actual image generation. Long-context reasoning is also still far from perfect, and according to this paper, no models, Sonnet 3.5 included, are able to achieve it yet. So yes, generalization in reasoning is hard. We also don't completely know if the model is fully multimodal, and I feel like its visual understanding, or its ability to create visual assets, is not as on point either, even though it can give you some reasonable responses when you input an image. Fun fact: ChatGPT's image generation feature uses DALL-E 3, which is an API. This doesn't mean Claude 3.5 is bad at SVG; actually, I think it's very fluent at it. It's just bad at drawing complex shapes, like bananas, for example. It probably was never trained on an SVG banana to begin with, so maybe that's why it failed my initial test, and I didn't guide it hard enough to make it do it right.

Another major downside is that the message limit is still very easy to hit, even after you pay for the 20-bucks subscription. On top of that, they have a very interesting policy where the API prohibits individual use. I don't know if I'm reading this right, but this could mean that if you hit the limit, you can't switch over to the API for individual use.

But anyways, let's talk about another feature they released a few days ago, after Sonnet 3.5, called Projects. This feature is only available through the Claude Pro and Team plans, and you can now turn chats into shareable projects. Basically, it lets you upload a set of documents to Sonnet 3.5 and add people to your project to chat with those documents, with a total context window of 200K. You can also set custom instructions within each project to specify how it responds, and you can choose to share a snapshot of a conversation with Claude to a shared project feed, which is pretty neat. Maybe I'll forcefully integrate this into my workflow and share with you guys later on how well it works for me in practice.

And before we end this video, let me show you another interesting use case this guy came up with. Given a CSV or a table, he was able to get Claude to help make a dashboard for his startup's finances, like adding a sensitivity analysis of key assumptions, running it as a Monte Carlo simulation, and assuming a normal distribution. It all worked on the first try. So I decided to make the same thing with my YouTube finances. On the first try it failed pretty badly, as it didn't realize that there were three currencies until I mentioned it. So I was nearly a millionaire, but unfortunately I am not, as I do not have a Bugatti on my balance sheet yet. But after prompting it three times, along with the latest conversion rates, the graphics it gave me look pretty good. It has pie charts to indicate my earnings, and it also made me a spending distribution, even though I labeled my spending terribly. It also made me a monthly analysis of my income versus expenses, with me just asking it to show my remaining balance. The numbers are completely wrong, though. If only I did have 50K, but reality is often disappointing. It might have added an extra zero, times two, but I'll give it the benefit of the doubt, because my CSV is really badly formatted and it has data covering 4 years with around 500 entries. So maybe this is hard data for Sonnet to organize to begin with, as language models are usually pretty bad at counting. Get it?

But anyways, if you want to keep up with the latest AI research, definitely check out my newsletter, where I publish research breakdowns on many cool papers that I don't have time to make videos for. A big shout-out to Andreas ch Chris Leo Alex Shay Deacon Alex marce mulim FAL Robert zasa and many others who support me through Patreon or YouTube. Follow my Twitter if you haven't, and I'll see you all in the next one.
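The Subway Surfers bug described above, where jumping over an obstacle still counted as a hit, is a classic collision-check mistake: the check ignores whether the player is airborne. The code Claude actually generated is not shown in the video, so the following is only a minimal Python sketch of what an "is jumping" style fix might look like. The `Player`/`Obstacle` classes, the tick-based jump physics, and every constant are hypothetical, and it assumes `collides` is only called when the obstacle has reached the player's position along the track.

```python
from dataclasses import dataclass

GRAVITY = 1.0          # hypothetical: downward acceleration per tick
JUMP_VELOCITY = 10.0   # hypothetical: initial upward speed of a jump
OBSTACLE_HEIGHT = 5.0  # hypothetical: height of a ground obstacle


@dataclass
class Player:
    lane: int = 0
    y: float = 0.0            # height above the ground
    vy: float = 0.0           # vertical velocity
    is_jumping: bool = False  # the extra state that fixes the collision bug

    def jump(self) -> None:
        # Only start a jump from the ground, so double jumps are ignored.
        if not self.is_jumping:
            self.is_jumping = True
            self.vy = JUMP_VELOCITY

    def update(self) -> None:
        # Advance the jump by one tick and land when back on the ground.
        if self.is_jumping:
            self.y += self.vy
            self.vy -= GRAVITY
            if self.y <= 0.0:
                self.y = 0.0
                self.vy = 0.0
                self.is_jumping = False


@dataclass
class Obstacle:
    lane: int
    height: float = OBSTACLE_HEIGHT


def collides(player: Player, obstacle: Obstacle) -> bool:
    """Check a hit, assuming the obstacle is at the player's track position."""
    if player.lane != obstacle.lane:
        return False
    # The fix: an airborne player above the obstacle's height clears it.
    if player.is_jumping and player.y > obstacle.height:
        return False
    return True
```

The buggy version would be the same function without the `is_jumping` branch, which makes every same-lane obstacle a collision no matter how high the player is.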
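The finance dashboard mentioned near the end ran a sensitivity analysis of key assumptions as a Monte Carlo simulation with normally distributed inputs. The dashboard's own code is not shown, so here is a minimal standard-library Python sketch of that general technique. The three assumptions (`revenue`, `growth`, `costs`), their means and standard deviations, and the 12-month horizon are hypothetical placeholders, not numbers from the video.

```python
from __future__ import annotations

import random
import statistics

# Hypothetical key assumptions: name -> (mean, standard deviation).
ASSUMPTIONS = {
    "revenue": (10_000.0, 1_500.0),  # starting monthly revenue
    "growth": (0.02, 0.01),          # month-over-month revenue growth
    "costs": (7_000.0, 800.0),       # fixed monthly costs
}
MONTHS = 12


def profit(revenue: float, growth: float, costs: float, months: int = MONTHS) -> float:
    """Total profit over the horizon for one set of assumption values."""
    return sum(revenue * (1 + growth) ** m - costs for m in range(months))


def simulate(n: int, seed: int = 0) -> list[float]:
    """Monte Carlo: draw every assumption from its normal distribution n times."""
    rng = random.Random(seed)
    return [
        profit(**{k: rng.gauss(mu, sd) for k, (mu, sd) in ASSUMPTIONS.items()})
        for _ in range(n)
    ]


def summarize(results: list[float]) -> dict[str, float]:
    """Mean, 5th/95th percentiles, and probability of ending with a loss."""
    cuts = statistics.quantiles(results, n=20)  # 19 cut points at 5% steps
    return {
        "mean": statistics.fmean(results),
        "p5": cuts[0],
        "p95": cuts[-1],
        "prob_loss": sum(r < 0 for r in results) / len(results),
    }


def sensitivity() -> dict[str, tuple[float, float]]:
    """One-at-a-time sensitivity: shift each assumption by -1/+1 SD, others at mean."""
    base = {k: mu for k, (mu, _) in ASSUMPTIONS.items()}
    return {
        k: (profit(**{**base, k: mu - sd}), profit(**{**base, k: mu + sd}))
        for k, (mu, sd) in ASSUMPTIONS.items()
    }
```

For example, `summarize(simulate(10_000))` gives the spread of outcomes, while `sensitivity()` shows which single assumption moves profit the most, which is the kind of view a tornado chart on such a dashboard would plot. Drawing the inputs independently is a simplification; real assumptions may be correlated.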
Info
Channel: bycloud
Views: 27,339
Keywords: bycloud, bycloudai, claude 3.5, claude 3.5 sonnet, claude 3.5 sonnet vs gpt4o, sonnet vs gpt4o, 3.5 sonnet, sonnet 3.5, sonnet artifacts, claude artifacts, claude 3.5 projects, claude projects, claude.ai, claude 3.5 sonnet review, what is claude ai, what is sonnet 3.5, claude 3.5 sonnet coding, claude 3.5 review, claude review, is claude 3.5 sonnet good?, is claude 3.5 good?, is claude sonnet good?, is claude still good?, claude vs chatgpt, claude 3.5 sonnet vs chatgpt
Id: mGjIlBgwfj4
Length: 11min 45sec (705 seconds)
Published: Mon Jul 08 2024