Anthropic's SHOCKING New Model BREAKS the Software Industry! Claude 3.5 Sonnet Insane Coding Ability

Captions
while people are arguing whether or not large language models hit a wall Anthropic drops this little gem Claude 3.5 Sonnet and it feels like we passed a certain threshold that I think has some pretty big implications so I've asked it to create a Flappy Bird game so it writes out the code in the artifact window that looks pretty good you know I got to say out of all the models I tested this has got to be the best Flappy Bird game that I've ever seen let's create a snake game let's make the fruit images instead appear as Dungeons and Dragons monsters looks like it's generating an owlbear and a mind flayer oh this is uh good gelatinous cube I like where you're going with this Claude I'm not expecting amazing stuff here but the fact that I mean sure this is a goblin a dragon yes why not beholder excellent again if it has to fit in a 20x20 pixel image this is probably going to show up really good add another feature let's have text flash for a second at the top of the screen each time you eat an enemy let's give each monster a number of XP points so as we eat them mind flayer is nine beholder is 17 let's get a little bit more complicated all right so dragon slain or whatever that was yeah dragon and let's see what happens when it crosses the snake ooh that's perfect I got to say absolutely terrific this is exactly what I wanted it worked phenomenally well I'm actually kind of blown away Claude you're blowing my mind here describe how this uh blender bottle works the voltar X blender bottle works by using a rechargeable motorized base to spin a detachable blade inside the bottle can you guess what TV show this is from that's Tony Soprano from The Sopranos he's feeding the ducks I asked it to create a Doom-like game I assumed it would just say no but it said sure we can do that but I was not ready for this it created a playable game in a browser and just literally play it walking around you can see the map up there in the top left so it's saying press space bar to 
swing the axe so I guess ah it's coming at me it's coming right for me it killed me what difficulty is this looks like the other student is asking genomics slides tonight Queen so the student pulls up Claude and it says good evening Sam there's so many things to unpack there I'm not I'm not going to even go into it so either I'm reading way too much into this or they're baking in actual diss tracks in their product releases now what do you think this means word boards the letters are jumbled it actually means that there's a bear inside of something can you tell what the saying is about the bear bear on board the jumbled letters WB o e o d RS can be rearranged to spell so the solution is bear in the blank what is the blank the blank is woods the full phrase is bear in the woods which can be formed by rearranging the letter in w r s one of Claude's developers said this Claude makes you feel like you have superpowers suddenly no problem is too ambitious the future of programming is here folks if you like having superpowers subscribe while people are arguing whether or not large language models hit a wall Anthropic drops this little gem Claude 3.5 Sonnet and it feels like we passed a certain threshold that I think has some pretty big implications because take a look at its coding ability right so it says 92% and everything's zero-shot so no examples you ask it a question it gives you an answer that's a big leap over Claude 3 Opus it's also a modest improvement over GPT-4o the Omni model after using it for several hours of coding I got to say it feels much bigger that leap feels much bigger than what these numbers show and I'm not crazy I'm not the only one here's what people are saying about this model absolute huge Sonnet 3.5 is a beast how to code a game in Claude with three sentences Sonnet 3.5 is blowing my mind early tests on our benchmarks are returning extremely phenomenal results can't believe our eyes so we will rerun some tests the 3.5 model is very 
very impressive and the artifacts it generates are a simpler version of the code interpreter Claude artifacts is crazy you got to try this in case it wasn't clear it's now official LLMs have not hit a wall I was trying to sleep in this morning but this tweet hit me like a 10-shot espresso Claude is starting to get really good at coding and autonomously fixing pull requests it's becoming clear that in a year's time a large percentage of code will be written by LLMs now you can't believe everything you hear online right we've heard some wild claims before but after testing this model I got to say it feels fundamentally different for coding specifically I'm talking about its ability to create something and then keep adding features to it without introducing new errors new bugs without forgetting what we were doing without losing old features that it already incorporated its ability to do that is bar none the best available right now out of all the models I've tested and I usually test all the best ones I don't think there's anything close to this and I don't think that gets captured in these benchmarks these benchmarks are impressive but they don't express what happened here you kind of have to see it in this video we're going to take it through its paces on coding on creating a game from scratch and adding features to it as we go along troubleshooting and fixing bugs along the way with zero code being written by me zero it creates all the code all the graphics it fixes all the bugs I kick back and let the robots do the work two is we pull a somewhat complicated project off of GitHub so somebody else has written up a project coded up a project for creating your very own voice assistant it takes the webcam stream and is able to answer questions about what's happening and this is where it really goes off the rails for me because that code is fairly long there's a lot of different things that are happening in that code it's not like a short code block it's 
like a whole project and I tell Claude 3.5 to start adding functionality to start changing things up it does it incredibly well there's a few things that it's not able to do but it seems like those are limitations of the actual models or other things so it's not that Claude is getting these things wrong these are just other limitations that exist before when people asked if all coding will be done by LLMs soon I kind of said no not really that's probably not going to happen this is making me reconsider some of those assumptions a little bit it's getting eerily good after coding we'll test its vision abilities it does seem like there's another small leap forward here it still gets things wrong but oftentimes with a little bit of prodding and prompting if you tell it well that's not right can you try again it will usually figure it out there's still certain things that it completely fails at just like GPT-4 with vision before it so for example if you're measuring something with a ruler or you're trying to read the speedometer like they're just terrible at that but other things it's getting surprisingly good at but as we cover all this kind of keep this image in mind right so Anthropic's models Claude 3 they have three sort of tiers three sizes and of course the bigger it is the more expensive it is to run it's usually slower so it can be bigger slower but smarter or it can be smaller faster cheaper so Haiku is the small one so three is the previous sort of iteration in these models so Haiku is the smaller one the fast one the cheap one Sonnet is the midline one right and then Opus is the great one the big one right so Claude 3 Opus when it came out it was kind of blowing people's minds it was very very good it was shockingly good we've covered a lot of the things that it could do it was impressive so now Anthropic drops this new model that's got everyone in a tizzy but notice it's not the big model it's not Claude 3.5 Opus it's Claude 3.5 Sonnet so the midline model is 
now much better than the previous iteration of the large model so what happens when they drop the big one folks I don't think we're hitting a wall not at all let's dive in so here's artifacts create and iterate on documents code and more within Claude try it out ask Claude to generate content like code snippets text documents or website designs and Claude will create an artifact that appears in a dedicated window alongside your conversation so I've asked it to create a Flappy Bird game so it writes out the code in the artifact window and it explains what it's doing outputs the text on the left all right I'll just throw this in there and let's hit play that looks pretty good you know I got to say out of all the models I tested this has got to be the best Flappy Bird game that I've ever seen it's smooth it looks good it's very well done and I got to say this is actually one of the best functioning games okay let's try something else let's create a snake game we'll start simple and keep adding features to make it more interesting all right so it has a basic snake game we have the various food it ends if the snake hits the wall or itself so it's pretty good it doesn't start until I hit a button it's just kind of uh in pause mode by default and when it goes uh it keeps eating so everything's working fine so far I got to say on a first try everything works when I hit a wall the game ends there doesn't seem to be a score or anything else let's make the fruit images instead appear as Dungeons and Dragons monsters generate 10 different images and have them appear randomly as fruit it says that's a creative and fun idea looks like it's generating an owlbear and a mind flayer oh this is uh good gelatinous cube I like where you're going with this Claude one thing and I've said this before for some reason Claude always seemed to me to be the most I don't know personable one it's the one where I feel like it's a thing it's a person it's some entity with ChatGPT I don't with any of the Gemini 
technology I don't Claude is like yeah hey Claude thanks buddy it's good to see you again it's it's weird so let's see so it created a goblin a dragon a beholder a mimic an owlbear mind flayer etc rust monster two mimics for some reason but okay did you run out of ideas okay that's fine I like how it explains what it's doing so it's saying in order to put those images into the game it's importing the SVG module from pygame looks like it's adding this thing here gfxdraw and then it randomly selects a monster and creates it instead of the simple rectangle for food it even has some notes such as the monster images are scaled to fit the snake's block size so I would give it a 10 out of 10 for the text response the text response is excellent the images as far as you can tell I mean I'm not expecting amazing stuff here but the fact that I mean sure this is a goblin a dragon yes why not beholder excellent keep in mind these are going to be tiny little things I think originally it was 20x20 pixels right so we don't need a lot of detail we have a mimic an owlbear mind flayer mind flayers are a humanoid octopus sort of thing for those unfamiliar with D&D lore I got to say I mean that's you know again if it has to fit in a 20x20 pixel image this is probably going to show up really good and it also kind of I mean yeah sure it works gelatinous cube okay displacer beast rust monster these are excellent now unfortunately I clicked on the x button up here to close the images and now the artifact window's gone so let's see how easy it actually is to bring it back okay so looks like in the chat controls here we have each artifact that they call them as a separate thing the separate chat here's our basic snake game and we have version one and version two oh version one was the original non Dungeons and Dragons one and version two is our new and improved Dungeons and Dragons snake game let's try it out now of course it's not finding the images in our working directory but it does right here 
actually tell us specifically that we need to save those in the same directory as the Python script and make sure the file names match if I had to nitpick I would kind of maybe bold this or somehow indent this to show me that hey yes indeed this is important because you need this to run and here's where it kind of calls those images so what we need to do is download those images all right so here we have to go one by one and click download file uh a little bit annoying but I wish there was like a one button download or I mean something that would make it a little bit easier but it's not that big of a deal where this would get super annoying is if we have to fix tens of errors and we have to keep doing this over and over again to fix stuff and now let's run the game one annoying thing that I'm seeing right off the bat is that the file names were named as you can see here for example mimic-svg but it's called as mimic so we have to come in here and fix that not a huge deal all right there we go fixed let's try it again and that wasn't a big deal it took me you know 5 to 10 seconds to troubleshoot that 5 to 10 seconds to fix it assuming I had my coffee that day but as we iterate and go back and forth with Claude this might break things those little mistakes here and there can add up actually now that I think about it we do need to tell it about this error to see if it can correct it but before that let's just try playing this game does it work it does as you can see here we ate a I forget what that is displacer beast and there's a that's a displacer beast and a beholder and whatever that is cool very cool there's an illithid a goblin terrific terrific all right so I got to say it performed admirably let's see if it can correct the mistake that it made so we're going to say the image files are called for example mimic-svg.svg and the code calls mimic.svg 
this is true for all files please correct the code to make it match the file names I expect it should be able to do this pretty easily let's see I think it nailed it yeah I think it did everything perfectly let's just verify really fast click play and yes it works all right so far Claude is killing it next let's add another feature let's have text flash for a second at the top of the screen each time you eat an enemy it should say for example if you eat a dragon dragon slain now notice I'm specifically purposely not spelling it out I'm giving it an example and it has to reason it has to understand it has to deduce what I want so I'm saying if it's a dragon it says dragon slain so based on that it's supposed to understand that you know for mimic it should say mimic slain goblin slain etc let's try it so it updated the monster images list comprehension to include the monster name along with the scaled image the monster images list now includes both the file name and the monster name so as you can see here it updated this list to have both this is excellent I got to say it's not like a huge deal to do but if you're doing this yourself and again not that many years ago you know two years ago you were kind of doing it yourself you had to do this by hand it's not rocket science but you know it takes focus attention to detail you can easily screw things up it's a cognitive load that's expended to do something that's basically you know cleaning up or documenting and I think things like this are why a lot of people struggle with coding one of the many reasons but this is certainly one of them and here as you can see poof it did it well number one it knew that it had to do it in order to implement this feature it just went ahead and did it and it told me in a few simple lines what it did this is so far excellent so it's displaying the message in milliseconds so this would be 1 second right 1,000 milliseconds so I could easily change this to precisely change you know how long 
the text sticks around on screen I'm excited to try this let's go all right there it is all right so let's uh yes owlbear slain and now rust monster mind flayer slain and I can't get this thing no this is embarrassing there we go dragon displacer beast slain rust monster slain this is terrific beholder slain dragon slain perfect this is getting very very impressive all right let's create a point system let's give each monster a number of XP points that the player earns when it eats that monster let's say the number of XP points be equivalent to what that monster's relative strength is for example dragon is stronger than goblin dragon should be worth much more points let's use between 1 and 20 points per kill add an XP counter in one of the corners all right let's see how well that does by the way these generations are super fast impressively fast at this point you know it's getting to the point where I mean it's not instant but you're not sitting around waiting for it I mean this is the longest it took and that was maybe a few seconds then of course the text generates on this side you don't even have to wait for it since this is its own window right actually I am lying the fact that this is its own window the artifacts are separate meaning I can grab this code while the text is still printing out all right let's try that out and here we go perfect so notice the XP bar or at least the number in the top left there so as we eat them mind flayer is nine beholder is 17 rust monster is five I think it said beholder 17 goblin is one all right I like that and as you can see the experience bar is moving and dragon is 20 so it put dragon as 20 this is perfect this is phenomenal notice how few mistakes it made so far because mistakes are a problem because they kind of build on top of each other once it starts making mistakes as you keep adding features stuff breaks down so far we've been able to add a number of different features and it works perfectly it missed that whole 
naming notation but other than that it's perfect let's get a little bit more complicated we're going to add two features one make the snake extend the same amount of blocks as the number of XP points right so if it eats a dragon that's worth 20 points instead of growing by one unit it grows by 20 so it's going to be the same number as the number of XP points it earns but we're going to make it extra difficult by adding the second feature that has to play nice with the previous feature so this is some advanced stuff for Claude to figure out two is make random falling objects from the top of the screen if they intersect the snake they cut off everything after where they hit the part of the snake that's cut off should stay for a few seconds then disappear so I got to say this is probably the single hardest prompt for coding that I've given you know just one of these chatbots directly without any architecture around it just like a back and forth not only because of the complexity of the prompt but also because of how many things we've built on top of that initial sort of prompt so if it does this Claude I got to say in my book will be the best coding assistant ever notice it updated our list with the experience that each monster gives us and here it's writing out the cut off segment duration etc very interesting let's test it out all right here we go let's see what it does oh I can see something falling from the sky nice mind flayer gives me 15 points so I'm getting huge and let's see what happens so something broke when one of those little red objects intersected but I got to say I'm not disappointed it got everything else so perfect I'm actually impressed that it got this far let's see if it can correct its own error or the error that it caused I'm just going to paste it in there I'm not even going to explain it I'm just going to literally copy and paste the error into it the issue is occurring because we're modifying the cut segments list while iterating over it 
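A minimal sketch of the kind of bug described here, assuming a Python game where each cut-off piece of the snake is stored in a list with the time it was cut. The captions do not show the actual code or the error text, so the names cut_segments, cut_at, is_expired and prune_cut_segments, and the three second lifetime, are all hypothetical. It only illustrates why changing a list while looping over it is risky and one common way around it.

```python
# Hypothetical data: each cut-off segment remembers when it was cut (in seconds).
cut_segments = [
    {"pos": (100, 100), "cut_at": 0.0},
    {"pos": (120, 100), "cut_at": 0.0},
    {"pos": (140, 100), "cut_at": 2.5},
]
LIFETIME = 3.0  # assumed: cut segments linger for three seconds
now = 4.0       # assumed current game time


def is_expired(segment, now, lifetime=LIFETIME):
    """True once a segment has been on screen for at least `lifetime` seconds."""
    return now - segment["cut_at"] >= lifetime


# Risky pattern: removing from the same list the loop is walking over.
# After a removal, the next element slides into the freed slot and is skipped.
risky = list(cut_segments)
for segment in risky:
    if is_expired(segment, now):
        risky.remove(segment)


def prune_cut_segments(segments, now, lifetime=LIFETIME):
    """Safer pattern: build a new list instead of mutating during iteration."""
    return [s for s in segments if not is_expired(s, now, lifetime)]


safe = prune_cut_segments(cut_segments, now)

# The risky loop leaves 2 segments (one expired segment survives);
# the safe version leaves only the 1 segment that is still fresh.
print(len(risky), len(safe))
```

In a game loop the safe version would be called once per frame, for example cut_segments = prune_cut_segments(cut_segments, current_time), so nothing is removed from a list that is still being walked.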
let's fix this by changing how we handle the cut segments so it explains exactly what it did here this is incredible the fact that it can reason explain its reasoning I mean if you're learning how to code you're trying to understand how to handle stuff like this this is incredible all right here we go let's test it out all right so dragon slain or whatever that was yeah dragon and let's see what happens when it crosses the snake ooh that's perfect so it kept on screen for a little it changed colors which is great and it disappeared oh that's so great this is working flawlessly I got to say absolutely terrific this is exactly what I wanted it worked phenomenally well I'm actually kind of blown away because this is the first time where it didn't screw anything up it actually worked as intended and uh we didn't have to battle it it gets a few things wrong here and there but it fixes them and at the end you get this perfectly playable game that and this is the most important part you can keep building upon you can keep changing things to make it work better and better wow wow notice how when I go straight up through it it even knows how to handle that I wonder what happens if I do this let's see if I can create an error here yep and it figured out that hitting the red thing with the last piece of the snake is a loss condition and the game ends Claude you're blowing my mind here I got to say so I've tested a lot of the different models before certainly all the top ones I've tested and nothing is this good it kept all the features notice that throughout that whole time nothing got lost nothing just you know ceased to exist each new piece that we've added each new feature and as we built on top of them none of them just disappeared it all remained in the game and during that whole time there were two minor issues both of which Claude fixed on the first attempt I give it the error message and it fixes it I tell it hey you got the wrong you know naming 
convention before and it fixes it it remembers it and it stays fixed this feels fundamentally like a game changer we've crossed some barrier some threshold where it's now accurate enough to make it very useful and if you think about how quickly we got to this point like a year ago we were not really anywhere close to this and now it's working well this is uh this is phenomenal and I got to say a little bit scary all right let's push it a little bit more okay so this is a project we've covered before so it's made by Santiago Valdarrama so he's got a YouTube channel he's on Twitter X and he's got this really cool project so he called it the alloy voice assistant here's the code for that and here's kind of what this does so if I run that file what am I holding up to the camera right now looks like you're holding up a tube of chapstick what am I holding up to the camera right now you're holding up a bottle of Arrowhead water what am I holding up to the camera right now you're holding up a volter x blender bottle describe how this uh blender bottle works the voltar X blender bottle works by using a rechargeable motorized base it's portable and convenient for mixing on the go so I've detached the bottom piece what does the bottom piece do the bottom piece houses the motor and rechargeable battery so that's pretty cool right it's an assistant that is able to look at the kind of how your camera is streaming so it takes those screenshots and then answers whatever questions you pose to it using either the Gemini model or OpenAI's you know GPT-4o or whatever else you want to use that sounds like a pretty advanced setup I agree let's close this out close this out close this out glad we're on the same page all right so that's cool and all but as you can see here there's a lot of code there's a lot of things going on here and by default we just have the Gemini and the OpenAI model that we can use but what if we wanted to change some things up what if we wanted to add 
some functionality now normally how many things you can change would depend on your ability to code so for example if you're a beginner coder you can probably figure out that right now the model is set to gemini-1.5-flash-latest and if we wanted to use a different model for example ChatOpenAI the GPT-4o model right so we would comment this out we would uncomment this and then it would use this model instead of this one but there's a lot of other stuff that would be more difficult to do for example if instead of using the webcam we wanted to use something that screenshots our desktop and then sends it to the model instead so we can comment on what's happening in our desktop that would be you know significantly more difficult all right so first and foremost I'm going to copy this file and I'm going to start a new chat in Claude and we're going to say describe what this Python project does oh that's interesting so when I control+V to paste that whole project it goes into this little pasted text of 171 lines instead of just showing up here which is excellent Claude keeps surprising to the upside it's really good so we're going to hit go and it nails it so it's an interactive AI assistant combines visual and audio inputs with natural language processing so webcam integration check I mean there it is it literally describes what we're doing with the um live stream from the computer's webcam utilizes a microphone to continuously listen for user speech input converts speech to text using OpenAI's Whisper model this is kind of a big deal because it's not super easy to just understand all that like you have to sit there and kind of go through the code to really understand exactly what it's doing what tools it's using then it uses Google's Gemini model or OpenAI's GPT-4 commented out so it picks up on the fact that this model is commented out but you can you know exchange it and it uses that to process user queries and the AI takes into account both the user's spoken 
question and the current webcam usage maintains a conversational history and then generates the text to speech right so when the model answers it generates that into speech that's why we hear the assistant answering us and also has a window for the visual display so that's why you were able to see me on my camera holding stuff up to the camera and then the program runs in a loop constantly capturing webcam footage and listens for voice input so step one I'm going to say rewrite the code and add in-depth comments to every block of code so I can understand what piece of code does what so we're going to go ahead and hit go all right so it's done that took maybe a little bit over 10 seconds we're going to copy this and here's that project file the original one so as you can see here it's just code not too many comments or anything like that so we're just going to copy and paste the Claude output instead so there it is now the green text is the comments so it's telling us exactly what each piece of code is doing right so webcam stream class for handling webcam input initialize the video capture device read the first frame etc all right now let's see what we can do to change up the things within that project so for example one of the things that it does is it uses text to speech to voice the output right so we have different uh voice options by default we're using alloy hence the name of the project but let's say we want to use Shimmer so we'll go into Claude and we'll say right now the voice assistant's voice is set to alloy please change it to Shimmer all right so looks like it just changed the actual code block now I'm sure if we asked it to write everything out it would but I think it's just trying to conserve tokens but let me throw this in there and see how that works boom so let me run that really fast and see if it did that what time is it here it's 12:53 all right so that was super easy right no nothing too complicated let's uh kick it up a notch all right so 
now I've taken that project again pasted it in here so that we're writing from scratch and we're saying rewrite this so instead of using the webcam it takes screenshots and uses those images instead so basically keep the entire functionality but instead of the webcam instead of using the images from the webcam we're using screenshots of our desktop all right looks like we get a little error message I'll just post that back in there it was able to solve it in other instances uh so this should okay so looks like maybe we need to pip install pillow pip install pillow assistant is running tell me what you see it's running but it's missing Pillow you'll need to install it tell me what you see on the screen right now you're missing the Pillow library install it by running the command pip install pillow tell me what you see on the screen right now it's a picture of a man in a white robe standing in the water with ducks he's giving a thumbs up can you guess what TV show this is from that's Tony Soprano from The Sopranos he's feeding the ducks tell me what you see on the screen right now it's a screenshot of a web browser displaying a GIF from the game Commander Keen 4 the character is walking along a path what am I looking at right now you're looking at a Google Chrome browser window the browser is displaying a web page with an image of a Minecraft C room there's a red carpet and a chest can you guess what this code does this code converts text to speech using the OpenAI API and plays it back using PyAudio is this the code that you run on yes that's the code I use to run it's the core of my assistant all right but let's make it a little bit more interesting what if instead of Gemini or GPT-4o we wanted to use the Claude API to use Claude 3.5 Sonnet to be our assistant and to answer these questions for us so again I've pasted the whole project in there and we're going to tell Claude remake this project but instead of Gemini let's use the model Claude 3.5 Sonnet let's test it 
out all right so looks like it's using Claude 3 Sonnet from February 29th of 2024 so we might have to manually update this but all right so it looks like this is the model claude-3-5-sonnet so if I told Claude to put that in there I'm sure it would be able to do that at this point I'm convinced that it would be able to update it but I'm going to go ahead and just put it in manually myself I added that in there and let's run it again to see what happens gives me an error again I'll just paste it into Claude and have it fix it for me of course it's pip install something pip install langchain-anthropic so for people that don't code like it's basically just telling me the command I need to put in you don't even really have to understand what's happening it helps to know what's going on but at this point you know you don't have to what am I holding up to the screen right now your credit balance is too low to access Claude API okay so I reloaded the credits so let's see what am I holding up to the screen right now I paid you so you better answer you're holding up a ChapStick to the screen what do you see written on the paper the image shows a notebook with ball written on it along with a simple eye drawing a microphone is visible in the frame based on what's written on the paper what do you think is the word that uh it symbolizes the word ball likely symbolizes eyeball in this context the drawing of an eye next to the word ball suggests this combined meaning so looks like there's a limit I think it's 3,000 tokens per minute or 10 minutes or whatever the limit is so I can't do too many tests with the Claude 3.5 Sonnet understandable API rate limits can be frustrating when testing image shows a person recording audio likely for a podcast or stream what am I holding up to the camera right now you're holding up an orange screwdriver to the camera what am I holding up to the camera right now you're holding an infrared thermometer it's a yellow and black handheld device typically used for 
measuring surface temperatures. What am I holding up to the camera right now? You're holding a spirit level. What am I holding up to the camera right now? You're holding a blue stud finder or wall scanner device. It has indicator lights on top and appears to be a wireless model. What am I holding up to the camera right now? You're holding a small USB device or dongle. Can you guess what this object is? The object you're holding is a Sharpie marker.

I asked it to create a Doom-like game. I assumed it would just say no, but it said sure, we can do that. But I was not ready for this: it created a playable game in a browser, so you can click on here and just literally play it. Walking around, you can see the map up there in the top left. This is kind of insane, kind of crazy. Add a weapon that's an axe that I can swing to attack. Add a monster that claws me if I get close. And there it goes, writing out the code. All right, so it's saying press space bar to swing the axe, so I guess... ah, it's coming at me, it's coming right for me. It killed me. What difficulty is this? Let's refresh. Oh, it's coming right for me. Okay, so we're going to hit it. Okay, so I can hit with my axe by using the space bar and kill it like this. So if it gets too close, I lose health. Not quite what we were looking for, but not too bad. Let's try this: make the enemy stationary in one location on the map. All right, so that's the enemy, I guess it's in the corner, so I can come up to it and hit it. All right, so it has an error here, and I think that's because it put the monster outside of the bounds of the map. The monster starts outside the map; make sure the monster starts inside the bounds of the map. The other kind of interesting thing I've noticed is, so this ran for a while, maybe 10-plus seconds. Take a look at how much code it writes to kind of satisfy all the stuff that we're asking for. All right, so if we go here to preview, the monster is sitting right there. All right, so yes, it is within the bounds of the map, so it's
kind of in the corner here. So we approach it... well, it killed me. So I guess if I get too close, within I think it's 150 units or whatever, then it starts hurting me. Let's see. All right, so we approach it and we start hitting it, and I win. I got to say, there's a lot of good things happening here. It's not perfect, but I think what I need to do is start from the beginning and kind of explain exactly what I'm looking for, because this isn't like the snake game, where it's just a little bit more simple. Here I have to explain a little bit more. But I got to say, Claude really kind of gets the issues, how to fix certain things, and as long as I can explain what I want, it should be able to create the stuff. Again, very, very solid performance here. Not perfect, but very solid, and I really do feel like this is a skill issue on my part. If I spend a few hours messing around with it, I will be able to create stuff like this, no problem.

All right, so here's their blog post on anthropic.com about the 3.5 Sonnet release. And so the 3.5 is their forthcoming model family, so kind of the next step in the evolution, and this is their first release in that 3.5 family. And again, it's looking like Claude 3.5 Sonnet is available for free, so similar to how OpenAI is doing it, they're allowing their best models to be free for everybody. You don't need a subscription to access it, but if you do have a subscription, you get significantly higher rate limits. And of course you're also able to access it through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI as well. And the model costs $3 per million input tokens and $15 per million output tokens, with a 200,000-token context window. So if we're comparing that to OpenAI, so GPT-4o, the latest model, input is $5 per million tokens versus Anthropic's three per million, so OpenAI is almost double the price. And for GPT-4o, $15 per million tokens for the output, same as Claude 3.5 Sonnet. So Anthropic wins a little bit on the input and matches
on the output. Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus, so it's much better, much, much smarter, twice as fast, and less expensive.

Here's the little clip about state-of-the-art vision. Looks like the other student is asking, genomics slides tonight, queen? So the student pulls up Claude and it says, good evening, Sam. There's so many things to unpack there, I'm not going to even go into it. Keep in mind, Anthropic broke off of OpenAI. They broke away to kind of form this company so they can create something that's their own, that's, you know, safer. So either I'm reading way too much into this or they're packing in actual diss tracks in their product releases now, but I'll leave it alone. Let's continue. And they kind of walk through a step-by-step of how to use this AI assistant to complete homework assignments using vision, uploading documents, and collaborating, working together, going back and forth with these ideas to output the final product. Which, I got to say, this is the future. It's becoming pretty obvious that we're all going to have these sorts of assistants that help us out in whatever computer tasks we're doing. And with this release, I got to say, Anthropic is killing it, because there's a lot of these really cool, really intuitive things that are added in there that just work. I definitely feel like they are at this point gaining, or, I mean, with this release it certainly feels like they've passed OpenAI.

You know, I've shown this in a different part of the video, but if I copy this text on the right, this long code that's in there, and I paste it into Claude, now normally what would happen is it would be pages and pages just pasted into that prompt window, right? But no, look: instead it creates almost like a little side document, or whatever you want to call it, that contains that code in it, leaving the prompt window clean. Now, this might not feel like a big deal at first, but to me, as somebody that uses this stuff quite a bit and
sometimes for fairly complicated stuff where you got to go back and forth, it's the little things like this that really smooth out the process. Because something like this, it'll save me a few seconds when I do it, maybe it'll save me 5 to 10 seconds of, you know, looking over, making sure everything's formatted correctly, making sure my question's at the top and then pasting the thing below, but it also prevents certain errors from happening. And just from a mental-space perspective, it kind of helps keep everything organized. You multiply that across, you know, 100 touch points that you have with the software, and it kind of becomes a big deal. Cumulatively it becomes very important. Now, I'm sure this will get copied quickly, meaning the design will get copied across everybody pretty quickly, but it's the little things like this. I got to applaud Anthropic on this. They're not messing around; they are in it to win it.

But coming back to this: so Claude 3.5 Sonnet, they're comparing it to Claude 3 Opus, the big previous model, and GPT-4o as well as Gemini 1.5 Pro, so literally the best models in the space. So here this new model dominates, being the best in class on four out of the five tests that they're showing here: quite a bit on the visual math reasoning, just a hair better on the science diagrams, slightly behind GPT-4o on the visual question answering, absolutely killing it on the chart Q&A. That's one where they struggle a lot: speedometers, rulers, charts. Something about tracing lines is hard. You know, if I pointed right here where my mouse cursor is, you'd probably be able to guess what that number would be just visually. From what I've seen, the vision models tend to be really bad at things like that. And then absolutely incredible on the document visual Q&A.

Then they talk about Artifacts, and I got to say, Artifacts is the cherry on top of this beautiful, beautiful cake. It is good, it is very good. So we coded up a game where we have the code and the images that show up here in this
artifact window that pops up, and all of them are saved, so you can go back, you can get all the images that you want, whether that's charts or graphs or pictures of evil monsters, as well as the code. Now, if I had to nitpick, I would say I didn't see an option to quickly download. Like if it generates 20 images, I don't think there's a button that just downloads all 20. Maybe I missed it, but it's not a big deal, and I'm sure it's going to be a pretty easy addition. Everything else is just absolutely phenomenal. Also the fact that, if you're coding, it generates the code in one window and then writes up the instructions and the explanation of what it did; that also ends up being very, very helpful, because you can grab the code, you can start pasting it wherever you need to paste it while it's generating the write-up on the code, the natural-language description of what happened. That feels a lot more natural, a lot more intuitive. The multi-window output for different stuff, again, seems like a little thing, but it's just out of this world. When you start messing around with it, you realize how much easier it makes things. Those little polishes that make this thing really come together.

They're saying that in terms of safety and privacy, this model is still top of the line. They're saying, despite Claude 3.5 Sonnet's leap in intelligence, their red-teaming assessments have concluded it still remains... and if you're not aware of this, I think this is Anthropic that came up with this, and the founder, Dario, he does talk about this. So you have these AI safety levels, right? So ASL-3 has, you know, significantly higher risk, right? So basically those might have a substantially increased risk of catastrophic misuse, right, somebody using it to hack or develop weapons or some weird stuff that we don't want them doing, or that show low-level autonomous capabilities. And the four and five and plus levels are not quite defined yet, they're speculative. And two is kind of where our
present large language models kind of reside right now, and so Claude 3.5 Sonnet is still in that second tier of safety, so we're good. And of course Claude 3.5 Haiku and Claude 3.5 Opus will be released later this year, so sometime in the next 6 months or so. I can't wait to try out Claude 3.5 Opus; that's going to be interesting to check out.

Solve this word puzzle. What do you think it means? The image doesn't show a specific word to sound out. It displays words and funny words repeated. These likely refer to amusing or peculiar terms that sound odd or make people laugh when spoken aloud. How many times is the word funny written, and how many times is the word words written, and what could that mean? Funny appears twice, words appears four times. This layout could suggest a playful emphasis on humorous language. This translates into a four-word sentence. It starts with too funny, and how does it end? Too funny for words. All right, and what do you think this means? Man overboard. It's a nautical phrase used to alert the... What do you think this means? Word boards. The letters are jumbled, but when rearranged they spell word boards. It actually means that there's a bear inside of something. Can you tell what the saying is about the bear? Bear on board. The jumbled letters WB o e o d RS can be rearranged to spell bear on board, which is a play on the common Baby on Board sign. So the solution is bear in the blank. What is the blank? The blank is woods. The full phrase is bear in the woods, which can be formed by rearranging the letters in w r s. What could this mean? Long hill. The letters are arranged vertically, spelling hill from top to bottom. The vertical arrangement suggests height or length, implying a long hill. The fact that it's spelled from top to bottom, could it mean something else? Downhill. The vertical arrangement of hill from top to bottom suggests a descent or downward direction, hence downhill. What could this mean? Grass over me. The word grass is shown in green, suggesting the phrase grass
over me, which typically means to be buried or forgotten. That's not it. What else could it mean? Grass is greener. The word grass in bright green could suggest the common phrase the grass is greener on the other side. I think it just says green grass. What does this mean? Major league. The word league is often used in sports... What does this mean? The image shows the word promises written in a fragmented or broken style. This could suggest unfulfilled promises, broken promises, or the complexity and fragility of commitment. What does this mean? Knock on wood. The word knock is above timber, which is another word for wood. What could this mean? Playing side by side. The word playing is in a box with side written vertically next to it, suggesting the phrase playing side by side. It's something else. What else could it mean? Playing inside. The word playing is inside a box, and side is written vertically next to it, suggesting playing inside as the intended phrase. That's close, but the answer is something different. What else could it mean? Playing outside. The word playing is inside a box.

So in summary, I would say, and you know, keep in mind this is about 6 hours after release, so I've been messing with it for just under 6 hours, so far everything about it seems good. I got to say, it's the best model so far. And again, maybe once I use it for a few days I'll start seeing more problems, maybe I'll get disillusioned with it, but as I'm sitting here right now, at the end of the first day that it's been released, this is probably the model that I've been the most excited about since GPT-4 came out. That was the big one that kind of blew everyone's expectations away. No other ones really did stoke the same kind of excitement. Not Claude 1 or 2. GPT-4 Turbo was cool, Gemini 1.0, sure. Claude 3 Opus certainly ruffled a few feathers; that was very interesting. There's definitely something there, some sort of spark of something. A lot of people thought that it's self-aware, it's conscious,
whatever, there's talks about that. I don't think so. I think they just did a really good job of fine-tuning and training it so that it's just a little bit more, I don't know, personable, but I can certainly see why people got kind of carried away with it. There were times when using it for various applications where you kind of went, whoa, something's happening there, something new. Then we didn't have too much excitement after. GPT-4o in and of itself wasn't that exciting, because really what people got so hyped up about was the voice mode, which hasn't been activated yet. So we keep seeing demos of real-time voice conversations with these assistants, but it's been announced, they said in the next few weeks, whatever, we still haven't seen it, frustratingly. And now comes Claude 3.5 Sonnet, and as I'm sitting here right now, I feel like it might be as monumental as GPT-4 was, specifically if we're looking at coding. And again, I got to do a lot more testing and make sure that's the case, but I'm seeing a lot of people on Twitter, and I'm seeing my own results, and everybody's kind of blown away.

Here's Alex Albert, so he's at Anthropic AI, kind of showing all the stuff that he's been able to do with this new model, and here's an interesting thing he points out. He's saying Claude 3.5 Sonnet is the first model I've seen change the timelines of some of the best engineers I know. This is a real quote from one of our engineers after Claude 3.5 Sonnet fixed a bug in an open-source library they were using. And so here's that engineer, made anonymous. He's saying this is pretty unprecedented for me. Usually with problems of this level of complexity, the most Opus can do, right, so the previous big model, their best model up until today, so he's saying the most Opus can do is start me down a path or give me a couple options to try myself. If I push Opus too hard, it'll start to hallucinate solutions, start making predictable mistakes, or go in weird places. We've all been there. This is the
first time that a model has, like, gone the distance, which is literally the same thing that I've experienced with the few tests that I've done. It goes the distance. There isn't a cliff once you get too complex where it just crashes and burns. It just freaking goes. He's saying my goalposts have been permanently shifted by this interaction. Again, Claude makes you feel like you have superpowers. Suddenly no problem is too ambitious. The future of programming is here, folks. And yes, this is one of the guys on the team, and yes, they're hyping up their own product, and yes, you should all take it with a grain of salt, but man, I do feel like this is maybe spot on, that this isn't hype. The leap in coding feels gargantuan. It's not reflected very well in the benchmarks, because the benchmarks move up, you know, 10%, 5%, 2%, whatever, but what that translates to when you're using it is massive. Some people in the comments on Twitter are saying that, yeah, it's an incremental improvement, and sure, when you put it on a chart it does look incremental, but when you're using it, when you're testing it, it doesn't feel like an incremental improvement. It feels like a step function. It looks like we went up a step.

Anyways, I'll get off the hype train, but I do encourage you to try it out for yourself. See if you can code something up from scratch. Start simple, then keep adding functionality on top of it. Add graphics, have it generate the graphics, or grab some ambitious project from GitHub and see if it's able to break it down for you, if it's able to add functionality while keeping everything working smoothly. I'm sure there'll be a million things that it can't do, that it will fail at. We've seen a few here. But I got to say, I'm feeling a lot more optimistic about the future of programming. Optimistic, or I guess pessimistic if you're looking at it from a perspective of jobs going away. I think this is definitely going to empower the best engineers to do more, but it's also going to allow a
lot of people with little to no coding background to do some cool things, to build some useful things for themselves. So my hat's off to Anthropic. You guys keep doing whatever you're doing, because it's working out just fine. With that said, my name is Wes Roth, and thank you for watching.
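The pricing comparison quoted in the video ($3 versus $5 per million input tokens, and $15 per million output tokens for both models) can be worked through with a short sketch. The prices are the ones stated in the transcript; the `request_cost` helper and the example workload of 2 million input tokens and 500,000 output tokens are hypothetical and only there to make the arithmetic concrete.

```python
# Per-million-token prices in USD, as quoted in the video.
PRICES = {
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
    "GPT-4o": {"input": 5.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload (hypothetical helper, not a vendor API)."""
    price = PRICES[model]
    input_cost = (input_tokens / 1_000_000) * price["input"]
    output_cost = (output_tokens / 1_000_000) * price["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload chosen purely for illustration.
    example_input, example_output = 2_000_000, 500_000
    for model in PRICES:
        cost = request_cost(model, example_input, example_output)
        print(f"{model}: ${cost:.2f}")
```

For this assumed workload the output cost is identical for both models, so the whole difference comes from the input price, which matches the point made in the video that Anthropic wins on input and ties on output.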
Info
Channel: Wes Roth
Views: 106,915
Id: _mkyL0Ww_08
Length: 45min 47sec (2747 seconds)
Published: Fri Jun 21 2024