Claude 3.5 Deep Dive: This new AI destroys GPT

Captions
All right, can it create a 3D first-person shooter? Oh my God. Can it create a 3D interactive particle cloud? Oh my God. All right, can it convert this very boring financial report into an interactive infographic? Oh my God. Can it create an audio visualizer that syncs with any audio that I upload? Holy smokes, this is just insane. All right, I'm going to take a screenshot of a website, just plug it into here, and get it to recreate this using code. Okay, I'm just mind-blown again, this is crazy. So a few days ago, Claude 3.5 Sonnet was released, and this is by far the best AI model out there. It just blows all the existing models, including GPT-4o, out of the water. Now, instead of posting a video right away, I actually spent the past few days testing it out to see what cool and creative things you can do with it, and also to test its limits, so that's exactly what I'm going to share with you today. We'll go over the specs in a second, but let's just jump right in so I can show you the cool things that it can do.

So all you've got to do is go to the Claude website, which I'll link to in the description below, and then sign up for a free account. Once you sign up, you're going to see this Artifacts window. It's really important to click into it and enable Artifacts. This basically allows Claude to generate presentations, designs, tables, and code in a separate window alongside your chat. Once you have this on, we can start a new chat. Now, you can do regular things with this chatbot like you would with ChatGPT: for example, get it to summarize things, paraphrase things, ask it questions, ask it to write an essay, ask it to translate stuff, you know, normal stuff. But it can do a lot more than that.

So let's start off by getting it to create a snake game. I'm going to simply prompt it with "create a snake game using Python". This is a really simple prompt, but none of the other AI models out there could create a fully functional snake game that works on the first try, except for GPT-4o and Llama 3 to some extent, and even those two are not great. All right, so let's see if this actually works. Down here in the bottom right corner I can just copy the entire code, and then in VS Code I'm going to create a new file, call it game.py, paste in the code, and click Run. All right, so the game is running. I'm going to use my arrow keys to move the snake, as you can see here, and when I eat the food I do get longer, so that works very nicely. Now let's see what happens if I hit the wall. That's exactly what it should do: if I hit a wall, I lose the game. Let's press C to play again. Now I'm going to eat enough food to get really long, and then I'm going to try to hit myself and see if I lose the game. Note that most of the other AI chatbots, except for I think GPT-4o and Llama 3, aren't able to understand that if I hit myself I should lose. All right, so you can see if I touch myself I also lose the game, and this is exactly what should happen. So I'm really impressed: it built a perfectly functional snake game zero-shot, which means I only prompted it once. I didn't need to follow up with anything, and it was able to successfully create the snake game in Python.

But you can do much more than that, and here's the beauty of Claude 3.5: you can iteratively add more features to your game and it won't break your existing code. For example, let's say "add a scoreboard to the game". Again, it's just a really simple prompt. I didn't even say that it should add points every time I eat the food; I'm just assuming it's smart enough to understand this. Here it's explaining all the additions, but I'm just going to copy the entire code, go back to VS Code, select all, delete my existing code, paste in the new code, and click Run. And voila, here we have a scoreboard. Let's see if I eat the food: wow, I get 10 points, 20 points. Oh, this one is challenging. Perfect. So you can keep adding more and more features to your game, and Claude can add these to your code without breaking your existing code. I'm going to quit this game first.

Let's try something even more challenging. I'm going to search for "audio visualizer" in Google Images and pick one that I like. I like the look of this one, so I'm going to take a screenshot of it, paste it into Claude 3.5, and write: "create a single HTML page that lets me upload an audio file and then sync that audio with a visualizer like the attached image. Don't use unsupported libraries." That last part is to make sure that it works natively in Artifacts. All right, let's see what we get. Everything looks good so far. We got this upload button, so I'm going to upload this song that I created using another AI tool called Udio; check out this video if you want to learn how I made this song. [Applause] feel the light you know you're like like this oh [Music] baby. And you can see this is indeed an audio visualizer that matches my uploaded image. This is really impressive. Now, we could decrease the sensitivity so that the lines don't exceed the edges of the frame, but it's already very impressive that it's able to build this with just one prompt. All right, so let's say I don't like the look of this circular visualizer. I'm going to Google another visualizer, and I like the look of this one, so I'm going to take a screenshot of it, paste it in here, and write "make the visualizer look like this instead". Just a really simple prompt; let's see what it can do. All right, our code is ready, and I'm going to upload the same song. [Music] baby feel the light you look like you like [Music] this. And here we have a visualizer. This doesn't look exactly like the image I uploaded, but the shape and colors do match to some extent. Very nice. Next I'm going to write "add settings to customize the sensitivity and the colors of the visualizer", and you can see it's running its magic now, and this is really fast compared to other tools, including ChatGPT. All right, now that it's finished generating the code, you can see not only can I upload my audio file, there's also a sensitivity knob, a start color, and an end color. So I'm going to upload the same song, adjust the sensitivity, and adjust the start color and the end color. baby feel the light you know you like it like this oh baby night. And there you have it. I am just so impressed by this. I've probably used the word "impressed" many times in this video already, but that's exactly what I feel right now.

Now let's try something even crazier. I am on the homepage of Spotify, so let's take a screenshot of this. Going back to Claude 3.5, I'm going to paste in the screenshot and write "convert this UI design into frontend code". Really simple prompt; let's see if it can pull this off. Oh my gosh, and here we go. Isn't that crazy? Yes, it doesn't pull the exact images of the artists or the Spotify logo, so you have to add those in yourself, but within seconds you can duplicate this wireframe from Spotify. Isn't that crazy? Now, this is only front-end code, of course. There's a lot more to a website, such as linking the data from the back end to the front end, but the fact that it's able to recreate this page from a screenshot within a few seconds, from one prompt, without refining it any further, is just mind-blowing.

Now let's try something even crazier. I'm going to prompt it with "create Tetris game using Python". Again, Tetris is a lot trickier than a snake game, so if it's able to pull this off zero-shot, meaning I don't need to prompt it further and it just creates a fully functional Tetris game in one go, I would be very, very impressed. All right, so it says use the left, right, and down arrow keys to move the pieces and the up arrow to rotate a piece, and the game ends when a new piece can't be placed at the top of the grid. I'm so excited to try this out. So again, I'm going to copy the entire code, and then in VS Code delete everything that's here, paste this Tetris code in, and click Run. Oh, now I am hitting an error. This is quite a complicated game, so it could not get this in one shot. I'm just going to copy this entire error message, paste it back in here, and see if it works. Again, with this tool you don't really need to learn how to code. You don't need to understand what on Earth is going on here. With AI, all you need to do if you hit an error message is paste it into the chatbot, rinse and repeat, and eventually you're going to get this game to work. So I'm going to copy the contents, paste them in here again, click Save, and then click Run. And wow, this time it works. Wow, this is really good. I hate these shapes. Oh my gosh, this really is Tetris. Now, as you can see, I suck at Tetris, so let me try to form a full line and see if the line disappears. Oh, I hate these shapes, I really hate these shapes. Why did I do that? Oh my goodness. All right, I'm going to form a new line and let's see if it disappears, and yes, it does. Wow, this is so cool. All right, so I'm going to try to lose the game now, so if I hit the top, wow, perfect, that is so cool. So with just two prompts I was able to build a fully functional Tetris game, with all these different shapes and colors and a scoreboard, and it was able to generate this perfectly. None of the other AI models, including GPT-4 and Llama 3, could create a fully functional Tetris game with just two prompts. This is truly impressive.

And Tetris isn't the only type of game that Claude 3.5 can create. This user created an entire 3D first-person shooter, similar to the game Doom, in just three prompts, and it comes with a complete generated map, sound effects, and zombies that come after you. How insane is that? This is so impressive, and definitely no other AI model can create such a game in just three prompts. And imagine what type of game you could create in the end if you keep iterating, if you keep prompting it further to add new features. This honestly unleashes so much creativity.

But that's not all it can do. Here is something even cooler: you can create entire presentations within this chatbot. For example, let's write "create a JS presentation on the health implications of coffee" and see if it can do this. Wow, look at that. Isn't this insane? It created this entire presentation with just one prompt. So let's see what it wrote: "Health implications of coffee. Coffee is one of the most popular beverages worldwide." All right, so slide two, slide three, four, etc. Now of course you can style this up, so for example: "use chill aesthetic colors, add images and charts where appropriate". Let's see if this works. All right, so it's adding a lot more detail now. Very nice. Here you can see it's just using a placeholder, so I can go in and add some images of coffee afterwards, but wow, look at that. Let me go back to the previous slide. Note that when I go to the slide with the table, it even animates the bars. Holy smokes, this is just so impressive. So forget having to manually set animations in Microsoft PowerPoint when you can do this. How cool is that? If you're a student, or if you're at work and you need to create a presentation, all you've got to do is upload a document here with all the info you need in the presentation, and then prompt it to create a full presentation for you. It's as simple as that.

Now let's say you want to create infographic reports. I'm taking the 10-Q report from Tesla, which is basically their financial report for the first quarter of 2024. It's very boring; it looks like this. I'm going to save this as a PDF, and then back in Claude I'm going to upload the PDF and say "create an interactive two-page infographic on the attached document". Let's see if it can do this. All right, so it's setting up the code now. Holy smokes, that is crazy. It even comes with symbols, it gives you the key performance metrics, and these charts are interactive. Let me scroll down a bit. That is just crazy. It took all this info from this boring document, teased apart all these numbers, and gave you the key metrics. Let's check out page two: here it lists the key highlights and outlook. So I am just absolutely mind-blown by how impressive this is. If your job is to create these reports or presentations, think of how much easier this is going to make your life. Before, you'd probably need to spend at least an hour compiling this report and then designing the PDF or the presentation, but with this you can just plug in a document and it spits out a fully designed report for you in a matter of seconds.

All right, let's try something else. I've used this tool to create a diagram of a neural network. Now let's say I want to use this for an animation in an educational video. All I have to do is take a screenshot of this, go back to Claude, and paste the screenshot in here (I'm just pressing Ctrl+V). And here's another trick: instead of thinking of what prompt to type myself, I'm going to ask Claude 3.5, "what prompt should I write to get you to generate an animation of this diagram?" Now, to save some credits, I don't want to ask this directly in Claude, so I'm using another tool called Poe, which also has Claude 3.5. However, Poe's version does not have the Artifacts window that previews the code it generates, which is why I use Poe's Claude 3.5 just for text prompts, but it's essentially the same thing: it's also using Claude 3.5. So in Poe I'm simply asking it to give me a prompt to create an interactive animation from the attached neural network diagram to use with Claude 3.5 and Artifacts, and it suggested that I use this prompt. I am just going to copy the whole thing and paste it into Claude. The prompt is: "Using the neural network diagram I've shared as a reference, please create an interactive HTML/JS animation that demonstrates the flow of data through this network. It should include: a visual representation of the network structure matching the layout in the image; animated paths showing data flowing from the input layer through the hidden layers to the output layer; the ability to input sample data into the five input nodes" (I'm not sure what this would do, but let's just leave it); "visual feedback showing how the activation of nodes changes based on the input; and a simple UI to control the animation speed and reset the simulation." All right, so let's click Enter and see what it gives us. There's a lot of code that it's generating, so this seems like quite a complex animation. Wow, this is crazy. All right, let's see how we use this.

Let me tell you about this awesome AI assistant called ChatLLM by our sponsor, Abacus.AI. You can try it for free via the link in the description below. ChatLLM is an awesome way to use different LLMs all in one place. This includes the newest GPT-4o, Meta's Llama 3, Anthropic's Claude Opus, and more. Not only can you chat with it like a regular chatbot, but it also retrieves the latest data from the web, ensuring that your output is the most up to date. You can also get these LLMs to generate images for you right in the chat, so there's no need to head to a separate image generation platform. You can also create custom AI agents designed to perform specific tasks: whether it's automating customer support, generating reports, or any other function, your custom AI agent will handle it with precision. And collaboration is made easy with ChatLLM: you can invite team members to join the same chat thread, ensuring everyone is on the same page and can contribute to the chat. Moreover, ChatLLM integrates seamlessly with various enterprise platforms such as Slack, Teams, and more, so you can incorporate AI into your existing workflows without any hassle. Experience the power and versatility of ChatLLM by Abacus.AI today; try it for free via the link in the description below.

Now back to the video. This network structure matches the layout in the image, with four layers: the first layer has six nodes, each hidden layer has eight nodes, and then there are four output nodes. And that's exactly what we have: six nodes here, eight nodes in each hidden layer, and four nodes in the output layer, which is exactly the node count of my original image. Then, "animated data flow: particles represent data flowing through the network". Let me press Start and see what that does. Whoa. All right, so particles represent data flowing through the network, moving from the input layer through the hidden layers to the output layer. It seems to be stuck in the first hidden layer. Let me try again. All right, it seems to be stuck there, but anyway, let's continue. "Input simulation: the animation automatically generates random input data for the five input nodes. In a more advanced version, you could add input fields for user-defined data." All right, very cool. Well, it seems like the particles are stuck at the first hidden layer, so let me just type this and see if it can fix it: "the particles are stuck at the first hidden layer". All right, so let's see if it can fix it. Let's click Start. Whoa, that is crazy, and note that the numbers in these nodes update as well. And if we adjust the speed, oh my God, I am just so impressed by this. You can see how easy it is to take any diagram and animate it, for example to make an educational video. If I adjust the speed to be faster, you can see it now goes really fast. If I press Stop, it stops. If I press Reset, the numbers reset to zero, and if I press Start again, the data flows through this neural network again. This is just so impressive, honestly.

All right, let's make something even crazier. I'm going to write "create an app in one HTML page that can be used in Artifacts. Make an interactive 3D particle cloud with a maximum of 100 particles." Then, to make sure it works in Artifacts, I'm going to write "use Three.js for the simulation". This is a JavaScript library that renders 3D objects for the web. And just to make sure it works in Artifacts, I'm going to write "do not use unsupported or third-party libraries or functions; create your own functions", because I want this page to be standalone. I just want it to work off the bat without pulling from any other dependencies or APIs. So let's click Generate and see if it can do that. Whoa, and here we go. Let's see what we can do. "Users can resize the browser to see the particle cloud adapt to different screen sizes. Observe the particles' movements and interactions within the 3D space." So if I click into this, does it do anything? No, it does not. All right: "if you'd like to modify or enhance this particle cloud, here are some ideas: add color variations to the particles, implement user controls to adjust the particle speed or count, add mouse interaction to affect particle movement." Yeah, let's paste this in. I'm just going to copy these three points and paste them in here. Let's see what else we can do: "add mouse interaction to affect particle movement and camera movement". Let's see if it can do that. Let's click Generate and see if it can pull this off. By the way, it's already super impressive that it can create this floating particle cloud with just one prompt. Holy smokes, and it does exactly that. So here we changed the particles into different sizes. Let's try to increase the particle count, and yes, as I drag it lower, you can see the particles decrease in number, and as I drag it to like 200, you can see we get a lot more particles. And then particle speed: this is crazy. You can see as I increase the speed, these particles move a lot faster and bounce off this virtual wall, and the movements look very smooth. And if I decrease the speed, you can see the particles move a lot slower. And then look at this: "mouse movement now affects both particle movement and camera position. The camera smoothly follows the mouse cursor, creating a parallax effect." So yes, it does: you can see as I move the cursor, the particles in the cloud also follow my cursor to some extent. That is just so cool, I hope you're seeing what I'm seeing here. It's a very subtle movement, and of course you can add an additional prompt to make it more sensitive, but that is just so cool.

And by the way, you can always revert to a previous version. Down here you see "version 2 of 2"; if you click here, it goes back to version 1, and here you can copy the code of version 1 and do whatever you want with it. If you go back here, here's version 2: here's the code of version 2 and here's the preview of version 2. This Artifacts window is not really AI; it's just a built-in code visualizer, but I really love this interface. The problem I've experienced with other chatbots like ChatGPT or Poe is that whenever I create some code, I need to copy the whole thing, paste it into VS Code, go back to the chatbot, refine it further, copy that new code, paste it back into VS Code, rinse and repeat, and it's just not very convenient. Here they've really streamlined it: you can see the code side by side with your prompt and its explanation, and you can iterate on your code in the same window before finally pasting the final code you're satisfied with into your project, which lives somewhere else. So I really like how they designed this user interface; it just makes things very convenient.

All right, so let's go over the specs of Claude 3.5. Here they say they are launching Claude 3.5 Sonnet, their first release in the forthcoming Claude 3.5 model family. 3.5 Sonnet is now available for free on the Claude website and iOS app, while subscribers can access it with significantly higher rate limits. So they're kind of doing the same thing as OpenAI, which also offers its most cutting-edge model, GPT-4o, for free to all users, but the free plan has limits, and if you want to use it more, you need to subscribe. It's also available via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI, and it has a 200K-token context window, which is more than enough for most tasks.

All right, so here's the section on intelligence. We'll go over the specific benchmark scores of Claude 3.5 in a second, but note that the version they just released is the Sonnet version. If you refer to the previous generation, Claude 3, there are actually three different versions. The smallest and fastest one is Haiku. Haiku has fewer parameters, and therefore it runs faster, but as a result it's less intelligent. The mid-tier model is Sonnet. Sonnet has somewhat higher intelligence than Haiku because it has more parameters, but at the same time it's going to cost more and infer a tad slower. And their biggest model, which was previously Anthropic's leading model, is Claude 3 Opus. This has the highest parameter count and is the most intelligent of all the models, but of course it costs more to run. Now, the crazy thing is that this new-generation 3.5 Sonnet, which is just the mid-tier model in this family, has already significantly outperformed the highest-tier model, Claude 3 Opus. They haven't even released Claude 3.5 Opus yet, so once that is released, it's going to be way more intelligent than the Sonnet version we're seeing right now. This is just insane progress: this new-generation 3.5 is not only way smarter than the top-tier model of the previous generation, but it's also a lot cheaper than Claude 3 Opus. Here it says Claude 3.5 Sonnet sets new industry benchmarks for graduate-level reasoning, undergraduate-level knowledge, and coding proficiency, and we've definitely seen that it can indeed code very well. It operates at twice the speed of Claude 3 Opus, which, again, is the best model of the previous generation. This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as customer support and orchestrating multi-step workflows. And that is indeed what we've seen: as we code up a project, it's able to take our feedback and iteratively add new features without breaking it, which is an example of a multi-step workflow.

So let's jump in and see the benchmarks. Across all of these benchmarks it just destroys Claude 3 Opus, and across most of them it also beats GPT-4o, except for undergraduate-level knowledge, where in the zero-shot setting (that is, without any worked examples in the prompt) GPT-4o is a tad better. But for coding, Claude 3.5 is better, same with multilingual math, same with reasoning over text. Interestingly, for math problem solving GPT-4o still beats Claude 3.5 Sonnet, and we have seen GPT-4o solving a math Olympiad problem, so it is indeed very good at math problem solving. There are a few other benchmarks here, but basically the takeaway message is that for most of these benchmarks, Claude 3.5 Sonnet beats not only the biggest model of the previous generation of Claude but also GPT-4o, which was the leading AI model.

If you go to LMSYS, which basically ranks all the major AI models based on blind tests by users, you can see GPT-4o is, or was, number one. Notice that Claude 3.5 isn't on here yet, and that's why GPT-4o is still number one in this table. I'm actually not sure why Claude 3.5 hasn't been added yet; if you know why, please let me know in the comments below. However, there's another leaderboard called LiveBench, which the authors claim is a contamination-free benchmark. This matters because some AI models might be trained on problems very similar to the benchmark questions, and if that's the case, those models would be biased toward solving those particular problems and therefore get inflated scores on these benchmarks. For LiveBench, they claim the benchmark does not face this issue. If you scroll down to the leaderboard, note that Claude 3.5 Sonnet basically destroys GPT-4o across all these metrics, including reasoning, coding, mathematics, data analysis, etc., and some of these are huge leaps. For example, for reasoning GPT-4o only got 48, and surprisingly GPT-4 Turbo is actually slightly better at reasoning with a score of 55, but Claude 3.5 Sonnet still blows it out of the water with a score of 70. Same with coding: this is by far the best model for coding, at least according to LiveBench. The GPT-4 models are only hovering at around 46 to 47, but Claude 3.5 Sonnet is way better with a score of 63. And that seems to be the sentiment of people who've used it so far: everyone's reactions have been quite positive, and most people have been saying Claude 3.5 Sonnet is noticeably better than GPT-4o, especially for coding and reasoning.

Now, Claude 3.5 is a closed-source model, so we don't really know what the architecture is, but the team has revealed some insights into the model. For example, this person, who is head of product at Anthropic, says 3.5 Sonnet is larger than its predecessor but draws much of its new competence from innovations in training; for example, the model was given feedback designed to improve its logical reasoning skills. Very interesting. And in another article, the same person says that the improvements are the result of architectural tweaks and new training data, including AI-generated data. Which data specifically, he would not disclose, but he implied that Claude 3.5 Sonnet draws much of its strength from these training datasets. This is a recurring trend we're seeing in the latest AI models. Generally, the more data you have, the better the model will be, which is what the so-called scaling laws describe. But the problem is that even with older generations of AI models, we've pretty much trained them on all the data from the internet already, and that data is not enough; we need more and more data to make AI models more intelligent, all else being equal. So how do we get this new data? Well, it turns out you can get AI to generate synthetic data, and as long as that data is clean and high quality, you can append it to the training set to create a more intelligent AI model. He also implied that not only did they use synthetic data, but they also made some architectural tweaks. If I were to guess, there's probably something agentic going on, maybe mixture of agents or something, but we don't know the full details. And finally, they say they will release 3.5 Haiku, the smaller model, and 3.5 Opus, the bigger model, later this year. So, really exciting times. Just from the performance of 3.5 Sonnet, it's clear that we aren't even close to hitting a plateau with these LLMs. We're not seeing diminishing returns; each newer generation just gets smarter and smarter, which is really exciting.

And there are so many cool things you can do with 3.5, such as creating games, visualizations, reports, and presentations; the sky's the limit. So definitely take advantage of this and play around with it, it's totally free to do so. That sums up this new AI model, Claude 3.5 Sonnet. Let me know in the comments what you think of it, and if you've had a chance to play around with it and have created some cool projects, you're also welcome to share them in the comments below. I'd love to learn what you built with it. As always, if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, we built a site where you can find all the AI tools out there, as well as look for jobs in AI, machine learning, data science, and more; check it out at ai-search. Thanks for watching, and I'll see you in the next one.
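The snake game demo in the transcript comes down to a handful of rules that the speaker checks by hand: the arrow keys steer, eating food makes the snake longer, hitting a wall or the snake's own body ends the game, and (after the follow-up prompt) each piece of food is worth 10 points. The code Claude generated isn't shown in the video, so the sketch below is a hypothetical, headless model of just those rules in plain Python; the class name `SnakeGame`, the `step` method, the grid size, and the "ignore 180-degree turns" rule are all assumptions for illustration.

```python
from collections import deque

# Arrow-key directions as (dx, dy) grid steps; y grows downward like screen coordinates.
DIRECTIONS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
POINTS_PER_FOOD = 10  # matches the 10, 20, ... scores seen in the video


class SnakeGame:
    """Hypothetical headless model of the snake rules; no rendering or input loop."""

    def __init__(self, width=20, height=15, start=(5, 5), food=(8, 5)):
        self.width, self.height = width, height
        self.body = deque([start])  # body[0] is the head
        self.direction = "RIGHT"
        self.food = food
        self.score = 0
        self.game_over = False

    def step(self, key=None):
        """Advance one tick, optionally turning first. Returns False once the game ends."""
        if self.game_over:
            return False
        if key in DIRECTIONS:
            kx, ky = DIRECTIONS[key]
            cx, cy = DIRECTIONS[self.direction]
            # Assumed rule: ignore a 180-degree turn, which would drive the head into the neck.
            if not (len(self.body) > 1 and (kx, ky) == (-cx, -cy)):
                self.direction = key
        dx, dy = DIRECTIONS[self.direction]
        head_x, head_y = self.body[0]
        new_head = (head_x + dx, head_y + dy)

        # Rule 1: leaving the grid (hitting a wall) ends the game.
        if not (0 <= new_head[0] < self.width and 0 <= new_head[1] < self.height):
            self.game_over = True
            return False

        ate = new_head == self.food
        # Unless the snake grows, its tail moves away this tick, so moving into
        # the current tail cell is legal; check only the cells that stay occupied.
        occupied = list(self.body) if ate else list(self.body)[:-1]
        # Rule 2: running into your own body ends the game.
        if new_head in occupied:
            self.game_over = True
            return False

        self.body.appendleft(new_head)
        if ate:
            # Rule 3: eating food makes the snake longer and scores points.
            self.score += POINTS_PER_FOOD
            self.food = None  # a real game would respawn food on a random free cell
        else:
            self.body.pop()  # keep the length constant when nothing was eaten
        return True
```

Each call to `step()` is one game tick; a real front end (for example pygame, which is a common choice for Python snake games) would call it on a timer and draw `self.body` and `self.food`. Keeping the rules separate from rendering like this is also what makes "add a scoreboard" a small, low-risk change.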
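In the Tetris demo, the two behaviours the speaker tests are that a completed row disappears and that the game ends when a new piece can't be placed at the top of the grid. As a rough sketch of those two checks only (not the generated game), here they are on a grid stored as a list of rows of 0/1 cells; the grid representation, the piece format as `(row, col)` offsets, and the function names are hypothetical choices for illustration.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 0 = empty cell, nonzero = filled


def clear_full_lines(grid: Grid) -> Tuple[Grid, int]:
    """Remove every completely filled row and add empty rows on top.

    Returns the new grid and the number of rows cleared."""
    width = len(grid[0])
    remaining = [row for row in grid if not all(row)]
    cleared = len(grid) - len(remaining)
    new_rows = [[0] * width for _ in range(cleared)]
    return new_rows + remaining, cleared


def can_place(grid: Grid, cells: List[Tuple[int, int]], top: int, left: int) -> bool:
    """True if a piece (cells given as (row, col) offsets) fits with its origin at (top, left)."""
    for dr, dc in cells:
        r, c = top + dr, left + dc
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            return False  # outside the board
        if grid[r][c]:
            return False  # overlaps a locked block
    return True


def is_game_over(grid: Grid, cells: List[Tuple[int, int]], spawn_col: int) -> bool:
    """The game ends when a freshly spawned piece cannot fit at the top row."""
    return not can_place(grid, cells, 0, spawn_col)
```

For example, an O piece would be `[(0, 0), (0, 1), (1, 0), (1, 1)]`. A full game would call `clear_full_lines` each time a piece locks in place (and add to the score based on the returned count), and call `is_game_over` each time it spawns the next piece.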
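The neural-network animation shows data flowing from six input nodes through two hidden layers of eight nodes to four outputs, with the numbers inside the nodes updating as it goes. The video doesn't show the generated code or say what those numbers are, so as one plausible interpretation, here is a minimal forward pass for a network of that shape in plain Python. The random weights, the sigmoid activation, and the names `init_network` and `forward` are assumptions for illustration.

```python
import math
import random
from typing import List

LAYER_SIZES = [6, 8, 8, 4]  # node counts described for the diagram


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(sizes: List[int], seed: int = 0):
    """Random weights and biases for each pair of consecutive layers (assumed initialisation)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs: List[float]) -> List[List[float]]:
    """Return the activations of every layer, input layer first.

    Each node's value is sigmoid(weighted sum of the previous layer + bias),
    the kind of per-node number an animation like this could display."""
    activations = [list(inputs)]
    for weights, biases in layers:
        prev = activations[-1]
        nxt = [sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
               for row, b in zip(weights, biases)]
        activations.append(nxt)
    return activations
```

Calling `forward(init_network(LAYER_SIZES), [0.5] * 6)` returns four lists of lengths 6, 8, 8 and 4. An animation would step through them layer by layer, which matches the "input layer, through the hidden layers, to the output layer" flow described in the prompt.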
Info
Channel: AI Search
Views: 462,800
Id: b7JCor1DGJw
Length: 36min 27sec (2187 seconds)
Published: Mon Jun 24 2024