Mistral Large vs GPT-4 vs Gemini Advanced (Prompt to Prompt Comparison)

Captions

So I have good news and bad news. There's finally a model that is almost as good as GPT-4. The other good news is that this was released by the company that created Mistral 7B, which was an open-source model. And the bad news is that the same company that released an open-source model and went viral, I don't think they have plans to continue open source anymore.

So if you go through the model details here, you can see that it comes very close to the performance of GPT-4 on common-sense reasoning and TruthfulQA, and then even on the multilingual capabilities, it supports all the major European languages like French, English, Spanish, German, and Italian. It has a 32k context window. It does really well on math. But unfortunately, you will not find it open source anywhere. I was kind of hoping to see if we could, you know, get a GPT-4-level open-source model this year, and Mistral seemed to be one of the very close contenders to do that. It seems like that might have changed after their interaction with Microsoft.

So you can see the model itself is released on their own platform, which is this platform, Chat Mistral, and the other place where they've released this model is on Microsoft Azure, which does not surprise me with respect to the fact that they've not open-sourced the weights like they did for their previous viral model. I'm assuming that might have been a marketing tactic of sorts.

But anyways, what we're going to be looking at is comparing this model on a couple of prompts against GPT-4, which I'm going to be running on Perplexity AI, and then the Gemini Advanced model. So we'll run a couple of prompts, code-wise, reasoning-wise, math-wise, and see how good or bad this model is with respect to overall performance.

So let's start with a basic reasoning question that only GPT-4 was apparently able to answer right the very first time, and trust me, this is a very basic problem: "I sold two cars yesterday. I had three this morning. How many cars do I have now?" Both Gemini and the
other open-source models that I had tested on said that I have one left, and this one seems to give me good reasoning with respect to the answer, because the previous open-source models that I tried, including Llama, did not give me the correct answer for this. Let me take the same question and put it here on GPT-4, which I'm sure will answer the question correctly, but let's just do it for the sake of testing it out. Yeah, "you have three cars." I'm not surprised here.

And there you go. I don't understand why. I thought Gemini did somewhat better, I think, yeah, on the code side, not on the reasoning side. But look at this thing. I mean, this is the most advanced model by Google, and it still gives me this wrong answer for this basic question, right? Really weird. Even the open-source models like Mistral Large are now doing better. By the way, this is where you can see if the model is correctly selected, so I have Mistral Large selected.

Now let me ask another question, but this time more on the lines of creative generation. I'm going to turn off the Copilot here. Let me ask one question in a fresh context. So I'm going to give some garbage information and then have the word "dope", and I'm going to say, "write a poem on this." Let's see if we can get anything creative done here. I know this is nowhere close to creativity, but let's see if it can turn garbage into something meaningful.

So it kind of took that as a literal. "In the realm of keys, a dance begins," something, "rhythmic twins. A pattern unclear, a code yet unknown. In the world of words, a new tone is sown." Interesting. "LD, cryptic, a tune so dope. Like a secret message, it gives us hope." Interesting.

All right, so GPT-4 seems to have done a decent job too, but it kind of broke down the word, right? So you can see it's using the breakdown of these words, like "po" and "no", and it's using these words separately. "Rare air." It's not too bad. I'm not going to read this because this is straight up
garbage. And there you go. I don't know if this is the video to test out Mistral or to say that Gemini still sucks. Look at this. I'm not happy with the generation at all. I mean, I know that it makes no sense, and that's exactly why I fed it to the model. I mean, it seems like the overcautiousness of the model is going to kill the model.

Now let's give out a complex math problem and see if we are able to come up with anything on those lines. I found this very basic math problem that says: which number is equivalent to 3^4 / 3^2? Let's feed this to the model. Let's start a new chat and feed this here, Copilot off, and finally to Gemini Advanced. Now, I don't know what the correct answer is, but I'm going to refer to where I got the question from, so hold on. So your correct answer is nine.

And Mistral was correctly able to give us the answer: when dividing powers with the same base, you can subtract the exponent of the denominator from the exponent of the numerator, so 3^4 / 3^2 simplifies to 3^(4-2), which is correct, which is 3 squared, which is equal to 9. And GPT-4, no surprise, got this correct as well. For a change, even Gemini got this correct, but I don't see the value in the reasoning, because if you look at this, it's giving me 3^(4-2), even here it's giving me 3^(4-2), but on Gemini it's directly coming to the answer.

So overall I would rate these two as okay answers, but this one is missing one important step of the reasoning. It says, "we can simplify the expression by evaluating the exponents and dividing the numbers." So it's kind of doing the whole exercise but not using the formula here. Which is correct, I'm not saying it's bad. It's just that I would rather recommend having a correct flow, to have a scalable answer. So let's say if this was... what is the answer for... hold on. Let me complicate this problem, right? So now if you raise it to something like this, even with this formula this wouldn't be easy, but, you know, at least
it's giving us the way to solve it, right? Now if I put the same here on Gemini, will it do the whole thing? So it's better to know how to solve the problem using some sort of method that exists than actually going through shortcuts, but it seems it has done a decent job here. So it says it cancelled the common factors. Again, it's still not giving me what I want, but anyways, I'm going to keep moving.

And this time we'll be looking at a code problem: "write code for a snake game in Python." This is a classic Python problem, and I'm going to feed this across GPT-4 and Gemini as well, and we'll try to run the output generated by each of these models locally to see if what's generated does actually work, right?

So it's generating the output here. You can see, as for GPT-4, it's way faster, and it's using Pygame, which is something that I was looking for, because without this I don't think there will be an execution window where I can run it. And Gemini is also using Pygame. Apparently, in the benchmarks this performs better than the Mistral Large model. So if you look at the coding comparison here, it even performs better than GPT-4, so I would be surprised if this gets this wrong. If I look at the output, the GPT-4 output looks way more detailed than this, but for the sake of testing it out, let's copy this and see if we can run it locally to see if it actually works. This is the output from Mistral, but I don't think this is going to work. Let's try it out.

So it seems it's working on the code that we copied from Gemini. I think you can see I'm actually playing the game here. It's not too bad. Yeah, not bad. So I can definitely write a snake game using Gemini, sure.

All right, let's try the code from the other models now. You can see I can press Q to quit. This is from GPT-4, and this seems to be working as well. It has this nice fancy background, and the snake seems to be moving relatively slower than the one that
we had in the Gemini output.

Now let's finally try this one. I'm fairly confident it's not going to work, but let's give it a try anyways. Ah, interesting. You can see it's working in the terminal, but apparently... I see it took the command, but you can see it will keep continuing to fail. So again, overall it did give me the code, but not in a way that I was looking for, right? I mean, if you want to just run something, Pygame is like the most recommended library for it. You clearly saw it won't work in the terminal.

So, going to one final prompt, and this time: "a biography of an insect who lives for a day." This is a fairly complex bio, so I'm just going to feed this here to Mistral, and then I'm also going to be feeding this to GPT-4, and I'm also going to be feeding this to Gemini.

By the way, if you're interested in knowing more about Perplexity, go ahead and check out my masterclass. It's on YouTube. You will be able to learn every feature of Perplexity. They apparently already have AI agents integrated, so I'm definitely going to take a look at how that works. It's an amazing tool. I'm not using Google, I'm not using ChatGPT anymore. I'm primarily working on Perplexity for all my use cases. So it's a very interesting tool, and I'd be excited if you'd want to try it out. And, you know, I have a complete free masterclass on YouTube, so I'd be interested to hear your thoughts on it.

Either way, I think the output is ready, and you can see: "The Ephemeral Life of Ephemera Danica: A One-Day Wonder." Interesting. And it kind of wrote a short bio here. "Living underwater for about a year," which isn't what we were looking for, an insect who lives for a day. All right, so this is clearly not going to work. Mayfly, right? So, "in the lush banks of a wandering river, a tiny nymph..." There was a mayfly, a member of the Ephemeroptera order, destined to live for just a single day. In its nymph form it spent a year, apparently. It seems like an insect that lives for a year but only takes flight for a day, it seems, right?
So I think that's going to be the case for the biography here as well. Again, you can see the generations: this and this seem relatively connected, but this seems like GPT-3 for some reason, right?

So overall I think this is good, and it does compare somewhat when it comes to the output generated by GPT-4. It's again not perfectly there, but honestly, for the prompts that I tried, this did a really good job. The only bummer is that this is not open source, and I wish this was open source, because we could definitely say today that we had an open-source winner versus GPT-4. But it seems that's not going to happen anytime soon, and I hope it does.

So anyways, thank you for watching, guys. I hope you learned something new from the video, and I'll see you guys in the next one.
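
Note on the math prompt: the rule Mistral and GPT-4 spelled out in the video (when dividing powers with the same base, subtract the exponents) can be checked with a few lines of Python. This is only a minimal sketch for illustration. The function name `divide_powers` is hypothetical and does not come from the video or from any of the models' outputs.

```python
def divide_powers(base, num_exp, den_exp):
    """Quotient rule: base**num_exp / base**den_exp == base**(num_exp - den_exp).

    Hypothetical helper for illustration; the base must be non-zero.
    """
    if base == 0:
        raise ValueError("base must be non-zero")
    return base ** (num_exp - den_exp)


# The prompt from the video: which number is equivalent to 3^4 / 3^2?
direct = 3**4 // 3**2              # evaluate both powers, then divide: 81 // 9
via_rule = divide_powers(3, 4, 2)  # subtract exponents first: 3**(4 - 2)

print(direct, via_rule)  # 9 9
```

Both routes give 9. The difference the video points at is that the exponent-subtraction route keeps working when the exponents are too large to evaluate comfortably by hand, which is why the step-by-step answers are described there as more scalable.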

Info

Channel: AI with Yash

Views: 1,419

Keywords: generative ai, ai content generation, chatgpt, gpt-4, stable diffusion, ai content, gpt-5, gpt5, perplexity ai, Mistral Large, generative AI, open-source AI, is mistral large open source, au-large open source mistral, check points mistral large, mistral large hugging face, mistral large azure, GPT-4, Gemini Advanced, AI comparison, prompt comparison, gpt-4 vs mistral, gpt-4 turbo vs mistral, gemini advanced vs mistral large, gpt 4 vs mistral large, gpt 4 turbo vs mistral large

Id: BYd-UB1pnOc

Length: 11min 16sec (676 seconds)

Published: Tue Feb 27 2024