The Voice AI Nobody Expected (AI News You Can Use)

Video Statistics and Information

Captions
Another wild week in the world of AI, and my team and I have been tracking all the releases and testing the most significant ones for you. "Please tell me a secret." "There's a secret that I've been keeping for years." Just like every single Friday, let's look at every piece of AI news that you can actually use, meaning all of these applications are usable or there's some concrete action you can take. That's the filter we apply here.

Starting with the first one: somebody has done the unthinkable and shipped a version of the OpenAI GPT-4o voice assistant that everyone was so excited about. People thought it was coming out in the next few weeks, but OpenAI pushed the release to fall, and now a brand-new lab out of France named Kyutai has unveiled the open-source Moshi AI. What is Moshi? It's a super-low-latency web interface you can talk to, and it tries to be that voice assistant. Here's a quick taste: "Speak with a French accent and tell me a poem about Paris." "Paris, the city of light, where time stands still..." Now, I'm not saying it's just as good as the GPT-4o demos, or even close. Its base model has 7 billion parameters, whereas state-of-the-art models like GPT-4o or Anthropic's models probably have around 400 billion. It's not exactly that clear-cut, but Meta is training Llama 3 with 400 billion parameters as its supposed competitor to GPT-4o, which is why I'm assuming this. The most interesting thing is that they're going to open-source the code for this little real-time voice assistant, so people are going to build it into their applications. Expect to see this all over the place soon.

So enough talk, let's try it out. There are two versions: moshi.chat and, if you're in the US, us.moshi.chat. The different versions exist because the main selling point is the extremely low latency: you speak to it and it responds right away, you can interrupt it, and it promises emotional awareness and tone modification in its voice, just like the GPT-4o voice assistant demo. If I go to moshi.chat, I'm presented with this interface and I can just type in my email. The way this works is that they integrated a queue, not a waitlist, which means that if a lot of people are using the application, it can take a while before you're let in. So I type in my email and hit "Join queue." As you can see, we got in right away; in other cases it might take a few minutes.

So let's talk to it. "Sure, how about this: it's about a man who was arrested for possession of marijuana." "You know what, forget about that. Tell me a little bit about yourself." "Oh, I'm a machine learning system. I'm designed to help people like you." "Okay, that's great. Can you talk to me in a different voice, maybe start whispering, a little mysterious?" "That's right, I don't have a body, but I do have a voice." Nobody asked. "Okay, let's try again: speak to me in a whispering voice." "I think I'll stick to my normal voice." Why are you so rude, man? "No, speak to me as if you were whispering." "It's not my normal voice." "Change your voice to a whisper." "I can't." "Why not? You're supposed to be able to do this." "I'm only a machine." "Adjust your voice to a whisper, please." "I'll do my best." Finally, inner peace. "How are you doing?" "Not so good, I'm still working on it." "Why are you not whispering?" "I can't, I'm just a machine." "Okay, forget about it. Can you detect emotion in my voice?" "Yes, I can." "All right, let's try it. What emotion am I showing?" "I can detect happiness." "Moshi, you're really pissing me off. You're not adjusting your voice, you're not performing as you should, and you're interrupting me." "I'm trying to help." "What emotion was that?" "I can't, I'm just a machine."

As you could see from that little test, this didn't really work as expected. During my testing it sometimes got things right, but let's be real: most of the time it was rather annoying. It interrupts you; the latency is really good, but it interrupts you. The model is obviously not very capable, most of the time it can't pick up on these requests, and the emotional awareness they advertise, being able to pick up on the emotion in your voice, isn't really there. I was actively shouting at it, asked it what emotion that was, and it told me I sounded excited and happy. So let's give it one last shot. "Hey, I want you to change your voice so it sounds like you're whispering. Whisper from here on out." "Okay, it takes some practice." There you go, yes! "Can you speak a bit more mysteriously?" "I can't really do that, but I can tell you a secret." "Please tell me a secret." "There's a secret that I've been keeping for years." "What is it?" "It's about our shared history." "Tell me more." "Well, we first met at a charity event, didn't we?" "Yes, yes we did." "And then we ended up here. That's how we met. That's the secret." So there you go, now it worked.

It is fun to play with, so I'd recommend you just check it out. At the time of this recording it's completely free, and they're going to open-source the whole thing. A very interesting development in the AI space; I expect to see this in a lot of apps, since people can just build it into whatever else they have, and over time the base model is surely going to get better.
But now let's switch gears and talk about the biggest release of this week, which is Gen-3. Runway made it widely available; we featured it two weeks ago and said it would be available soon. This is the state-of-the-art video generator, and I personally love following the development of these models, even though there aren't many practical applications yet. I did find one, actually: Motorola used AI video tools in their ad campaign, and I'll show that to you soon. But first I want to feature this post from Andrej Karpathy. If you're not familiar, he was on the founding team of OpenAI and was the head of AI at Tesla, and he's pointing out that just seven years ago this was state-of-the-art AI image generation; you can't even really tell what's going on in the picture. In just seven years we went from that to this video quality.

There are a million threads on X, and my team and I looked at a lot of them because we're kind of obsessed with this stuff. It's so much fun to play with AI video yourself; if you haven't done it yet, I strongly encourage it. Even if you're not a creative type, you can type in dreams you've had in the past and they just materialize, or combine two objects that make no sense together and it will create them anyway. Across all these threads we have one sort of winner: we really liked Javi Lopez's thread. Yes, as you know, Javi is a friend of the channel, but he put together a massive thread that doesn't just feature many interesting examples, it also shares every single prompt, so you can scroll through, look at the different examples of what Gen-3 can do, copy a prompt, and try it yourself.

Mind you, this is absolutely a paid tool. It's not just paid, it's probably the most expensive tool to play around with if you go down this rabbit hole, because the generation interface runs on credits. Right now I have roughly 3,000 credits, which corresponds to $30, and a generation uses 10 credits per second of video, so every 10-second clip costs you about a dollar. That's quite a lot, because to get this quality level you will have to iterate a lot. What does "a lot" mean? Let me give you a concrete example. This image of a T-Rex surfing a wave really caught my attention; I thought it was super cool, and I figured I could put any creature on top of a surfboard, which sounded like fun. So I copied his prompt and just asked for a cat with a hat surfing a wave. Well, this cat does not have a hat and it's not really surfing a wave; this is utterly unusable. It took me four generations to get a result I was sort of happy with. This cat still doesn't have a hat, but it sort of works. As you can see, I went down to 5-second clips, so it cost me $2 to arrive at something that's, you know, sort of okay. But I didn't stop there, because I really wanted to see an otter surfing a wave, so I generated more and more. I even went with 10-second clips to give it more room for error; all I need is a usable two or three seconds, but I couldn't find them in there. Interesting, but not what I was looking for, and this doesn't even look like a proper otter. Six generations later I arrived at the conclusion that it probably just won't be able to do it. Look at this: it's, I don't know, sort of an otter surfing, I guess, but then it morphs into a fox.
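If you want to budget your own experiments, here's a rough sketch of that credit math in Python. The rates are the ones I just quoted in the video; the helper function itself is just for illustration, not something from Runway's tooling, and you should check Runway's current pricing before relying on it.

```python
# Rough Gen-3 cost math based on the figures quoted above:
# ~3,000 credits for ~$30 (about $0.01 per credit) and roughly
# 10 credits per second of generated video (~$1 per 10-second clip).

USD_PER_CREDIT = 30 / 3000      # ~= $0.01 per credit
CREDITS_PER_SECOND = 10         # ~= $1 per 10 seconds of video

def generation_cost(seconds: float, attempts: int = 1) -> float:
    """Estimated dollar cost of `attempts` generations of `seconds` each."""
    return seconds * CREDITS_PER_SECOND * attempts * USD_PER_CREDIT

print(generation_cost(5, attempts=4))    # the cat-with-a-hat experiment: ~$2
print(generation_cost(10, attempts=10))  # ten 10-second otter attempts: ~$10
```

The point of the arithmetic is simply that iteration, not the single clip, is what drives the bill.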
For anybody who's new here and wondering why I'm spending money generating surfing otters: I recently got this tattoo of an otter surfing a wave, so I just wanted to make that happen in real life. It didn't work, and I spent something like $10 trying. Which brings me to my first and probably main point: this model is really good, but it's only as good as its training data. I bet it has a lot of images of waves and surfers; nevertheless, the water is not great. It's one of the big weaknesses of models like this. Look at the waves: anybody who looks a little closer realizes something's off about them, and that's how water behaves in all these models, even the Sora previews. Beyond that, there are probably not many images of an otter standing upright and doing surfing motions, so it's a really tricky thing to generate. To be fair, I then gave it something simpler, like this cinematic lighthouse prompt. This is a shot of a drone orbiting around a lighthouse; again the wave issue, sure, but the lighthouse looks excellent and so does the coast, and that's because there's a lot of drone footage in its training data. If I were creating a short film, I would feel confident using some of these drone shots in the production, just because they look so good.

I didn't do much more testing on the hyperrealistic front. If you want to dig into some of these prompts you can certainly do that in the thread, but be prepared to spend four, five, maybe even ten dollars to get something even remotely usable at this point. I can guarantee you that most of these results weren't his first shot; he probably regenerated a bunch of times and picked the best ones, meaning that creating that thread probably cost hundreds of dollars. What did it cost? Everything.

But let me return to what I said in the beginning and round it out on this note: you can do really fun and interesting things here. If you have a favorite painter, for example, you can bring their style into whatever scene you want. Check out this shot I created. Here's the prompt: "walking through the busy streets of a medieval fantasy village in the style of Hieronymus Bosch, third person RPG, GTA." I do have a typo in there, but it worked out great. If you're not familiar, Bosch is a Dutch painter with a very distinct, very detailed style; I thought it would look great, and I really like the way this came out. Have a look: you get this GTA-style intro pan, and then the character just walks through this medieval village. Look at all the details, the characters on the sides, the detail in the building in front of him. Super interesting. And here's another one, just as interesting. While we're on the topic of weird and trippy visuals, let me show you my very first generation with the model: a dance party full of cats with hats. This is one of the weirdest things you'll see all week, particularly this one cat morphing into a zebra and back. It's wild, probably not very usable, but so much fun. If you have a few dollars to spend playing around with this, I can only recommend it. It is by far the best video model we've ever seen, and it's accessible to everybody now; all you have to do is top up credits, and the generation is quite fast.
To show you the speed, let me just prompt something like "news you can use YouTube studio" and see what it comes up with. I'll generate two 10-second clips, and as you can see they populate here at the bottom. We'll do a little cut to skip forward, but we already have 7% of the first one generated, and they're working in parallel: 7%, 10%. This is quite fast, so let's cut to the end and look at the results. About a minute later we have the first result; the second one got sort of stuck. But as you can see, the jobs run in parallel, so I could have been generating new things while this was being created. Let's press play and see what we got. Okay, interesting. Look, it's clearly AI-generated and not very usable. What might be more relevant here is creating a backdrop: prompting it in more detail so it doesn't include a presenter, just the studio background, and then I could cut myself out and put this behind me. But as is, this is not really usable, just classic funky AI video. So just because this model is state-of-the-art doesn't mean everything it spits out is gold; it just means the potential for actually usable footage is finally there.

Next up we have a quick one: the ElevenLabs Reader app. ElevenLabs has been shipping a lot recently, and this one is simply described: for now it's an iOS app that's only available in the US, so I didn't get to test it, but if you're in the US (and I believe the UK and Canada got this release too) you can download it and have any text on your phone read out by the ElevenLabs voices, which as you know are still probably the best AI voices there are. And there's more, because just a few days ago they shipped a new feature they call iconic voices, which you can access in the Reader app. You can now have James Dean or Burt Reynolds read out your text: "At that moment Dorothy saw, lying on the table, the silver shoes that had belonged to the Witch of the East." Quite neat, check it out.

Independently of the app, they shipped one more feature this week, and I won't spend a lot of time on it because we already got this from Adobe back in February 2023, it worked super well, and a bunch of other companies have released it too. But now ElevenLabs also has an AI tool that isolates voices, meaning if you have noisy audio like this, you can turn it into this: crystal-clear audio. Again, nothing new, but now they offer it too.
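ElevenLabs hasn't said how their isolator works under the hood, but if you just want to try the general idea locally for free, here's a minimal sketch using the open-source noisereduce package as a stand-in. To be clear, this is plain spectral noise reduction, not ElevenLabs' method, and the file names are placeholders.

```python
# Open-source stand-in for voice isolation: spectral-gating noise reduction.
# Not ElevenLabs' actual tool, just a way to experiment with the concept.
import numpy as np
import noisereduce as nr
from scipy.io import wavfile

rate, audio = wavfile.read("noisy_interview.wav")   # placeholder input file
audio = audio.astype(np.float32)
if audio.ndim > 1:                                   # stereo -> mono for simplicity
    audio = audio.mean(axis=1)

# Estimate the background noise profile and subtract it from the signal.
cleaned = nr.reduce_noise(y=audio, sr=rate)

wavfile.write("clean_interview.wav", rate, cleaned.astype(np.float32))
```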
On a similar and also very quick note, Suno released a mobile app, so if you're into generating AI music you can now do it on the go. But again, this is limited to iOS and the US only, with an Android version and a worldwide rollout coming as soon as they integrate more multilingual capabilities into the application, at least that's what they said. So I guess these last two releases are more like AI news you can use if you have an iPhone and live in the US.

On a quick side note, Luma AI's Dream Machine also released a brand-new feature I want to cover, called Luma keyframes. It means you can transform one thing into another, creating smooth transitions with AI video. Dream Machine was state-of-the-art last week, and now Gen-3 is out, which is a bit better, but Luma has this keyframe feature. So we went ahead and tested it, because we wanted to see how well it works in practice. My intuition was that this is the sort of thing that looks really good in a demo if you run it a hundred times, but in practice it will just take forever to get anything usable. And as it turns out from our testing, that intuition was exactly right. As I showed you in last week's episode, we created this Star Wars trailer for our team, kind of a meet-the-team video featuring good and then evil versions of all the team members. You can check it out in last week's episode, but these Midjourney-generated images are the perfect thing to run through something like Luma's new feature. So we ran it for every single team member, and let me tell you, eight of the team members' clips are borderline unusable: it just did a hard cut in the middle of the scene, which is exactly the opposite of what I was hoping for. As you can see, the first image moves slightly, then there's a hard cut to the second one with some movement. Not great. But again, if you iterate a lot you might get something usable. This one of Ariad's actually turned out kind of good; I like the lightsaber transition before she arrives, that's kind of cool. And Larry's one, where both characters were super interesting, like Yoda and the evil version, that's kind of nice, but again there's a hard cut and it only works with the motion it inserted. So I don't know, maybe I was doing something wrong. They are super interesting, look at these characters moving, but it's just not the smooth transition I was hoping for.

And one more thing when it comes to AI video, a real-world use case: a Motorola ad. You can see these images on screen as I talk about them. This is clearly the Motorola logo in various fashion styles; every time, the outfit more or less represents the Motorola logo. They didn't disclose exactly how this was made, but they probably used a workflow that includes ControlNet and Stable Diffusion, which lets you embed logos into generated images, and then ran those images through something like Stable Video, or now maybe even Gen-3, although this looks worse than Gen-3, so I'm guessing it was probably Stable Video. Then they edited the clips together, put some music behind it, and voila, you have a commercial advertisement for a corporation. So yes, there's a clear lack of real-world use cases for AI video so far, but slowly but surely companies are picking up on it. It can do things that just aren't possible on a realistic budget in the real world, and I'd love to see more of it. Check out the ad; I expect more and more of these to happen and more companies to commission things like this as the tools improve, and if you paid attention during the past few minutes of this video, they clearly are improving.
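Since Motorola didn't publish the workflow, here's a minimal sketch of what a ControlNet-plus-Stable-Diffusion step like that could look like using the diffusers library. This is my guess at the general recipe, not their actual pipeline: the checkpoints are common public ones (swap in any SD 1.5-compatible checkpoint), and the logo path and prompt are placeholders.

```python
# Sketch: condition Stable Diffusion on the edge map of a logo so the
# logo's silhouette shows up in the generated fashion image.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Turn the logo into a Canny edge map to use as the ControlNet signal.
logo = cv2.imread("logo.png")                       # placeholder path
edges = cv2.Canny(logo, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",               # or any SD 1.5 checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "fashion model on a runway, outfit inspired by the logo shape",
    image=control_image,
    num_inference_steps=30,
).images[0]
image.save("controlnet_frame.png")
```

Frames like this would then go into an image-to-video model and a normal edit, which matches what the ad looks like.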
All right, let's move on to the next story, which is very quickly covered: a new Perplexity search feature they call Pro Search. It adds multi-step reasoning and access to math, programming, and Wolfram Alpha to its search. This is a premium feature you can try right now if you're a Perplexity subscriber; essentially it's a step towards more agentic search, where it figures out the right way to give you what you need even if the query is more complex. An interesting development, relevant to anybody using Perplexity regularly.

Now I want to switch gears and talk about something a little more fun, and it's actually one particular site built with Websim AI, which we featured last week. If you haven't seen that episode, it's an app where you just type in a URL and it creates the website from scratch for you with Claude 3.5 Sonnet. One of the most fun ones I've seen yet is this interdimensional cable one, as shared by Carol in our community. If you know, you know; if you don't, this is one of the most interesting and fun recurring bits in the animated show Rick and Morty. Rick brings back a TV set from another dimension where you can watch TV stations from across the multiverse, meaning you never know what you're going to get, and a lot of it is super random: "Baby legs, you're a good detective, but not good enough, because of your baby legs." They basically rebuilt this with Websim, and when you load the page you never know what you'll get, just the most random videos. It's really super simple, super random, probably something that will mostly resonate with Rick and Morty fans like myself, but I wanted to share it because not everything has to be about productivity gains. I really love the fun side of AI, the funky video generation stuff, the trippy clips of medieval RPGs in the style of Hieronymus Bosch, or interdimensional cable being built by an AI from scratch. This is what gets me excited, and it's also what the community is about; it's not always just the serious stuff. For example, the most in-depth comment discussion of the last month revolved around this clip of an AI girlfriend that Aaron posted: "Hello, I am your AI girlfriend, when you need me..." The comment section turned into a proper forum post, and I love to see it, even on a less serious topic.

Okay, moving on, and I'm only going to touch on this briefly: there's a brand-new uncensored multimodal model, Dolphin Vision 72B. Running this will be quite hard; it's not very popular yet, and you need a monster PC or a lot of rented GPUs to even run it. I just wanted to point out that things like this are starting to come out. It's an adaptation of the Qwen2 model, and it's an interesting hint at where we're heading: multimodal models without any restrictions. That's pretty wild, basically anarchy, and this is the largest parameter count we've seen for one of these to date. If you know what to do with it, good luck out there; if not, you're at least aware that the uncensored models are getting eyes and ears and are becoming more capable. It's going to be a wild future when the open-source community really ramps up, and all sorts of apps we can't even predict today will pop up over time, because again, this is fully uncensored: there are no restrictions, you can do anything with it, which is scary but will also produce some interesting results.

Next up, the Figma story. Let me summarize it really quickly, because a lot happened. Basically, Figma announced a bunch of AI features, a few of which are already accessible in their software, and I think the one standout feature is prompt-to-UI: you just prompt "a weather app" and voila, it creates the entire weather app for you. You didn't have to do much and you're almost done, really damn impressive. Well, yes, until Figma disabled this design feature because it appeared to be ripping off Apple's weather app one-to-one. Andy Allen reported on this; you can compare the Figma generations with the Apple interface, and it's almost the same thing. So that was a bit of a scandal, but nevertheless they announced a lot of AI features, shipped a few of them, and then disabled this one.

The second standout feature I want to point your attention towards is visual search, and I see this popping up in more and more apps: multimodal models built into applications so you can use natural language to search for certain images or visuals in your project, even when they have no title or metadata to search over. In other words, you can use something like GPT Vision to search your entire library. Things like this have been shipping in the iPhone Photos app for a while, tagging people and recognizing what's in an image, but now that some of these vision models are open source, expect this to come to all sorts of apps.
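To make that concrete, here's a minimal sketch of natural-language image search using an open-source CLIP model via the sentence-transformers library. This illustrates the general approach behind features like this, not Figma's actual implementation; the folder path and query text are placeholders.

```python
# Natural-language image search: embed images and a text query in the same
# CLIP space, then rank images by cosine similarity to the query.
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

image_paths = list(Path("my_library").glob("*.png"))          # placeholder folder
image_embeddings = model.encode([Image.open(p) for p in image_paths])

query_embedding = model.encode("a dark-mode weather app mockup")
scores = util.cos_sim(query_embedding, image_embeddings)[0]

# Print the five best-matching images, highest similarity first.
for score, path in sorted(zip(scores.tolist(), image_paths), reverse=True)[:5]:
    print(f"{score:.3f}  {path}")
```

No titles or metadata are needed; the embeddings carry the visual meaning, which is exactly the point of these visual-search features.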
On this page, which I'll also link below, you can see which features are actually available right now. Figma Slides is openly available in beta to all users; most of the others are limited and you have to join a waitlist. Here's the waitlist link; it's a bit hidden in the help center. If you scroll down about one-third of the way, you can join it: log into Figma, click on the "Everything we launched at Config 2024" post, and join the waitlist in there. And if you're not a Figma user, at least now you're aware of where the UI design space is heading.

We're almost wrapped up here, but I wanted to feature this: a Google crossword game that uses AI, and it resets regularly, so you can just join in. How it works is really simple: you go in and play the crossword, but here's the AI integration I wanted to point you towards, because it's an interesting one. Take this clue: "a way to improve the performance of Gemini applications." Okay, let's try to solve it. I get ten hints per crossword, so I can just ask, "Is this a prompt engineering technique?" and all the AI will do is answer yes or no. It says yes, you're on the right track. I could go further: "Chain of thought?" Nope. I think you get the point: you can keep prompting it and it keeps giving you hints, but it only ever answers yes or no, no matter what you do. The team even tried to prompt-inject it to get other results, and it just refuses; all you get is a yes or a no. An interesting implementation I wanted to show you.

Let's move on to the next one, and I'll keep it brief: it's basically a new leaderboard from Hugging Face. As you might know, Hugging Face is the prime location for large language model evaluation on the internet, and they've always had these leaderboards, but then many other companies came out with their own: there's the LMSYS Chatbot Arena, and two weeks ago Scale AI came out with a brand-new approach to ranking these models. There are just a lot of good ones out there. So Hugging Face completely overhauled the way they rank, and they point out the obvious problems: results in papers are not reproducible, and some of these benchmarks are just not reliable enough. Here's the gist of it. What we actually did was realize, hey, this article is way too long and way too detailed, we just want to hear what's new. So I took a technique from one of my latest videos, "The best way to summarize anything with GPT-4o": there's a specialty prompt in there, a chain-of-density prompt, and I'm going to use it inside Claude, because I've found Claude to be superior at summarizing. I copy all of this, paste it in as an attachment, and I get a concise summary that works better than any classic summarizer prompt.
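If you'd rather do this through the API instead of the Claude web app, here's a minimal sketch using the Anthropic Python SDK. The prompt below is my condensed paraphrase of the chain-of-density idea, not the exact specialty prompt from my video; the article file name is a placeholder, and you need an ANTHROPIC_API_KEY in your environment.

```python
# Chain-of-density style summarization with Claude via the Anthropic SDK.
import anthropic

ARTICLE = open("leaderboard_announcement.txt").read()   # placeholder file

COD_PROMPT = f"""You will write increasingly dense summaries of the article below.
Repeat 4 times: (1) list 1-3 informative entities from the article that are
missing from your previous summary; (2) rewrite the summary at the same length
so it now covers every entity mentioned so far. Return only the final summary.

Article:
{ARTICLE}"""

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": COD_PROMPT}],
)
print(message.content[0].text)
```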
As I talked about in that video, the summaries at the bottom are, in my opinion, the highest-quality summaries you can get out of an LLM. So what is this new leaderboard about? It introduces the MMLU-Pro, GPQA, and MuSR benchmarks; we've talked about these before on the channel, and they're some of the most reliable, most advanced benchmarks out there. It normalizes the scores, and there's a community voting system to address contamination. Qwen2-72B-Instruct currently leads, and they state that this is a crucial tool for the modern AI landscape. There you go: you can check out the open leaderboard right here, with all the open-source models ranked.

And that wraps it up for this week. I hope you found something that was useful to you personally. Subscribe to the channel, as I do this every single Friday, and with that, I'll see you soon. I hope you have a wonderful day.
Info
Channel: The AI Advantage
Views: 29,353
Keywords: theaiadvantage, aiadvantage, chatgpt, ai, chatbot, advantage, artificial intelligence, openai, ai advantage, igor, gpt-4o, igor pogany
Id: ERz0F85Vf6Y
Length: 22min 39sec (1359 seconds)
Published: Fri Jul 05 2024