NEW Chinese AI Vidu SHOCKS Open AI By Beating Sora (Text To Video Generation)

Video Statistics and Information

Captions

A company in China called Shengshu Technology has just unveiled its new text-to-video model, which was made in collaboration with a university in Beijing. They demonstrate different prompts and the results they can get from their model: high-quality videos that rival those of OpenAI's Sora. After Sora was shown and shocked the world, no one thought that this type of model would be coming out of any other company. It seemed certain that OpenAI was far ahead in every aspect of artificial intelligence, especially with text-to-video models, as we've seen nothing from any other company that comes even close. So how good really is this new Chinese model? Let's take a look at this.

[Music]

So this company is called Shengshu, and they've created a new text-to-video model in collaboration with Tsinghua University. They introduced this model by posting a video on their channel called "Meet Vidu, a new Chinese text-to-video AI model." So Vidu is the name of this new model. Well, what is it capable of? They have a number of example prompts which have been chosen specifically to compare to Sora. They've used very similar prompts to what Sora used in its examples, and they've done this to show the level of competition that they're bringing to the table.

But before we look at some of these examples, who are Shengshu and where did they come from? Because if you're like me, it's probably your first time hearing of this company. Well, Shengshu Technology seems to be a fairly new company, founded in 2023. The amount of information online is lackluster, which is to be expected as they are a Chinese company and I only speak English. But I did manage to find their website, shengshu-ai.com, and as you'd expect it is all in Chinese. After doing some translation and clicking around the page, this is what I managed to find.

The company is based in Beijing and was established in March of 2023. They say that the core team members come from the artificial intelligence research institute of Tsinghua University, and this lines up with what we saw earlier, saying that they were collaborating with this university in Beijing. They say that they're committed to building the world's leading multimodal large-scale model, one which integrates text, images, videos and 3D information, for the purpose of advancing human creativity and productivity. As well as that, I did find a page announcing their new text-to-video model, Vidu. So let's take a look at what this model is actually capable of.

[Music]

With all of these examples, the prompt isn't actually explicitly stated, but as I said earlier, a lot of them do align with the examples that Sora showed, so we can assume for those videos specifically that the prompt was the same as what OpenAI used. This example here, a woman walking through Tokyo, was one of the first examples that we saw from Sora, so now we can see how Vidu does it compared to Sora. And we also have a man and, for some reason, a bear. I think just from this footage alone it's evident to me that although this model seems to be fairly capable, it is definitely not as refined as Sora. It just seems like OpenAI's Sora handles everything better, from the physics to the visuals.

Here again is another example that we can directly compare with Sora: this little boat, and how it interacts with the waves, to test the model's understanding of physics. Same deal as before. It's evident to me that Sora is ahead, and their results just seem to be far more refined. One more for you here: there's a panda bear playing a guitar. Personally, I think this one is absolutely terrifying.

However, although this model is lagging behind Sora, I still think this is a massive step forward for China in AI. It was just yesterday we were talking about a new language model that came from China, so it seems like it's a big week for China. If you remember that video of Will Smith eating the spaghetti, it was absolutely [unclear]. It looked nothing close to real and didn't resemble a real human at all, and that was just one year ago. That was the comparison that people were using with Sora when the examples for that came out, and I think it's good to use it here as well. Although this model can be rough around the edges, this is the worst that it will ever look, because they can only improve from here. Even if we didn't have Sora and we just had Vidu, it would still be a massive step forward compared to the Will Smith spaghetti video, which is only a year old. It would still be an example of exponential AI progress.

So what are the capabilities of this model? Well, on their website they say the model supports the generation of high-definition video up to 16 seconds long, at a resolution of 1080p. They say that it's able to simulate the physical world and also has a rich imagination. They do mention Sora, saying that they are the first ones to achieve this since the release of Sora, and then they go on to talk about the technical side of this and how they plan to improve it in the future.

So, pretty crazy stuff here. Like I said, I definitely think this is a big win for China despite the fact that it's lagging behind Sora, but I'm interested to know what you guys think about this, so let me know down below. If you're still listening, I really appreciate you. Drop a like and subscribe if you can, it really helps me out, and thanks for watching.

Info

Channel: Bill Young

Views: 2,278

Id: Uzun2HVbONs

Length: 5min 14sec (314 seconds)

Published: Sun Apr 28 2024