NEW Korean AI Image Generation SHOCKS Industry | MASSIVE Breakthrough (InstantFamily)

Video Statistics and Information

Captions

So this South Korean company, SK Telecom, have just revealed their new text-to-image model, but this model is unlike anything we've seen up to this point, because it isn't just a normal text-to-image generation model. This new AI that they've developed can generate families: by inputting pictures of different people, you are able to generate family-like pictures. So how good really is this family generation model, and will we be able to use it anytime soon? Let's take a look.

[Music]

So the South Korean researchers that have created this model work for the company SK Telecom, and together they've managed to create this new model, which is able to generate family images using multiple people. Now you might be thinking, why is this a big deal? We have loads of other text-to-image models that can generate really high-quality results and, without restrictions, could generate specific people doing whatever you wanted. But the key word of this new model is multi-ID. It's been a consistent challenge to generate images with multiple IDs, so researchers and engineers have struggled to make a model where multiple people can be input, but it seems like there's been a major breakthrough over in South Korea.

So let's have a look at an image generated by this model. In the research paper that they released, they call this model InstantFamily, and for their first example they put a load of tech CEOs and big names in the industry together, people like Sam Altman, Mark Zuckerberg, and Elon Musk. The model takes the individual pictures of these people, combined with a text input that tells the model exactly what the image should look like, and finally a pose input, which allows the model to mimic the posing of your choice for the new image. With all this combined information, the model is able to generate a family image or, as they're calling it, an instant family. The results here are definitely very impressive considering that this is something that has been struggled with for a long time. This doesn't look like a first-generation model; it looks incredibly refined and very high quality.

Further into this research paper, they do go into extreme depth on how they've achieved this, with some very complex topics that you'll really only be able to understand if you're extremely experienced in this field. They also stress the fact that there is not a set number of IDs that can be input here, so you can pick how many people you want in the image. If the pose data and the text input are the same but the number of input images is changed, you will still be able to generate essentially the same image, just with fewer people in it. They have a number of examples shown here, ranging from the street to the school to Mars, and here they have also used some different people; I can see Taylor Swift in there. One more example for you here, where they show different scenes (cherry blossom, jungle, and the studio) with a number of familiar faces, but again all looking very high quality.

So as well as all that good stuff, they also do highlight some of the limitations of this model. There are two limitations that they talk about here, the first of which is bad anatomy. They have this example here where the family is sort of bunched up in this pose, and in the generated output the model has obviously gotten confused with the legs and arms. If you just glance at this you probably won't notice anything, but if you pay attention carefully you will notice there are some serious errors here: for example, there's the kid in the back, but below him is a pair of legs that have just come from nowhere. As well as that limitation, they also mention that faces sometimes go off the screen. Here they show an example of five IDs being input into this model, and the fifth person is cropped off the screen so only half of his face is actually in it. So this shows that at least they're definitely aware of the limitations that may come with this model, and on top of that they do highlight some potential solutions for these problems in their future work, which they plan to get to in the weeks or months to come.

So can you use this model for yourself? Well, unfortunately no, you cannot at the moment, and I have to imagine this is probably because of the potential deception that could occur with a model like this, an issue that comes up time and time again when it comes to these AI image generation models. So for now it is only a research paper and research model, but hopefully one day we might be able to get access, especially if this company wants to help South Korea excel in the AI race.

So that's all well and good, but what was this model even trained on? Well, they stated that this model was trained on a dataset of 2 million images, and those were just images of faces, but on top of that there were also 300,000 multi-person images, and these were all found from the web.

So this is a big breakthrough in AI image generation, as it's now possible to generate images with multiple people in, something that was not really previously possible, at least as far as choosing the people you want to be in the image. And as previously mentioned, this comes from South Korea, which is a country we haven't heard loads about recently in terms of AI; it is mainly the United States and then China here and there, with a few models like the Vidu model, which we talked about just a week ago. Maybe in the future we'll be hearing more from South Korea in terms of AI, but a very impressive breakthrough here.

So drop a comment, I'm interested to know what you guys think about this. If you're still listening, I really appreciate you. Like and subscribe if you can, it really helps me out, and thanks for watching.

Info

Channel: Bill Young

Views: 2,526

Id: tH4fYvBI9Uo

Length: 5min 18sec (318 seconds)

Published: Sat May 04 2024