Google Imagen 2: Everything You Need To Know

Captions

Recently Google and the DeepMind team released Gemini and a whole bunch of other generative AI tools, including Imagen 2, their second-generation diffusion image generator. Let's take a look at it.

Imagen 2 is billed as Google's most advanced text-to-image technology, and you can see some examples here on the right-hand side. The quality looks significantly better than Imagen 1. Let's jump over and check Imagen 1 out. If you look over here, you can see Imagen 1 claimed "unprecedented photorealism." I'll kind of disagree with this. This looks more like the very first releases of Stable Diffusion or even DALL-E 2. It's just not very high quality. You can see some of these results: "a majestic oil painting of a raccoon queen wearing red French royal gown." They look like they're almost made out of clay, like some sort of claymation or animation. It's not really the high-quality photographic stuff that we're used to seeing from things like DALL-E 3, Stable Diffusion XL, or even Midjourney.

Imagen 2, though, looks like quite a big step up. You can see a lot of the examples here, and some of them look like they're pretty decent photographic quality. Let's look at some of the technology that goes on behind this. They say it's their most advanced diffusion technology, delivering high-quality, photorealistic outputs that are closely aligned and consistent with the user's prompts. I think what a lot of these diffusion models are doing is getting better at understanding the input that you're putting into them, the actual prompt itself, and then generating something that's a lot more coherent and a lot more aligned with what you were looking for.

Looking at this photo, if I saw this I wouldn't actually know that it wasn't a real photo. The prompt is "a shot of a 32-year-old female, up-and-coming conservationist in a jungle; athletic with short, curly hair and a warm smile." I don't know why "up-and-coming conservationist" is there. It's kind of an odd thing to add to a prompt, but it looks like it works, and it looks good. Now, we don't know what size images this was trained on. There's no explanation of that down in the page information, but I can tell you that this is a 600x600 image, for what it's worth. So I don't know if it's quite as high a resolution as some of the models that are out there now, but this at least looks like fairly reasonable quality. The lighting's really nice, the shadows, everything else. It looks pretty realistic.

For this one, the prompt is "a jellyfish on a dark blue background." I will say there's not a tremendous amount of detail in the jellyfish here. I think even DALL-E and a lot of the diffusion models do a much better job. This looks a little bit too washed out and blurred for my liking, but still, it's there. And then this one: "a small canvas oil painting of an orange on a chopping board; light is passing through orange segments, casting an orange light across part of the chopping board." It matches the aesthetic pretty well. It's a nice image.

They bill this as improved image-caption understanding. It's kind of like what we were talking about before: to help create higher-quality and more accurate images that are better aligned to the user's prompt, further description was added to the image captions in the training data set. I have a whole video on how Stable Diffusion and generative AI images work, so you might want to check that out, but the gist of it is this. When you're building the training data set, you're collecting all the images that are going to go into the model, and you assign keywords to the images. That helps train the model so that it knows that, okay, this image of a b sitting on a hot tub is a b sitting on a hot tub. If it didn't have those matches between the words and the image, it wouldn't know how to generate those later on when somebody gives a text-to-image prompt. But in this case they took it a step further, and instead of just providing short tokens and keywords, they added further descriptions. So now it's going to have a fuller understanding of the images that it's going to generate, based on the descriptions that were added to the training set. You can think of it this way: the better the data that goes into the model, more than likely the better the data that comes out of the model is going to be.

Here we can see a couple of examples. For this image they actually gave it a hymn: "Soft purl the streams, the birds renew their notes, and through the air their mingled music floats." It's a nice kind of oil painting that represents that poem. Similarly, this is another one. It looks like a whale in the sea with a whole bunch of fish swimming around it. Then finally we have this one of a bird: "the robin flew from its swinging spray of ivy on the top of the wall and opened his beak and sang aloud." So it's nice that they're able to get more descriptive and you're able to pair images more readily with text. You could think of this as writing a story in a book, and it's able to just automatically illustrate what's being told in the story because of this rich understanding of how text and images relate to one another.

The gist of it is that it purports to have more realistic image generation because of all these things. Imagen 2's data set and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces and keeping images free of distracting visual artifacts. You'll know that a lot of Stable Diffusion systems and generative AI art have a really tough time with hands and with artifacts in images. A lot of the newer models have gotten much better at that, and you can see these two pictures here, actually three, that have hands. The hands look pretty good.

This is a cool feature: fluid style conditioning. You can provide a reference image, and then all of the output that's generated is going to follow the aesthetic of that reference image. So in this case you could provide a picture of this floral image, this floral texture, and then everything you generate, whether it's this mid-century sideboard, a T-shirt, or a pillow, is going to carry over that aesthetic.

Imagen 2 supports advanced inpainting and outpainting. This is great if you want to add images or objects to a scene. So if you've got this sort of studio environment and you want to add some additional things to it, like a floating bookshelf, you can do that. Or if you want to take something out of a scene, you can also do that.

And then this last part, probably my least favorite: responsible by design. This means it's going to be highly censored. They say that before they release capabilities to users, they conduct robust safety testing. What that means is, if you ask it to generate a crystal skull, it's not going to, just like Bing image search. And this is kind of why I like running diffusion models on my own hardware at home, because you don't have somebody holding your hand and telling you what you can and can't do.

Now, Imagen 2 is going to be built into a lot of different products across Google's entire suite of offerings, including things like office. It's available via API now, but you can't access it outside of that. Obviously, as soon as it's available I'm going to take it, compare it, and benchmark it against DALL-E 3, Midjourney, and my own Stable Diffusion setup on my home PC, and see who comes out ahead.

With the launch of Imagen 2 and even things like Gemini, Google's really just playing catch-up with OpenAI. DALL-E 3 and GPT-4 are both superior in a lot of ways to what's being released by Google, so they're pouring all their resources into making this a successful launch. Time will tell, and we'll see how it pans out. I'm of course going to run these through a gamut of tests as soon as they're available to the masses, and I'll let you know how it comes out.

Let me know what you think down in the comments below, and what you want me to test as soon as this is available. As always, hit that like and subscribe button. I'm Brian Lovett, this is All Your Tech AI. We'll catch you next time.

Info

Channel: All Your Tech AI

Views: 1,043

Keywords: text to image ai, ai image generator, google imagen, imagen 2

Id: ZSlsgY_43cs

Length: 7min 35sec (455 seconds)

Published: Sat Dec 16 2023