ChatGPT-4o NEW Image Capabilities: 3D-Renders, Consistent Characters + More

GPT-4o is here, and it has some astounding visual capabilities that you might have missed. Take a look at its ability to render out 3D representations of objects, as well as generating the most accurate consistent characters we've seen so far. That's not all: in this video we're going to break down all of the latest GPT-4o visual enhancements that will give you more creative power than ever before. So let's dive in and explore the next frontier in AI visual technology together.

First of all, we're going to take a look at the 3D object synthesis capability. This allows you to generate various images of the same object. Here they have created a realistic-looking 3D rendering of the OpenAI logo, then put the views together to create a 3D reconstruction from the six generated images. So you can simply ask ChatGPT to generate various views of the same object, and then it will be able to piece these together into a 3D reconstruction. There's another example of this, this time with a sea lion, and the word OpenAI is etched out on the model. They've combined these views together to create this revolving 3D model. This can be very useful for 3D modeling and also for representing logos in 3D.

The next exciting capability that they've showcased on their site is the ability to generate images of fonts, which you can then easily translate into full-blown, usable typographic fonts. For example, here they've generated the letters of this font and asked for it to be showcased as a font would be in a font book. They've asked for a font that combines both futuristic and retro elements, a molded, stamped font, and here you can see it has output a beautiful and consistent font. What's remarkable about this is that it has recognized how to keep the same design language across each of the characters in the font. I have a course on how you can take this type of imagery and turn it into a usable font, and even sell that, and I'll leave a link to that in the description below.

They've also showcased how you can create other types of fonts using this method. This one is an ultra-futuristic font, and it's beautiful and minimal. Just look at how they have removed elements of the E yet allowed it to still read as the character E. It's absolutely beautiful, and you can see that the breadth of design capabilities for designing these fonts is extremely broad. Here you can see an old-fashioned Victorian font that looks ornate and belongs on a steam engine. Very specific, but they're absolutely beautiful. But let's continue, because there's lots more to show you.

The next capability is the ability to take a photo and turn it into a caricature, so you can easily translate from one type of medium into another. It has a few examples of this, turning these different photos into different types of illustrations, and you can see it works very well across different facial types, ethnicities, and angles.

But that's not all. What's particularly interesting is the capability it's displaying in the visual narratives example. This shows a first-person view of a robot typewriting the following journal entries. It's then able to create another image that is related to the first image. For example, here it's illustrating how the robot wrote the second entry: the page is now taller, the page has moved up, there are two entries on the sheet, and you can see this reflected in the image. What is remarkable is that it has kept all components of the previous image the same apart from the ones that it's been directed to adapt.

This opens the door to creating highly usable storyboards and comic book strips, as well as using these images in a separate way to generate longer video clips with AI. The way forward for getting longer AI videos is going to follow this process. This is what we're seeing emerge as the most likely solution to generating much longer video clips: taking a long story, breaking it down into its constituent parts, and generating images that are consistent for different checkpoints in that series. So, for example, if you had me get up, turn around, and sit back down on this chair, you would start off with one image of me on the chair, another image of me standing, a third image of me rotated 180 degrees, a fourth image of me facing the camera, and then a fifth image of me seated again. The AI would then look for the most sensible and realistic way to animate between each of those images.

The final image output in this example is: "The robot was unhappy with the writing, so he's going to rip the sheet of paper. Here is his first-person view as he rips it from top to bottom with his hands. The two halves are still legible and clear as he rips the sheet." You'd have to say there's a little bit of distortion on the text in the second paragraph, and it's not exactly clear why that's been distorted. One thing I would have to mention is that it has left the original inside of the typewriter. So it's very odd that he would take it out, rip it up, and then type it all again. That would be a very odd process for writing a book.

Another example shows them taking the OpenAI logo, also taking a coaster and describing its materials, and then asking it to overlay the logo onto the coaster. It's done a remarkably good job of previewing this mockup of how the OpenAI logo could look on this potential piece of merchandise. This shows the possibilities for rapidly creating product packaging and different types of merchandise for different situations.

The ability of this version of ChatGPT to render text in different circumstances has improved a huge amount. Here we can see them asking for a poem to be rendered on a page, and you can see this realistic handwritten poem executed with zero spelling errors. So it's been able to take the exact text and then render it out accurately on the page. That's something that's been really challenging to do. Recently we've had text, but it has not adhered 100% to the exact text that we've asked for.

The ability of this version to render consistent characters is absolutely astounding. In this example you can see that they've created a character called Geary the robot, and he is then rendered out in a number of different stances, positions, and activities. You can see that he maintains a remarkable level of consistency between each of the frames. I'm paying particular attention to the proportions of this individual, and he maintains a high degree of fidelity in every situation. This again opens up the possibility of creating much more complicated narratives and stories using ChatGPT.

This is another interesting example, where they've taken the OpenAI logo and asked for a concrete poem in the outer shape of the OpenAI logo, composed of the word Omni. Here you can see that they've changed the stroke, the outline of the logo, to be comprised only of the word Omni, which is not a simple task. It has to understand exactly what that means and then also create and render an image that solves that exact problem. It's gone a step further and actually overlaid a rainbow coloration on the logo. So this is great for taking logos of your own and creating different versions of them for different situations.

This is a very exciting example, where they've taken two images of two individuals and then asked it to render out a poster using these two characters. They've then asked ChatGPT to improve this poster, and you can see the final poster takes the two characters, puts them into a poster with legible, accurate text, and applies a stylistic approach with a number of different grungy effects.

This is a particularly interesting example because it also shows the capability of generating multimodal assets. So it's not just creating an image, it's also generating sound. First of all, they put in a description for a commemorative coin, and after that they ask for an improvement to it, which includes adding symbols around the outside of the coin that represent some of the capabilities of ChatGPT. You can see here in the updated version that it's taken this feedback and iterated on it to make an improved and more detailed version. Finally, they've asked it to play the sound of the coin clanging on metal, and it's generated a realistic sound of that.

It's also got a wonderful showcase of how they've uploaded an entire video and asked for a detailed summary of it. So the capabilities of ChatGPT are hugely expanding, showcasing its ability to work across different types of input and relate those together in a coherent and intelligent way. This is going to open up huge possibilities for what we can do, and they are only starting to emerge as we get our hands on the tool.

From my exploration of these tools, the key things to understand are the ability to create consistent characters, the ability to ask ChatGPT to interpret and understand how different objects and characters can relate to each other across different scenes, and how you can synthesize different elements together using ChatGPT. So you can ask it to take inspiration from one image and another and to incorporate those together without leaving that to chance.

I hope you enjoyed this video. What did you find most interesting about the visual capabilities of GPT-4o? Let me know in the comments, and thank you for watching. Most of all, I hope you have a delightful day.
Channel: AI Samson
Views: 57,291
Keywords: ai, samson vowles, gpt4o, open ai
Id: tyN7a4BPXyc
Length: 10min 53sec (653 seconds)
Published: Tue May 14 2024