Mind-bending AI: The Future Looks Crazy

Video Statistics and Information

Captions
So I have a habit of collecting really cool AI research and tools that I come across, building the list up for weeks and weeks, and then making a video like this to share all of the cool stuff I've found over the last several weeks. Some of it is recently released research with demos we can use on places like Hugging Face or Google Colab, and some of it is research where the creators are showing off what it can do but we don't quite have access to it yet. I love making videos like this because this is literally the future of AI and visual effects, and I love these little sneak peeks into where everything is going.

Let's start with image generation. I'm personally in this AI bubble when I'm on Twitter (or X, or whatever), and lately I've been seeing a ton of Zero123++, which is essentially an AI where you upload a single image, for example this fire extinguisher, and from that single image it will show you what the object would look like from multiple other angles. Here's another example: an image of a ghost eating hamburgers, and you can see that same ghost eating hamburgers from multiple angles and viewpoints. I have a headshot of myself on my desktop, and I'm kind of curious to see what happens when I pull it in. We'll drag and drop it right into this Hugging Face demo, submit it, and fifteen seconds later I have my face from all sorts of different angles. I also uploaded a cool AI-generated wolf image and got different angles on it, though it seems to struggle a little with the wider aspect ratio; it cut off the wolf in all the images. Still, it's pretty cool that you can upload a single image with one perspective and get back images of that subject from all sorts of other perspectives.

Here's some more research, out of Microsoft, called Idea2Img (idea to image). The code isn't publicly available yet, but basically it's going to let us do all sorts of cool stuff to generate images closer
to what we're looking for. Some of the examples they share: for object count, "five people sitting around a table drinking beer and eating buffalo wings." Not bad, though I actually think DALL·E 3 does that pretty well already. "A logo suitable for a stylish hotel" generated this logo here. Here's another: "photo of the object pointed by the blue arrow and a brown corgi." They uploaded an image of an arrow pointing at a ball; it noticed the ball and put that same ball with a corgi. For this one they uploaded an image of somebody playing tennis and gave the prompt "a cartoon drawing of Mr. Bean playing tennis with the same clothes and pose as the given image." You can see the result on the far right: Mr. Bean wearing a yellow shirt, in the exact same pose as the photo. Image manipulation: "a drawing with the background changed to a beach," so it's the same image, but with a beach behind the person, in pretty much the same pose. They were even able to put two images in: "photo of Bill Gates with the same clothes as the given image, with a dog that looks like this one in this image." On the far right you can see Bill Gates in a very similar suit, with a dog next to him that looks like the reference dog. Just imagine how much more dialed in we can get, and how much closer to the exact idea in our minds, using some of this stuff. Check this one out, blending images for a new visual design: "a logo with a design that naturally blends the two given images as a new logo." The first image is a stethoscope with a paw, the second is a little pug, and look: it generated a logo of a pug with a stethoscope around it. This one, to me, is really impressive. Again, not something we have public access to yet, and if you want even more ideas of what it's capable of, there are all sorts of examples down the page that you can click
through and see exactly what it's capable of. Really killer stuff. I'll make sure I link to this in the description below (I link to all the research I share in this video), and it looks like MattVidPro also made a video on this exact research, so check that out for an even deeper dive into what Idea2Img can do.

Next up, let's check out PixArt-α: "fast training of diffusion transformers for photorealistic text-to-image synthesis." Some of these names just roll off the tongue. First off, just look at some of the images it generates. These are really good: a beautiful woman, Luffy from One Piece, a poster of a mechanical cat, "technical schematics viewed from front," nature versus human nature. In my mind these are Midjourney and DALL·E 3 level results; they look really, really good. But we've already got DALL·E and we've already got Midjourney, so what's so special about this? Well, they've managed to really optimize the training of the model. If we scroll down, the training is far more efficient than anything else available. You can see the CO2 emissions of the various models: training DALL·E 2 had the CO2 emissions of 5 humans, Imagen 0.8 of a human, Stable Diffusion 1.5 0.7, and PixArt-α 0.07. And the cost to actually train these models: DALL·E 2 cost $2.14 million to train, Imagen $366,000, Stable Diffusion 1.5 $320,000, and PixArt-α $26,000. So it's just a much more efficient model to train, but the results are on par with what you'd get out of Midjourney, SDXL, or DALL·E 3. I really love the contrast in these images; they remind me of something I'd get out of Midjourney. It also works with ControlNet, so you can upload extra references, like reference images and outlines, to dial in the image just like you would in a normal Stable Diffusion generation. It also works with
DreamBooth, which, if you've watched my past videos, lets you train your own objects or likeness into the model. So you could train your own face, or your dog's face, into this model if you wanted to. There are some more samples on the page, and they're really good. Now, on their project page they do have a Hugging Face demo button, but when I click on it, it doesn't appear to be an actual working demo yet.

Next up, check out HyperHuman: "hyper-realistic human generation with latent structural diffusion." This is a model with one goal in mind: to make the most realistic-looking humans possible. You can see here "a young kid stands before a birthday cake decorated with Captain America." Look how realistic this person is. "A man who is sitting in a bus, looking away from the window": if you saw an image like this on Instagram and somebody told you it's a real photo, you'd probably believe them. "An older man is wearing a funny hat in his dining room." "Man sitting on brick-covered ground, appearing dirty and tired." They've got some other examples down here that compare it to other models. If we take a peek, we've got "a man riding skis down the side of a snow-covered ski slope," and this is what it generated: super realistic compared to all the others here. "A pedestrian walks down the snowy street with an umbrella, ultra realistic": something funky is going on with his left leg compared to his right, but next to these other images it's pretty dang realistic (the SDXL one is pretty dang good too, though). And finally "a man riding on top of a brown horse while wearing a hat," their generation compared to the other models: SDXL is pretty decent, but the realism on this model is so dang good. If you look at this image, the skateboard is all jacked up, but the person on the skateboard looks pretty darn realistic, whereas in these other models, like Stable Diffusion 2.1, what's even going on? This person's arm is
as long as their entire body, DeepFloyd just lacks the detail, the SDXL person has three legs, and the HumanSD person looks like they got run over by a car.

That's all the image-generation tech I wanted to show you. I've got so much more cool stuff here around text-to-video, text-to-3D worlds, and text-to-3D objects, but before I get to it, I want to quickly tell you about today's sponsor, Wirestock. You can learn more about Wirestock over at wirestock.io. If you're not familiar with them, they're a platform where you upload your images or photographs and they distribute them to all of the stock photo websites for you, and many stock photo websites now allow you to sell AI-generated images as well. Sites like Adobe Stock, 123RF, Dreamstime, and Freepik all allow AI-generated images. So you can generate your AI art, upload it to Wirestock once, and it will distribute it to all of these sites for you, write a description, write a title, add the tags, and let those sites know the image is AI-generated so you're in compliance with their rules. You don't have to do anything but upload the image. In fact, another really cool feature of Wirestock is that you don't even need a tool like Midjourney, Stable Diffusion, or DALL·E: you can now generate AI art directly inside Wirestock by clicking the Generate button at the top. You can generate your own images, upload an existing image and reimagine it, or upload multiple images and mix them together. And they just added a brand new feature where you can change the face in an image. If I click into this image, for example, there's now a button at the bottom that says Change Face. I can click that, click Upload a New Face, pull in my own headshot, and when I click Apply, it converts the man at the computer into me at the computer. It's a pretty cool new feature that lets you even further dial in the images you're looking to
create before sending them to the stock photo sites. This Reface feature is a premium feature, but if you use the coupon code MATT20 at checkout, you get 20% off a premium membership of Wirestock. You can find it all over at wirestock.io, and once again, if you do decide to upgrade to the premium account, use the coupon code MATT20. Thank you so much to Wirestock for sponsoring this video; I really appreciate you guys.

Here's something I came across on Twitter that I thought was pretty cool, from Jared Lou. He says the latest version of Adobe Express can now create character animations from your voice. He calls it voice-to-AI character animation, but I couldn't actually confirm it uses AI, so I'm not 100% sure it does. Regardless, I think it's a pretty dang cool feature. If you check out Adobe Express at adobe.com/express and click "Get Adobe Express free," you can scroll down, and under Suggested Quick Actions there's one that says "Animate from audio." Clicking it gives you the option of multiple characters, and by the way, that character I saw a second ago looks very familiar. I think this is a trick my buddy Olivio Sarikas has known about for a while, because if you go to the end of one of his videos, he's got this little dude on the end screen telling you there's other stuff you can watch. Now I know how he did it: it looks like he probably made it with Adobe Express using this troll character. But let's go ahead and use the talking taco, because everybody knows I love tacos. I can change the background, make it transparent and put the animation over any scene I want, or use one of their existing backgrounds. I'm a huge baseball fan, so let's put the taco on a baseball diamond. I can scale my character up or down (let's make it a giant taco and put it right in the center), and I can change the aspect ratio, but I'm just going to go ahead and
leave it at 1:1. Then I can record: "Hey, my name is Matt the Taco, and I love baseball and I love eating tacos. Does that make me a cannibal?" Now it says "hang tight, generating a preview," and here's my video: "Hey, my name is Matt the Taco, and I love baseball and I love eating tacos. Does that make me a cannibal?" I think the animations are a little more pronounced if you use one of the actual characters; the taco probably wasn't the best example to show it off. If you want to see a really good example, go watch Olivio's videos, because he does it at the end of every single one. Again, I'm not 100% positive this uses AI, but it's really cool nonetheless and I wanted to share it with you.

All right, now let's shift into text-to-video. Text-to-video has gotten so good lately: you've got Runway Gen-2, Pika Labs, Moonvalley, Morph Studio, AnimateDiff; there are so many cool text-to-video options out there. And now we've got a new one called Show-1: "marrying pixel and latent diffusion models for text-to-video generation." If you look at some of these examples, they look much more realistic than what we've gotten out of previous text-to-video models, and it even looks like we can generate text inside our videos now. Check out this comparison using the prompt "panda besides the waterfall is holding a sign that says Show Lab." Here's Show-1, where you can see the panda, the waterfall, and the sign that says "Show Lab." This one (ModelScope) has no sign at all, but you do have a panda and a waterfall. ZeroScope: you've got some letters in there, the panda on the waterfall, and some random letters floating in the sky. And Gen-2 is, well, you can see what I see. Here are some more examples. Look at the snail right here: "snail slowly creeping along, super macro closeup, high resolution, best quality." ModelScope is actually pretty solid, but I mean, look at the colors in Show-1; it just looks so good. With ZeroScope, we've
got like a swarm of snails, and Gen-2 is not great compared to the rest.

Speaking of text-to-video, we have new research called MotionDirector: "motion customization of text-to-video diffusion models." This is really cool because you can give it an input video and have it generate new ideas based on the combination of the input video and your text prompt. They uploaded a few videos of people lifting weights and gave it the prompt "a bear is lifting weights," and here's a video of a bear lifting weights, roughly following the motion of the input video. "A dog is lifting weights": you can see the video of that too. They input a drone shot circling a house and gave the prompt "a pyramid in a forest," and the result circles around an AI-generated pyramid the same way the original circles the house. "A temple on a mountain," same idea. Here's an input video of a car running on a road; they changed it to a tank in the desert and a tiger in the forest, and each follows roughly the same path as the car. There are other examples too: input videos of people playing golf, compared against Tune-A-Video and several ZeroScope models, and the MotionDirector output looks a lot more like a monkey playing golf than the rest of them. The page has several more examples you can explore to see what this model is really capable of.

Now let's talk about audio with AI. This research recently came out, called SALMONN: "speech audio language music open neural network." It accepts speech, audio, and music inputs, and you can essentially chat with the audio and ask questions about it. This one does have a working demo live over on Gradio. For example, you can upload an audio file; they have some examples here, so let's use the first one, with gunshots, and take a listen: "Can you guess where I am right now?" If I click "upload and start chat," I can ask it a question: what sounds
are heard in the background? "Gunshots and explosions are heard in the background." What is the person saying? "The person is saying 'can you guess where I am right now.'" So it listens, hears both the sound effects and the person speaking, and understands them. Where is the person from? "The person is from the United States." I was looking for "a war zone" or something like that, but you get the idea: you can upload audio and then ask questions about the audio.

Here's something really interesting that's been circulating around the web lately: this video of a car on fire. The environment is the real world, so everything you see (the cars, the streets, everything in the background) is actually real, but the car itself is computer generated. The car, the fire, the smoke: this is all computer-generated imagery. You can see that as they move this little pill-shaped object through the scene, the fire and the smoke react to it. It's just so crazy, because this is the kind of thing that makes it harder and harder for people to believe their eyes when they see videos on the internet. Do keep in mind, as this video circulated on Twitter, somebody wrote: "This video was generated in Unreal Engine. It's crucial to understand what fifth-generation warfare looks like. Social engineering and misinformation is the name of the game." But the creator actually responded: "My work is being reposted on Twitter as misinformation, by claiming that it is misinformation. So do your own research." They added, for more context, that it was in fact not generated in Unreal Engine and not rendered in real time. Still, the point stands that seeing stuff like this makes it pretty hard to believe our eyes. Once again, I don't necessarily know if this has anything to do with AI (I don't believe any of it was AI-generated); I just thought it was really cool and wanted to share it with you. Now check this out: this is
called 3D-GPT: "procedural 3D modeling with large language models." It's a text-to-3D-scene generator. Here's an example: "The desert, an endless sea of shifting sand, stretched to the horizon, its rippling dunes catching the golden rays of the setting sun, creating an ever-changing landscape of shadows and light." Or: "The lake, serene and glassy, mirrored the cloudless sky above, reflecting the surrounding mountains and the graceful flight of a heron as lily pads floated like emerald jewels upon its tranquil surface," and from that prompt it generated this 3D scene. "Blinding sunlight rains over the vast desert expanse, casting sharp shadows behind the few resilient trees; small sand piles, sculpted by the relentless wind, pepper the golden terrain." The way it works: you enter a prompt, it plans the scene, converts the plan into Python code, and the Python code then builds a 3D model in Blender. So the scenes you create should be something you can pull into Blender, Unreal Engine, Unity, any of those tools, and use in your games, your video creations, whatever you need 3D scenes for.

Speaking of 3D scenes, we also got this research called DreamSpace: "dreaming your room space with text-driven panoramic texture propagation." With this, you can film a real-world scene (walk around with a camera and record; it looks like it's creating a NeRF or a Gaussian splat or something along those lines), it reconstructs the scene into 3D objects, and then you can apply text prompts to change what the room looks like. So you've got a sci-fi theme here, or a Zelda theme where it looks like you've got Hyrule out the window. What's even cooler is they then show you can look around the new room in virtual reality (it looks like they're using a Meta Quest or something) after re-theming it with the prompt "seeing through the galaxy." Here's another example, somebody's apartment or house.
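3D-GPT, as described above, has a language model write Python that Blender then executes to build the scene. As a rough, runnable sketch of just that last step (the function name and the toy keyword-to-primitive lookup below are my own stand-ins for the LLM; the real system generates far richer procedural modeling code):

```python
# Sketch of the "prompt -> Python -> Blender" step of a 3D-GPT-style
# pipeline. In the real system an LLM writes the script; here a toy
# keyword lookup stands in for the LLM so the idea stays runnable.

# Hypothetical mapping from scene keywords to Blender primitives
# (my assumption, not taken from the paper).
PRIMITIVES = {
    "dune": "bpy.ops.mesh.primitive_plane_add(size=40)",
    "tree": "bpy.ops.mesh.primitive_cone_add(radius1=1, depth=4)",
    "lake": "bpy.ops.mesh.primitive_circle_add(radius=10, fill_type='NGON')",
}

def prompt_to_blender_script(prompt: str) -> str:
    """Emit a Python script that, when run inside Blender, builds the scene."""
    lines = [
        "import bpy",
        "bpy.ops.object.select_all(action='SELECT')",
        "bpy.ops.object.delete()",  # start from an empty scene
    ]
    for keyword, op in PRIMITIVES.items():
        if keyword in prompt.lower():
            lines.append(op)
    return "\n".join(lines)

script = prompt_to_blender_script(
    "An endless sea of shifting sand dunes with a few resilient trees"
)
print(script)
```

The generated text would then be run inside Blender itself (for example with `blender --background --python scene.py`); outside Blender, `import bpy` fails, which is why this sketch only builds the script as a string.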
You can see the 3D object scene it created; they applied the prompt "cyberpunk," the prompt "nebula," the prompt "anime landscape," and the prompt "Harry Potter," and got completely different scenes out of each one.

Next, let's look at AniPortraitGAN: "animatable 3D portrait generation from 2D image collections." This is research where you can upload 2D images and turn them into 3D, movable avatars: their lips move, they talk, and you can see them smiling and opening their mouths. I actually find this animation borderline hypnotic; I hate to admit I stared at it way longer than I care to share with you. These are examples of animated characters moving their heads, smiling, and looking around, and the characters can actually be driven by real video: you can see a video of a real person talking, and the animations follow the real person. Same with the video below it.

Now let's talk about text-to-3D, because this is something that has made massive leaps and bounds recently. This one's called GSGEN: "text-to-3D using Gaussian splatting." You can see all sorts of examples where they created 3D objects from text prompts using Gaussian splatting, which, ever since it came out (I don't know, four or six weeks ago?), has driven massive leaps in what we can create with 3D scenes and 3D objects. We've got a plate of delicious tacos, a car made of sushi, a furry corgi, a pineapple, and all of these look pretty dang good. Here we can see it compared to previous models: look at this corgi in DreamFusion versus their version, or this panda in DreamFusion versus theirs; it's obviously quite a bit better. If you want to know how it actually works, and you know how to interpret this, there's a diagram that shows the pipeline. It's a
little bit over my head, so I can't totally explain it, but from my understanding: you give it a text prompt, it generates a 2D image, it tries to build a 3D point cloud from that 2D image, and then from that point cloud it uses Gaussian splatting to produce the 3D result. But honestly, I don't totally know what I'm talking about here.

Then we have GaussianDreamer: "fast generation from text to 3D Gaussian splatting with point cloud priors." This sounds like it uses a very similar method to what we just looked at, but the results look even more detailed, and I find this one a little easier to understand. Take a look at this figure: the prompt was "a fox." It uses a 3D diffusion model to generate a point cloud. If you remember Point-E, that was one of the earlier text-to-3D models; its outputs didn't look very good, just these sparse point clouds of the 3D object without much detail. GaussianDreamer takes the point cloud and, using Gaussian splatting, fills in the details around the points to get a much clearer, slightly more realistic result. Here's an example of an axe, where it starts with the point cloud and then (I believe using Gaussian splatting) fills in the details. There are other examples here: an airplane, a dragon, a flamethrower, a magic dagger, a mushroom boss, a fox, a banana, a jellyfish. Lots of cool examples, and the detail of these 3D generations has gotten so good compared to what we had just weeks ago; it's mind-blowing, honestly.

Sticking with the theme of 3D generation, this next one isn't using Gaussian splatting (I don't think); it's called MVDream: "multi-view diffusion for 3D generation." Here we can see an example of what the process looks like as these images are generated in 3D, and it generates both the object and, in this version, a texture over the object. There are other examples showing just the untextured object and then the texture applied to it. Here's just the
object of Gandalf, and then with the texture applied. Here are some comparisons against previous models (DreamFusion, Magic3D, text-to-mesh, ProlificDreamer, and then their model) using the prompt "an astronaut riding a horse," and you can just see how far these text-to-3D models have come in a short amount of time. What I find really cool is that you can train your own images into it using DreamBooth. In this example they trained images of a very specific dog into it and were able to generate 3D objects of that dog in different positions: sitting, jumping, on a rainbow carpet, sleeping, all generated from the trained-in images of their dog. Text-to-3D is blowing my mind right now with how far it's come.

Once again, one of my longer-term goals is to create a game using something like Unreal Engine, and now we're getting text-to-scene generation with 3D-GPT and really, really good 3D objects with text-to-3D. I'm excited because creating game assets and game environments is going to get a lot easier with these AI tools, and I'm really excited about the pace at which all of this is moving. If you've been following along on my YouTube channel, you're seeing how fast everything is accelerating, and it's just getting so exciting and so fun.

Anyway, that's all I've got for you today. I love making videos like this, where I break down all of the advancements as they're happening and try my best to explain them, even though I often don't fully understand them myself. It's fun to explore and see where things are heading and look at where the future of AI is going. If you're only watching the news videos and seeing what's available now, you're getting half the picture; with videos like this you get to see where things
are going next and get ahead of the curve, and I love being ahead of the curve on this AI stuff; hopefully you do too. So thank you so much for nerding out with me; this was a total nerdfest for me, and I'm excited to make more videos like this for you. If you haven't already, check out futuretools.io. It's where I curate all the cool AI tools I come across and all the latest AI news on a daily basis, and I've got a free newsletter: click the "Join the free newsletter" button and I'll send all of the coolest AI tools and all the latest AI news directly to your inbox. You can find it all over at futuretools.io. Thank you so much for tuning in, and thanks again to Wirestock for sponsoring this video; I really appreciate you guys. If you enjoyed this video, maybe consider liking it, and if you want to see more videos like this, I'd love it if you subscribed to the channel; it would really make me happy. Thank you so much, I really appreciate you, I can't say it enough, and I'll see you guys in the next video. Bye-bye!
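As a footnote on the GSGEN and GaussianDreamer pipelines described above (text prompt, then a rough point cloud, then Gaussian splatting to fill in detail), here's a minimal pure-Python sketch of just the initialization step, where each point becomes a 3D Gaussian. The parameter names and starting values are my guesses at a typical setup (sizing each Gaussian from nearest-neighbor distances comes from the original 3D Gaussian splatting work), not numbers taken from either project:

```python
import math
import random

def init_gaussians_from_point_cloud(points, colors, k_neighbors=3):
    """Turn each 3D point into an initial Gaussian splat.

    A common heuristic is to set each Gaussian's starting scale from the
    distance to its nearest neighbors, so sparse regions get bigger blobs.
    """
    gaussians = []
    for i, p in enumerate(points):
        # distances to every other point, nearest first
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        # mean distance to the k nearest neighbors sets the initial radius
        near = dists[:k_neighbors] or [1.0]
        scale = sum(near) / len(near)
        gaussians.append({
            "mean": p,                # center of the Gaussian
            "scale": (scale,) * 3,    # isotropic to start; optimized later
            "color": colors[i],       # RGB carried over from the point cloud
            "opacity": 0.1,           # low start value, refined by training
        })
    return gaussians

# toy point cloud: 50 random points on a unit sphere surface
random.seed(0)
pts = []
for _ in range(50):
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    n = math.sqrt(x * x + y * y + z * z)
    pts.append((x / n, y / n, z / n))
cols = [(0.8, 0.5, 0.2)] * len(pts)

splats = init_gaussians_from_point_cloud(pts, cols)
```

From here, a real pipeline would optimize every Gaussian's position, scale, rotation, color, and opacity against rendered views scored by a 2D diffusion model, which is the "filling in the details" step described in the video.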
Info
Channel: Matt Wolfe
Views: 126,506
Keywords: AI, Artificial Intelligence, FutureTools, Futurism, Machine Learning, Deep Learning, Future Tools, Matt Wolfe, AI News, AI Tools, generative art, ai art, ai video, text to video, 3d ai, ai video generator, ai video maker, text to video ai, ai video editing, text to video ai free, ai video creator, ai video editor, ai, matt wolfe, ai tools, ai voice, ai animation, hugging face, best ai tools, video ai
Id: Jv0G8lly1uk
Length: 26min 7sec (1567 seconds)
Published: Thu Oct 26 2023