I tried to build a ML Text to Image App with Stable Diffusion in 15 Minutes

Captions
Ever wanted to get your hands on one of the most expensive and one of the most interesting deep learning models of our time? Well, that's exactly what we're going to be doing in this episode: we're going to be taking a look at Stable Diffusion.

What's happening, guys? Welcome to another episode of Code That, where I try to build stuff in a ridiculously short time frame. In this episode we are going to be building our very own text-to-image generation app using Stable Diffusion and Tkinter. This is going to allow you to type a prompt into a text box and have an image generated using machine learning, or, if you want to call it that, AI.

But before we do that, we need to appreciate the rules. First and foremost, I'm not going to be able to look at any pre-existing code or doco or Stack Overflow; if I do, it's a one-minute time penalty. That brings us to the time limit: for this particular Code That episode we are going to have a 15-minute time limit, which could get interesting, as it usually does. Last but not least, the cash component: if I fail to make that time limit, it's going to be a $50 Amazon gift card to you guys. All right, ready to do it? Let's get to it.

Alrighty guys, ready to do it: 15 minutes on the clock, let's go. Okay, so first things first, we need to create a new file, and we're just going to call that app.py. Then we've got to import a bunch of dependencies. First up, we need to import tkinter as tk, and then we're going to import customtkinter (I'm desperate to achieve this time limit for once); we're going to import that as ctk. Then we are going to import a lot. From PIL we're going to import ImageTk; this is just going to help us render our image from Stable Diffusion back to our application. Then we need to bring in our auth token. This is an auth token from Hugging Face, so you're able to get it from your own free account: from authtoken import auth_token (oops, let's zoom in). And then what do we need? We need a bunch of stuff from PyTorch.
So we need PyTorch: import torch, from torch import autocast, and then we need to import the Stable Diffusion pipeline, so from diffusers import StableDiffusionPipeline. I think that's it.

All right, so then we need to create our app. We are going to go app = tk.Tk(), and then app.geometry; this is going to set the size of our actual app, so we're going to set it to 532 by, I think, 622 was ideal. Then the app title is going to be equal to "Stable Bud"; you can name it really whatever you want. And then we need to change the color theme, so ctk.set_appearance_mode; we're going to set that equal to dark. Okay, and then what do we need to do? We can run this app just by running app.mainloop. We don't need to run that yet because we're not going to have anything in it, but let's just try: python app.py. Let me just double check my head's not blocking that. We should get a little pop-up. [Music] Come on, come on, come on, come on. All right, cool, so you can see that that's our template.

Right, then we need to add in somewhere to enter a prompt, so we're going to use the entry field. prompt = ctk.CTkEntry, and then we are going to set the height equal to 40 and the width equal to 512, so we'll set it to the width of the entire app with a margin of 10 on the side. Then font, actually text_font, equals Arial, 20. Then we need text color; this is just going to set the color of the text in our prompt box, so text_color we're going to set equal to black. Then we want to set the color of the actual box, so fg_color is going to equal white. And then we need to place it, so prompt.place: x is going to be 10 and y is going to equal 10 as well, so it's going to be a little bit further down. Okay, so if we run that... 11 minutes, okay, we're good. All right, so you can see we've got a box here, and we can type stuff into it. We're looking good.
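The window and prompt box narrated above might look roughly like the following sketch. The helper names (`geometry_string`, `build_window`) and the module-level constants are illustrative and not from the video. Passing `app` as the entry's parent is an assumption, as is the `font=` keyword: recent CustomTkinter releases use `font=`, whereas the narration says `text_font`, which matches older releases. The GUI imports sit inside the function only so that the constants can be inspected without a display.

```python
"""Sketch of the window and prompt box described above (names are illustrative)."""

WINDOW_WIDTH, WINDOW_HEIGHT = 532, 622  # window size quoted in the video
PROMPT_WIDTH, PROMPT_HEIGHT = 512, 40   # prompt box size quoted in the video
MARGIN = 10                             # gap between the box and the window edge


def geometry_string(width, height):
    """Format a Tk geometry string such as '532x622'."""
    return f"{width}x{height}"


def build_window():
    """Create the dark-themed window with a prompt entry; returns (app, prompt)."""
    # Imported lazily so the helpers above work without a display.
    import tkinter as tk
    import customtkinter as ctk  # third-party: pip install customtkinter

    app = tk.Tk()
    app.geometry(geometry_string(WINDOW_WIDTH, WINDOW_HEIGHT))
    app.title("Stable Bud")
    ctk.set_appearance_mode("dark")

    # Assumption: recent CustomTkinter takes `font=`; the release used in the
    # video took `text_font=` instead.
    prompt = ctk.CTkEntry(
        app,
        height=PROMPT_HEIGHT,
        width=PROMPT_WIDTH,
        font=("Arial", 20),
        text_color="black",
        fg_color="white",
    )
    prompt.place(x=MARGIN, y=MARGIN)
    return app, prompt
```

Calling `build_window()` and then `mainloop()` on the returned window should show the empty template with a white prompt box, provided CustomTkinter is installed and a display is available.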
What we need to do is create a placeholder for where our image is going to go. So we're going to create a frame; we'll call it lmain = ctk.CTkLabel (come on), CTkLabel. All right, and then we can set the height... argh, it's crazy when there's a time limit, coding is just insane. Height and width are going to equal 512 by 512, because that's what our Stable Diffusion model is going to return back to us. Then lmain.place, x equals... all right, so we need to work this out. Let's just put it at x equals 10, and then I think y we're going to set to 110 for now, but we'll see how it looks once we actually render an image. Okay, so that is our placeholder for our image.

Then what we need to do is create a button. We'll call it trigger = ctk.CTkButton, and then height is going to equal 40 and width is going to equal 120. Then we can probably copy the rest of this. Let me hide this side bit because we don't need that. All right, so text_font equals Arial; we want our text color here to be white and we want our button to be blue. Wait, hold on, no, I've just gone and applied that to our label. Wrong, this needs to go down here. All right, that's fine, that's fine. text_color is going to be white, fg_color is going to be blue, so that means our button will be blue. Then we need to place it: trigger.place, and x is going to equal 10. We've got a height of 40 for our prompt box plus 10, so if we put in a bit of a gap, let's say y equals 60, that should be okay. Let's just run this again and see what that looks like. Oh, we need it centered. Where's my calculator? So our box is 532 divided by two, minus 60 because that's half the width of the button, so we want 206. I think x equals 206.
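The layout arithmetic in this passage can be checked with a small sketch. The function names are hypothetical; the numbers (a 532-pixel-wide window, a 120-pixel-wide button, widget heights of 40, 40 and 512, and gaps of 10) are the ones quoted in the narration.

```python
def centered_x(window_width, widget_width):
    """Left x offset that horizontally centers a widget in the window."""
    return (window_width - widget_width) // 2


def stacked_y(heights, gap=10):
    """Top y offset of each widget when stacked top to bottom with `gap` px between."""
    offsets = []
    y = gap
    for height in heights:
        offsets.append(y)
        y += height + gap
    return offsets


# Button: 532 / 2 - 60 = 206, as worked out in the video.
button_x = centered_x(532, 120)

# Prompt box (40), button (40), image placeholder (512).
prompt_y, button_y, image_y = stacked_y([40, 40, 512])
```

With these values the image placeholder starts at y = 110 and is 512 pixels tall, so its bottom edge sits at 622, exactly the window height, leaving no margin below the image.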
Let's just close that up and run it. Let's just make sure that's centered. All right, that's looking way better. So we need to configure the text: trigger.configure, text equals "Generate". Close that down. How are we looking? All right, that looks way better. So we can type in stuff and we can hit Generate. Right now we don't have anything generating, but we'll get there.

We are now going to create a function, so def generate, pass for now, and then we're going to pass that function to this button here (my head's covering that): command is going to equal generate. And that is fine for our button.

Then we need to do some magic Stable Diffusion stuff. First up, we need to specify a model, so modelid equals CompVis/stable-diffusion-v1-4; there's a bunch of different models that you can actually try out. Then we're going to create a pipeline, and we're going to set that equal to StableDiffusionPipeline.from_pretrained and pass through the model ID. Seven minutes; we can make this, guys, we can make it. Okay, so model ID, and then, oh Lord, no... it's revision. The beauty of it is that if you load this particular revision, you can load it into a GPU with four gigabytes of VRAM, so you should be okay. I think it's fp16. Then you need to pass through torch_dtype, or dtypes, I think that's it, equals torch.float16.
Then we also need to specify the token, so use_auth_token; we're going to set that equal to the token that we just imported up here. That's available from the Hugging Face website, so you can go into settings and generate that. Then we need to send our pipe to our GPU, so pipe.to, and then we're just going to say cuda. You can actually create a new variable, so device equals cuda and then pipe.to(device). Cool, so that is our model loaded.

Then we need to use it over here, so we need to go and create an image. So with autocast(device), this is going to send it to our GPU, we then go image = pipe... maybe I was too confident, guys; five minutes. Then we need to get the prompt; to get the prompt we can type in prompt.get (have to get, come on). Then we need to specify guidance_scale. The guidance scale is how closely we want Stable Diffusion to follow what we've written in the prompt: I think the higher the value, the more strictly it's going to try to build that image, and the lower the value, the more flimsy it's going to be. Probably a better time for that. Okay, so then we need to pass through sample, so we're going to extract the sample out of that, and then we need to extract the image. Okay, and then (I keep saying "um") create the image: img equals ImageTk.PhotoImage, I think it's that, yes, and then pass through image. Then we can go and set our frame, so over here we need to configure that lmain section. But first up, let's save that image: we can type in image.save, and we'll just save it as generatedimage.png, so this way you can actually bring it up and use it wherever you want. Then, to update our image, lmain.configure, image equals img. That looks okay. Save, let's run it. It takes a little bit of time to start. [Music] I'm going to pause it because it's starting. All right, we've got four minutes 50 on the clock. Come on, guys, we can make it. This looks good, this looks good.
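The model loading and generation steps narrated above could be sketched as follows. This is an illustration under stated assumptions, not the code from the video: the function names and the split into three helpers are hypothetical, the `guidance_scale` default of 7.5 is a placeholder because the narration does not give a value, and the autocast context manager is passed in as an argument so that the generation step can be exercised with stand-in objects on a machine without a GPU. The narration's `use_auth_token` keyword and "sample" output match early releases of the diffusers library; newer releases expose the result as `.images`, which is what the sketch uses.

```python
"""Sketch of model loading and generation as narrated (assumes a CUDA GPU)."""

MODEL_ID = "CompVis/stable-diffusion-v1-4"
DEVICE = "cuda"
OUTPUT_PATH = "generatedimage.png"


def load_pipeline(auth_token):
    """Load the half-precision weights and move the pipeline to the GPU."""
    import torch
    from diffusers import StableDiffusionPipeline  # third-party

    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID,
        revision="fp16",            # half-precision branch mentioned in the video
        torch_dtype=torch.float16,  # the keyword is singular: torch_dtype
        use_auth_token=auth_token,  # assumption: newer diffusers name this `token`
    )
    return pipe.to(DEVICE)


def generate_image(pipe, prompt_text, autocast, guidance_scale=7.5):
    """Run the pipeline under autocast, save the result, and return the PIL image."""
    with autocast(DEVICE):
        result = pipe(prompt_text, guidance_scale=guidance_scale)
    # Assumption: recent diffusers expose `.images`; the "sample" key in the
    # narration matches the older return format.
    image = result.images[0]
    image.save(OUTPUT_PATH)  # save the PIL image before wrapping it for Tk
    return image


def show_image(image, image_label):
    """Display a PIL image on the label that was created as its placeholder."""
    from PIL import ImageTk  # third-party: Pillow

    photo = ImageTk.PhotoImage(image)
    image_label.configure(image=photo)
    image_label.image = photo  # keep a reference so the image is not garbage-collected
    return photo
```

Inside the app, the button's callback could then call `generate_image(pipe, prompt.get(), torch.autocast)` and hand the returned image to `show_image` together with the `lmain` label.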
Right, so it's opened. Let's test it out. I'm just going to hit start, let's go: a spaceship landing on Mars, okay, high resolution. So if it works okay, we'll start to see stuff happening down here; you'll actually start to see it generating, and you'll get a bit of a progress bar. The first time you run it, it does take a little bit of time. That looks okay. [Music] So you can see the progress bar's popped up... oh no. [Music] Oh no, we've used all the memory. [Music] No, no, no, no, no. Oh no. The revision is fp16, so this should work. We might be out of memory on the GPU. torch.float16, that should be fine. Is it torch_dtype or torch_dtypes? Oh no. All right, so revision... I think it's torch_dtype, I don't know, let's try that. How much GPU is being used? Should be fine. The only thing that I can think of is that it's either torch_dtype or torch_dtypes. Come on, come on. fp16, I'm pretty sure it's that. Come on, I do want to make the time limit for once. [Music]

All right, it's up: a spaceship landing on Mars, okay, high res. Come on, come on, come on, come on. Oh, there we go, it was torch_dtype, yes! Come on... oh no: PhotoImage has no attribute save. Hold on, no, no, we've got an error in there. So this should be image.save; throw that up there. Come on, come on, we can make this, we can make this. All right, restart. So it was meant to be torch_dtype, not dtypes: this is not plural, it's singular. Come on, come on, start up. Three minutes, come on. [Music]

All right, a spaceship landing on Mars, okay, high res. All right, so that's generating. That's what my GPU looks like. It's okay, that's actually kind of good speed; I've got a 2070 something. One minute 32 left on the clock, guys. Come on! Your boy's done it. All right, hold on, there's one thing: we need a little bit of a border at the bottom. I'm not happy with that size, so if I just make this 632, I think we'll have a little bit of a border at the bottom. Sorry, my [unclear] is kicking in.
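One plausible reading of the out-of-memory error earlier in this passage is that, with the keyword misspelled, the half-precision setting was not applied and the weights were held at full precision. The sketch below only illustrates the arithmetic behind that reading: float32 uses 4 bytes per parameter and float16 uses 2, so half precision halves the memory needed for the weights. The function name is hypothetical, and the parameter count is a round illustrative figure, not the model's actual size.

```python
def weights_gib(num_params, bytes_per_param):
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Illustrative round figure, not the model's exact parameter count.
ILLUSTRATIVE_PARAMS = 1_000_000_000

full_precision = weights_gib(ILLUSTRATIVE_PARAMS, 4)  # float32: 4 bytes each
half_precision = weights_gib(ILLUSTRATIVE_PARAMS, 2)  # float16: 2 bytes each

print(f"float32: {full_precision:.2f} GiB, float16: {half_precision:.2f} GiB")
```

Activations and intermediate buffers add to these figures during generation, so the weights are a lower bound on what the GPU has to hold.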
It has to be good. I mean, [unclear]. Come on, we can perfect this. Come on, come on. It's just taking a ton of time to start up; I mean, it is a pretty hardcore model. All right: spaceship landing on Mars, okay, high res. Generating, generating, 136 seconds, guys. 28 seconds left on the clock. We did it!

So you can obviously generate other stuff as well, right? I could say: Rick and Morty planning a space heist, and you can see it's going to generate again. Boom, take a look at that: Rick and Morty planning a space heist. That's time up, guys. We've managed to make this one for once. Your boy did it! Come on, you've got to give him props for that.

But take a look at this: it obviously allows you to build and leverage some of the most state-of-the-art deep learning models that are out there. Stable Diffusion is absolutely amazing, guys, and it's obviously a free alternative to something like DALL-E 2, and it's super great. The other thing as well is that you can see we've saved out our image, so by writing image.save over here, you're able to grab that image if you wanted to go and send it to someone, or if you wanted to show somebody your absolutely amazing artwork.

If I try another one: a 3D Charizard, realistic 3D Charizard in the forest, realistic, 4K, high resolution. You can obviously generate a ton of stuff. The beautiful thing about this is that the guys have made it open source; you can go and use it, you can go and try it out yourself. I personally think it's absolutely amazing. Look at my high-resolution Charizard, slightly on fire. But there's an absolute ton of stuff that you can do. You can, um: Lord of the Rings, insane, fighting orcs, from movie, okay. There's also a website called PromptHero, I think it's called PromptHero, where you can go and find a bunch of prompts and test those out, if you wanted to go and see what's actually possible with these models. Take a look at that: that actually
looks like it's from the movie. That's absolutely amazing. Anyway, guys, you've seen it: you've been able to build this Stable Diffusion app, and you can give it a crack. I'm going to link all the code in the comments below. I'll catch you in the next one. Peace.

Hopefully you've enjoyed this episode of Code That. If you have, be sure to give it a big thumbs up, hit subscribe, and tick that bell; it really does make a difference, and we're on the road to hitting a hundred thousand subscribers. I thank you all so much for all of your support. Thanks again for tuning in. I'll catch you in the next one. Peace.
Info
Channel: Nicholas Renotte
Views: 60,852
Keywords: python, machine learning, python app, tkinter
Id: 7xc0Fs3fpCg
Length: 18min 43sec (1123 seconds)
Published: Tue Sep 20 2022