AnyText - Easy Text Generation Using Stable Diffusion!

Captions

Trying to generate images with text in them using Stable Diffusion typically produces garbled words, if indeed you get any text at all. I mean, I really want "cats smell of milk" written on the cake, but all I'm getting is cake. However, with the release of AnyText, we can now not only make sure the text actually appears but also reduce how garbled it is.

Here I've got a prompt of a delicate square cake, mixed berries and "cats smell of milk". Down here I've been able to position where the text goes, so I've got the first word, the second and the third there. As you can see from the generations, we now have some lovely cakes telling us that cats do indeed smell, which they do, of milk. Because I've asked for a cake, the other thing you can see it's done there is do a sort of cake-style lettering, which is very nice.

But generating text in images isn't the only thing you can do, because you can also edit the text. Here the original image has "I heart my mommy". Obviously "mommy" isn't a word in British English, so you'd want to replace it with something that makes a lot more sense, just like I have done here: "characters written in chalk on the blackboard that say 'rodent'". And now you've got the result, "I heart my rodent". I particularly like the last one there, which seems to have kept the style pretty well. Of course, you can do that up in the prompt there, where I've got "chalk on blackboard". However, it doesn't actually need to be specified in the prompt, because it will still keep the style pretty well.

For example, here the original image says "love you more" on a sort of cake topper. Once again I've improved on the original text, but this time I'm just using a single word as my prompt: "rodents". As you can see in the four examples, even though I've not specified that it should be a sort of red plastic cake topper, it's gone ahead and done that anyway. Pretty cool.

The GitHub page has loads of other examples in the gallery. Things you can see here: coffee art, coins, graffiti, childlike drawings, handbags and all sorts of other things. Hopefully that should give you some inspiration and a bit of an idea about what this can do, but I will generate some more images in just a moment. Like they show there, it handles both Chinese and English characters. Plus, if we scroll down a little bit, we've got an evaluation showing the things they tested it against, and of course this beats them all.

Installation is just a case of running the installer from my Patreon or following the instructions they provide. Even better still, if we scroll up here we can see they've released it under an Apache 2.0 license, meaning you get all that lovely open-source freedom. The model itself is based on Stable Diffusion 1.5, which opens up the possibility of this being something we could use in both Automatic1111 and ComfyUI.

Now, originally this did use 20 gig of VRAM, but as of a few moments ago the default is to use FP16, which cuts that in half, down to just 10 gig. Additionally, if you're not using Chinese characters then you won't need to load the translator either, further dropping those VRAM requirements down to a mere 8 gig for generation of a 512x512 image.

Through my extremely thorough and extensive testing, I have discovered that it's not all sunshine and roses, because some words just don't work or look quite right all the time. But it's certainly better than a plain Stable Diffusion model. Of course, you could use ControlNets and things to help out with that, but they can be pretty rigid.

All right, let's dive into this interface and see what they provide. Up the top here we've got an instructions panel, which is particularly useful if you've not used this sort of thing before. However, if you are used to Gradio interfaces and Stable Diffusion, it will be pretty intuitive. If we close that down, we've got another panel here with parameters. This one gives you all the usual goodies: image count, number of steps, resolution, seed and all that sort of stuff. For me, I found the defaults to be fine, but if you're the sort of person that likes to get into those finer controls, then at least you get the option to do so.

You get two tabs, one for text generation and another one for editing. I'm going to start out with the text generation, and this is where the whole image is generated by Stable Diffusion, so your prompts are really essential here. The text parts you want should be quoted, and you can pick where they go and in what order. So there you've got manual draw, manual rectangle or an automatic option as well, and there's the positioning, either horizontal or vertical.

If we pick one of their examples here, we've got a photo of a caramel macchiato coffee on the table, top-down perspective, with the text "Any" and "Text". So you see "Any" is the first word, in the first quote, and "Text" is the second word. That means you've got the two boxes there. If I click run using their default example parameters, we should in just a couple of seconds get four images out. There we go. Let's close that one down so we can see all four at once, and there we have "AnyText" in a caramel macchiato style. I particularly like the last one there, which looks delicious.

Okay, you can of course edit this any way you like. Now, this text doesn't include rodents, so we need to resolve that, and we've also got three words there as well. So if I try and run that... oh, what's gone wrong? Error: found two positions, and I need three from the prompts. So make sure you use the correct number of masks, and a reasonable size for each word as well.

Okay, so now I've selected three areas, that error should no longer appear. Just for giggles, let's also go into these parameters and pick a random seed, change some other stuff, and make sure we're not getting cherry-picked results. You can drag that slider all the way down to minus one there to get some randomization.

Okay, now if I run this through, you can see that it definitely isn't perfect every time. The first one and the last one are okay, but let's do another generation and see what happens here. Will the letters still be a little bit squiffy? Yes, they will, but often it is very good. There we've got one: "rodents are cool". That's quite nice. But yes, those other ones are a little bit random.

Just to really, really emphasize: letter placement is quite important and will obviously change that image quite a lot. I'll switch it to manual rectangles this time and we'll rerun that. Now, this is quite blocky, obviously, so what is the result going to be? I'll give you a clue: it's going to be fairly blocky. As these are squares, we've now got this very lined-up text, which is nice because that may be what you want. Perhaps you're doing something like a sign or some other image that would benefit from quite straight writing like that. Also, in this case you can see the text is a lot more clear, so the quality of the output you're going to get really does depend quite a lot both on your prompt and on the masking you do.

The other option you can use is auto-randomized, and as you can see in this case, once again it's done the text very well, but it hasn't positioned it exactly where I want. And in this attempt it's like, "No, I can't actually auto-position it", so sometimes it really will just fail. Overall I'd really suggest doing the manual drawing, as that gives you the best control over things like latte writing, unless you want that square sign, in which case you'd use the manual rectangle.

As mentioned, that isn't the only thing it can do, because there is also text editing. Over in this tab they give us a selection of results as well, so there you can see where I changed it from "Daddy" to "rodents". We'll generate another four images there just to show you this result, and as we saw earlier, that is pretty cool. Now, one thing with their examples is you can't actually edit the mask; they've sort of baked it in there. But that doesn't really matter, because if you pick your own image then you can.

This time I've gone with a welcome sign, and obviously "welcome" isn't a word we want. We want that to say "rodents" instead, so let's generate another few of those and see what happens. And there we've now got our rodents signs. Once again we've got one there at the bottom which doesn't quite say "rodents", but these are quite good. That is wobbly, and that's what I asked for. It's kind of metal. Of course, you can keep generating to your heart's content until you get something that works for you. One thing I do like is how it's kept the shadow in there, which goes across one of the letters like in the original, so that's pretty nice. I was kind of expecting it to cut that out, but it's melded it in nicely.

As is the way, just as I'd finished my first take on this video I discovered a work-in-progress version of AnyText for ComfyUI, though I've not found one as yet for Automatic1111. It really is a work in progress right at the moment, though, and as such I've not actually been able to generate anything using it. So perhaps while you're waiting you could watch another Nerdy Rodent video.
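
The "found two positions, need three" error described in the captions comes down to a simple rule: each quoted text part in the prompt needs its own drawn position. The sketch below illustrates that rule only. It is not AnyText's actual code; the function names `quoted_texts` and `check_positions`, the error wording, and the three-part prompt are all hypothetical, and it assumes straight double quotes in the prompt.

```python
import re

# Hypothetical illustration; AnyText's own parsing may differ.
# Matches text between straight double quotes, e.g. "Any" "Text".
QUOTED = re.compile(r'"([^"]+)"')


def quoted_texts(prompt: str) -> list[str]:
    """Return the quoted text parts of a prompt, in the order they appear."""
    return QUOTED.findall(prompt)


def check_positions(prompt: str, num_positions: int) -> list[str]:
    """Return the quoted parts, or raise if the position count does not match."""
    texts = quoted_texts(prompt)
    if len(texts) != num_positions:
        raise ValueError(
            f"found {num_positions} positions, need {len(texts)} from the prompt"
        )
    return texts


# Prompt modelled on the coffee example in the captions: two quoted parts.
example_prompt = (
    "photo of caramel macchiato coffee on the table, "
    'top-down perspective, with the text "Any" "Text"'
)
texts = check_positions(example_prompt, num_positions=2)
print(texts)

# Made-up three-part prompt: with only two positions drawn this would raise.
three_part_prompt = 'a sign that says "one" "two" "three"'
```

Calling `check_positions(three_part_prompt, num_positions=2)` raises a `ValueError`, mirroring the situation in the video where three quoted words were paired with only two masks.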

Info

Channel: Nerdy Rodent

Views: 8,928

Keywords: AnyText, Any Text, anytext

Id: hrk_b_CQ36M

Length: 10min 3sec (603 seconds)

Published: Fri Jan 05 2024