How to Make AI Generated Art (With CLIP and VQGAN)

Captions
Hey everybody, it's Brad from Roboflow, and today we're going to be talking about AI-generated art. If you've been around Twitter in the last couple of weeks you might have seen this exploding; it's like a new genre that people have really started to experiment with, and it's pretty neat what you can create. The gist of it is that we take a generative model and we're able to steer it with language to produce art that is similar to what we want. It's still kind of an exploratory process right now, where you're crafting prompts and trying to steer it towards something, but you can't give it exactly what you want. For example, here this person told the model to steer towards "time and space come to their ultimate end," and this is what the AI ended up coming out with. It's maybe not what I would have drawn, but it definitely has the themes of those sorts of things. There are all sorts of tricks that people are discovering to make these models work a bit better and produce more realistic or more stylized art, and that's part of the art: choosing the prompts that convince the AI to do what you want it to do. So today we're going to go through how this works and how you can create your own, and I'll show you a little bit about what I've been working on and give you a hint of what's to come.

This is all inspired by a model that OpenAI came out with called DALL-E. You can go to their blog at openai.com/blog/dall-e to read all about how it works, but to give you an idea of it, we can go in here and click on one of these interactive widgets. You can see they have this text prompt, "a storefront that has the word 'openai' written on it," and each of these is a dropdown list, so we can test it out and see what the model comes up with if we say something else. Instead of "openai," let's say we want the sign to say "pytorch": if you switch that out, you can see that now these storefronts say PyTorch. Let's say instead of a storefront we want something else; let's go with scrabble letters, and now it says "pytorch" in scrabble letters. Or let's do a t-shirt instead, and you can see that it generates that.

How this works in the background is that it uses another model called CLIP, which, if you've been following our YouTube channel or our blog (and if you're not, you definitely should be), you've probably seen us talk a lot about. This is the model that, all the way back in January, we used to create what we're calling our AI sandwich game. Essentially, it uses GPT-2 to generate a Pictionary prompt, a human draws it, and then CLIP, the same model behind DALL-E, scores it. That's really neat, because it basically puts the human in the place of the GAN that we're about to try: the human is the one creating the art and the AI is the one judging it, whereas with DALL-E the AI is both drawing and judging its own art.

We've written pretty extensively about CLIP: how to try it for image classification, the behind-the-scenes of how the machine learning works, and using it for content moderation, which we actually implemented inside of paint.wtf itself. We noticed that people were drawing inappropriate drawings, and we figured, hey, CLIP knows the context and the semantic meaning of images.
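To make "CLIP scores it" concrete, here's a minimal sketch of image/text similarity scoring with the open-source CLIP package (github.com/openai/CLIP). The filename and prompts are placeholders; this is not the actual paint.wtf code, and the same pattern covers both the Pictionary judging and the moderation flagging described next.

```python
# Minimal CLIP scoring sketch. Assumes:
#   pip install git+https://github.com/openai/CLIP.git
# and a local drawing.png; filenames/prompts are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("drawing.png")).unsqueeze(0).to(device)
texts = clip.tokenize([
    "a raccoon driving a tractor",   # the Pictionary prompt
    "an inappropriate drawing",      # a moderation category works the same way
]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # higher probability = closer semantic match
```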
We can just tell it: if this is not safe for work, or matches any of our categories of not-safe-for-work things, flag it and don't show it in the UI. We've also talked about some of the magic words you can use to get CLIP to perform better, and we're going to use some of those strategies to steer our GAN in the right direction as well. I'll link to all these resources below if you want to check out our other CLIP content.

For today, what we're going to do is use a Colab notebook that puts this all together and lets you steer a GAN to create the images you want. This is what most of the people you'll see on that Twitter account that retweets people are using: it's called VQGAN, and it's being steered by CLIP. The easiest way to get started is this Colab notebook, which I will also link below. If you're not familiar with Google Colab, it's a hosted Jupyter notebook that connects to free GPU resources on Google's cloud. When you open the notebook below, you're going to want to go to File and then "Save a copy in Drive" so that your changes are preserved. The way the notebook works is that on the right you have cells, which represent code you can run, and on the left you have a file system of files you can access. Mine won't look exactly like yours when you open it, because I've already come through and run this a couple of times. I'll show you a little bit about how that works, what I've been working on, and a hint of the post I'm working on with pro tips on how to create cool things, and then also maybe making that AI sandwich a bit more of a Dagwood sandwich with multiple layers, which I'm going to cover in a follow-up video and post later. So be sure to like and subscribe below if you're interested in seeing that follow-up.

This Colab notebook is a little bit weird because it's in Spanish, but that's all right; you really don't need to know anything about what it says. If you want, you can paste it into Google Translate and it'll give you the English version. The gist of it is that you need to go into each of these cells and hit the run button (or Shift+Enter) to run them the first time you open it up, and, I guess, every time you reconnect to a new Colab instance. The first one installs the libraries, which pulls in all your dependencies, and then down here you can select from a whole bunch of different machine learning models, and it'll download the weights for you to try. These are models trained on different public datasets: ImageNet is generalized images, WikiArt is public-domain artwork from Wikipedia, and S-FLCKR is from the Flickr photo-sharing service, and they produce different kinds of results. For instance, if you want something that looks like a photograph, Flickr or ImageNet will probably look good; if you want something that's a painting, maybe WikiArt; or if you want to produce images of people, CelebA-HQ is trained on celebrity faces. You can use that to steer your GAN. This is the prior knowledge it has of what an image should look like, and what we're going to try to do is steer our way through that knowledge to have its output match the text that we type.
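For orientation, the model-selection cell maps roughly to pretrained VQGAN checkpoints like the following. These identifiers are my recollection of the public VQGAN+CLIP notebook and may differ slightly in your copy; treat the list as illustrative.

```python
# Approximate mapping of the notebook's pretrained VQGAN checkpoints to
# what they tend to be good at (names are illustrative, not guaranteed).
vqgan_checkpoints = {
    "vqgan_imagenet_f16_16384": "generalized images, photograph-like output",
    "wikiart_16384": "paintings and public-domain artwork",
    "sflckr": "Flickr landscape photos",
    "faceshq": "faces (CelebA-HQ and similar face datasets)",
}
```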
Basically, the weights you pick are all of the knowledge your model has, and we're just trying to unlock that knowledge. If your model doesn't have information about the thing you want, it's going to be tough to find it, because we're not inventing this information; we're navigating towards information that's already inside the GAN. I'll show you a little bit about how that works. I've only tried the ImageNet and WikiArt ones, because I'm going to try to reproduce some of the prompts we have on paint.wtf and see what a generative adversarial network would have drawn if it were a participant in our Pictionary game scored by CLIP. Which is really nice, because since CLIP is both steering this generation and judging the game, you should expect these models to be really good at playing it, perhaps better than the humans, because they're on the same wavelength as the judge, if you will.

The next section is where you set up all your parameters. The text is what you're trying to generate, and if you search around Twitter you'll find all sorts of cool tricks. One that I thought was really neat is that adding "rendered in unreal engine" lends a certain level of realism. I've also seen people posting things like "trending on r/art" and a couple of other ways to signify to the model what kind of output you want; another one I saw was "amazing 4k wallpaper." You can also put things like "in the style of the Mona Lisa," "in the style of Van Gogh," "painted with chalk," or "colored pencils"; basically any words you can come up with. You can put different prompts separated by the pipe character (which you can type with Shift and the backslash key, on an English-language keyboard at least), and it will split each of these into a separate target; then, on each step, it steers a little bit closer to every target. The math behind that is pretty complicated, but fortunately you don't have to know anything about it to run the notebook; it's all encoded in the background. (For a rough feel of what that steering loop does, see the sketch a bit further down.)

So here, on paint.wtf we have a prompt that's "a raccoon driving a tractor," and here are the top human submissions: out of 3,000 people who drew a raccoon driving a tractor, these are what CLIP thought were the closest matches. You can see the humans all did pretty similar drawings, which is interesting, because as we're going to see in a moment, the GAN goes a bit off-script and creates some weirdness. I'm interested in seeing whether what the GAN creates is scored higher or lower than what the humans created, essentially comparing human output to computer output. I haven't gotten quite that far yet, but I'll show you how far I have gotten. Here I created "raccoon driving a tractor," and I specifically wanted to try to recreate this image that a user drew. So while I wanted it to steer towards the prompt, I also wanted it to look like this picture, and I described that picture as best I could: "a cartoon raccoon driving a red tractor with big black tires in front of a blue sky and white fluffy clouds." Then I wanted it to look more realistic, so I added "rendered in unreal engine."
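Here's the promised sketch of the steering loop. To keep it self-contained, it optimizes raw pixels rather than VQGAN's latent codes (the notebook optimizes the latents, which is what gives real runs their structure), so treat this as the skeleton of the idea, not the notebook's code:

```python
# CLIP-guided steering sketch: nudge an image so its CLIP embedding moves
# toward every "|"-separated prompt at once. Optimizing raw pixels is a
# simplification; the real notebook optimizes VQGAN latent codes.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16 weights when backpropagating on GPU

prompts = "a raccoon driving a tractor | rendered in unreal engine"
tokens = clip.tokenize([p.strip() for p in prompts.split("|")]).to(device)
with torch.no_grad():
    text_feats = model.encode_text(tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# Start from random noise at CLIP's 224x224 input size.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    img_feats = model.encode_image(image)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    loss = -(img_feats @ text_feats.T).mean()  # maximize cosine similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    image.data.clamp_(0, 1)  # keep pixel values in a valid range
```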
Then you get to pick the output size, width and height; I did 512 by 512. I've tried a few different things here, which I'll write a follow-up blog post about, and I found this to be a pretty good trade-off. You can go smaller and it'll generate faster, but I found that the smaller ones were not as detailed. One thing I tried was getting the basics of the outlines by having it draw a 64 by 64 image, using that as the input to draw 128 by 128, and then growing that, which is a strategy people have used with GANs in the past to scale them up. That actually didn't work very well here; it got stuck in a rut and produced worse results than if I just started at the output size I wanted.

Then here is where you select one of those models you chose from above (right now we're on the ImageNet one), and then how often you want to print a sample image. On my Colab notebook I have a P100 linked right now; your mileage may vary, since Colab assigns GPUs somewhat randomly. At 512 by 512 it was producing about two iterations per second, so this prints about one image every 30 seconds. You might want it faster or slower, but it really doesn't matter much, because every image gets saved to the file system anyway; this is just how you get a heads-up on what's going on. As long as you're not printing so many images that you crash your browser, you're probably fine.

If you want to start from an initial target image, you can do that here; if you leave it blank, it'll start from random, which, as we'll see below, is what I did. I also experimented with starting from that output image from paint.wtf, and I can show you a bit about how it differs if you start from random versus a starting image and have it shoot towards the same thing. (I'm not sure what this next field does; I'm going to experiment with it.) Then, if you already ran something before, there's a random seed that tells the random number generator what state to be in; you can take the seed you saved from before, put it here, and it'll re-run the exact same thing. So if you want to take one seed and change which direction it goes from there, this is essentially your starting point in neural-network space, versus the initial image, which is your starting point in image space. Finally, max iterations: if you leave this at negative one, it'll keep running until you hit the stop button, but if you want it to run only a certain number of iterations and then stop, you can put that here. (All together, the parameter cell amounts to something like the sketch below.)

Once you're done with all of those, you hit the play button and come down to the next cell, and that's what loads everything up. Here you can see it split the prompts on the pipe characters, so I have three different targets it's going towards. I didn't pick a random seed, so it gave me a random one. This is the number of pixels in the image, so the shape of the network it built; it's using the loss function here with the VGG model weights, and it pre-filled the checkpoint I said I wanted. So it's going to start with the ImageNet weights on this model and this starting point and shoot towards those targets.
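Pulled together, the parameter cell amounts to something like this. Field names and defaults are my approximation of the notebook's Spanish-labeled form fields, not its exact variable names:

```python
# Approximate shape of the notebook's parameter cell; names and defaults
# are illustrative, not the notebook's exact identifiers.
params = {
    "texts": "a cartoon raccoon driving a red tractor with big black tires "
             "in front of a blue sky and white fluffy clouds | "
             "rendered in unreal engine",
    "width": 512,                         # bigger = slower but more detailed
    "height": 512,
    "model": "vqgan_imagenet_f16_16384",  # which pretrained weights to steer
    "images_interval": 50,                # print a sample every N iterations
    "init_image": "",                     # "" = start from random noise
    "seed": -1,                           # -1 = fresh seed; reuse one to reproduce
    "max_iterations": -1,                 # -1 = run until you hit stop
}
```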
Once you hit play, it will start making the progress bar go up. I guess this was actually running at one iteration per second, not two like I said earlier; how fast it goes depends on your output size and your GPU. You can see it started out with a completely random image, not recognizable at all as a raccoon driving a tractor, but even after 50 iterations you can start to see it separating the sky from the ground; maybe there's a little bit of grass here, and there's some sort of subject. Fifty iterations later, I can maybe see that's a bit of a tractor, and as we keep going, it keeps zeroing in: you're starting to see sky here, whatever these red things were are starting to fall away, and I'm starting to see a bit of a raccoon here and maybe a raccoon tail there. Here we're getting even better. It's a bit abstract; it's not entirely clear why it does what it does, and every time you hit go, it'll generate a new sort of thing.

Then I just let it run, for a long time actually. This was an experiment to see what happens if you let it keep going: does it keep getting better, or does it stall out somewhere? The interesting thing is that it set the composition very early and then stuck with it. At the beginning we had those red streaks in the sky, and even here, after 2,850 iterations, the echoes of that are still present; and that little black thing in the middle is now turning into a raccoon on a tractor. By this point it's really doing refinement, not big changes. I let this run for six hours, so scroll all the way down to the bottom to see where we ended up. The cool thing is that it saves each one of these images, so at the end you can create a time lapse; I'll show you the time lapse of those six hours condensed into just a couple of seconds.

So we're all the way at the end here. I let it run for over 20,000 iterations, and you can probably see, as I've been scrolling, that they all look pretty much the same. But the interesting thing is that as it got better, it added more and more detail around the raccoon's nose and eyes, and this tractor now has very well-defined wheels. This, for a while, was a very cool-looking raccoon tail; it looks like it made that fade away. This thing over here really changed over time: at the beginning it looked like a mahogany bookshelf, then it looked like pipes, then it kind of looked like a toilet, and now I can maybe see how it's supposed to be a tractor with exhaust pipes or something, but maybe you have to use your imagination a little bit there.

Then, when you're done, you can run this script that will generate a video, and you can visualize it. This one is a past run; you can see it has nothing to do with the one we just did, but it was still loaded in my browser, so I can show you how it works. This was a different prompt and a different starting point for "raccoon driving a tractor." You can see it's getting the raccoon pattern at the top and the tractor at the bottom, and over time it just hones in on what it's going after. It really depends on your prompt, your starting point, and how lucky you are, because those random weights kind of set things into motion.
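The notebook has its own cell for generating the video, but if you want to stitch the saved frames into a time lapse yourself, something like this works. The frame filename pattern here is an assumption; adjust it to match how your copy of the notebook actually names its files:

```python
# Stitch saved frames into a time lapse with ffmpeg (run in a Colab cell).
# Assumes frames were saved as steps/0001.png, steps/0002.png, ...;
# adjust the pattern to match your actual filenames.
!ffmpeg -y -framerate 30 -i steps/%04d.png -c:v libx264 -pix_fmt yuv420p timelapse.mp4
```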
So basically you want to try a bunch of different things; I have some ideas for how to make this process a bit more efficient that I can share later. Down here I linked it to my Google Drive. This isn't in the notebook, but you can google how to link your Drive (there's also a quick sketch of it below, after the captions); then I copied the video over and put it on YouTube. So I will now show you the time lapse of the one we just went through, the six-hour run. I have it open over here; it's unlisted on YouTube, and I'll put the link in the comments below as well. Over the course of six hours you can see it was playing around with what the tractor should look like, honing in on the raccoon's face and eyes, and then this thing was transforming and the red wall behind it was receding a little. You can see down here that the raccoon tail was there at this point and eventually recedes. I'm not really sure what this white and blue thing that stayed down here the whole time was supposed to be; I think it might have been a first-person view, where we're supposed to be sitting in a cab looking at another raccoon driving a tractor. It's a little hard to get into the mind of what the machine learning model is thinking, but you can see that it steers towards the direction you're trying to go.

So that's a high-level overview. Again, check out the description below for links to all these resources, and be sure to look at blog.roboflow.com in the coming days; I have a whole bunch of tests that I've run, and I'll post my results of what worked best, what strategies I used, and comparisons with human drawings on paint.wtf. Until next time, I hope you create some really cool art. If you do, link to it in the comments below and tag it with CLIP+VQGAN; there's a hashtag going around on Twitter, so we can all see what each other makes. And if you find something neat, like that Unreal Engine trick or anything else, be sure to share it so we can all help discover those magical key phrases together. Anyway, have a good one. See y'all!
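For reference, the Drive-linking step mentioned above uses Colab's standard API. The destination path is just an example, and the filename assumes the ffmpeg sketch earlier:

```python
# Mount Google Drive in Colab so outputs survive the session
# (standard google.colab API; the destination path is an example).
from google.colab import drive
drive.mount('/content/drive')

# Copy the rendered time lapse into Drive.
!cp timelapse.mp4 /content/drive/MyDrive/
```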
Info
Channel: Roboflow
Views: 110,135
Id: LM8dil6n5h8
Length: 20min 54sec (1254 seconds)
Published: Mon Jul 19 2021