Parseq tutorial 1: Fine grained control of Stable Diffusion and Deforum – Keyframing & Interpolation

Captions
Hi everyone, welcome to the first Parseq tutorial video. Parseq is a tool to help you create AI-generated animations like the ones you're seeing at the moment. More specifically, Parseq is a parameter sequencer that integrates with the Deforum extension for the Automatic1111 UI for Stable Diffusion. It gives you a lot of control over the parameters you can feed into the AI image generation process, enabling you to create detailed changes in camera position, prompts and other parameters, as well as helping you synchronize the animation to music.

The interface can be a little intimidating at first, but I'm hoping to show you it's quite straightforward once you get the hang of it. I'm the author of Parseq, so if you have any feedback or suggestions, don't hesitate to get in touch in the comments.

Before we get started, there are a few things you need in place. Firstly, make sure you can browse to "sd-parseq.web.app" and that you can see something similar to this. I'm also assuming you have access to a working version of the Automatic1111 UI, either locally or via Colab, it doesn't really matter which. And finally, in that Automatic1111 UI, you should have the Deforum extension installed, so you can see the Deforum tab. Make sure you can find the Parseq section somewhere; it should be under Init, right at the bottom of the screen.

And this is the video we're going to create today. It's a simple animation with a single prompt and a few camera movements. The result is pretty simple and something you could do without Parseq, but following along will help you get familiar with the basic concepts of Parseq so you can go on to do more complex things. In subsequent tutorials we'll look at basic beat synchronization, more advanced prompt manipulation, and advanced audio synchronization using the audio analyzer.

Let's take a little tour of the Parseq UI. Most importantly, up here, if you ever want to buy me a coffee there's a button right here to do that. :) Shout out to the 3 or 4 people who've done that already, thank you very much. There's also a link to the docs here, which are quite complete and a good reference once you've got to grips with the basics.

Over here at the top left is a status view that shows you the current status of Parseq's processing. As you'd expect, green means good. There are also links to some more advanced parts of Parseq, which we'll cover in future videos.

The top-right section up here is a document manager. This is where you can do things like revert to a previous version of the document you're working on. You can load other documents you've worked on on this system; everything is stored in browser local storage, unless you explicitly upload and export things. And you can also load Parseq documents that have been shared with you.

And finally, you can share your own documents: you can either upload to a URL, which you can share with people, or you can just copy and paste the keyframe definitions, which people will be able to load into Parseq using the load dialog we saw previously.

Just below that is the document title, which you can edit. Parseq will pick a name for you by default, but you can change it to whatever you like.

Below that is the prompt section, where you can put in any prompt that works with Automatic1111, including weights, composable diffusion, and so on. Parseq also allows you to use Parseq formulae in the prompt.
We'll talk more about that in a future video, but it gives you a lot of control over the way the prompt evolves over the course of the video.

Next, we get into the real heart of Parseq, which is this grid here. This is where you define the keyframes that will result in parameter changes over the course of the video. Each row is a keyframe; each pair of columns represents a parameter, or field. "Parameter" and "field" are terms I use interchangeably for these values. We'll learn why each field has two columns soon, but in short, one of them defines a value at that keyframe, and the other defines the interpolation, which is basically a mathematical formula describing how the value should evolve between keyframes.

Below that is a graph showing a visualization of the parameters. The nodes represent keyframes, and between the keyframes are the actual rendered values that will be used at each frame. You can edit keyframes directly on the graph, you can add keyframes as well, and you can also remove keyframes by shift-clicking. The instructions are right here. In some cases, editing the nodes directly on the graph won't do exactly what you expect, because the value of the keyframe is actually being influenced by the formula, not just the position of the node, but that'll make more sense as you use Parseq and get to grips with it.

You can choose to show either absolute values or a percentage of the maximum value. That's useful if you're trying to compare values that have very different absolute values. You can also show and hide fields by clicking on the sparklines here, which is handy. The sparklines give you an overview of how the different parameters change. By default, the ones that are completely flat and don't have any variation are hidden, so you can expand out to see all of the parameters by ticking that box.

And then finally, at the bottom is the rendered output. This is the actual JSON data structure that Deforum will use while it's rendering to get the actual value of each of those parameters during the rendering process. You don't really need to understand what's going on here, but you might find it useful in the future when you're trying to debug, find out exactly why something is the way it is, or check that your prompt is the way it should be.

There's also another copy of the status screen here, so you don't have to keep scrolling up and down. I will probably move that somewhere more sensible, like a fixed footer, in the future.
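If it helps to make the grid and the rendered output more concrete: conceptually, Parseq expands a handful of keyframes per field into one value for every frame. Here's a rough Python sketch of that idea using the default linear interpolation; it is not Parseq's actual schema or code, and the structure and names are purely illustrative.

```python
# Rough sketch of the keyframe-and-interpolation idea, NOT Parseq's actual
# schema or code; the structure and names here are purely illustrative.

def render_field(keyframes, total_frames):
    """Expand a few keyframes into one value per frame using linear
    interpolation, holding the last keyframe's value after the final
    keyframe. Assumes there is a keyframe at frame 0."""
    keyframes = sorted(keyframes, key=lambda k: k["frame"])
    values = []
    for frame in range(total_frames):
        # Find the keyframes surrounding this frame.
        prev = max((k for k in keyframes if k["frame"] <= frame),
                   key=lambda k: k["frame"])
        later = [k for k in keyframes if k["frame"] > frame]
        if not later:
            values.append(float(prev["value"]))   # hold the final value
            continue
        nxt = min(later, key=lambda k: k["frame"])
        t = (frame - prev["frame"]) / (nxt["frame"] - prev["frame"])
        values.append(prev["value"] + t * (nxt["value"] - prev["value"]))
    return values

# Two keyframes for a hypothetical field, e.g. a z-translation ramp:
zoom = render_field([{"frame": 0, "value": 10}, {"frame": 120, "value": 128}], 121)
print(zoom[0], zoom[60], zoom[120])   # 10.0 69.0 128.0
```

Cubic spline and step interpolation, which come up later in this tutorial, only change how the values between keyframes are computed; the keyframes themselves stay the same.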
Alright, that's a whirlwind tour of the Parseq interface. Don't worry if that doesn't all make sense quite yet, or if there are things you won't remember: that's fine, you'll get familiar with it as you use Parseq.

Okay, let's get started creating our video. The first thing I'm going to do is start from a blank slate. When you open Parseq, the default document already has quite a lot of content. It's intended to give you an idea of what's possible with Parseq, but today I'm going to nuke all that and start from scratch. There are a few ways of deleting everything, and soon I'll add a button to make it really trivial, but what I'm going to do right now is import a blank document, which I'll share with you in the comments. It's conveniently in my clipboard, so I'm just pasting it and importing it. I'll just tweak the name.

And you can see the document is pretty much empty: the prompts are blank, and I have just two keyframes, at frame 0 and frame 120, which are the first and last frames of the video. The only field that's set is the seed, at -1. Everything else will just use default values.

Next we need a prompt. I wouldn't recommend figuring out your initial prompt in Parseq, which is why I've switched over to the Stable Diffusion UI. Finding a good prompt here and coming back to Parseq when we're happy with it is going to be much easier.

Alright, so here's my prompt: "an angry frustrated shouting crying trendy man sitting at a desk desperately trying to produce pretty animations on a computer", with a bunch of qualifiers describing how I'd like it to be, and then a pretty standard negative prompt with some magic incantations I don't really believe in, but chuck in anyway.

And I'm going to use the Deliberate model today; you can choose any model you like. So I'll just switch over to that and let's see what this generates. Okay, that's good enough for me. It's pretty cool, there are some trademark messed-up hands, which is fine. I'm happy with that, so I'm going to go over and copy my positive and negative prompts into Parseq.

Now, let's take a look at our grid. For the purpose of this exercise, the only things I'm going to be interested in are the seed, the strength, and a 3D zoom, which is a translation on the z-axis.

For the values, let's keep -1 for the seed, set on frame 0. There's no other value set on the last keyframe for the seed, and nothing in the interpolation column, which means -1 will be held for every frame. This will result in a random seed for each frame, which behaves just like in straight-up A1111, because the -1 value is passed directly to the underlying engine.

Strength in Deforum is a value between 0 and 1 that influences how much the previously generated frame impacts the current frame. So if you want some kind of consistency, you need a strength that's reasonably high. But if you go too high, you won't get any variation, or you'll end up with artifacts. If you go too low, you'll get too much variation frame-to-frame. I'm going to go with a fixed value of 0.6, so I just enter that value on frame 0 and make sure nothing else is set for strength.

All set for our first try. We'll come back to Z translation later. Now, "render on every edit" is ticked by default. This means that whenever I edit the grid, the Parseq output is recalculated automatically and the graph is redrawn. I can see in my graph my constant values of -1 and 0.6 for the two parameters I care about, and the default value of 10 for Z translation.

All the way at the bottom is the output Parseq generates. We'll just copy this into Deforum for now. There's a quicker way to do this, so you don't have to keep copying and pasting back and forth whenever you make changes in Parseq, and I'll show you that soon, but for now let's just copy and paste.

Now that we're in the Deforum UI, there are three important things we need to take care of before we even think of pasting our Parseq manifest in. Firstly, let's make sure the frame rate that Deforum will use matches what we've set in Parseq. If you don't do this, you'll end up with out-of-sync results. I've got 20 frames per second in both places. Secondly, let's make sure the total number of frames Deforum will generate matches what we've defined in Parseq. Technically I have 121 frames in Parseq, because I've got frames 0 to 120, and Deforum is only going to generate 120, but that's okay, I don't mind losing the last frame.
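The arithmetic behind these checks is simple, but worth spelling out; here's a quick sketch using the numbers from this tutorial.

```python
# Quick sanity check of the timing settings used in this tutorial.
fps = 20                          # must match in Parseq and in Deforum
last_parseq_keyframe = 120        # final keyframe in the Parseq grid
parseq_frames = last_parseq_keyframe + 1   # frames 0..120 inclusive = 121
deforum_max_frames = 120                   # what Deforum is set to generate

print(parseq_frames, deforum_max_frames, deforum_max_frames / fps)
# 121 120 6.0  -> roughly a six-second clip at 20 fps
```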
Lastly, make sure you set the correct animation mode. I don't really care right now because I'm not doing any movement, but when we start playing with Z translation we'll switch this to 3D, otherwise we won't see the effect of the parameter change.

Now that we've checked all that, we're ready to paste our Parseq manifest into the Deforum extension UI, and we can hit generate. And I'm getting what I expect: a desperate-looking digital artist changing on every frame because of the random seed, but with some influence from the previous image to create a little bit of consistency frame-to-frame.

Great, that's finished generating. Now let's take a look at what we've got. And so we have our suitably frustrated person struggling to generate some pretty animations on his computer. I know how he feels. A few months ago this animation would have been pretty mind-blowing. Nowadays it's not all that special. So let's see what else we can do.

There's no escaping the fact that AI image generation involves a lot of trial and error; therefore, it's really important to be able to control exactly what is changing each time you try something new. Currently we're using random seeds, so we'll get different images on every attempt, even if we don't change anything. This will make it really hard to compare our efforts as we play with other parameters. Let's look at how we can avoid this problem.

Let's take the random seed that was used for the first frame and paste that into Parseq instead of -1. This will result in a fixed seed for the whole 120 frames. Unfortunately, if you try to generate an animation with a constant seed in Stable Diffusion and you're using a strength greater than 0 or 0.1, the results will be really messy. We can quickly generate this and see how bad it's going to look.

I don't feel like copying my Parseq manifest around anymore, so I'm going to upload the output. This is something you can only do once you've logged in to Parseq. I'm also going to tick "upload on every render", so every change I make gets uploaded to the same URL. Now I can copy and paste the link that I get here into Deforum, instead of the Parseq manifest JSON object, and Deforum will pick up my Parseq changes every time.

Alright, let's generate this, and pretty quickly we can see it's degrading. The input isn't varying enough, so the Stable Diffusion artifacts are being accentuated on every frame. People call this kind of result overcooked, deep-fried, or spaghetti-like. Let's interrupt that because it's rubbish, and see how we can get the seeds to change on every frame in a predictable way in Parseq.

Let's say we want to start with seed 10. I can put that on my first keyframe. To make the value increase by one on every frame, I just need to put 130 as the value on the last keyframe. By default, Parseq will interpolate linearly between the keyframes. There's an easier way to do this, which I'll show you once we've talked about interpolation functions, but this will do for now.

Now we can go and render that without needing to copy and paste anything; Deforum will pick up the changes automatically. And we now have a video with a predictable, non-random seed that changes on every frame, but will yield the same seed every time we do that generation.
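If you want to convince yourself why 10 on the first keyframe and 130 on the last gives an increase of exactly one per frame, here's a tiny sketch of the linear interpolation involved.

```python
# Why linearly interpolating the seed from 10 (frame 0) to 130 (frame 120)
# bumps it by exactly 1 per frame: the slope is (130 - 10) / 120 = 1.
start_frame, start_seed = 0, 10
end_frame, end_seed = 120, 130

def seed_at(frame):
    t = (frame - start_frame) / (end_frame - start_frame)
    return round(start_seed + t * (end_seed - start_seed))

print([seed_at(f) for f in (0, 1, 2, 60, 120)])   # [10, 11, 12, 70, 130]
```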
Okay, now it's time to work on that zoom effect, or Z translation. Just like we did for the seed, we can put in an increased value, say 128, for the Z translation on the final keyframe. Because we're uploading all changes, we can flip right over to the UI and hit generate, and as expected we've got a linear zoom in the resulting animation.

Well, that's a bit boring. How about we make the zoom bounce around a bit? One way to do that is to add more keyframes. There are a few ways to add keyframes, but the most intuitive is just to double-click on the graph. Then you can find the node that corresponds to Z translation (sorry, the color is kind of hard to see; that'll be fixed soon), and once you've found it you can drag it around. I also want to zoom out, so I'll just edit the value manually in the grid and make it negative, and now let's add a few more nodes in the graph.

An interesting side note while we're here is that Parseq tries hard to let you work with absolute values. If you're using Deforum directly, you might be used to using relative values for some properties: for example, if you specify a value of 1 on a rotation in Deforum, that means move by one degree on each frame. In Parseq it would mean move by one degree on *that* frame, and then stay at that position. So when you're using Parseq, you have to think in terms of the absolute values you want to land on at each frame, not the relative change you want to apply. There are ways of changing this, which we'll talk about in future videos.

Okay, let's give that a go. I always find the units for Z translation to be a bit confusing, so I'm not sure how far we actually zoom in and out, but let's give it a go... okay, that's our video, and that's quite fun. Now, we're currently using Parseq's default way of travelling between keyframes, which is linear interpolation. One of the tool's main strengths is giving you a lot of control over that interpolation algorithm, so let's see what else we can do.

As you might have guessed, we're going to start modifying the interpolation column for Z translation. I can enter C (upper case) here for "cubic spline" on frame 0, and that will carry to all keyframes for this field, and we immediately see the variation becoming smoother. This is just the beginning of what's possible: you can use all sorts of mathematical expressions here, conditionals, and built-in functions for sine waves, pulse waves and Bezier curves; check the docs for the full range of possibilities available today. But for now, let's just see what it looks like with cubic spline interpolation.

Okay, that's finished rendering. Let's just remind ourselves of what it looked like with linear interpolation from our previous generation, and you can kind of feel those harsh angles. If we load the latest render with cubic spline, it immediately feels much smoother and bouncier, as you'd expect given the curves.

There's a much simpler interpolation, which we refer to with just S (upper case): it takes the value seen at the last keyframe and sticks to it, as we can see here in the graph. Let's set that on our seed. That might seem like a mistake, because we know fixed seeds are bad unless strength is extremely low, but here's a trick: there's a variable called "f" (lower case) that we can reference in our interpolation function, and it simply represents the frame number. We can also enter arbitrary mathematical expressions here, so "S+f" will start from the value of the keyframe and then add on the current frame number. The result is a nice, linearly increasing seed. And now, if we want to change the starting seed, we only need to update one keyframe.
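Here's a toy sketch of those two building blocks, S and f, and the combined "S+f" expression; it's just an illustration of the behaviour described above, not Parseq's actual expression engine.

```python
# Toy illustration of the expressions above, NOT Parseq's real expression
# engine. "S" holds the value of the most recent keyframe, "f" is the
# current frame number, and "S+f" simply adds the two.
keyframes = {0: 10, 120: 10}   # illustrative values: seed 10 on the first keyframe

def S(frame):
    """Step interpolation: value of the last keyframe at or before this frame."""
    return keyframes[max(k for k in keyframes if k <= frame)]

def s_plus_f(frame):
    return S(frame) + frame    # "S+f"

print([s_plus_f(f) for f in (0, 1, 2, 119, 120)])   # [10, 11, 12, 129, 130]
```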
So you might be starting to get a feel for what you can do with interpolation functions. Another tip is that there's nothing stopping you changing the interpolation function on a given field part-way through the animation: for example, I could start off with my Z translation being (S)tep interpolated, then switch to (C)ubic spline part-way through, and you'd get the result you'd expect.

Deforum can of course do way more than just zooming, so why don't we do a bit of a showcase of a few of the other parameters that are available? Let's do some 3D rotations over X and Y, a bit of Z rotation, which is basically a 2D rotation, and then some translations over X and Y. I'll quickly enter some keyframe values and some interpolation functions for each of those fields, separating things out so we're not trying to do everything at the same time.

And so you can see on the graph we'll start off with a smooth zoom in, then we'll do our rotations along each of the axes, and then finally we'll do a translation on the x-axis followed by the y-axis. Hmm, I think it would be better to zoom out at the start, so I'll just whack in some minus signs up here and generate again... and everything we expected to see is happening here. Our poor animator looks decidedly worried, but hopefully after watching this tutorial he'll feel a bit better. :)

That's all for today, but in future tutorials we'll be looking at using oscillators for simple beat synchronization, using Parseq expressions right in the prompt to evolve the prompt during the animation, and also advanced audio synchronization with the Parseq audio analyzer, which takes an audio file as input and automatically generates keyframes from it.

Thanks for watching. Don't hesitate to let me know all about your experiences with, and ideas for, Parseq, and come and say hi on the Deforum Discord. See you soon!
Info
Channel: Robin Fernandes
Views: 17,412
Keywords: stable diffusion, deforum, parseq, tutorial, audio synchronisation, zoom
Id: MXRjTOE2v64
Length: 18min 42sec (1122 seconds)
Published: Fri Feb 10 2023