Stable Diffusion ComfyUI Married With Ollama LLM - A Streamlined Prompting Workflow

Video Statistics and Information

Captions
Introducing a revolutionary AI image and video creation process that combines cutting-edge technology in a seamless ComfyUI workflow. By using Ollama with large language models and Stable Diffusion in ComfyUI, you can generate stunning visuals, immersive animations, and engaging stories, all in one streamlined workflow. With Ollama and open-source large language models, you can bring your ideas to life like never before. Say goodbye to complicated steps across multiple tools, copying and pasting your storyline the way other videos show you. With Ollama and Stable Diffusion, you simply input your text, generate the image, and pass it to Stable Video Diffusion to create an animated video. And for beginners: why would you still need those prompt PDFs from Etsy if a large language model can generate image prompts for you? Let's check out this streamlined workflow that saves you the time of manually converting content into prompts.

Hello everyone, here's another cool custom node called IF Prompt To Prompt, part of ComfyUI IF AI Tools. You can download it in the ComfyUI Manager by clicking on the list and searching for this name. Alternatively, you can go to the GitHub page to download it and find more information about this custom node.

So what does this custom node do? It connects with local large language models that are set up via Ollama and generates Stable Diffusion-style prompts for image generation. To use this custom node, you need to have Ollama set up on your local machine: run Ollama and download any large language models that you want to work with. During the installation, you will be instructed to integrate Ollama in a specific way. I also have an Ollama installation guide for Windows on my LLM channel; the link is in the description below.

In the custom node folder, you'll find pre-created workflow templates to try out. Once you download this custom node, the workflow folder contains a workflow JSON file to get you started. I have improved the layout and made it work with other large language models, combining text-to-image, image-to-image, and even video animation with Stable Video Diffusion.

After installing this custom node and setting up Ollama, how do you get started? First of all, restart your ComfyUI and open another CMD window for the Ollama command prompt. In the CMD window, type "ollama serve" and press Enter; this boots up the Ollama backend. Then return to the ComfyUI workflow interface and click the refresh button to let IF AI Tools detect the Ollama connection. Once detected, the model drop-down menu for selecting LLM models will show a list of the models available in your setup. As you can see, there are already two API requests from ComfyUI to the backend server.

Here I have three large language models downloaded on my local machine: Llama 2, LLaVA, and Mistral. Mistral is particularly good at handling general tasks and outperforms Llama 2, so I'm going to use Mistral for the LLM prompts. For my request, I'll input any text content and generate positive and negative text prompts for Stable Diffusion, then connect the generated strings to the conditioning of the Stable Diffusion text-to-image generation. To make things easier to recognize, I'll create groups for the Stable Diffusion part. For generating the Stable Diffusion image, I'll use the SDXL Juggernaut XL checkpoint model; it connects to both the CLIP Text node and the large language model that reads through your prompt or questions. The LLaVA model has CLIP Vision capability: it creates Stable Diffusion text prompts from an image and passes that text string to the conditioning of the Stable Diffusion group.
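As a rough illustration of what happens over that connection, here is a minimal sketch of the two kinds of requests involved: listing the locally installed models (what populates the drop-down after you hit refresh) and asking one of them to write a Stable Diffusion prompt. It uses Ollama's standard REST endpoints on the default port; the prompt text is just a placeholder, not the node's actual wording.

    # Minimal sketch of the requests the workflow sends to the Ollama backend.
    # Assumes "ollama serve" is running on its default port, 11434.
    import json
    import urllib.request

    OLLAMA = "http://localhost:11434"

    # List the locally installed models (this is what fills the drop-down).
    with urllib.request.urlopen(f"{OLLAMA}/api/tags") as resp:
        models = [m["name"] for m in json.load(resp)["models"]]
    print("available models:", models)

    # Ask a model to turn raw text into a Stable Diffusion prompt.
    payload = json.dumps({
        "model": "mistral",  # placeholder; any pulled model name works
        "prompt": "Write a Stable Diffusion prompt for: a scientist at glowing monitors",
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{OLLAMA}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])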
Now, what's the difference between this and the IP Adapter's CLIP Vision? I'll talk about that later in this video. First of all, this provides an intuitive way for you to ask questions about the image you load in here. For example, I'll ask "What is this?" and have it write a prompt for SDXL image generation. This generates a Stable Diffusion-style text prompt and passes it to the positive and negative prompts. Let's try it once and you'll understand what's going on.

Here's Ollama: the backend is running, and the data communication between it and ComfyUI happens through the API connection. Now we have the image displayed. LLaVA is a large language model for vision assistance; it can read and process images in a GPT-like manner. It's similar to GPT-4, where you can upload an image and ask the AI model to identify what elements are present in it. This custom node, IF Image to Prompt, works specifically with such large language models and transforms their responses into Stable Diffusion text prompts.

So right here, as you can see, the text prompt in the Show Text node is describing the loaded image and what is in there. And as you can see, the generated result is in a different style, but all the elements are the same. In this example, we have a futuristic environment with scientists controlling computers. Let's check out this image; here is another similar one. The text prompt has updated, and as you can see, "the image shows a person sitting at a computer with multiple monitors; the individual appears to be engaged in some sort of digital work or gaming, indicated by a keyboard and mouse in front of them." So it is able to read through the image and identify what is in it using the LLaVA model, a really handy large language model that you can install locally on your machine.

These custom nodes also enable you to generate different styles of images, for example photography or futuristic styles, or Neon Punk. Look at this one: this is the minimalistic-style image that has just been generated. Right here we have another style as well; let's try something like Neon Punk. With Neon Punk, there will be purple and pink colored lighting in the background of the image, giving it a cyberpunk look. Yeah, so there you have it: this is the Neon Punk style image. It takes the text from the LLaVA large language model, which understands the image and generates the description as a Stable Diffusion text prompt; that goes through the Stable Diffusion text-to-image group, resulting in the image generated here. And this is quite different, as you can see; the text prompt is passed through to the Stable Diffusion prompt as well. One thing that is very different here compared to the IP Adapter is that this allows you to generate totally different styles of images and characters; everything is different.
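For reference, the image-to-prompt step above can be sketched as a single call to the Ollama API: vision models like LLaVA accept base64-encoded images alongside the prompt. The file name and question below are placeholders; the actual IF Image to Prompt node builds its own instruction internally.

    # Minimal sketch of asking LLaVA about a local image through Ollama,
    # roughly what the image-to-prompt step does. Assumes the "llava"
    # model has been pulled and the image file exists (placeholder name).
    import base64
    import json
    import urllib.request

    with open("3.png", "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    payload = json.dumps({
        "model": "llava",
        "prompt": "What is this? Write a prompt for SDXL image generation.",
        "images": [img_b64],  # Ollama passes base64 images to vision models
        "stream": False,
    }).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])  # the description / SD prompt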
Let's create an IP Adapter group here so that we can try using the same image, loaded into the same SDXL checkpoint model, with this IP Adapter loader. Let's load the image here; I'm going to use the same image, so I'll copy this file name. Actually, I can just follow the file name to search for it in the Load Image node: the 3.png, that's the one. So there we go: we have the same image that we just loaded previously, and now we have it in the IP Adapter as well. So right here, I'm going to connect the checkpoint model to the IP Adapter, and the model output of the IP Adapter to the KSampler. This is a very typical usage of the IP Adapter. Actually, we need to convert this conditioning text string back inside the custom node rather than having it as an input parameter, so let's go back, convert the text to a widget, and do the same for the negative prompt. We can erase all the predefined text, leave the conditioning empty, and generate an image using the IP Adapter; this will help you understand the difference.

As you can see, the output from the IP Adapter is almost the same style as the source image, the reference image here: it reproduces the computer screens, the lighting in the computer room, and even the character's face color, dress, and hairstyle. Everything looks the same. But if we want to be more creative and try something different, using this image as a reference for other stories or purposes, then we can use the large language model connected with Stable Diffusion to identify the content of the image. Let's connect it back to the large language model groups. We can test it once more and see completely different styles while still having the same elements present in the image. For instance, we have characters controlling multiple computers in both images, but in completely different styles: the first is Neon Punk cyberpunk, while the other has a character controlling a computer with multiple monitors. Both images contain the same elements, but the layout and structure are generated differently, allowing us to be more creative and obtain diverse image styles for various purposes. For example, if we want to create cyberpunk-style sci-fi stories, we can do that here. We can also generate different text prompts from the same image, resulting in various styles of images; even though the generated images all include multiple monitors and female characters, they have different styles.

Now let's discuss the text-to-prompt groups. By connecting these groups, you can input text and interact with the large language models; from your input text, the large language model generates an image text prompt for Stable Diffusion. For example, consider a typical use case where you use Claude 3 in Poe to create short sci-fi stories. You can copy and paste the relevant parts of the story into the large language model and transform the storylines and scene descriptions into a text prompt that Stable Diffusion understands. This allows Stable Diffusion to generate an image that accurately represents your input text. Once you're ready to proceed, you can start Mistral again in Ollama and transform your input text into a Stable Diffusion text prompt. In this example, the character is standing in front, resembling a sci-fi cinematic style taken from the storyline of the sci-fi story. It looks pretty cool; the descriptions provide vivid background details such as helicopters and broken buildings.

Now we can move on to scene two. Let's rearrange the groups so that everything is visible on one monitor. If you don't like the generated image, you can simply click to generate another image with the same text prompt; this one, for example, looks better, with a cool side-angle shot of the character. When working with Stable Diffusion, natural language input in full sentences may not always be understood. However, using the text-to-prompt group, you can input natural language descriptions, which is helpful for content creators who want to create story scenes using AI tools like Stable Diffusion on their local machine.
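To make that concrete, here is a minimal sketch of the text-to-prompt idea: wrap a scene description in an instruction template and ask Mistral for positive and negative prompts. The template wording and the scene are my own illustration, not the node's actual internal prompt.

    # Minimal sketch of the text-to-prompt step: storyline in, SD prompts out.
    # The instruction template below is illustrative, not the node's own.
    import json
    import urllib.request

    TEMPLATE = (
        "Convert the following story scene into a Stable Diffusion prompt.\n"
        "Reply with exactly two lines:\n"
        "POSITIVE: <comma-separated tags>\n"
        "NEGATIVE: <comma-separated tags>\n\n"
        "Scene: {scene}"
    )
    scene = "Helicopters circle broken skyscrapers as the heroine stands in the wind."

    payload = json.dumps({
        "model": "mistral",
        "prompt": TEMPLATE.format(scene=scene),
        "stream": False,
    }).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["response"]

    # Split the reply into the two conditioning strings for ComfyUI.
    lines = reply.splitlines()
    positive = next((l for l in lines if l.startswith("POSITIVE:")), reply)
    negative = next((l for l in lines if l.startswith("NEGATIVE:")), "")
    print(positive, negative, sep="\n")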
I have added the Stable Video Diffusion group here and organized the diagram by tidying up the other custom nodes. Within the Stable Video Diffusion group, the purple groups, I have the output of the sci-fi story I wanted to demonstrate, and I can bring this text prompt into the large language model. I don't need the image-to-prompt groups here, so I can remove them temporarily. Let's enable the text-to-prompt groups and paste the storyline from Claude 3's response into the large language model. Continuing with the storyline, I can use scene three, which I have already prepared, and proceed with the rest of my story. Let me check one more time: I will disable the image-to-prompt groups, make sure all these nodes are connected together, and click the generate button. We can start generating the text prompt from the large language model as a Stable Diffusion text prompt, and we have this image generated from our storyline. Mostly I will set a fixed seed number for the Stable Diffusion group.

Now it's passing the image data to Stable Video Diffusion. As we know, Stable Video Diffusion is very simple to run and use: just take the initial image, pass it to the SVD image-to-video conditioning node, and that's all you need to do (a rough sketch of this step appears at the end of this section). I have also added an image resize custom node to read the width and height of the image and pass them to the SVD conditioning as well. Now we are waiting for the KSampler to generate this scene.

Here's the result from the previous image we generated, and it's exactly the same scene as before; you can see the camera motion, with the spaceships moving behind. If I don't like the direction of the camera panning, I can use the same image in a typical Stable Video Diffusion workflow: we load the image, connect it to the image reroute node, take the same image we just generated on top in the SDXL groups, bring it down to the SVD groups, and generate again. This allows us to edit or redo the camera motion style of the video if we don't like the current one. Let's disable that to prevent loading that data as well; we are only focusing on the SVD groups at the moment. Let's say I don't like this video output and I want to use the same image to generate another one. Let's see the result. It's a very handy tool: we simply move the reroute point from the input image of the SDXL group and connect it to the Load Image node here; before that, of course, I have to save the image so the Load Image node can take it as input.

So there you go guys, how amazing is that? It's a streamlined and fast way to create animations with Stable Video Diffusion. You can use ChatGPT or any model to create a storyline, input that storyline into the ComfyUI workflow, generate an image, and pass it to Stable Video Diffusion. I believe lots of you have seen the many YouTube videos teaching you how to make YouTube Shorts or history-shorts videos using multiple tools, jumping back and forth: generate the storyline in GPT, bring the content into Leonardo AI, then into Pika, those typical so-called tutorial videos. But now, with just a local machine running large language models and Stable Diffusion in ComfyUI, you can bring your storyline and create your animation video in one workflow. So I hope this inspires you to try it out. If you want to create YouTube videos, short stories, or any other video content, you can leverage a large language model and a diffusion model to generate it.
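For reference, here is the image-to-video step mentioned above, sketched outside ComfyUI with Hugging Face diffusers. It assumes a CUDA GPU and the official stable-video-diffusion-img2vid-xt weights; the file names are placeholders.

    # A minimal sketch of the SVD image-to-video step using Hugging Face
    # diffusers (assumes a CUDA GPU; file names are placeholders).
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16, variant="fp16",
    ).to("cuda")

    # The initial frame: the still image generated by the SDXL group,
    # resized to SVD's native resolution.
    image = load_image("scene_three.png").resize((1024, 576))

    # A fixed seed keeps the camera motion reproducible between runs.
    generator = torch.manual_seed(42)
    frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
    export_to_video(frames, "scene_three.mp4", fps=7)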
Feel free to try this out; you will have a smooth workflow pipeline for creating content. So I'll see you guys in the next video. Have a nice day, bye!
Info
Channel: Future Thinker @Benji
Views: 5,104
Keywords: AI image creation, video creation, Ollama, ComfyUI, large language models, stable diffusion, workflow templates, IF Prompt To Prompt, ComfyUI If AI Tools, Clip Visions, IP adapter, image prompts, creative workflow, AI video creation, immersive animations, stunning visuals, ai art generator, ai art, stable diffusion tutorial, stable diffusion video, AI art
Id: EQZWyn9eCFE
Length: 18min 16sec (1096 seconds)
Published: Mon Mar 18 2024