Hi everyone. I'm Vishnu Subramanian, founder of JarvisLabs. In this video, we are going to see how anyone can create beautiful images without knowing much about prompting. To do this, we are going to use something called IF Prompt to Prompt. We'll get to the technical details a bit later, but before that, let's see what can be done here.
Right here, all I am giving is the word "woman", which is probably the most commonly used word for generating images. See how the prompt gets enhanced, and how the model is able to generate a really good-looking image of a woman without any of the fancy prompts that we would otherwise be writing. And there are different ways to steer it: you can add a style like neonpunk, for example, and it gives a completely different style of image. Just with small tweaks, we can control how the image is generated. How you can achieve this on your own instance is what we will be looking at in this video.
Before we do this, we need to understand an important tool called Ollama, which is an open source tool. If you are using JarvisLabs, it already comes as a template, so you can just click and point it at your local UI tool; I'll probably be making another video on that soon. But in this video, we will not be using the Ollama template. Instead, we will use the ComfyUI instance and install Ollama in it. I'll leave the actual steps for that in the description so that you can try it.
If you're using your own workstation, the steps should be similar, or you can go to the Ollama website; they have instructions for Windows, Mac and Linux machines. We provide Linux machines at JarvisLabs, so you can use the Linux steps to get Ollama installed on your cloud instance. Once you do that, you have to start ollama serve, which brings up the server, so let's do that. We will also need to download and run a particular model. In this case, I'm downloading one called gemma:2b. Just to make things quicker, I've already downloaded this model so that we don't waste time.
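For reference, these are roughly the commands involved on a Linux machine; the installer line is the one from the Ollama website, and the exact model tag may vary, so treat this as a sketch:

    # Install Ollama on Linux (one-line installer from the Ollama website)
    curl -fsSL https://ollama.com/install.sh | sh

    # Start the Ollama server (it listens on localhost:11434 by default)
    ollama serve

    # In a second terminal, download the model we will use
    ollama pull gemma:2b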
Once I'm here, I can talk to it, just like how we interact with a platform like ChatGPT: can you give me a prompt for generating a beautiful mountain? If you look at the response, it's not very specific to the way we write prompts for Stable Diffusion.
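If you want to try the same thing from the terminal, this is roughly what it looks like; you can either chat interactively or call the Ollama REST API directly, and the question is just an example:

    # Chat with the model interactively
    ollama run gemma:2b
    # then type: Can you give me a prompt for generating a beautiful mountain?

    # Or send a single request to the Ollama API
    curl http://localhost:11434/api/generate -d '{
      "model": "gemma:2b",
      "prompt": "Give me a prompt for generating a beautiful mountain",
      "stream": false
    }'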
That's where we use another tool, a set of custom nodes for ComfyUI called IF AI tools, which is an amazing piece of work by an individual contributor. It's open source software. What it allows us to do is use Ollama, and the multiple LLMs available through it, with our ComfyUI software. So here is what we will do: let's clear the workflow and start with the default one. Before we do this, let's move everything to the side so that we have enough space to build. For the model, I've been playing with the Juggernaut model, which is an SDXL model, so let's use that.
If you're doing this for the first time, you also have to install the custom nodes. For that, you can go to Install Custom Nodes and search for "IF AI tools"; it's already there. In my case, I have already installed it; the pack is called ComfyUI-IF_AI_tools. You install it and restart, and you can do all of this from the ComfyUI Manager as usual. If you're using JarvisLabs, the ComfyUI Manager comes pre-installed for you.
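If you would rather install the node pack manually, the usual ComfyUI pattern is to clone it into the custom_nodes folder and restart; I believe the repository lives at the URL below, but double-check the exact name on GitHub:

    # Manual install of the node pack (repository URL assumed; verify on GitHub)
    cd ComfyUI/custom_nodes
    git clone https://github.com/if-ai/ComfyUI-IF_AI_tools
    # Restart ComfyUI so the new nodes are picked up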
Once it's done, let's add the IF AI tools node so that we can talk to the different models available in Ollama. It's listed under ImpactFrames. Let's start with the node called IF Prompt to Prompt. So we've added the node. This node talks to localhost on port 11434, which is the default port Ollama runs on. Let's pick the gemma model and leave everything else as it is.
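If the node can't reach Ollama, a quick sanity check is to query the server on that port yourself; these are standard Ollama endpoints, and I'm assuming the node talks to this same server under the hood:

    # List the models Ollama has available locally
    curl http://localhost:11434/api/tags

    # Send a minimal generation request to the same server
    curl http://localhost:11434/api/generate -d '{"model": "gemma:2b", "prompt": "hello", "stream": false}'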
Then, instead of typing the positive and negative prompts ourselves, we right-click on the prompt box and choose "convert text to input". Once we do this, we can connect the generated text to the input here, and we do the same for the negative prompt. Let's minimise these nodes so that we have enough space to play around. I think this should be enough for us to generate the image.
Let's bring the output over here so we can see things. This is very simple, but what if you want to understand what is actually happening? For that, there is another node in the same ImpactFrames pack called IF Display Text. There is also a Save Text node, which you can use if you want to save the prompts that are being generated. Let's connect one display node to the response and one to the negative, and let's bring in one more IF Display Text node and connect it to the question.
Now let's run the prompt again. This time you can see the positive prompt that is being generated. Let's remove this and just say "cat": give me an image of a cat. What is happening here is that the Gemma model, through the node, is applying settings and styles to the prompt and adding a lot more detail to the prompt we gave. That is what makes the generated images look so good. We get much more professional-looking images, or at least a starting place, and you can always remove the words you don't want. These prompts are not perfect, but they give you a better place to start.
But what if I don't want to start with a prompt at all, and I want to start with an image instead? To do that, we will not use the IF Prompt to Prompt node; instead, we will use IF Image to Prompt. Let's pick that, delete the old node, and make the connections as usual. Let's keep this a bit further away for now. Since we have already seen how the prompts get changed, let's remove the display nodes. Now let's connect the positive prompt here and the negative prompt here: I expand this node and connect it to the positive prompt, then minimise it, expand the other one and connect it to the negative prompt. Once this is done, we can attach an image to the node.
Let's add a Load Image node and use the car image that we have, and see if we are able to generate the image. I don't think so. The reason it doesn't work is that not all models are good at understanding images. For that, we have to use a model called llava. Most LLMs only understand text; llava is a multimodal model, so it is capable of understanding images as well. You can run it with the same Ollama tool, which is an open source tool that lets us run many large language models, as well as multimodal models like llava, even on your laptop.
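To follow along, you need llava pulled into Ollama first. If you want to test it outside ComfyUI, the Ollama API accepts base64-encoded images for multimodal models; car.png here is just a placeholder for your own image, and base64 -w0 is the Linux (GNU) variant:

    # Download the multimodal model
    ollama pull llava

    # Optional: send an image to llava through the Ollama API
    curl http://localhost:11434/api/generate -d "{
      \"model\": \"llava\",
      \"prompt\": \"Describe this image as a detailed Stable Diffusion prompt\",
      \"images\": [\"$(base64 -w0 car.png)\"],
      \"stream\": false
    }"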
So now we have the image loaded and we have chosen llava as the model. Let's see if the workflow is able to generate an image of a car. All right, now we have got the image of a car. That's it, guys, I hope you enjoyed the workflow. If you face any challenges, write them in the comments and we will try to answer, or you can also join our Discord group. Please hit the like button and subscribe to our channel for more such videos. Thank you. Bye bye.