Okay. In the last video I talked about
some of the updates around Ollama. And one of the things I talked
about was that you can build a lot of little apps that actually do things on your local computer. And one of the ones I mentioned was about screenshots. People asked about that, so in this video I'm going to go through and show you how to do it. I'm going to talk about the
basics of what's going on. It's very simple. And at the end, I'll talk about
a different version that I'm using personally, which is a more advanced version of this. Okay. So the whole idea is that we've got a folder where our screenshots go, right? Most of us have a folder on our computers where screenshots are automatically saved. And over time that folder gets very big, or at least if you're me, it gets very big. And often I want a screenshot that I took a while back at a certain point in time. But the only way to find it is to actually go through and look at each of the screenshots. And while that works, what I'm going to show you here is that you can get one of the vision language models to automatically annotate or write captions for the images, which will make it much easier to find a particular image later on. So in this video, I'm just going
to show you a basic version of how to create the annotations using
Ollama, using the LLaVA 1.6 model. And then perhaps in another video,
I'll show you how to sort of integrate this with a RAG system. So that you can actually just use a
Q&A to basically query your screenshots and get the answers back that way. All right, so let's jump in and actually look at a diagram of how this all works. It's actually very simple. We've got a folder where we've got our screenshots. So the first bit of code is just something that goes to the folder, gets a list of the files out, and then sorts those, so that we've got them in some kind of order to use. The next thing is the main logic of this. This is really just going to load up the file. In my case, it's loading up PNGs; you could set it to load up different things. But because they were PNGs, I found that Ollama didn't seem to process the PNGs directly; it seemed more used to JPEGs. What you can do is just convert the file to bytes, and then it can handle that quite easily. So basically I load up the file, I convert it to bytes, and I then send it to the LLaVA 1.6 model. Now here, there's a few different
models that you can use. You can use the basic 7 billion parameter model, and you'll get okay results out of that. You can go up to the 13 billion
parameter model, which is the one I'm going to show you here. And then you could also go up to
the 34 billion parameter model. So in my experimenting, the thing
that I've found is that the 7 billion parameter model will often miss
things that are really obvious. And it can do a really good job on some
images and not as great on other images. The 13 billion parameter model
is definitely better at sort of having a bit more understanding of the image and stuff like that. And if you're looking for something specific in an image, you could actually put that in the prompt that goes along with it, and probably get decent results out of that. Obviously the 34 billion parameter model
is going to be the best one for this. But for a lot of people, either you're
not going to be able to run that, or it's just going to be insanely slow. For me, on this machine that I'm using, I've got 32 gig of RAM and I can run it, so it's not like you need a supercomputer to do this. But it's definitely slower than the 13 billion parameter model that I'm going to be using here. Now, when you send the image, you
want to send it along with a prompt. And you really want to customize the
prompt for your particular use case. So if you're just trying to index things, maybe the prompt is simply about describing what's in the image. I'll show you the prompt that I'm using when I go through the code. We then bring the results back from the LLaVA 1.6 model, and I basically just add them to a dataframe. So one of the things that I do early on is check if there is a CSV file in the folder. If there is, I just load it up and then check, okay, has this file already been processed? If not, we'll process it. If it has been processed, we just leave it and go on to the next image. Once we get the results back from LLaVA 1.6, we put that into the dataframe. And then finally, we just save the dataframe out to a CSV file. So I'm just showing you a really simple
sort of standalone version of this. You could obviously
save this to a database. You could save this to a vector
store, which is one of the things I'll talk about at the end. You've got a whole wide variety of
things that you could do in here. And then you end up finally with your
CSV file, which you could load into excel or Google sheets Or use it wherever
you want to use this kind of thing. All right. Let's jump into the code
and have a look at how this. actually works. Okay, so go through the code. It's pretty simple in here. We've got our imports up first. So I'm just bringing in Ollama
and then I'm going to bring in generate which we're going to use
for actually generating the return. We're then going to use glob to
basically get a list of the files. I'm going to use pandas
to make the dataframe. I'm going to use PIL to bring in an image. And then I'm going to convert it to bytes. So that just shows you
what I get on in here. So first up, we basically try
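Just as a rough sketch, the imports being described here look something like this (the exact layout is my reconstruction rather than the video's notebook):

```python
# A minimal sketch of the imports described above.
from io import BytesIO        # to hold an image's contents as bytes in memory
import glob                   # to list the screenshot files in the folder
import pandas as pd           # to build and save the dataframe
from PIL import Image         # to open the PNG screenshots
from ollama import generate   # the Ollama call we'll stream the description from
```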
So first up, we basically try to load a CSV file with the file name that we've got here; I'm calling it image_descriptions.csv. If that exists, we just load it into a dataframe so that we can add anything that's new, and it will be saved back out at the end. If it doesn't exist, we just make a new pandas dataframe and give it two columns: one is going to be the image file, and one is going to be the description. So when we run that, we've got our dataframe.
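A minimal sketch of that load-or-create step, assuming the file name and column names mentioned here:

```python
import pandas as pd

CSV_NAME = "image_descriptions.csv"

try:
    # If we've run this before, pick up the existing results
    df = pd.read_csv(CSV_NAME)
except FileNotFoundError:
    # Otherwise start fresh with the two columns described above
    df = pd.DataFrame(columns=["image_file", "description"])
```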
All right, so next we need to get a list of the files from the folder we're going to get them from. Basically here, we're just doing a glob of the folder path. In this case, I'm going for PNG files, just because I know that folder has nothing but PNG files in it. But if you were using JPEGs or something like that, you could change this, or you could use *.* to get everything and then put in a check to make sure each file is an image, that kind of thing. Alright, so we run that and we've got the list out, and I'm just going to sort it. Then, just for some debugging, I print out the first three images it gets, and we print out the head of the dataframe if we've got one. If we don't have one, we'll obviously just see an empty dataframe there.
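Roughly, that file-listing and debugging step could look like this (the folder path is just a placeholder, and df is the dataframe from the sketch above):

```python
import glob

# Placeholder path -- point this at your own screenshots folder
folder_path = "/Users/me/Screenshots"

# Grab the PNGs and sort them; swap the pattern (or add an image check)
# if your folder has other file types in it
image_files = sorted(glob.glob(f"{folder_path}/*.png"))

# A little debugging output: the first three files and the head of the dataframe
print(image_files[:3])
print(df.head())
```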
All right. So now we come down to the main part of this. What we're going to do is have a loop that goes through each of the image files; in this case, I'm just taking the first five. We check whether this image file is already in the dataframe, and if it is, we just skip it. If it's not in the dataframe, we process the image.
So this is the main function here, the one that processes the images. What it does is take in the path name of the file. It just prints that out to the console so we can see it; obviously you could turn these print statements off really easily. And then we're going to load up that image and convert it to a bytes format, so that we can just pass that in.
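A small sketch of that load-and-convert step using PIL (the function name here is just something I've made up for illustration):

```python
from io import BytesIO
from PIL import Image

def image_to_bytes(path: str) -> bytes:
    """Open a screenshot with PIL and return its contents as PNG bytes."""
    with Image.open(path) as img:
        buffer = BytesIO()
        img.save(buffer, format="PNG")
        return buffer.getvalue()
```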
We then pass that into generate. You'll see this full response string; I'm just setting that up there. We're telling generate which model we want. Now, in this case, I'm using the LLaVA 13B 1.6 model, but I've actually got all three of the different sizes here. And if I was just running this as, say, a cron job or some sort of job that got run in the middle of the night, I could just go for the really big model and not worry too much about it taking longer. It doesn't take a huge amount of time anyway; it's more a question of whether your system has enough RAM to actually run it or not. And then I've got the prompt. So the prompt that I'm passing in here
is: Describe this image and make sure to include anything notable about it. And then in brackets, I've got: include text in the image. So the idea here is that if it sees some text, then that's probably going to be one of the most important things. Now, the challenge is that it often won't get that text right, especially with the smaller models. So this is something to be aware of, but you can, and really should, play with the prompt here for your particular use case. Sometimes I've got it looking for something in particular; perhaps you're trying to get it to do a not-safe-for-work check or something like that; play around with the prompt for those kinds of things. And be aware that the smaller models are just not going to be very good at certain things. All right. So that's my prompt. I then pass in the image here. This parameter just takes a list, so I'm passing in the image
bytes that we got in there. And I'm going to stream the response out here, just so that I can see the text coming through; I print out each chunk of the response as it arrives and add each of those streamed pieces into the full response. Then, at the end of this function, I'm just adding the image file name and the full response that we got back to the dataframe, as a new row. And finally, at the end of it all, we save this back out to a CSV file with the same name we started with. So if there was already a CSV file, we'll just be updating it; if there was no CSV file to start out with, then we're creating one here.
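Putting those pieces together, the processing function and the loop might look roughly like this. The model, prompt wording, and the first-five limit follow what's described in the video, but the exact code (and names like process_image, image_to_bytes, and the model tag) are my reconstruction rather than the author's actual script:

```python
from ollama import generate

MODEL = "llava:13b-v1.6"   # assumed tag -- check `ollama list` for the one you've pulled

PROMPT = ("Describe this image and make sure to include anything notable about it "
          "(include text in the image)")

def process_image(image_file: str) -> None:
    print(f"Processing {image_file} ...")
    image_bytes = image_to_bytes(image_file)   # from the earlier sketch

    full_response = ""
    # Stream the response so we can watch the description come through
    for chunk in generate(model=MODEL, prompt=PROMPT,
                          images=[image_bytes], stream=True):
        print(chunk["response"], end="", flush=True)
        full_response += chunk["response"]

    # Add the file name and its description as a new row in the dataframe
    df.loc[len(df)] = [image_file, full_response]

# Only the first five files while testing; drop the slice to do the whole folder
for image_file in image_files[:5]:
    if image_file in df["image_file"].values:
        continue                # already processed on a previous run
    process_image(image_file)

# Save back out, creating the CSV if it didn't exist before
df.to_csv(CSV_NAME, index=False)
```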
All right, so let's run it and see roughly how long it actually takes. You can see that there was no CSV file in this case, and that at the start it's processing the first image, so it's actually loading up the LLaVA model. And then sure enough, we can see that it's generating text out here. Now, this is what I mean by the smaller models not always getting the text right. That particular first image is for Gemini; it's a screenshot about Gemini Advanced, and you can see that it's actually written "Gemni". So they won't always get perfect
OCR or something like that. Obviously the bigger models will do better at some of this stuff. You can get some things that come out with quite nice results here. If we look at this one about the Google logo, it's done quite a nice job of interpreting it: a stylized version of a Google logo that features an abstract dragon-like creature with Chinese elements. So here it's done a really nice job of capturing what's in there, and actually on this one, probably a lot of OCR systems wouldn't pick up the "Google" very well. Now looking at the results, we
can see that, okay, it's basically done five different images here, and it had no CSV file to load. It's done five images because I had that set to five there. If I come out now, save it after changing it to run through all the images, and run it again, we can see that the CSV file already has those images done, so it doesn't need to do them again. And you'll notice that this time it was actually quicker to get going, because the model was already loaded in this case. You can run into some issues where you're trying to load the model twice and you've got it half loaded or something like that. In any of those cases, you can just quit out of Ollama and come back in and it should work fine, or you can go through and kill all the Ollama processes manually. But it will often be in the middle of something and then restart a new process. So you can see for a bunch
of these, it really has got the whole idea; there are a number of images in here of this sort of design, and it really does understand that this is like a CAD design, going through and working that out. Which will make it quite easy for us to find this, either by just doing a keyword search, or by actually using some kind of RAG as well.
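As a tiny example of that keyword-search idea, once the CSV exists you could simply filter it with pandas (column names as assumed in the sketches above):

```python
import pandas as pd

df = pd.read_csv("image_descriptions.csv")

# Find every screenshot whose description mentions "CAD", case-insensitively
matches = df[df["description"].str.contains("CAD", case=False, na=False)]
print(matches[["image_file", "description"]])
```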
And you can see at this point it's generating pretty quickly. Like I mentioned before, I'm not using a super fast Mac here, I'm just using a Mac mini, and it's going through and generating these descriptions. Just to show you some of these
images: one of them was a Walmart receipt that I got from online, and it's written that this image appears to be a receipt from Walmart, a large retail store, and that the receipt lists several items. Okay, so that one has done a pretty good job. What about the flying cat one? This image shows an orange tabby cat in mid jump; the front paws are extended out. So you can see that it is actually getting some of these quite nicely. And like I mentioned, it doesn't need to get them all perfectly for you to be able to put this into some kind of search or some kind of RAG. So how could you extend this and
make an advanced version of this? So one of the things that I've done to make a more advanced version is to also get things like the file modification or creation date and store those alongside the description. Then, when you put these into a RAG system, you can use them as metadata to do searches: say, I want this image, it was from December '23, and it's able to hone in on it. Especially if you've got a lot of images that are going to be very similar from time to time, adding in anything that can give you metadata can be really useful. So things like getting the modification date, or getting the owner or the username of the person who saved it; those kinds of things can be useful if it's not just for yourself as well.
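For example, a small, purely illustrative sketch of collecting that extra metadata per file might look like this; which fields you keep, and how you store them, is up to you:

```python
import os
from datetime import datetime
from pathlib import Path

def file_metadata(path: str) -> dict:
    """Collect a few bits of metadata worth storing alongside the description."""
    stat = os.stat(path)
    return {
        "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        # On macOS/Linux, st_ctime is really the metadata-change time,
        # so treat "created" loosely here
        "created": datetime.fromtimestamp(stat.st_ctime).isoformat(),
        "owner": Path(path).owner(),   # username of whoever saved it (POSIX only)
    }
```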
But anyway, this project gives you a simple example of how you could do this, and hopefully you can see that it can be really useful. Okay, so in one of the next videos, I
think I'll look at how to add in a custom RAG with a fully open source local model, so that we can try that as well. As always, if you've got any questions or comments, please put them in the comments below. If you found the video useful, please click and subscribe. I'm going to be doing a
bunch more things like this. I'm currently working on a number of things to try and show people how to do function calling with some of the open models, and looking at the different results that you get from different open models for doing things like that. All right, I'll see you in the next video. Bye for now.