I trained an AI with over 10,000 memes
to see if AI is funny. Insert your image, wait for the prompt, and automatically get your meme ready. So in today's video I'm going to walk you through how I got this data, how I trained the model, and some strange things that I've encountered along the way. So first, it's worth knowing how unfunny AI can be right now. I found this image here on Twitter showing a graph that an AI was asked to generate a funny caption for: "Started from the bottom, now we're here. Just hope my battery doesn't die before I cash out." Like, I don't even think my grandma
would laugh at that. But before we get started,
we have to ask ourselves: what is a meme, and what makes them so funny? Memes have evolved a ton since the very first internet meme. The very beginning was associated with the rise of user-generated content; the Dancing Baby and Hamster Dance were the kinds of things that showed the internet's potential. Then we have the classic era
that could be associated with the ease of sharing content online, like Chocolate
Rain, the Rickroll, or YouTube Poops. Then the idea of rage comics and advice animals came along. These images were crude, easy to make, and almost made you feel included by repeating faces or lines. And after this, memes could arguably be described just by irony: making fun of old or current trends in completely over-the-top ways, like over-editing gaming videos or adding filters so much that it deep-fries the meme. Memes have changed a ton. I like this quote right here: a meme
is a piece of media that is repurposed to deliver a cultural, social, or
political expression, mainly through humor. As of recording this video, the Apple App Store is currently under scrutiny, so this meme has found a way to be funny. But even being able to capture something like this is funny as well. All right, history class is over. Let's get into the code. First, we grab as many memes as humanly possible. We take the text of the meme and the image and run it through a large language model
to explain everything in detail. We take these images and text data so we can fine-tune a large language model to give a good meme caption based on what it was given. Then we create an interface that's able to create a meme from an image input, related news articles for modern context, and other data. The idea is to make memes that are somewhat funny and at least relevant to the subject matter, which AI can't really do that well. That means when a news story happens, you have a relevant meme ready to go. This is such a stupid thing to work on.

First, the data collection. The issue we have with AI memes, and this is going to be a hot take here: they're just not good. Boring and unfunny. However, there are a lot of websites out there that help users
generate their own memes, as well as create the templates. Like, for example, we have a bunch of trending templates right here that people are using for recurring memes. And what makes this significant is that it provides us with the actual meme in its blank form, with the community making posts about it, so we can see what they think is funny about it. And this helps us understand what the meme is about and the relationship between why it's funny and the caption of that meme. So this gives us a good training framework: provide the blank version of the meme along with the captions given to it, to train on what is considered funny in this particular image. But for now, let's scrape all of these meme templates as well as the memes made from them. We can also grab the meanings behind popular memes on wiki sites like Know Your Meme,
which honestly is the perfect database for these types of things. If you're familiar with web scraping,
you know I can't just use the old methods. A lot of websites force
you to use JavaScript to navigate their whole website,
which makes things really tough. So we have to simulate
a whole entire browser to get the information ourselves
and act like a user. So I'm going to be using Bright Idea
scraping browser, which will be a proxy to my browser
automation tool. I've been using bright Data for a while
now for a lot of my AI related projects, and they were kind enough to sponsor
today's video. The World of Web scrapers
as a gigantic rabbit hole. You'll go down. One of the biggest rabbit holes
is using proxies, by. With bright data helps
you get all of the information that you need using their proxy network,
like the one I'm using right now, a player or their massive proxy
network of IP addresses worldwide. Okay. So I wrote a script here
to start gathering these images. Let's just insert this one line of code
here and look at the remote browser to see if everything's fine. Let's go. I mean, it's kind of cool. It's kind of like, you know, hacker vibes. Now I can just leave this running
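In case you're curious, that connection really is only a few lines. Here's a minimal sketch assuming Playwright; the wss:// endpoint and credentials are placeholders you'd get from your Bright Data dashboard, and the target URL and CSS selector are purely illustrative:

```python
from playwright.sync_api import sync_playwright

# Placeholder CDP endpoint; the real wss:// URL and credentials
# come from the scraping-browser provider's dashboard.
CDP_ENDPOINT = "wss://USER:PASS@brd.superproxy.io:9222"

with sync_playwright() as p:
    # Connect to the remote scraping browser over the Chrome DevTools
    # Protocol, so every request is routed through the proxy network.
    browser = p.chromium.connect_over_cdp(CDP_ENDPOINT)
    page = browser.new_page()
    page.goto("https://example.com/meme-templates", timeout=60_000)
    # Hypothetical selector; adjust it to whatever site you're scraping.
    for img in page.query_selector_all("img.template"):
        print(img.get_attribute("src"))
    browser.close()
```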
for a while. Right. Data can also save captures
and rotate the proxies for me automatically
so I don't even have to worry about that. All right, let's see. Wow. So we're able to grab a ton of meme collections
here, like a whole lot. Now, the reason why having a template with the example captions alongside it is perfect is because it allows us to teach the model why a particular meme is funny. Then, for all of those memes, we can apply their descriptions to the template itself. So when the user decides to get a meme, it will understand which meme to choose based on the relevant context. And all of this work just to make some crappy memes. Like, it's seriously pathetic. Onto the million-dollar question:
what makes something funny? So I'm going to use this model
that's pretty popular on Hugging Face called LLaVA. I hope I'm saying that right. I'm going to have it read the meme and tell me what the photo is about, and LLaVA will give me an incredibly detailed reason why it's funny. Then, for all the memes that template is a part of, I'll transcribe them into text and also make sure they match with that meme during training. Love that. OpenAI
has their GPT-4 Vision model. I don't know what it's called exactly; it's what they use in ChatGPT. But it has a 100-per-day rate limit. Like, I'm sorry here, but I have like a million memes. You think I'm going to wait 10 million years to do this? So I'm going to write this Python script
here that was able to run the whole thing on my own machine, which is great. I used Ollama, which just came out for Windows recently. Highly recommend it if you have a beefy computer.
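The core of that script is tiny. Here's a sketch assuming the ollama Python package with the llava model already pulled; the prompt wording and image path are my own:

```python
import ollama

def describe_meme(image_path: str) -> str:
    # Ask the local vision model to describe the image and explain the joke.
    response = ollama.generate(
        model="llava",
        prompt="Describe this meme in detail and explain why it is funny.",
        images=[image_path],  # hypothetical path to one scraped meme
    )
    return response["response"]

print(describe_meme("memes/example_meme.jpg"))
```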
So, like this one, for example: "The image shows a chalkboard sign on an easel. The sign reads: no hipsters, don't be coming in here with your hairy faces, vegan diets, your sandal-wearing, no-waste mugs, no brews, no hamsters. The chalkboard sign is on the sidewalk in front of a building which appears to be a shop or cafe, with a sign that says hipsters. The sign seems to be humorous and directed at those who fit certain stereotypes often associated with hipsters, suggesting that there will be," my God, "no entry for individuals matching the characteristics listed on the sign. The style of the image is informal, taken outdoors," and my God, that is a long description.

Or this one: "The image shows an open laptop with a screen: all your files are exactly where you left them. The laptop appears to be an Acer model. The desk or table on which the laptop sits has a blurred background but seems to have a brown surface. The overall setting suggests an indoor environment." So, I mean, we have a ton of data here. A concerning amount, to be perfectly honest. Next, we need to fine-tune a large language model
to use our data to create funnier memes. So the one I found here is called SPHINX, which is a multimodal large language model where you can use images as a prompt. This would be perfect, because I want users to submit their own photos as memes. Now, the documentation isn't exactly fun to follow. Fine-tuning large language models
is like turning the knobs on the controls of the large language model itself. Whenever these knobs are turned, it produces a slightly different result, closer to the one you want. So for things like GPT-4, they have fine-tuned versions of their model to serve their purpose as a general AI that can do anything. And for my scenario, I want to adjust my knobs so that it can make a funny meme based off the scenario I give it. To do this, I'm giving the model training data that looks like this: a prompt, the response a large language model would give, and the path to an image.
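As a rough illustration, a single training record might look something like this; the field names and contents are my own invention, since the exact schema depends on how you format the data for your trainer:

```python
# Hypothetical example of one fine-tuning record.
example = {
    "prompt": "Caption this meme template: a man sweating while deciding "
              "between two buttons. Context: <summary of a news article>",
    "response": "Button 1: ship the feature. Button 2: fix the bugs.",
    "image": "templates/two_buttons.png",  # path to the blank template
}
```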
There's this project called Unsloth, which makes fine-tuning models really fast and supports some of the most popular instruct models. So for me, I'm going to use Mistral 7B Instruct v0.2, which at the time has some of the best benchmarks. Since I can fine-tune this locally, it means I can iterate as much as I need to, and the documentation is pretty good. They made it so you can even do it on Google Colab if you wanted to. Sadly, it'll just be trained on the captions and image descriptions, not the images themselves. We're going to be waiting a long time for this.
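If you want a feel for the fine-tuning code, here's a condensed sketch following Unsloth's documented pattern at the time; the 4-bit checkpoint name, LoRA settings, and the one-record toy dataset are assumptions for illustration, not my exact setup:

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load Mistral 7B Instruct v0.2 in 4-bit so it fits on a consumer GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices get trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Toy stand-in for the real meme dataset, already formatted as chat text.
meme_dataset = Dataset.from_list([
    {"text": "<s>[INST] Caption this meme: ... [/INST] funny caption here</s>"},
])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=meme_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(per_device_train_batch_size=2,
                           max_steps=500, output_dir="outputs"),
)
trainer.train()
```

LoRA is what makes this feasible locally: instead of updating all seven billion weights, you train small adapter matrices on top of the frozen model.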
So something that I have noticed already is that despite the language changing and some of the captions coming out, you know, somewhat humorous, it's still not funny. Really. So at this point I almost just completely gave up on the project, but then I started to realize a truth that made me a bit uncomfortable. Maybe AI doesn't know how to be funny because we human beings don't really know how to explain what's funny in the first place. Let me explain. Why is it
that this image here is not funny? "Did someone order a fillet minion?" You seriously didn't laugh at that, did you? But then, when you do this to an image, maybe a slight chuckle happens. This is at least funnier than this image. With this specific example, deep-frying a meme is a form of parody of the low-quality types of images you see, juxtaposed with content that might be familiar to us. This led me down a gigantic rabbit
hole of why things are even funny in the first place. Reading the theories, I decided that
this would be the perfect way to add an element of surprise
to the meme as well. So I decided to split them into six different humor types: unexpected, exaggeration, absurdity, wordplay, juxtaposition, and incongruity. Incongruity, incongruity. Right. I hate that word. So instead of just telling the AI "be funny," we are now telling the AI how to be funny in a very specific way, without the user knowing. So the AI is adding humor by just being random, I guess you could say. Very quirky.
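Concretely, the trick is just injecting one of those categories into the prompt at random. A tiny sketch, where the category list is the one from above and the prompt wording is my own:

```python
import random

# The six humor styles; one gets picked without the user knowing.
HUMOR_TYPES = ["unexpected", "exaggeration", "absurdity",
               "wordplay", "juxtaposition", "incongruity"]

def build_prompt(image_description: str) -> str:
    style = random.choice(HUMOR_TYPES)
    return (f"Write a short meme caption for this image: {image_description}\n"
            f"Make it funny specifically through {style}.")
```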
Part of why memes are so funny in the first place is not because of what the caption is. When we're talking about memes, the large language model knows how to copy the language that these memes produce, but that's about it. A huge part of memes is the context that goes with them, which is massive when it comes to the virality of memes. And one of the biggest sources of
memes is current events that go on in the world. So a feature I'd love to try is getting
the most current events in the world right now and using that as a context
for my large language model. Let's give it a try. I'm going to use Bright Data's proxy network again, plugging it into my web scraper to get news articles really easily. Again, let's just copy and paste this line of code and we're off to the races.
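Under the hood, that's more or less one HTTP request routed through the proxy. A sketch with placeholder credentials, since the exact endpoint format depends on your proxy zone:

```python
import requests

# Placeholder proxy URL; substitute your provider's host and credentials.
PROXY = "http://USER:PASS@proxy.example.com:22225"

def fetch_article(url: str) -> str:
    # The news site sees the proxy's rotating IP instead of mine.
    resp = requests.get(url, proxies={"http": PROXY, "https": PROXY}, timeout=30)
    resp.raise_for_status()
    return resp.text  # raw HTML, to be parsed for headline and body
```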
So the memes are starting to come out really funny, especially when I attribute them to modern events that are happening. So let's just find a way to make this so everyone can use it. Now, a meme format
that is super common amongst all of the internet is labeling objects
within a scene. So you'll often see it as a picture of something, and then labels over top of the objects to represent some sort of metaphor for what's happening in that scene. This is usually an extreme scenario of some sort that is represented as a relatable scenario. So I was thinking about creating some custom code
that could recreate these types of images really easily. Now, because you're seeing the highlight
reel, this was much harder than expected. Let me explain. First,
we get the information from the news link. We get the meme that it most relates to for the most context. We then add a category of humor to it. If you're interested in the more raw
and technical details, I have a second channel
where I go into the nitty-gritty. Check it out if you want. So when it comes to the images we create, we need to be able to locate these objects within the scene. Throughout my entire project, I tried using something called YOLO, which is a fantastic library for object detection. I'm sure it'll be fine for most scenarios where you'd use it; it's seriously a great library, I recommend it. But it's unable to identify what this thing truly is, because it's trained on real data. I mean, one time it said it was a dog. So this
model called OWLv2 combines large language model reasoning and understanding with open-vocabulary object detection, which is pretty crazy. And this is perfect for my use case, because it allows us to identify objects of interest right at the beginning of the software. So we just have to change a couple of lines of code and we're good.
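Those couple of lines look roughly like this, using the OWLv2 checkpoint on Hugging Face through the transformers library; the image path and text queries are just examples:

```python
import torch
from PIL import Image
from transformers import Owlv2ForObjectDetection, Owlv2Processor

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

image = Image.open("scene.jpg")  # hypothetical input image
queries = [["a person", "a car", "a street sign"]]  # free-text labels, not fixed classes

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits into boxes and scores in the original image's coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.3, target_sizes=target_sizes
)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(queries[0][label], round(score.item(), 2), box.tolist())
```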
The last part is to create the user interface behind this whole entire operation. For this, I'm going to use what everyone is using nowadays: Streamlit, which is this open-source project in Python that gives you a bunch of cool UI elements to build applications easily. What's great about this is that I was using just the command line to run my scripts over and over and over again. Trust me, it was horrible. But rather than having to build a whole API and create a front end, I can just plug in my functions and it deals with everything, all the UI elements, for me.
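To give you an idea, a stripped-down version of the app is about this much code; generate_meme here is a placeholder standing in for the whole caption-and-label pipeline:

```python
import streamlit as st

st.title("AI Meme Generator")

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
news_url = st.text_input("Optional: a news article URL for context")

if uploaded and st.button("Make me a meme"):
    with st.spinner("Generating..."):
        # Placeholder for the real pipeline: describe the image, pick a
        # humor type, caption or label it, then render the final meme.
        meme = generate_meme(uploaded.read(), news_url)
    st.image(meme, caption="Your meme")
```

Then you just run `streamlit run app.py` and it serves the page locally.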
Okay, I have something that is sort of done and ready to be shown off. Again, all of the code is open source and you can try it out yourself. And here we go. So here's the application right here. "Candy faces uncertainty due to Chinese imports." Okay, I think that'd be a good one. And I have four demo images I kind of want to give a try with this whole labeling feature. So let's try this one right here. So it came back with "licorice glass." Okay, never mind. Now let's see if we do the Drake and Kendrick Lamar beef and see if we can get a meme out of it. So I'll put it in there. Hopefully a Wikipedia article works, I don't see why not. Oh my God, if it can do this one,
it'll be so funny. "A kind of guy, right?" I mean, this is not really necessarily funny. I mean, I kind of expected it to be. Like, it's funny how it just put it in this type of language: "Taylor Swift debuts revamped Eras Tour set list with..." Yeah, you get the point. Okay, let's
try this starting car right here. Sometimes it doesn't come out good. I mean, this image wasn't necessarily
going to be funny. I do have some code in there
that makes it so it can do a caption rather than, like, a label. But sometimes it just doesn't work out. Outside of this, though,
here are some of my favorites that came out
that you may have already seen before. Now, this video was tough for me because I wasn't sure if I was even going to release it, although I had a blast working on this project. But as you can obviously tell from the different backgrounds all over the place, like future and present, it took so long to make this. Sometimes I see people online saying that, like, you know, I'm a great programmer, I could run circles around people. Like, it's just not true. Part of me wanted to release this video because it was just a nice way
to show that, you know, I make mistakes, I write bad code, and sometimes it doesn't work. Again, I have my second channel as well, where I go into more detail. Check out the livestream as well. I also have a hackathon coming up soon
here, so be on the lookout for that.