AI art is one of the hottest topics in AI discussion in general, and there are questions to be asked here. Is it possible to get high-quality AI image generation for free, or is this future exclusively locked behind paid services? I've decided to compare two of the best examples in each category, Stable Diffusion versus Midjourney, to find the answer.
First, I'd like to briefly highlight the main differences between the two. Currently, Stable Diffusion is an open-source text-to-image generator freely available to anyone. It supports thousands of custom Stable Diffusion models, each tailored to a specific style. It provides extremely flexible customization and has a dedicated community that expands its possibilities daily. However, it's also hard to run for an inexperienced user, and it takes quite a bit of learning to get the hang of it.
The Midjourney AI image generator is different in almost every aspect. It's not open source; in fact, it hides its secrets pretty well. And using Midjourney is only possible with a heftily priced subscription. To give you some context on Midjourney's pricing: the basic plan is almost as expensive as Netflix's standard plan, but it still comes with restrictions on high-speed generation. Comparing Stable Diffusion versus Midjourney, the latter is less customizable and only has a couple of models, but they are extensive, and the results Midjourney creates are of very high quality. It's also a hell of a lot more beginner-friendly, to the point where all you need to use Midjourney is a Discord account. It's also important to note that the Midjourney Discord bot requires a constant internet connection, while Stable Diffusion can be run through a cloud server or locally. Granted, running it locally does require quite a strong PC, unless you're willing to wait minutes for generations to finish.
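To give you an idea of what "running it yourself" actually involves, here's a minimal sketch using the open-source diffusers library. The checkpoint name and settings are my own illustrative picks, not an official setup:

```python
# Minimal sketch of running Stable Diffusion locally with Hugging Face's
# diffusers library. The checkpoint is a widely used SD 1.5 release;
# treat it as an illustrative choice, not the only option.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # halves memory use; assumes a CUDA GPU
).to("cuda")  # on CPU, drop torch_dtype and expect generations to take minutes

image = pipe("a castle on a cliff at sunset, oil painting").images[0]
image.save("castle.png")
```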
Going beyond the general differences, the way both tools are trained has certain similarities. Stable Diffusion is very straightforward in its approach: it learns how to generate images by destroying them. The idea is to add layers of noise over and over again until there's barely anything left of the original image, so that the AI can then attempt to reverse the process and recreate the original from just a few scraps of data.
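To make that concrete, the "destroying" half of the process can be sketched in a few lines. This is a toy illustration of the idea, not Stable Diffusion's actual training code:

```python
import torch

def add_noise(image, t, num_steps=1000):
    # Toy forward-diffusion step: blend the image with Gaussian noise.
    # t = 0 means untouched; t = num_steps means almost pure noise.
    # Real noise schedules are more careful; this linear ramp is only
    # for illustration.
    alpha = 1.0 - t / num_steps          # fraction of the original that survives
    noise = torch.randn_like(image)      # fresh Gaussian noise
    return alpha**0.5 * image + (1 - alpha)**0.5 * noise

# During training, the model sees the noised image and the timestep t,
# and learns to predict the noise that was added -- which is exactly the
# "reverse the process" step described above.
```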
The original Stable Diffusion model is trained on a massive dataset containing all kinds of art pieces. However, the fine-tuned models are more popular in the community. These models are trained on a narrower dataset and will generally reproduce their chosen style quite closely. For instance, a standard model will perform worse when generating a picture in an anime style, while a model trained exclusively on anime-style pictures will have no trouble with the task.
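In practice, those community fine-tunes load just like the base model. Here's a rough sketch with diffusers; the checkpoint filename is a made-up stand-in:

```python
# Community fine-tunes are often shared as single .safetensors checkpoint
# files. diffusers can load those directly; the filename below is a
# hypothetical stand-in for whatever anime-style model you downloaded.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file("anime-style-finetune.safetensors")
image = pipe("portrait of a knight, anime style").images[0]
image.save("knight.png")
```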
Furthermore, training only on pictures from a certain artist can replicate their work with a certain degree of accuracy, which opens a whole other legal can of worms. But I'll touch on that in a minute.
It's time to talk a little bit more about the Midjourney AI mystery. It's a mystery because Midjourney is closed source, of course, so all we can do is make an educated guess. Such a guess would suggest that Midjourney combines the same approach as Stable Diffusion with a large language model, or LLM. A model like this is typically trained on a massive dataset of text and images; in this case, one such dataset is Microsoft's Common Objects in Context (COCO) dataset. This allows it to learn the relationship between text and images and to generate text descriptions of images. Afterwards, Midjourney would just have to use the Stable Diffusion method to fine-tune the relationship between the generated text and the images. As a result, Midjourney has a great understanding of what kind of output is required based on the text prompt.
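Since this is all guesswork anyway, here's a small sketch of what "learning the relationship between text and images" looks like with an openly available model, CLIP, standing in for whatever Midjourney really uses:

```python
# Purely speculative illustration: CLIP is an open model trained to match
# text with images -- the kind of text-image pairing guessed at above.
# It stands in for the idea only; it is not Midjourney's actual stack.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # any local image
captions = ["a dog on a beach", "a city skyline at night"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # which caption fits best
print(dict(zip(captions, probs[0].tolist())))
```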
And now let's address the elephant in the room: where do the images used for training come from? Well, most of them come from LAION-5B, a dataset with nearly 6 billion images, including photographs, renders of 3D models, and more, each with a text description. Naturally, there's a creator behind each of these pictures. And even more naturally, nobody was credited during the AI training. Even though Midjourney stopped using LAION-5B after its second version, it still faced a class-action copyright infringement lawsuit this year. While one could argue that a similar fate could await Stable Diffusion, it isn't under the same scrutiny, since it's free and thus doesn't profit from the copyrighted material. That said, Stability AI, the company behind Stable Diffusion, states that any image created with the tool can be used commercially, and that's where users can be held responsible, depending on their local copyright laws.
Now that we understand how these AI art generators are trained and how both tools operate, let's take a closer look at the models I've mentioned. Starting with Stable Diffusion: there are many versions of the default model of the same name. Stable Diffusion is trained on a broad dataset, so it does a good job in almost every style, though it's not as detailed or nuanced as Midjourney.
As I've mentioned a bit earlier, Stable Diffusion's strength comes from the community-built fine-tuned models. There are websites hosting thousands of those models online, and some really creative members of the community have even managed to completely transform a video into a pretty epic animation using Stable Diffusion. Midjourney is not that customizable at all and relies on a single,
constantly updated model. However, thanks to its more in-depth training, Midjourney produces images of better quality, which come much closer to the prompt compared to Stable Diffusion. I have never felt the need to specify my prompt in much detail or use a negative prompt to clarify what I don't want to see in my generated pictures. With Stable Diffusion, on the other hand, a negative prompt is pretty much mandatory to avoid generating nightmare-fuel imagery.
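For the curious, here's roughly what that looks like with diffusers; the negative prompt below is a typical community-style example, not an official recommendation:

```python
# Sketch of using a negative prompt with Stable Diffusion via diffusers.
# The negative prompt lists things the model should steer away from.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of a woman in a forest, soft light",
    negative_prompt="extra fingers, deformed hands, blurry, low quality",
).images[0]
image.save("portrait.png")
```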
Speaking of nightmares: Midjourney has a strictly enforced ban on any explicit imagery. Naturally, the open-source Stable Diffusion has no such restrictions. And yes, there are even specific models designed to create heavily not-safe-for-work imagery. One more thing I'd like to cover is the copyright aspect
of AI art generators. Do you think it's possible to copyright AI art? Think about that for a moment. All right, time's up. If you said yes, then you're wrong. But don't worry, the folks who said no are also wrong. In reality, it depends. As of August 2023, AI-generated art can't be copyrighted in the US, since copyright law only protects works created by human beings. In a recent case, a US court ruled that a work of art created by AI without human input cannot be copyrighted because it lacks human authorship. But you might have noticed a small loophole there: "without any human input." Yeah, that's right. If a human artist uses AI to generate images and then modifies or arranges those images creatively, the resulting work may be subject to copyright as an original work of art by a human artist.
At the end of the day, what's the main takeaway here? The thing to keep in mind is that Stable Diffusion is free and flexible but requires more technical insight to use. Midjourney, on the other hand, is easier to use, is trained more meticulously, and provides better results on average. I still believe that the open-source approach creates far more fertile soil for nurturing this technology, but hey, only time will tell. Which AI image generator would you use? Or maybe you're already using one of them? Let me know in the comments below this video. And if you enjoyed it, make sure to leave a like and subscribe. I'm on the warpath to create even more interesting videos, so be sure to join me in this battle.