I'm running something called private AI. It's kind of like ChatGPT, except it's not. Everything about it is running right here on my computer. Am I even connected to the internet? This is private, contained, and my data isn't being shared with some random company. So in this video I
want to do two things. First, I want to show you how to set this up. It is ridiculously easy and fast to run
your own AI on your laptop computer, or whatever. It's free, it's amazing. It'll take you about five minutes, and
if you stick around until the end, I want to show you something even
crazier, a bit more advanced. I'll show you how you can connect
your knowledge base, your notes, your documents, your journal entries to your own
private GPT and then ask it questions about your stuff. And then second, I want to talk about how private AI is
helping us in the area we need help most: our jobs. You may not know this, but not everyone can use ChatGPT or something like it at their job. Their companies won't let them, mainly for privacy and security reasons. But if they could run their own private AI, that's a different story. That's a whole different ballgame, and VMware is a big reason this is possible. They're the sponsor of this video, and they're enabling some amazing things that companies can do on-prem, in their own data center, to run their own AI. And it's not just the cloud, man; it's like, in your data center. The stuff they're doing is crazy. We're
going to talk about it here in a bit, but tell you what, go ahead and do
this. There's a link in the description. Just go ahead and open it and take a
little glimpse at what they're doing. We're going to dive deeper, so just go ahead and have it open right
in your second monitor or something, or on the side, or minimized; I don't know what you're doing. I don't know how many monitors you have. You have three? Actually, Bob, I can see. Before we get started, I have to show you this: you can run your own private AI that's kind of uncensored. Watch this. So yeah, please don't do this to destroy me. Also, make sure you're paying attention
at the end of this video, I'm doing a quiz and if you're one of
the first five people to get a hundred percent on this quiz, you're getting
some free coffee: Network Chuck Coffee. So take some notes,
study up. Let's do this. Now, real quick, before we install a private, local AI model on your computer: what does that even mean? What's an AI model? At its core, an AI model is simply an artificial intelligence pre-trained on data we provided. One you may have heard of is OpenAI's ChatGPT, but it's not the only one out
there. Let's take a field trip. We're going to go to a website
called huggingface.co. Just an incredible brand
name. I love it so much. This is an entire community dedicated
to providing and sharing AI models and there are a ton. You're about
to have your mind blown. Ready? I'm going to click on models up here. Do
you see that number? 505,000 AI models. Many of these are open and free
for you to use and pre-trained, which is kind of a crazy
thing. Let me show you this. We're going to search for
a model named Llama 2, one of the most popular models out there. We'll do Llama 2 7B. Again, I love the branding. Llama 2 is an AI model known as an LLM, or large language model; OpenAI's ChatGPT is also an LLM. Now, this LLM, this pre-trained AI model, was made by Meta, AKA Facebook, and what they did to pre-train this model is kind of insane, and the fact that we're about to download it and use it is even crazier. Check this out:
if you scroll down just a little bit, here we go: training data. It was trained on over 2 trillion tokens of data from publicly available sources. Instruction datasets: over a million human-annotated examples. Data freshness: we're talking July 2023. I love that term, data freshness. And getting
the data was just step one. Step two is insane because this
is where the training happens. Meta, to train this model, put together what's called a supercluster. It already sounds cool, right? This sucker is over 6,000 GPUs. It took 1.7 million GPU hours to train this model, and it's estimated it cost around $20 million to train it. And now Meta is just like, here you go, kid. Download this
incredibly powerful thing. I don't want to call it a being
yet. I'm not ready for that, but this intelligent source of information
that you can just download on your laptop and ask questions, no internet required. And this is just one of the many models we could download. They have special models like text-to-speech and image-to-image. They even have uncensored ones: there's an uncensored version of Llama 2. This guy, George Sung, took the model and fine-tuned it with a pretty hefty GPU; it took him 19 hours, and he made it to where you can pretty much ask this thing anything you want. Whatever question comes to mind, it's not going to hold back. Okay, so how do we get this fine-tuned
model onto your computer? Well, actually I should warn you, this
involves quite a few llamas, more than you would expect. Our journey starts at a tool called Ollama. Let's go ahead and take a field trip out there real quick: we'll go to ollama.ai. All we have to do is install this little guy, Mr. Ollama, and then we can run a ton of different LLMs: Llama 2, Code Llama (told you, lots of llamas), and there are others that are pretty fun, like Llama 2 Uncensored. I'll show you in a second. But first, what do we install Ollama on? We can see right down here that we
have it available on macOS and Linux, but, oh bummer, Windows is coming soon. It's okay, because we've got WSL, the Windows Subsystem for Linux, which is now really easy to set up. So we'll go ahead and click on download right here. For macOS, you'll simply download this and install it like one of your regular applications. For Linux, we'll click on this, and we've got a fun curl command that we'll copy and paste in a moment, because once we install WSL on Windows, it'll be the same step. So macOS folks, go ahead and just run that installer.
Linux and Windows folks, let's keep going. Now, if you're on Windows, all you have to do to get WSL installed is launch your Windows Terminal. Just go to your search bar and search for "terminal," and with one command, wsl --install, it'll just happen. It used to be so much harder. It'll go through a few steps and install Ubuntu as the default. I'll go ahead and let it do that.
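If you want the exact command to follow along, it's just this one line, run from Windows Terminal (a reboot may be required partway through):

```bash
# Run from an administrator Windows Terminal (PowerShell).
# Installs WSL and the default Ubuntu distro in one shot.
wsl --install
```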
And boom, just like that, I've got Ubuntu 22.04.3 LTS installed, and I'm actually inside of it right now. So now, at this point, Linux and Windows folks, we converge. We're on the same path.
Let's install Ollama. I'm going to copy that curl command Ollama gave us, jump back into my terminal, paste it in there, and press enter. Fingers crossed, everything should be great. It'll ask for my sudo password, and... that was it. Ollama is now installed.
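For reference, the install one-liner from the Ollama site looked like this at the time; grab the current command from ollama.ai itself in case it has changed:

```bash
# Downloads and runs Ollama's official Linux install script
curl https://ollama.ai/install.sh | sh
```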
Now, this applies directly to both Linux and Windows people. See right here where it says "Nvidia GPU installed"? If you have that, you're going to have a better time than people who don't. I'll show you here in a second. If you don't have it, that's fine; we'll keep going.
Now let's run an LLM. We'll start with Llama 2. We'll simply type in "ollama run" and then pick one: llama2. And that's it. Ready, set, go. It's going to pull the manifest, then start downloading Llama 2. And I want you to realize: that powerful Llama 2 pre-training we talked about, all the money and hours spent, this is how big it is. This is the 7-billion-parameter model, the 7B. It's pretty powerful, and we're about to literally have it in the palm of our hands in, like, 3, 2, 1... oh, I thought I had it. Anyways, it's almost done. And boom, it's done. We've got a nice success message right here, and it's ready for us. We can ask it anything.
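If you're following along, the whole interaction is just this (the download size depends on the model; the default 4-bit-quantized 7B build is roughly 3.8 GB):

```bash
ollama run llama2    # first run downloads the model, then opens a chat prompt
# >>> what is a pug?
# (the model answers right in your terminal)
```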
Let's try: what is a pug? Now, the reason this is going so fast, just as a side note, is that I'm running a GPU, and AI models love GPUs. So let me just show you real quick: I did install Ollama on a Linux virtual machine, and I'll demo the performance for you. By the way, if you're running a Mac with an M1, M2, or M3 processor, it actually works great. I forgot to install it, so I'll get it installed real quick and ask it that same question: what is a pug? It's going to take a minute; it'll still work, but it's going to be slower on CPUs. And there it goes. It didn't take too long, but notice it is a bit slower. Now, if you're running WSL and you know you have an Nvidia GPU but it didn't show up, I'll show you in a minute how you can get those drivers installed. But anyways, just sit back for a minute, sip your coffee, and think about how powerful this is.
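A quick sanity check if your GPU isn't being picked up: inside WSL or Linux, see whether the Nvidia driver is even visible before digging deeper:

```bash
# If drivers are set up correctly, this prints your GPU's name and memory;
# if it errors out, fix the drivers first, then re-run the Ollama installer.
nvidia-smi
```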
The tinfoil-hat version of me stinking loves this, because let's say the zombie apocalypse happens, right?
The grid goes down, things are crazy, but as long as I have my
laptop and a solar panel, I still have AI and it can help
me survive the zombie apocalypse. Let's actually see how that would
work. It gives me next steps. I could have it help me with the water
filtration system. This is just cool, right? It's amazing. But can
I show you something funny? You may have caught this earlier. "Who is Network Chuck?" What? Dude, I've always wanted to be Rick Grimes. That is so fun. But seriously, it kind of hallucinated there; it didn't have the correct information. It's so funny how it mixed the zombie apocalypse prompt with me. I love that so much. Let's try a different model. I'll say /bye and try a really fun one called Mistral. And by the way, if you want to know which LLMs you can run with Ollama, they've got a page for their models right here, with all the ones you can run, including Llama 2 Uncensored and WizardMath. I might give that one to my kids, actually.
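Switching models, by the way, is the same one-command workflow:

```bash
# /bye exits the current chat session, then:
ollama run mistral    # pulls Mistral on first run and drops you into a new chat
```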
Let's see what it says. Now: who is Network Chuck? My name is not Chuck Davis, and my YouTube channel is not called "Network Chuck on Tech." So clearly, the data this thing was trained on is either not up to date or just plain wrong. So now the question is: cool, we've got this local, private AI,
this LLM, that's super powerful, but how do we teach it the
correct information for us? How can I teach it to know that I'm Network Chuck, Chuck Keith, not Chuck Davis, and that my channel is called Network Chuck? Or maybe I'm a business, and I want it to know more than just what's publicly available. Because sure, right now, if you downloaded this LLM, you could probably use it in your job, but you can only go so far without it
knowing more about your job. For example, maybe you're on a help desk. Imagine if you could take your help
desk's knowledge base, your IT procedures, your documentation. Not only that, but maybe you have a database
of closed tickets, open tickets. If you could take all that data and
feed it to this LLM and then ask it questions about all of
that, that would be crazy. Or maybe you wanted to help troubleshoot
code that your company's written. You could even make this LLM public-facing for your customers: you feed it information about your product, and customers could interact with that chatbot you make, maybe. This is all possible with a process
called fine-tuning, where we can train this AI on our own proprietary, secret, private stuff about our company, or maybe our lives, whatever you want to use it for, whatever your use case is. And this is fantastic, because maybe before
you couldn't use a public LLM because you weren't allowed to share your
company's data with that LLM, whether for compliance reasons or because you just simply didn't want to share that data, because it's secret. Whatever the case, it's possible now, because this AI is private, it's local, and whatever data you feed it is going to stay right there in the company. It's not going out the door. That idea just makes me so excited
because I think it is the future of AI and how companies and individuals
will approach it. It's
going to be more private. Back to our question, though: fine-tuning. That sounds cool, training an AI on your own data, but how does that work? Because as we saw before with Meta pre-training a model, it took them 6,000 GPUs and over 1.7 million GPU hours. Do we have to have this massive
data center to make this happen? No. Check this out, and this is such a fun example. VMware asked ChatGPT: what's the latest version of VMware vSphere? Now, the latest version ChatGPT knew about was vSphere 7.0, but that wasn't helpful to VMware, because the latest version they were working on, vSphere 8 Update 2, hadn't been released yet, so it wasn't public knowledge. And they wanted information like this, internal information not yet released to the public, to be available to their internal team, so they could ask something like ChatGPT, "Hey, what's the latest version of vSphere?" and it would answer correctly. So to do what VMware is trying to do,
to fine-tune a model, or train it on new data, it does require a lot. First of all, you'd need some hardware: servers with GPUs. Then you'd also need a bunch of tools, libraries, and SDKs, like PyTorch, TensorFlow, pandas, NumPy, scikit-learn, Transformers, and fastai. The list goes on. You need lots of tools and resources
in order to fine-tune an LLM. That's why I'm a massive fan of
what VMware is doing right here. They have something called VMware Private AI with NVIDIA. The gajillion things I just listed off? They include them in one package, one combo meal, a recipe of AI fine-tuning goodness. So as a company, it becomes a bit easier
to do this stuff yourself locally. For the systems engineer you have on staff who knows and loves VMware: they could implement this. And for the data scientists on staff who will actually do the fine-tuning: all the tools are right there. So here's what it looks like to fine-tune,
and we're going to kind of peek behind the curtain at what a data
scientist actually does. So first we have the infrastructure
and we start here in vSphere, VMware. Now, if you don't know what vSphere or VMware is, think virtual machines. You've got one big physical server, the hardware, the stuff you can feel, touch, and smell. If you haven't smelled a server, I don't know what you're doing. And instead of installing one operating system on it, like Windows or Linux, you install VMware's ESXi, which then allows you to virtualize, or create, a bunch of additional virtual computers. So instead of one computer, you've got a bunch of computers all using the same hardware resources. And that's what we have right here.
One of those virtual computers, a virtual machine. This by the way is one of their special
deep learning VMs that has all the tools I mentioned and many, many more
pre-installed, ready to go, everything a data scientist could love. It's kind of like a surgeon walking in to do surgery, and their assistants have prepared all their tools. It's all in the tray, laid out nice and neat for the surgeon. All he has to do is walk in and just go, "scalpel." That's what we're doing
here for the data scientist. Now, talking more about hardware: this guy has a couple of Nvidia GPUs assigned to it, or passed through to it, via a technology called PCIe passthrough. These are some beefy GPUs. Notice they are vGPUs, virtual GPUs: similar to what you do with a CPU, you cut up the GPU and assign some of it to the virtual machine. So here we are in data-scientist
world. This is a Jupyter notebook, a common tool used by data scientists, and what you're going to see here is a lot of code they're using to prepare the data, specifically the data they're going to train, or fine-tune, the existing model on. Now, we're not going to dive deep on that, but I do want you to see this. Check this out: a lot of this code is all about getting the data ready. So in VMware's case, it might be a bunch of knowledge base and product documentation, and they're getting it ready to be fed to the LLM.
And here's what I wanted you to see. Here's the dataset that we're fine-tuning this model on. We only have 9,800 examples that we're giving it, 9,800 new prompts or pieces of data. And that data might look like this: a simple question or prompt, and then we feed it the correct answer. That's how we essentially train AI. But again, we're only giving it 9,800 examples, which is not a lot at all, and is extremely small compared to how the model was originally trained.
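To make that concrete, here's a toy sketch of what a couple of those training examples might look like; these two rows are made up for illustration, and real datasets vary in their exact field names:

```bash
# Hypothetical instruction-tuning examples: a prompt plus the answer we want
# the model to learn. A real dataset is just thousands of rows like this.
cat > fine-tune-samples.jsonl <<'EOF'
{"prompt": "What is the latest version of vSphere?", "answer": "vSphere 8 Update 2."}
{"prompt": "Who is Network Chuck?", "answer": "Chuck Keith, host of the NetworkChuck YouTube channel."}
EOF
```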
And I point that out to say that we're not going to need a ton of hardware or resources to fine-tune this model. We won't need the 6,000 GPUs Meta needed to originally create it. We're just adding to it, changing some things, fine-tuning it to our use case. And looking at what will actually be changed when we run this training: we're only changing 65 million parameters, which sounds like a lot, right? But not in the grand scheme of a 7-billion-parameter model: 65 million out of 7 billion is only about 0.93% of the model.
And then we can actually run our fine-tuning, which here uses a specific fine-tuning technique called prompt tuning, where we simply feed it additional prompts with answers to change how it responds when people ask it questions. This process will take three to four minutes, because again, we're not changing a lot. And that is just so super powerful, and I think VMware is leading the charge with private AI. VMware and Nvidia take all the guesswork
out of getting things set up to fine-tune an LLM. They've got deep learning VMs, which are insane VMs that come pre-installed with everything you could want, everything a data scientist would need to fine-tune an LLM. Then Nvidia has an entire suite of tools centered around their GPUs, taking advantage of some really exciting things to help you fine-tune your LLMs. Now, there's one thing I didn't talk about,
because I wanted to save it for last. It's this right here: this vector database, this PostgreSQL box. This is something called RAG, and it's what we're about to do with our own personal GPT here in a bit: Retrieval-Augmented Generation. So, scenario: let's say you have a database of
product information, internal docs, whatever it is, and you haven't fine-tuned your LLM on it just yet, so it doesn't know about it. With RAG, you don't have to. You can connect your LLM to this database of information, this knowledge base, and give it these instructions: whenever I ask you a question about any of the things in this database, before you answer, consult the database, go look at it, and make sure what you're saying is accurate. We're not retraining the LLM; we're just saying, hey, before you answer, go check real quick in this database to make sure you've got your stuff right.
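If you want to see the idea in miniature, here's a toy sketch against a local Ollama install. It fakes the "retrieval" step with grep over a notes folder (a real setup would query a vector database like the pgvector one above), stuffs whatever it finds into the prompt, and asks the model. It assumes Ollama is running on its default port 11434 and that jq is installed; the notes path is made up.

```bash
#!/usr/bin/env bash
QUESTION="What is the latest version of vSphere?"

# 1. Retrieve: grab a few relevant lines from our local knowledge base.
#    (A stand-in for a real vector-database similarity search.)
CONTEXT=$(grep -ri "vsphere" ~/notes/ | head -n 5)

# 2. Augment: put the retrieved text in front of the question.
PROMPT="Using only this context:
${CONTEXT}

Answer this question: ${QUESTION}"

# 3. Generate: send it to the local model through Ollama's REST API.
curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --arg m llama2 --arg p "$PROMPT" '{model:$m, prompt:$p, stream:false}')" \
  | jq -r .response
```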
Isn't that cool? So yes, fine-tuning is cool, and training an LLM on your own data is awesome, but in between those moments of fine-tuning, you can have RAG set up, where it can consult your database, your internal documentation, and give correct answers based on what's in there. That is so stinking cool. And with VMware Private AI Foundation with NVIDIA, they have those tools baked right in, to where it just kind of works, for what would otherwise be a very complex setup.
And by the way, this whole RAG thing, like I said earlier, we're about to do it. I actually connected a lot of my notes and journal entries to a private GPT using RAG, and I was able to talk with it about me, asking it about my journal entries and having it answer questions about my past. That's so powerful. Now, before we move on, I just want to highlight the fact that
NVIDIA, with their NVIDIA AI Enterprise, gives you some amazing tools to pull the LLM of your choice and then fine-tune, customize, and deploy it. It's all built in right here. So VMware Cloud Foundation provides the robust infrastructure, and NVIDIA provides all the amazing AI tools you need to develop and deploy these custom LLMs. Now, it's not just Nvidia; they're
partnering with Intel as well. So VMware is covering all the tools that admins care about, and then, data scientists, this is for you: Intel's got your back with data analytics, generative AI and deep learning tools, and some classic ML, or machine learning. And they're also working with IBM. All you IBM fans, you can do this too. Again, VMware has the admins' backs, and for the data scientists there's watsonx (Watson was one of the first AI things I ever heard about), plus Red Hat and OpenShift. And I love this, because what VMware is doing is all about choice. If you want to run your own local, private AI, you can. You're not just stuck with one of the big guys out there, and you can choose to run it with Nvidia and VMware, Intel and VMware, IBM and VMware. You've got options. So there's
nothing stopping you. Now it's time for the bonus section of this video, and that's how to run your own private GPT with your own knowledge base. Now, fair warning, it is a bit more advanced, but if you stick with me, you should be able to get this up and running. So take one more sip of coffee; let's get this going. Now, first of
all, this will not be using Ollama. This will be a separate project called PrivateGPT. Now, disclaimer: this is kind of hard to do. Unlike VMware Private AI, where they do it all for you, a complete solution for companies to run their own private, local AI, what I'm about to show you is not that at all, and it has no affiliation with VMware. It's a free side project you can try, just to get a little taste of what running your own private GPT with RAG tastes like. Did I do that right? I don't know. Now, Iván Martínez has a great doc on
how to install this. It's a lot, but you can do it. And if
you just want a quick start, he does have a few lines of code for
Linux and Mac users. Fair warning: that's CPU-only, and you can't really take advantage of RAG without a GPU, which is what I wanted to do. So here's my very specific scenario: I've got a Windows PC with an NVIDIA 4090. How do I run this Linux-based project? WSL. And I'm so thankful to this guy, Emilien Lancelot; he put an entire guide together on how to set this up. I'm not going to walk you through every step, because he already did that, link below, but I seriously need to buy this guy a coffee. How do I do that? I don't know. Emilien, if you're watching this, reach out to me; I'll send you some coffee. So anyways, I went through every step, from installing
all the prereqs to installing NVIDIA drivers and using poetry to handle
dependencies, and Poetry is pretty cool. I landed here: I've got a working, private, local GPT that I can access through my web browser, and it's using my GPU, which is pretty cool. Now, first I try a simple document upload. I've got this VMware article that details
a lot of what we talked about in this video. I upload it and start asking it questions about the article. I tried something specific, like: show me something about VMware AI market growth. Bam, it figured it out; it told me. Then I'm like: what's the coolest thing about VMware Private AI? It told me. I'm sitting here chatting with a document. But then I'm like, let's try something bigger: I want to chat with my journals. I've got a ton of journals in Markdown format, and I want to ask it questions about me. Now, this specific step
is not covered in the article, so here's how you do it. First, you'll want to grab your folder of whatever documents you want to ask questions about and get it onto your machine. So I copied mine over to my WSL machine, then ingested it with this command. Once that completed, I ran PrivateGPT again. Here's all my documents, and I'm ready to ask it questions.
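For reference, my ingest step boiled down to something like this; the paths here are made up for illustration, and the script and make target come from the PrivateGPT repo, so check its docs if the commands have moved since recording:

```bash
# Windows drives are mounted under /mnt/c inside WSL.
cp -r /mnt/c/Users/chuck/journals ~/private-gpt/my-docs

cd ~/private-gpt
poetry run python scripts/ingest_folder.py my-docs   # embed the docs into the vector store
PGPT_PROFILES=local make run                         # relaunch PrivateGPT (default: localhost:8001)
```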
So let's test this out. I'm going to ask it: what did I do in Takayama? I went to Japan in November of 2023; let's see if it can search my notes, figure out when that was, and tell me what I did. That's awesome. Oh my goodness. Let's see: what did I eat in Tokyo? How cool is that? Oh my gosh, that's so fun. Now, it's not perfect, but I can see the potential here.
That's insane. I love this so much. Private AI is the future, and that's why we're seeing VMware bring products like this to companies, letting them run their own private, local AI and making it pretty easy. If you actually did that PrivateGPT thing, that little side project, there's a lot to it: lots of tools you have to install, and it's kind of a pain. But VMware kind of covers everything like that. The deep learning VM they offer as part of their solution has all the tools ready to go, pre-baked. Again, you're like a surgeon just walking in and saying "scalpel." You've got all this stuff right there. So if you want to bring AI to your company, check out VMware Private AI, link below, and thank you to VMware by Broadcom for sponsoring this video. You made it to
the end of the video, and it's time for a quiz. This quiz will test the knowledge you've gained in this video, and the first five people to get a hundred percent on this quiz will get free coffee from Network Chuck Coffee. So here's how
you take the quiz right now. Check the description of this video and click on this link. If you're not currently signed in to the Academy, go ahead and get signed in. If you're not a member, go ahead and click on sign up. It's free.
you all the stuff you have access to with your free academy account.
But to get right back to that quiz, go back to the YouTube video, click on that link once more and
it should take you right to it. Go ahead and click on start now and
start your quiz. Here's a little preview. That's it: the first five to get a hundred percent get free coffee. If you're one of the five, you'll know, because you'll
receive an email with free coffee. You got to be quick, you got to be smart.
I'll see you guys in the next video.