Understanding How AI Works is Critical to Our Privacy Defense

Video Statistics and Information

Captions
I will state this clearly: AI is incredibly useful for personal use. It's like adding more brains to your head, but without any chip, without Neuralink, or, in our case, without any connection to an external party. That's the only way we will use it with a privacy focus: a local AI. I guarantee this way will be safe and free. If you avoid AI out of privacy fear, you will end up falling behind in knowledge compared to others, and we don't want that. As I discussed in prior videos, currently the way to do this is to go to ollama.com, download Ollama, and pull your desired open-source models to your computer.

But in order to maximize the benefits of AI, you need to understand how it works. When you do, you will realize what kinds of questions are suited to the AI and which are not, and you will also understand why it's safe when used the way I recommend. The local AIs we're going to be using are large language models, or LLMs. Skynet-type questions? Nope. Is it spying on you? Nope, a local AI will not. The best approach to understanding an LLM is to see how it works; then you can control it and manage its limitations. To learn how to do this, stay right there.

Sometime in 2017, a group of AI researchers tried a new approach to AI, described in a paper called "Attention Is All You Need." That new approach was called the transformer architecture, and it is what powers the current LLM AI revolution. The new direction allowed for faster training and scalability because of a mechanism called attention. This allowed the model to learn how to view the input data simultaneously, instead of sequentially looking at each word as was done before. In simple words, instead of just sequentially feeding input (a prompt) to the model, the model learned how to analyze the input ahead of time, in totality, and by itself. Roughly three years after that paper, the first prominent example of the new transformer-based AI, GPT-3, showed up. It appeared to simulate the intelligence of an elementary school kid. Today we have the promise of a coming GPT-5 by next year that may have PhD-level skills. I'm giving you technical details of the transformer architecture so you can appreciate its strengths and limits; this will allow us to control it.

Let me set you up with something you can visualize. Imagine that the model is a supreme being, at least it thinks it is. It can make universes galore, as many as it wants, and it can make all the elements in a universe: galaxies, clusters of stars, individual stars, and planets. We humans communicate with a model using words, but someone figured out that a way to represent words is to give them a location in this theoretical universe, and that's how it starts. First, the AI developers put words and word fragments into this universe. In order to find these words, they are initially recorded with a location and direction, which in math is called a vector. I describe this universe using three dimensions for simplicity; however, a computer can actually store each word in a location with many more dimensions, so consider my 3D description of this universe an oversimplification. In general, vectors point to some spot in the imaginary universe of the model. Here's the important thing to visualize: the end goal of the model is to represent every possible occurrence of language in this imaginary universe. Words are not stored randomly in a fixed location; instead, during training, the model moves similar concepts together. This movement of data enables the model to represent complex linguistic phenomena, such as semantic meaning, syntactic structure, and pragmatic context, with a correlation to their location in the universe. Pretty heavy words. The point is that concepts close to each other in this universe are related, and all we have to do is find these concepts inside this universe to be able to simulate intelligence.
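To make the "words as locations in a universe" picture concrete, here is a minimal Python sketch. The words and the three-dimensional vectors are made up purely for illustration (real embeddings use hundreds or thousands of learned dimensions); the only point is that related concepts end up pointing in similar directions, which can be measured with cosine similarity.

```python
import numpy as np

# Toy "universe": each word gets a location (a vector). The values below are
# invented for illustration only; real models learn them during training.
embeddings = {
    "cat":    np.array([0.90, 0.10, 0.00]),
    "kitten": np.array([0.85, 0.15, 0.05]),
    "car":    np.array([0.10, 0.90, 0.20]),
}

def cosine_similarity(a, b):
    """How closely two word-vectors point in the same direction (1.0 = identical)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # high: nearby in the "universe"
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # much lower: farther apart
```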
Using this analogy, we will travel through this model universe in a spaceship, guided by maps, to search for a context, and we should find related contexts nearby. Maybe this is the way human memory is organized, we don't know, but this is how a model represents it. As you will find out shortly, there are actually many universes in the model, a metaverse, with each universe focusing on a particular nuance of context from its training.

The first universe we encounter in a transformer model is called the embedding layer. To be specific, in the case of an open-source model like Llama 3, note that roughly 50,000 words or word fragments are stored in this embedding, and initially each word is assigned a pre-established location in this universe. This is part of the initial input data to the model and is not learned. An embedding looks like a grid where every word is a column and the rows of values represent that word's vector. Each column is a word that appears only once; all words are represented either as a complete word or as word fragments, and there are roughly 50,000 of them, as I said. The rows underneath each column make up its individual vector, and each vector starts out unique and random. During training, the model begins to group words into actual meanings as it learns patterns from the data it trains on.

The same words found in the embedding layer can be featured multiple times in another universe, and again they are put together in groupings tied to proximity, based on detected patterns of contextual connections between words. This next universe is called the encoder layer, except there are multiple encoder layers. During the learning process, each encoder layer focuses on some particular characteristic of a contextual relationship. The idea is that each additional encoder layer refines the results of the prior layer by adding more and more nuance. Depending on the model, there could be roughly a dozen encoder layers in a smaller model, a hundred in a larger model, or maybe more. The reason the transformer-based model splits the neural network into multiple encoders is that this makes managing large inputs easier: the model can analyze the input in more manageable chunks and use fewer resources. By the way, the actual data in each encoder layer is built from multiple weights and biases. These are layers of numeric matrices and are fixed in a pre-trained model, like a pre-drawn map. This map guides our spaceship in each model universe, depending on what we're looking for, which is triggered by the input we give it.

This is the big picture of how a user interacts with the transformer architecture in an LLM. When we are actually querying the model, we are doing inference, so let me describe the flow of the model during this inference stage. First there is an input token layer, then the embedding layer, then multiple repeats of an encoder structure, then a decoder structure, and then the output. Finally, there is an arrow showing the loop back to the encoder structure, which I will explain later.
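Here is a toy sketch of that inference flow, using nothing beyond numpy. None of this is the real Llama 3 architecture: the vocabulary size, dimensions, and weights are random stand-ins, and the attention step is replaced by a crude average just to keep it short. It only illustrates the shape of the loop described above: embed the tokens, pass them through a fixed stack of layers, pick one next token, and feed everything back in.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16            # toy sizes; a real model has tens of thousands of tokens and huge dimensions

# The fixed, pre-trained "maps": an embedding table plus a small stack of layers.
embedding = rng.normal(size=(VOCAB, DIM))
layers = [rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(4)]
output_proj = rng.normal(size=(DIM, VOCAB)) / np.sqrt(DIM)

def next_token(tokens, temperature=0.8):
    """One pass through the (toy) encoder stack and decoder to pick a single next token."""
    x = embedding[tokens].mean(axis=0)     # crude stand-in for attention over the whole input layer
    for w in layers:                       # each layer refines the representation a bit more
        x = np.tanh(x @ w)
    logits = x @ output_proj               # decoder: score every word in the vocabulary
    probs = np.exp(logits / temperature)   # temperature reshapes this probability distribution
    probs /= probs.sum()
    return int(rng.choice(VOCAB, p=probs))

def generate(prompt_tokens, max_new=10):
    tokens = list(prompt_tokens)           # the input/context layer: the only changing data
    for _ in range(max_new):               # the loop-back arrow: one word at a time
        tokens.append(next_token(tokens))
    return tokens

print(generate([3, 7, 11]))
```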
Now let me get into the details. First, you provide an input to the model: your prompt. This is stored in a token layer. From this token layer, your input triggers the equivalent words in the embedding layer. It's actually more powerful than this, because the input can also activate words the model knows about from a prior conversation, which I will call context memory, and this includes all prior responses in its memory cache. Thus a lot more data can make up the input. Think of the input layer as really having three parts: one, the current input prompt; two, prior context from the memory cache; three, the words generated so far, which we will get into next.

This is the part you need to understand for the safety of a local AI: the only portion of the AI with changing data is this input layer, and it always starts out empty when you begin a local open-source AI session. The input data is not persistent, nor is it learned.

Now, you may be surprised by this if you didn't know it, but the encoder and decoder layers are just there to predict what the next word in the response should be, based on the combined collection of tendrils from the total collection of inputs. The model passes the input to a mapping function, which mathematically is made up of weights and biases that are activated based on your input. This guides the model's spaceship to the most relevant area in the universe of words, grouped by related meanings, and there the spaceship discovers the best related words that fit the current input sequence; those words are selected for output. After the last encoder layer, the model goes to the decoder and chooses the most appropriate word based on a probability function, which the user can manipulate with a temperature parameter to add creativity. Now, to make it all clear: at the end of all this, we only get one word. The model then loops again through the encoder and decoder layers to find the next word, until it determines that it has completed its response and stops the loop.

Now let's talk about an important sub-element, which explains what the encoder actually does. In the encoder layer, the model makes contextual sense of the next word to generate based on the accumulated information acquired from the entire input layer. But don't imagine the input tokens and subsequently generated words as individual words; instead, think of the input layer as being constantly examined in word groups, with parallel processes, to find new words holding a similar contextual relationship. By the way, how it examines the input is also learned, so the model will have optimized how to do this by itself. Think of a spacecraft searching for the next word to respond with, but the spacecraft has special features: it can connect active tendrils to the entire input layer, whether original input words, context tokens, or newly generated words, and by doing so it has real-time guidance, so its map is more accurate, sort of like a car navigation system with knowledge of current traffic conditions on your route.

In practice there are multiple attention mechanisms, and they can differ at each encoder or at the decoder, because the attention computations themselves are learned as part of the encoder's machine learning. Just a few examples: the attention mechanism could examine the sentence in parts, like subject, verb, object; the positions of words can be determined; the input tokens can be summarized; entities can be identified; the sentiment of the input can be analyzed. But the model learned how to do this by itself; it is not set up by pre-established rules. So, in our illustration of the transformer architecture, the attention blocks are part of the encoder but are connected to the input layer.
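The attention mechanism itself is just more matrix math. Below is a minimal numpy sketch of scaled dot-product attention, the computation that lets every position in the input connect "tendrils" to every other position at once. The matrices here are random placeholders; in a real model they are learned weights, and each layer has many such attention heads running in parallel.

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Scaled dot-product attention: every position looks at every other
    position in the input simultaneously, instead of reading word by word."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                      # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # relevance of each word to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the whole input
    return weights @ V                                     # blend of values, guided by relevance

rng = np.random.default_rng(1)
seq_len, dim = 6, 8                                        # six input tokens, tiny dimension
X = rng.normal(size=(seq_len, dim))                        # token vectors from the embedding layer
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)                      # (6, 8): one refined vector per input token
```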
Now, given this understanding of general AI design with the transformer architecture, what are the limitations when you use a model from the Ollama list, say Llama 3? Understand that this is called a pre-trained model. Based on what I just described, it means the organization and locations of word concepts in the model universe will no longer change or receive additional data. Basically, it was trained with information from some past period in time, let's say one to four years ago. This means a model is never up to date with current events, so don't waste time blindly asking current-event questions.

Understand the concept of knowledge versus intelligence. For example, you could meet Sam today, an MIT-trained scientist who has been sailing around the world for the last three years and is not up to date on science news. Sam is very intelligent, but Sam does not have current knowledge. If you ask him a current-event science question, he won't know the answer; but if you give him a quick summary of that knowledge before you ask, I'm sure he'll have a good answer. Trying to trick the AI into responding about something it doesn't know about will result in something called hallucinations, meaning it will make things up. Asking a model questions about Skynet plans to kill humans and world domination will reveal nothing in a pre-trained model if that data was not part of the training; likely it will hallucinate, and the answers will come from novels it has read rather than any actual reality, so be careful in interpreting this.

An example that will solidify your understanding of the pre-training cutoff is to ask the AI about stock market prices. If I ask the AI for the history of Apple's stock price over the last 10 years, it will end the history where it last got data and assume nothing has changed since. In my testing of Llama 3, the Apple price was from a year ago.

Because of the size of the data used in learning (basically digitized books, academic papers, the entire internet, and all other interactions that can be digitally recorded), there is no vetting process that can be performed in advance. This means it is possible for the model to receive inaccurate data, biased data, or incomplete data. Private data, for example the internal documents of corporations, will not be known if it is not published. Your personal profile and data will not be known if they are not fed into the AI training, for example your Facebook interactions or your Google searches. Now, you can overcome this limitation by supplementing the model's knowledge base later on via context, which I will focus on later, and this could be the part that can make the AI really good or really evil pretty much instantly. But you need to understand the limits of its learning.

Larger models often give better results. In the context of my analogy of the AI encoder universe, the more contextual subdivisions of ideas with deeper nuances, the richer the answers from the model. This explains the failure of the transformer model at the GPT-3 stage: it didn't have enough knowledge. The size of current models is constantly growing and often changes within a couple of months, so this is a short-term issue. In today's lingo, the size of the model's addressable space is correlated to the parameter count. Smaller open-source models for local use are typically 7 billion parameters, larger open-source models are 70 billion parameters, and the large cloud models have trillions of parameters. Later, I will explain how you can overcome the limitations of smaller models using the current context.
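You can see the training cutoff for yourself by asking a current-events question through Ollama's local HTTP API. This sketch assumes Ollama is already running on its default port 11434 (the same local port mentioned later in this video) and that the llama3 model has been pulled; the question is just an example of mine.

```python
import json
import urllib.request

# Ask the local model a current-events question to see the training cutoff in action.
# Assumes `ollama pull llama3` has been done and Ollama is listening on localhost:11434.
payload = {
    "model": "llama3",
    "prompt": "What is Apple's stock price today?",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
# Expect a stale figure or a disclaimer: the pre-trained model has no data past its cutoff.
```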
Models just exist in a theoretical universe. They are frozen, with fixed data; remember that, fixed data. That is the current state of AI. They don't even know what model they are. They certainly don't know they are running on Ollama, or that they're open source; the model doesn't know Ollama because Ollama was created after the model did its learning. They are not connected to the internet. By themselves, they cannot read your computer clock. They do not even see what their own parameters are when you run them. So if you ask questions that expect some awareness, the model will hallucinate; it will invent answers. However, you can control this. For example, in my chatbot I always include the current local date and time in the context, so the AI can respond with comments like "two years ago," or even tell you what time it is now, which will be based on what I told it last. So if you're going to ask a question about something that requires the AI to have situational awareness, provide that data to the AI so it can refer to it in its answers. For example, some cloud-based models will acquire context from you automatically by asking you to take screenshots.

A very important element in AI use right now is the token limit. Actually, there are two limits to consider. One is the input token limit: how many words you can include in your prompt. The other is the context token limit, which includes prior conversations and is the more important value. This context limit is extremely important because even a smaller 7-billion-parameter model can be made very powerful if you give it additional context as part of your prompt. This is like giving Sam the scientist the latest news to bring him up to date; it allows the model to add current data to its knowledge base and respond more intelligently. Just as an example, let's say you are reading a brand-new academic paper on AI. You can pre-ingest this paper into your local AI and then start asking questions about it. There are technical terms for this, including retrieval-augmented generation (RAG) and fine-tuning, depending on how it is done, but the end explanation is very simple: it's a way to pass knowledge to the AI without having to go through expensive machine learning. The idea, by the way, is to pass only what's relevant to your question; otherwise the context limit may apply. If the AI has enough space, it can pre-process data ahead of your prompt, and then you can use that temporary learning in your future prompts within the current session. In essence, it's like learning on the go. The pre-trained model may be out of date, but this is solved by adding the information as context during a session.

There is a way to do limited fine-tuning that does not require retraining, and it is built into the Ollama project. It is called a Modelfile, and it allows you to copy an existing model and make minor additions, like changing roles. But complex fine-tuning, like teaching a model a lot of detailed private company data, is not something you can do without using some cloud tools, so I will not recommend it.
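As a small example of that Modelfile approach, the snippet below copies an existing model and changes only its role via a system prompt; the model name and wording are my own placeholders, not from the video. Save it as `Modelfile`, then run `ollama create privacy-helper -f Modelfile` followed by `ollama run privacy-helper`.

```
# Modelfile: copy an existing model and change its role, no retraining involved
FROM llama3
PARAMETER temperature 0.7
SYSTEM """You are a privacy-focused local assistant. Keep answers concise and say when you are unsure."""
```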
By the way, I believe the stated context limit of the Llama 3 model is around 4,096, or 4K, tokens, which is around the number of words in this entire video; in testing, I'm going to guess the practical limit is smaller, like half that. The Qwen2 model may have a larger context limit, but I haven't tested that yet. When the context limit is reached, the model will become forgetful and may not understand the complete past context, so passing long documents will not work. The newer cloud models solved this by having very large context limits, like 128K tokens, so this is a temporary problem. The higher the context limit, the more usable the model will be, because of the ability to augment its knowledge via context.

It could really be as simple as saying to the model, "I will pass you this document as context only; you do not need to respond," and then following that up with a prompt. Or, as I do in the chatbot I wrote, I just give the input context prompt some categorization and the model figures it out. This is technically called RAG, or retrieval-augmented generation, but think of it as just passing context. Examples of this are passing source code as context, or passing an image, and then performing the prompt. I showed several examples of this in actual use in my last video.
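Here is a minimal sketch of that "pass the document as context, then ask" pattern against a local Ollama instance. The file name, instruction wording, and question are placeholders of mine; it assumes llama3 is pulled and the document is short enough to fit within the context limit discussed above.

```python
import json
import urllib.request

def ask_ollama(prompt, model="llama3"):
    """Send one prompt to the local Ollama API and return the text response."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# "I will pass you this document as context only" -- then ask about it.
document = open("new_ai_paper.txt").read()   # placeholder file: any text short enough for the context limit
prompt = (
    "Use the following document as context only.\n\n"
    f"--- DOCUMENT ---\n{document}\n--- END DOCUMENT ---\n\n"
    "Question: summarize the key claim of this paper in two sentences."
)
print(ask_ollama(prompt))
```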
Now, something I will discuss in a future video is how a personal profile or personal data can be passed as context, resulting in the AI having deep knowledge of you. This does not have to be part of the training to be a danger. It is partly one of the approaches considered by Windows Copilot and by Apple Intelligence, and it is one of the reasons I have complete distrust of externally controlled AI: someone can manipulate the context and thus possibly manipulate the user. For example, say you are anti-vax and you query the AI about that topic. An external party can alter the returned information based on who you are and reword things to fit your thought process by adding supplementary context or fine-tuning. This can be applied to politics or any current event. This is why I only want to use a local AI; it won't happen in the way we use the AI models here.

A model is typically censored by doing fine-tuning, and this is more involved than what we can do with a local model using Ollama; you actually have to do some machine learning. So we typically receive a model with censorship built in. What this actually is, is something like another encoder layer added to the inference sequence, which can alter the weights and biases of the original fixed model. While the model developer may try to include everything in the censorship, you will find that you can reason with the AI to bypass it. In fact, the censorship rules are just instructions not to allow certain questions, so if you have the wherewithal to manipulate the AI, you can bypass the censorship. For example, one approach is to imply that you are role-playing. This of course has to be allowed, because if you're writing a story, for example, it is not realistic for the story not to have some evil protagonist doing evil things. In conclusion, whether the model developers like it or not, censorship will never be perfect.

Now, the problem with this topic is that it is so huge that it is impossible for me to fit everything into one video, so after you've absorbed this we will get into some of the mechanics in later videos. If you can't wait, use the ollama.com site to download the local AI in easy steps. I'll also give you the link to the current iteration of my chatbot, built using Python; it will be on my GitHub page, which will be in the description. It's very simple to run since it's a single file, my-ai.py, and I run it using VS Code, which really makes the UI more workable. I'll have to go deeper into how to use the chatbot more effectively in later videos, as I will be changing it over and over in the near future. But I want to be clear that my chatbot is local only: no internet connectivity is required outside of accessing the local port, localhost:11434, and then Ollama itself accesses another random port made by the actual transformer module, called llama.cpp. I've just explained to you how the transformer works, and that's what llama.cpp does. Both these ports are accessible only inside your home network, so there's no fear of someone connecting to your AI from the internet; don't worry about that. I've checked the source code of Ollama and also the transformer, llama.cpp; there's no phoning home to some HQ anywhere, and the model does not retain your context outside of the tools I built for you to maintain your own context with the chatbot. I hope this gives you a conceptual starting point. Again, leave comments if you want me to dig deeper into some of these elements.

Folks, as we switch to an AI-driven world, what I've been teaching about privacy seems to have even more importance. We need to stop the AI from knowing you personally, because if it does, you can be manipulated by it. Fortunately, we can stop that with products I've already made, products that support this channel. We have the de-Googled phones running AOSP that do not pass information to Big Tech and are not directly connected to Big Tech. We have the Brax Virtual Phone product, which allows you to have inexpensive phone numbers that you can use to keep your identity away from Big Tech and future AI intelligence. We have BraxMail, which keeps your identity private so it cannot be harvested for AI data later on by Big Tech. We have BytzVPN and the BraxRouter, which hide your IP address, a major identifier that can be harvested to identify our past actions. All these products can be found on my social media platform, Brax.me. This platform is a place for people to discuss privacy, and over 100,000 people are there talking about privacy issues. There's a store on there with all these products. Come visit us, chat with the community, and support what we do. Thank you for watching, and I'll see you next time.
Info
Channel: Rob Braxman Tech
Views: 11,446
Keywords: internet privacy, tech privacy, privacy, de-googled phones, brax2 phone
Id: tEwYpbj-td4
Length: 29min 14sec (1754 seconds)
Published: Wed Jul 10 2024