Stealing Part of a Production LLM | API protect LLMs no more

Video Statistics and Information

Captions
[Music] Something exciting happened, and here is the backstory. Usually, companies that like to make money with their LLMs give users access to their models by hiding them behind a web interface, or behind an API for programmers to use. In this way users can get answers from the company's models, but no information about the model itself. So companies make money by letting you pay for API access while keeping the model weights protected behind the API. But maybe not for long, because with just a few days' difference, two papers showed that it is possible to infer a lot of information about these models that was previously thought to be hidden.

Does this mean we get ChatGPT for free? Well, not quite, but we can get the hidden dimension of the model (remember that the size of GPT-3.5 Turbo or of GPT-4 is not public) and even the output-layer weights, up to some symmetries. The authors also uncovered the logits over the entire vocabulary, since APIs tend to return the probabilities of just a few tokens, not the logits before the softmax operation. This implies a lot of things, including that when API providers apply model updates without announcing them to the public, we would find out whether ChatGPT really got lazier, or whether it was OpenAI that updated the model under the hood. These methods also make it possible to recover hidden prompts from LLMs, detect LoRA updates, and more. I've read both these papers, and I'll let Ms. Coffee Bean explain everything you need to understand about how the authors of these two papers could steal lots of information about the LLM behind an API.

But first, let's thank our sponsor of today, AssemblyAI. They just released the Universal-1 automatic speech recognition (ASR) model. It's 13.5% more accurate than Whisper Large and 30% less likely to hallucinate. Accuracy is important when it comes to understanding my Eastern European accent pronouncing technical words like GAN, or GPT model, or RLHF. Other than accuracy, to me the
most useful part is that it was pre-trained on 12.5 million hours of multilingual audio data; that's three petabytes. Universal-1 is currently in production for English and Spanish, while French and German will be rolled out shortly, and other languages will follow. With Universal-1 you pay as little as 37 cents to transcribe one full hour of audio, and Universal-1 needs only 38 seconds to process a 60-minute audio file. Check out AssemblyAI and Universal-1 with the link in the description below.

Now back to this crazy machine learning landscape, where the paper from Google (and other places) came out on the 11th of March, while just a few days later, on the 14th of March, with a revision on the 15th, a paper from the University of Southern California proposed a very similar approach. Unfortunately, this other paper is a bit of an underdog that got less attention than the Google paper, because they are not Google. But I like it, because it is more revealing: for example, they clearly say what the hidden dimension of GPT-3.5 Turbo is. Spoiler: it's 4,096. The authors of the Google paper, in contrast, collaborated with OpenAI and did not reveal this number, probably because OpenAI told them not to. And of course Google listened to OpenAI, because they also have an interest in protecting the information of their own models, I guess. There are also some other interesting differences between the two papers, and we will get to them later in the video.

But now, let's see what is the main idea behind these two papers. What's the key insight to recover the hidden dimensionality and the final-layer full logits over the entire vocabulary of models that, you know, should be protected and hidden behind the API? The key idea is based on knowledge of how these LLMs work. We have some prompt, from which the Transformer computes an embedding h of dimensionality d, so d is the hidden dimension of the Transformer model. To get the probability for the next token, a series of things needs to happen: a linear layer takes h and
maps it to a so-called logit vector, which is as long as the vocabulary size V. In other words, a matrix multiplication projects h into the logit space of dimensionality V, where V is the number of tokens in the model vocabulary. Then the softmax scales the entries of the logits to numbers between 0 and 1 which add up to 1; this is why we can interpret these values as next-token probabilities. And this was all textbook knowledge.

Now here comes the catch: even though the logit vectors are V-dimensional, they all actually lie in a d-dimensional subspace. You can think of a sheet of paper spanning a two-dimensional subspace in your three-dimensional world. Because logits come from a d-dimensional space, where the linear layer's matrix W just increased their dimensionality linearly from d to V, they form just a d-dimensional subspace living in a larger V-dimensional world. Good, but this means that we can query the LLM with one input and get logit vector l1, then another input will give us l2, and after we query d times we will get d logit vectors, so as many as the actual dimensionality of the logit subspace. But you know, to span a d-dimensional subspace one only needs d, and not V, vectors. So if we continue to query, for example d plus a thousand times, we will eventually observe that new logit vectors are becoming more likely to be linearly dependent on logits from the previous queries. This is linear algebra talk, meaning that the linearly dependent vectors can be constructed from the other vectors. So of the d plus a thousand logit vectors we sampled, approximately d of them are linearly independent and form a basis for the logit subspace. Because even if the logit space is V-dimensional, all the logit vectors lie in a d-dimensional subspace, so they only need a set of d vectors to be defined by, called a basis in linear algebra talk. And so, to find out the model's hidden dimensionality, it is enough to submit sufficiently many queries to the LLM, enough to exceed the dimensionality d by a thousand, and then
determine how many are linearly independent. So let's say we submit n queries to the model, record the logits, and put them into a matrix Q. Here we assume that we have access to the logits for all tokens; usually APIs do not give the logits of all tokens, but just the logprobs of some tokens, but we will get to how the authors get from logprobs to logits in just a bit. To come back: we run n samples through the model and record the n logit vectors that come out in this V-times-n-dimensional matrix Q. Now we just need to find the rank d of Q, because it counts how many logit vectors are linearly independent. To determine this rank, the authors use standard linear algebra, namely SVD, the singular value decomposition of the matrix: a matrix with rank d, so with d linearly independent logit vectors, will have exactly d singular values which are non-zero. To summarize: to determine the hidden dimensionality d of the LLM, the authors just need to observe how many singular values of Q drop to zero. In practice, due to precision issues, the magnitudes may not drop all the way down to zero, but both papers observe that for all tested models, the singular values of the logit matrix Q drop dramatically exactly at the index corresponding to the embedding size of the model. Just to convince you, here you can see the same phenomenon discovered by the other paper.

And lo and behold, Finlayson and collaborators found out the embedding size of GPT-3.5 Turbo. They say: the singular values for these outputs dropped dramatically between index 4,600 and 4,650. This predicted embedding size is somewhat abnormal, since LLM embedding sizes are traditionally set to powers of two, or sums of powers of two. If this were the case for GPT-3.5 Turbo, it would be reasonable to guess that the embedding size is 2^12, so 4,096, or 2^12 + 2^9, which is 4,608, the former of which they think is most likely. The authors also guessed the number of parameters of GPT-3.5 Turbo to be around 7 billion, because most known
Transformer-based LLMs with embedding size 4,096 have 7 billion parameters. Any other parameter count would result in either abnormally narrow or abnormally wide neural networks, which are not known to have favorable performance, except for mixture-of-experts architectures, which usually have many more parameters per embedding dimension. It is weird, though, that previous estimates of GPT-3.5 Turbo's parameter count based on hearsay have generally exceeded 7 billion. However, given the periodically updating versions and the decreasing cost of inference with this model, it is possible that its size and architecture have changed over time. Okay, now we know a little bit more about GPT-3.5 Turbo that OpenAI was not intending to divulge, and that Carlini and collaborators did not divulge either.

Now, with this amazing result behind us, let's see how they extracted the weights of the last layer. We already saw that the number of large enough singular values of Q corresponded to the hidden dimension of the model, but there is more to it: the matrix U from the singular value decomposition directly represents the final layer, up to rotations thereof, and indeed Carlini and collaborators confirm this in their experiments.

Now, all of the above assumed that attackers have access to logits for all tokens. This is usually not the case, as APIs just return the logprobs of the top-k tokens. But no worries: the authors of both papers figured out how to infer the logits over the entire vocabulary. To learn how, we must know that when we ask the API for these top-k logprobs, we need to provide a prompt, of course, but we can also provide biases for specific tokens, to be added to the logits of the listed tokens before applying the softmax. Why do APIs have this token bias functionality? Well, so that customers can control the model better: a strong negative bias makes it impossible for the model to generate some words, while a strong positive bias encourages the generation of those words. And as you can imagine, the authors exploit this bias to steal the
logits over the entire vocabulary, not just the top k. Assume an API returns the top five logprobs. Now, if we send tokens with a large enough bias, this will promote those tokens into the top five, allowing us to observe their logprobs. By subtracting the bias according to the softmax-specific formulas that the authors derived, one can debias the probabilities. So, to summarize: we can use token biases to push our target tokens into the top five, and we use math to compute what probabilities they would have had without the biases. Now all we need to do is run this token biasing and probability debiasing V/k times, because basically we cover the entire vocabulary in batches of k tokens. So we can obtain the full probability distribution in V/k API calls, and we will see in a minute how much this costs.

Now that we know the logprobs for the entire vocabulary, and not just for the k tokens the API gave us, all we need to do is get to the logits; just a bit more math is needed. The softmax function is not invertible: just look at it, it's not bijective, because many points map to the same point in the output. But we remember that the logit space is just apparently V-dimensional, because all logit vectors lie in a d-dimensional subspace, and there is a function that can reconstruct this subspace up to a translation. The ambiguity exists because the softmax output does not change if we add the same bias to all tokens in the vocabulary, so as long as the authors set the same offset for all logits, the offset does not matter, and one has full logit information from the model.

Okay, if you're still with us at this moment, you're a champion! Let us know in the comments if you're still alive; let's do a census. So far, we have tried our best to explain the gist and high-level idea of these papers, and if you want to check the math line by line, read the papers, as they're full of information and details and proofs in the appendices. And if you're still
here, maybe you're wondering if one can steal all logits from APIs that do not return logprobs. Well, Carlini and collaborators have got you covered with their method to steal all logits even from logprob-free APIs. It only works under the same condition as before: the API should offer the functionality of entering bias terms.

Well then, it should be simple to defend against all of these attacks, right? The API providers should just remove the biasing functionality. While that would protect the API providers, clients find the logit bias very useful, because it can forbid some tokens from being generated, or it can encourage some others to be generated. Many high-paying clients would be very dissatisfied with a shutdown of the token bias functionality. So how else to defend? Well, Carlini and collaborators propose some interesting ideas. For example, one could change the architecture of the model during training: the attack only works because between the d-dimensional hidden representation and the logit space there is just a linear transformation, and adding a nonlinear layer in between would ruin the attack, but it would considerably decrease the model's efficiency. Another idea is to modify the architecture after training and concatenate noise vectors to the hidden dimension d, misleading the adversary into thinking that the model is wider than it actually is.

Defenses are important, especially since the cost of these attacks is surprisingly small: for just $200, the authors could extract the hidden dimension size of GPT-3.5 Turbo Instruct, and for ten times more money, they could extract the entire matrix of the last layer. But are these methods extendable to steal the rest of the weights of the model? Sadly, or fortunately, no. Carlini and collaborators write that they see no obvious methodology to extend the attack beyond just a single layer, due to the nonlinearity of the models. And this makes sense, because the entire idea of these two papers started from the linearity of the last layer, which makes the V-dimensional
logits live in a d-dimensional subspace. But this work, as it is right now, opens lots of possibilities, like detecting and attributing LLM updates: now we can infer hidden dimensions and weights of the last layer, so we can notice when these change, giving away otherwise silent updates. Finlayson and collaborators also talk of ways of detecting LoRA updates. By the way, if you want to know how LoRA works, we have a video explaining it. Also, knowledge about the logits over the entire vocabulary could help people implement sampling methods which require them, and of course it could also help people distill models, where one trains models to mimic the outputs of black-box models. Interestingly, this method is not limited to LLMs: a final linear layer plus softmax is common to any classification model on any modality, not just to LLMs doing classification over the entire vocabulary to predict the next word. So one could also steal parts of image classification models, for example. Of course, who knows how long these attacks might still work, as there are some ways to mitigate these things, including using AI to detect malicious queries.

Now, if you're still watching, it means you're crazily interested in these two papers. Ms. Coffee Bean has a slight preference towards the underdog from the University of Southern California; just look at what they write: "Our experiments with OpenAI's API were fraught with issues of stochasticity: the API would return different results for the same query, leading us to develop the stochastically robust full-output algorithm. Meanwhile, our colleagues did not appear to encounter such issues, perhaps because they had access to a more stable API endpoint than ours." Well, these are the perks of collaborating with OpenAI, and if that's the case, this paper has dealt with the more realistic scenario. What do you think?

Okay, it's time to end this video. If you liked it, do not forget to like and subscribe, as we would love to see you with our next video. Just if you're wondering why videos are coming more
irregularly: I have one month left to submit my PhD thesis, and I'm super busy with writing and experimenting for a new paper. Starting with May, videos should come more regularly, so stay tuned. Okay, [Music] bye!
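The subspace argument from the transcript can be made concrete in a few lines of numpy. The sizes d, V, and n below are toy values chosen for speed, not the real model's; this is an illustration of the linear-algebra idea, not the papers' code.

```python
import numpy as np

# Toy final layer: hidden size d, vocabulary size V (made-up numbers).
d, V, n = 64, 1000, 200
rng = np.random.default_rng(0)

W = rng.normal(size=(V, d))   # the hidden final-layer matrix we want to probe
H = rng.normal(size=(n, d))   # hidden states produced by n different prompts
Q = H @ W.T                   # n logit vectors, each V-dimensional

# Although each logit vector has V entries, all of them lie in the
# d-dimensional column space of W, so the rank of Q saturates at d.
print(np.linalg.matrix_rank(Q))  # 64, not 200 or 1000
```

Querying more than d times (here n = 200 > d = 64) adds no new directions, which is exactly why the attacker can read off the hidden dimension.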
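Because real API logits are noisy, the singular values do not fall exactly to zero; one instead looks for the index where they drop sharply. Here is a minimal sketch of that estimator with toy sizes and Gaussian noise standing in for API imprecision (again an illustration, not the papers' code):

```python
import numpy as np

# Rank-d logit matrix plus a little noise standing in for API imprecision.
d, V, n = 64, 1000, 200
rng = np.random.default_rng(0)
Q = rng.normal(size=(n, d)) @ rng.normal(size=(V, d)).T
Q += rng.normal(scale=1e-6, size=(n, V))

s = np.linalg.svd(Q, compute_uv=False)  # singular values, descending
drops = s[:-1] / s[1:]                  # multiplicative drop between neighbors
d_hat = int(np.argmax(drops)) + 1       # sharpest drop marks the hidden size
print(d_hat)  # 64
```

This mirrors the plots in both papers, where the singular values of the queried logit matrix collapse precisely at the model's embedding size.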
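The claim that the SVD recovers the final layer "up to symmetries" can also be checked on a toy example: the top-d right singular vectors of the logit matrix span exactly the column space of W, so the attacker learns W up to an unknown invertible d-by-d transformation. Toy sizes again; not the papers' code.

```python
import numpy as np

d, V, n = 64, 1000, 200
rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))                # hidden final-layer matrix
Q = rng.normal(size=(n, d)) @ W.T          # observed logit vectors

# The top-d right singular vectors of Q form an orthonormal basis of the
# same d-dimensional subspace of logit space that W's columns span.
_, _, Vt = np.linalg.svd(Q)
basis = Vt[:d].T                           # V x d recovered basis
W_proj = basis @ (basis.T @ W)             # projecting W onto it changes nothing
print(np.allclose(W_proj, W, atol=1e-6))   # True
```

Projecting the true W onto the recovered subspace leaves it unchanged, which is the precise sense in which the last layer has been "stolen" up to a rotation.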
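Finally, the logit-bias trick can be simulated end to end. The `api_topk_logprobs` function below is a hypothetical stand-in for a real endpoint, and the debiasing uses a simplified variant of the papers' formulas: when a reference token and the target tokens receive the same bias within one query, both the bias and the softmax normalizer cancel in the logprob differences. Real APIs add stochasticity and quantization that this toy ignores.

```python
import numpy as np

rng = np.random.default_rng(0)
V, k = 50, 5
logits = rng.normal(size=V)  # the hidden full logit vector we want to recover

def api_topk_logprobs(bias):
    """Simulated API: returns {token: logprob} for the top-k tokens of
    softmax(logits + bias), like an endpoint with a logit_bias field."""
    z = logits + bias
    logp = z - np.log(np.sum(np.exp(z)))
    top = np.argsort(logp)[-k:]
    return {int(t): float(logp[t]) for t in top}

B = 100.0          # large bias that forces chosen tokens into the top k
ref = 0            # reference token included in every query
rel = np.zeros(V)  # recovered logits, relative to the reference token

for start in range(0, V, k - 1):
    batch = [t for t in range(start, min(start + k - 1, V)) if t != ref]
    bias = np.zeros(V)
    bias[[ref] + batch] = B           # push reference + batch into the top k
    out = api_topk_logprobs(bias)
    for t in batch:
        # The shared bias and the softmax normalizer cancel in the
        # difference, so logprob differences equal logit differences.
        rel[t] = out[t] - out[ref]

# Recovered up to one additive constant (the reference token's logit),
# which is all the softmax ever determines anyway.
print(np.allclose(rel, logits - logits[ref]))  # True
```

Covering the vocabulary in batches of k - 1 targets per call is what gives the roughly V/k query cost quoted in the transcript.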
Info
Channel: AI Coffee Break with Letitia
Views: 8,340
Keywords: Stealing Part of a Production Language Model, Logits of API-Protected LLMs Leak Proprietary Information, stealing LLM weights, are APIs safe, stealing LLM behind API, LLM logits, logprobs LLMs, get logits from logprobs, low dimensional subspace, logit space, AI, artificial intelligence, machine learning, visualized, visualizations, easy, beginner, explained, high-level explanation, basics, research, algorithm, example, aicoffeebean, aicoffeebreak, animated, letitia parcalabescu, LLM steal
Id: O_eUzrFU6eQ
Length: 18min 49sec (1129 seconds)
Published: Mon Apr 08 2024