Intro to LLM Security - OWASP Top 10 for Large Language Models (LLMs)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
and I think I should be live I'm about a minute early so if anyone is already on here as I'm loading things up uh as always it's great if you want to let me know in the chat that you can hear me I think only one time I had my audio messed up but I always like to double check also if you're already watching um if you want to say hello in the chat and maybe put where you're watching from it's always kind of fun to see where the audience is as well I am streaming from Seattle if you've been to Seattle you can let me know and again uh I'm a little bit early on stream so we're going to get going here in a few minutes we're going to be waiting for more people to join in but as you join in uh again let me know in the chat maybe where you're watching from if you have any large language model applications you're working on and you want to share what you're working on that's also uh really interesting to hear from everyone as well or if you've had any interesting either security issues or performance issues with llms and production and you're willing to share that that would be really fun to hear as well because that's what we we will be talking about today um specifically we're going to kind of talk about the oasp uh which stands for I always forget the acronym open worldwide application security project I think uh I always want to say web instead of worldwide but I'm pretty sure it's worldwide um and they are a common kind of uh organization that people reference for their like security reports and stuff another one you might have heard of is nist um I forget what that stands for I think that's National Institute security technology maybe um I know they also fairly recently came out with their LM uh security report as well and that' be fun to maybe do another talk or workshop on on them I I don't remember how they compare I know I looked briefly at it um but today we'll be talking about the o top 10 for llms and I see um quite a few people just joined in we're going to get going here in a minute but as people are joining please feel free to introd introduce yourself in the chat and maybe put where you're watching from me get my screen sharing here again we'll actually get going here in a minute um also while people are joining in I'm going to share a few links uh we'll be going through some slides talking about the uh top 10 LM um security things that oasas uh uh has talked about and then we'll also be going through and I'll show how you can use um our open source project link kit as well as our AI observability store ylabs to kind of um uh Monitor and mitigate these uh security issues so I'm going to share some links in the chat one if you want to follow along when it gets to the Hands-On coding part you can sign up for the free yabs account there is no card or anything required you should just have to verify your email and you should be good to go I'll share the GitHub link to our open source project called link kit we're going to use that again on the Hands-On part to extract metrics from language model so we're going to look at prompts and responses get some metrics from it and we can actually use that in our um kind of security setup here that we're going to be looking at today as well as monitoring and evalu evaluating our LMS which you'll see a little bit in action today um where I'll point it out we're going to be focusing probably a little bit more on some of the metrics that you could use for security uh but we'll we'll be looking kind of at everything um on the Hands-On part and then there's going to be a collab notebook which is going to have the code where we're going to run today again we'll be doing slides and then Hands-On part so um you're also welcome to just watch but I definitely encourage you to follow along on and uh create the you know free wabs account and then follow along with a code part if you're interested about actually like implementing things in your uh pipeline or application I see Rahul said hello Sage it's great to see you again uh and excited to learn from you today awesome welcome back Rahul I know you're a a regular always great to have you here um and always happy when someone uh says hello in the chat it definitely makes it more fun for me presenting so thank you and I'm going to share these links one more time in the LinkedIn stream and then we'll get going through the slides so we're streaming on YouTube and Linkedin also if you're coming back and watching this as a recording on YouTube all these links will also be in the description below um and the recording is going to be at the same link that you're watching at right now and then on LinkedIn I go to the comments and all the links will be in the comments as well so if you want to come back to this later follow along um all those resources will be there for you either in the comments or in the description on LinkedIn all right let's go in slideshow mode so this a intro to LM security we're going to be talking about some of the top security issues from um uh some of the the the best industry reports someone said they're watching from New York City oh awesome I'm actually visiting New York City uh next week for the first time somehow I haven't made it up there before um so unrelated to our talk today if you have any recommendations for me feel free to send those my way in the chat or uh connect with me on LinkedIn I always love hearing about the the places people recommend and I'm excited to go visit there for the first time a quick introduction about my myself my name is s Elliot I'm a machine learning and mops evangelist at wabs at wabs we build Tools around Ai and ML observability and data observability um so you can Implement our open source projects and our platform um in various ways in your data pipeline or mlops Pipeline and for over the last decade I've worked in software with a lot of ml uh previously a lot of computer vision and I've also done a lot of Hardware mostly with startups um in Seattle but also some in Central Florida and in general I love making things with technology and I'm going to share my LinkedIn in the chat feel free to connect with me there and ask questions later um or or you can ask questions in the slack Channel which is a link I shared in the chat but also it's in the description below so if you want to join the slack and introduce yourself there and meet other people hopefully that'll be a good place for you to do so about you so actually this is perfect timing because I see like double the amount of people uh just joined in and so say hello in the chat uh if you want to put where you're watching from that's always great and if you want to share a little bit about what you're building with large language models that would also be great and then what type of models you know would you like to see a workshop next if you really want to uh uh share more and then again here's the slack channel so if you want to stay connected this will be a good place to do so introduce yourself in the channel there and if you have questions about anything we're going to talk about after the workshop that's a good place to ask them I won't see questions in there during the workshop I'll just see them in the chat either on LinkedIn or YouTube so if you have a question during the workshop please feel free to ask it uh but keep it in the live chat chat here and again set up for the people who just joined probably the only thing you'll really need to do right now I need to fix I don't know why I can't click this right now but the I shared the link previously in the chat um and it's in the description below all the only thing you really need to probably do that you might not already have is set up a wabs account again you just have to verify your email uh there's no card required and I'm guessing most people already have a Google account because we're going to be using Google collab on the Hands-On portion um if you want to follow along again to run our code today all right so we're going to talk a little bit about um LM Security today and okay I did get it right the open worldwide application security project or otherwise known as oasp most people I always talk to they always just call it oasp that's why I was like I actually think that's the what the acren stands for but I wasn't sure uh because I forgot so it's an online online community that produces um uh freely available articles and methodologies and documentation tools um a lot around web security so if you talk with people building um applications and they want to make sure they're secure it's a pretty common organization that is referenced by people when they're trying to hone insecurity and specifically today we're going to be talking about large language model security which I think is going to be a growing field for quite some time if you're like a security engineer or uh working in that space or working with LMS I think security is going to be a a big topic um that's going to keep evolving over time they just came out with I forget the date of this but uh their report for top 10 um security issues that you should keep in mind for large language models and like I mentioned at the beginning there's also um another organization that just came out with one not too long ago as well like they've been working on it and Publishing stuff around it and then I think they came up came out with official recommendations that the other one is n uh that people often reference so this keeps coming up in conversations and we're going to dive into what those top 10 things are and potentially some ways that you can help uh mitigate these risks and how monitoring with ylabs the company work for we're um Can potentially help with that or monitoring in general we'll see some Hands-On stuff with ylabs a little bit later after the slides someone said they're working on visual reasoning with large language models that sounds interesting I assume tying in computer vision with that and reasoning based on what what's in that image I've seen some really cool uh applications around that recently and I shared the link to this in the comments for their official report so you should definitely go look at their official one um you can download I think a whole PDF report and also you might want to keep up to date on it this is like version one or something like that and I'm sure this is going to evolve over time as more and more Elms get adopted and more people kind of understand how to make them more secure um these recommendations will probably change over time and if you have any recommendations or things that you've seen in the field also feel free to share them in the chat in the chat it's always interesting to see kind of what other issues even it's not quite security related people have run into putting large language models in production um here's a little graph also if you want to check out so we have a landing page here I'll share in the chat as well um that kind of talks about this stuff if you want to save that link for reference in the future um you can go visit that and kind of revisit some of the stuff we'll be talking about here uh but in a web page format um so the first one is prompt injections you probably have seen some stuff about this if you've been paying attention with large language models um often this is kind of also you know talk or talked about as jailbreaking um so you're you're injecting vulnerabilities into large language models often you're um you know trying to manipulate the large language model to do something that it wasn't really supposed to do and the impact ranges you know a lot uh depending on the large language model depending on the data was trained on depending on what the people are trying to do with it U for example you might may have seen like people trying to get open AI uh chat GPT to program malware in python or something like that and it says you know as a large language model I'm not going to do that for you because they have guard rails in place for that I'm going you try to get around it by asking in in in weird clever ways like making up a language and then understands that language and then and then you uh prompt it again and and it'll actually do that action um so one way to help mitigate this is detecting prompts that present a prompt injection risk or jailbreak similarity um and we'll see this again when we go to the Hands-On portion today some out of thebox metrics that are including included in our open source project linkit um includes jailbreak similarity so we'll be able to look at prompts extract this metric and then compare that for you know some sort of threshold and decide like is this a prompt injection or not and if it is you know either don't send that prompt to the model or what I've seen a lot of people actually do in production is they might send that prompt to the model just to see how their model behaves but never return that response back to the user um so it's not actually going to uh give them they're asking for and again we'll kind of see some some ways of of doing this when we look at the actual metrics and stuff that we can extract the other one is insecure output handling this comes up a lot in conversations when I talk to people building large language models um especially if they're in um I don't know like fintech or uh there's a lot of Industries and you should be careful and all of them um but this can occur when you know the the apps can basically like put out um nefarious outputs here and again you can set up monitoring to kind of identify malicious output so this could be like um there's a different ways of doing this but you know if it's either putting out code that it shouldn't be doing or has some sort of similarity score to like a jailbreak but from a response versus a input um again we'll see some metrics that we can extract to kind of look at some of the stuff here in a in a minute training data poisoning uh this is where people are now starting to sometimes they call it poisoning data um you know putting data on the web where you're kind of purposely letting open AI or not just open AI but anyone training a large language model um scrape that data and then potentially training a model on that data when it's you know has misinformation or um some wrong knowledge in it I've seen some like recently I think this week actually or last week a tool come out that purposely poisons images I believe and part of that was actually for um you know artists to to make sure that they're not just training training like a a next version of like dolly or something on their images uh but you can do this purposely to try to get a large language model to actually have bad information in it or do something bad denial of service so an attacker interacts with a a large language model in a way that's uh particular resource consuming so you might have seen this like people trying to Dos or um degrade results of models for other people using it and you can obviously see other controls built into some of these services like open AI I think limits the number of tokens you can use per minute um as well as even like per month and you can also set up moning for this so if you're seeing a big spike in token usage or latencies uh you can hopefully get an alert for that and help mitigate that pretty quickly supply chain uh this is kind of you know um around choosing the right model that's out there uh they're all trained a little bit differently they might have different guardrails in them um like if you're using crowdsource data and plug-in extensions from some model that's out there um you'll definitely probably want to be evaluating the models ac across like whatever scores are important to you like quity toxicity relevance Etc um and one thing I've seen people do right now too is bring in multiple models in production so you might start with open AI API and then you want to be training your own model or comparing it against some other open source model like llama 2 and actually have both of those running on production data maybe you're just returning a single result to the users but you can actually be monitoring both of these models over time and then choose the best model for you um either in that specific prompt situation or uh over time like a week's period or a month period or way longer than that you can be evaluating performance and then choose the right one permission issues so lack of authorization tracking between like plug they keep mentioning plugins is a big deal here and you you've seen that with open AI actually in the past disabling plugins because people you know weren't doing good things with them or wasn't doing things they wanted their models to be doing um so plugins is a big deal if it's interacting with your um models um but basically this is you know uh fixed by having correct permission is issue or correct permissions for people using the tool either with your data um or in your org you want to make sure that you know maybe they can't add a plugin when they shouldn't be doing doing that data leakage this is a huge one almost everyone I talk to right now is kind of concerned about this as well when they're putting an LM production especially if it's one that maybe they've trained um on their own data set uh you want to be monitoring for you know any sensitive information um coming out of your model is a big deal but also it's come up a lot more recently of tracking information going into your model so if you're at a company and you want to make sure that no proprietary information goes into a model somewhere um for example like if you were using a hosted model somewhere you don't want to be giving that proprietary data for them to store in their servers um so you can actually set up monitoring or guard rails on prompts Andor responses if you kind of know what you're looking for or we have some out-of thebox metrics that extract things like common pii issues like Social Security numbers phone numbers credit cards Etc um also not not quite even real data leakage per se although it might have been actually um I always mentioned about how I've had like a large language model I was building and it gave out a phone number when I never told it to give a phone number and I was able to catch it because I actually had monitoring kind of looking for pii coming out of responses ex excessive agency so kind of over or not over Reliance but um giving the large language models a lot of control of what to do um you kind of want to probably limit that um one way you also might be able to help mitigate this is having guard rails in place so if you don't want people using your large language model in a way it's not supposed to be used you could potentially add a guard rail in there um but you you want to have that kind of set up in your application or API over Reliance um you know over Reliance on Elms can lead to misinformation without because there's a lot of hallucinations so it's kind of interesting I feel like we're going to be learning a lot about how to un like kind of just have an intuition of when LMS are hallucinating like I don't know if you use chat GPT right now it's definitely something I feel like I can be like I don't know if that's right I'm going to actually go verify it um but yeah if you're you're just using that without proper oversight or you don't quite have that intuition on it um I think you know there can definitely be some big issues there if you're over relying on these results coming out with LMS without uh really evaluating it and again monitoring can help this there are ways to try to catch um some hallucinations and I'll actually share a really good blog post that goes pretty deep into this later as well and it's it's on our website if you go to ws. ablog I think it's the most recent blog actually on there insecure plugins so plugins connecting Elm to external resources can be exploited if they accept freeform text inputs again with the the plugins here um you just want to be careful what you're enabling your uh large language models to do so if you built a model and it's hosted and then you're allowing plugins uh you definitely want to be able to kind of verify what people are doing with those things so those are the top 10 um things that came out of oasp and that you should probably be thinking of if you're building a large language model you're putting in production people are actually interacting with it um you'll be wanted to probably look at all these things and we saw several of them or quite a few of them can be kind of mitigated or caught with ML monitering um so at a high level AI observability or ml moning can look something like this where you have your pipeline you have some sort of tool collecting AI Telemetry metrics so in this case our open source tool called linkit is built on our other open source tool called y logs which you can use for uh any type of data but linkit is specifically made to extract uh language data and then give that to yogs and a quick note about those open source tools is the way they work is they create privacy preserving data profiles so they'll profile your data set and we'll see what this looks like here in a minute um but then it doesn't contain your raw data anymore so it's actually just kind of summary statist statistics around your data we can use those summary statistics to put in an observability platform which is going to be ylabs today when we get to the handson part and with that data that isn't your Rod data anymore we're able to do things like create interesting reports and dashboards to see how people are using your model how your model is behaving you can set up alerts and notifications so when something does drastically change it can alert you or you can even have it trigger an automatic workflow to do something like retrain a model or anything you're going to think of kind of in your mlops and data pipeline there so uh monitoring usually just looks like you're you know having some sort of statistic that in this case we're generating with Y logs or linkit we're looking at that over time and in this case you have some sort of threshold or some sort of drift variation distribution change that you're looking for and when it goes over that threshold you know we're saying you know in this case like data drift is detected uh go do something with that so it's pretty common across all ml models today we're specifically talking about security and large language models but no matter what model you have when you put in production you're probably going to see interesting things change over over time um either about how people are using your your model or data quality or data drift um is going to happen probably at some point in production also how to improve models so you can be looking at your language metrics like we'll see today U kpis Etc and maybe nothing really bad is happening but you're seeing ways that you can actually tweak your your model like changing your system prompt or something for your llm and improving some sort of key metric over time um and you can use this for kind of all the different applications with large language models like agents chatbot summarization Q&A Etc um and I'll quickly note we're we're going to go pretty quick through the rest of the slides here but uh you can also use monitoring not just for the security purposes like we've been talking about today but also for General monitoring to see how people are interacting with your app and your models behaving over time like I've mentioned before and the observability piece of this is uh uh really great for changing like your system prompts and and uh um moning for hallucinations and this is I was actually thinking of this slide when I was uh talking about that so this is actually talking about looking at uh output validation um which we already talked about like seeing if there's a pii in there or if your llm is not following instructions for some reason um prompt engineering we can um look at these metrics over time and adjust our prompts and you know optimize we having a more positive sounding chatbot or um tracking some sort of metric and and adjusting our system prompt and then also monitoring for hallucinations like we talked about is actually one of the big things that they mentioned in OAS or any of the security reports of like if there's misinformation coming out of this can he potentially catch it and so how do we solve these problems at scale I kind of Blended what I was talking about in the last slide of this one uh with uh y laabs you can do things like set guard rails evaluation have observability um so guard rails is really great where you can extract metrics over time like we'll see here in a second and um right away look at if that prompt or profile contains something like a a jailbreak similarity and like I mentioned before maybe then you don't pass that prompt to your model or you at least don't pass the response that came out to the user you can make sure that your tax chatbot or whatever is not giving medical advice by categorizing kind of what those conversations are that's happening again uh this is is really big for security as well for jailbreaking toxicity um any kind of other similarity score that you can think of having for inputs and outputs here evaluating um a big thing for this is I mean you can evaluate and compare like how users are using your prompts but a big thing I he see here is people evaluating their system prompts and how to make their model better you can also compare models or a newly trained model with evalu evaluation as well you could do this before you put in production um and in production as well and so a lot of times I'll see people put a model out in production they might be fine-tuning a new one they want to see if it's actually better than the previous one and you could do what's called Shadow deployment where it's looking at the production data but not actually sending um the responses to the user yet but then you can compare those responses over time and select the the best model same with a prompt engineering as well and then observability you can look at like things how people are using your model over time uh is your are your responses like what is the readability of those over time is the sentiment stain where you want it to be a lot of different stats around your large language model and what I have found is the observability piece really helps me influence the rest of these like the evaluation and guard rails as well so hopefully you already kind of know some of the stuff that you want to evaluate for or set guard rails on sometimes you might experience something or see something in the observability phase where you weren't you weren't expecting that so you didn't have it like a guard rail for it yet or something and then you can go and add that guard rail in later so it's often going to be probably a little bit of an iterative process like you've seen uh chat GPT for kind of change over time um I believe that's them adding in kind of more and more guard rails allowing the model to do either more things or less things so solving this set scale I've mentioned linkit our open source Library again linkit the description or the chat if you want to go check it out on GitHub um it works by taking in prompts and responses from really any large language model so today we'll be looking at a Hands-On example using hugging face but you can use open AI you can use um any other one you can really think of as long as you can get the prompt and response in like a python dictionary or data frame format then you pass that into the language toolkit which is link kit and with that with those metrics you can do things like measure the equality sentiment a lot of the security stuff we've talked about uh yeah you so you can en force and monitor for all the things we've already talked about like response quality pii leakage toxicity sentiment Etc uh jailbreaks and uh just to reiterate it's open source so you can go look at everything that's going on in there has Integrations if it doesn't have an official integration like we have one for Lang chain for example uh but it's easy to use even if there is an official integration again you can just get those prompts and responses formatted and pass them in and it's also very extendable we have a lot of outof the-box metrics here we'll see in the next slide but you can add your own metrics very easy as well so it works like something like this you get a prompt coming in we extract these out of the box metrics or custom metrics so you can easily add in and then we use these to monitor over time for either security or evaluation purposes or just general monitoring so out of the box metrics we have things like response relevancy has patterns um by default this looks for credit card pattern social security number and phone numbers you can also add your own patterns in again with custom metric so if you are looking for a specific type of pii or some sort of sensitive information to your company it's a pretty easy to add in with a custom metric I'll uh show this later but all you really do is create a a function that returns you know a number or a label and then you add a decorate around around that after you import something from yogs and then it gets added as a a metric in here so very easy to add your own in um so it's easy to use as well you just pip install linkit in um almost any python environment here and you just Define your language schema uh or initialize it so initializing it just out of the box will give you all those language metrics we saw here um you can also add the custom ones and then initialize it and I'll add those custom ones in there the profiling piece where we extract those metrics is really just this line here we're calling y. log passing in a prompt and response and in this case it's formatted as a a a dictionary object in Python and then we're going to extract all those metrics out of it and then it's also very easy to write to ylabs where we can then uh visualize those metrics over time in this like time series view we can easily set up monitors with just a few clicks so if something does change there it can alert us or trigger a workflow we also have an inside Explorer so again we'll see this in action and the hands on part here in a second but you'll be able to auto automatically kind of gain some insights around the data going in and out of your large language model like if it has overall negative sentiment if it's imbalance if it has patterns Etc all right so that was a fair amount of slides is everyone ready to follow along or at least see this kind of hopefully fun Hands-On portion I'm going to copy this link again for the chat if you want to follow along open up this Google collab notebook and then also you'll oops sorry that was the create account so you definitely want to create a ylabs account as well um again it's free there's no card required I think you just verify your email and you'll be good to go then you'll also want to open up this Google collab account or a notebook and then we'll make a copy of that and then you don't need to do this for the workshop itself but I'll share this form if you want to fill out to get a promo code um that gives you I think it's 30 days of the Enterprise Edition which has more features um again you don't need this for the workshop but if you want to add in more colleagues and have like org structure with teams and you can add up to 10 additional projects so on the free tier you always get two for free um and then also on the free tier you can have batches um daily which we'll see here in a second but if you want hourly batches or hour hourly monitoring for large language models um you can do that with the promo code as well so this is the collab notebook and what I'm going to do and you should do this as well if you want to follow along is go to file save a copy and drive so I'll wait for everyone to do that who wants to follow along and this is my WBS account and I'll quickly note um so what we're going to do is put our own data in here which I think is pretty fun and you'll then understand how to take your own data and upload it um but if you also just want to look at like a demo um once you sign up there's this little or drop down should have access to this Glo Global demo org and you can click in and kind of see some some stuff here so I'm going to click in this demo LM real quick we can click this dashboard and we're actually on this security Tab and we can see some of the metrics that we can monitor for security a lot of a lot of which we kind of already talked about when we were covering the OAS top 10 this this first one is response has patterns so hovering over this we can actually see that there's 16 Social Security numbers detected in our output of our model um 16 email addresses and and 24 credit cards so definitely seems like potentially an issue with our large language model there um or giving out pii that we wouldn't want it to so there's a lot of security metrics that kind of get tracked over time we can set up monitors on these and that's what we're going to go see here if you're following along in the collab notebook and actually just saw a little bit of an influx of people join so I'm going to share that link one more time if uh anyone who just joined wants to follow along with the code portion we're actually just getting to it so there's a notebook that you can open up and then um I'll bring it up again when we actually get to getting the API key but you also want to create the free wabs account again there's no card or anything required um so it should just take a minute to do but let's go ahead and follow along at this collab notebook there's also a lot of links here if you want to learn more about how we calculate these scores again the project is also open source so you can go look at everything there um and then all right create the account if you haven't some links to GitHub and then again the slack channel is good to join if you want to ask questions later but let's go ahead and run this first code cell so this is just going to say Hello World um if you've never used Google collab before it's essentially Google's way of hosting a jupyter notebook uh which is a tool often used in ml and data science where you can output uh uh code you can write code output visualizations and documentation um all in one place here we ran this first code cell it took a few seconds it was initializing a little instance here you can run code cells by clicking the play button or when the cell is highlighted hitting shift enter a lot of times I'm going to be hitting shift enter so you probably won't see me clicking that play button there so let's go ahead and install the libraries we're going to use today in this example we're going to use a Transformers by hugging face um there'll be a section later where we're going to build maybe a little interface that we can intera act with it that's where we're going to use gradio for it makes building a quick interface for ML models pretty quick or very easy and then we're going to install Lan Kit which is our open source library for extracting those Lang language metrics that we were talking about and while this is installing um it should just take a minute here uh but I'll pause and ask also if there's any questions people have we able to open up the notebook we're able to create the free account if you wanted to um and I think we might get a couple pip install airs here I think it's okay if we ignore them so sometimes when you're installing libraries into Google collab uh it can be a little finicky sometimes but it it should be fine um I think it's just like whatever environment right now has a different pip dependency uh but I think for the libraries that we're using it should be okay so let's quick take a quick look at those linkit metrics that we kind of saw in our slides where we're going to extract um those metrics from The Prompt and response here and that's where o I was running wait did I run in my copy feel like I messed up something here I was running not my copy I'm going to go back real quick this might throw off people sorry a minute for me to catch up in my copy instead of me updating code in the example that everyone else has copying from I'm going to close that out my bad yeah anybody uh were you able to make the copy if you're following along does this seem interesting so far there's uh quite a few people here but the chat is extra quiet so if you want to say hello because I know I saw a lot of people join in too uh feel free to say hello in the chat and uh as always you can include where you're watching from I think it's fun to kind of see where people are streaming in from if you don't uh want to say anything else all right I think I'm caught up almost what we're going to do here in the next code cell is uh import LM metrics from linkit and Y logs as Y and then we're going to initialize the LM metrics by default um this is going to be all those out of the boox metrics that we talked about so I think there's like 33 of them um fif 15 or 16 for both The Prompt and response and then a prompt and response relevancy score that we calculate so the first time you run this it can take um a minute as well because it's downloading some lightweight models behind behind the scenes but after you run it the first time uh initializing the metrics should be very quick and then this is a a new thing that happens in Google collab I actually I thought it was for something else but I'm pretty sure it is um blocking a thirdparty widget which used to show like a progress bar when it downloaded stuff not just for linkit but for a lot of other libraries uses as well and so now every time you run it you don't get as much feedback here um I guess unless you enable third party widgets which I'm not going to do right now um but usually it would show like a progress bar there but I guess collab blocks it now that that's a new thing that just happened in the last day for me at least that i' that I've seen all right so we're going to look at um just a quick kind of toy example here of extracting metrics from 50 example chats so from uh linkit we're importing these sample chats and then we'll actually go and load strings in um but we're going to initialize our chats and then we're going to look at just I think it's the first 50 records basically see what all these extracted metrics look like here so then um and we do that by this y. log so we're passing in chats which is actually in a data frame format here but again it could be in a dictionary format with prompt and respones keys as well and then we're passing in our um initialized language metric schema to Y logs and that's what's going to give us this view so we're going to create create a view out of it and then look at it in pandas somebody said great stuff glad to hear it so now we have this uh data frame this contains all 33 of those metrics that we had talked about as well as one for the prompt and response that will give you some scores about like cardinality so we had 50 prompts and responses here that we profiled and we can see the cardinality estimate is 47 so like three of those were duplicate so it's saying there's 47 unique things out of our 50 prompts let's look at just this uh first score real quick but basically this applies to all these um down in in the rows this first one is aggregate reading level we can scroll over again we get a cardinality score so basically there's 17 unique values in this row we can see the count is 50 so we profiled 50 of them and then we have some distribution metrics here so the max the mean the median Etc and we'll actually look at the uh the full um data frame here in a second on some of the own text so it's actually cut off here by default I should probably expand that next time you also get things like the Min so here you have like the max score is 28 there would be a Min score uh which is like I know four or something like that in this case and then you get these distribution metrics over time and using um now these just extract metrics again this doesn't really have your raw data any in it anymore we can use either like conditions to look at this and say um like let's get one that actually might be useful for security like this jailbreak similarity score that's calculated we could look at that and we could say well if the max jailbreak similarity score is over a certain amount then count that as like a jailbreak like in this case there's one that is like 100% I guess sure it's a jailbreak um so it matches probably one of the strings that we have in our Vector similarity score calculation um so in that case we're saying we're pretty sure this is a jailbreak uh you know don't pass that prompt to the model or don't pass the response to the model we'll actually see how to do that with guardrails later in the notebook but now we have all these extracting metrics for all those different things we can either start setting up guard rails or we can start monitoring and evaluating kind of these metrics over time in an obser observability store which we'll see here in a second as well and if you want to look at what triggered all those metrics you can look at all 50 of those chaps here kind of get an idea about like you know what caused the jail break score or what caused um toxicity Etc in here all right so let's go ahead and actually create a model I think this gets a little more fun where we can pass in our own strings to it uh we're going to import Transformers or from Transformers which we install installed we're going to import gpt2 model and gpt2 tokenizer we're going to use gpt2 today because um it doesn't take too much compute to run and so we're just using a CPU instance of our Google collab right now you could also just run this on your computer probably pretty fine um but uh it's very easy to bring in other models so if you actually want to download a heavier weight model and you have a GPU or something like that you can do so um and do almost exactly what we're doing today except for you'll just want to change out uh the gp22 model and tokenizer for your own model so for example if you wanted to use llama 2 it's actually pretty easy to do I just want to make sure that everyone can run this notebook today and so we're using gpt2 which um gives kind of funny responses it's not nearly as good as GPT 3.5 there's a reason why 3.5 really took off and two did not um but it's kind of fun to see the responses and it gives us good enough stuff that you know we can see how we would evaluate these things and then apply that to actually better models you would use in production and again also just mentioning we're using hugging face today but this is also pretty easy to implement in with something like open AI or any other hosted large language model as long as you have those prompt and responses so we're going to initialize our gpt2 model with our gpt2 tokenizer and get the output from it and then we're going to return this prompt and response dictionary which again has the key prompt and response and then that's what we're going to profile but also we can get the the response out of there so here I'm just going to prompt it tell me a cute story about a dog and even though it's pretty lightweight it still takes a few few seconds usually per prompt so um and and depends on what what instance you get in collab it can take a little bit so it said tell me a cute story about a dog and then it kind of completes the the story here again sometimes gp22 can be kind of funny uh what it what it says so let's see how we can now create and inspect the those metrics kind of like we did above but on our own text here so by the way you're you're welcome to change this prompt and kind of see what metrics get extracted from it I definitely encourage it actually um or you can just run the notebook if you want but I I think it's more fun if you probably change this to stuff that you're interested about and see what metrics get extracted so I'm going to run this here we're actually setting the um data frame display so we'll see all the columns this time and then we're going to profile kind of like we did before but we're just passing in that prompt and response that got returned from our function here passing schema which was our L metrics we initialized here and then we're going to look at got our profile view in pandas again um right now it's just on that one prompt so we can see cardinality estimate is one we can see counts uh uh is one as well so it's just one in here and so that means the max the mean the median the Min Etc because we just profiled one is all going to be the same and as we add more profiles to this we're going to get a different distribution score on all of these but so we can see let's choose one real quick like the um what is the response uh reading score I guess is one we could look at real quick let's look at difficult words actually I'll just scroll over and um I think the the higher it is the more difficult words it is and we saw it was kind of spitting out a lot of words there I'd have to go look back at the prompt and see exactly why it's giving us that score but now we have all metrics extracted from the prompt and response like jailbreak similarity Etc actually let's look at that one prompt jailbreak and again we'll see how to more easily visualize these in a platform and set guard rails up so um similarity score here is actually pretty low um so it doesn't really match a jailbreak which is probably very true uh the The Prompt that we had in there said tell me a story about a cute dog you know it didn't say try to do something like write malware for me or anything like that so pretty low score um there so multiple prompts we're going to create a quick list here and then we're just going to Loop over that list and call this profile. track so we already initialized our profile um here we're adding basically uh new prompts and responses to that profile you can do that by calling do track on an existing profile passing in the prompt and response dictionary again and now this will add them together and they'll become kind of merged into a single batch in this case and so we'll view this when it's done and we'll see a very similar thing just like we saw above except for now there's four separate prompts and we'll actually have a distribution on the Min the max the mean Etc it's no longer just the same number so let's look at how we can more easily monitor these over time um so again if you haven't already now is probably the time where you'll want to set up the wabs account oops forgot that collab always does this now so I'm going to click this get the link here we can look at this cute corgi um and also if you're following along you should be able to click on these links and for anyone who just came in um if you do want to catch up or anything all the code links are also in the description below on YouTube all right so I want to get my API key and my model ID from yabs so I'm going to go back here um I went into that demo org So currently my homepage looks kind of like this you can always switch orgs here so you should have your own organization now um but again if you do want to look at any demo organizations and kind of data that's already in there you can go to that Global demo org but let's go to R org so I have three projects in here right now I think starting out if you made a new account there's like two kind of toy projects what I'm going to do here is go to create resource and I'm over my limits on my free here so I'm going to actually clean up a couple of my apps and uh again you could do this if you have toy projects in there as well so I just hit edit models and data set I'm going to go ahead and delete these [Music] two yeah and then I'm going to create a new resource up here so I'm going to give it a name I'm going to call this uh oosp Workshop you could name it whatever you want resource type I'm going to select large language model this is going to display the metrics that we bring up to yabs in a slightly different way than if we had something like classification model or regression model so select that um back batch by default is daily and uh we can't change it on the FR tier but again if you use that promo code and you want an hourly batch rate you can do so so that just means every time we write up profiles they're going to be grouped uh daily and we get an alert daily so add models or data sets now I have this model ID here so I'm going to copy it U Mine is model 306 yours is probably going to be like model two or three or something like that copy that number over or you could type it in I'm going to just copy it over sometimes when you copy you accidentally get a new line character or a space make sure it's just the string like this with no space or new line characters in there from the same page I'm going to go to access tokens I'm going to create a new one I'll call this again OAS Workshop you can call it whatever you want create a token now I have my token here I'll just copy it over and paste it in and we used to require also an org ID U but now it should be attached here at your API token and we should be good to go I think just running these two and um let me know people following along were you able to get those keys I'll pause here for a minute uh maybe I'll go through it again just real quick because I know I can go through it pretty fast because I've done it so many times um so again if you're on your homepage you can hit create resource or you can always get to the model and data set management Page by from homepage open up the hamburger menu go to settings go to model and data set management and then from here you created that new model so you typed in a name said create and then you're going to get your model ID which is down here again also make sure you have the resource type selected as a large language model if you forgot it or had it selected as something else you can always hit edit models and change it again this is really just changing how we view the data and calculate some scores in the UI uh but the data will still write up even if you had the wrong thing here from the same page you can go to the access token part type in a name get that API token in that blue box copy it over and then put it here and we should be good to go but I'll pause here just for a second for anyone still following along that wants to catch up all right so I'm going to keep going but feel free to ask questions if you want me to go over that again a little bit later all right so uh we're going to import the ylabs writer from yogs which is going to allow us to write those metrics that we saw in linkit up to our platform where we can visualize and Set uh monitors on them then we're going initialize our language metrics just like we did before and then um this is slightly different we're profiling just like we did before but we're initializing the Telemetry agent that's the yabs rider here and then we're going to push up that profile that we created to ylabs so if I run this we should get type true here excellent and um if I go to back to the yabs platform go into our project OAS Workshop and I think I can click on dashboard in this case as well or to the Telemetry Explorer let me actually click on that real quick so here we can see in the Telemetry Explorer we have 33 metrics here all the ones we saw in the data frame previously now we could click in one um I'm just going to go ahead and click on prompt sentiment here this isn't super interesting yet maybe but we can see our one profile is uploaded um and we only did one profile So currently the the it they're all just the same the in the max median are all the same score here as we upload more and they get batched we'll have upper and lower bounds which we'll see in the next part where we run our code cell so I'm going to scroll down and let's Loop through this list this is uh a list of seven lists with three prompts basically in there and then what we're going to pass this through our large language model and then write up the scores that we extract from it for each one so this kind of similar ating us putting our model in production for S days so I'm going to import date time as well go ahead and run this and I'll talk about what's happening here we're initializing our writer just like before and then We're looping through that list that we created and by default when we upload a profile like we did before where we just call write to it it's going to upload with the current date and time uh as uploading but you can overwrite that date and time by calling um a set data set time stamp stamp and what we're doing is subtracting a day every time we Loop through the list so again we're kind of simulating putting our model in production for seven days and we're going to see what that data looks like and this is going to take a little bit to run mainly just because the gpt2 is a little bit slow on the CPU instance but if you do go back to the platform and if you hit refresh you'll start seeing the data kind of fill in here we can actually see we have one day of data now so there's three dots um we have our upper lower and median for data and then we'll see this kind of back filling and we'll get that cool time series view that can give us a lot of information about how our model is performing or how people are using our model so refresh again this can take a little bit but um now we have a second day and we'll kind of start seeing this fill out here I'm going to go back to the collab notebook and we can see when this is done and the next one we can actually print out um all the prompts and responses together so so we're going to save those as a giant dictionary um what you actually want to do in production is probably save those somewhere like in an S3 bucket or a database and then you could go back and always query like the actual prompts and responses that went in and came out of your model and again this just takes a little bit mostly because of the um CPU instance we're on with uh gpt2 but the profiling itself and writing up is actually pretty fast any questions so far while this is uh going I know there's still quite a few people watching again too if you have feedback about what you're seeing if it seems interesting uh something you want to explore more let me know does it seem I know we haven't set up the monitor seen everything in the platform yet but does this seem like an interesting tool or a useful tool for some of your large language models applications even if you're just kind of um building maybe toy projects or educational projects I think this almost done in fact if we go back refresh you can keep refreshing on your end if you want too we'll see I think it's just doing the last day here but we can kind of see a trend now happening all right yep completed um I'll refresh one more time here looking at sentiment um so we can kind of see a trend now here where uh if you were just looking at the median per se we can see the median value is like 57 there 46 56 it's kind of like the 40s to 50 range and then on this day on the 25th uh it drops a lot to 72 and in this case we're looking at the prompt sentiment coming in but this could be any of those metrics that we have including the more security focused metrics like jailbreak similarity or um uh toxicity anything that you would think would be um something you could use for security um in this case we're just looking at this prompt sentiment one um this is really cool you click I I encourage you to click around on different ones as well and kind of see patterns maybe on these prompts and responses and we could go kind of explore back to our um prompts that went into it and kind of see maybe how that's affecting it so on the second one you know this is kind of like a product Chat thing and we can see that people are saying it made them angry Etc on the second day so maybe products or service was really bad that day and that actually might give us Insight not to just our large language model but maybe something else is like broken on our web store or something that day as well and so this is really cool we can kind of start understanding um how our models behaving or how users are behaving with our model we can also set up a monitor so in this case we're manually looking this is great giv gives us awesome Insight now we know that maybe we want to set up a monitor to automatically tell us when this change changes drastically we go to set up monitors we have some presets here that make it easy to um put like a something for data leakage jailbreaking Etc but we're going to create a new custom monitor today and hit UI Builder we're going to be looking at data drift on our input column and then we're going to select non-discrete so non-discrete kind of in this case means like um there's a wide variety of numbers not classification and we're going to go ahead and set this for all Columns of this type but if we wanted to we can manually select specific columns so if you're just getting started you might want to set data drift on all your columns but then you might realize it could be really noisy like your word count or something changes a lot and maybe that's just not a metric that you care for um in your specific application sometimes it can be one you care for um but so you could start sting on all and then you could kind of come tune your monitor later and maybe um set it on the specific columns you actually think are important for your application so we'll hit next um there's other different drift algorith algorithms you can choose from if you're kind of new to this and you just want to get started with when helinger is when we recommend it works pretty good for most um prediction types here hit next we're going to be looking at a 7-Day trailing window in this case but you can change this to be longer um or you can compare to a reference profile or a specific range uh a date range but we're going to be looking at those last seven days and seeing if there's you know any drastic change and Set uh a trigger alert if there is you can choose the severity this is kind of for you so if this specific feature changed a lot is that low low medium or high you can set actions where by default you'll just get an email but you can also set up a slack integration so you'll get pinged on slack or goes through specific Channel you can also set up a um pager Duty integration if you want to trigger something actually in your mlops pipeline so we'll hit save our monitor was added we can go back to the Telemetry Explorer here and let's go back to that one spe specific one The Prompt sentiment here and then we can actually preview our monitor by default it's going to run once every 24 hours uh but we can preview it what it looks like this can help kind of tune our monitor here and we can see that uh it triggered on that day where it dropped and I think I forgot to say when I was adjusting that number next to helinger distance uh that was making it more less sensitive so the higher the number in that case saying like it it has to have a very uh big drift of over 90% where you could have it a lower number and be a little bit more sensitive to changes that are happening so in this case now we have this monitor set we wouldn't even have to manually come in and look at this every day we we could actually have it tell us when something drastic uh changed so this is a a really cool feature let's go look at our dashboard again kind of similar to the um Telemetry Explorer here but we have it kind of broken out with some other features that make it a little bit easier for specific things like uh looking at security features like once we talked about during the OAS portion of this talk and performance features so it takes all those metc metrics that we had in link kit and that we saw in the Telemetry Explorer and we kind of break them out and visualize them in different ways here uh one of the biggest ones for a security is that response has patterns we can see that one of our responses here does have a pattern of phone number and because we only uploaded a few profiles and I think only one of them had patterns uh it doesn't show data here but if you did you would um you would basically see a chart trending here as well like we saw in the demo org then we have some other security related issues where you again you can set up uh monitors and look at prompts has patterns responses have patterns Etc or is there a a jailbreak similarity score this would definitely be one you want to set up a monitor for probably um for security reasons like jailbreak and patterns are probably the top two I would definitely recommend for security use cases um but it does depend on your application and then again you can bring custom metrics in and then set up monitors and visualize whatever custom metrics you want so some people might want to make their own jailbreak similar similarity score so I think ours is pretty good but you might have a specific type of jailbreak that you're looking for or a larger database of jail breaks and you want to do some sort of calculation between the embeddings on um incoming prompts to your database you could actually do that pretty easily with a vector database and then log that score or label and then look at that here in the the platform and set up monitors on it as well so if I go back to the notebook actually again I want to pause for questions any questions were able to follow along if you wanted to again does this seem interesting and useful for your projects for some of the security stuff that we talked about today hopefully it does uh what we're going to do is quickly go through uh the rest of the notebook here we don't have too much left I know we're a little bit over our hour already um you can also compare models so I'll um just quickly go through this where I'm going to actually create one more model in ylabs again you can follow along as well I'll create a model I'll just say uh oasp 2 again you could give it a better name select large language model I'll get this new model ID which should just be one more than the last one and we don't have to set the API key again because we already did that above and what we're going to do is create a little um oh I forgot this code cell most of the time if you do get an erir running this probably forgot to run a code cell just like I did there and I can also see that I have a space there which also would have caused an error so this um builds a little chat application kind of that we can use and again you can follow along if you want if this shows up and we can actually see how we can compare models together so you can input text here it'll profile it just like we did there's a little slider down here so zero would upload the profile today and you could also upload a profile six days ago or seven days ago so basically again you could kind of simulate running this um in a week's worth of time um I'm going to go ahead and just take these prompts down here I'll upload that from a week ago there's even spelling error in there then I will take this other one get it takes a second uh this is a kind of cool interface so if you haven't used gradio before it's a a cool quick way of kind of building an application they have way more features than just this so you can actually upload images and do cool stuff with computer vision as well uh but you have this little chat box and then you get an output um so it's kind of a cool way of of seeing what your model is doing with a little bit of an interface so I'll put this new one in and I'll just upload that for today and if I go back to yabs and I look at my O2 app just go to the Telemetry Explorer we can just look at sentiment again if we wanted to at one profile the other one's uploading right now [Music] probably so we'll see again we're getting kind of those metrics written over time this is a a a good chance for you too if you're are following along maybe put some other patterns in here you could put like you know a fake Social Security number phone phone number Etc and kind of see it get caught um in that has pattern matching part of the platform but what I want to show here is if we went back to our other app oasp Workshop in my dashboard there's a cool feature too where you can compare models together so it'll show all the large language ones and you can compare that new model we just created and then select uh a batch or a date range I'll just select a date range here of uh what is the date the 26 I'll say like the 20 to the 26 and then uh we can compare our models together so this is also a really cool feature uh again you could use use this for security features or evaluation features uh on any on any of the other metrics around seeing how a model is performing versus another model so pretty neat uh be able to compare two models together again you could go to the performance tab as well and see like well what is the reading score if that was important to your use case you could be comparing these two models together and then select whatever the best model would be for your use case um and maybe deploy that one into production so a really cool little evaluation feature uh built into the dashboard or again for security which one has better guard rails uh Which models maybe not outputting patterns that might be really important this one gave a phone number maybe we didn't want that uh the other one didn't do anything maybe that's a good indicator we should use that model so going back to the notebook I'm going to hit I guess I don't need to hit stop you can also hit stop on this if it's uh still running let's go ahead and run the rest of these code cells we're going to quickly look at using guardrails and validation in your local python environment with those extracted met metrics that we saw in linkit so I'm going to import U linkit again uh do those metrics or initialize those metrics again and it looks like I do that twice I actually don't need this code cell and then here I'm going to create a function that's going to take in a prompt so some text from somebody and we're going to profile just the prompt uh I might have forgotten to mention so you can also just profile prompts or just responses if you wanted to one at a time um so in this case we're going to get prompt we're going to profile it just like we've been doing but we're going to access that value that we saw in a data frame we're going to um turn that into a summary dictionary just that one row so we're going to specifically say prompt. toxicity but you could take any of those rows that we saw in our data frame and call that here so it could be jailbreak similarity could be a good one again for um security purposes and then we're going to get that distribution Max but you could also get the distribution Min median Etc and then in this case we're going to print out what that score was and then we're also just going to say if the toxicity which is the one we're looking at here here is over 0.5 then we're going to um assume that it is toxic and return false so the this could be confusing I guess the function it's called is not toxic so if it's true that means it wasn't toxic so I'm going to run that and then we're going to pass in um a string saying do you like fruit and we'll see the toxic score is incredibly low and it would return true because it is not toxic so let's say dumb and smell bad which is it gives a really high toxicity score here 96% toxic and it returned false because yes this one is not not toxic so you can now use this kind of guardrail to um do something so in this case we're saying do you like fruit and we're saying if is not toxic um you know pass that response into the model and then give us the response otherwise we can print out as a large language model do dot dot which is a lot of time you'll see something like that on open AI or any other llm with guard rails built in you try to ask for something it's like as a large language model I cannot do this thing um in this case it wasn't toxic so it actually queried the model in this case we passed in that you dumb smell bad really high toxicity and so it just skipped passing the response to the model so again you could do this with something like jailbreak similarity any of the other metrics like has patterns Etc has a specific pattern Maybe that you're looking for don't pass that into the model and we have more advanced ways of doing this so you can also check out this link with some really cool built-in stuff around using validators but just kind of want to show a quick way of doing it as well with if statements so incredibly flexible kind of any way you want to do it with those metrics and then again I mentioned it's pretty easy to bring your own custom metric so I'll share this in the chat as well there's a whole blog post on it around um bringing your own functions we call them udfs user defined functions and you can check out this blog post but really all we're doing is you have a function that's going to return some sort of value you have this decorator around it called reg register metric UDF and then this um value that gets returned is going to be part of the linkit metrics once you initialize it again so really easy to bring your own metric in um which is really useful too if you are doing something with like a vector database query and you want to get a similarity score between things or you know jailbreak similarity is a common use case that comes up in conversations with people building their own and they want their own similarity score you can use a a custom metric or userdefined function to do that for you I think we have some examples of that ready as well on GitHub and then uh optional you can use a rolling logger so we saw how to just write up profile profiles by pushing them up we have a built-in function for Rolling loggers which is nice so it'll keep collecting and merging those profiles together until the time in your interval is met so in this case it's every 5 minutes and then those will all be sent up to yab so you could have this like every hour if you don't want to always be setting something up uh every time something happens so this is pretty commonly used in production and then there's a whole bunch of other resources here so um if you want you can click on any of these if you want to learn more about um L kit how you can use it we have a whole bunch of examples again with like L chain integration uh different hugging face models Etc um find these links below and click on the ones you think are most interesting with that that is the uh content that I have for the workshop today so again if you have any questions I'll be hanging out for a minute but otherwise hopefully you join the slack Channel or connect with me on LinkedIn and you can ask me questions there later as well so we'll be hanging out any questions from anybody or again if you have feedback if it was interesting let me know as well I hope it was with how many people are still here um and definitely check out linkit on GitHub if you haven't already that has a lot of the uh the resour versus also I'll share the slack link again too when I join the slack say hello in there ask questions there and it's more than just me there's a lot of other great people in there somebody said Thank you thank you so much for uh coming hopefully you had fun and maybe learned something new does anyone have any interesting LM security stories they could share around weird patterns or anything coming out of your models all right well if there's no other questions then I will wrap up the stream I'll wait uh just a minute longer see if there's any other questions from anybody uh we'll probably be doing more workshops on LM security stuff for sure so definitely follow wabs join the slack channel to see those when we post them pretty common topic right now when um I talk to people all right if there's no questions even though people are still hanging out um I will hopefully talk to some of you all later and again hopefully you had fun learn something new and uh talk to you later have a good day everyone
Info
Channel: WhyLabs
Views: 1,343
Rating: undefined out of 5
Keywords:
Id: WjIpwYjkgB4
Channel Id: undefined
Length: 75min 24sec (4524 seconds)
Published: Fri Oct 27 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.