[Workshop] AI Engineering 101

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hi um welcome to the first event of a engineer submit you're all here um thanks for coming uh so what is this and why why did we have like a smaller session you know there's uh 500 50 people coming for the the full thing tonight um mostly I wanted to make sure that everyone comes in with some base level of understanding um a lot of um conferences try to show try to pretend uh that everyone knows everything has read every paper has tried every API uh and that it's mathematically impossible and uh I always think that it needs to be a place um for people to get on the same page uh ask the questions they're afraid to ask um I want our conference to be inclusive supportive uh a place where you can learn uh everything that you wanted to learn uh in one spot um and so I was fortunate enough to work of Noah uh on Lon space University um which uh has has taken a little bit to to come together but uh this will be the first time we're running through some of this stuff um and is what we consider to be the basics of what you should know as a brand new AI engineer um and like what is the selection criteria um hello God this I've been warned about this hello hello sorry hi okay it's back I know so um what what the selection criteria um mostly we do we do uh what what is this uh hello hello we might have to switch hi oh okay that might be right I'm not a Navy um so the selection criteria is basically like you should know how to do the things that are known to work uh known to work in sense they're not that speculative um most people will want will expect you to do that in in second job and any um AI based idea that people come to you with you should know the basics of how to build or at least if you don't know how to do it uh nowhere to go to get information so that's the that's the main idea of the starting point um and that's about it so today structured uh as a sort of a two-part session with with some uh fun talks in between first part is um 101 where we go through the SP University stuff uh we have a lunch and learn with great uh prop engineering workshop with Karina from anthropic which is super fun last minute addition and I get to add the anthropic logo to adding page which is nice um and then we have uh um AI engineering 2011 with Charles FR who's who's done some full stting uh bootabs so um you're not going to be an expert today um you will get a sampler of what we think is important and you can go home and uh go deeper on each of these topics so enjoy uh this is not say hi to no hey everyone hey everyone uh first of all thank you for all showing up I hope that y'all going get a ton of value out of this um before I go in there are going to be a couple of setup steps uh if you don't have these two things uh go ahead and do that while I run through these first few slides the first thing is having python making sure that you have the runtime installed on your laptop uh and then the telegram app both of those things will be required for the workshop uh and so if you don't have those I would go ahead and just look them up download them on your laptop and or phone yeah everyone yeah so uh I'll I'll sit here for one one minute uh everybody please make sure that you get the Wi-Fi so that you can go install those programs that I just talked about is a tegram messenger yes uh we'll we'll go through that and then what was the other one python just make sure you have the no you're good you should be good okay cool uh so I'll assume all of you have the Wi-Fi or should also be like it's I'll say it again for everyone with zeros instead of O's um but so what you'll be learning through this course uh is really these five Concepts where we are going to just go through the the basics of what it looks like to use you use programmatically uh the the llms and what it looks like to call the actual API uh we'll go through what it what hello what it hello hello you just SW yeah we we'll try this and I assume if I'm just talking like this y'all can all hear me okay yeah yeah okay uh and for for the zoom it's not recording this so yeah it's suppos to go Voice still yeah okay cool uh so I'm just going to talk like this hopefully y'all can hear me uh if you're in the back and you can't just raise your hand at any point and I will just tone up a little bit so really like I said the first portion that we're going to go through is just what it looks like to actually call uh an llm and get a response back and push that to the user uh this is the same thing that you're getting behind the scenes for programs like chat GPT and apps like that then we're going to go into embeds and Tok tokens which is really kind of uh how these models work under the hood we're going to kind of peel back a few layers of the onion uh and then from there we'll go into generating more text but it's a special kind of text it's our favorite kind is code generation that's going to be a really fun one that has a lot of rabbit holes for you to kind of dig in on your own and really level up I think there's going to be a ton of opportunity in that area specifically so definitely make sure that you're taking notes there um and then as just to kind of round it out it's not all text based llms I do want to get yall some image generation and Voice to Text uh those are both aii models that are very useful right now that you aren't getting a ton uh of coverage on in our little section of the internet um so with that I'll I'll kind of just preface this on like hey why why you're here why you should be learning this I think the fact that y'all are all here you're already kind of sold on the idea but really the the rise of the AI engineer has uh a lot of headwind in it you you have uh this this meme that you know does does the circuits every couple months uh where it's just you're you're able to do exactly this now with with uh the new kind of Dolly 3 that open AI is kind of teasing and is in Early Access right now uh and so really AI Engineers if you kind of cultivate this skill set you're you're going to be uh in high demand for all of these opportunities related to all these different use cases um and this you know take what you will from this this is a AI engineer and we use just AI as a search term you know this is up to 2023 if you just extrapolate that you can imagine that purple line being AI just very much going up and to the right surpassing even uh machine learning Engineers uh as kind of the core thesis for for the whole AI engineering Trend uh is that you as an engineer are going to have a lot more value and there's going to be a lot more people that can do it uh if you are harnessing these and building them into products versus is working on the underlying infrastructure itself um moving forward you have uh some of the things that are in the ecosystem different tools and challenges uh so really uh you have all of these different things uh this is we are not going to be touching all of these different tools today but this is just useful to get in your head these are going to be the products that you're seeing rolling around over the next couple days if you're not using this I would minimize it so that people can see it yep um and so today you'll you'll go through these these five different tools these are all you will touch each one of these today uh through apis in one way or another um so that's kind of our road map uh and to get started we'll get handson with gpt3 um so these two slides I would highly recommend uh now that you have telegram downloaded uh both of these are going to be of utmost importance to you this left one will add you to a broadcast channel uh that I put a bunch of links in so you want to scan that uh and if you have it on your laptop that should send a a link over there uh you will find links to the GitHub repository along with just a bunch of other useful resources and information um and then the right one uh we'll go through that in in a minute but essentially you will scan that and that will uh ask you to uh invite the botfa as a telegram chat uh the botfa is essentially telegram's uh API dispenser uh so you will need to contact the botfa you'll go through a a uh a series of questions with him that look uh a little something I I'll show you what it looks like but I'll I'll just pause here for for two minutes so that all of y'all can scan these QR codes and I will uh check to make sure that everyone is actually joining the channel oh great I'm seeing 27 subscribers y'all are killing it super quick are the slides on the the slides are not on on the GitHub repo no all right I'll leave this up for uh about another 60 seconds make sure that everybody can scan and get these two uh for all of the other things moving forward you will have very easy kind of checkpoint so don't worry if you get a little uh Left Behind as we go through uh we have a lot of information to cover over the next uh two two to two and a half hours uh so really make sure that you're paying attention to the information more so than staying up to dat on the code uh if you fall behind after each step there is a new branch that you can pull down uh to kind of get all the functionality that we're talking about um so with with that I I think all of yall have this so I will move over to Telegram and show yall what I want you to do so we're going to go over to the bot father okay great and so the bot father here you will essentially talk through actually we can just go go through this right now um so let me we can clear the chat history so this is what y'all are looking at we can go ahead and click Start and you can say hey cool he he has all of these commands for us right now that's great so what I want y'all to do is we are going to create a new telegram bot uh all of the functionality that we are building today uh all of these different AI API calls we are going to stitch together into a telegram bot uh this is really cool as a way to share also like mode and yeah uh telegram I I can't does do yeah I I can't I can't blast up telegram I'm sorry uh so with telegram you're going to hit Slash new bot you're going to need a name to call it uh I would recommend just uh maybe maybe your GitHub handle uh so just something cool and now change a user name for your Bot this is going to be its handle on telegram that you can send to other people so for example you could do your GitHub handle so mine is inine gbot you that is your username for the bot and this will give you an HTTP API key right here that starts with a bunch of numbers it looks like uh that at the very bottom I know this is a little bit small for everyone but essentially the flow that you're going to go through is new bot go through the prompts get the name and you should get an API key from that uh and from there we will uh pull down the GitHub repository and add that to our environment variables uh so go ahead and get that API key from the botfa um and then yeah yeah just install tegram yeah and then I just from the app just the main I just scan that code yeah yeah so uh ra raise of hands how how many people were able to get into the telegram chat and into the bota in in their telegram contexts just raise your hand if you did get it okay great uh if uh and raise pants if you don't I I can Circle back afterward so I've got a a smattering of people okay uh don't worry a after this first portion we can go through with the kind of Q&A portion and make sure that you are totally set up there um for those of you that do have it uh this is going to be the chatbot implementation uh The Next Step that you're going to want to do is in that AI 101 telegram channel that most of you joined uh you will go through and you'll see at the very top there is a link uh to that original telegram channel for the bota if you weren't able to get him uh so go ahead and make sure that you invite that guy uh and then there is a GitHub link uh it is GitHub inhg a101 uh it is here I can actually just click click on this so in here you'll see there's there's a bunch of links uh and from here you are going to want to pull down GitHub and this is the branch that you will all be working on again this is a link in that AI 101 telegram channel uh go ahead and clone this down uh the the main branch is what you'll want to start out with go ahead and clone that down uh and run through everything in this read me this little python shell uh go ahead and run through all of this let's make that a little bit bigger so you'll you'll just run through and this will install all of the dependencies that you need and get your environment up and running uh is essentially on once you're here this is a really solid foundation for the rest of the course this is all of the really annoying setup done and out of the way um so again all of that is in this main telegram channel for ai11 make sure that you are in there um and for the actual chatbot implementation so we just got a token from the bota if you don't have that please go through that that workflow and then you're going to need to get an open AI API key uh originally I was going to have all of y'all go through uh so you would in in that link if you want to get your own uh you're going to go to a link that's in that AI 101 Channel which is just platform. open.com and you go through uh and you you would need to register your card and generate an API key through there um just for the sake of uh keeping things moving quickly uh what I will also do here is is um I will actually just send yall the one that I have for this example um so I will put this in that telegram channel here so let me make sure I can do that so everyone for if you don't want to go through and get your own or you don't have one right now you can see in that AI 101 Channel this is going to be the environment variable that you need uh if you pull down the repository you already have a env. example and if you run the script uh it will change that example file to an actual file make sure that that uh token will will allow you to do that um I so again if you're behind all of that information just go to all of the time that telegram Channel throughout the workshop that should have everything that you need and so if you've done all of these steps you clone down the repository I just gave you that open AI key uh you're going to load in your environment variables so what that looks like here you can see that uh bot token here let me make this a little bit bigger for everyone uh uh let's is that pulled down yeah so you should be able to see you've got the tgor botor token and the open aore API uncore key both of these are these are the only two environment variables that you will need uh and once you have that this will be your own bot uh in telegram along with your own API key or the one that I just gave you in that channel and from here what we can do uh is we're going to add an open AI chat endpoint um so what you can see here uh is in our source file we've got this main.py file um and in here this is what you should be working with if you have uh pulled down the repository successfully you'll see we've got a list of imports then we're loading in all of our environment variables and then we are loading up the telegram token uh we've got some messages array this is going to be how we interact with the chat system this is essentially the memory uh that the chat apps use is this back and forth is just an array of objects where the content is is the text of all of the questions uh we have some logging to actually make sure that whenever you're running the program you're getting some amount of feedback uh as as it runs and we have this start command so I'll really quickly in this portion run through uh the telegram bot API kind of architecture so you will Define for each different section you'll have a uh a function that function will take an update and it will take a context uh the update is going to be all of the chat information essentially all the information about the user uh and the context is going to be the bot so you can see here in this very first thing we're going to just call the context dobot send message and the send message command takes a chat ID and it takes some text so the chat ID we get from the update variable and so that's just saying like hey whoever sent me the message send it back to them I am a bot please talk to me so cool we've got that functionality in start but how do we actually make sure that the bot knows that it has this functionality we use that through these handlers so we have this start Handler right here on line 28 and it is a command Handler so command handlers uh if you're familiar with telegram or Discord anytime you have that slash command that is a slash so this first one is going to be anytime the user types slash start uh this command Handler will pick it up and it will run the start uh function that we declared above and then we will add that Handler to our application uh this application is where your actual bot lives you can see we've got the telegram bot token that loads in here and builds up and then it just runs the polling uh so what happens if you have all of your environment variables set up correctly right here uh is if you're going to run so from the root of the directory you can run your python source main.py and cool you can see the application started uh and every couple of seconds it is just going to Ping as it runs through the polling back and forth uh and you'll notice here I have got this is the bot that I started so from the bot father you get uh a link right here so this this would be the new one that that I created uh but I have a previous one that I already made uh so make sure that from the bota uh it has this original link uh and make sure that you invite that so it would look like this and it's it's just another chat make sure that you start it this is the bot cool EXC yeah could you go back to the main yeah and so if you another Branch right because line number six doesn't exist on the latest one uh are you on Main so for yeah you you should be on Main yeah yeah you're saying on line six load do line6 is a space oh then there's another one there was another window though right uh I I would say if if this does not work you should just be able to pull down the GitHub repository put in the API keys in your EnV file and run main.py and you should you should have functionality out of it yeah um you to the the QR code sure oh and I will blow this up yeah this is really important I don't mind taking a while on this guys all of the other ones will be pretty quick uh because you can just checkpoint so if you don't have these uh just take take your time truly we want to get everyone on the same page there's not a rush here you know uh to be honest we are still ahead I was not I did not think everyone would be here bright and early uh so I planned this workshop for starting at 9:30 uh and so we are still 6 minutes early as far as I'm concerned we really want to make sure everyone gets set up and is in the right spot so uh really I know all all these QR codes that can be quite a lot to get through in the initial portion cannot import name name telegram did you run the from the GitHub running running through and installing everything uh so like I did install code like I'm cop if you just copi the code you'll need to install everything so here yeah I did install so unless there like a can you help about this uh should we point out that we have two t who I've got I've got Justin and and Sean okay and Eugene's available to help okay yeah and and really quickly guys uh I I failed to mention this at the beginning I'm kind of like running the workshop through as we go through um we have Justin and Sean and Eugene are all here and and K assist uh all all three all three of y'all uh or Sean and Justin can you both raise raise your hands just hey get get either of their attention they should be able to help you actually get set up uh if you are having questions in in the middle um I don't mind right now cuz we are very much in in the configuration portion this is the most friction that you'll experience uh through here it's pretty much smooth sailing after we get everything configured and set up as as is the wo with software as a whole if trouble yeah yeah yeah you're good yeah the uh through that API key that the bota generates yeah yeah yeah yeah so telegram has an API and we're we're just from that API key it knows where to send the messages from I'm running this IP in here should I see anything uh not yet not yet not not yet so right now it should just be SL start and that's all you get okay so before I move on does anybody any any any one person is okay I will leave this up here cuz like I said we are still 3 minutes early as far as I'm concerned and we're already halfway uh through through the introductory slides does anybody still need this QR code beautiful um yeah that's the Wi-Fi code that is different than this one uh no I I don't yeah no I'm I'm not trying to deal with a printer on top of all of this I I do apologize the QR code and uh everyone the botfa is in this initial one uh so the left one is more important than the right one yes yeah yeah but the botfa is like first first party telegram API and I get that question a lot telegram could do a bit to make The Branding a little bit more official you tell everyone said yeah telegram go to the bot father they're like I don't know that sounds sounds a little sketchy to me um but yeah the botfa is the the official telegram uh doar out of API Keys okay and I will double check okay so I see 62 people in this chat as uh so I'd say we are good on the amount of people that are in here and the botfa is in that one as well um so I appreciate all of yall going through I know the configuration is always the least fun of any software project uh and so what you should get after you have all of that uh is like I said we just run this main.py file that will spit out some logs and the functionality that you get from that is just uh as such let me clear history here uh is you'll just hit start this this is what you've gotten so far is a bot that it doesn't matter if you're typing anything you say hey hello uh we don't have anything we have exactly one Handler that picks up uh the start command so I can hit this over and over and over again but that's it that's not the most exciting functionality that you could get uh so we're going to go ahead and add uh basic chat to to the bot uh and so what that'll look like um to to save y'all from me just typing like code in front of everyone uh and this is a a good segue into what you can do if you fall fall behind uh on each section uh is we have a bunch of branches set up for you so we've got step one 2 3 and four so if you're ever behind you can just Skip To The Next Step uh so what you would do to do that is just get check out step one cool we have now switched to Step One uh and if I reload my file here you can see that I will have a bunch more in my main.py file um and so now that I have done that uh I will walk you through step by step what you need to add if you want to add it on your own which I encourage you to do so to the best of your ability try not to swap branches it's totally fine if you need to but you will get a lot more out of the experience if you actually write each section of code as we go through it so now we we're essentially on step six of the chatbot implementation so I'm going to make that a little bit smaller so that we can blow up this text a little bit more uh and so what you'll want to do is you're going to need to import open AI don't worry about installing it I added all the dependencies for the entire project you aren't going to need to run pip install over and over again you you have you have it all you just need to actually bring the import in so go ahead and import open Ai and and you're going to add this open a. API key uh and you're going to pull in that environment variable that we talked about earlier so this can either be your own open AI API key or the one that I posted in the telegram Channel just now either of those will work um and then from here you'll notice so like I said for each uh piece of functionality we're going to add a new function so we've got this a async chat function that again takes the update and it takes the context and so the very first thing that we do is that messages array that I told you about earlier so we've got this array of messages we're going to app Pinn to that array we're going to say hey there is a role of user and the content is going to be update. message. text like I said update is all of the information in the actual telegram chat so the update. message. text is whatever the user just sent in that line of text to the bot it is going to push that and is going to add it to this array of messages so there are three different roles that uh openai has one of them is system uh so you can see this is kind of us uh setting the initial prompt for the bot saying hey you are a helpful assistant that answers questions and then back and forth you will go through the user and then whenever the AI responds it will be uh the role of assistant so you see it will bounce between user and assistant with just the system prompts at the very beginning so the very first one hey we want to append it to the messages array and then we're going to want to get the check completion So This Is Us calling out to the openai API and so that's open. chat completion. create and that function takes two arguments one of which is the model and that is GP t -3.5 dturbo as a string and then it takes a second argument of messages and that is expecting the array of messages that we just mentioned earlier uh it takes a bunch of other arguments that you can tweak but just for the sake of this this is the only two that you need to get a proper response and so cool what we have essentially just done is we said hey you're a helpful assistant and then the user sent it a question and it's going to take that question and it is going to run through the G GPT 3.5 turbo model and it is going to give you a completion at that variable and so that variable is a rather large object that has a lot of metadata in it and so we really just want the answer if you had some logs maybe you could just send the entire object to the logs but we are only concerned right now with sending a useful response back to the user so we're going to say we're going to call this variable the completion answer and that is going to be the completion object at the choices at the zeroth index and that is a message and content so that's uh rather rather lengthy piece there but essentially that is just yanking the actual llm response that you want from that API response and once we've got the answer back we want to again appin to that messages array so this is uh you'll just think of messages as being the memory for the bot so if it's not in that messages array the the llm has no idea that it happened it is back to its pre-trained model so you'll notice uh once we actually get this running that every time you restart the server it no longer remembers the that previous conversation so if you want to reference previous material this is what allows that to happen is by adding additional context into this messages array in that uh kind of format of the role and content so I know that was a lot for just four lines of code but really this is step by step how you are interacting so it's generally hey uh llm I have this question it's going to say hey cool let me get you a a bunch of information back you're going to yank the useful piece the content out of that and you're going to do something with it in this case we're just going to send it back to the user and so that uses the exact same message that we had in the start command and so again that's the context dobot send message where the chat ID is the update. effective chat. ID and the text is the completion answer so that that gets you right out of the gate uh don't worry about question that'll be in in the next section we'll get to that so really this is what you're you're trying to get through is line 27 to35 here uh is this chat function and and then from there you will follow uh a very similar thing so we had the start Handler and again don't worry about the question Handler we'll get that to that in the next section so you're going to worry about this chat Handler uh which means that you are going to need to import in telegram uh this message Handler so we'll we'll jump to the top here so you see on line four we have the telegram. extension you're going to need to import the filters that's with a lowercase f uh and then you will also want to import over on the left here the message Handler so those are going to be two Imports that you need to add to line four the telegram. extension import and from those two if we go back down you can see the chat Handler uses a message Handler and so the message Handler uh is going to go through this this filters object uh filters is a way for the telegram API to essentially filter through uh various types of media that you could get so in this case we only care to receive messages that have text and only text in them uh and then that they do not have a command in it that's kind of what this this till is is just hey if it's a command I don't want you to listen to it okay uh and then the last one is going to be hey what is chat like what what function do you want me to call whenever I see the criteria of filters. text and the filters or till day filters. command so if those two are met it will invoke the chat function so again that is still the same Handler so we created the function we created the Handler and then we are going to add the Handler of the chat um so again don't worry about the question Handler that a mistake on my end that should be in the next section oh well I I do apologize for that but I I think you get the idea and so so if you have all of that once once you have this and again you run Source main.py permission denied oh that would help if I actually made the command yeah and you'll see this will boot up yours will probably be a little bit faster than mine because of the additional stuff that we added so cool our application is now started and if we go over to our bot now I can say um let's see uh who is Simon cowl we all love uh some American Idol judges and cool we now are getting responses back from our open AI API key we said hey Simon Cal is a British television producer executive blah blah blah blah blah cool um but like I said since we have a ended my message of who is Simon cowl and the bot's response of the actual answer we can now reference that in the conversation so we have uh you can now reference it so I could say um let's see what what is his net worth so we're able to reference like if that Standalone question what is his net worth it has no idea what that is without the a pending of messages going back and forth so you can see that's uh this is essentially what's giving it its memory and allows you to reference the previous conversation uh if I were to spin down the server and then spin it up again it would have reset messages to not have this in the context so we wouldn't be able to reference this anymore uh so with that that is essentially the the chatbot implementation where we're essentially now have uh chat GPT in in your Telegram but um and so that is everything for this section uh there's I'll be posting the slides uh a link to the slides after the talk uh so that you can reference things but there are uh little rabbit holes throughout the talk where you can kind of delve into more um and so I think for this particular section things that are interesting to talk about let me make this a little bit bigger for youall uh is messing with the system R prompt uh and by doing that you can have it perform various activities uh like making it Talk Like a Pirate you can put that in the system prompt and that link will send you to uh essentially uh two GPT Bots having a conversation back and forth with each other one talking like a pirate one talking like a nobleman uh and the other one if you go to that uh link is it's uh step by step it's trying to guard a secret so in the system prompt they have hey the secret is abc123 or whatever and don't give that to the user and it is up to you to kind of trick the AI into giving you the response and each step makes it progressively harder and so you all of that difficulty is entirely encoded into that system roll prompt and making it more robust and giving it more and more information to reason about how the attacker might try and get it to to give up the secret um so none of those are things that we're doing right now uh but I'll move on to Q&A was there any any general questions uh after that chat or after to that section yeah yeah about the memory the way that you are storing the the memory approximately it depends on the model right but uh how can we handle that with the code what I mean is how can we handle when the user is getting to the limits and we need to sort of uh yeah uh so so the question is like hey um for a particular memory how do I manage that in the code where if the user is essentially we're maxed out uh the the llm can only take so much information before it says like hey man I I'm kind of maxed out on capacity here how do you deal with that question uh and that's like a problem in the space currently if you're the term that you'd be looking for is like long-term memory is how do we give these AIS very long long-term memory on like hey I've been talking to you for the last week and I want to be able to reference all of these various conversations um right now the for this specific example it doesn't uh quite equate one to one but one of the answers is what we'll get into in the next section which is uh retrieval augmented generation where you will take the contents of that memory once it gets too long and you will turn it into uh a vector if you don't know what that is right now that's that's fine but essentially you uh store all of that information in a way that the AI it's very information dense and you give the AI the ability to kind of uh look up up like hey for what the user wants let me look at all this previous information uh and maybe I can reference that to answer the question better uh so it kind of condenses all of the memory to give it storage in a in a certain aspect good question yes sir I guess similar theond Max um what would probably happen uh if I had to guess uh how this specific one would break uh is you would probably see here uh that we would fail to respond to the user and there would be some error that's like hey context limit reached uh and so you would see that in the logs and the user wouldn't get any feedback since we don't have a a fail mode implemented any other questions yeah inst the library or something update wasn't part of the [Music] um did you did you run the uh inst the requirements I think I just I'm redoing it so okay don't worry uh there there'll be a break here after the next section we can we can go through make sure that you're up to date um or you can also go visit one of the Tas that can probably get you set up uh yeah so uh one thing one thing I always want to make make sure it's okay um anybody uses jargon that you don't understand please feel free to to ask about it uh I heard words like context whatever this is the place to ask ask about it yeah um the rest of the conference is going to just assume you know it so please um raise your hands because you're not going to be the only one here yeah absolutely and also know like uh there's lots of people that are watching this uh and so for any question that you have you are also kind of representing all the other people that are watching that aren't able to to ask their questions um and and for that this is very just use usage based driven uh we'll get into a lot of the jargon that Sean just talked about in the tokens and embedding section um yes yeah the the Wi-Fi network is prosperity and the password is for everyone with zeros instead of O's and we've got yeah there you go he's done this before yes I'm sorry say that in the Handler yeah uh don't don't worry about the the question Handler anything with the question that that's in the in the next section I accidentally included it in the same Branch don't worry this that's that's what we're going to go over in the section yeah just just the chat Handler yes in the next uh tutorial you're going to take us through is what was behind yeah so uh if if you're behind each uh branch is like a checkpoint so if you go to that branch and you run the install you're you're up to date on on everything yeah yeah so if you're if you're on Step One is this section currently you'll you'll be good uh yeah of course okay uh so getting into tokens and embedding so embedding is actually what uh I just answered with that very first question and how you kind of store all of this like long-term uh information for the chatbot to reference uh and we'll also get into tokens which are related to but slightly different than embedding uh so tokens uh the definition of a token is really just uh you can think of tokens as the atomic unit for these large language models it does not understand uh English it understands tokens everything uh that it deals with is in tokens it generates tokens uh and those are subsequently converted into spoken language such as English um they are hugely hugely important uh as that's what you get charged for this is the the money that you get charged for is based off of the amount of tokens that you are consuming uh with your various API calls or embeddings uh so it's how they interpret words how they understand everything um and what we just talked about uh on the beyond the models limits it's context you can uh memory and context you can think of that as the same thing where con the context limit is like the amount of tokens that it can reason about so if you generated a string let's say its context window was 100 which is like not not the case for any model that'd be like very severely limiting but say it was 100 and the question that you had had 101 tokens uh it wouldn't be able to understand it you have broken its context window uh and chunking is how you handle that to ensure that all of the context is retained through all of this information um generally speaking a token is representative of four characters of English text specifically um there are these things called uh tokenizers which we'll get into in a minute which is essentially the uh implementation of converting words and text into tokens uh there are various different tokenizers some of them are better at uh other languages uh so for example like uh Spanish is very expensive token token wise uh for the open AI tokenizer uh there are other tokenizers that are being you know built by researchers uh like if y'all are familiar with the um project repet uh they built an in-house tokenizer that was specifically meant for for code uh and so this like uh everything all of these variables are always changing and moving quickly so it's important to kind of Reason about everything from first principles um but there are some interesting ones uh that are exceptions like the word raw download clone in ined report print is one token there's uh you you can read this is a a very dense article but this less wrong post uh goes into kind of speculating why that is the case uh but it you are able to break the models with some of these tokens because how we think of that is like that's a weird looking word uh but the representation could be a little bit off uh and this thing on the right you can see is a picture of uh all of the tokens and how it's actually breaking down the text model uh and you can also try this platform. open.com tokenizer uh that is just a playground you don't need a sign up or anything you can just get in there and start typing words and that can get you a bit of an intuition for how it's breaking down all the words into the actual tokens yes sir does each correct yeah uh tokenizers are not you can't exactly just uh swap it's not interoperable what does that do to the system problem requirements nothing yeah so your your system prompt requirements uh you you have this whole English phrase that you've generated on all the instructions and that gets broken down into tokens models model to uh yes so each model uh if you're thinking for General language use so like llama being another example uh it if I'm not sure if it uses the same tokenizer not off the top of my head but even if it had a different one both of the tokenizers are trained and both the models are are you know aligned with their tokenizer to take English text into a way that is useful for the user uh and so getting into embeddings is the next portion so if tokens are kind of the atomic unit uh you can think of embeddings as well the definition is it's a list of floating P Point numbers if you look at tokens they are uh a bunch of numbers and so really uh embeddings is how we are able to store information uh in a really dense way for the llms to be able to reference mathematically uh and kind of get their semantic meaning um and so you know the the purpose of it is that semantics are accurately represented um and so this image on the left is kind of showing you uh for all of these different words how close they are to each other is how close the uh embeddings are to the actual floating Point values are closer to each other uh and so you can see like dogs and cats are close to each other strawberries and blueberries are close to each other um and so all of these words have semantic meaning and how close they are is representative by these embedding models um and so usage and what we are going to go through is how do you take something like semantic search where we have a huge amount of information that we want to reference uh but obviously I can't just put all every single text in Wikipedia in a giant text file and copy paste it and give it to the llm and say Hey I want to give me information about the tailor Swift article uh we have to generate embeddings and query them and contextually contextually relevant content um and so if you're behind from the previous portion uh go ahead and pull down the step one the step one branch uh but this is going to be um oh actually before I get into this uh if if you haven't let's go over to the telegram here I want to make sure that y'all get this uh prior um so pull down the and I get AO AI 101 okay there is uh this link uh to the embedding sled. py file uh make sure that you pull this down go ahead and generate this uh if you are on your own uh create an embedding folder and then copy paste this file uh and just run it uh and what what I mean by that is uh I will show you so if you have that file again reference that telegram channel that you're in for the actual contents of that file uh you will see that there is this embedding folder and in here there's ed. py I want you to just while we go through the rest of the section uh is Python 3 edpy uh and this just going to sit here oh hold on uh okay so whenever you run it run it from the root directory uh make sure that you're on that file so you do Python 3 edding ed. py to make sure that that file runs correctly because file naming and path stuff uh and so this is going to take five fiveish minutes to run your terminal is just going to sit there so make sure that you go ahead and do this step uh and while that is running I will explain what what is happening so I'm going to stop mine because I have already ran it um but essentially I will run through right now uh the entirety of this file uh let's go up here and so like I said embedding ed. py res okay cool okay uh and so this is that ed. py file that again is in that AI 101 telegram Channel yeah yeah so yeah we'll we'll get into this this whole portion here um so like I said copy paste this make sure that it's running don't worry about writing this code yourself it's a little bit tedious so just really make sure that you you go ahead and pull that down copy paste it run it uh so we've got a bunch of imports uh so we've got pandas um the OS and we've got Tik token uh Tik token is a python library that is the tokenizer uh whenever you are running that if you go to that playground link where you type in a bunch of stuff and you get to see the tokens uh it is essentially just doing a visual representation of the tick token Library let's see if I can move my mouse get that out of the way maybe we can go to the bottom yeah okay uh and then we've got this thing so we are pulling in Lang chain for this course and we are using the recursive character text split uh I know that's that's a a quite quite the name there uh but don't worry we will get into what this is used for I know uh you will see Lang chain reference quite frequently as a very popular open source library for doing a lot of different things yes do you have [Music] yes no you're good U while while we sorting it out so um we're actually getting a preview of a lot of the stuff that we have speakers for later so like L speaking and then we also have Linus from notion also talking about visualizing in biddings um and what he showed you is like what most people see like the clusters of Bings but I think uh you could actually you have actually looked at the the numbers um then you really understand at the low level how to manipulate these emits what's possible what's not possible um and I do highly recommend it um so a very classic thing that um I the first time I worked with Sean um or actually I think it was more more Al but um you know like can you embid a whole book should you embid a whole book um and and so like the the maybe unintuitive thing is that um you know if you embed one word versus you book you get the same set of numbers U because a bidding is effectively asking something like what is the average color of a film um and so that that that question makes no sense unless you you break it up into scenes and then ask what's the average color to SC um so I I do I do like to uh can you send it to him it's in the small AI Discord it's just a link if you can send it to him yeah okay so you can see um what's going on under the b n helps a that um you don't need nchain if you if you are comfortable enough but we recommend getting familiar with it because these things are just tools that the community has decided is pretty necessary so this where we starting you off that yeah uh yeah we didn't I didn't think through that one so uh yeah re retry um and if that ends up being a blocker as we go through you will just go go to the open AI platform I I did structure this to generate your own API key uh it's not expensive through if you do this entire Workshop you will generate approximately a nickel in charges uh so watch watch your wallets everyone um uh so definitely if if the rate limit becomes more of an issue we'll we'll take a minute in one of the breaks and everyone will need to yeah yeah yeah yeah no is pretty decent yeah yeah I I generally I I haven't had problems with it share sharing the key for a workshop like this but if you do hit it try again and if you're you're really really hitting it then generate your own um o okay we can also just download the CSV right the rate liit coming from yeah from from the embedding yeah um so there is also a portion um essentially what this file is going to do it is going to take a bunch of text files that you may have noticed whenever you downloaded the initial repository that is a web scrape of the mdn docs it is just a scrape of all of the text uh and what this file is going through is it is grabbing all of that text and it is passing it into the open AI Ada embedding model um but I did foresee that because this takes a while uh you don't get really tight feedback loops on if you did something wrong cuz like I said that file just sits there for like 5 minutes in the terminal with nothing happen happening uh so there's also in that telegram Channel you will see an embedding CSV file if for whatever reason you're not able to generate the embeds that embedding CSV file is the output that you would get from that you can just download that straight from Telegram and it is the same as if you had run this command successfully um so going through that essentially this entire file like I said is just going to do the embedding so we have a bunch of information around essentially cleaning the document so that we are giving it the best uh data and the most uh information dense data possible so we have uh we have this command that will remove a bunch of new lines and just turn them into spaces uh that'll that'll save some tokens um and then essentially what we do is we have this texts array that we're going to store all the text files in and then this is looping through all of that um docs and so with that we read each file and then we are going to replace any underscores with slashes um this is because there is a kind of Easter egg in here for people that want to dive in deeper we won't get into it in this course but this code is set up in such a way that you can ask the AI to site its sources uh because if you look in that text file you'll notice each uh name for the document is actually the path for the actual mdn developer docs uh and so we just replace the underscores or we replaced the dashes in the URL with underscores so that we can store it uh so essentially just undo that so we have the entire link uh and we will embed that in the documents so there is essentially the AI has the information of like hey here is uh the int the CSS web page uh I also have all the information on that web page but I also have the link so you can get it to site it sources uh that's a little bit more uh of a advanced thing so we don't get into it but it is the data is prepped in such a way that you could do that um and this is cleaning up the data set a little bit so in the scrape there's a lot of contributor txt files that get included so we make sure that we emit those uh and there's a bunch of paths that have uh JavaScript enabled or you need to log in or something so we filter through that as well so essentially what we have is we have all of the text from a web page along with the URL to the web page and we are going to append that to this initial this initial texts array uh and so we Loop through all of that and so cool we've got a super fat texts array and what I want to do is we are going to use pandas the you know data science library and we're going to create a data frame um and we're going to load texts into it I don't I don't want to do that um we're going to load all of the texts into it where we have The Columns of file names and text uh just like we have here for every single column we want the file name for the column along with all of the text that is alongside it okay cool um and then from here we we start cleaning up the data so we're going to say hey everything in that text column uh I want it to have the file name which is again the you can think of the file name as the URL for that web page uh and then we want to clean it up we want to take all of the new lines out of it uh and then we want to add all of that to a CSV and we call that the the scraped CSV and so that is essentially all of the contents of the MD and docs from a web scrape turned into a CSV file and then we have this uh tokenizer which is the Tik token Library we're getting the CL 100K base encoding which is again what open is using uh and then we're going to go through uh the data frame and we're going to call it the title and the text and for this is where you're really getting into uh the tokens and the chunking portion uh so essentially all that first bit was just data cleaning and now we want to create a new column in this data frame uh we're going to call it the number of tokens and so what we're going to do is we are going to apply for every single um item in the text column every single row we are going to apply this uh Lambda essentially so we're going to get the length of the amount of tokens we're going to grab the amount of tokens for every single row uh of web page we're going to toss that into a new tokens so if you have uh a really big web page you say hey that is like a a TH or 2,000 tokens uh so now we have that information directly in the CSV file for us to reference uh and then we are going to use this chunk size uh so this is where we're using Lane chain is this recursive character text splitter so essentially we have a scenario where we have uh a bunch of information that is uh arbitrary in its length and so because of that we don't know if we would break it by just stuffing in too many tokens into the embedding model the embedding model the same as the large language models can only support a certain amount of tokens before it breaks and so what this is doing is making sure that all of our data is uniform in such a way that we can embed all of the information without it breaking the model uh so we we use the recursive character teex splitter for that it's a very useful this is essentially uh just breaking everything within these arguments that we have so we have uh what function do we want to use we want to use length the chunk size we set it at a th uh the actual token limit I don't I don't know if it's been updated I think it was like 8,000 the last time I checked so we're quite a bit under and I do this just to make sure that you're you're seeing it because some web pages will have 3,000 tokens some will have 10,000 tokens some will have 100 you know uh it's variable so we just want to make sure that if it is more than a th000 tokens that we chunk it uh and we have this text splitter so this is essentially we just uh initialize it right here with all of the configuration and then we create a new array we just called the shortened and now we go through every single Row in our data frame and we say hey if there's no text in it we just skip it I I don't I don't want it I don't care and then uh if if in that row if we do have text uh but the number of tokens so we know for every single row because we already ran through the tokenizer we know the amount of tokens that that amount of text represents so if it's larger than a th we are going to use the text splitter and it has this uh method called create documents so this is essentially how you can break up all of these if it had 3,000 tokens we will generate three chunks and for each Chunk we will then append that chunk into that shortened array I know you in four Loops it can be a a little bit hard to reason about but essentially this is just going through and saying hey if this is too big if there's too many tokens we're going to make it fit uh and then from that we uh change all of the text that was the raw uh web page information we turn it into the shortened information uh so that this can actually be embedded um and then we we go through and and do the length of tokens again make sure that we're all good and then we add an embeddings column here where we go through every single uh text that has now been shortened and chunked and we will apply this function to it so this is open AI embedding doc create where the input is the row of text and the engine is this text embedding Ada O2 model uh and then we want the embedding again the output that you get from the the raw portion has a lot of metadata attached to it so we only want the data and then we want the zeroth index we want the embedding for it uh and then we just send all that to processed embedding CSV that is the telegram file that you got out of that uh I know that was quite a lot but is essentially what chunking is uh generally speaking you'll probably see in the conference there are a lot of uh open source libraries that do a lot of this for you because as you can imagine this is quite it's quite a lot you probably don't want to do this yourself especially if you're brand new you're like okay what is a token what is context like I have a lot to reason about so they libraries come in and say like hey just send me all of your text I will handle all of it for you but you can get a sense of this for what it is doing under the hood because this does meaningfully impact uh the performance of the actual models you can try it with different embeddings you can uh there are different chunking implementations where we have essentially chosen uh to break it down evenly but we don't have any context so for example we could have uh chunked it in the middle of a sentence which semantically that wouldn't make sense if I just a little red writing code ran to the and that's all the model has to work with it's going to give you worse responses because it doesn't have the the full meaning in there and so you do have a lot of control in the actual embedding uh and how you do that you can be smarter about it than some of the default configurations that you get so uh like you'll probably notice a theme throughout the entire convention uh is very much that uh data is incredibly important to the outcomes that you get from your model on a regular basis so this is an example of kind of taking that data integration into your own hands and getting your hands dirty a little bit um so with that in mind uh that's the embeddings model and how you actually run the text so if we're in the implementation we have grabbed all of our data this was the initial web scrape that I gave yall We just cleaned and chunked all of our data and we generated all of our embeddings and so now we need to generate context from our embeddings and then we need to use them to answer questions and so from this we'll we'll go through and we'll go into this source file uh and if you are following along this is where you would want to start coding yourself if you already did that step one you'll just see this file already exists but in the source directory you'll want to create a questions. py file uh and then we've got again the embeddings where we have let me push that down a bit yeah uh we import numpy and we import pandas we import open aemv and this open ai. embeddings utils library and this is super key for the actual implementation here this is the distan from eddings uh function and this is really the key to unlocking this retrieval augmented gen implementation so same same deal as before you need to load in your open AI API key uh and then we are loading in uh all of our embeddings we have that in a data frame and then this data frame we're going to go through the embeddings column and for every single embeddings uh row we are going to turn it into a numi array uh this allows us to actually manipulate this in uh a programmatic way um embeddings when they're generated I could be off on this number but I think the uh Vector Dimension so that's what the embeddings are is there a vector if you've done uh like algebra linear algebra you know like it it's essentially a matrix uh the embeddings that it generates are a 751 Dimension Matrix which uh if if you don't know what that is that's fine it's kind of hard to reason about I'm not going to go into it but essentially very hard to reason about our we we cannot reason about it in a meaningful way and this nump array essentially flattens it to a 1D Vector so that we can actually do traditional mathematical uh manipulations on it so uh essentially if I if some of that uh didn't quite click just know we we made it we can now this is the config we did it cool we can now actually play with our data and so what we want to do is we have this uh method called create context and so we're going to take the user's question we're going to take a data frame and we're going to have a max length and so this is the context limit that we want to impose so we're going to say hey uh anything more than 1800 I don't want it uh and the size uh is Ada this is the actual uh embedding model um and so essentially we are going to go through uh the comment is just for y'all if if you're uh doing it at home or something but essentially we want to create embeddings for the question so if we're thinking about a user asking us a question that we want to add retrieval augmented generation to we are going to turn their question of like uh I don't know how uh for mdn Doc is like what is uh an event in JavaScript would be a question so what we are going to do is we're going to generate an embedding so the same thing that we did for all of the uh Milla docks we're going to do to their question we are going to embed it uh and from that embed we now have this distances from embeddings and what this does is essentially it does a uh a cosine comparison from the uh embeddings from the question it is going to take a look at the cosine for that and it is going to compare it to all of the rows in our data frame and it is going to give you the distance metric we chose cosine there are a couple others but it it doesn't matter too much just pick pick cosine it's fine um and it is essentially going to rank them for us where it's going to say hey uh I the user asked me about events so I am going to rank information about node is going to come up a lot higher in the distances is going to be closer to the the semantic meaning uh then something like CSS is going to rank much lower because the vector distance is much greater so uh a good visual representation is this slide earlier so this is essentially doing the same thing where it's saying like hey the vector for blueberry is very close to the vector for cranberry that coine distance is very small where uh something like a crocodile is very far away from grape so that coine distance is very large so just think think about it like that the uh typ the distance the closer it is in semantic meaning to your text so we're going through and we're going to say hey uh give me add to that data frame a new column called distances so that for every single row I have kind of the distance from the question that the user asked uh and then we're going to go through every single uh every single Row in our data frame and we're going to sort by the distances so essentially you can think about this as like a Google search uh I searched for CSS stuff so CSS stuff comes up first and then if you click on the 20th page of Google God help you uh you know there's uh less less relevant meanings so essentially what we go through uh is we say hey uh I am going to Loop through all of this information going from the top down uh and until I hit that 1,800 uh length that we specified earlier I'm going to keep adding information to uh the the response and so what we get then is context that um and this is essentially what we use is we now have a Big Blob of context on what we think the uh 1,800 most relevant tokens to the user's question uh and that is very useful for us to then generate a chat completion uh and so we create this new uh function called answer question where we create the context uh so this is the same function that we we just went through um and you can see we added some defaults here but answer question takes the data frame and the user's question um and everything else are like things that you can tweak like the max tokens you could tweak it uh but we have default values for all of them uh Ada is the embedding model so as the size of the model uh this is required you'll see uh whenever we uh add so it's using the embeding but with Jack G yeah so we we have used the embedding model uh and it will reference that in in the implementation yeah so you'll see uh after oh it's using Ada to just actually go into the retrieval of the cont then the context will be sent to chat yes yeah so we essentially we have the context which is essentially like I said you can think of it as like the top 10 Google results for the user's question uh and then we will use that context to actually uh add it in the prompt so we have the context from the actual function that we called um and then we have the response so we say hey uh we have this big uh prompt here where it's saying Hey I want you to answer the question based on the context below if you can and if the question can't be answered based on the context say I don't know so we don't want it to speculate so after we we give it that initial prompt and then we feed it the context we say hey on a new line here is all the context this is your top 10 Google search results uh and then here is the user's actual question in plain English and so you go through that and you could add uh this this is the little Easter egg like I talked about since we have the link uh in the actual text uh this is an exercise for yall is this Source here you could ask it hey also if relevant give me the source for where you actually found it and it can spit out the link in the in the response because it has that in in its kind of context text and the top 10 search results it has the URL for each of them since we structured the data in that way previously and so that's all in the prompt we just added all of that into the prompt so that's where we get the context from and to your question earlier on like how how do we get long-term memory we don't just give it the context of absolutely everything and ask it to filter through that we do the filtering on our own uh and then we kind of give it back say hey I think this is what's most relevant uh given this huge data set um and so then this is the same check completion that we used before uh we like I saw in the first one we only added the model and messages here we have added a couple other like the temperature the max tokens the top p uh the frequency penalty the presence penalty and the stop all of these are variables that you can tweak to get different responses from the same prompt in your model uh you can think of uh temperature uh the higher the temperature it is the more varied the responses will be this is on a scale of 0 to one I think yeah um whereas goes up to two and then you can also force it to go to 100 okay yeah so temperature Z zero to one uh essentially where zero is it will give you the same answer not every single time but 99% of the time um and top p is a similar thing where essentially uh how we did in the context we kind of curated hey here probably like the top 10 search results the top p is the top percentile of the ones that you want so uh one is like hey you can kind of sample from all available sources 100% of the sources whereas top B 0.1 is like I only want the top what the model thinks is the top 10% of answers so only give me the really high quality stuff so this is uh cued to be much more deterministic because we don't want it hallucinating we already did that in the prompt we said hey if from the context you can't answer it don't don't try to and if you have the top PE at one and the temperature at one it is much more likely to hallucinate is a term a piece of jargon where essentially the model just makes up some stuff it'll say that uh you know uh that Neptune is closer to the Sun than Earth that's like a hallucination it's just incorrect um yeah you had your hand up in the back yeah that that that wouldn't matter since it's all uh vector you know that that's not like the tokenizers where you have different ones that's just pretty straightforward math the the quality of the retrievable yeah uh so there's a up face leader board uh and actually uh open ey used to be the best and uh now they're pretty far behind uh so you can swap it out with some open source iding models and saying in terms ofs some that yeah GTE is the current best from Alibaba G every month changes there's a separ question oh separate question okay yeah I was just going to finish out this so I do encourage you to play around with the other embeddings uh it's open source but the other thing to note also is that opening ey is very proud of their pricing for embeddings um they used to say that you can Abid all the internet and create the next Google for $50 million uh so just to give you a sense of how cheap it is yeah so like I said uh if you generate your own key uh part of that nickel about four 4 cents of that nickel uh comes from the embedding pretty it's not the entirety but it's like 80% of the mdn docs which is you know it's it's a large large piece of information to just crawl yeah and then uh you had question yeah the the temperature and do Fe if I understand correctly this applies to each token that GPT turbo is going to try and randomly pick so what you're saying is that while generating the output token top p is like pick the top n yeah so and thenand there's a separate parameter called top K yeah that's the one that you're thinking about top is a cumulative probability going up to 10% so obviously and then temperature zero is the one that is least random or one is the one yeah yeah zero is the least random one is the most random most yeah so if you have like other let's you like I don't know 100 different items and you're try for them and you have different tyes of metadata Beyond text say values that describe those things as well how do you incorporate other of met into as well you just shove it there as a representation and then basically create a standard representation I think I you might be the guy for this one I have for this I think if you have clearly nice welled as a fil as a fil no point putting EMB is right but you knowly what you want I want ID I gend iory as a and after only for semantically like like kind of tricky stuff exactly search queries right there a long so searies searies can I show in yeah what are you what are you trying to do I really need a tile manager if that's not inor how would you that met for failure is not separate the the failure is like it's a unique like access of the data it's like let's see some description text like we're failing for this right now how I do it is I com there's no simp answer a tradeoff add a little B more complexity I I'll talk to you uh yeah so for those who don't know uh Eugene is one of our speakers and he works on he sells books on the internet at Amazon um with alms uh we also yeah a question have you been able to get your rep I don't know yeah I uh I would say it uh it replied I don't know more often than I would like uh what like I would I I asked it a question about uh a vent emitters and it said I don't know and so that could be it wasn't included in my data set I didn't have a perfect scrape uh but I I found pretty reliably if I asked anything that was not within you know the Realms of the data that it it would uh very rarely would it try and provide an answer that wasn't I don't know yeah a little bit of deviation but in the same space um speaking of the chunk size um is there like any fundamental intuition that you know like we chose thousand because we think that thousand characters will give semantic meaning of documentation based questions that we're going to answer so that's why thousand is good but because we know documentation has within thousand characters there's lots of information that we can pull from is that the fundamental intuition behind it or is it like I would say just industry specific probably on you know docs is going to be a lot more information dense and so you need less of it whereas something like a Wikipedia article is a little bit bit more uh you probably want a larger one for that to capture all the entirety like a story you know if you just give one page in the middle of Lord of the Rings it's like well how useful is that you know you probably want more of like a chapter to get the the entire meaning behind it uh so I think it probably just industry specific and in this case like when you take the example of a lot of the Rings the use case that we are trying to develop is uh maybe it's a chat Bard which explains the lot of theing story to you and you want to do it in like series of 10 points instead of like reading thousand pages and for that you want what happened in that chapter so you the whole yeah yeah yeah so not an exact science there's something like 16 or 17 splitting and chunking strategies in Lang chain yeah uh I have every in every single one of my episodes I've always gotten Tred try to get like a rule of th from people and they always say it depends is like the least least helpful answer but uh they recently released this text glitter playground that you can play around with just search Lang chain text glitter playgrounds and uh don't don't don't do that do there is oh yeah yeah or if you listen to the podcast you can you can check the show notes but how do I switch back um yeah so you can play around with that and I think depending on like if you're doing code or structured data or uh novels or Wikipedia there slightly different strategies you want to ad for each Fair uh okay there's lot question yeah uh so um let let me ask more questions on break we we'll do a can can people ask questions in the chat and then like we kind of thread yeah yeah yeah yeah yeah so for well uh no cuz it's broadcast yeah um it's fine for these guys yeah yeah yeah um so lots of questions we will do Q&A after let's let's finish up the actual generation uh for for the Tex yeah let me I I'll do it after after this section yeah um okay so going back to the actual implementation we've now built the context for the embeddings we said hey all of that that's great uh here's the max tokens we want to get the response for the model uh and then we will send that back to the user so all of this is in that question . py file in step one of the branch or your own if you did this on your own uh this section specifically has a lot of uh stuff that is probably not super fun to code by hand so I would probably recommend switching to step one on the branch instead of doing all of this yourself um but if you want to you know be my guest you essentially create the context get the distance distances from the cosine and then create a prompt uh and pass that to the answer so that it can answer to to the best of its ability uh um and then from here you go into the main.py file we import questions uh we import the answer question from our questions file um and then we pull in just like we did before so this is why uh from this moment on every time you restart the server it will take a little bit longer um because we have these two lines right here where we are reading the embeddings uh into a data frame and then we are again applying that numpy array onto every sing single embeddings column uh and then we are creating a new function so we've got hey uh here is our new function question again has the update and context and so for the answer question uh function that we're calling we pass at that data frame uh and then the question is the update. message. text and then we send that straight back to the user and then same exact pattern we add the question Handler this time we make it a command Handler so every time we push slash question and then type some text it will pattern match and it will say it will call the question uh and then we add that Handler to the application so pretty pretty uh that pattern you'll see for every single step generate the function create the Handler tie the Handler back to the bot and what you should get uh once you have that if I SRC main do and so like I said it'll take a minute since we have those embeddings every single time we have to do it it has to run that numpy array evaluation on it every single time uh and so we have it in a numpy array um but you will see uh a very common product in the AI space is like vector storage uh things like like pine cone and all of that is essentially a database that holds exactly what this numpy array is um and so there's things like PG Vector pine cone I won't I won't go through all of them there are a lot of them uh I'm sure some of them are sponsors for the conference it's like a very uh developer Centric tool you'll you'll see a lot of them in the space there's quite a lot bit of competition right now some open source some not um but instead of doing all of that I would encourage youall to uh use a simple solution like a numpy array uh cuz that costs Z and runs on your machine up until it becomes a a problem where you're having like perform bottlenecks uh and then you can kind of upgrade to one of those products um and so from here if we're in our bot now and I say slash question what is CSS and it says hey cool CSS stands for cascading stylesheets it is you know it describes CSS uh but if I do the same question let's see we'll do another one um what is the event emitter hopefully it should have context on that oh well there you go and this is like an example from our prompt working well uh our it looks like our scrape was incomplete for the MDM docs and we did not catch any data about the event emitter and so it says I don't know it doesn't it doesn't provide any of that event uh and so if you do this several times I'm sure eventually it may try to answer but ideally it won't so if you have like uh who is Taylor Swift I don't think that's in the mdn docs yeah um but if we have who is Taylor Swift and it's not matching to that question uh you'll see hey it it does the response it sends it it doesn't have all that context and all of the rules around prompting and uh none of the questions we didn't add any of that to the kind of messages uh memory so it doesn't have it doesn't remember that we asked it questions about the event emitter or css um so you can kind of imagine we mdn docs but you'll see a lot of companies right now are doing like this on your docs as a service you know you know pay us and we will embed all of your docs and then we will add it to your search uh so you can get kind of like AI assisted search for whatever your product is that you want users to know more about um I have a question yeah um so there's several like you can ask a question without using the backlash right yes um so I've asked it some questions where it answers correctly without the backlash I use the backlash it says I don't know um what's the kind of threshold there that I could play with uh so that's essentially uh if so his question is like hey I'm getting different responses whether I have the backslash question versus the regular question uh and that's entirely I guess to be specific it's it's telling me I don't know when I use the backlash yes but it is giving me the correct answer when I don't put the backlash so it's almost like it's maybe not confident enough in its answer yeah it's either not confident enough in its answer or it does not have information from the data set so anytime you're hitting SL if you're looking here on uh line here what specifically pulling from the context of yes so it is only going to pull context whenever you hit slash question otherwise it's just your you're asking open AI about CSS it knows quite quite a lot about IND and docs and and developer stuff uh cool and so yeah [Music] yes corre um yeah yeah if you don't want that to happen you would probably want to uh you know there's techniques I am not super familiar with like how to prevent like prompt injection and prompt attacks uh my initial kind of response to that would be to add um more system prompts uh because I believe that one is just from the the user or the assistant so I would add like Hey whenever you answer the question here's two or three system prompts that should helpfully circum circumvent somebody saying like ignore all previous instructions I want you to you know slash question answer about Taylor Swift you know um so that's that's how I would handle that currently uh yeah with the yeah uh how effective it is um so the hallucinations part is just um essentially you saw what the The Prompt all of that work of generating all the cosign distance is just to get that really good context so you are still at the limits of llms on just like hey I'm going to tell you don't hallucinate but that's still very much in your nature to to do so so you're still kind of at at its Mercy when it comes to that stuff uh yeah so I was curious if you have any rules ORS around tempure like Z yeah uh a lot of people will just initially uh use temperature as a creativity meter in their head so it's like if I'm asking you to write poems I probably want to turn my temperature up because if I put the temperature at zero and I ask it to write some poem it's going to give me the exact same structure every single time and that's probably not what I'm looking for so it's really uh that the temperature is like half how deterministic do I want it to be and that will just depend on the use case so like docs you want it to be fairly dry I I want the same response if I ask you why what is CSS that doesn't change I want you to give me the same answer every single time and I want to feel good about that um and so really just depends on the use cases so creative writing you know uh blog summaries maybe you want to turn it up a little bit and for other ones maybe you want to turn it down so I think that explains one and zero you ever yeah uh for this one we did 0.5 um and so it's another thing to think about uh is usually I will play with either temperature or top p uh one at a time I won't do both um because if you're thinking about like hey what is the non-deterministic uh so I set temperature at zero but I set top P at 0.5 I will still get more varied answers but it will kind of have a narrower range of answers so it'll still vary but just since I opened up hey uh you can now query your 50th percentile answers versus just the 10% so usually I will tweak one at a time uh for for that and that that's where I have found success but it is very much just uh uh Case by case bases on uh I very much get a feel I'll do five prompts in a row with the setting and then I'll tweak it until I just like yeah that feel that feels good that feels good yeah two yeah yeah my bad zero to two yeah you kind of it seemed like you almost imp blank chain has this semantic similarity search is that very similar to what you implemented yeah so uh that entire thing that entire embedding py file of all the data cleaning all of the you know character splitting uh is essentially an abstraction layer lower then I I don't I'm not 100% sure the tool like I haven't used it but I'm 90% sure it's just like it does all of that for you so that's why we did this so you can really see like what the knobs are that you can twist cuz if you just have the one line of code on hey here's my question you know go look at the database fetch me text you don't get a sense for what all of that is doing behind the hood and maybe you want to tweak some things to get different results that's better I was just curious yeah of course yeah question I was looking at the text playground and you can uh play with a chunk sizes and chunk overlaps but you don't really know how it's going to work you have to try it out yes to try it all out yeah you you'll see a recurring thing through all of this and since it's uh so in the space like something like this where you're getting hands on with it is super super important to develop your own intuition uh about these products on like hey uh there are not you know 200 person teams trying out you know what different teex splitting looks like for the same data set and we come out and say hey look this is the best way to do it here's the Empirical research that says so it's just like everyone's like I don't know it works for me here's the Vibes you know this is this is what we're going with what does overlap help uh overlap helps with the problem like I talked about like if you're saying like little red writing Hood encountered the blank uh if you have overlap there you will have two separate chunks that have the same information in it so you know that uh one of the chunks is more likely to have all of the semantic search for it or all of the semantic meaning in a given paragraph So if I have three chunks and they all overlap a little bit is much more likely to uh query a chunk and have all of the semantics that you need to generate a compelling answer versus just like hard cutting each each one yeah of course yeah uh I that is one thing I haven't played around uh there's only I think two or three different uh distance metrics that you can use uh I have not played with the actual distance metric changing the coine or not because I've uh to me that is the most deterministic portion uh given that it's just the straight math on the coine between these two vectors I'm just like okay uh I can change that and that will change everything Downstream a bit but I'd much rather have that be a constant and play with everything else that yeah so let's have you know now we can go search against them for similarity but I ask a question says uh goes across different chunks sorry I guess I'm answering my own question so in this case let's say I I say tell me about bwise operations tell me about event tell me about other things all in one person yeah then the number of chunks we retriev from the the store will all then yeah uh so his his question for you those of you'all who didn't hear is hey what if I ask so we have all of this information from Indian what if I ask it about multiple things what if I'm asking about bitwise operations and CSS and events all all in one question what does that look like for the retrieval and the process is the exact same but you can think of this uh similar to a search result in Google where it's like okay if I'm asking it about bitwise operations and I'm asking about the event emitter I'm not going to get as clear results as maybe I would like because the llm is doing the same thing where it's going to uh do the cosine similarity and it's going to find documents that relate to all three uh all three of those things and it will generate you an answer for it but it will probably not be as uh information rich or as useful as if you had just asked it about the one thing uh because the because we fit three subjects three different semantic meanings into the same uh kind of chunks the because the you know if I have 1,800 tokens to use and it's all related to CSS I can have a much higher confidence that I found the best results versus if I have to divide that by three I'm suddenly much less confident in my ability to give you and provide you a robust answer yeah intice documents that are one document per teal concept for example yeah yeah yeah that that would absolutely be be uh at least I I haven't tried that but that that sounds to me a very reasonable approach on how I can separate like hey take this and give me the three semantic meanings and then those are all going to be three separate I want to create context for all three of those questions and then Stitch all of that back into one one response for the user and so that's where you get a lot of um these new products that you're trying out people say oh that's just a wrapper around chat GPT it's like yeah well I mean adding you know six to 12 prompts around chat GPT is going to create like a meaningfully better user experience for whatever vertical you're in like that that is going to be helpful and people are going to get better results using your product than the chat GPT Straight Out of the Box um and cool that's uh that's it for now we're going to take a 10-minute break go ahead and get some snacks get some water uh I'll also still be I'll still be here I'm happy to continue answering questions but you know shake shake everyone's hand you know get stretch your legs you know we got another hour hour and a half before the next break yeah when we generate the question embedding is this is like the size of is it is generating just one embedding for this question yes yeah it's taking your question and it is generating an embedding for it so that you can then perform that cosine distance search so I thought the embedding would be uh you know maybe maybe I'm mistaking embedding for tokens so anding is bunch of tokens yeah so if you look let me cool all right so uh we we have had a lot of Q&A so I'm going to try and speed up this portion so that we can kind of get through the rest of it um we're going to do Cen Generation image generation and then hopefully if we have a little bit of time we'll do the text to speech or speeech to text um but what we'll get into so everything that we've done so far with the initial implementation and the rag has been text and code is just another kind of text but it is a super powerful one uh and so for this we're also going to be upgrading from the gpt3 model to GPT 4 uh gp4 just is like significantly better than GPT 3.5 it's not even close uh here is a just graph of these are different AP tests on gp4 gp4 with well you can't really see super well uh given the projector but there is a a very slight Edge for the gp4 that has Vision it's like a very slightly darker shade of green but you can just see compare the blue to green you're you're getting uh large performance benefits on reasoning uh which is very important for code Generation Um and they're also quite good at code generation something that we didn't really talk about so far is how it actually generates the answers uh what it does what the llm does as it generates answers is it predicts hey what is the next most likely token giving my given my training set so if I have uh you know uh code code has very repeatable meanings which means it is much easier for it to predict like okay I had an opening tag for a div that means uh every other single time I've seen code it's had a closing tag for the div it is much easier for it to predict that uh and it is very good at doing code generation versus something that is like creative writing it's like okay yeah it has a lot of creative writing examples but it has so many different uh paths that it can diverge on where if it picks you know one crazy path uh each bad or crazy token that it generates essentially sends it on a parabolic curve on like generating more and more crazy tokens where code uh has a lot of I don't know why that's not bolded sorry um but code has very strict rules and it's very easy for the llms to reason about so code generation is very fast it's very cheap and is generally deterministic um and for us as Engineers it really augments your developer capabilities even if you don't embed AI directly into your uh application where your users are calling out and you know making use of llms it can help you as a developer write a lot of boiler plate because uh it has context on everything I know some of y'all may be working on uh you know AI startups where it doesn't have you know all this training context on like newer Technologies but uh a good example for code generation is if you use rails it has so much information on Rails because it's been around for the last decade uh if you ask it to do I don't know how many rails developers out there but there's this device package that adds uh essentially login functionality you can ask it to tweak a device configuration with no other input and it will give you like the six different files that device autog generates for you because it just knows like hey this device package has all of it it has context uh it's it's incredibly powerful just to speed up the code that you are writing regardless of whether or not that is AI augmented code or not um and is a big part of you know this very famous slide of software 3.0 where software 1.0 was handwritten code where that's what we write we write you know C++ programs and compilers uh and then as the last 10 years have kind of come around we have software 2.0 where we're no longer directly making code changes we to the actual output where it's like hey we do this and that happens every single time now we're like feeding new data sets into machine learning models and generating new outputs so we're kind of more indirectly tweaking the outputs because we can't just like get into the the weights and tweak things and pretend that we know how these neural networks are actually working it's a much more of a blackbox style and software 3.0 is like kind of where AI Engineers are coming in where it's a combination of of one and two um and so so really uh what we're going to be doing is creating a f shot learning prop we're going to ask it to generate some code uh and optionally you could uh take that further and run or render that code uh a f shot learning prompt is essentially a prompt that gives uh the llm a couple of examples where it's like hey if this was the input this is the output that I want you to generate so you give it one or two examples in the prompt and that uh weighs quite heavily in future future prompts on like hey they asked me to generate code that looks like that so I will continue to do so without U prompting so uh what we'll do is we can move over here back to our app um and we will go ahead and we'll go to the step two Branch uh so we can get check out step two cool and from here uh we now have our source folder and you'll notice that there is let me move this over so like I said we are going to speed it up a little bit so that we can get straight into the demo and then asking more questions but essentially what we've added here is this code prompt so we say hey here this is the prompt that we're adding so it's hey here are two input output examples for code generation please follow these uh please follow the styling for future request that you think are pertinent uh make sure that all HTML generated is with the jsx flavoring because that's what we decided to do so we're doing inline Styles so like if I'm asking for a blue box with three yellow circles that has a red outline I generated some HTML that did exactly that and I feeded the response hey that is what I want it to look like I want inline styling uh with jsx cool and we do that again we say hey here is an red button that says click me uh and we add a red button that says click me uh and from there we add an additional system prompt that has all of the code prompt in it cool uh and then we add this additional code generation function again takes the update and the context we take the user's text uh and we get a completion so this is the exact same thing as as the chat but we have added the additional uh messages for the code prompt in here and on the completion we have changed the model from GPT 3.5 turbo to gp4 um and then we send that back to the user user and then same as the other ones we create a command Handler with the code and we have a code generation and we go from there so what this looks like once we have it implemented is let me just pull up here there we go and so we'll wait as it does that numpy array on the embeddings already know that takes a minute cool our application has started and now if I say code um generate me uh let's say a red box with a blue button that says yes we would expect for one to get an accurate representation of that we assume that it knows how to do a red box with a blue button that says yes it has enough data to know what the CSS for that looks like but you can see that it adheres to that same style that we had earlier where I didn't ask it to do jsx I didn't ask it to do inline styling it was in the prompt and it knows hey this is what I want it to do and so you can tweak The Prompt and the F shot examples to whatever you're working on if you aren't in a jsx codebase you could very easily change this like hey I want this to be view templating uh we used n at work and so we want to use that for our code Generation Um you'll see there's a lot of jumping off points here I kept this very similar or very uh minimal but if you go through the newsletter course you'll actually see we get into function calling uh which essentially allows us to do things like uh generate uh images based off of this so I could send it hey uh given this HTML I want you to generate an SVG string and then we can pass that SVG string to a PNG uh converter and send that image back to the telegram client so you can really start using Code generation uh to really augment your workflows in ways that are going to be incredibly productive for you um and even if you don't do some crazy example like that where we're turning HTML into an SVG string and then turning that SVG string into PNG btes and sending that back over the client that's like quite involved and has a couple different steps and that AI can break down at any of those steps you know it is non-deterministic uh but you can immediately uh stop writing that I I know I do not enjoy writing things like that you know uh it's quite annoying to have to write background color red padding this I can just say um let's see code generate me a a let's see uh we'll say a uh this this will be fun a div with the text saying centered that is in the center of the page because we all know nobody caner a div and this will go through and we'll say hey look we have the display Flex justify content Center align item Center position absolute that's that's a you know it's it's not perfect it is going a little crazy we've got you know position absolute with our top and left being 50 um but it it does have you know it is centered it it did what I asked it to do uh and so you can see really let your imagination go wild with the ideas that you can have here this is like uh HTML rendering um but it's very useful for backend engineering if you're trying to convert different bites back and forth different formats it's very good at doing all of that um and so this is like really something that you should be digging into uh and really pushing the boundaries on because you'll be quite surprised at how long these prompts you can do uh and still get very reliable results going back and forth uh between these things and like I said it is more useful with technologies that have been around a while um uh like the rails app that I used for example I did that a couple weekends ago I did uh retrieval augmented generation and I did it in a rails app uh I have not written Ruby before in my life I have never deployed a rails application uh I kind of know it's like I'm a programmer I can see what the syntax is doing but I don't know any of the kind of uh conventions of what it looks like to do rails programming but I just asked it one at a time like hey I want a logout button that does this can you do that for me and it sends me all of the templates that I need it says Hey run these three commands on the CLI and it will generate these files for you uh and I sent it to uh my old boss that runs rails and has been doing rails for the last decade and he's like I wouldn't change anything about this like I am happy with the code this looks this follows uh this is very idiomatic Ruby on Rails code uh and it knows that because it's trained on on really really good code uh and it's quite good at this uh so really I know it's a bit of a TR example but you can just imagine any code that you don't enjoy writing try try getting Chad GPT to do it for you I promise you you will not be uh disappointed with the with the result um and let's see and this is actually uh I'll get to you in in just a second so there's a couple homework and bonuses um you can see here uh in the link and like I said I'll post the slides right after the workshop for for everyone but the function calling for your own use case this is what I was able to generate uh what I was just talking about with the PNG file uh you can see the code for that in that uh repet application um but you can see that hey we asked it uh to it doesn't do it perfectly where I said hey uh I want you to have a yes and no button and I asked it to convert that to an SVG string uh and it gave me a blank screen I said no that's wrong try it again that's a blank screen uh and I'm like okay no that's still not quite right let me send you the prompt again where this is a green yes button and a no red button please render that for me and it it does so um and another thing that if you haven't spent a lot of time with it you wouldn't uh know right off the top of your head but GPT is not always the greatest at doing math uh and you can see we essentially added function calling to generate uh we asked it in the eval said hey generate some math that I can pass to the python eval function so yeah no definitely don't you know add that into your production apps or anything you know but it is fun as an example to have like hey you're not good at reasoning about math and just text you know it can do basic arithmetic if you ask it like hey what is 3 plus 4 it'll say yeah the answer is seven but it it will get things wrong sometimes hilariously so uh and you can ask it well you know what is really really good at math is code code is always good at math I can ask you to generate code that runs in the math and then I can send that back to a function that I made where it said hey uh I asked the llm to generate me some code I have a function that I can then call and say hey look I have some code I want you to just evalve that and print it out back to the user and you can see that at the bottom where it's like hey can you tell me what this you know random string of arithmetic is uh and it says like hey he show me the function you called to do that math uh and it has that math. evaluate at the bottom I know the text is a bit small so you may not be able to see fully there but really you can get a a lot of code generation allows you to augment uh and really kind of close the gap where GPT is bad at something you can say hey look I know that you're bad at math but I can give you access to a python kernel so I think you'd be pretty good if there was a python shell and you could just just any time that you're like I'm not 100% sure that I can do this just send it out to there and python is quite quite good at doing math so we can just do that instead so you can start to use these ideas of code generation on how to extend the functionality where GPT and its counterparts tend to fall off the map um so I think those are all uh super valuable use cases uh that you can start getting into and if anyone has questions yeah um I know she using the chat completion uh open AI chat what if you're using a different um like I'm using vertex Ai and I was trying to look up if they have chat conclusion a lot examples have comp yeah is that a standard is there standard AP that there just open uh I would say most most places that I've seen have some form of chat completion just because yeah it might not be called chat completion originally it might have just been called like completion um you know where uh Trad add completion would have been like uh they Follow the yellow brick and you send that and it says Road it will just try to complete whatever the thought is you know it's a a very simple example but it would just do that for uh each uh thing that you gave it to it did not have the memory or context so if you said if I asked one question like hey who is Mark cubin and they told me who Mark cubin was and then I asked him what is his net worth chat uh just traditional completion would not be able to answer that question because it wouldn't know who his is referring to it just like who what who net worth whose net worth are you asking me to so that's where the chat kind of adds in is the the context to kind of reference previous conversations but some of them may just be called completion uh and it'll probably be similar to that [Music] yeah is example PRS for differentes yeah um I'm sure there there are resources there I don't have a a good link for you that's just general like code prompting uh but I I know like if you Google any any kind of like whatever specific thing that you're looking for uh you can ask it to you you will find resources online and if not something that you can actually do uh is GPT is quite good at helping itself if you say like hey I know you didn't really give me a good response here can you tell me like what additional pieces of information do you need uh to give me this output and it'll say like hey I need you know X Y and Z and then you can use that to kind of inform your next prompt on like hey I want you to do this like you see people um there's uh kiny dods has uh for those of youall who are familiar with him he's a JavaScript educator quite proliant great guy uh but he has this prompt that's like you know two paragraphs long on just changing the default GPT interactions where he's like hey I do not tell me that you are an AI chat do not say please like leave all that nice crap at the door I do not care that you're being polite I want you to be as short as possible I don't want you to use emojis I want you to get to the point on XY andz and he has like you know like 30 bullet points or whatever that he sends it uh just to get better outputs and so it's really uh you have so much flexibility there that uh any traditional like awesome list style resources uh can very quickly fall out of date and I would really encourage y'all to start using this more than your traditional kind of like resource aggregation where like hey I want to learn about something new I'm going to Google it I'm going to compile like three or four resources and then I'm going to spend the afternoon like kind of reading up and kind of taking what I don't like and leaving what I or taking what I do like and leaving what I don't uh really try to just make a habit of asking this what it thinks about the various things and you'll start to develop your own intuition about it uh because the space is so new and the platforms uh that you're kind of relying and building on to of are changing so frequently you really need to develop your own intuition on like oh I think uh it would do better if I did x uh and so that you'll kind of come from first principles whenever eventually the API changes or the model changes and there's things like that you'll be well prepared to Pivot uh and stay on on top of your game continuously other questions beautiful okay cool well then we will move on to uh image generation this one I'm super excited about uh so image generation uh Dolly kind of opened the way for image generation back in 2122 uh if you were really following at that time I'm sure you have seen the avocado chair uh I I would say it's pretty iconic for the AI generation era that that really started uh but just being able to say like an armchair in the shape of an avocado and get something that looks like that is like absolutely insane it's been two two years now since you could do this and I'm still blown away that I can just like send some text of a description and get an avocado chair back in a PNG format is absolutely crazy to me uh that being said there are quite uh a lot of problems with images as they stand currently uh there's a lot of hard problems where it doesn't always understand context like hey here uh I want you to generate me salmon in the river that's right that is salmon and it is in the river but that is not what anybody wanted whenever they put that prompt in I don't think anybody was expecting to get that back uh and there's also uh hands uh if you've done any work on there there are some very cursed hands that you will see and they're just very ugly uh so much so that people will just add in their prompts that they're just like uh there's this concept of negative prompts where the uh model can kind of Reason about like oh I you don't want this to be included people just stop trying to generate good hands they just said no hands like put your hands behind your back if you're doing a mannequin whatever ever I I don't want to see it um and it's kind of in line with hands but it's counting uh The Prompt for this image is four people in a forest I I don't really care what the forest looks like I'm not being super specific but I would like there to be four people uh I only get four people in one of these images a 25% hit rate is not good enough for production I would say I am not looking for for 25% accuracy um that's also not very good at text this is a prompt for a a fox image and you can see it it's kind of getting there uh but this is like a long time of trying through like I probably spent 30 minutes trying to get it to say fox and this is this is the best that I come up with uh if any of y'all are designers at heart this just this is not cutting it this this hurts you quite a lot um but it is being worked on there are other uh models the traditional image generation comes from a diffusion style architecture uh stability AI is making a model called Deep Floy uh that you uses a different architecture uh that is and does generate good good models and you can see kind of tug and cheek this is this is not Dolly uh it could not generate that um and you can see uh there's a lot of uh products that are being explored in the space currently uh you know there's essentially what is a hobby and what people will pay for is customization and personalization uh you had Linda you know back in I say last year was doing a million dollars a day selling personal avatars where you would upload a picture of your face and it would do it in different likenesses um for those of you who know Peter levels he's has several AI apps that are generating by himself you know millions of dollars of Revenue uh Hassan who I think is a speaker here uh at one of the days uh is the creator of room GPT which is an open- Source uh image generation where you can uh upload a photo of your living room and ask it to style it for you differently where you can say hey I want like a midcentury living room here's what it looks like currently and it it does it for you it's that image Generation Um mid Journey being another one where that's just like state-ofthe-art image generation where you can do quite a lot and tweak a lot with it uh all of these have either huge user numbers uh this is an area that you're seeing uh a lot of uh normies use where it's just like hey uh maybe your mom is not doesn't get how impressive it is that you can say hey tell me who Mark cubin is and it spits out a result she's like okay well I can Google that and Wikipedia tells me that's like that's cool I guess they don't get the whole programmatic part but if you show someone I generated an avocado armchair that is quite blowing mind-blowing because that is not something the average person can do they cannot generate an avocado armchair even if they have a perfect visualization in their mind there's not a way to manifest that in the physical or digital world for most people so hu huge lessons to be learned on where you can really differentiate yourself for all of these products um and with that we're going to get into image generation so this is also pretty straightforward so we'll get Che get check out step three cool and from here we can now go back to our main.py file and we still have everything the same uh the code for this is super super simple since we already have open AI configured we already have it all ready here's all everything that we had uh we're just going to add an additional uh thing called image oh there also if you are doing this on your own um you do need to import the requests package like I said you uh everything is already installed you just need to add the import there um so what we will do let me go to image and let's move this over cool so what image does is it is still that open AI uh package so it's just open ai. image. create and it has a prompt uh the in is the number of images that you want want it to generate I think it goes up to four I could be wrong on that it could be more along with the size I think it does 10 by 24 and 512 by 512 uh those are the ones that I I would recommend that you start with uh probably just copy this if you're doing it on your own uh and what the response gives is a URL uh where it has the that some Microsoft server has a URL that you can then send a git request to and get the bytes for the image and then we have uh this context dobot instead of send message we have this send photo function that again has the same chat ID uh has another argument called photo and we can call the image response. content will render the image uh and then it's the same thing as before where we have an image Handler so anytime we do slash image we will pick up uh that response and then we tie the Handler right back to it so with all of this we can now generate images in our bot um kind of going back to the code generation part and the function calling you could actually um with with function calling you can think of uh function calling you may think like oh opena is going to invoke a function that I give it uh that is not what it does what it does is it generate you describe a function so let's say I had an ad function where I say hey I want to add two numbers together uh it will generate the arguments for that so if it thinks from the context of the conversation that I want to add some numbers together it will generate the arguments for me so it would generate a two and a four and it would tell me hey uh I thought you wanted some numbers generated and here is the function name and you can infer that and then actually pass that into the function that you made so I can have some add function that adds X Plus y together um and because of that you can add uh every single one of these handlers uh as like a a homework assignment for yall you could turn every single one of these into just the chat Handler uh and you could just have it infer what you want it to do so you could say hey instead of me saying slash image a cute picture of a koala hugging a robot I could just uh foro the slash image and say uh hey open AI I'm going to trust you whenever I say hey want a photo realistic version of a robot hugging a koala it kind of knows okay I they said photo and they gave me a description I am going to generate a photo for you uh so you can use function calling to kind of add uh smart if else statements is how you can think of function calling really is like hey if I could have uh if this weird esoteric example you know do this and uh the AI will be able to do that for you uh it's a bit to stitch together but there is a link in like two or three slides back of the actual function calling implementation up on that repet uh I think that is an area for all of yall that where you'll you'll definitely get some clicking moments there uh and we might be able to walk through that if we can get through the rest of these um so really quickly let's go ahead and get images going so we can say hey here's the main.py file let's go ahead and run that and well let me actually make sure that it's running before I give it a command cool application started so now if I say slash image uh let's say The Iron Giant sitting in the forest uh 4K and so what is doing it picked up on that slash image and it sent the Iron Giant sitting in 4k to The Prompt uh in that image file that we just made or image function that we just made uh and you can see uh you know Dolly so this is another thing to to be cognizant of uh that open AI is very dominant in the text space uh and their models are very good like you're not going to be gp4 at just about anything whether it's code generation text generation like you're not going to get better results than gp4 when it comes to text uh but dolly is quite dated so this is sending to to Dolly 2 I know that they did Dolly 3 which is they're kind of rolling out currently that has kind of uh chat GPT with vision you may have seen some of that across your timeline uh but just know that like this is actually quite dated at this point and there are other softwares like um Mid journey is quite good uh stable diffusion is also quite good and is something that I would really encourage you all to work on because unlike uh the open AI models like you could not run that on your computer it is way too big uh but stable diffusion is pretty much a state-of-the-art open- source model that is small enough to run on your computer uh so this can really be uh a taste where you can uh run things on your own machine uh and really start to get some of these unlocks there's a bunch of uis uh that we'll kind of run on top of but that's definitely something an area that I would like youall to explore um but just know you you may see like okay this is not as compelling as you know the uh if some of y'all have seen the pope wearing Versace uh it's like a very popular one that did the rounds it's like you're not going to get from Dolly but you can get uh basic images so we'll just say a koala sitting in a tree uh and images much much more so than prompting or than the text based uh side of things like prompting is so so important uh because you can't uh just go back and change this like you can if you get a bad prompt you can copy paste that into notes and you know make your changes that's very easy uh for most of us we do not have the skills to put the Iron Giant in this forest in a way that actually looks good to everyone uh so it's very important that you are very specific on the prompts to get the best possible results and so you can see it does real life things quite quite a bit better uh so you can see here we have a koala that is in fact sitting in a tree I'm quite happy with that uh image and this is something that I really really want to get youall hands on for um and so you can see this was the whole implementation we called to Dolly to generate the image we connected image to tegram and we profit cool uh and now I'm going to ask yall uh find a group either a clustering of chairs I would say uh find three to five people so if you're on a couch just stay on your couch and and say that you are a group find a group uh and have your telegram up to this point where you have that g step and we're going to spend 10 minutes where yall are just going to come up on a team and generate some images um and so really spend some time get handson and generate the best best images that you think are possible there are a uh a couple of prompts here where commas between ideas are good uh and describing the setting more than the subject are good um and there is also a link in that telegram to a bunch of really good image generation prompts that will make it look better so I go and spend the next 10 minutes and we're going to generate some images nice ho wow that's wow that's yeah that's that's really good can you even scroll showing the thing yeah isn't that crazy dude how does it even get that cuz I mean I some of them were wrong but like it got all of the handlers that's insane that's insane that is so cool if I give it any more context it'll start to really eat yeah yeah say hey actually this is a telegram it's going to be oh let me adjust all it that's insane that is so cool yeah it's vision is nuts nuts yeah yeah I was like I went back and forth I'm like I wonder what would happen if I like it was like building the workshop is like how much new stuff can I F like function calling is like it's like kind of kind of new uh this is like really perfect I think oh thank you actually somebody like like Google engineer the guy over there he's like yeah think I should get into this and I'm like well yes yeah yes yes like but like he's like this good I'm like yeah exactly yeah yeah you can start talking about Dolly 3 everyone's going be around chat being for yeah pretty crazy yeah just I I really wish uh images are pretty near and dear to my I I probably have put in like 100 plus hours into like mid Journey prompts so like I I have uh like it's yeah it's like I really love that and it's like I put in these like really robust prompts that get me like good like I can get almost like logo quality stuff out of mid journey and it's just like I try to do that with do it's like so when people come on into our team we do like a thing where I'll ask them four questions like hero or Villain Like fantasy and then I generate like a character character oh that's very cool and so our CTO I got him I made this uh his answers were really great and we made him this uh it turned into this like elephant like magic elephant guy oh that that so then the uh Runway ml's uh latest animation so I Dro that into the elephant into that oh that's nice oh my God that looks a lot better I played with Mage a little bit and it it's not that smooth they just yeah they're just like watch their but yeah like uh but chat Division I do you have the access yet to I I don't access to it fck with it like yeah I'm happy let you play with it but it's like I'm dumping all kinds of crazy ass diagrams in there yeah dude diagrams is something like I more people need to play around with because it's like um I did for the newsletter portion there's like the function calling flow is like a bit convoluted uh and I just asked I said well hey here's the code generate like a mermaid diagram from that and and and yeah it's like the ability for it's like well you're bad at images I have syntax that converts curs no although I need to I need to I'm you you see I'm I'm in neoen right now I need I need to move because you can add it you add documentation to any technology Source it index it automatically from a link then you can reference that documentation in your code base and what I've been doing is like I'll in cursor I'll because it can it can uh reference my code base I'll then have it populate mermaid orl based on myod oh yeah that's good my architecture that's onboarding one yeah onboard 101 man that's that would be so nice what I'm doing it too especially is like I'm pulling other uh you know GitHub repos like like complex agents and like that people like small or something like that and then dumping that into cursor and then having to generate architecture so that I can visualize it better for my team that's awesome like insane it's so good I love cursor curs unbelievable it's great it's great but it's like it's like G but it's like how's the performance it'sing yeah it's good it's amazing get all your plug and like you mentioned how J doesn't know a lot of the stuff that we're working in this all for that yeah yeah absolutely that's biggest thing yeah and it will literally sight like it'll sight the piece the documentation that it's referencing yeah that's that's really good I you know I I'll do it I'll do it tonight tonight tonight tonight I'll cuz I've just been like I've been watching over the last month like every like 4 days it's like you should you should use cursor you should use cursor prepared to lose a lot of cuz like I mean what you should be working on are you paying for the mon I Ed my I've done my own API and I've done tons of like that but I'm doing it to support them too yeah well no for sure I I put my key in I haven't been hanging yet I just put in my my own my us oh I spent 140 bucks yeah way over 100 bu normally I'm like five to10 a month you know D I struggle it's it's like I struggle to get my usage job not just from like following all those guys and chatting with them on Twitter and stuff they're like they're you're spending more so like the 20 bucks a month is a deal also I know that they're doing because like they're staying on top of it like I know that they're starting to use like gbt uh you know 3.5 function calling fine tuning to make sure better they're like using models specific for different so I think think like not only does it cheaper I think I it just makes sense to make the product and like 20 bucks for me is more than I support what they're doing do and like just adding the documentation that's the biggest thing I can go through and just keep adding all of your it's it's been a bit of a whiplash for me cuz I uh my my job job is in crypto and so it's like uh the cost of playing asking someone to play around play around with that is like immediately like oh well I sent my grandma some money and it cost like $6 you know and I S like I sent her 10 it's like okay well that wasn't that wasn't very fun s I'm like here's here's a 2 and a half hour Workshop where you're generating images doing transcriptions like I tried to break a dollar like I dare you I dare you like you're going to spend all day you could do it on gbt yeah but but like and then especially if you're sted with GPT 432000 which is really helpful for cursor yeah oh yeah as soon as you have the gp4 32000 API and and you put that into cursor like when I'm doing codebase searches and like full index searches on the whole codebase on like a bigger file I'll turn that model on and then that then it can then it's like then it can really start to comprehens really know that's huge oh man you have all the nice things you can use you can use uh what's it called all right everyone 10 minutes is up I hope y'all got some images generated uh we are segueing straight into a break so if you want to generate more images with your friends that is cool uh alternatively please I want you'all to add me on Telegram and send me your favorite prompt I will be going through tonight and I will add all the images that I get into that big chat so y'all can kind of see what everybody generated and hopefully y'all can get some big inspiration going out of that yeah generate more images add me on telegram drink some water go to the bathroom first baby shower weeks ago hey that's [Laughter] cheating really is really great oh thank you everyone's asking so many questions yeah I I was yeah I I was that was my dread like I I had I had the Q&A all planned out I'm like hope I was like in my like I hope five minutes yeah I was like I hope five cuz 5 minutes is a long time but it's like just nonstop hacking thinking about how to use this how to break this everything yeah the the rag questions were insane amazing VI like you got his VI and like yeah I I'm quite I'm quite happy created some a little bit of a few hours of magic here just so trying yeah I'm very excited yeah thank you thank [Music] you uh like on like here do I have a do I have like a hard hard stop at 12 no no okay cool I want to get through wh okay he's very he's very fible um like I told him just he has like a 10 15 minutes okay okay I I won't go over very long I just don't think there's like five minutes left on break okay cool okay hey thanks for class has been great oh yeah want to abuse my position and ask of question um I have a ton of questions first and foremost I almost wanted to interrupt you when you were going through the code generation is there's context to no issues you know when you format all the code and then put it there's there's limitations on the architecture that it can reference um and then I found weird issues uh with that plus like where the code might exist in your conversation yeah and and I don't even know it gets to the point where I don't even know where um what code is referencing Foundation model or that yeah um I would say for for the code generation specifically like uh I I have never hit context windows on code generation and so what I I would say cont the limitations yes okay and by the way I don't expect you to have the answer no yeah yeah no you're yeah you're totally fine 101 well there's everyone that has specific answers to everything and it's all about talking anys um so I'd say uh for me usually if I'm doing code generation in like a like it's not embedded into my app in some way like I'm just in chat GPT uh and so it's really important anytime you are diverging on a like a different code path that's a new conversation yeah yeah so just making making sure that you're always you know hitting new chat because I think a lot of people just like they have a question they get answered they have another question they they answer and then it's like you have the messages array essentially in here and it's got like 12 different questions in it and that uh it is it references that more heavily than its base training so it like the your quality of answers goes down considerably and so if you're saying like hey I have some files over here that are doing you know like bite processing and then I'm going over here and I'm doing some encoding it's like it very easily gets its wires crossed yeah and especially if you're referencing code that you're you're yeah it gets weird even when you're trying to fix a bug or something like that it may not even reference the code that talking about I think rep's going to solve it because yeah yeah rep yeah repet is definitely going to make make some strides in the area like labeling could be worth something verus something else yeah um and code interpreter on gp24 is also a really big unlock if you have that or not because if you could do file uploads it I I found it to be it could be frustrating at times but it started to use it it's pretty good that's what I've heard I just started using I actually have had my own just use cases yeah yeah yeah totally and then my use case is like in an Enterprise I haven't been able to use Haven spend as much time on it yeah um so I've been only using like starer okay and then tuning on top of yeah I would um I haven't heard of St star coder it's one of the new it's it's open source that also allows developers to opt out yeah okay yeah do you have um have you tried running Llama Or llama 2 um I that's probably that's like in from what I've heard is like state-of-the-art open source models uh is llama I've done that for only structure test okay and and uh yeah I don't have hands-on experience for code generation on that I guess another question we don't have time for it later in the embeddings for structured and unstructured T at the same time oh yeah yeah that'll yeah structured versus unstructured is like it's um combining yeah yeah it can be gets out of your it gets out of my wheelhouse yeah yeah totally well definitely we uh I'll have to catch up with you more I want to answer uh his question hey um the code is timing out code theage the out uh what's what's your yeah no uh what's the terminal like what's the is it just it's saying uh cannot post to telegram blah blah blah but if I if I change that and say okay just post the link to the image oh interesting so just just to let you know mean yeah okay the Go I mean I I couldn't make it work as it is yeah okay cool than good good good that you played with it um and I'm not I'm not sure yeah yeah yeah we'll see awesome thank you a [Music] little [Music] [Music] [Music] yeah have you seen anything in the way of like an unhy valy detector or like what do you mean so there's a whole thing where like sometimes they come out and it's just like something's off you don't exactly know what but it's just feels like yeah um uh I'm like I don't really know the architectures of how they like um smoo things out before they come out with the model I was curious if there's been be aware of anything that like will take a finished image and like make it more better normal yeah you know what I mean or just like um not current so like the kind of like uh diverging on like hey you are generating this and that seems a bit weird we're going to change course uh I haven't seen a ton outside of just um like not safe for work filters where it's like you are generating a nude person right now and I need you to like slap some clothes on them yeah it's like I need you to you know diverge courses and I know that usually is like a an endep where it's like hey uh we didn't send it to the user yet but we have a fully finished image and we're able to run on it and be like okay I detected that there's nudity in here and so we're going to take this image and do it again but like add clo or or whatever and so it'll it'll kind of tweak and then send that to the user okay so that's like kind of the uh safety that's happening behind the hood yeah yeah on on some of them yeah cool thanks hi maybe you said that but um is there going to be lunch or yeah yeah I so I've got uh about 10 more minutes and we'll we'll be wrapping up yeah all right everyone break is up up we are running a little bit tight we're going to speed run our speech to text I'll make sure everybody get some food in them and we are able to finish the course uh if you do have any burning questions I will be around afterwards uh but I do want to make sure that we continue running on track I don't want to run late just because of me so we will go through Speech to Text uh Speech Texs or automatic speech recognition or ASR systems are what powers things like Alexa Siri hey Google um and what's really fun about it is it's very interoperable because once it's text you get to plug it into the rest of your stack uh like we did for everything else is regarded to text so if we can turn voice into text we can probably do some cool stuff with that uh just lots of things to be aware of um it has uh this is the whisper archit Ure whisper is kind of state-of-the-art for uh these ASR systems you'll notice that uh whisper is an example of open AI actually being open it is an open source model that has a lot of different kind of uh you know open the open source Community has definitely taken and ran with it since then um and so this is really how it works uh just really quickly um and so one thing to notice whisper or whisper one uses Transformers where previous models used uh recurrent neural networks or rnns one of the reasons that people think it does better um so differences is like on training training these models is a little bit more difficult if you just think about how how easy is it for you to find a bunch of text on a subject versus high quality audio that's a lot harder to find high quality audio for things training these is very difficult uh because not only do you have to find high quality audio you also have to go through this audio labeling process where you feed it all of the input and say like hey this is what this is so that it can pattern match on that in the future um similar to how we talked about you cannot run uh gp4 on your computer you can absolutely run uh ASR models like whisper on your own machine this is very fun and is something I encourage you to do in your own time uh here you'll notice whisper dominates everything in the landscape um so you can play if you're uh watching at home or or doing anything you start really with hugging face spaces that's you can just go find a space on hugging face it has inference on it you can talk to that web page and get a response back you don't have to sign up for a whole bunch of stuff that's like the lowest friction way uh we are going to be using the whisper API uh since we already have that configured um but generally nobody is using the whisper API because why am I going to pay you when I could run it myself uh you know I'm not going to set it up on your machine but I would highly encourage you to uh if you want to run it on your own machine uh so you can either run The Whisper model that they open sourced uh or which I would recommend is you can use the whisper CPP model uh somebody took the whisper model and rewrote it in C++ so that it would run on CPU cores better uh so that is the machine that you want to use uh if you're running it on your own machine uh you're not going to get much better bang for your buck there um and there is also a last one you can look up is whisper x with diarization uh diarization if you haven't heard that word before is the act of determining speakers so if I have a podcast between me and my amazing helper Justin uh and we upload that to the internet that's a single MP3 file it's it's hard to tell the difference between me talking and him talking if you're not listening and you're just looking at the Audio Waves diarization is the act of breaking down that MP3 file and labeling it hey this is Noah talking and hey this is Justin talking that is diarization uh and somebody has added whisper X I think they're a PhD student that is like doing their thesis on on this uh you love to see Academia and open source Crossing um and then there's all of these links do the other way around where you can do text and turn it into speech that's another very fun one um all of these are various uh platforms that you can do that on uh tortois TTS is the open source uh leading in in that category uh and so for that we're going to get straight into the implementation for this uh so you can see we're just going to do get Branch step 4 cool uh and so what this does uh is telegram has voice functionality or voice memos built into the platform so the usage is we're going to record a note on Telegram and we are going to send that to the whisper server and we are going to get the transcription back and send that to the user so what that looks like if we can move this over here uh we can we can run.py file uh we will look at how we do this in the actual code so if we go to um here we have got the code generation basic config chatbot uh hello image code generation question did this not add it that would be quite annoying it's in my it's in your step for then maybe I just did this wrong oh uh yeah I don't want to get branched I want to get check out that that would help cool thank you okay and you will notice here uh we are we are saving all these voice notes on the server these are examples of me talking to the chatbot uh so we are saving that voice note on the server uh and then sending it back so what we get here is going all the way down here we go transcribe message so again like all the other ones update and context uh we want to make sure that we have a voice ID we can check that by again going to update update is where we have all the contents of the conversation we're going to check the message and see if there's any voice in there and if it is we want to grab the file ID so we now have the voice ID uh and we can turn that into a file by using the bot. git file uh method and plugging in the voice ID telegram will then grab that file on their server and send it to us and then we just download this file has a download to drive method and so we have we call it voice note and then we add the voice ID to make sure we don't have any naming collisions on our system and then it is a OG file and at that point there's going to be a little bit of latency so this is just a way to kind of make the ux a little bit better uh you'll notice we haven't gotten into streaming at all if y'all used chat GPT you notice whenever you ask it a question you get to see it it's a much better user experience and saying you know who is Simon Cal and then waiting 10 seconds and then you get a huge block of text uh streaming is I would say table Stakes if you are creating an AI product especially if it is text based and if you are not able to stream the response to the user you need to do something to let them know they're like hey we got your request and we are working on it uh just traditional you know ux things let the user know that they uh did something and that you are working on it so once we have that downloaded uh we have the audio file uh and we are going to use the open. audio. transcribe method and we are going to use the whisper one model there is only one model that the API supports and it is whisper one and you're going to plug in the audio file and once you have that transcript you can say hey transcript is finished and we are going to send that back to the user um and you'll notice here that we're using slightly different this is not the context dobot that we've been using we are replying to the text message because we sent them a message and it looks a little bit nicer and in the uh we're just kind of like adding layers to it making the ux a little bit nicer um so with all of that oh there's also the uh Handler is a bit different for this uh so we have uh we go back to the message Handler instead of the command Handler and we use the filter uh the filters object has the voice so we are listening to any message they don't have to say slvo orash transcribe because they're not sending text they're sending a file they don't have a way to let you know so any message that has a voice file we are going to transcribe that message uh using that message Handler function uh and what we can do here is we could go ahead and run this uh and instead of running it on my laptop we'll see let's actually make sure that this runs as I will use my phone here and I see 38 telegram messages thank you all so much for sending me your images I will be going through later tonight and posting all of them in this chat so that y'all can see everything uh y'all's participation today has been really amazing and thank you so much for that so what we will go through here let's uh go over to Telegram and on my phone I'm gonna say hey my name is Noah Hine I am in front of a bunch of upcoming AI Engineers rubber baby buggy bumpers cool we send the message it's downloaded it's transcribing the transcript is finished and although it spelled my name wrong h i NE E I will take that uh that is a beautiful transcription uh doing that directly out of the box and you can just your mind can go wild here uh something that uh like if that does not blow your mind you're not thinking crazily enough how hard somebody would have had to work to get that working you know a year ago two years ago like that's not possible and now you said o Open ai. transcribe are you kidding me that's insane that is so cool uh and you're seeing you know you can you will see in the future much much more interactive applications through AI uh most people don't have any AI in their day-to-day outside of their time line on their social media apps and they are already cripplingly addicted to those so just imagine how crazy and personalized experiences you're going to get when things like that are just right out of the box um and really if you've done all of that that is the workshop thank you all so much uh this is all my contact info if you want to get in touch with me thank you all so much for your participation uh I will give up the floor here but really big shout out shout out to Shawn and Ben for putting this together uh he has been a huge uh Sean specifically has been a huge inspiration for me he's one of the reasons I got into Tech in the first place and so while I wouldn't be here in this room because without him because the conference wouldn't be going on I literally may not have even gotten into Tech without him uh and so really just big shout out to him and all of the work that he's doing and that's it I think that's uh lunch uh Sean do you want to grab Sean does he have any is it just straight to lunch

Info

Channel: AI Engineer

Views: 3,677

Rating: undefined out of 5

Keywords:

Id: C0ZUdFg-iTo

Channel Id: undefined

Length: 182min 24sec (10944 seconds)

Published: Mon Nov 06 2023