Live Coding - LangChain Agents for Pandas

Video Statistics and Information

Captions
Hopefully you can hear me and see me and all that. It is June 27th, the year is 2023, and we're here to do some live coding and hang out with our friends tonight. Hope you're doing well. It's been a little while since I last streamed, and I figured I'd let you all choose what we work on tonight. First we need to mute this... all right. If you're not already with us on Twitch, you should join there, because that's where you can chat with us — I'll put the link to the Twitch stream in the chat. I'm also going to make a poll with the options for what we'll work on tonight; it's completely up to you.

Transformers? Someone wants Transformers. Let me know also if the sound is okay and everything. How's everyone's week going so far? "Hi, good morning from Vietnam" — welcome, ducknh, welcome to the stream, I hope you're doing well. I'm going to switch the scene over here — we don't need to see this nonsense — that's my camera working, and now let's make the poll. I want you all to decide what we work on tonight.

We could look a little more into H2O GPT and test it out on my local machine — I have a video I've put together that should hopefully come out tomorrow where I test H2O GPT and show you how to install it on your own machine. We could also look into LangChain: I'm interested in it but completely new to it, so I'd love to check it out on stream and learn as we go. Or we could do something totally different, like create our own dataset.

"Any experience with Mojo?" No — hasn't Mojo only just come out? I have no experience with it; I listened to a podcast with the Mojo creator, but that's about it. "Any idea how to approach the Higgs ML challenge?" I don't know what the Higgs ML challenge is either. "Good night from Colombia" — good night, thanks for watching. Is that a new Kaggle competition? Wait, this one is from nine years ago — I don't think this is the Higgs challenge you're talking about, right?

Okay, any other ideas? We could create our own dataset, or just do some data wrangling with a dataset we already have on Kaggle, something like that. It's up to you, guys and gals. We have five minutes on the clock; I'm going to start the poll and we'll watch the results as it runs. If you're not already on Twitch, join us in the Twitch chat so you can vote. "Vectorize something" — what does that mean? Just take some code and vectorize it? Look at all these tabs I've opened — I need to chill out on the tabs; too many tabs for one man.

We've got LangChain and H2O GPT tied for first. "Would you recommend some good data science projects for a final-year project?" Yes, but you've got to tell me what you're interested in — it all depends on what you're excited about. Give me something to start with and we can talk from there, because I'm sure we can find some really cool stuff you'd want to work on.
Maybe that could even be a dataset we look into. Okay, three people have voted — come on, join us over on Twitch and vote. All you have to do is click the link (for you it will look a little different) and then vote. Where is the voting? Expand the chat, find "What are we working on?", and you get to choose. LangChain — I'll give it a vote just to add my opinion, but it looks like H2O GPT and LangChain are tied.

While this is going on (hmm, maybe this isn't reloading), I want to show you something: gpt.h2o.ai. It's pretty cool — they just modified this version of H2O GPT so you can ask a single question and have all these different models give you their answers. I'll put the link in the chat. So we have the Falcon 7-billion-parameter model. The Falcon 7B was actually trained on more data than the 40-billion-parameter model, because the bigger the model, the longer it takes to train. It depends what you're asking, but for more complex things I think the 40B model works well. If you don't know about the Falcon models: these LLMs were released completely open source, and then H2O and others have fine-tuned them — I think the H2O ones are the best. Then this MPT 30-billion model came out, which is supposed to be pretty good, and I think it also has a really long... what's it called... input window, max sequence length. Here it says the model was initially trained on a sequence length of 2048, and an additional pre-training phase was included for sequence-length adaptation to 8192 — a context length that long is kind of interesting. Then we have Vicuna, and H2O's LLaMA-based model — a 65-billion-parameter model, but that one is research-only, which kind of changes things. And then GPT-3.5 — we all know ChatGPT. Context window — yeah, that's the word I was trying to think of.

"Hey Rob, long time no see." Hey, what's up, welcome to the chat. "Can you take a look at that nine-year-old Higgs competition? I was working on it for a project." Maybe we'll look at it — it's nine years old, though; aren't there people who have already answered it? Meanwhile the voting is coming in: a lot of votes for LangChain, so we might check that out.

Let's ask these models some questions first. The crazy thing about large language models is that to really test them, you have to think of something novel they may never have seen. There's a common style of question where you try to get the model to reason about what other people are thinking — something like: Billy is in a room with a cat and a box. He puts the cat in the box and leaves the room. Sally comes in and moves the cat from the box to under a sheet. When Billy returns, where does he think the cat is? I don't know if I'm even phrasing that right. Now we can see each model's answer. One says Billy thinks
the cat is under the sheet — but wait, maybe Billy can see it under the sheet; maybe that's the trick. Another says Billy thinks the cat is still in the box. So the 40-billion-parameter Falcon model got it; the 7B didn't. "Billy thinks the cat is in the box because that's where he left it" — Mosaic got it. "When Billy returns to the room, he will likely think the cat is still in the box, as he left it there" — Vicuna got it. So everyone got it except — sorry, I was talking you up — Falcon 7B, you didn't know it.

Other questions we can ask are simple lookups, like: what is the tallest building in South Carolina? Be specific. The answers: Columbia Tower, the Bank of America Corporate Center... What's up with MPT-Instruct being so short and sweet? "South Carolina State House, 180 feet" — that's nowhere close to the others, and they're all giving different heights. Let's actually look up the tallest building in South Carolina... okay, the search result gives yet another building and height, so all of these look wrong — even GPT-3.5's — or maybe the search result is wrong, I don't know. What do you guys think?

All right, let's go to the voting, and we can see that LangChain has won. I know very little about LangChain, so we're going to work on it together as we go. I'm going to go into my repositories directory and then into our Twitch stream projects. This is something I'm interested in because I know H2O GPT has LangChain integration, so it could be something we end up working with. I do have this LangChain dataframe-agent notebook I was poking at before. I'll go up one directory, activate my Kaggle 2023 conda environment, and open up JupyterLab. I like my JupyterLab — don't make fun of me for using it, it's my fave. We can load up VS Code in a bit, but not right now.

So the vote came in around 62% for LangChain. Let's open a new notebook and call it "LangChain introduction", and we're going to get groovin'. Is my chat working, or is just no one saying anything? Yep, no one's saying anything — that's okay, I'm having a good time. In this other window I need to check whether I've installed LangChain already, so let's go to the LangChain documentation. "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an API, but will also: (1) be data-aware — connect a language model to other sources of data — and (2) be agentic — allow a language model to interact with its environment." Maybe we can jump ahead and see an example, because I thought the best way for me to understand it was this dataframe agent.

It's cool because they're trying to make agents where you tell the model to write certain Python code, but then it actually calls an interpreter and tries to run it, and it will loop a few times, testing whether the code works. So it's like taking Copilot to the next level — you're actually using the model's output and checking what it does.

"Do you know how to make markdown cells have a different font?" I have a sans-serif font because I've never bothered to change it — oh, you hate sans serif?
I don't know how to change it globally, but we can probably manage something — I know you can do all the HTML stuff, and it gets kind of messy. Let's see if we can do this, since you're the only one messaging in chat; I'll see if this works for you. Okay — I hate that you have to write HTML/CSS formatting to get this to work, but you can change the font: you create a span with a style attribute and give it a font-family and font-size, and then it does work. I'm not going to write that every time I write markdown, but in case you really want it, there you go.

"Hey Rob, do you still feel in 2023 a master's can open doors to start a career in data science/engineering, or would you invest in other things to stand out?" I think it depends where you're getting the degree from, and whether you'll get as much out of the program as you want. For me, my master's program was mostly about connecting with other people who were excited about learning and working on the same stuff, and those connections really did help — even just talking with those people a lot, and making friends. In terms of being a career changer: if I were looking at two applications and one person had a master's degree and one didn't, it would probably lean me toward the person with the master's, but it really depends on the company — some value it more than others.

"Do you find Copilot useful?" I do, yeah — I've had a few times recently where I found it really useful. Let's load this extension, lab_black; I like it because it auto-formats our code. This cell just downloads the Titanic dataset, so now we have the Titanic data in a dataframe. I bring this up every time the Titanic dataset is mentioned, but I had a great uncle — my grandfather's brother, or maybe his cousin — who died on the Titanic. Just throwing that out there: if you weren't sure whether I was cool, now you know.

"I started Georgia Tech, but I'm not feeling the environment; knowledge-wise I'm learning more from diverse content online." Yeah, I've heard a lot of good things about the Georgia Tech program, but really in life you have to make your own decisions — take my advice, take a bunch of other people's advice, and then do what's best for you. That's really the only way.

All right, so we have this pandas dataframe. Let's say we want to ask a question like: what's the average age of men who survived? If we were going to do it by hand, we'd write the pandas code: df.query("Survived == 1") — and why isn't that column a boolean, true/false, instead of ones and zeros? It really should be — but okay, we know it's ones and zeros, so we filter to 1, and Sex == 'male', with single quotes inside the query string. So these are all the men who survived. We also have a lot of missing ages; the mean ignores those NaN values, but let's add a dropna() just to be sure, and then take the mean.
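For reference, a minimal sketch of that by-hand calculation — the file path is just a placeholder, and the column names (Survived, Sex, Age) are from the standard Titanic training CSV:

```python
import pandas as pd

df = pd.read_csv("titanic.csv")  # placeholder path for the Titanic training data

# Men who survived, ignoring missing ages (.mean() skips NaN anyway,
# but dropna() makes it explicit, matching what was done on stream)
avg_age = (
    df.query("Survived == 1 and Sex == 'male'")["Age"]
    .dropna()
    .mean()
)
print(round(avg_age, 2))  # about 27.28 on this dataset
```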
So the average age comes out to about 27, and I guess we can't do anything about the ages we don't know. That's how we'd do it by hand. Now we're going to create an LLM agent that does it for us. We can do that with ChatOpenAI — OpenAI's endpoint — though hopefully we'd eventually swap that out for a custom model trained specifically for this kind of work. "Group by Sex and agg?" Yeah, you could do that too.

We're going to create this agent, and if we look at the LangChain docs for the pandas dataframe agent (let me turn on dark mode), they show it being initialized with a ZERO_SHOT_REACT_DESCRIPTION agent type. You create the pandas dataframe agent by providing it the dataframe; I think what it then does is pull some description of the dataframe and add it to the LLM prompt. We can set verbose=True — which we will, to try to see what's going on under the hood — we can change the temperature, and we can pick different agent types if we want. Then we call the agent with .run(), and it should loop, asking the LLM over and over, until it solves our problem. So let's create the agent. Does that all make sense to you?

"Hey man, ever use Polars? Seems faster than pandas in certain aspects." Great question — if you go to my YouTube channel you'll see I have a whole video comparing pandas, Spark, and Polars, and another one specifically on Polars, so definitely subscribe and you might see some videos on it. Polars is great; I just don't use it nearly as much as I should, because I'm usually not chasing the fastest code, just code that works.

So, how many of us think this will work if I just ask the question? How should we phrase it? We know what the dataframe looks like, so we could be very specific and say "when Survived equals 1 and Sex equals male, what is the average age, ignoring null values," or we could be less verbose. "Is there audio delay, or is it just me?" Tell me if the delay is better now — I just adjusted it. All right, agent.run: "What is the average age of men who survived?" There's an apostrophe in there, so we need to wrap the string in double quotes. How many people think this will actually work, and how many think it won't? Anyone in chat? Let's check it out.

Okay: entering a new chain, invoking the python_repl_ast tool with df... it filtered to Survived == 1 and Sex == 'male' — hey, look, it worked. "The average age of men who survived is approximately 27.28 years." That matches what we got when we ran it by hand, so it answered straight from our dataframe. Pretty good, right? Pretty nice.
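Here's a minimal sketch of the agent setup used here. The import paths differ across LangChain versions (newer releases moved this agent into langchain_experimental.agents), and it assumes an OPENAI_API_KEY is set in the environment, so treat it as a sketch rather than the exact code:

```python
from langchain.agents import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)  # assumes OPENAI_API_KEY is set

agent = create_pandas_dataframe_agent(
    llm,
    df,            # the Titanic dataframe loaded above
    verbose=True,  # show the intermediate tool calls as the chain runs
)

agent.run("What is the average age of men who survived?")
```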
All right, let's make it a little harder. I am surprised it got that — but remember, this is the Titanic dataset, so the model may have been trained on this exact question. It might have seen this exact dataset and already know that Survived equals 1 and that the gender column is named Sex. Let's give it the benefit of the doubt and try something harder: a difference between two groups. What is the difference in fare paid by passengers who survived and passengers who didn't? I could ask it to break that down further, but I don't want to make it too complicated — it's already sort of asking for two things at once. (And yes, I spelled "passengers" wrong — not surprising.)

If we were going to do this by hand, we'd group by Survived, take Fare, and compute the mean of each group — now we have the fare paid by passengers who survived and by those who didn't. That's easy enough ourselves. And since we're asking for the difference, we could subtract one from the other, or unstack the means, or just call diff — yeah, diff works: 26.27. So the difference is about 26.27, and the individual values are roughly 48 for those who survived and 22 for those who didn't. So if you paid more to get on the Titanic — well, it's not causal, but we can see a correlation between paying more and surviving. We could even plot this.

"How do you get dark mode in Jupyter notebooks?" Samir, watch my video on my setup — I have a whole YouTube video on Jupyter notebooks where I explain it, and another on my data science coding setup.

Now, can the agent get this answer? Oh my goodness — it wrote almost the exact same code I wrote, and it even took it a step further: "The average fare paid by passengers who did not survive is 22.12, while the average fare paid by passengers who survived is 48.40; therefore the difference in fare is 26.28." That's right. That's just good.

"Could you ask it to show the dataframe with a background gradient on Fare?" I don't think it can do that, because it only takes the dataframe in and gives text back — the output of the agent run is just a text response.
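For reference, the by-hand fare comparison from above — which is nearly the same code the agent produced — looks roughly like this:

```python
# Mean fare for each survival group, and the gap between them
fare_by_outcome = df.groupby("Survived")["Fare"].mean()
print(fare_by_outcome)         # roughly 22 for Survived == 0 and 48 for Survived == 1
print(fare_by_outcome.diff())  # the difference, about 26
```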
Now, before we get our minds blown, let's keep being skeptical. It may just be really good at the Titanic dataset, because that's one of the most popular datasets people mess around with. So let's test this LangChain thing on a more complicated dataset, or one it has probably never seen. I'm going to read in a local CSV — a dataset we created ourselves on stream, I don't know, a year ago. It has the top 50 or so baby names per year. Let's just call it df to keep it simple, query where year equals 2021 and sex equals F, sort values by count descending, and take the head — oh, and drop that extra column. As an example, we can see the top baby names for females in 2021 were Olivia, Emma, Charlotte, Amelia, and Ava.

"Hello Rob, how are you?" Hey, odinson, welcome. We're just trying out LLM agents — I think you could make a really good fine-tuned model for this sort of thing.

"Do you have a master's, PhD, or no degree? Would you recommend the path you went, or does it depend on the person?" It depends most on the person. I went to school, got my undergrad, then got a master's while working — both in electrical engineering. Then I wanted to code more and do data science, which was just getting started, so I went back for another master's in data science. After that I was thinking about a PhD and talking to a professor nearby, and at the same time I was getting really excited about Kaggle competitions. I told my wife: I could either do whatever a PhD program wants — go in, take extra classes and all that, and at the end have a PhD — or I could really grind on Kaggle and see how high I could get ranked. It turns out I became a four-time Grandmaster after a few years of doing that, and I wouldn't take it back; I'm glad I made that decision. It's the sort of thing where you have to ask yourself what you really want. PhDs are definitely worth it for the people who want to do them.

"How long did the part-time master's take, and did you pay or have support?" Great question — a big part of the reason I got both of my master's degrees is that my company paid for them almost in full. So my recommendation: if you can get a job that also supports education, it's a no-brainer; you just have to grind every night. "I'm in a data science course — it costs money, but it's exactly what you're looking for." Nice. "How many hours have you spent on Kaggle in total?" I couldn't count. There have been months where I'm completely in the zone, always thinking about it — but not right now. Right now I'm starting a new job at H2O and learning as much as I can about large language models, because that's what we've been doing. I might try to squeeze in a Kaggle competition here or there.

So, we're using large language models on this new dataset. This baby-names dataset, I don't believe, is out there publicly, so we wouldn't expect the model to know these questions well. Let's ask: what were the top five female names in the year 2021?
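The by-hand query from above looked roughly like this — the file path is a placeholder, and the exact column names and casing (year, sex, name, count) are assumed from how the dataset was queried on stream:

```python
names = pd.read_csv("baby_names.csv")  # placeholder path for the stream-built dataset

top_female_2021 = (
    names.query("year == 2021 and sex == 'F'")
    .sort_values("count", ascending=False)
    .head()
)
print(top_female_2021)  # Olivia, Emma, Charlotte, Amelia, Ava on this data
```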
I'll be surprised if it can answer this one. We create a new agent from the dataframe, again using GPT-3.5 — I guess we could switch models to see if others do better — provide our dataframe, and set verbose so we can see the result. Then we run the question: "What were the top five female names in the year 2021?" Let's see what our agent comes up with... boom, it's done. I'm pretty impressed, because it did it no problem. "Where can I get lectures on large language models?" There's a lot on YouTube — basically only YouTube; we're trying to make content about it too. "The top five female names in the year 2021 are Olivia" — with the number of occurrences — "Emma..." Wow, that's good. So far it hasn't failed at anything.

What could we ask that might stump it? Let's ask something that requires looking more broadly across the whole dataset. By hand: query where sex equals M, then — actually, sort values by count descending first — group by year and take the first row, so this should be the top male name of every year. Then take the count and plot it, with the title "count of most popular male name by year."

Hey, xcode just subscribed — thank you! Because you subscribed we're going to spin the wheel... and the wheel says: sigh into the mic. This is for you, xcode. Thanks for watching and subscribing, I appreciate it. "I've had ChatGPT build regression trees from scratch with all sorts of constraints and other complex things." Yeah, it's pretty impressive — though I've found you still need to think a little bit yourself.

So this is the most popular name by year. Visually, there's a spike — what year is that, somewhere in the 1940s? The name is Robert — no, wait, I think Robert was more popular in the '20s. If we take this result and sort the values by count descending, what we see is that in 1947 the name James occurred about 94,000 times, and that's the most any single name occurred in a single year, at least for the male names.

So how do we ask that question? We want to see if the agent can figure this out — it's a bit more complex, and it might even find a better way than I did; it doesn't really need the groupby at all, it could just use a query. But let's phrase it: "Find the year in this dataset where a male name occurred the most out of any male names of all time. What was the year and the name?" Let's run the agent. Entering a new chain... is it going to do zero-shot again? "Hello from Brazil" — what's up. Okay, I think we got a failure.
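A sketch of that by-hand version, under the same column-name assumptions as before:

```python
# Most popular male name for each year: sort by count, then take the first
# row within each year group (groupby preserves row order within groups).
top_male_per_year = (
    names.query("sex == 'M'")
    .sort_values("count", ascending=False)
    .groupby("year")
    .first()
)

# How often that top name occurred, year by year
top_male_per_year["count"].plot(title="Count of most popular male name by year")

# The single biggest year/name combination (James in 1947 on this data)
print(top_male_per_year.sort_values("count", ascending=False).head(1))
```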
It failed, I think, because it couldn't parse the tool input. The answer it gave was just prose: "To find the year where the male name occurred the most, you can group the dataframe by the year and sex columns, sum the count column for each group, and find the maximum count; finally, filter the dataframe to get the row with the maximum count and retrieve the year and name." So this is just like asking GPT-3.5 directly and getting an explanation back — it wasn't able to push it through the chain, it just returned this text. I'd call that a fail, the first time it has actually failed.

"MSFT is doing some super cool stuff with Fabric in its Power Platform." Oh — wait, what are you talking about? Anyway, maybe I need to give it more direction: "What is the year and name for the male name that occurred the most in a single year?" Not the best-phrased question, I admit, but we got an even worse failure: "could not parse tool input." So it failed outright. Let's try again — like most things with large language models, if you just try a bunch of times you might get lucky. "What is the year and name with the maximum occurrences of all time?" It's completely failing now. I'll restart the kernel just to make sure it's not a kernel issue, reload everything, load the names dataset, and ask again... no, we still get this JSON decode error. Maybe we should ask ChatGPT to fix it — have the robots fix the robots? It doesn't work that way; you get an infinite loop.

"What is the name of the male with the highest count of all time?" — okay, let's take that suggestion from chat, which is much better wording. Let me copy-paste it in... still a JSON decode error. You can see the code it wants to run, though — will that work? No, it throws a KeyError. Why is it breaking? Because that index is the location of the max on a different frame. I don't think it can be done that way. Maybe this would work... yeah, this works — the code the agent generated is just bad, and I think that's why it's erroring; it's a slight difference in the syntax. The fixed version works because you compute idxmax and look it up on the same dataframe; otherwise you're trying to locate it on a pre-filtered dataframe, which is a no-no.

"Which way is faster?" What do you mean — asking the bot, or just knowing how to do it? If you know how to do this, it's much faster to just write the code, at least for me. Oh, you mean the actual function call — comparing the speed of the two versions? We can %timeit this one and mine. Mine, which I know is going to be slower, would just be a head(1): James, 1947, with the count there. Yeah, mine's slower, but both are fast enough that it doesn't matter. I wouldn't write it my way for speed, but it was close.
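The agent's exact generated code isn't fully captured here, but a minimal sketch of the working pattern discussed above — idxmax and .loc against the same frame — looks like this:

```python
# Compute idxmax() and .loc[] against the same frame. Mixing a filtered frame
# with labels taken from a differently filtered frame is what raises KeyError.
males = names.query("sex == 'M'")
biggest_single_year = males.loc[males["count"].idxmax()]
print(biggest_single_year)  # James, 1947 on the stream's dataset
```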
Let's try to see what's happening under the hood with LangChain — you can also pass it multiple dataframes. But something else that would be interesting: in H2O GPT, which I showed you earlier, we can test how well different models do on this same question. H2O GPT is meant to be a human question-answering assistant, not necessarily fine-tuned for coding, but let's ask it: "I have a pandas dataframe with name popularity by year. The columns are these. Create code to find the top female names of the year 2021." A lot of the models are creating sample dataframes — the font here is messing up — but this one looks right. What is Vicuna doing? Filter the dataframe to 2021... okay, that one's going rogue. Falcon 40B is really going all out with its sample dataframe; I think it's in an infinite loop — sometimes that happens. This one looks like it would work: group by year and sex with as_index=False and value counts on name... actually, no, you can't quite do it that way; you could do it like this instead. I don't think that one's right.

All right, back to what we were doing: LangChain. Let's go into the LangChain source and see what create_pandas_dataframe_agent actually does. Is it just in the __init__? No, it's in the agent toolkits — let's find the actual function. So if the agent type is ZERO_SHOT_REACT_DESCRIPTION — which, no, we set ours to OPENAI_FUNCTIONS — then it calls a "get functions prompt and tools" helper with the dataframe, a prefix, a suffix, input variables, and an include_df_in_prompt option (is that something we could set?), and then builds an OpenAIFunctionsAgent with the prompt, the tools, and a callback manager. This is a bit beyond me; I've never used it before. If we look at OpenAIFunctionsAgent itself, it's more bare-bones: it just takes the LLM, the tools it has access to, and the prompt. I don't know — what do you think? I'm not sure we've learned much about LangChain yet.

"I just joined — can you quickly tell me what you're doing?" We're using LangChain agents to analyze our pandas dataframes automatically, just using text, and trying to figure out exactly how that works. Maybe we need to jump back to the basic introduction to LangChain to understand it. Quick install, chatbots, agents... here's the documentation we want, on action agents. At a high level, an action agent: receives user input; decides which tool, if any, to use and what the tool input should be; calls the tool and records the output; decides the next step using the history of tools, tool inputs, and observations; and repeats those last steps until it determines it can respond to the user. So it's a loop that runs until it figures out how to do my dataframe filtering. Action agents are wrapped in agent executors, which are responsible for calling the agent, getting back an action and action input, calling the tool the agent references with the generated input, getting the tool's output, and passing all that information back to the agent. Although the agent can be constructed in many ways, it typically involves these components: the prompt template, the language model, and the output parser.

So did our pandas agent have a prompt template? Looking at create_pandas_dataframe_agent and include_df_in_prompt — I want to see what it does. Okay, here's where it does it: it runs head() on the dataframe and converts it to markdown.
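To get a feel for that, here's a rough sketch of the kind of prompt the pandas agent assembles — not LangChain's exact template, just the idea (df.head().to_markdown() needs the tabulate package installed):

```python
# The agent injects a markdown rendering of the first rows into its prompt
# so the model can see column names and example values.
df_head_markdown = df.head().to_markdown()

prompt = (
    "You are working with a pandas dataframe in Python. "
    "The name of the dataframe is `df`.\n"
    "This is the result of `print(df.head())`:\n"
    f"{df_head_markdown}\n\n"
    "Question: What is the average age of men who survived?"
)
print(prompt)
```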
Even though that looks super ugly, it's doing roughly that and providing it to the model when it runs, so the model can see the column names and some sample data. I don't know why it doesn't also include something like info() or describe() — that might be even more helpful. "The LangChain you're using is not running locally, right? Be careful with sensitive info. Can you do the same with a local open-source model on CPU?" That's my plan eventually — like I mentioned, I have a video coming out on H2O GPT — but no, this isn't running locally; we're hitting the OpenAI endpoint, and right now we're just trying to understand how LangChain works. It is kind of cool that it builds this suffix, and this is how it handles multiple dataframes: it runs the head-to-markdown conversion for each dataframe and adds each one to the prompt as that dataframe's head.

Let's look at the getting-started example. First, load a language model — okay, we're doing the intro to agents now; this is going to be a little rough at first, but we'll get there. We pull in a bunch of agent imports, an AgentType, and the OpenAI LLM for now. Then we add some tools with load_tools: "serpapi" — what is this? — and "llm-math", and we pass it the LLM. SerpAPI needs an API key, so let's turn that one off and just use llm-math. Then we initialize the agent with the tools, the language model, and the agent type we want — ZERO_SHOT_REACT_DESCRIPTION. Let's test it: "Who is Leonardo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?" "Google is not a valid tool, try another one." Now it's trying Wikipedia, now Google again... I think it needs that SerpAPI key. Does anyone know what SerpAPI is? It's a search-engine scraping service — oh, a Google search API, that makes a lot more sense. So let's sign up: sign in with Google, free plan, verify email (confirming my email off screen), verify my phone — seriously? A six-digit code... and my phone's out of battery, isn't it. I have no texts. This might be more trouble than it's worth; we can just look at what their examples show.

"Data science versus software engineering?" Whatever floats your boat — it's more about what you want from it. Meanwhile the agent finished after hitting its time limit. What it's supposed to do is use the Google search tool to find the person, and then be smart enough to use the math tool to raise her age to that power. "My dream is to become competent in data science and software engineering." It's a long, tough road. "SerpAPI isn't actually run by Google, right?" I don't know — I'm still trying to get my API key and they haven't texted me. Let's see what other agent tools there are — the LangChain docs have a whole list of tools.
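Here's a sketch of that quickstart agent as it stood in mid-2023 LangChain (import paths have since moved around). The SerpAPI search tool is left out because it needs its own API key, which is why the agent on stream kept complaining that Google isn't a valid tool:

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)                # assumes OPENAI_API_KEY is set
tools = load_tools(["llm-math"], llm=llm)  # add "serpapi" once you have a key

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent.run(
    "Who is Leonardo DiCaprio's girlfriend? "
    "What is her current age raised to the 0.43 power?"
)
```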
Defining custom tools, a SerpAPI wrapper, a human-in-the-loop tool — that sounds cool. So these are all the agents and tools that already exist; it might be fun just to check some of them out. Oh, this is pretty cool — let's try the Python agent instead of starting from scratch. We're going to test it and try to break it. They have create_python_agent, and their example is to ask for the 10th Fibonacci number, so it's going to write the code: "I need to calculate the 10th Fibonacci number" — that's the thought — action: Python REPL; it returns the function, prints it, and then actually runs it in Python, I believe, and tells us the answer. Did it actually ask the model to create that function? "Create a Python function that computes the 10th Fibonacci number" — let's see if any of the models in H2O GPT can do it; yeah, most of them know Fibonacci. So basically it takes the function from the LLM, runs it itself, sees that the answer is 55, and formats that as text.

What's another Python question we can ask? "Kind of funny to invoke a bunch of models at once." Yeah, it is cool — it shows the answers of all these open-source models with GPT-3.5 in the corner (I guess LLaMA isn't completely open source), just to see how they compare. And what I'm noticing these days is how much the gap is closing, especially with the new Falcon models — not quite at GPT-3.5 level, but getting there, and new models keep coming out.

The next example they give: "Write a single-neuron neural network in PyTorch. Make synthetic data for y = 2x, train for 1000 epochs, print every 100 epochs, and return the prediction for x = 5." So is it actually going to execute this Python code? It does have it import a library... and it was able to work. But it's kind of awkward, because you don't have access to your own disk, and it's a lot of trust — I guess it saves time, since you just write the text and trust it to find the right answer, but I don't see how I'd apply this to any real PyTorch application. Would it create simple data on its own, maybe a dataframe? It would have to read the data from somewhere, so let's ask it: "Create a pandas dataframe with random data and the following columns: name, age, height, and state. Make name a random-sounding name, age anything from 0 to 100, height anything from 0 to 7 feet, and use numpy.random to randomly select from a normal distribution." I'm just trying to work numpy in so it's more of a Python question. Entering a new chain... "Ask it to test and verify its last response." Huh, weird — it almost got there. The cool thing is that it keeps trying, and when it fails I wonder whether it feeds that failure back in as the last response. The final answer it gives is just df.
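A sketch of the Python agent setup and the kind of prompt we gave it — again, import paths are from the LangChain version in use around this time and have since moved into langchain_experimental — followed by roughly what we'd expect the generated numpy/pandas code to do, written by hand:

```python
from langchain.agents.agent_toolkits import create_python_agent
from langchain.llms import OpenAI
from langchain.tools.python.tool import PythonREPLTool

agent = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True,
)

agent.run(
    "Create a pandas dataframe with 100 rows and columns name, age, height, and "
    "state. Age should be between 0 and 100, and height should be drawn from a "
    "normal distribution using numpy.random."
)

# By hand, the numeric part of what we asked for looks something like this:
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
made_up = pd.DataFrame(
    {
        "age": rng.integers(0, 100, size=100),
        "height": rng.normal(loc=5.5, scale=0.5, size=100),  # feet
    }
)
made_up["height"].plot(kind="hist")  # roughly bell-shaped with 100 samples
```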
All right, so did this work? It's kind of cool that it actually ran the code in a Python interpreter and created the data. If I plot the height column as a histogram it isn't going to look like much with this few rows, but if we bump it up to 100 examples it comes out normally distributed, because it used a normal distribution — even though I wasn't really explicit about which column should be normal. It's also interesting that the final result either works or it fails; it would be cool if it then dropped you into an interpreter with that code already run, but at this point the best thing to do once you get the right answer is to copy and paste it.

Cool — I think we did a lot with LangChain. We learned about LangChain agents, and we tested out the pandas dataframe agent and the Python agent. I can imagine that once we have models fine-tuned for specific tasks, the LLMs might perform even better, although it was surprising how well they performed on their own. I still don't see this as a huge time saver, though: if you're really someone who works with data all day, you're not the end user this kind of tool is aimed at; and if you're just someone who's going to ask an LLM, then I wouldn't trust you to answer any of these questions anyway. Who knows, though — it seems like we get blown away by new technology every day.

I hope you enjoyed the stream. Should we raid someone else's channel on Twitch? Yeah, let's do it. Thanks for following, everyone. CodingWithStrangers is live, and we're going to raid him — he's a nice guy, be nice to him. If you're with us on Twitch, don't leave; hang around and say hi. If you're on YouTube, check us out on Twitch, because that's the place to be — my YouTube link is in the chat, and you can also follow me on Twitter. "Do you feel that way specifically about LangChain, or about LLMs generally?" I don't know yet. I think LLMs are helpful; I'm just not sure whether automating things to that extent is really the best use — maybe it is. Thanks for hanging out, Butowski — I always mess that name up. Let's do the raid: raid CodingWithStrangers... and there we go. I'll see you next time. Stay safe, be kind to each other, and thanks for watching — I appreciate it. Until next time, bye-bye. We raided, okay.
Info
Channel: Rob Mulla
Views: 4,232
Keywords: gpt 4, machine learning, langchain in python, langchain ai, langchain agents, data analytics, large language models explained, langchain chatgpt, langchain prompt examples, langchain chatbot tutorial, longchain chat gpt clone, langchain prompt hub, langchain agent, langchain pandas, pandas langchain agent, pandas llm
Id: avCEiaTq3ws
Length: 83min 47sec (5027 seconds)
Published: Wed Jun 28 2023