Deep Learning Chatbot R&D

Captions
What's going on, everybody, and welcome to an R&D-style video. I don't normally do videos like this, because it's either impossible or very difficult to structure things in a way that people can follow along, but in these times I think people are pretty starved for content, so I thought I could do a video on at least some of the stuff I do that you guys never really see. A project like what I've been working on here with this chatbot is so big and so complex that there will probably never be a series, from start to finish, of doing what I'm doing here. So instead I thought it would be interesting to try out a more R&D-style video.

What I'm going to try to work on here is scoring chatbot output. I think I'm going to do some very rudimentary scoring and then later talk about next steps. Then, towards the middle or maybe the end, what I can also do is publish an output of these chatbot responses and see if anybody can come up with better ideas for scoring. Of course, credit will be given to anybody who does.

Also, on the note of content starvation: there are still lots of people asking about Neural Networks from Scratch. Two things. The videos: my hope is to get them coming out sooner than I had initially planned, just because, again, I think so many people are stuck at home with nothing to do, so it would be nice to start putting out those videos. But if you are just dying for some content, as a reminder, the draft for the book is online right now. There are thousands of people with access. If you preorder the book, or even just the e-book, you get access, usually same-day access, to that draft, so you can read it, highlight, ask questions inline, and so on. That draft is basically everything, at the moment, from coding a neuron to a full forward
pass, backprop, and actually training the model. Coming out very, very soon is testing and regularization, like L1 and L2 regularization and dropout, so that is about to be pushed to the draft, probably within a week or so. Definitely check those out; like I said, you can get access right now. Otherwise, for the videos, my hope is soon, but I really don't know. I wanted to take my time on those videos and make them good, so I don't want to rush them, but I'm thinking of moving some things around so we can do those sooner than I had planned.

Anyway, back to what we're going to do here. What this is is a chatbot, but it's based on Google's NMT, which is neural machine translation. The concept of NMT is to convert from one language to another, which is a highly complex task, especially depending on the language: something like converting English input to German. What I thought of doing a very long time ago, though, is: what if we use it for English to English, so an English comment to an English response? It turned out that actually kind of works. It's definitely a hack; I don't think this is how it was intended to be used, but this is what I've been playing with for, honestly, years. I really should just do something else with my time, but I'm very interested in this concept.

What we've been working with is Reddit data, so Reddit comments and responses. There is a chatbot tutorial: if you go to pythonprogramming.net and just type "chatbot", this is basically the same thing, the NMT chatbot. There are a couple of tweaks, but that's what we're dealing with. The model that I've been training is twenty total layers: a ten-layer encoder and a ten-layer decoder, with 1,024 nodes per layer. So just an absolutely massive model, and I've been able to do that with a batch size of, well, I think it's at 128 right now,
thanks to the two RTX 8000s. So that's the model I've been training, but I just can't get the perfect model. At, let's say, a hundred thousand steps, the model answers some questions really well but still fails at other questions. Then at a hundred and twenty thousand steps, let's say, it answers those other questions that it wasn't answering well, but then screws up some other questions. So I started thinking: what if I treated this more like an ensemble, where you take many models and put them together?

Historically, I've used ensembles in a voting-classifier sort of way. Let's say you're trying to predict the class of a thing. It can be nice to have just one model, but almost always you do better if you're willing to have, say, five models all predict and then run a vote on which class they think is most likely. This tends to perform better than any single model.

So then I was wondering: this isn't really a class prediction, and it's not really a regression either, but what I could do is take all of the possible outputs, take their scores or make new scores, and then figure out which one to use. Really, what we'll wind up with is hundreds of outputs, and then we can pick one, because surely one of them is a good one, right?

So let me show you what a single model looks like, and then we'll jump into the ensemble. For the ensemble code, I enlisted the help of Daniel, because what I really wanted was to have all these models running live at the exact same time, take the input, spit out the output, and make it really simple for me to start working with it. There's really no one better to do that task, definitely not me. So Daniel wrote this chatbot ensemble code. Until then, I guess I'll put some input here below and see if we get... no, we don't get any output yet, but we'll just wait for that. It'll come. Maybe. I also
have... well, actually, all the errors are going to be output in the ensemble code. The way Daniel sent it over, it was silencing all the errors, but then I think I actually opened it back up, so we'll probably see the errors. I'm just waiting to see a real error, but I think we're all set here.

Anyway, while we're waiting for the single model to run... I can hear it, it sounds so good. I should share it. I love the sound of a GPU at work. The few times I've shared it, I used to have it as my channel intro, and people literally complained about it. Oh no, am I training? I think I ran train. Whoops. I must have run train. I recognize that too much. I meant to run inference; I've been constantly running train, so that's probably what I did. Let me check. Okay: python inference.py. Let's try that again. That should have been my flag. Now we'll wait for it to load. I've been severely disrupted; this is what you get with an R&D video, my apologies. People seemed to like the Kaggle one, though. Anyway, I can't remember what I was talking about before.

Okay, so here's an example output. I literally said "hello", and this is the output it gives. Let me zoom in for you guys. That's obviously a terrible output. Each chatbot model has, I want to say it's called beam; I forget what it stands for, but what you get depends on how big your beam width is. It's been so long, and we've had so many abstractions on top of the original NMT code, that I honestly forget, but basically the model will output ten responses, and in general I want to say the default one chosen by the model is always the top response. And then this here (I'm pointing at it, but you can't see what I'm pointing at) is a score that I believe mostly Daniel wrote. This was written probably a year or two ago, and I don't remember how he was
scoring them, but there are some problems with the score. Again, what I'd like to have is possibly multiple scoring mechanisms, and those could also be an ensemble-type method. Anyway, as you can see, that was no good. Let's try "How are you doing?" "I'm good, thanks." That's a pretty good one. And then you can see these other ones; they're color-coded by how bad they are. So this is the best one, and these are all pretty bad. As you'll see, they're just not finished sentences: "I'm good, I'm just trying", "I'm good, I'm just uh". These aren't done.

So we could use packages. I'm not too familiar with spaCy (I don't know if I'm saying that right, but that's how it's spelled). I could try that; I just don't know enough about that package to know if I can do what I'm thinking of, but I'm sure I can. I know that with NLTK I could try to tell: is this a complete sentence? Is it a coherent thought? That would be a really good way to score outputs. My curiosity is how quickly I can do that, but that will probably be the next step after what I'm thinking of doing today, because there are far easier ways to judge these outputs.

One way, which I'm pretty confident is actually included in this "Daniel score", as I'm going to call it from now on, is: does it end with punctuation? None of these end with punctuation. One thing that I'm not sure is included, but that I would like to include, is any repetition. If it's like "I'm good" and then "I'm something else", generally those are pretty bad, just from what I've seen.

But we can continue. "What's your favorite food?" ... "black and white, black and black". Okay. "What's your job?" "I'm a software engineer." Okay, these are pretty good. This one's doing pretty well. This is the latest model, so this is the one that has been training the longest, and as time has gone on, this one has actually had pretty good responses. But one of the things, like, so,
just, like, having a valid response and having a really thorough response are two totally different things. Some of the models have had longer responses that were just really interesting responses, not always the same. A lot of times the chatbot wants to gear itself towards very short, quick responses. So that's another thing: I would like to have some sort of length value as well. Length, but without repetition. Something like that. Again, I'm just going to be trial-and-erroring; this is just pure R&D. Here you can see the problem: yes, it's long, but it's all repetitive, and we don't want repetitive stuff. I think you get the idea.

So this is just one model, and we can continue asking questions, but some of them are good and some of them are bad. "What's your GPU?" Yeah, I mean, this is pretty good; that's not bad at all. Really, a lot of these are pretty good responses. Trying to think of a decent one... "Where do you..." "I'm in Singapore." These are pretty good responses, to be honest. I haven't really played with this model too much, but I know there are certain outputs that are not that great.

For example, there's our ensemble; let me go into the model directory, and then usually we can sort by the latest. Oops, not that way, this way. There we go. These are all just outputs. Sorry if some of these are not the nicest. Also, this one is very short; I guess that's okay. Not all of these are very coherent. Like this: I don't even know what that's a response to. "It's a song, not a song." These are just replies, so every new line is a reply. I don't like that one. So many of these are terrible replies. "Nice, I'll check it out." "It's time to stop." "Business Insider." Oh... when I upload these videos I check the box "not for children", I'm just saying. Okay, anyway, so you can at
least see, even in this one, that not all of these responses are ideal. And obviously those are just responses, not even paired to the input, so we don't even know if they were related to the input, because sometimes the output is not even related to the input. I think we'll see that better when I do the ensemble. So I'm going to go ahead and break this, and I guess you'll just have to take my word for it that sometimes the responses are just bad, like with "hello". At least that was one we can prove. But when we do the ensemble, you'll be able to see tons of stupid ones.

So, a brief overview of the code for the ensemble, or at least of what's happening. I'm not going to go over the code; it's not even my code. Again, Daniel wrote this section. I made the request and he did it. What I have here is the list of models we're going to run; this is the number of steps in each model. We're going all the way up to two hundred fifty-nine thousand steps. I believe that's the one we just looked at some of the outputs for, and the one we were just running live inference on. As the thing has trained, I've been watching outputs, and these are the versions where, as it was training, I liked some of the outputs. Maybe they were very long, maybe it had very unique outputs, not a lot of repetition, good answers, funny answers, stuff like that, and I was like, yeah, let's save that version. Two of these got commented out, so it's a total of 22 of these models, each with 20 layers of 1,024 nodes per layer. That's a lot of models, by the way, all running on the two RTX 8000s.

So basically it's going to run all of these. Any input we pass is going to be inferenced against all of the models, and then we're just going to output all of the responses. I'm going to run inference.py, but really the code that we're going to be modifying is this: from sources, model ensemble, and then, in
model ensemble, we're going to be modifying live ensemble inference. Later, we can also generate a huge list of these responses using ensemble inference, just here, either that way or via live ensemble inference or something. What I'm thinking I can do is generate some example outputs and then share those on a gist or something like that. That way, if anybody else has some ideas that they want to try in order to make their own scoring mechanism, you'll be able to, and you can share it with us. Again, I'll give full credit to anybody who has a good one.

Okay, so now we're going to head into model ensemble and this live ensemble inference code. We're going to pop into the model ensemble .py file under sources. What's going on here? It doesn't really matter what's being commented out; here are all the errors and warnings and that kind of stuff, so we're just not going to look at that. And then here is the code, I'm just going to zoom out a little bit, that we're going to be working with. But the first thing I'm going to do, since again this is not my code, this is Daniel's code, and I'm going to be mucking around, is copy that whole thing and save it for the time when I inevitably screw something up. I'm going to do my best to just code around this code, because it's just not mine.

So let's go ahead and save that, and before we go too far, let's run an example. Let's back it up here and run inference.py, so python inference.py. This is going to take a while. We're seeing all this, again, because I am not silencing all those warnings. Man, I can't tell you how many clients I've worked with on deep learning stuff who, even though these warnings aren't bad, just want them to go away. Certain people are just so bothered by output. So if that's you, I apologize. Also, let's
just go ahead and... I forget if it was a capital H or a lowercase h; I don't think it matters, but I'm going to just toss this in. We have that little prompt symbol right now; it was there way earlier, but all these warnings got in the way. Anyway, I'm going to hit... oh, it's already... wow. So as you can see, "Hello" was the input. Let's do a lowercase one. But as you can see, there's at least one decent "hello". Also, what's "General Kenobi"? That's in a lot of these. I'm tempted to google it, but again, this is all from Reddit. I'm sure that's a totally benign reference to something I just don't know about. Someone comment below if that's safe for work.

Okay, so maybe this is the one. It's weird, because I'm seeing different responses than I swear we just saw. Now I want to check, I'm sorry, I just have to check: was that the latest, 254? I don't know. Oh, because we have a 261 thousand. Okay, so this isn't the 261 thousand model. I might move the 261 back in, since that one actually had okay outputs, at least from what I saw initially. But as you can see, even here, I'm not sure what the deal is with this score, Daniel, really, on a lot of these. The good answers are all over the place in here; we just have to find them.

So this is running all 22 models. "Where are you from?" I don't know if this mic is picking up that noise, but that is a wonderful noise. I love that noise. "Where are you from?" "I'm from New Zealand." Fantastic response. "South Africa." "I'm from England." "I'm from Australia." Obviously he doesn't know where he's from. "Electrical engineer." "I'm a software engineer." It's funny, because all of these are, again, from Reddit, so in theory we should have a whole range of answers, but over time, as it's gone through Reddit, it's clearly software engineers, computer science, and electrical engineers. Just engineering; I guess that's what Reddit is for. "I'm an engineering student." Oh, here we go: "No, I'm Astrid." Okay, so some of these
are cool, a lot of them are terrible, and so on. One option is we could just take every response that has a pretty high rating and then look further, but I'm tempted to start building my own version of a score first, and then we'll combine the two scores and try to find good answers. Then I might also have a separate model that just looks for coherence, and then we check that score, stuff like that.

The other thing is that a lot of times the emojis don't come through, at least in the Daniel score, but I like them; you kind of want that, right? So we did get an emoji there, and at least this one is a proper one, but here we just get a bunch of pound signs rather than the emoji. So we're missing a bunch in here. Was it the colons? I honestly don't know; like I said, that's not my code. Like the heart: we don't get any of those, but that's a really common one, especially to end on. When people end on emojis, a lot of times they don't end on proper punctuation, so we want to handle that.

Anyway, what we want to do now is take this data and apply our own score to it, just to start working with it. We've got to wrangle a few things together first. What I'm going to do is come over here, and for every question that gets asked, we're going to build a dictionary. Really we just need the answers; I'm not sure I care about the score at the moment. We'll call it in_answers and make that a dictionary. I can't decide if I want to make it a list or a dict, but I guess we'll leave it this way. (I think this is correctly sized; I must have done it already.) Basically, my plan for this dictionary is to map answer to score. This will be the D score, or the Daniel score, first, and then we'll go from there.
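The plan described here, one dictionary that pools every model's candidate replies and maps each reply to its existing score, can be sketched roughly as follows. This is only an illustration under assumptions: the shape of each model's response (a dict with parallel "answers" and "scores" lists) is inferred from how the transcript talks about them, and the function name, the sample data, and the keep-the-highest-score rule for duplicate replies are all made up for the example, not taken from Daniel's code.

```python
def collect_answers(responses):
    """Pool candidate replies from every model into one answer -> score dict."""
    pooled = {}
    for response in responses:
        if response is None:  # assumed: a model may return nothing
            continue
        for i, answer in enumerate(response["answers"]):
            score = response["scores"][i]
            # Several models can emit identical text; keeping the best score
            # is an assumption made for this sketch.
            if answer not in pooled or score > pooled[answer]:
                pooled[answer] = score
    return pooled


# Hypothetical outputs from three models, invented for illustration.
responses = [
    {"answers": ["I'm good, thanks.", "I'm good, I'm just"], "scores": [5, 1]},
    None,
    {"answers": ["I'm good, thanks.", "Not bad!"], "scores": [4, 3]},
]

in_answers = collect_answers(responses)
print(in_answers)
```

Printing the pooled dictionary once and pasting it into a scratch file makes it possible to iterate on scoring rules without reloading every model each time.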
So the first thing we have to figure out is: where is it? Hmm. Comment below if you think this is a PEP 8-acceptable line, given how long this line is. I think we could have some formatting on that line. Okay, so what I'm after is the loop: for i in enumerate over the response. This is clearly one of them, but we're enumerating, so we're probably not grabbing by index up here. If chatbot answers... so, response scores at i. I'm going to copy that, and let's see if we can find response and then scores... bad response threshold... response answers at i. So that should be it: the answer will be response answers at that index, and the score is response scores, plural, at i. So, if response is None... otherwise we iterate through them, and I am now going to save all of them. So in_answers at the answer equals the score. This is plural; I feel like that should be "score", singular. What was it? I think it was answer, singular, and then scores was the plural, but now I have to go figure that one out. Response scores... okay, we'll find out; we'll get in there if that doesn't work.

Okay, so now we have built this dictionary, and when everything is done we can hopefully come down here and begin working through any sort of rules that we want to run on it. But I think the smartest thing for us to do is to print this once, because we're going to do a lot of R&D, and running this over and over is going to be a pain. So we're going to take in_answers and just print that out. I'm trying to think of a good question. I know, I've got my question. I want something that's somewhat challenging to answer. Okay, so let's save that, break this, and rerun it, ask our question, and then we should print out the full dictionary. Then we'll make a separate file to work with it, and I think
this is the sort of thing I can share with you guys. These warnings and stuff... "How do you sleep?" Oh, it's still erroring. I want to ask it how it sleeps at night. I'm just going to ask a quick question and then we'll get to it. KeyError: answer. Answer... why answer? Hold on, hold on. Chatbot answers... I'm probably making some sort of stupid mistake. We just want that answer; we get the score, and if it's the max, it'll be green. Otherwise, response scores, response answer... oh my, it is plural. Are you kidding me? It is plural. Oh, and now I understand why the score was plural. Okay, I'm following: answers, because it's the list of answers, and the answer is at that index. Copy that. All right, we'll try it again. I'm just going to run a quick "hello" to get an output, and once we get an output, hopefully we'll have something. This is why I want to hurry up and get that dictionary, so we can get away from working with this. Here it is. There she is, beautiful.

Okay, so let's ask something: "How do you sleep at night?" Okay, so we have a big dictionary. "How do you sleep at night?" "I'm drunk." "I'm sleep-deprived." "I don't wake up in the morning." There are a lot of "I don't wake up in the morning" variations; that's kind of funny. "I don't wake up at night." "When I wake up, I wake up in the morning." And then that's a terrible one. Okay.

So what I'm going to do is copy this entire dictionary, come into here, and say nano test.py. I'm just going to type something to start. Cool, now I've got a test.py. Open that in Sublime, and we'll say in_answers equals... actually it will just be this. Boom. For the record, I think this still defaults to the Python 2 build; I need to set that up. Again, this is a new machine, so there are a lot of things I haven't properly set up yet. So at some point, especially if I use f-strings or something, I'm going to need to not run this down here in the
shell, so bear with me. Okay: print the len. So we have a hundred and sixty-eight answers. How do we even begin to sort these answers? Well, the first thing is: does it end on punctuation or an emoji? One thing you could do is import string, and then I want to say you can print string.punctuation. Run that, and you can see here all the available punctuation. I'm not going to use that, though, mostly because a lot of times, for whatever reason, the chatbot likes to end on something like "I'", so it doesn't finish the "I'm". I think the bot has learned that you end on punctuation, and the ending punctuation could acceptably be a quote, since you could have something like 'this is the end', right? So in theory you could end on that quote, I guess. But anyway, it's a problem. So what I want to do is probably specify my own punctuation, or my own acceptable endings.

In this case it could be a single quote, but I don't want that. So I'm going to say it could end on a double quote, it could end on a period, it could end with a smiley face, or, you know, a closing parenthesis. For whatever reason, I'm going to allow it to end on a "3", for, you know, if it has a heart ("<3"); that's a pretty common ending. Obviously a question mark and an exclamation mark. And I guess we could look down here and see if there's anything else. Do we have a period? Yeah, okay. I think that's good enough. We might add more to that later, but for now those will be the acceptable endings.

The other one would be maybe some sort of emoji; that might be a thing that happens. I don't know if I want to include that or not, because the original Charles chatbot would, out of nowhere, use even Discord emojis, which was kind of shocking, because it's trained only on Reddit, and yet it used some Discord emojis. If you do like a beer-clinking emoji or whatever, you know, like a "cheers", it'll do a "cheers" back, which I don't
know how it learned, but okay. So maybe that. I don't know. Anyway, I'll allow it for now, especially because I guess sometimes people do those backwards smileys, right? So you could have something like that. Oops. All right, so that's allowed for now. Okay, that's an acceptable ending.

So now what we're going to say is: for answer in in_answers, if the character at index negative one is in acceptable endings, print the answer. I'm going to go ahead and use the full word "answer" so we're not using too much shorthand, because this is probably going to get messy pretty quickly. So let's go ahead and print the answer. These are all answers that... and I just thought: at some point I'm going to score things, so you guys don't let me forget, and then we're going to try to sort this dictionary, and because this is Python 2, it will not stay sorted. I guess we could use a sorted dict, but I refuse. So I'll have to start running it in the terminal; hopefully I don't forget that.

So these are all the answers with acceptable endings at the moment: "I'm sleep-deprived and I'm not awake yet." "I've been sleeping for a few days." "You can't wake me up." "I'm awake, I'm awake now." So maybe that's the next thing I would fix. Repetition is a problem: both repetition in general and, especially, how many times the AI uses the word "I". So the next thing I'm going to say is lowered answer equals answer.lower(). This will normalize everything. In this case the sentence starts with "I'm", but sometimes it won't, so you might have one word that's at the beginning of the sentence, so it gets capitalized, and then the other occurrence is lowercase, and we think there's no match, but there is a match. So we're just going to normalize everything by lowering it. Then basically what I want is... I guess we'll call this words, and that will equal... I guess we could split by space and then trim. Let's say i.rstrip, or rather
i.strip(). There's got to be a better way than this. Again, if you were using NLTK, and I'm sure spaCy too, you could just call something like tokenize and it would do this, but this is okay. So: i.strip() for i in lowered answer.split(). Cool, let's go ahead and print words. I'll just break here. "When I wake up, I wake up feeling like I'm doing okay." Did that really...? Okay, yeah.

Let's also set a flag: acceptable end equals False, and then if it has an acceptable ending we'll set it. Hmm, it should be worth something, I guess, if it ends on punctuation. So I'm going to say 0, and then for acceptable end, if it has one, we'll set it equal to... maybe we'll have a constant up here. Let's just say acceptable end val, and I'll set that to five for now. Eventually we'll put this all into some scoring formula. So that definitely worked, at least.

We don't necessarily want to require that it has an acceptable ending. If we did, we could just put everything under that check, I suppose, but I'm not sure I want to require it. At the moment, without one, that part of the score is basically zero. I'll leave it for now.

Okay, so we have words; print the words. Then what we want to do is: for word in words, c, the count, I guess, will be equal to words.count(word). If c is greater than one... let's add, up here, repetitions equals zero. Then if it's greater than one, we say repetitions plus-equals c minus one, because we obviously don't want to kill it: if they say "I" twice, it should just be, okay, you've got one repetition, but if they say "I" three times, you've got two repetitions. Are we still breaking? Yes. So now we're going to print repetitions. For "when", okay; "wake" and "wake", "I" and "I", "up" and "up", right. But I'm also missing... "when I wake up"... I guess we've got to do this to
print... So I went... okay, so it hits it twice, because it gets back to that other word. I see. Hmm. I'm going to let it go. So we'll just double-penalize, but it's all going to pour into a formula, so it's still going to scale linearly, as I intend it to. Basically, at the end you could divide repetitions by two if you wanted to be perfect. Yeah, well, if I'd just sat on that for a few seconds I'd be like, "well, huh, dummy."

Okay, so we have repetitions. At this point we could already begin scoring something. Another decent signal, I think, is the length. As long as there's no repetition, length is actually good: the longer the better, because, again, the chatbot tends towards really short responses. I think that's just because the more words it gets wrong, the more it's going to be penalized in the loss function, so I think that's why it tends towards short responses. The times when the chatbot produces long responses, a lot of times those are really cool responses, so I want more of those. It's cool when it's a very long response, as long as it's a good one.

So now let's do length. For answer in in_answers, answer length will be equal to len(words). So we have length, we've got repetitions, and we've got acceptable endings. At the very beginning, we could make a really rudimentary score. I'm going to call it H score, because later we might have a D score. H score will be equal to, let's say, answer length... I'm trying to think if there's anything else... plus acceptable end val. It could either be plus or maybe times two or something. I don't know; I'm just making this up right now. So: answer length plus acceptable end. Is that the acceptable end? Yeah. And then what I'll do is divide it by repetitions.
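The three ingredients described so far (an acceptable ending, a repetition count, and the word count) and the rudimentary formula, H score = (answer length + acceptable end value) / repetitions, can be put together in a small sketch. It is not the code typed in the video: the constant and function names are illustrative, the endings tuple only approximates the list read out loud, and repetitions are counted once per distinct word with `collections.Counter`. That last choice is a deliberate variation. Looping over every token, as described, adds n - 1 once for each of the n occurrences of a word, which gives n(n - 1) in total; counting per distinct word gives the n - 1 the speaker describes ("I" twice is one repetition, three times is two). Repetitions also start at 1 here so the division is always defined.

```python
from collections import Counter

# Approximation of the endings listed in the video: double quote, period,
# closing parenthesis (smiley), "3" (as in "<3"), question mark, exclamation mark.
ACCEPTABLE_ENDINGS = ('"', ".", ")", "3", "?", "!")
ACCEPTABLE_END_VAL = 5  # bonus for ending cleanly; the value 5 is from the video


def count_repetitions(words):
    """Extra occurrences of each distinct word: 'i' seen 3 times counts as 2."""
    return sum(count - 1 for count in Counter(words).values())


def h_score(answer):
    """Rudimentary score: (length + ending bonus) / repetitions."""
    words = answer.lower().split()  # lowercase so "I'm" and "i'm" match
    ends_ok = answer.strip().endswith(ACCEPTABLE_ENDINGS)
    acceptable_end = ACCEPTABLE_END_VAL if ends_ok else 0
    repetitions = 1 + count_repetitions(words)  # start at 1: no divide-by-zero
    return (len(words) + acceptable_end) / repetitions


if __name__ == "__main__":
    for reply in ["I wake up in the morning.", "When I wake up I wake up"]:
        print(reply, "->", h_score(reply))
```

Because every term is a plain number, swapping the plus for a multiplier or changing the bonus value is a one-line experiment.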
repetitions okay so there's our H score let's go ahead and print H score H so this gives it a two but I think over time we're going to need weight like we need to know more of the scores so now what I want to do is let's stop the print there I just want the score thank you okay H scores and we'll have the answer for that so we will now yeah so we'll say H scores answer equals H score uh that's good and then at the end let's print H scores then what we need to do is stop the break rerun it div by zero of course if it has no repetition you're gonna have a problem do we have okay we'll just start repetitions as a one done okay so now now we have quite the list and now what we want to do I've been sleeping for a few days okay so now what we want to do is possibly now we need to sort it so I don't man sorting dicts and then basically want to sort the dictionary and then like maybe go in reverse order or something so I don't know off the top of my head so we're gonna we're gonna grab the old Google sort dictionary by values diction values I've already searched this on here cool sort a dictionary by value yeah if you want the output as a dict so one let's just grab this and we'll see what this does and then the other thing we want is the max dict value as well so well actually we'll just do that down here so we're gonna say sorted H scores equals key for value for key value in H scores dot items so then print so print sorted H scores oh yeah we can't do that here yeah it's not gonna work let's go run this in terminal Python Python now yeah so it's python test.py cool so here is actually sorted how we intend it to be sorted hey you sleep at night I'm not a interesting so some of these other ones I'm not I wake up and I don't wake up in the morning I wonder why that was ranked or okay these are all nines and then finally at the end we get I wake up in the morning where's the I swear that's not giving me maybe because I resized it in the world like Nazi or maybe I'm blind 
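A small sketch of sorting a score dictionary by value and pulling out the best entries. The scores below are made-up illustrative values, and the names best and top_n are assumptions; the dict comprehension over sorted(...items()) is the standard idiom for this and relies on dicts keeping insertion order, which is guaranteed from Python 3.7 (and true in practice on CPython 3.6).

```python
# Hypothetical answers and scores, for illustration only.
h_scores = {
    "Hi": 1.0,
    "I wake up in the morning.": 11.0,
    "I don't wake up in the morning.": 12.0,
    "I've been awake for a few days.": 12.0,
}

# Sort ascending by score; the new dict preserves that order.
sorted_h_scores = {
    k: v for k, v in sorted(h_scores.items(), key=lambda item: item[1])
}

# Key with the highest score. On ties, max() returns the first
# maximal key it encounters in iteration order.
best = max(h_scores, key=h_scores.get)

# A dict cannot be sliced directly, so take the keys as a list first.
top_n = list(sorted_h_scores)[-2:]

print(sorted_h_scores)
print(best)
print(top_n)
```

Because max() settles ties by iteration order, rerunning it on the same dictionary always returns the same answer even when several answers share the top score.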
anyway when I wake up I'm going to sleep when I wake up I'm going to sleep at night I've been awake for a few days I don't wait so like at the moment when I look at this I wonder you know why you know this one is only ranked better than the other ones because it's longer but when I wake up I'm going to sleep at night versus I don't wake up so the doors windows I would say I think I think what we want to do this may or may not work but one of the things I'd be curious about is an I count so I count equals zero and either we can subtract it at the very end or we can add like I count to what we're dividing here just depends on how much we want to punish for the I count so the next thing that we're gonna do is I guess here in fact we're going to cut this paste I count equals answer dot count the number of times a capital I occurs and then we'll take I count and I'm gonna put it down repetitions we're in it we're gonna really penalize for I count so we'll save that come back over here you can't wake up hmm one I is okay maybe we'll do is I count let's do I count minus one I can't answer because I don't want to like kill it just because it had one I minus one shoot I can't think dang it because I don't want it to be in it I don't want to be a zero I guess it would be zero no matter what if there's no I really like one I I want to be acceptable like one or zero I's like one or zero I's is okay but more is nine and so what I can't decide is so zero mm can I just say if my brain is not happy anymore if I count is less than zero right because if it's zero it becomes negative one then we're gonna say I count equals zero and I guess we'll PEP 8 that that's the kind of line that I really wish was acceptable for me to just leave on one line though okay so I don't wake up in the morning I've been awake for a few days I've been sleeping for a few days okay so all three of those were ranked 12 so that's not bad that's not bad at all so if we did let's say get get it max get 
key get key for a max value Python dictionary then my internet is going so slow I think oh you also probably can see that getting key with maximum value in dictionary the other thing I'd want to do possibly is get can i I wonder it because it has it stays ordered I don't even know can I do this grant I don't know what's embarrassing that I don't know but let's just try that I bet it won't work in 2.7 because there is no order but I wonder if it works here no I can't hash okay or I can't slice it what if you what if we convert it to a list but I still want to know like can I like cuz I know we can do that max thing oops okay I know we can do the max thing but I sometimes I want to get like just the top n responses like I don't want just the top response like I i'd like to work with what are the best ones we think we have so far and then kind of go from there because doing things like is this a funny response or is this coherent means we're gonna have to use something like spaCy or NLTK doing that on all of the responses will take a while but if we can kind of filter out a lot of responses first that won't take as long same thing if I end up trying to use some sort of deep learning algorithm to analyze a sentence and determine is this you know funny or sarcastic or is this like a good response or not the less we have to go through the better so I wouldn't mind being able to take like the top n so so I wonder if I could just say print K for K in sorted H scores negative 5 : can I do that will you let me do it okay I think this is right I don't wake up in the morning I've been awake for a few days I've been sleeping for a few days I don't wake up at night and another answer how do you sleep at night that's a funny one like I don't know yeah okay so that's one way that we can get a handful and then otherwise we can get just like the top one using hopefully this we'll see what happens max statsky okay where's our dict okay so that's how we can do 
maybe get a slice of the best again this requires Python 3.6 plus or sorted so if you don't if you're on Python 2 you can like from collections import sorted dict I want to say or OrderedDict rather yeah ok also max of well that's interesting I didn't know you could actually that's cool okay so max a max of sorted H scores so in this case we have 3 so it should we're just kind of randomly pick one I suppose so python test.py I've been sleeping for a few days that's always I've been sleeping for a few days apparently hmm yeah I don't really know it because all these have like the same rank so I wonder why that doesn't ever change but ok whew alright so I think at this point because we have a decent slice here of plausible good answers what I would like to do is let us come down here and let's print well actually what we'll do well first let me say here let us say top top n later we might have top n equals this I'm gonna comment that out I'm going to comment that out so we don't need that anymore I'm going to save for oops let's I need to stop we're gonna definitely have to wrap this one up okay print oh say for n or for for answer in top answer okay so let's save that let me run that real quick make sure that works it's pain I don't wake up at night I've been sleeping for a few days I've been awake for a few days I don't wake up in the morning okay that looks good so what we're ready to do I think is work this over to model ensemble dot py so I'm going to copy that come down here make sure we've got that saved so basically all we're doing is that at the IMP answers we come in here blue since I'm gonna zoom out because we're making such a monumental change here so I'm going to tab tab no longer do we need to specify that because we're gonna have any coins we don't need the length anymore you don't even really need to print in answers what we need to do let's grab the question because by the time we get down here we'll have already printed out all 
that other junk and we're gonna forget a lot of times probably what the exact question was so let's go ahead and print question and then we'll get the responses let's say that and we will come over have I just let this run the whole time [Music] do you have model or Nvidia let's do watch houses okay we just left that running hopefully muck up the recording a lot of times when I Wow I'm just just killing that it's killing my recording hopefully that'll screw anything up but the show must go on so python inference at pi in chat ensemble will start with a good old fashioned hello and we'll know as soon as we start seeing crazy things I will definitely I got to put a microphone up to that and let you guys hear it I'm s a lot of people don't like that noise but just love that noise I hear it it's thinking hello hello hello hi all of those are great remember before we had terrible top responses hi hello howdy slicked general Kenobi howdy slicked general Kenobi okay I got I got a Google your here's what I'm gonna do not Google I don't use Google anymore it's all about the DuckDuckGo now I just have to stop myself from saying Google Kenobi what is this I just don't know really Star Wars reference okay urban dictionary I just don't know why we're what would the hello there general okay I still don't get it the response you give to someone that says hello there so hello there and then okay whatever miss Cody I don't know okay whatever so okay those are good responses let's do smiley we still got these stupid because I remember I dissed on Daniel for allowing those and now we have them so one thing we could do one thing I don't understand is why is that a response but also one thing we could do is like when you see stuff like that we can immediately issue a penalty for like any existence of : : right we could probably stop that thank you my pleasure you're welcome see these are great response like I would be happy with any of those responses and we could even pick a random 
one right so so as you saw with the single model with just one model you had a lot of times only one choice to go with and this meant the chatbot would often respond in the exact same way always especially to the same question it will always respond in the same way whereas in this case we're running twenty two different models that's essentially possibly 22 different variations in a response which most of the time is good what GPU do you let's see yeah all of these NVIDIA GeForce are all exceptional responses how much did you pay for your about $100 that's sure if it's worth it to dang it alright this is not coherent that would be a really hard one to figure out though how much did you pay for 1080 Ti I'm not sure if it's worth it a thousand dollars I'm not sure it's worth it about a hundred dollars I'm not sure if it's worth it do you own any dude that's a great response oh man that's hilarious I just think that's one of the cooler things about these chatbots is like obviously it knows Satoshi Nakamoto is related to to Bitcoin anyway yeah that is hilarious that's great I just I'm so happy when I see good responses I'm sure we're gonna start seeing let's see where where do you live I'm from upstate New York I'm from New Zealand North America I'm in Oregon South America I'm in Oregon Australia I'm in the US so some of those are all those are good responses but some of them don't make any sense hmm I don't know what print we forgot about what is that print that we forgot let's go to ensemble question H score copy that let's go ahead and comment that out before I forget I'm not gonna rerun this though just because that's poking up there Oh because of that we don't see our question though oh wait did I get lost what's going on here I'm gonna move this okay where do you live what do you do I work full-time I'm a PhD student I'm sorry I think that's a comical one I'm currently living in the US okay so in this case he's kind of screwed up living so where are you 
living versus what do you do for a living I'm currently living in the middle Miller what's your occupation what's your occupation okay so these are not these are like locations that's curious what's your job software engineer I don't have a software engineer I'm working on it now so I don't want to do I've been working on a perfect software engineer here working on it now do you have a twitter nope do you where did you all see it here's this will give us some subreddits nope it's a quote from the movie it's a quote from the book I'm guessing you're in the wrong thread okay now some of the other cool things that happened with the other chat BOTS was because it's reddit it's just comments and responses most of it is English um let's see what first of all what programming language do you I'm a ninja mostly Java and Java mostly Java just a matter of time okay I will throw you away computer just saying computer side okay so these are not did these must include new lines or something that got kind of weird I'm using Python is my primary language my favorite language is my second language natively made of language okay so one of the other things hola amigo okay so hello hello friend gracias gracias de nada gracias amigo [Music] gracias por floor so that's a good answer so that's pretty good I don't know some of these I don't know he asked me where do i it i think that's where are you where do I live what if I returned that question pretty soon I'm gonna need Google Translate up in the house why can't an el mundo huh okay I'm gonna have to start translating this darn you gotta use Google try I try to leave Google so hard and I just can't see people that speak Spanish are already tired of me detect language convert to English it's like that's not really a proper response my guess is there's not a ton of Spanish responses if I had to guess but is curious oops you're in the world there is no problem um let's see I wonder what if we see hello let's say and let's ask in how 
about Chinese copy that over here good luck hey huh but it's not bad so let's go ahead and do some translations detect language convert to English I'm ready so like in this case these all look the same okay let's say are you happy and then let's say let's put that in any people do Korean that's a pretty surprising how many carat let's see if they even does anything Korean wise no negative okay we could try and Chinese day I was having so much fun okay um I think that's probably a stopping point for now we've been at this for what is the time an hour and six minutes holy moly okay mmm R&D is so much fun alright I think that's it for now yeah so like what I think it's cool when there's like other things like when we first made the chatbot I wasn't even thinking about other languages and at least in Spanish it kind of started to have some responses that were somewhat intelligible now I'm sure like most of them are not the greatest but at least it knows Spanish is it knows to respond back in Spanish which is pretty pretty awesome so that's cool but so yeah I like with NLTK I don't even know or spaCy I'm not sure like how would we how could we you know make sure we retained and we didn't just filter out a bunch of these like other languages by doing things I just don't know Wow look at this long are you happy what was the question are you happy with that I wonder if I can just copy all of these rather than one at a time okay let's bring that over here paste English please yes but it is I don't feel like doing anything thanks for the comment no but the truth is that I don't care about anything yes but it doesn't seem to me are going to do anything to themselves so honestly I think the Spanish that's close like it's not like we're more likely to get silly answers in Spanish that don't make much sense but like this one's pretty good and or I guess you can't totally see it so that one's good and then and then this one's a good response too that's cool I think that's so 
cool you know that that it can do at least Spanish has got a lot of work to do like the English ones are pretty good are you what was the question I can't remember are you happy I can't remember if it was with that or now or what no I'm happy I'm a software engineer computer engineer no I'm a computer engineer no I'm an engineering student you engineers on reddit are just obviously in huge quantity yes we'll see what happens no they're owned by the government whether owned by corporation honestly that is my answer to the dangers of deep learning no they're owned by the company yes we'll see how it goes yeah okay so these are our pretty good responses so so far I'm pretty happy with this like I said I think what I'll do is maybe I'll ask a few questions and then share a gist of like a few answers and see if you guys can come up with anything good because I bet right now we're probably still filtering out maybe some some good ones that we should have included but we're not so I'll have to look more deeply into that but actually surprisingly that was some really stupid filters and that's looking pretty good I'm pretty stoked about how good that looks right now so so yeah I think I'll be stopping here if you've got questions comments concerns suggestions whatever you can feel free to leave them below if you like this kind of video it's a quite a long one if you like the style I can probably do more of these like I said I spend most of my day kind of just tinkering like this and I just historically haven't really thought to like make videos out of that but I can do it if there's enough people who are interested in it especially right now like I said where people are just sitting at home begging for content so yep that's it again if you want to check out that neural networks from scratch series learning how to actually code neural networks from scratch understand how they work that way you can actually work with them and not solve basically problems that have already been 
solved for you and then when you try to do something new you have literally no idea how to do it if you actually truly understand how neural networks work it makes it much much easier so the book is set to be released in quite a few months still but the draft for the book is already live public if you preorder the book or even the eBook version you get access to the draft pretty much immediately I'm still adding people like I add people manually because we have to use like Google Groups and it looks like Google Groups was maybe developed in the 1960s so I don't see a really easy way to automate that process so it might take a few hours but I'm adding people as they come and also if you think you should be added and you're not just send me an email feel free to harass me I don't mind if you support me in the book you're not gonna bother me so anyway that's all for now maybe another section on this like I'm really curious about comical or sarcastic responses the other thing I'm very interested in is images links and maybe subreddits so specifically sometimes linking about a subreddit or linking to a subreddit like so like slash r slash some subreddit so like I've seen it say like sometimes you ask like we'll see if it shows up where where did you learn that wow these are all these are pretty terrible responses not gonna lie but yeah so here's one and so for example this one got got dinged so one thing I would do is if it's a subreddit like this is slash r slash iamverysmart that's the best response that's certainly better than any of these responses so if it is just a subreddit like this would be very easy for like a regular expression so find any scenario where you've got slash you know /r forward slash and then any amount of text until there is no text like if there's a space no good but if it's just simply a subreddit that actually might be a comical response or a decent response and actually I would take that response over any of these responses so 
that's one thing that I'd like to do I'm a teacher I wish we had that over any of these responses like these these are funny you know yeah okay so and this is a perfect example of how much the chatbots can change like it's the same question just nineteen thousand steps later it learned everything from r/The_Donald okay and then 6000 steps later in this right it's a book or something some more steps later it learned you know I'm from Australia that's not even a good answer then what literally a thousand steps later now it's about subreddits again all these would've been great so yeah so still lots of work lots of things that have been filtered out but are actually acceptable responses so I'll try to come up with a few that are I think already decent responses and then some that like I wish captured things like this or I think maybe what I'll do is I'll come up with some that have outputs like this where it's like that's not right none of those are good ones and either you you guys or I'll keep working on it or get get Daniel on board because he'll solve it oh and figure that out those are the next steps anyway that's been going on long enough I will see you guys in some other video and some other time
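The subreddit idea mentioned above (accept a response if it is nothing but a subreddit link, reject it if there is a space or any other text) could be sketched with a regular expression roughly like this. The pattern and the name is_bare_subreddit are assumptions for illustration, not code from the video, and the pattern is looser than Reddit's actual naming rules.

```python
import re

# Optional leading slash, "r/", then word characters only, with an
# optional trailing slash. \w covers letters, digits and underscore.
SUBREDDIT_ONLY = re.compile(r"/?r/\w+/?")


def is_bare_subreddit(answer: str) -> bool:
    """True when the whole answer is just a subreddit reference."""
    # fullmatch requires the pattern to cover the entire string, so any
    # space or extra text makes the check fail.
    return SUBREDDIT_ONLY.fullmatch(answer.strip()) is not None


print(is_bare_subreddit("/r/iamverysmart"))
print(is_bare_subreddit("check out /r/python"))
```

A check like this would run before the other filters, so that a bare subreddit answer is kept as a candidate rather than being dinged for its slashes.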
Info
Channel: sentdex
Views: 58,691
Rating: 4.9489722 out of 5
Keywords: chatbot, python
Id: _lqAQxdBapI
Length: 74min 28sec (4468 seconds)
Published: Tue Mar 31 2020