American Sign Language Detection with Python and MediaPipe

Captions
hello everyone it is April 4th 2023 and we are going to have a live coding session tonight hope you're doing well well say hello in chat we got a code with mujahid Tech Wyatt we got some tech people in the chat how's it going uh twitch we got Jeff we got queer t underscore underscore underscore and then a y I like that name how's it going everyone tonight can you hear me okay is the music okay hope it is um so let's just go ahead and jump into this I'm going to turn my dark mode on my browser as to not blind all of you and I'm gonna go ahead and pull this up now let's get it situated a little bit we got my chat here got the dark reader and this is what we're looking at right now is the kaggle POG Champ series which is ongoing it's April 4th right now the competition runs until May 1st 2023 that means you have plenty of time to still join you out there you could join you could win one of those gpus back there I'm not gonna go and get them because it takes forever but I have a 30 80 and a 4080 gpus thanks to Nvidia that you could win you just have to watch one of the sessions from GTC and join the competition so with getting that out of the way I'm going to put this link in the chat everyone here should join because your odds of getting a free GPU are just too high we do have a 115 teams joined this competition which is I think the most that we've had before I can go back to the third one we did this one on its corn we had 92 teams in that one uh the second one music classification we did this a year ago was it really a year ago it doesn't seem like it's been that long 64 teams joined that and then the very first one actually went down on our second one the the very first one had more teams 88 so we've surpassed that here episode four season one we have 115 teams here and you should join two why not don't you want to win a GPU guys why not win a GPU tell me chat why would you not geeky programmers in the chat every uh Tech wise saying everything looks great thanks uh Rakesh is here first time excited to learn you're gonna learn a lot buckle up put on your seatbelt we got some bubbly in the house that means I'm ready to do some real work tonight so this is the competition let's look at the leaderboard I know when you sleep is winning by a lot on the public leaderboard keep in mind that this isn't what actually you get scored on for winning the GPU that'll be the private leaderboard which we won't find out until May 1st uh but we have some others gb4 I win in the number of tents attempts this is Sebastian he does have 45 attempts he's barely beating team Paris who has 42 attempts so he technically is winning in that category so a lot of people 115 people here too and you could join oh I never put this in the chat no I did put this in the chat um great Channel great material wish I had one more time to commit um you don't even have to commit anytime other than clicking that link and then clicking join and submitting a baseline submission so let's look here if we go to the code tab and we go to buy just just be as as lazy as you want you just score you sort by best score by the way upvote these you can't tell because of my dark theme but I think I've uploaded most of these um from zzzs to extra boost you would just take this notebook click copy and edit run it all the way through it then take the or I think you could just take this output let's look at the output Come on load don't fail me now don't fail me now there is no output how is there no output there has to be an output maybe it's this 
version doesn't have an output um so let's go to a version that does have an output version 21 output here is sample submission maybe yeah this looks like it's the person's submission so if we looked at this code we read it we understood it we maybe modify it a little bit and then we take it when he writes his sample submission output and we take that output and you submit it to the leaderboard by just doing this clicking on it where's the submit button I guess you might have to download it might be like three steps download it go to the competition click click on submit oh wait don't look at that well I guess that's not that big of a deal and just click submit predictions and then you'll upload the CSV file here then you're on the leaderboard that's all you have to do okay so um we've talked about that right now let's talk about a different competition that I've been kind of uh getting into so we all took a look at this competition it's being put on by Google slash pop sign I guess pop sign is like a game for learning American Sign Language but we we started looking at this on the stream because we took a poll and everyone wanted to look at this one and then I got pretty into it and I've been thinking about a lot and I've been creating some models and now what do you know like on the public leaderboard I'm 10th place out of 843 teams right now which is like okay it's still pretty early in the competition a lot of people haven't really joined yet um but I'm doing okay now the thing is to get better to make my model even better I want to better understand how this data was created so in tonight's stream we are going to be trying to recreate this uh model essentially or this data set so we're just gonna go ahead I'm not going to reveal any of the specifics of what I've been working on this competition but I'm we're gonna actually um take one of these models someone actually uh this guy's been making some great notebooks uh and just uploaded one today that's a pretty good leaderboard score I think it's that would get you into like the you know top percent well not now because everyone else has probably submitted the same thing but um the cool thing and kind of frustrating thing about this competition is you're required to submit your model as a tensorflow lite Model A TF Lite and that means that it can run on edge devices like a cell phone which you could imagine would be pretty important if you're trying to predict something like someone's webcam and when they do a sign for sign language so we are going to actually um use this person's Top Model and try to create the data ourselves to predict on it that sound cool and we're going to do that using what they've created oh sorry what they've used to create the training data set um and if we go to this data tab we'll learn a little bit more about that I guess they don't say it there they say it here so the goal is competition is to classify American Sign Language you will create a tensorflow lite model trained to unlabel Landmark data extracted from using the media pipe holistic solution I've made some I don't know if I've made a video about media pipe I think I have yeah but um you can check that on out on YouTube if you're not on YouTube exclamation point YouTube um please speak more calmly I can't I'm too excited I'm sorry and you saying that just makes me want to be less calm all right so we're going to look at media pipe holistic and we are going to try to run it on me myself and I so um yeah let's go ahead and do that I have here my handy 
dandy terminal can you guys see this this is my webcam slash Canon camera that's running my wife's scanning camera and then I'm gonna open up my repository where I have this uh repo specific for this competition now I do have this folder that I created right before I started the stream called Holistics so you can see in this holistic folder I have a file called holisticcam dot py this is a python script and if you look at it closely you will notice it closely resembles the code that you see here in the holistic documentation that's because I almost word for word or or like I can't speak okay so not word for word um item by item line by line copy this code into this script and we're gonna try to understand what the script does again a little bit um so the main thing is we're importing something called CV2 which is um opencv it's like the backbone of what a lot of computer vision models and computer vision stuff uses then we're going to also import media pipe media pipe is the package created by Google for doing pose and face detection and um hand detection now what are we going to use CV2 for we're going to actually create a camera that's here we create this webcam we're going to try to do this a few different ways okay so this is for static images so I actually want to just take all this I guess I could just delete it I just deleted it you guys probably don't see that because I just hit D and it all deleted but we're just doing it for webcam input so let's do it with uh our capture device five so that is one of my other cameras I forget which one camera exactly um but we're gonna open up this holistic object and then it's going to take this camera and pull in whatever frame or image is in that camera at the given time it's gonna I don't I don't know what frame rate it is either 30 frames per second or 60 frames per second it's going to start we're going to start looping over that with a while loop so this is just going to keep on going Forever Until there is no more uh camera of it available the camera either goes offline or I kill the script it's gonna then read this so whenever you have a a CV2 object for video capture either a file or a camera do you call this read method on that capture object and it will provide you two things it will say success and this will be true or false a Boolean value hopefully it's always true if this is running and then it'll give you the image for that frame because video isn't really video it's just a bunch of images back and forth right or on top of each other you're looking at them it's succession um so this breaks out if this is not successful otherwise it's gonna take this image convert the color then it's going to process it using this holistic model that's going to give us our results I'm not sure exactly how these results are formatted but I know how they're formatted for the kaggle competition so we're going to try to pull out this eventually um and then so this is like the main chunk of what I'm trying to do here is just run this on a video file or on on my webcam this part is to give us a pretty output so this part just gives us the pretty output where it's going to draw some landmarks so it's going to draw my face Contours uh draw my pose connections and then it's going to show it so let's go ahead and try to run this should we try and run this people what do you think it's Lord of the Rings dialogue what are you talking hello greetings from bro Bogota Colombia welcome from Colombia how's it going I wish I can join once the computer I wish I can 
join you once on computer vision topics I've been teaching computer vision for a while on YouTube oh you have Tech Wyatt what tech what that's pretty cool okay let's see what I do so let's go into this and run a holistic camera okay no module named media pipe that's because I have to activate my media pipe conda environment and if I do a condo list in here it's going to show me everything I've installed in this conda environment conda environments are a way for me to separate out my python projects into different like bins or different um environments right so now I'm in the media pipe environment which I just named media pipe but I've also installed media pipe on there I think it's an older version of media pipe we're gonna have to figure out exactly which version of media pipe they used for this competition data but that's that's another thing okay so now it should actually work if I see this holistic cam python holistic ham let's get holistic with it all right it doesn't work so video device five is not a device so uh always forget this command you can do you guys also have stack Overflow pages that you search for on Google and then it's it's like purple because you know you've clicked on it like a million times because I I definitely have that okay so this is a command I can run to see all my web cameras so um I think I could use this Brio one what does this mean is it on video one all right so let's go in here let's edit this holistic camera right now I have it going to the capture device five which I guess doesn't work doesn't exist I'm going to change that for one let's actually just edit this up here and then down below it this is where we will run holistic cam on don't fail on me now okay video one does not work hold on a second wait hold up hole up here I listed my devices pull up look this video file should be my all right let's try video Zero not that I'm Gonna Save that oh my goodness it worked what camera is this is it this one I always have to wait how is it this it's using the same one I'm using the stream with that's weird I didn't know it could do that okay so this is working look it has it's only drawing certain points I don't think it's actually detecting my hands but it's it's actually detecting like 500 points on my face but we're only displaying some of them so I want to now um that's pretty cool right you guys understand what's going on so I'm gonna kill that session and actually hello okay yeah so now my camera is broken now my camera is broken look what you did look what you did Rob look what you did maybe I could just stream the whole time through the other way okay so now I'm back I had to reset it in OBS but now I'm back all right let's not do that again let's let's um let's not do that in the sense of running from that camera let's try to run from video five this is why I don't understand why isn't it not working for five it was working for five before that's my other webcam uh can't open camera by index but why not maybe it's six I don't know why this UVC camera has multiple video we're gonna get it guys we are going to get it how about four oh is it because oh shoot is it because my um OBS is using it okay so you're saying instead of this I could do Dev video five I think I know what's the problem though thank you sudar for this help but I think the problem is that I it actually is being used by my by my OBS right now like one of these yeah like this is a different camera right I have too many cameras here this is this one the one that I'm trying to use um I 
need a leading slash and /dev, and I still don't think it's gonna work. See? So what I need to do is... how do I reset the device? This is why you test things first. There's a video4linux control panel... here's what I'll do: I'm going to unplug this other camera and then plug it back in. When all else fails, this is computer vision, people: unplugging and re-plugging. I'm gonna break it... no I'm not, hopefully not. And now I really need to make sure I get this running. "Create a TensorFlow Lite..." oh wait, so this uses TensorFlow Lite? I guess that makes sense. Let's try replacing this with five while that camera's rebooting, and do list-devices again. So it shows up here as video 8 now; let's try it on 8. We're gonna get it, folks, stick with me here. Can't open it, what is going on? Let's try 6. Yes! Finally. "You guys had faith LOL." What's up, Colin. Okay, so now I have this other camera, which I think is this one, and it's doing pose detection and hand detection, but it's only showing the main hand and pose points. So what I'm going to do is stop this again, now that my main video works and I know I can run it on camera 6, and actually print the results. That might be kind of crazy, so I'm gonna have to just run it for like two seconds and then stop, and we'll see what the actual raw results look like. Okay, so the raw results are this MediaPipe solutions object. Now's the time where we get nice and cozy and hop into a Jupyter session; let's open up JupyterLab. Let's do it, folks, guys and gals. "I understand nothing but enjoy being around." That's like the story of my life, and then you slowly start understanding things. I actually started this with the wrong conda environment, so we're going to switch here to the mediapipe one. The thing is, we don't really need to run this on a webcam, but let's just go ahead and do it since we have the code. Can you all see this? I'm running this in this Jupyter environment because what I want to do here is break after running it once, so it takes a single picture... oh, I broke too early; we want to break here, where we print the results. So now we'll run this one cell in this Jupyter environment. You guys following me? "Can you do forecasting of supply and demand for a small-size company?" No, I can't. "Good old Jupyter. Vim gives me a headache." I love Vim; you better not talk bad about Vim. All right, so now if we look at this output, it's a MediaPipe SolutionOutputs object. If we just Google this we'll probably get to the GitHub repo for it: "process a set of RGB image data and output solution outputs." But what are solution outputs? It looks like it's just a named tuple, or a tuple, depending on who you ask. If that's the case then this should have some keys to it... but it has no attribute `keys`. Okay, so what it does have is a `count` method, that's a nice thing, and it has these `face_landmarks`, see this, and it has `left_hand_landmarks`, which were None. All right, so let's run this.
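For reference, here is a minimal sketch of roughly what that single-frame check boils down to, following the MediaPipe Holistic docs; the camera index and confidence thresholds are whatever your setup needs, not necessarily what was used on stream:

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

# Camera index depends on the machine; if one index fails, try the others
# that `v4l2-ctl --list-devices` reports (or just probe 0..9 in a loop).
cap = cv2.VideoCapture(6)
success, image = cap.read()          # one frame is enough for poking around
cap.release()
assert success, 'could not read a frame from this camera index'

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB, OpenCV gives BGR

with mp_holistic.Holistic(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as holistic:
    results = holistic.process(image)

# results behaves like a NamedTuple; any group that wasn't detected is None.
print(type(results.face_landmarks))        # NormalizedLandmarkList (or None)
print(type(results.left_hand_landmarks))   # None if no left hand was in frame
```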
Let's also import matplotlib.pyplot as plt and show the image. Yeah, so that's the image, and there was no left hand in it, which would explain why there are no left-hand landmarks. Let's try running it again with my hand up. All right, now let's look at this image... it was taken while I was closing my eyes. So we're gonna run it again and I'll hold my hand up for a while. All right, I saw the light flash, so I'm pretty sure it got some points on my hands, and sure enough, there are left_hand_landmarks now. How does that have so many landmarks? I'm barely showing my left hand. And right_hand_landmarks: there are none. What's the segmentation mask? There is none. There's also pose_world_landmarks, and now it doesn't want to show me anything else. So let's just be silly here and import pandas as pd. Do I really not have pandas in this conda environment? Oh wait, this conda environment doesn't have pandas, and I'm about to install it, and guess what: as of today, if I pip install pandas it's going to install pandas 2.0. It was just released, I think today, maybe yesterday. So let's activate the mediapipe environment and pip install pandas and see what version it pulls from the internet. That's right: 2.0, baby, the legit 2.0 version. Now we go in here, print our pandas version, and it is 2.0. History in the making right here; waddle waddle knows what's up. "I think it's clear if you look at the MediaPipe home page." What's clear? I think I kind of understand what's going on; trust me, because of this competition I have been looking at MediaPipe landmarks a lot. But yeah, we can look at their homepage; send me the link you're talking about. "I'm watching the stream on YouTube but I come here to support on Twitch." You need to be on both, but Twitch is a lot cooler. All right, so if we take results.face_landmarks and call pd.DataFrame on it, it's just not gonna work, because the type of this is a NormalizedLandmarkList. So basically what we want to do now is iterate over it. I don't know if this is the best way to do it, but we're going to iterate over it with enumerate and see if this works. Oh, we need to say `in`... wait, what? It's not iterable? We can't iterate over face_landmarks directly? Why does MediaPipe have to be like this? I'm sure it's some sort of efficient way of doing it. And I can't access the values by index either, it's not subscriptable. So how do I get at these? I can do `.landmark`... oh wait, `.landmark` is a list. What's this? A "repeated composite container." Google just doesn't want to use plain Python stuff. So I bet this is not iterable... oh wait, it is. So I can iterate over these landmarks, and now we have a point. This is just the string representation of it, but it has a `.x`, a `.y`, and it's going to have a `.z`. Sounds like a cool stage name for me: yo, what's up, I'm Dot Z. All right, so we're gonna make a blank DataFrame that's gonna store these landmarks, then we locate at the point number and put in the x value. Watch out guys, Point Z is coming to your neighborhood: x, y, z. If I knew how to do the ordering of letters, that would be helpful. All right, so now we have a landmark DataFrame. Let's go ahead and plot it.
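Roughly where that loop ends up, including the y-flip that comes up in a second (a sketch that assumes the `results` object from the cell above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Each entry in results.face_landmarks.landmark (a protobuf "repeated
# composite container") carries normalized .x, .y, .z coordinates.
face = pd.DataFrame()
for i, point in enumerate(results.face_landmarks.landmark):
    face.loc[i, 'x'] = point.x
    face.loc[i, 'y'] = point.y
    face.loc[i, 'z'] = point.z

# Image coordinates grow downward, so negate y to plot the face right side up.
face.assign(y=lambda d: -d['y']).plot(x='x', y='y', kind='scatter')
plt.show()
```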
So let's plot x against y as a scatter and see if this works. Probably not: wrong dtype? They're not stored as the right type, so let's check the landmarks... x is a float, yeah, so I can plot this. Oh wait, kind='scatter'. There we go, there's my face. Now, in order to plot this, y kind of needs to be upside down, so let's just replace it with the negative version. All right, that's my face. "Do you ever use ChatGPT as a tool? I use VS Code." When I use VS Code I have Copilot on, which can help sometimes, but sometimes it just bugs me out. "That looks like your face when you are on acid." Okay, Mateo. Mateo knows what's up. "You can write landmarks like this." Oh, that's a good point. So what was suggested is that I could just build it in one go, which gives me the same thing but cleaner. My version is probably faster, but repeating `landmarks.loc` so many times is not nicer, so let's delete this. Good suggestion. Every time I run this it should flip, because I accidentally left that negation in there. But now we've got the face landmarks; let's just call this `face`. Now let's do the same thing for the other landmark groups: pose, left hand and right hand. I probably shouldn't repeat code like this, but I'm going to because I'm lazy. "This is where I miss using Vim, because I can do things a lot faster." Yeah, because speed is totally my problem right now... I'm joking, I'm joking. Okay, so what's the problem here? Attribute error on 'pose'... oh, it's called `pose_landmarks`, so `results.pose_landmarks.landmark`, and then `left_hand_landmarks` and `right_hand_landmarks`. "NoneType has no attribute landmark": that's because one of these hands, the right hand, wasn't detected, so it broke there. Man, it's kind of annoying to do it this way. Let's put all the DataFrames up at the top. I probably could just use numpy arrays, but DataFrames make it a little easier right now. Bump all these over and put this on one line to make it nice and pretty. There are no mistakes, just happy little mistakes... what is that quote? "There are no accidents, they are just happy little..." "Hey, why don't you use Jupyter in the browser instead?" I am in the browser, what are you talking about, I'm in a browser right here. Oh wait, that's the quote: happy little accidents. But "accident" just means it's a mistake. Come on, Bob Ross, step up your game. Why is this complaining? Oh, it's because I haven't indented this. I was getting really mad there, and it was my fault; that's always the case with code. All right, so we're going to plot our image and then show our landmarks: face, pose, left hand and right hand.
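And the tidier construction the chat suggested, pulled into a small helper (the helper name `landmarks_to_df` is mine; returning an empty frame when a group is None also sidesteps the NoneType error from the missing right hand):

```python
import pandas as pd

def landmarks_to_df(landmark_list):
    """One MediaPipe landmark group -> DataFrame of x, y, z (empty if undetected)."""
    if landmark_list is None:
        return pd.DataFrame(columns=['x', 'y', 'z'])
    return pd.DataFrame(
        [{'x': lm.x, 'y': lm.y, 'z': lm.z} for lm in landmark_list.landmark]
    )

face = landmarks_to_df(results.face_landmarks)
pose = landmarks_to_df(results.pose_landmarks)
left_hand = landmarks_to_df(results.left_hand_landmarks)
right_hand = landmarks_to_df(results.right_hand_landmarks)
```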
All right, so now what I want to do, folks... "R.I.P. Bob," yeah, I know, I'm telling him to step up his game and he's been gone for a while; Bob Ross is pretty cool though. "You mean Jupyter Notebook?" No, because JupyterLab is better, that's why I'm using it. How dare you come here and tell me what to use. All right, let's better understand this Kaggle competition data. So I'm going to do pd.read_parquet on the input asl-signs data. Actually, let's pull in the train CSV first, pd.read_csv on the ASL signs train file, and then we'll pull in one of these random parquet files and call it `xyz`. Examine your Z. I don't have pyarrow or fastparquet installed, so pip install pyarrow. "I have a problem with Kaggle competitions, ML Boost." Do not bring that crap here, I don't want to hear your problems; I got 99 problems and you not liking Kaggle competitions is not one of them. I'm just kidding. Okay, tell me, what is it? "The other way to do it is to put things in try/except blocks." Yeah, I do not want to do try/except blocks for this. So if we look at `xyz`, we can see what they have here for each frame; let's look at frame 83, this is in the Kaggle training data, and let's also look in `train`, querying where sequence_id equals this one. So the sign for the sequence I pulled up is "green". If I look at the xyz data it's 57,000 rows, but what does that consist of? A bunch of different frames, and for each frame we have 543 landmarks. Now if we go back to the Kaggle competition page we can see how they expect the data to be read in: when you load in the values it takes all 543 landmarks and basically pivots this, so the data ends up in a different shape, frame by point by x, y, z. So we're going to run this load_relevant_data_subset function and call it on our parquet file; instead of just using read_parquet we feed it the parquet file path. And it's going to fail because I don't have numpy imported, so let's go ahead and import numpy. Now we have our xyz in numpy format, and it's shaped 105 by 543 (recognize 543, that's the number of landmarks) by 3, which is x, y, z. Pandas can't really hold that shape (I guess you could use a multi-index, but this is the better way to do it), so if I take index 0 here, that's the x, y, z points for all 543 landmarks in the first frame. And now let's go ahead and test a prediction using the public notebook's code.
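Pieced together, the load-and-predict path looks roughly like this. The reshape to (n_frames, 543, 3) mirrors the load_relevant_data_subset helper the competition describes, and the 'inputs'/'outputs' names match what comes up in a moment; the file paths and the 'serving_default' signature name are my placeholders, so adjust them to whatever the downloaded model actually exposes:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

ROWS_PER_FRAME = 543  # 468 face + 33 pose + 21 left hand + 21 right hand

def load_relevant_data_subset(pq_path):
    """Load one landmark parquet as an (n_frames, 543, 3) float32 array."""
    data = pd.read_parquet(pq_path, columns=['x', 'y', 'z'])
    n_frames = int(len(data) / ROWS_PER_FRAME)
    return data.values.reshape(n_frames, ROWS_PER_FRAME, 3).astype(np.float32)

# Load the downloaded model and grab its serving signature as a callable.
interpreter = tf.lite.Interpreter(model_path='model.tflite')
prediction_fn = interpreter.get_signature_runner('serving_default')

xyz_np = load_relevant_data_subset('example_sequence.parquet')
output = prediction_fn(inputs=xyz_np)        # dict with a 250-long 'outputs' array
pred_idx = int(output['outputs'].argmax())
confidence = output['outputs'][pred_idx]

# Map pred_idx back to a sign name, e.g. via the competition's
# sign_to_prediction_index_map.json (or an equivalent ORD2SIGN dict built from train.csv).
```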
We're using their TensorFlow Lite output, which is their model, and we are going to go ahead and download this bad boy, move it from the downloads folder into this folder, and then create a new Jupyter kernel in my other environment that has all the right installs. We'll call this "TFLite inference: run TFLite inference on an example data set." Now, in order to do this we need pandas, we need numpy, we need load_relevant_data_subset, and I need to remember how to do the inference; I can just copy this code. So we open up this TFLite interpreter; our tflite file is just in this directory, so we can point straight at it, because I moved it in here. This should at least load in our TF Lite model, and now we have a prediction function that takes in data in this format. So now let's go back, copy this, get our xyz as numpy like before, and feed this bad boy (sorry, this xyz numpy output) into the prediction function. Let me remember... I think it has to be a dictionary, maybe? Why isn't this working? Why, oh why, is my life so hard? It isn't that hard. "This prediction function takes two arguments." No it doesn't; maybe I need to call it with the keyword, inputs=... there we go, there we go. Now we have our prediction: it's a dictionary with an 'outputs' key, and in that key we have a numpy array of shape 250, where 250 is the number of signs we're trying to predict. And if we do something like make this a pandas Series and plot it (it's gonna look gross, but let's make it a bar chart), you can see there's a prediction score for each of the 250 potential signs, and there's one that's spiking up. The way we identify it is argmax, which gives us the location of the one that spiked, and that is the sign we have predicted using this model file. But now we also need to load in the training data so we can see what the sign actually should be; we basically need these mapping functions. "sign_ord does not exist." Okay, okay, we almost got it. These dictionaries allow us to translate between the values, between 1 and 250, or 0 and 250, or I guess it's 0 and 249,
um and the actual sign name let's make sure yeah 249 is the max um so we take this sign we map it back and it is predix green was green the actual value of this no I think it's wrong I think it's wrong I'm so sorry model you did wrong oh wait it is green it is green yay I went from really sad to happy there you guys follow me just came in what ml model do you use for a prediction so I've been training my own for this competition on kaggle which not this one that one's a cool one but not that one I've been trading my own to get here but I'm not using that one I'm just using one publicly available here on kaggle which someone has created training on this training data so it's just available here but we're also using media pipe to create raw data that looks like this yeah this is a type of Eda will there always be a spike you'll see you'll see we can look at some of these but you'll see some of them there are two spikes or three because the model is confused just like me most the time this model can also get confused is the sign language competition from earlier yes it is Justin I hope in 20 years sign language will not be needed I don't know some people like it I think if you like it also it's good to teach your kids right when they can't speak do you agree with the statement um okay this is okay this is ml boosts problem with kaggled I certainly acknowledge your coggle accomplishments okay as long as we start on the same page um my problem is perhaps a little theoretical and by noon's mean disrespect a lot of uh a lot of prepping me for you to say something bad do you agree with the statement this seems like you're boxing me into some sort of yes or no that's probably both gonna be wrong one may badly lose a kaggle competition even when he slash she has built the best possible model no I disagree with that statement if that's possible then the the onus is on the person who created the competition it's not on the people who are solving it are you in spring with these free freaking summer files be hitting yourself a monarch stream 24 7. 
what are you talking about startup me I that makes whatever you just said makes no sense to me but probably because I don't have the context no sir just to understand things yeah I know I'm a boost I'm just kind of joking around here but yeah there are some poorly designed kaggle competitions where not the best solution wins but if the so ain't working in big companies this is the problem with people who want to apply machine learning in an organization or just in general but they can't clearly Define what the objective is they just want a good model see what frustrates me about what you said is is one may badly lose a kaggle competition even when he she has built the best possible model by what definition so we all have in our head what we think is the perfect model for some problem but that's not how it works in machine learning how it works is you define the metric you define some way of saying this model is good or bad even with chat GPT human labelers went in and they told the model you're right or you're wrong on these specific instances so there is a ground truth right summer slash spring in Atlanta they make me crazy here in in the seasons oh I like these there's nothing that frustrates me more is when someone says and this isn't what you said um ml boost best possible model equals model that is exactly the same of true underlying model what's the true underlying model what are we talking in circles here but there's nothing that frustrates me more than when someone says um let's just test out and see if the model is good enough good enough for what how good does it have to be we Define what's good enough foreign I got all heated up there for no reason I'm sorry everyone I'm sorry all right so these are the predictions for this output this is for green the word green now let's find one of the words Show an example okay so here's the problem too this is actually this model has been trained on all the training data I'm realizing because this person actually trained on all the training data there's no folds so we're not really looking at a true out of sample prediction so yeah we might see full spikes like this for almost everything actually we I think we will um but let's try to create our own data set let's let's go back to this other type Tab and try to create it best model equals found Global minimum Maybe but you never know what the global Minima is for the problem like if you were at to ask me two years ago what the best language model could be I would not say chat gbt but now I know it's there so that's that's constantly what we're trying to do in in data science or at least model building is try to figure out what the best model is like can you perfectly predict the weather tomorrow is that the most important thing is the most important thing that you predict it perfectly every minute is the most important thing that just on average per hour that you have a right or is the most important thing that um you just uh for the day can predict one of the hours exactly correctly I don't know there are a million different ways we could Define what the best model is to predict tomorrow's weather but it's up to us and up to the business people to figure out what that means and it does it involves a lot of thinking it involves a lot of like really understanding the problem um so we're back here we are back here so we are have this x y z now the way they've done it here is they've given if we look at face oh let's just do type we do this we can see that we have 468 face landmarks 33 posts 
21 left hand and 21 right hand that's the maximum amount of points that we could predict for each frame but we'll notice here that there's a lot of missing values but that's because just like my right hand wasn't up when when I was filming that earlier part um there will be no values there'll be no predicted landmarks for my right hand in those frames so let's try to get the data into this format let's try to get our data into this format so what we will do here is we're gonna all right and let's let's again just for people who are just joining us what we're trying to do if M model can be generated exactly that data then it would be called ml model the best ml model but most of the time you're trying to trade an ml model on data to predict future or unseen data otherwise you know what that machine learning model is called it's just a SQL query it's a lookup in a table have you ever seen Nano gbt code and video no I haven't oh wait you're asking someone else real talk you'll never know the global Minima with that attitude I like it so you don't do typical data science projects trains let's stress split train I don't understand what ml model is exactly no that's right I so we're just trying to we're not actually training and evaluating our model right now we're just we're just trying to do some data munging to to create data in the format that this model can predict that makes sense are the nands dropped or zero so that's dependent on the model but the model needs to take in nand values for those so what we are going to do now sorry um oh what we're going to do first is I'm going to show you all that we are running this holistic cam please work please don't break so this is running if I just run this right so it can detect things and it's not actually showing all the points that it's detected that's just showing a subset of the points so I plotted here yeah so we did it earlier but if you plot x equals X y equals y kind scatter we could see from the this image up here we pulled in my face which looks like this it's upside down because the Y is inverted but we want to format this in the same way so um we're gonna make another Brink blank pandas data frame actually let's just look at XYZ yeah yeah let's take a blank data frame and we're going to call this one landmarks for real this time and in x y z first is the face Landmark so I'm just going to assume that the landmarks are in the same order so uh we already saved off those and this should be of shape shape 468. 
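As a quick sanity check on the training side, those per-group counts can be verified straight from the example parquet loaded earlier (`xyz`, frame 83 being the example frame from before):

```python
# Every frame in the competition parquet carries the same four landmark groups.
EXPECTED = {'face': 468, 'pose': 33, 'left_hand': 21, 'right_hand': 21}
counts = xyz.query('frame == 83')['type'].value_counts().to_dict()
assert counts == EXPECTED        # 468 + 33 + 21 + 21 == 543 rows per frame
```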
468, perfect. So let's reset the index and rename the index column to... what do they call it here? landmark_index. It's probably exactly the code they wrote; probably what I'm writing is better than what they wrote, I'm just kidding. This needs to be `columns=`. Now we have landmark_index, and then we're also going to assign type='face'. I don't like using `type` as a column name, just because `type` is a default Python built-in; try to avoid those, like don't name a function `print`, don't name a function `eval`. So now we have this face DataFrame and we're going to concatenate a bunch of these, starting with face. Okay, there's something we have to do in between, so I can't just concatenate them yet. Will it always be 468? Are any of these null? All right, let's do the same thing to all of them. It's not going to work for the right hand, but let's try it for pose and left hand, and then what we need to do is merge it onto a blank data frame... wait, the right hand... oh okay, so it's just going to be empty. All right, so what's the shape of left hand? 21, perfect. Pose? 33, that's right. Okay, so the thing I hate about pandas concatenate is that you can't just concatenate all of these together like you... oh wait, you can. I'm an idiot, look at me. Reset the index and drop it, and look at us, we didn't even need to make an empty DataFrame to put this all in. "reset_index only valid..." What in the world? It didn't do this before... oh, I keep running this cell over and over when I actually have to run it all together; this actually needs to come after this. All right, now we have our landmarks. We're going to do it, folks, we're actually going to do a prediction; the payoff comes at the end, when we predict on my own hand motion. "I missed pretty much the first hour of the stream, the topic looks super interesting, will you be reposting the stream?" Yeah, as soon as I'm done it's on my YouTube channel, so go check it out at youtube.com/@RobMulla. "Would you please make a video about making APIs and web apps in Python?" I could, but I'm probably not the best person for that because I don't really do that. Let's make this a function, create_frame_landmark_df: it takes the results and creates a landmarks DataFrame. Now there's still one thing I have to fix: we need to take our xyz file, query where frame equals 83, keep just type and landmark_index, and call that our xyz skeleton. "Why are you doing this, Rob? Why are you so weird? Why are you making a skeleton?" The reason is that we want to have rows for every type and landmark index. We can do it this way because it won't be dependent on the frame; it's kind of a lazy way of doing it, but it's going to work. Then we merge our landmarks onto it, on type and landmark_index. How, you ask? A left merge, which means our skeleton will always have those landmark rows even if they don't exist in our results. So let's go ahead and call this and get our landmarks, and look at this: our right hand has null values, because we didn't actually detect anything there. So that's cool, right? Are you guys following?
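Pulled together, the function being built here looks roughly like this; the variable names are mine, and the left-merge onto the skeleton is what guarantees all 543 rows per frame:

```python
import pandas as pd

def create_frame_landmark_df(results, frame, xyz_skel):
    """One frame of MediaPipe Holistic results -> the competition's 543-row layout.

    xyz_skel holds every (type, landmark_index) pair, e.g. taken from a single
    frame of a training parquet, so groups MediaPipe missed still end up as NaN rows.
    """
    def to_df(landmark_list, landmark_type):
        if landmark_list is None:
            return None
        df = pd.DataFrame(
            [{'x': lm.x, 'y': lm.y, 'z': lm.z} for lm in landmark_list.landmark]
        )
        return (df.reset_index()
                  .rename(columns={'index': 'landmark_index'})
                  .assign(type=landmark_type))

    groups = [
        to_df(results.face_landmarks, 'face'),
        to_df(results.pose_landmarks, 'pose'),
        to_df(results.left_hand_landmarks, 'left_hand'),
        to_df(results.right_hand_landmarks, 'right_hand'),
    ]
    # If MediaPipe found nothing at all in a frame, this is where the
    # "no objects to concatenate" error seen later comes from.
    landmarks = pd.concat([g for g in groups if g is not None], ignore_index=True)

    # Left-merge onto the skeleton so every expected landmark row exists,
    # even when a hand (or the whole face) was out of frame.
    landmarks = xyz_skel.merge(landmarks, on=['type', 'landmark_index'], how='left')
    return landmarks.assign(frame=frame)
```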
"What ML model do you think we'll use for prediction?" So we're using a TensorFlow Lite model. This is a Transformer-based model that was posted publicly (I don't want to talk about my own model, and this is a really good one that's out there), trained using TensorFlow. You can see the whole architecture: they have embeddings for the hands, for the lips, for the pose, it takes it all in and makes its prediction, and it has about 0.71 accuracy in terms of predicting the correct American Sign Language sign. So what else do we need here? We also need to add a frame number, so we take the landmarks and assign frame=frame, and give it a frame, which would be zero here, and now we have an actual frame column, so each of these rows is unique. Cool. Okay, so now we have these results. What we've got to do now is film ourselves doing some sign language. What do you think, should we look up some signs? "I didn't know Python was intricate until I started learning it." Everything's intricate. "Are there many multi-person examples in the data set?" No, they're all a single person, and mostly single-handed actually, because they're using their phone to record themselves doing the sign. So what's a good sign we could do? If we take train['sign'] and do value_counts... let's do "shhh". Let's look up how to do "shh" in American Sign Language. It looks like this, just like all of you in chat should do right now... no, I'm just kidding, just like I should do right now. Also see "quiet". Is there a video? Most of these have one if you go on YouTube: how to say "quiet" in sign language. Okay, so that's different, that is quiet; "shh" is like this. All right, so let's put it all together in a script. Let's get away from these notebooks and actually create a Python script, capture_sign.py. What our capture script is going to do: it has the same imports, and... should we output the results? Let's go ahead and output the results. Let's make this a function called do_capture_loop, and I'm going to put the body in a try/except and return the landmarks when it breaks. How are we going to get landmarks, you may be asking? Well, just quiet down there, I'm about to show you. We're going to pull in that function we wrote, create_frame_landmark_df. What we're also going to do as we loop through the capture is take a frame counter and increment it. And oh, this is the dreaded two-space indentation from Google; thanks, Google. I'm not hating on Google, I just need to switch this around to make the indentation consistent, which is going to be annoying, but there aren't that many indents, we can get through this. `if cv2.waitKey`... oh, what's that waitKey doing... and then we return the landmarks here; I think this should work. We're also going to keep the drawing part, and then we need to get our landmarks DataFrame. When does it get the results? It gets them here. So here we call create_frame_landmark_df, and then landmarks_all.append, so we'll append each frame's landmarks to that list.
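A trimmed sketch of roughly where capture_sign.py lands, including the __main__ guard that gets explained next; the camera index, Escape-key handling, and file names are my guesses at the specifics, and create_frame_landmark_df is the helper sketched above:

```python
import cv2
import mediapipe as mp
import pandas as pd

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils


def do_capture_loop(xyz_skel, cam_index=6):
    """Run MediaPipe Holistic on a webcam and collect one landmark DataFrame per frame."""
    all_landmarks = []
    cap = cv2.VideoCapture(cam_index)
    with mp_holistic.Holistic(
            min_detection_confidence=0.5,
            min_tracking_confidence=0.5) as holistic:
        frame = 0
        while cap.isOpened():
            frame += 1
            success, image = cap.read()
            if not success:
                break
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = holistic.process(image)
            # create_frame_landmark_df is the helper sketched earlier.
            all_landmarks.append(create_frame_landmark_df(results, frame, xyz_skel))

            # Draw something to look at, then bail out when Escape is pressed.
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            mp_drawing.draw_landmarks(image, results.face_landmarks,
                                      mp_holistic.FACEMESH_CONTOURS)
            mp_drawing.draw_landmarks(image, results.pose_landmarks,
                                      mp_holistic.POSE_CONNECTIONS)
            cv2.imshow('MediaPipe Holistic', image)
            if cv2.waitKey(5) & 0xFF == 27:   # Escape key
                break
    cap.release()
    return all_landmarks


if __name__ == '__main__':
    # Skeleton of every (type, landmark_index) pair, taken from one training frame.
    xyz = pd.read_parquet('example_training_sequence.parquet')
    first_frame = xyz['frame'].min()
    xyz_skel = xyz.loc[xyz['frame'] == first_frame, ['type', 'landmark_index']].copy()

    landmarks = do_capture_loop(xyz_skel)
    pd.concat(landmarks).reset_index(drop=True).to_parquet('output.parquet')
```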
I called that list all_landmarks, and here we return it. Then: if __name__ == '__main__'. So if we call this script directly, we run the capture loop, take that all_landmarks list, concat the landmarks, reset the index, and write it out with to_parquet as output.parquet. Let's see if this works; it's probably not gonna work, what are the odds? "Does hand matter? Is your camera flipped horizontally?" It shouldn't matter. Most of the people in the data set, I'll say, are one-handed, they prefer one hand, but it shouldn't matter. "Never understood the point of if __name__ == '__main__'." It's very simple: if I were to say screw it, take that check out, and just run my script, la-di-da, it works great. But if I ever wanted to import do_capture_loop from this script into another script, I couldn't import it without also running all of this, which I may not want to do. By putting that code under if __name__ == '__main__', I can import from this file without running it. "Hey, to_parquet, good catch." Thank you, Varyn. Okay, so it's already not working because of the stupid indentation. I didn't want to do this (close your eyes, everyone, don't look at my solutions), but I'm going to close these down and actually do this in VS Code. I've been trying to avoid that so I didn't accidentally show anything, but I think we're okay here. Okay, so now if I save here... oh, I see what the problem is, this all needs to be indented, and when I save, it applies Black formatting. Now let's try to run it. Oh no, "no objects to concatenate." And this is why you don't have global try/except statements, because now I don't know what the issue was. I should discard these changes and go over to VS Code, where we're now editing, and comment this out; see, I wanted it to work even when I hit Ctrl-C to end the script. "xyz is not defined." That's right, we never defined xyz; we need to make the skeleton an input to this function. You can see the autocomplete here is Copilot trying to help me out: "takes the results from MediaPipe and creates a data frame of landmarks." I guess that's true. And then we still need to load the xyz skeleton in here and pass it through create_frame_landmark_df. Did I mess up the argument order? results, frame, xyz_skel, okay. We're being lazy here; I just want something to work. Make it work, Rob, make it work. Oh wait, we're supposed to do this... all right, that's what I did. Now if I hit Ctrl-C, that's the keyboard interrupt... all right, I hit Escape that time. Was that the break? Yes: hitting Escape is what that waitKey check catches, and that call triggers the break. This is pretty cool. So now I go to my TFLite inference notebook with my own data, and now we're going to run that same TFLite inference to
try to predict what sign I just did but using the TF Lite model that we took earlier so now I am not going to read this I'm going to read my output parquet what's the sign it predicted 194 to sign let's predict it it predicted it worked it worked it predicted that I was saying shhh it took in this data all right so let's let's uh XYZ uh let's make this into a data frame oh wait it can't um let's look at the length of this 17 17 frames so let's look at frame 10 and let's plot x equals zero and Y equals one all right so it's kind of hard to tell here we're gonna flip the y-axis we're going to make this why are you breaking now oh wait there's no what okay so we need to do this columns let's give it a column X Y and Z I hate that this stuff's in the way uh now this x is gonna be X Y is going to be y that's me that's me and this is every frame so for frame in four Frame data in enumerate x y z numpy we are going to create a new plot and we're gonna title it frame number is no shape of past values is three one what what what wait what wait what what oh that's because this should be data that's frame zero one two three is my hand eventually get there there my hand is but you can't see the lines connecting them so it's kind of hard to tell um so they're actually people on kaggle who made it oh shoot where am I Rob would you mind doing some videos sometime soon about the best practices when it comes to handling really large data sets I could try we did it guys we did it though we ran this model on on that and it predicted it correctly there are some people who made notebooks that animate this animated data visualization so maybe I want so it makes a video like this out of the data let's try to do that let's just copy this um download what's it called animated visualization and I'm gonna run this thank you credits to this person who made this notebook I'm going to put this in the chat credit to them ah why is it asking me to log in the what the what what the restream.io what are you doing right now um where's my chat here no I hate you but I love you also for giving me free restreaming all right I guess I'm not going to put it in chat but just believe me foreign we're going to change this input directory and then we're also going to train change this file so let's rename this file let's move this output file and call it sh zero dot part k now we have the sh Zero part case so I won't accidentally overwrite it no such file or directory oh because it thinks it needs to be in that directory no it just it's just right straight up here so you can read like that all right so now we have the sign and now it's taking all these points and it's making the animation what's the problem with it uh type is left and right hand we have all these columns so this should be good but it's breaking here train file path equals path to sign subscribing what do we got here we have Rye do you think Russ will be more used in ml in the next few years not in the next four years years but maybe eventually Rye thank you so much for subscribing on Twitch let's go ahead and spin this wheel for you how long have you subscribed nine months Rye thanks for being here ah thanks I appreciate it sign being shown here so it's just this print statement here's my animation this was me that was me going shhh right that's my animation and then the prediction for this we almost need to make another one that's the prediction uh should we do another sign I'm so excited about this I can't see the chat now the chat window won't let me log in I 
don't know what you guys are saying is there a way for me to get into chat here uh uh new comments on this chat ah stream with Studio no I just want I don't want to upgrade I want to watch the chat somewhere I give up what are you guys saying someone just email me so I know uh docs we need chat here load oh don't make me log in and then have a window that won't actually go through that's annoying um powered by restream.io all right let's just look at twitch chat email someone sent messenger pigeon yes okay uh that's what you're saying and then uh YouTube chat here says cool Rob would you mind doing a video soon okay so so there's nothing really that new let's do some new signs let's do some new signs uh ASL let's try we can actually make our script take in the name of the sign that we're trying to do but let's not get it ahead of ourselves let's try something else let's try um going to this this train value counts and let's do oh bird let's do bird I know what bird is it looks like this this is bird this is a sign for bird this is duck what I'm going to do is I'm going to first do bird and then I'm gonna do duck because I know the I know these models and I know they're pretty bad at knowing the difference between bird and duck bird look that's a little bird I like how it does it sideways um cat like cat like eat bird oh that's funny um so bird we're gonna practice doing bird into the camera with my little script that'll pull it in let's go ahead and let's have some fun here so capture capture sign get ready bird Escape I I might have done the beak thing too many times all right so now I'm gonna move this output parquet to bird and then I'm going to capture sign again duck maybe I shouldn't say it all right let's move this output to be called duck zero dot parquet and let's run our model now if the model is able to predict on my data the duck and the bird I'm going to be impressed so I'm going to move this up here um let's call this a function to predict sine based on parquet media pipe parquet let's call it that um get prediction and let's call this let's give it a prediction function in the parquet file and then let's have it print uh sign not sing sign let's call this pred and call this sign all right so we're actually taking the prediction value and the sine so we'll just do this that's like the ordinal value of the prediction and the sign is this we should actually also get the prediction uh the sign the prediction location so this is the location of the prediction 194 and it says there's no key name oh that's because I need to do it like this what I'm trying to do is actually get the prediction at um confidence so that's like the confidence of the prediction with confidence get prediction and we're going to give it our sh parquet file and see what it comes up with sign reference before assignment yeah all right so sign predicted sign and the confidence is this uh so 0.58 it predicted that sh was a sh hey dead sippy welcome to the family thank you ASL is so silly why is it signed word sign sign yeah you you got me thanks for subscribing subscribe with Prime dead sippy thank you for doing that taking that away from Jeff Bezos for me all right let's go and spin the wheel for you sorry for screaming I'm just really excited tonight oh so close to having me scream Kevin as loud as I can but instead ah I'm just gonna sigh for you all right so let's get our prediction This Moment of Truth for the bird predicted sign was goose what's the ASL for goose let's look at Goose what does goose look 
Oh, it's so close, it's this. Look, I'm learning ASL, so I apologize if I'm doing this wrong, but this is goose, this is bird, this is duck. Goose, bird, duck: models have a hard time distinguishing between those, and rightfully so. So let's close this stupid chat window, which doesn't work. It still had a fairly low confidence on that one. And then let's do duck: duck_zero.parquet... sign: duck, with a confidence of 0.999. What? Are you serious? Are you serious right now? How did it know that was duck so confidently? I must have done a really good duck sign there. Let's see the animation of it, shall we? Now, why don't we make these all into functions? Let's get rid of all this text too: animate_sign_video. So we're going to pull the stuff that's not in this function down here and get rid of all the other stuff; all we want to do is show the sign. This was our shush, and that looked like a shush. Then we're going to show the bird, which it thought was a goose. It was a big old goose egg. Maybe I just signed it wrong; yeah, I kind of did a forward motion with my hand, so that's on me, that's on me. And then the glorious duck recording: it knew, it knew without a doubt that this was a duck sign. Let's take a few more requests from chat... oh wait, I forget that we don't have good chat here. Twitch chat: "second raised finger might be a big difference", yeah, it is. So can someone give me, out of all the possible signs in ASL, which ones we should do? Let's take three from chat: Twitch, you get two. Why does the animation show my feet? Because it's trying to detect a pose (sorry for picking my nose right there), it's trying to detect my pose, and my whole body isn't in frame, so it's failing. Chicken, someone wants chicken. Let's define some ground rules here: it has to be in this list. Out of these (oh, you guys probably can't read this) is chicken in there? I think we're out of luck with chicken, it's not in here. Can I put the list in the chat? It's probably too long for the chat. Twitch, what do you want? Donkey and zebra, those are the first ones I see: zebra and donkey, those are coming from Twitch. All right, YouTube, you guys get one shot at this, don't screw it up. Not this, YouTube. I see wolf and I see brother... wait, "police", that was the first one, right? Police. Is sloth in there? I want to see... is police really in here? Police is! We have a winner, winner winner chicken dinner, though chicken is not a word we can do. Okay, so: donkey, zebra, police. Now, step two, figure out how to sign these; step three, profit. All right, we've got to do donkey first: ASL donkey. Oh okay, so we're doing the one-handed version. Oh, that's cute, like this. Do you guys see? Like this. Wait, I need to do it like he does: hand up and then wave twice, wave twice. All right, does that look good? Do I have approval from you all? I take that as approval. Let's go to my terminal and get ready to donkey. Ready... that kind of stunk, because I need to move the camera up a little bit more. Let's try that again, try this again, moment of truth. All right, so we have our donkey: donkey_zero.parquet. The next one was zebra: ASL sign for zebra. Oh, it's like stripes, stripes. Okay, I think I've got that one. All right, let's do it, zebra time. We got our zebra, we got our zebra. And then the last one was police.
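The animate_sign_video helper carved out here wraps code from that community notebook (credit to its author). The stand-alone version below is a much rougher stand-in rather than their code: it just scatters the captured landmarks frame by frame with matplotlib's FuncAnimation. The file names and the frame/x/y column names are assumptions.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

def animate_sign_video(pq_file, out_file="sign.gif"):
    """Scatter the captured landmarks frame by frame and save a GIF."""
    landmarks = pd.read_parquet(pq_file)
    frame_ids = sorted(landmarks["frame"].unique())

    fig, ax = plt.subplots(figsize=(4, 4))

    def draw(i):
        ax.clear()
        frame_data = landmarks[landmarks["frame"] == frame_ids[i]]
        ax.scatter(frame_data["x"], frame_data["y"], s=8)
        ax.invert_yaxis()  # image coordinates: y grows downward
        ax.set_title(f"frame {frame_ids[i]}")

    anim = FuncAnimation(fig, draw, frames=len(frame_ids), interval=100)
    anim.save(out_file, writer="pillow")  # requires Pillow for GIF output
    return anim

animate_sign_video("duck_zero.parquet", "duck_zero.gif")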
ASL for police... okay, this search has a bunch of them: stop, hospital, arrest, alphabet. I just want police. ASL for police: okay, this lady's videos are good. Oh, you do it like this? It doesn't need to be like this? I guess it doesn't matter, I'm just going to try that and see how it works. Okay, back to here, here we go, escape out of that, and let's move our output to police_zero.parquet. Here we go, here we go y'all, what's the scenario. All right, let's go ahead and do the predictions. We start with zebra... oh, I didn't name it zero... no, we started with donkey, then zebra, then police. All right, I need a vote in the chats: who thinks this is going to work? What are we thinking, zero to three out of three? Vote now by just writing how many you think; Twitch, you too. How many of these signs do we think it's going to get correct? "Three out of three", people are confident. Let's do a poll: new poll, "how many will the model get?", with options zero, one, two, three, and "I'm lonely". We're making this a two-minute poll. Some people think two, someone thinks three, a lot of people think two. Which one of the three is it not going to get? I think one... I don't know, do you think I did the police sign right? I need to practice that. You guys are a bit more confident than me. Okay, someone's voting one, nobody's lonely, I like that; whenever you're in this stream, you're not lonely. If you're on YouTube, go over to Twitch. Okay, YouTube people are voting too: some ones out of three, some twos. While this poll is going on (as soon as I'm done running those cells we'll see how well the model did on these recordings, and then I'll end the stream) I just want to tell you about a few things, since I don't have my normal chat up and can't just drop links into everything. I'm going to do !youtube, which gives you my YouTube link, youtube.com, Rob Mulla, and I'll paste it into the YouTube chat even though you guys are already on my channel, so why would you need the link? I also have Twitter, Rob_Mulla, hope that one's right; yeah, that's me, and I'll put that in this chat. We also have a Discord; you should definitely join our Discord if you haven't already, so that's going in one chat and I'm putting it in the other; it's very unorganized. And then we have our Kaggle competition: if you want to join the community Kaggle competition that I've been doing, it's the POG Champ series on predicting how much I sleep. I've provided you all with eight years of my data: how much I sleep, my workouts, everything from my Fitbit and watch, and you need to predict my last year and a half worth of sleep, every night, how much did I sleep. If you win that competition, you get the Nvidia 4080 GPU I have back there; if you just join, you have a chance of getting that 3080 GPU back there. So there's no point in not joining; I just put the link in the chat. Do I live stream on a fixed schedule? Usually it's Tuesdays, Thursdays, or Sundays, but I'm kind of on my own schedule, so it's whenever I feel like it. This is the POG Champ competition, 116 people in it. First place is winning on the public leaderboard by a good bit, but there's still a long time to go; we have until May 1st, so you can join and potentially win one of those GPUs. Definitely do that. What else do I need to do? No, that's it.
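The "moment of truth" that follows is then just the two sketches above run over each new recording, with the same caveat that get_prediction, animate_sign_video, and the file names are assumed stand-ins rather than the exact stream code.

# Re-animate and score each of the three new captures,
# reusing the animate_sign_video and get_prediction sketches above.
for pq_file in ["donkey_zero.parquet", "zebra_zero.parquet", "police_zero.parquet"]:
    animate_sign_video(pq_file, pq_file.replace(".parquet", ".gif"))  # visual sanity check
    get_prediction(prediction_fn, pq_file)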
I think we're there. Let's look at the poll results: most people think two out of these three will be correct, next up people think one, and then no one is lonely and no one thinks zero. Moment of truth, let's do it. Up first is donkey. Before I show the prediction, I'm going to show the animation; one thing to keep in mind is the detection might screw up. Okay, so that's my donkey: I did this, which is supposed to be donkey. Let's see what it predicts: donkey, with a confidence of 0.87. Yes, pretty good, so we're one out of three correct; it won't be zero out of three. Next up is zebra. If you don't remember, when I signed it and MediaPipe pulled it in, it looked like that. I did a good job there, right? It's like this: zebra, zebra. I think we're going to go three for three here. Sign: zebra, predicted with 0.9961 confidence. Two for three, people, two for three. What's next? I should never have doubted those of you who said three; I think I convinced many of you to switch to two, but this is the moment of truth: is number three of three actually going to work? Let's build up some drama, let's check the chats... no one's saying anything. "Nice, it detected all your fingers", this person has "eyes", "road to three, pog". All right, without any further ado, let's look at our police sign. I don't know if I did it right; does it work if you do it this way or this way? I don't know if it has to be a specific hand, so we're going to find out. Police: it predicted police with a confidence of 0.66, but it predicted all three right! Let's go, let's go! Nice, everyone, that was fun, that was a really fun stream, I hope you guys had fun. I'm going to go ahead and sign off here. Follow all those links, join our Kaggle competition, and let's find someone to raid on Twitch. Raiders away: who's streaming on Twitch? Coding With Strangers. We're going to raid Coding With Strangers, so stick around if you're on Twitch; he's doing some Python, some speech recognition, and this could not transition any better. So make sure you stick around if you're on Twitch. If you're on YouTube, you can say bye-bye and go watch some other videos, or jump over to Twitch and follow him there. Let's get ready to raid, and I will see you all next time. Thanks for hanging out with me, and I'll see you guys on the leaderboard of Kaggle. Have fun everyone, be nice and love each other, and I will see you next time I stream. We detected three for three, actually five for five if you count the other ones, maybe more, I forget. All right, have a good time. All right YouTube, you guys have a good one, bye-bye.
Info
Channel: Rob Mulla
Views: 22,808
Keywords: sign language detection, sign language detection using machine learning, sign language recognition, machine learning, sign language, pose estimation, tensorflow lite, face detection, hand gesture, sign language detection opencv python, sign language detection using action recognition with python, sign language detection project, sign language detection using cnn, kaggle, rob mulla, computer vision, data science
Id: L-IaQch8KYY
Length: 123min 50sec (7430 seconds)
Published: Wed Apr 05 2023