11.4: Introduction to Computer Vision - Processing Tutorial

Captions
Okay, so this is the last video in this section about images and pixels, and what I want to talk about in this video is computer vision: using an image as something other than a thing we draw on the screen, or a thing we look up colors from to draw other things on the screen. What does it mean to have the program we're making in Processing see the world in some capacity? Maybe to determine if a user is present, to determine if the user is waving his or her hand, to determine what color clothes the user is wearing. There are lots of things we might be able to figure out from a scene if we have a camera looking at that scene, or an image that was taken somewhere.

This is a huge topic, and somebody who actually knows what they're talking about could probably make a long series of videos going quite in depth on topics and examples in computer vision. What I'm going to do in these 10 to 20 minutes is simply look at some of the basics: what are the key fundamental pieces we need to know about in order to build our own computer vision application, and what are some pathways to doing more with it?

Let's start with a basic, classic scenario. Say we have a scene: a camera on our computer is pointed into a room, and in that room there is a very bright light, this sort of weird-looking light bulb thing I drew here. Maybe there is a person (I can't draw at all) holding that light bulb and moving it around. Could a camera looking at this scene track that light bulb? How would we do that? Conceptually, we might say: aha, let's find the brightest pixel in the image. How? Well, we might start with this pixel and see that it's very dark, a brightness value of zero. Then we look at the next one: brightness zero. The next one: three. The next: four. Then zero again. Eventually we might come to a pixel with a brightness value of 255. That's really bright, so let's remember its x,y location. By the time we've looked at all the pixels, we know which one had the highest brightness value. This is searching through the pixels to find where the brightest pixel is.

Now let's look at a variation on this for a second. Over here I have an example which I'm going to run for you. I have a camera right here looking at me, and you can see, there I am above. I'm going to take this blue marker and click on it, and you can see I'm drawing a dot that follows the blue marker: I'm tracking the blue color in the image coming from the camera. How is this done? Now, you can see there are a lot of problems; it isn't perfectly accurate, and there are issues I want to take a minute to discuss. But let's just say, for the sake of argument, that this is the be-all and end-all of computer vision. How do we write a program that recognizes a color in an image and continuously follows that color as it moves over time? It's very similar to what we talked about before; the difference is that we're looking for the pixel that's the most blue, not the pixel that's the most bright.

So one key thing we need to figure out is: how do we determine if a color is similar to another color? Let's think about two colors. The first is (0, 100, 255) in RGB, which is kind of bluish-green. Now let's take another color, say (200, 50, 255).
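The brightest-pixel search just described can be sketched in plain Java. This is a hypothetical standalone version: Processing's pixels[] array is simulated as an int[] of packed RGB values, and brightness() is taken as the maximum channel, which matches the HSB "value" Processing reports for an RGB color. Class and method names here are my own.

```java
public class BrightestPixel {
    // Brightness of a packed 0xRRGGBB pixel: the largest of the three
    // channels, like the HSB value that Processing's brightness() returns.
    static int brightness(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return Math.max(r, Math.max(g, b));
    }

    // Scan every pixel and remember the (x, y) of the brightest one found.
    static int[] findBrightest(int[] pixels, int width) {
        int record = -1;      // brightest value seen so far
        int bx = 0, by = 0;   // its location
        for (int i = 0; i < pixels.length; i++) {
            int b = brightness(pixels[i]);
            if (b > record) {      // new record?
                record = b;
                bx = i % width;    // column from the 1D index
                by = i / width;    // row from the 1D index
            }
        }
        return new int[] { bx, by };
    }

    public static void main(String[] args) {
        // A tiny 3x2 "image": one near-white pixel at (2, 1)
        int[] img = { 0x000000, 0x202020, 0x101010,
                      0x050505, 0x303030, 0xFFFFEE };
        int[] loc = findBrightest(img, 3);
        System.out.println(loc[0] + "," + loc[1]); // prints "2,1"
    }
}
```

In an actual Processing sketch you would call loadPixels() on the video frame and loop over its pixels[] the same way.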
So how similar are these colors? Well, one thing we could say is that the red values are 200 units apart, the green values are 50 units apart, and the blue values are 0 units apart, so I could give the pair a similarity score of 250, where zero would be the most similar (all three values equal) and 255 times 3 would be the largest possible difference, since each value only runs from 0 to 255. So we're taking the difference, subtracting one color from another, and you'll notice I'm using the absolute value: 200 minus 0 is 200, but 50 minus 100 is actually negative 50, and I don't care which one is greater, I just want to know how far apart they are. This is one way we could conceptually measure how different two colors are.

It turns out there's an accurate way to get this all into one line of code. Say I have these two colors in variables r1, g1, b1 and r2, g2, b2. I can say: float d = dist(r1, g1, b1, r2, g2, b2); and this line of code, using Processing's dist() function, will give me a numeric value that indicates how similar the two colors are. How does that work? Distance probably makes sense to you in three-dimensional space, which is where I am right now: my hands are a certain distance apart, and now they're getting closer and closer. Three-dimensional space has an x-axis, a y-axis, and a z-axis, and things in this room that are closer to each other have a lower distance. Well, we could think of those axes not as x, y, and z but as the red axis, the green axis, and the blue axis. What if we filled this three-dimensional space with every possible color? Colors that are nearer to each other in that space would be more similar than colors that are farther apart. So even though we conceptually think of Euclidean distance, the distance formula, as something that has to do with physical space (or two-dimensional space, in the case of 2D distance), we can use it with color as well.

Now, why am I spending all this time on the distance formula? Because this is what we need if we're trying to find the color that is the most blue. What if I look at this pixel and find its distance from blue, then the next pixel and find its distance from blue, then the next, and so on? One of those pixels will hold the world record for the smallest distance to blue, and I just keep track of that record. Does this pixel break the record? No, throw it away. Does this pixel break the record? No, throw it away. Does this pixel break the record? Yes it does: keep its x,y. Keep going, and if any later pixel beats the record, keep its x,y instead. After I finish looping through all the pixels, I'll have the x,y location of the pixel that is most blue.

So the two things we need are these. The thing we already have is knowing how to loop through all the pixels. The things we don't necessarily have from before are, first, how to find the similarity between two pixels (the distance function is a great way of doing that), and second, how to keep track of which pixel is the one I want to remember after I finish looping through everything.

Let's go back and run this example again. There are a few things I'll point out in the code, but first I'm going to hold up this blue marker and click on it, and you can see it's continually finding the blue marker. Now, a couple of things are going to fail here. Let's see if I can track green: I'm clicking on it, but this whole background is green, so tracking something green is not going to work very well. What if I click on my nose to try to track its color?
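Both new pieces, the color distance and the world-record loop, can be sketched together in plain Java. This is a minimal standalone sketch with names of my own choosing; in an actual Processing sketch you would use dist() and the video's pixels[] array directly.

```java
public class ColorTracker {
    // Euclidean distance between two colors, the same idea as Processing's
    // dist(r1, g1, b1, r2, g2, b2): treat (R, G, B) as a point in 3D space.
    static double colorDist(int r1, int g1, int b1, int r2, int g2, int b2) {
        double dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    // Find the (x, y) of the pixel most similar to the tracked color,
    // keeping a "world record" smallest distance as we loop.
    static int[] closestTo(int[] pixels, int width, int tr, int tg, int tb) {
        double record = Double.MAX_VALUE; // any first pixel beats this
        int cx = 0, cy = 0;
        for (int i = 0; i < pixels.length; i++) {
            int r = (pixels[i] >> 16) & 0xFF;
            int g = (pixels[i] >> 8) & 0xFF;
            int b = pixels[i] & 0xFF;
            double d = colorDist(r, g, b, tr, tg, tb);
            if (d < record) {          // new world record: remember it
                record = d;
                cx = i % width;
                cy = i / width;
            }
        }
        return new int[] { cx, cy };
    }

    public static void main(String[] args) {
        // Track pure blue (0, 0, 255) in a tiny 2x2 image.
        int[] img = { 0xFF0000, 0x00FF00, 0x2010F0, 0xFFFFFF };
        int[] loc = closestTo(img, 2, 0, 0, 255);
        System.out.println(loc[0] + "," + loc[1]); // prints "0,1"
    }
}
```

In the running sketch, rather than jumping the dot straight to the found location, you could ease it there, for example with x = lerp(x, cx, 0.1); in Processing, which is one way to smooth out the jitter.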
The color of my nose is similar to my forehead, which is similar to my hands, so there are a lot of flaws in this scenario of just looking for a single pixel. You can see how much the dot jumps around; we could add a little easing or interpolation, and that might help. Honestly, if we really want to track an object with a specific color in an image, it probably makes more sense to look for an area of very similar pixels rather than a single pixel. But while this might not be the most useful application in a real interactive scenario, it demonstrates a lot of the basic principles of computer vision.

Let's look at a couple of things in the code. One thing I want to point out is exactly what I was just talking about: we have the current color from the video, and I need to get its red, green, and blue values. Then I have the color I'm tracking, the color I'm searching for, and I want to know the distance between the current pixel and that tracked color. So there's the distance function being used inside the nested pixel loop. The other thing I'll point out is that before we go through the loop, we initialize the world record to some big number that the first pixel is obviously going to beat, and we keep track of an x and y. Then, at any moment, if we find a pixel whose distance is less than the world record, that distance becomes the new world record, and we save that pixel's x and y. So the first new piece here is figuring out how two colors are different or similar, using the distance function, and the second piece is, while we're looping, keeping track of which is our favorite pixel: is this our new favorite? If it is, save it, and we'll save it again if we find a better one. It doesn't actually matter what order we look through the pixels in; we hold on to that "best" pixel all the way through the loop.

So this is giving you some of the basics, and again you can see there are a lot of flaws. If I were to give you an exercise, I might say: try to add some easing so the tracking is at least somewhat smooth. (Oh, this is a black pen; let's go back to the blue one.) The other thing I should point out is that there's no need to display the video at all. Look at this: I'm magically controlling the circle above me. One thing to recognize here is that the camera is really acting as a sensor. We're not using it for display purposes at all; we're just reading the image and analyzing it for some piece of information. Just recently, at the winter show here at ITP, students built an interactive paint application where you would hold a spray can up to the wall, and inside that spray can was a very bright green LED. There was a camera on the wall, and it was able to track where the spray can was simply by tracking that very bright LED. So there are a lot of cases where you might want to find and track something simply by color.

Another scenario where you might want to look at the difference between pixels is looking for motion. I'm going to stand here and try to stand very, very still, and you can see that when I do, apart from my mouth moving, you're not seeing any black pixels. But if I move around a lot, you see a lot of black pixels. So one way you can find motion is to say: I have one frame of video, and I have the next frame of video; let me compare each pixel. A pixel that changes is likely where there's motion, because if my hand is never moving, the pixel colors there aren't changing, but as soon as I move my hand, they are. Notice that even with subtle movements, in a way we're finding the edges of my hand, because if you think about it, even as I move my hand, the pixel colors in the center of my hand aren't really changing; they only change along the edge, where my skin meets the green of this green screen (which you can't see, because you see a computer screen instead). All these examples are in the Learning Processing GitHub repository; this one is example 16-13.

So that's a little bit about computer vision. I'll show you one more example, which looks at how many pixels have changed per frame. If very few pixels have changed, you see a small dot in the center, and as I move around a lot, many pixels are changing and the dot gets bigger. An exercise might be: could you find where the motion is? You could create an application where, as I wave my hand, something follows the thing that is moving.

So there are a lot of possibilities in how you use what the computer can see. Can you find the edges? (We looked at edge detection in image processing.) Finding a specific color, finding the brightest pixel, finding the darkest pixel, finding pixels that are moving: these are the types of things you can do writing your own code in Processing. However, this is not a new idea. I didn't invent any of this, and in fact I know very little about it compared to a lot of people in the world. If you want to work on a computer vision application (am I still recording? Yes, and we're 13 minutes in, we're fine), it's quite likely that what you'll end up doing is using a library that has a lot of computer vision functionality built into it.
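The frame-differencing idea above can be sketched in plain Java: prev and curr stand in for two consecutive frames' pixels[] arrays, and the threshold is an assumed tuning value you would adjust for your camera.

```java
public class MotionDetector {
    // Per-pixel difference between two frames: a pixel whose color changed
    // by more than `threshold` (in color-distance units) counts as motion.
    static boolean[] motionMask(int[] prev, int[] curr, double threshold) {
        boolean[] mask = new boolean[curr.length];
        for (int i = 0; i < curr.length; i++) {
            int r1 = (prev[i] >> 16) & 0xFF, g1 = (prev[i] >> 8) & 0xFF, b1 = prev[i] & 0xFF;
            int r2 = (curr[i] >> 16) & 0xFF, g2 = (curr[i] >> 8) & 0xFF, b2 = curr[i] & 0xFF;
            double dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
            mask[i] = Math.sqrt(dr * dr + dg * dg + db * db) > threshold;
        }
        return mask;
    }

    // Total amount of motion this frame: how many pixels changed.
    // (This is what drives the size of the dot in the second demo.)
    static int motionAmount(boolean[] mask) {
        int count = 0;
        for (boolean m : mask) if (m) count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = { 0x000000, 0x808080, 0xFFFFFF, 0x808080 };
        int[] b = { 0x000000, 0x808080, 0x000000, 0x818181 };
        boolean[] mask = motionMask(a, b, 50);
        System.out.println(motionAmount(mask)); // prints "1"
    }
}
```

In a sketch, you would copy the current frame into prev at the end of draw() so the comparison is always against the previous frame. Note that pixel (3) changed by only one unit per channel, which is below the threshold, so camera noise like that is ignored.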
One library I would recommend you look at (I'm going to have to just Google this really quickly: GitHub, OpenCV for Processing) is OpenCV for Processing. This is a library by Greg Borenstein. OpenCV is an open-source computer vision library, originally developed by Intel and now maintained by an open-source community. You can see here that there's a lot of functionality: it can find your face, it can do all sorts of image processing, it can look for blobs and contours. A lot of the stuff we might spend hours or days or weeks or months trying to program from scratch comes as features built into the library. As I scroll down, you can see lots of really interesting features, like recognizing a card, for example, or recognizing different markers in an image.

Let me show you one thing this is classically used for: face detection. Let me close these out, go back to Processing, and run this. One of the things OpenCV will do for you is find faces in an image. You can see here that as soon as I turn to the side, it stops working; it's not recognizing my profile, only a face looking straight on. What OpenCV gives you is a rectangle. Now, this is quite different from face recognition; this is just face detection: oh, there's a face, there it is, that's how big it is. But you can imagine some applications: how many people are there? Is somebody looking straight ahead versus to the side? There was a project just here at the ITP winter show where two people stand in a scene and their faces are swapped. There are a lot of creative possibilities. If you use Google Hangouts, you've seen the features where you can add a hat or a mustache on somebody; you could do that with the OpenCV library as well.

So this is something I'd encourage you to look into as a possibility. One other thing OpenCV has is blob detection: finding areas of brightness or areas of darkness, which is very useful for tracking as well.

The last piece I want to demo for you (I'm going to have to plug it in) is a thing called the Microsoft Kinect. I'm afraid to pick this up; can you see this? This is the Microsoft Kinect. This device has a regular RGB camera in it, plus an infrared camera and an infrared projector, which together act as a depth sensor. Let me see if I can open this example up... and oops, hello, this is what happened... oh, there we go. You can see this is showing us a couple of different things. One is another image of me in this room (I have a little TV over here where I can watch myself, which is all very strange); that's just the camera coming out of the Kinect. But then there's also this other image over here, which is a depth map. The way screen captures work, it's hard to see what's going on, but you can see that my hand is a little bit brighter here, and now quite a bit darker, and when I put it in here, we can't see it anymore. This is the old Kinect, the original model from a few years ago; 1414 is the model number. What it does is this: instead of just saying "here's a pixel and here are its red, green, and blue values," it says "here's a pixel and here's how far it is from the sensor, in millimeters." So you can know what's close and what's far away. It's a little hard to see where we are with this orientation (I'm looking at twelve different places at once), but you can also see that this is essentially the depth data visualized in three dimensions.
That's some really basic 3D scanning of a scene. What I'm not using here is a library called SimpleOpenNI, which is built on OpenNI; OpenNI was purchased by Apple. Here's the thing: I don't want to ramble for the next 20 minutes about all the different versions of the Kinect and which ones work with which operating system and which library, but as of right now, there's the original Kinect that came out a few years ago, which you can use. There's a library for the PC and Processing that you can use, and there are two libraries for the Mac and Processing: one is Open Kinect for Processing, a library I built on top of some open-source drivers for the Kinect, and the other is SimpleOpenNI, which uses OpenNI, which has since been purchased by Apple and shut down. So while all of this works, if you're interested in the Kinect, you probably want to go look for the Kinect version 2. That's the newer Kinect: it has higher resolution, and the skeleton tracking (which I haven't really mentioned yet) is a bit more accurate, I think. However, open-source drivers aren't as easily available for the new Kinect. There is an official Microsoft developer kit, and you can use that with Processing, but it's Windows-only. At the moment there are a lot of people here at ITP, and probably elsewhere in the world, working on ways to pass the information from a Kinect on a PC over the network.

What you have to realize is that the Kinect itself is just sending raw data: the distance of each pixel from the sensor. What you can do with that information is incredibly powerful. If you looked at any demo of the version 2 Kinect, or the old one, you would see it can recognize the form of my body very quickly and track where my arms are, where my hands are, my head, my knees, and there's all sorts of stuff you can do with that. We don't have time to get into all of that here, and perhaps someday there will be more videos or examples I can help prepare in that direction.

But let me jump back for a second and show you something the Kinect makes possible with just the raw depth information. One thing you can see here is where I'm standing: as I walk closer to the Kinect, I start to turn red, and as I walk farther away, I stop being red. As I put my hand out, my hand is red. So with depth, one thing you could do rather easily is ask: where is the thing that's closest to the camera? In the case of a hand pointing straight out, that's pretty easy to track. Now notice, if I put my other hand out too, the dot lands in between them; take one away and it switches. So this isn't doing any sophisticated hand tracking. I could put my head forward and it would follow that; it's just looking for the blob of stuff that's closest to the camera. I could take this marker and point it out like this, and you can see it sort of tracks it (I think this marker is actually a little bit reflective, which confuses the depth reading).

There's a lot more to how this works: infrared light, depth versus skeleton tracking, and so on, and I'm only skimming it here. But what I hope you got from this video, if you're still watching, is that that for loop, looping through all the pixels, is what you need. You need it for the image processing stuff, you need it for computer vision, and in fact you need it here, because instead of looking for the pixel closest to some color, we're looking for the pixel closest to the Kinect itself. That for loop is pretty crucial, and there's a lot you can do with computer vision from scratch on your own. But there is also a tremendous set of libraries, resources, and other devices, like depth sensors, that you might consider as well.
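The closest-point idea is the same world-record loop again, just over depth values instead of colors. Here is a plain-Java sketch; the depth array stands in for what a Kinect library would hand you, and treating a depth of 0 mm as "no reading" is an assumption about typical Kinect data.

```java
public class ClosestPoint {
    // The Kinect reports a depth in millimeters per pixel; find the (x, y)
    // of the nearest thing, e.g. a hand pointing at the sensor.
    // A depth of 0 is assumed to mean "no reading", so those pixels are skipped.
    static int[] nearest(int[] depthMm, int width) {
        int record = Integer.MAX_VALUE; // smallest depth seen so far
        int nx = -1, ny = -1;           // (-1, -1) if nothing was found
        for (int i = 0; i < depthMm.length; i++) {
            int d = depthMm[i];
            if (d > 0 && d < record) {
                record = d;
                nx = i % width;
                ny = i / width;
            }
        }
        return new int[] { nx, ny };
    }

    public static void main(String[] args) {
        // 3x2 depth map in mm: hand at ~600 mm, wall at ~2000 mm, one dropout (0)
        int[] depth = { 2000, 2000,    0,
                        2000,  600, 2000 };
        int[] loc = nearest(depth, 3);
        System.out.println(loc[0] + "," + loc[1]); // prints "1,1"
    }
}
```

Averaging the positions of all pixels within some distance of the record, rather than taking the single nearest pixel, would give the steadier "blob of stuff closest to the camera" behavior seen in the demo.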
One very last thing: I'll try to include a link as well. There are third-party, open-source applications, like Community Core Vision and TSPS, that you can run behind the scenes on your computer and have them pass messages about what they're tracking to Processing. That's another way you might think about developing an application that involves computer vision. Okay, so someday I will revisit some of this material in a hopefully more organized or useful way, but for now I'm just going to say goodbye. This has probably run way too long, but I did manage to get the Kinect working in this video, which is kind of interesting. Okay, see you later.
Info
Channel: The Coding Train
Views: 221,410
Keywords: computer vision, computer vision explained, capture, opencv, kinect, video, Kinect (Computer Peripheral), opencv processing, opencv java, kinect processing, kinect java, kinect tutorial, pixels, computer vision tutorial, introduction to computer vision, processing computer vision, kinect processing tutorial, computer vision processing, intro to computer vision, computer vision introduction, kinect computer vision, computer vision beginner, processing tutorial, processing
Id: h8tk0hmWB44
Length: 22min 51sec (1371 seconds)
Published: Fri Jul 24 2015