Structured Light Range Finding | Active Illumination Methods

Captions
Now let's talk about structured light range finding. Our primary objective here is to compute the depth of each point in the scene, and we'll start with simple methods and build things up. The simplest method is called point-based range finding. You have a three-dimensional scene, a camera, and a light source, which happens to be a laser pointer: it produces a single ray of light, shown by the red line, rather than illuminating the entire scene. We assume that the positions of the laser pointer and the camera are known with respect to each other; in fact, we define the coordinate system of the entire range finding system to be located at the center of projection, the pinhole, of the camera. So we know where the pointer is and how it is oriented in this coordinate frame. Now you shine the ray out into the scene and it strikes the surface at some point. That point produces a bright radiance and is imaged by the camera as a bright point at location (xi, yi). The question is: if I can detect the bright point (xi, yi), can I compute the three-dimensional coordinates of the corresponding scene point? Indeed you can, because (xi, yi) gives you, via perspective projection, the equation of the line of sight corresponding to that point (the orange ray), and in this coordinate frame we already know the equation of the light ray emitted by the laser pointer (the red ray). So the scene point (x, y, z) corresponding to the bright image point lies at the intersection of the line of sight from the camera and the light ray from the laser pointer.
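In practice the two rays rarely meet exactly because of measurement noise, so one takes the nearest point between them. Here is a minimal sketch of that computation, not from the lecture itself; the helper name `triangulate_point` and the use of NumPy are my own assumptions.

```python
import numpy as np

def triangulate_point(xi, yi, f, laser_origin, laser_dir):
    """Intersect the camera's line of sight with the laser ray.

    The camera pinhole is the origin; the line of sight passes through
    the image point (xi, yi) on a plane at focal length f.  Real
    measurements are noisy, so the two rays rarely intersect exactly;
    we take the midpoint of the shortest segment between them.
    """
    d1 = np.array([xi, yi, f], dtype=float)     # line-of-sight direction
    d2 = np.asarray(laser_dir, dtype=float)     # laser ray direction
    p0 = np.asarray(laser_origin, dtype=float)  # laser ray origin

    # Solve t1*d1 - t2*d2 = p0 in the least-squares sense.
    A = np.stack([d1, -d2], axis=1)             # 3x2 system
    t, *_ = np.linalg.lstsq(A, p0, rcond=None)
    q1 = t[0] * d1                              # point on line of sight
    q2 = p0 + t[1] * d2                         # point on laser ray
    return (q1 + q2) / 2                        # midpoint estimate
```

With perfect data the midpoint is the exact intersection; with noisy data it is the natural least-squares compromise between the two rays.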
That intersection is very easy to compute. So how does one use this in practice? Say the scene has some ambient illumination. You project the light ray onto the scene and get a bright point. Take a picture without the laser pointer, take a picture with it, and subtract the two; this simple technique is called background subtraction, and the background image only needs to be captured once, before you start the process. For the bright point you detect, you can then compute the three-dimensional coordinates (x, y, z) of the corresponding scene point. The main question we are concerned with here is: how many images are needed to recover a depth map of the complete scene? The only way is to literally scan the entire scene, moving the light ray so that it covers every point and taking one image for each orientation of the ray, that is, one image per pixel, so to speak. If the depth map you want is 640 by 480 pixels, you need one captured image per pixel, which comes to roughly 300,000 images. With a camera that captures 30 frames per second, which is usually the case, that would take nearly three hours to gather the information for just one depth map, way too long for virtually any application. So can we do a lot better? Indeed we can. Instead of a laser pointer that throws out one ray of light, imagine a projector of some kind that throws out an entire plane of light.
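The background-subtraction step can be sketched as follows; `detect_laser_spot` is a hypothetical helper, assuming NumPy and 8-bit grayscale images.

```python
import numpy as np

def detect_laser_spot(with_laser, without_laser, threshold=30):
    """Locate the laser spot by background subtraction.

    Both inputs are grayscale images (2-D uint8 arrays).  Subtracting
    the ambient-only image removes everything except the laser spot,
    and the brightest remaining pixel gives (xi, yi).
    """
    diff = with_laser.astype(np.int16) - without_laser.astype(np.int16)
    diff = np.clip(diff, 0, 255)        # keep only the added brightness
    if diff.max() < threshold:
        return None                     # laser not visible in this frame
    yi, xi = np.unravel_index(np.argmax(diff), diff.shape)
    return xi, yi
```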
This is called light striping, the light-stripe method. The plane of light goes into the scene and intersects it along some curve, which gets imaged. Now, for each point along that curve, we want the depth of the corresponding scene point, and that is indeed possible. For the image point (xi, yi) you are concerned with, you have the equation of the line of sight going out into the scene, and you also have the equation of the light plane that generates the stripe. We assume for now that everything is known to us, including the location and orientation of the projector with respect to the camera, so we know the equation of the light plane in the camera's coordinate frame. The physical point corresponding to the bright image point must lie at the intersection of the line of sight and the light plane, which is very easy to find from these equations, and you get a very simple expression for the z component, the depth of the scene point. Once you have the depth z, you can plug it back into the perspective projection equations and get x and y as well. So how much better do we do with light striping; in other words, how many images do we need? Here is a useful way to look at the scenario: consider what the projector "sees". Of course the projector does not see anything, but if you were sitting at the projector location throwing out a light plane, it would look like a perfectly straight line corresponding to one column of what we'll call the projector image. The projector does not capture an image, it generates one, and that is what is projected out into the scene, one column at a time.
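Writing the light plane as a*x + b*y + c*z = d in the camera frame, the simple depth expression falls out directly. A minimal sketch; the function name and interface are my own assumptions.

```python
def depth_from_stripe(xi, yi, f, plane):
    """Depth of the scene point lit by a known light plane.

    The light plane is a*x + b*y + c*z = d in the camera frame, and the
    line of sight is (x, y, z) = t * (xi, yi, f).  Substituting gives
    t = d / (a*xi + b*yi + c*f), and z = t * f; back-substitution into
    the perspective projection equations yields x and y.
    """
    a, b, c, d = plane
    denom = a * xi + b * yi + c * f
    if abs(denom) < 1e-12:
        raise ValueError("line of sight is parallel to the light plane")
    t = d / denom
    x, y, z = t * xi, t * yi, t * f
    return x, y, z
```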
The light plane goes in, as I said before, and intersects the three-dimensional world to create a curve, a profile or cross-section if you will; that is what the camera sees. So how many images do we need to take in this case? You need to project one light plane for each column of the projector, and capture one image per column. Going back to our goal of a 640 by 480 depth image, if 640 is the number of columns, you would still need to capture 640 images, which at 30 frames per second takes roughly 21 seconds. Still way too long. So the obvious question is: why project just one plane into the scene at a time? Why not project multiple stripes simultaneously? Simply throw out a bunch of stripes, and you get an image with a bunch of intersection curves; at first glance it appears we should be able to take any one of these curves and figure out which projector column produced it. We can do that by eye because our visual system reasons through which curve would have been produced by which stripe, but there is no way to do this reliably in practice, especially for complex scenes. Imagine the scene includes my hand with me in the background. One light plane can pass between two of my fingers and strike something in the background, while the next light plane strikes a finger. Viewed from the camera, the order of the resulting light curves can be switched very easily.
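The capture-time arithmetic is easy to verify, assuming the 30 fps camera and the 640 by 480 depth map from the example.

```python
fps = 30                      # typical camera frame rate
width, height = 640, 480      # desired depth-map resolution

point_scan = width * height   # point scanning: one image per pixel
line_scan = width             # light striping: one image per column

print(f"point scan:   {point_scan} images, {point_scan / fps / 3600:.1f} hours")
print(f"light stripe: {line_scan} images, {line_scan / fps:.1f} seconds")
# point scan:   307200 images, 2.8 hours
# light stripe: 640 images, 21.3 seconds
```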
So there is no obvious way of telling which light plane produced a given bright point in the image when multiple stripes are projected. In general, a point could have been produced by any one of the light stripes, and each corresponds to a different depth in the scene, a different three-dimensional coordinate; it is ambiguous. That brings us to binary-coded structured light. We started off with point scanning, then did better with striping, or line scanning, and now we want to do even better in terms of the number of images we need to capture, to make the method more efficient. The idea behind binary-coded structured light is very simple. In our example we will use seven stripes; in practice you would use many more, but let's start with seven. We give each stripe a number, one through seven, and express it as a binary number. As you know, it takes three bits to express the numbers one through seven: 001, 010, and so on, all the way to 111.
Only three bits are necessary; that is the important part. Write out the bits for each stripe: bit one, bit two, bit three. Now look at bit one and illuminate the scene so that all the stripes whose bit one is 0 are kept off and all the stripes whose bit one is 1 are turned on; in this case four stripes go out, and you capture an image. Do the same with bit two and capture an image, and again with bit three, so you capture only three images. Now consider any point in the scene, say this one: it was bright in the first image, off in the second, and on again in the third, and the only light stripe that was on in the first image, off in the second, and on in the third is stripe number five. So by capturing just three images you can figure out, without loss of information, which light stripe illuminated each point in the scene, and once you have that, the line-scanning method gives you the point's three-dimensional coordinates. That is the power of binary coding, and it gets interesting when the numbers get larger. In general, we can do 2^n - 1 stripes using just n images; the minus 1 is only because we do not use the codeword 000, which in our seven-stripe example corresponds to all stripes being off and adds no information to the system.
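The pattern generation and per-pixel decoding can be sketched as follows; this is a hypothetical NumPy implementation, since the lecture does not prescribe any particular code.

```python
import numpy as np

def binary_patterns(n_bits, width):
    """One on/off pattern per bit plane for stripes numbered 1..2**n_bits - 1.

    Each stripe spans roughly width / n_stripes pixels; pattern k lights
    every stripe whose k-th most significant bit is 1.  Returns an array
    of shape (n_bits, width) with values 0 or 1.
    """
    n_stripes = 2 ** n_bits - 1
    stripe_of = np.minimum(np.arange(width) * n_stripes // width + 1, n_stripes)
    patterns = np.zeros((n_bits, width), dtype=np.uint8)
    for k in range(n_bits):
        shift = n_bits - 1 - k              # bit 1 = most significant
        patterns[k] = (stripe_of >> shift) & 1
    return patterns

def decode_stripe(bits):
    """Recover the stripe number from a pixel's on/off sequence."""
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code
```

For the seven-stripe example, a pixel that is on, off, on across the three images decodes as `decode_stripe([1, 0, 1])`, which is stripe 5, matching the lecture.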
So: 2^n - 1 stripes from n images. If you are willing to capture eight images, you can do 2^8 - 1, that is, 255 stripes. That is a reasonable number of images, especially if you can use a fast projector and a fast camera. Let's look at an example of how you would set up the system. You have a digital projector of some sort that produces the light patterns you project onto the scene. Here we use seven patterns, which means we are willing to capture seven images: you project patterns one through seven onto the scene and capture the seven images. Now any one point on the image plane produces what we might call a codeword. This particular point happened to be off in the first image, then on, on, off, off, on, off, and that codeword tells you exactly which projector column, in other words which stripe, illuminated the point. That is all you need to compute its three-dimensional coordinates, and by applying this technique to every pixel in the image you can reconstruct the three-dimensional shape of the object in the scene.

There is one subtlety worth looking at: errors due to what we might call light bleeding. The whole idea of structured light patterns is based on the assumption that each pixel has exactly two states, lit or not lit. In reality, light can bleed between a pixel and its neighbors. The projector has a finite depth of field, because after all it uses a lens to project, so for certain points in the scene the projected illumination pattern may be somewhat smooth, or blurred. In addition, the camera capturing the images also has a finite depth of field, so the capture itself may blur the image, which again produces light bleeding, especially around transitions, that is, where a black area sits next to a white area; it is exactly at those edges that we run into trouble. For example, in the binary patterns we used for seven stripes there happen to be 10 transitions, where a transition is any place the code goes from 0 to 1 or from 1 to 0. At these transitions the brightness value may be ambiguous: if you are using some threshold, you might mistake a 1 for a 0 or vice versa, and because everything here is binary, the errors can be substantial; as many as 10 errors are possible. That brings us to Gray coding, which does not resolve the problem completely but mitigates it substantially by reducing the number of transitions. You can play a little game by simply swapping the numbers assigned to the stripes: we initially numbered them one through seven, but say you make the third stripe 2, the second stripe 3, the seventh stripe 4, the fifth stripe 7, and so on. Swapping the numbers changes the binary code associated with each stripe, and in doing so reduces the number of transitions to 6 in this particular case. You lose no information, you are just renumbering the stripes, but you gain in terms of the number of transitions.
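The standard binary-reflected Gray code is one such renumbering: neighboring codewords differ in exactly one bit, and for seven stripes it gives exactly 6 transitions versus 10 for plain binary. A small sketch; the function names are my own.

```python
def to_gray(n):
    """Binary-reflected Gray code: adjacent integers differ in one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert the Gray code by cumulatively XOR-ing shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def transitions(codes):
    """Total 0<->1 flips between the codes of neighboring stripes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))
```

`transitions(range(1, 8))` counts the flips for plain binary numbering, and `transitions([to_gray(i) for i in range(1, 8)])` counts them after Gray recoding.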
This brings us to the idea of extending the scheme to multiple levels of brightness. We talked about two levels, on or off: that is binary coding, base 2. But you can go to the ternary system, base 3, with three levels of brightness: one could be off, the second some value, and the third double that value. Or you could project three different colors, say red, green, and blue; either way, there are three levels. In general you can go to k levels, from 0 to k - 1, and hope to do even better still. So let's take a look at the ternary system and structured light using color coding. We come back to the same problem as before, seven stripes numbered one through seven, but in the ternary system you express each number not in bits but in what are called trits, for ternary digits, and only two trits are needed to represent each of these numbers: 01, 02, and so on. Say the three levels 0, 1, and 2 correspond in our implementation to the colors red, green, and blue. Then the patterns you project look like this: in the first pattern you project red, red, green, green, green, blue, blue. You project this first pattern onto the scene and capture an image, then take the second pattern, project it, and capture a second image.
Now let's go back to our point. It happened to be green in the first image and blue in the second, and the only stripe that is green in the first pattern and blue in the second is stripe number five; once you know that, you can compute the depth of the point. So we went down from three images to two, in the case of seven stripes. In general, with k levels, that is, k different colors or k different brightness values, you get k^n stripes in just n images; and when one of the levels is off, we again do not want to use the all-zeros codeword, which does nothing, so in that case you get k^n - 1 stripes. It is a big payoff. Now, talking about color, a couple of caveats are worth mentioning. We can keep increasing the number of colors and get more and more efficient, but there is a price to pay. With a regular camera and a regular projector, the projected colors tend to be broadband in terms of their spectra, and cameras have broadband filters for their color channels as well. Say you are using an RGB projector and an RGB camera and decide to use many more than three colors. It is quite possible that the colors you capture in the image look very similar and are difficult to distinguish, partly because the projected colors themselves may be close to each other in spectrum, but also because the object you are projecting onto has its own reflectance properties: it may reflect certain wavelengths more than others, and those are the spectral properties of each point in the scene.
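Decoding a k-ary codeword back to a stripe number is just reading the observed levels as base-k digits. A minimal sketch with a hypothetical helper; the 0/1/2 to red/green/blue mapping follows the example above.

```python
def decode_kary(levels, k):
    """Stripe number from a pixel's observed level (0..k-1) per image.

    The stripe index is the base-k number whose digits are the levels,
    most significant first; with k = 2 this reduces to binary decoding.
    """
    code = 0
    for lv in levels:
        code = code * k + lv
    return code
```

For the seven-stripe ternary example, green then blue corresponds to the trits 1, 2, and `decode_kary([1, 2], 3)` gives stripe 5, matching the lecture.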
As a result, the stripes in the captured image may end up too similar to differentiate from one another. And there is a greater problem: getting no reflection at all. Say the scene contains a deep blue patch and you project a bright red stripe onto it. Blue and red sit at opposite ends of the visible spectrum, so the blue patch reflects essentially none of the red light, and you get no reflection in those areas; the same happens when you project a blue stripe onto a red patch. These are real limitations, and what is interesting is that the kind of scene color coding works really well for is one made of essentially gray objects: a gray point reflects all colors equally well, so the gray world, so to speak, is ideal for color-coded structured light.
Info
Channel: First Principles of Computer Vision
Views: 1,171
Rating: 5 out of 5
Keywords: Structured light, range finding, point scanning, light striping, binary coding, ternary coding, k-ary coding, light patterns, gray coding, color coding, binary coded structured light, color coded structured light, Shree Nayar, First principles, computer vision, computational imaging, computational photography, Columbia Vision Laboratory, CAVE Laboratory, Computer Science, Columbia Engineering, School of Engineering and Applied Sciences, Columbia University
Id: 3S3xLUXAgHw
Length: 24min 23sec (1463 seconds)
Published: Sun Apr 11 2021