Arducam Time-of-Flight Camera for 3D Segmentation and Distance Estimation

Video Statistics and Information

Captions
I've attached an Arducam time-of-flight depth camera and an ESP32 to the Raspberry Pi, and I'm using them for 3D segmentation and distance estimation. In the rest of the video I'll show you exactly how I've done that. I also just want to say that I've got no affiliation with Arducam and I paid full price for the depth camera.

I ordered my time-of-flight camera off the Arducam website. You can also order it off Amazon; it's a little more expensive there but you save on the shipping, so it's really much of a muchness. It comes in this box, so I'm wondering whether it's a big camera or there's just a lot of packing material. Okay, so it's wrapped in this bubble wrap, and then it's in this little box here: the Arducam time-of-flight camera. Let's open the box and see what it looks like. There's a five-volt power cord, which plugs into the Raspberry Pi and then into the camera, and the camera itself is in an anti-static bag along with its ribbon data cable. Let's turn it over and see what it looks like. Wow, that's amazing; let me just zoom in a bit. My plan is to use this camera to do 3D object segmentation and distance estimation, where I overlay its image onto the image from another camera.

I've got my Raspberry Pi running through the VNC viewer, which I find is the most convenient way to use it, and I've got the time-of-flight camera connected to the Pi, so I'm going to go through the steps of installing the camera software and viewing some images from it. The first thing we need to do is go to the Arducam GitHub repository for the time-of-flight camera and clone it on the Raspberry Pi. The thing I like about VNC is that we can copy and paste between Windows and the Pi, so I can open the browser on Windows and paste the clone command straight into a terminal on the Pi.

Next we run the install-dependencies script, which installs the dependencies for the camera. With the Arducam dependencies installed, we need to do one more thing before we can get it working with Python, and that is to install OpenCV and the Arducam depth library. If we have a look at this file, it actually does the install of OpenCV and the Arducam depth camera library, but I don't need to install all of it; all I need to do is run this one line and I'm good to go.

Now we can run the Python code to preview the output of the camera. Let's open Thonny, load the example script that came with the GitHub repository, preview_depth.py, and press run. I'll point the camera at myself, and there we go: two images. This one, the amplitude image, is a gray-level image showing the intensity of the infrared return, and this one is the distance estimate, where red is close and blue is far. If I click on myself, the distance comes up on the screen, about half a meter, and if I click on the background, the distance there is about 3.3 meters.
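For reference, the preview script boils down to roughly the following. This is a minimal sketch, not the repository code verbatim: the ArducamDepthCamera class and method names are my best recollection of the Arducam_tof_camera Python API and may differ between releases.

```python
import cv2
import numpy as np
import ArducamDepthCamera as ac  # installed by the Arducam setup script

cam = ac.ArducamCamera()
cam.open(ac.TOFConnect.CSI, 0)   # camera on the CSI port (API names approximate)
cam.start(ac.TOFOutput.DEPTH)    # stream depth + amplitude frames

while True:
    frame = cam.requestFrame(200)            # timeout in milliseconds
    if frame is not None:
        depth_buf = frame.getDepthData()     # per-pixel distance in meters
        amp_buf = frame.getAmplitudeData()   # per-pixel IR return strength
        cam.releaseFrame(frame)
        cv2.imshow("amplitude", np.clip(amp_buf, 0, 255).astype(np.uint8))
        cv2.imshow("depth", (255 - np.clip(depth_buf / 4.0, 0, 1) * 255).astype(np.uint8))
    if cv2.waitKey(1) == ord("q"):
        break

cam.stop()
cam.close()
```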
I'm just going to take you through the demo code from the Arducam GitHub repository, the one I just ran. The first thing it does is define the maximum distance, which can be either four or two. At four meters the sensor modulates at 37.5 MHz, and at two meters it modulates at 75 MHz. Oddly enough, that works out exactly: 37.5 MHz is 37.5 million pulses per second, so the time between pulses is exactly the time light takes to travel eight meters, which is four meters out and four meters back before the next pulse; likewise at 75 MHz the time between pulses is the time light takes to travel four meters, giving a two-meter range.

The first function is process_frame. The camera actually grabs two images, a depth buffer and an amplitude buffer. The amplitude buffer is the strength of the infrared return, basically how bright it is, and the depth buffer is the estimate of how far away things are, derived from the timing. process_frame does a little pre-processing on the depth buffer, where it zeroes out all the not-a-number values, and then these two lines on the amplitude buffer are actually quite important: anything less than or equal to seven in the amplitude buffer is set to zero, and anything greater than seven is set to 255. It basically creates a binary image, 255 where the amplitude is greater than seven and zero where it isn't. I think the seven was probably decided in the Arducam labs, but it matters because the further away something is, the more the signal dissipates; if the amplitude is less than seven we can't actually judge the distance, which probably means the object is more than four meters away.

So what happens if something is five meters away? We fire off a pulse, it hits the object five meters away and comes back to the sensor, but before it arrives we've already fired the next pulse, so the sensor thinks the echo belongs to that next pulse. The echo has traveled ten meters round trip against an eight-meter unambiguous round trip, so only the leftover two meters is attributed to the new pulse, and since whatever the light travels gets divided in half (it goes there and back), the camera reports an object five meters away as being just one meter away. That's why this threshold is quite important: less than or equal to seven seems to work for the four-meter mode, but if we use the two-meter mode we have to increase this number, and I'll show you how that works.

The function then takes an AND with the amplitude buffer, which means anything at or below the threshold reads as the maximum distance. What it does to the depth buffer is scale it from 0 to 255 and invert it, so 255 means zero meters away and zero means four meters away. The result is a gray-level image where the brighter something is, the closer it is.
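Reconstructed from that description, process_frame looks something like the sketch below; the scaling details are approximate rather than the repository's code verbatim.

```python
import numpy as np

MAX_DISTANCE = 4.0        # meters; 2.0 selects the 75 MHz mode
AMPLITUDE_THRESHOLD = 7   # raise this (e.g. to 27) in the 2 m mode

def process_frame(depth_buf, amplitude_buf):
    depth_buf = np.nan_to_num(depth_buf)            # zero out the NaNs
    # Binary confidence mask: 255 where the IR return is strong enough, else 0
    amplitude_buf[amplitude_buf <= AMPLITUDE_THRESHOLD] = 0
    amplitude_buf[amplitude_buf > AMPLITUDE_THRESHOLD] = 255
    # Scale depth to 0..255 and invert: 255 = at the lens, 0 = MAX_DISTANCE away
    depth_u8 = np.clip(255.0 - depth_buf * (255.0 / MAX_DISTANCE), 0, 255)
    # AND with the mask so low-confidence pixels read as maximum distance
    return np.bitwise_and(depth_u8.astype(np.uint8),
                          amplitude_buf.astype(np.uint8))
```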
Next there's the mouse handler. When you press the mouse on the preview image, it takes a rectangle from minus four to plus four around the click, a nine-by-nine square, and over that square it takes the mean of the depth. That's how it reduces the noise: it smooths out the spikes by averaging, so the distance shown is the mean over just that nine-by-nine square.

I'm getting ahead of myself, though. When we enter the script there's the usual "if __name__ == '__main__'" guard, which matters if we run it from the command line but not if we're just running it in Thonny. The first thing it does is initialize the camera and the windows, and then it runs an infinite loop: it gets the depth buffer and the amplitude image, processes them, and displays the result, and if we press Q it exits the loop.
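Pieced together, the example's structure looks something like this sketch. It assumes the process_frame above; init_camera and get_buffers are hypothetical helpers standing in for the camera setup shown earlier.

```python
import cv2
import numpy as np

def on_mouse(event, x, y, flags, param):
    # Mean over the 9x9 square from (x-4, y-4) to (x+4, y+4) smooths the spikes
    if event == cv2.EVENT_LBUTTONDOWN:
        patch = depth_buf[max(y - 4, 0):y + 5, max(x - 4, 0):x + 5]
        print("distance: %.2f m" % float(np.mean(patch)))

if __name__ == "__main__":
    cam = init_camera()                  # hypothetical helper, see above
    cv2.namedWindow("preview")
    cv2.setMouseCallback("preview", on_mouse)
    while True:
        depth_buf, amplitude_buf = get_buffers(cam)   # hypothetical helper
        cv2.imshow("preview", process_frame(depth_buf, amplitude_buf))
        if cv2.waitKey(1) == ord("q"):   # Q exits the loop
            break
```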
Let's also set the max distance to two and see what happens. So that shows me there, and now I've got this nine-by-nine rectangle just above my right shoulder. When I press it, it tells me that spot is about 15 centimeters away, but it's really about two meters and 15 centimeters away. What's happened is exactly the aliasing I described: the pulse is bouncing back, but before it reaches the sensor the next pulse has already gone out, so the reading should really have a two in front of it. How can we get rid of that? This didn't happen in the four-meter mode, simply because nothing in this room is more than four meters away. The way to get rid of that red behind me (I'll press Q to quit out of there) is to increase the amplitude threshold. Let's try 27 and see if it clears the red behind me, because the further the infrared travels, the more it dissipates, so we want anything more than two meters away to fall below the threshold. I don't know if 27 will work; this is just experimentation. And there we go, that's gotten rid of most of the red behind me: where it was all red, it has now turned blue, which is the maximum, two meters away. I think that's pretty good; maybe I'd raise it to 30. I'm going to process these to get rid of the noise, save them as gray-level images, and take them onto my desktop to work with.

Now for the hardware. I've got the time-of-flight camera connected to the CSI port on the Raspberry Pi, and I need to bring in another camera. There's another port up here, but it's the DSI display port, and if I connect a camera to that video port I'm going to fry the camera. One alternative is a CSI port multiplexer, which gives you something like four CSI ports through an adapter, but they're a bit expensive. Another is a webcam on the USB port, but I don't have a webcam. The next alternative is to bring the image in wirelessly, and that's what I've done: I've connected an ESP32-CAM to the five volts and ground of the Raspberry Pi, the same five volts and ground that feed the Arducam time-of-flight camera. I've got both cameras mounted on this piece of perfboard at the front; the nice thing about perfboard is that it provides a natural grid structure, so I can get them aligned and rigidly in place next to each other horizontally, which is something you have to do for a stereo camera. We'll fire that up and get the code running in the next scene.

Now I've got the ESP32 and the time-of-flight camera working together, and I'm taking a naive approach where I simply overlay the image from the time-of-flight camera onto the image from the ESP32, and we'll see how that goes. This is the code here: basically the preview code that came with the GitHub repository for the time-of-flight camera, plus some code I've added to capture the ESP32 image. This function sets the resolution on the ESP32 camera, this one sets the quality, and this one sets the auto white balance; the only one I'm using is set_resolution. When we start, we do the normal capture setup for the time-of-flight camera, and this is the URL of the ESP32-CAM. I'm using a static IP for that, which is set in the sketch I've uploaded; I'll put that sketch on GitHub and take you through it briefly a bit later. We set the resolution to five, which I think is 640 by 480; the resolutions and their numbers are all listed up here, so I'm using VGA, but I don't know if the numbers in this documentation are correct, so I'm just using five for the moment and we'll see how that works. I set the resolution, release the camera, then set the resolution again and capture again. The reason I have to do this twice is that setting the resolution doesn't always take effect until the next time I capture, for some reason.

I take a frame from the ESP32 and also a depth frame from the depth buffer, which I process as before. Now I've got my result image, which is the depth image, and frame_esp32, which is the frame from the image camera. I resize the result image to bring it into line with the ESP32 image, convert it to color, and take a weighted sum: alpha is the weight of the ESP32 image and beta is the weight of the depth image, both a half, and addWeighted just takes the weighted sum of the two images. I'll put this code up on GitHub anyway so you can go through it. I make a combined image, display it, and I can save the combined image, the time-of-flight image, and the ESP32 image all together; I'm going to save those, bring them to my computer, and use them to work out how to adjust the time-of-flight image to get the correct overlay, because you'll see it's shifted a bit.

Let me run that and see what's going on; actually, let me start the counter at 20 so I don't overwrite images I've already created. So this is the combined image, and as we can see, the bottle in the depth image is shifted a bit to the left of the bottle in the ESP32 image. If we bring the bottle forward (it takes a bit of time on the ESP32), we see it's shifted even more to the left, so I'm going to have to work out a way to combine those two.
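My reconstruction of that capture-and-overlay step looks like this. The :81/stream endpoint and the /control query format come from the stock ESP32 CameraWebServer sketch; the IP address, the framesize numbering, and the process_frame dependency are assumptions carried over from the earlier sketches.

```python
import cv2
import requests

ESP32_URL = "http://192.168.1.181"        # static IP set in the sketch (assumed)
STREAM_URL = ESP32_URL + ":81/stream"     # MJPEG stream endpoint

def set_resolution(index=5):
    # framesize 5 was roughly VGA here, but the numbering varies by version
    requests.get(ESP32_URL + "/control", params={"var": "framesize", "val": index})

# The new resolution only takes effect on the next capture, hence set,
# capture, release, then set and capture again
set_resolution(5)
cap = cv2.VideoCapture(STREAM_URL)
cap.read()
cap.release()
set_resolution(5)
cap = cv2.VideoCapture(STREAM_URL)

ok, frame_esp32 = cap.read()
if ok:
    # depth_buf / amplitude_buf come from the ToF capture shown earlier
    result = process_frame(depth_buf, amplitude_buf)
    result = cv2.resize(result, (frame_esp32.shape[1], frame_esp32.shape[0]))
    result = cv2.cvtColor(result, cv2.COLOR_GRAY2BGR)
    combined = cv2.addWeighted(frame_esp32, 0.5, result, 0.5, 0.0)
    cv2.imshow("combined", combined)
    cv2.waitKey(0)
```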
With a stereo pair we have two images, and we match the features in each image to get the distance to an object. But here we already know the distance, so we can use the distance to align the images instead, and I'll explain that more clearly on a piece of paper. I'm going to explain how I overlay the image from the time-of-flight camera onto the image from the normal camera.

Think about a stereo camera: we've got a left image and a right image (these aren't to scale). We take corresponding points in the two images and measure how many pixels a point has moved across from the left image to the right image, and from the number of pixels moved we can get the distance to the object. The issue is matching the points, knowing that this point in the left image corresponds to that point in the right image. It's very easy for you and me but much harder for a computer, so feature matching is the big problem in stereo vision; everything else is simple geometry. Once we know the pixel distance, we can get the distance to the object.

Now, with the time-of-flight camera here and the image camera here, the distance is already known, so we go the other direction: from the distance we calculate the pixels moved, which tells us how many pixels over we have to shift that part of the time-of-flight image to overlay it onto the normal image. This is much easier because we don't have to do any feature matching. The formula for the distance, which I derived in a previous video, is d = n * Dc / (2 * p * tan(theta)) + f, where n is the total width of the image in pixels, Dc is the distance between the two cameras (I'll draw that: two cameras, one here and one here, Dc apart; sorry, that's not to scale), f is the focal length of the camera, and the full viewing angle is 2*theta, so tan(theta) captures how wide a view the camera has. We know the distance, and what we want is p, how far across we move the time-of-flight image to overlay it onto the normal image; it's a pixel count, because n and p are both in pixels. Neglecting f for now, just as an approximation, we can invert: 1/d = 2 * p * tan(theta) / (n * Dc), so p = n * Dc / (2 * d * tan(theta)). Let me know if I've made a mistake.

So we can calculate p, but I've neglected f, and I've fudged tan(theta) in my calculation, just set it so that when I shift the image it looks pretty good. The reason I had to do this is that we're using two different cameras: side by side, each is going to have a different f and a different theta. This is a bit of an exaggeration, it won't be this different, but the geometries of the cameras aren't going to match, and this formula assumes matching geometry.
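In code the rearranged formula is a one-liner. The n_pixels and dc_m values below are illustrative placeholders (the ToF frame is 240 pixels wide, and I'm assuming the cameras sit about 3 cm apart), and tan_theta is the hand-tuned fudge factor discussed above.

```python
def pixel_shift(distance_m, n_pixels=240, dc_m=0.03, tan_theta=0.51):
    # p = n * Dc / (2 * d * tan(theta)); focal length f neglected
    return n_pixels * dc_m / (2.0 * distance_m * tan_theta)

# A close object needs a larger shift than a far one:
print(pixel_shift(0.5), pixel_shift(2.0))
```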
So what I've had to do is fudge the parameter so it more or less matches, but a better way would be a manual calibration: take the camera, put an object at, say, 10 centimeters (I'm not drawing this to scale), then move it to 20, 30, 40 centimeters, and at each measurement see how far we have to move the time-of-flight image, manually shifting it over a certain number of pixels until it sits right on top of the object, from 10 centimeters right out to, say, 400 centimeters. We'd end up with a set of plotted points like this and could then do some interpolation; that's a manual calibration. Like I said, I've just fudged the parameter and fitted it so it works more or less for most of the images I've got, and it seems to work okay, but because the cameras have different geometries the formula probably won't be exact. We'll have a look at the shift and the code for it in the next scene.

Now I'm going to take you through the notebook I used to develop the code that shifts the time-of-flight depth image onto the image from the normal camera. What this does is shift the regions from the time-of-flight camera over so that they overlay the corresponding regions in the normal camera's image. I've downloaded some images from the time-of-flight camera with the naive overlay I did, where I just put one image on top of the other. If we have a look, this is the time-of-flight image overlaid onto the ESP32 image, and we can see they don't exactly line up; this is the corresponding ESP32 image, and this one is the time-of-flight camera image. To transfer these from the Raspberry Pi to my desktop I used WinSCP, which I thought I'd show you quickly; it's quite good for secure FTP.

Now that I've got all these images to run my tests on, I'll take you through the notebook. First I've got my display-image functions, just for having a look at the images; they're a little slow, so I wouldn't use them on the Raspberry Pi. Then do_kmeans does the k-means clustering on the image gray levels and returns a clustered image and the cluster centers, where each cluster center represents the distance of those pixels from the camera. The distance can be calculated from the gray level like this, although I don't return the distance, and this is with four meters as the maximum distance; if I were using two meters I'd have to change that to a two.
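A minimal version of that clustering step, using OpenCV's k-means; the function name and the cluster count are mine, so the notebook's details may differ.

```python
import cv2
import numpy as np

MAX_DISTANCE = 4.0   # change to 2.0 for the two-meter mode

def do_kmeans(gray, k=4):
    # Cluster the depth image's gray levels; each cluster center stands in
    # for one distance band (gray 255 = at the lens, gray 0 = MAX_DISTANCE)
    data = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(data, k, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    clustered = centers[labels.flatten()].reshape(gray.shape).astype(np.uint8)
    return clustered, centers

def gray_to_distance(gray_level):
    # Invert the 0..255 scaling that process_frame applied
    return (255.0 - gray_level) / 255.0 * MAX_DISTANCE
```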
Then get_segmentation takes in the clustered image and the cluster centers, and it returns three lists. The first is a list of labels, where each label represents how far that cluster center is from the camera; it's not the distance itself but the gray level, and from the gray level we can calculate the distance using the formula above. The stats are the bounding box and the number of pixels in each region. The function also ignores all regions smaller than the minimum region size, which defaults to 1000 pixels, though you can change that. Finally, it returns the segmentation masks, which are Boolean arrays the same size as the image, true where a region's pixel is present and false where it isn't; this is a list of masks we can overlay onto the image when it comes time to do our processing. The other thing I've done is order the labels from furthest away to closest. The reason is that when I'm shifting regions over, I want to handle the closest one last; otherwise the closer object would be occluded by the further one, and I want the closest object at the front, so I copy it over last. For the moment, though, I'm only doing one object, the closest one.

Then I read in my image, so let's run that for now. This cell loads the image and resizes it to bring it into line with the image from the Arducam depth camera, and the next cell brings in the equivalent Arducam depth camera image and does the filtering, a median filter, to get rid of some of the noise. Now we run do_kmeans and get_segmentation and have a look at the clustered image: this is it, clustered by gray level, and we can see how it steps back in bands.

This tan_theta is the parameter in the equation that I've fudged a bit to get the shift right. I've adjusted it experimentally, because it's a physical parameter representing how wide the view is and it's different for the two cameras, and I found 0.51 works pretty well in this notebook. Then p is the pixel distance: I calculate the distance with the formula from the labels returned above, which are the gray levels, and for each label I calculate p, the number of pixels to the right I have to shift that segmentation mask.

Now, to shift a segmentation mask: I have a mask where some values are true and some are false, effectively zeros and ones, and wherever there's a one, that's where the mask is. I could do a nested for loop over the whole image, for i in range(rows) and for j in range(cols), and shift every true pixel over, but that kind of loop over an image is really slow and something you want to avoid at all costs in Python (in C it might be a little better). So I don't do that. Instead I find the shift for each mask, add that many columns onto the left side of the mask, and then just take the leftmost columns to match my original image size. Let me show you: say this is my segmentation mask; to shift it over, I add a certain number of columns onto the left side (that's not drawn very well), and then from there I just take the left part, cropped to my image size. This is much quicker than doing a loop.
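The column-padding trick is easy to show; this is a sketch of the idea rather than the notebook's exact code.

```python
import numpy as np

def shift_mask_right(mask, p):
    # Prepend p empty columns on the left, then crop back to the original
    # width; the whole mask moves right by p pixels with no Python loop
    pad = np.zeros((mask.shape[0], p), dtype=mask.dtype)
    return np.hstack([pad, mask])[:, :mask.shape[1]]
```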
Then draw_segmentation_mask overlays the mask as a semi-transparent color onto my image, draw_detections draws the box, and annotate_distance writes the distance. Now let's display it: the segmentation mask puts the purple semi-transparent mask over the bottle, annotate_distance writes the 14.1 centimeters, which is how far away it is, and draw_detections draws this box around it. What I've done here is only draw it for the closest region, basically only when the index is the last one in the list, because I'm going in order from furthest to closest.

Now I'll quickly show you the ESP32-CAM sketch that I've uploaded, discuss a little issue I've had with it, and then put all this together in a Python file and run it in real time on the Raspberry Pi. So I'll see you in the next scene. This is the code that I've uploaded to the ESP32-CAM. I got the sketch from Random Nerd Tutorials and made a few adjustments. I've put in a static IP address, and these parameters here configure the Wi-Fi for it; I've used 181 for the last octet, since it's always best to use a number in the high 100s or low 200s to stay clear of the router's DHCP range and avoid an address conflict. The other thing in this file that's probably worth showing you is the command handler, which handles requests from the Python code: for example, when the Python code sends a request to change the resolution, it uses the framesize variable, and the command handler picks that up here, and likewise for quality, so we can request any of these parameters from the Python code.

The other thing I wanted to show you is the video capture. cv2.VideoCapture causes the ESP32 camera to lag a fair bit, and I'll show you that now. This is the image from the ESP32, this is the combined image, and this is the depth image. The issue we have is that there's a little lag on the ESP32: I put my hand there and take it away, and it only disappears afterwards, while there's no lag on the depth image. The bottleneck is cv2.VideoCapture from the cv2 library: it buffers the frames in a queue so it doesn't drop any, so when your processing comes back for the next frame, it gets the next frame in the queue rather than the latest one.

I've updated the code and managed to get rid of the lag problem, using a class I found on Stack Overflow, written by Ulrich Stern, and I'll take you through how I've updated the code.
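For reference, the class in question is the widely shared bufferless VideoCapture wrapper; this rendering of it is from memory, so treat the details as approximate.

```python
import queue
import threading
import cv2

class VideoCapture:
    """Bufferless wrapper around cv2.VideoCapture (after Ulrich Stern's
    Stack Overflow answer): a reader thread drains the stream continuously
    so read() always hands back the newest frame, never a stale buffered one."""

    def __init__(self, name):
        self.cap = cv2.VideoCapture(name)
        self.q = queue.Queue()
        t = threading.Thread(target=self._reader)
        t.daemon = True
        t.start()

    def _reader(self):
        while True:
            ret, frame = self.cap.read()
            if not ret:
                break
            if not self.q.empty():
                try:
                    self.q.get_nowait()   # discard the previous, unread frame
                except queue.Empty:
                    pass
            self.q.put(frame)

    def read(self):
        return self.q.get()
```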
I've basically copied and pasted the class, VideoCapture, at the top of the file, and then gone through my code and, where I had cv2.VideoCapture, used VideoCapture without the cv2 prefix; if we look at the class definition, it includes a cv2.VideoCapture as a member, so I just create an instance of that class instead.

I've transferred the code from the Jupyter notebook into this .py file, I've got it loaded in Thonny, and I've got the Arducam time-of-flight depth camera and the ESP32-CAM connected to the Pi, so I'm going to run this and see how we do 3D segmentation and depth estimation in real time. I press the play button and there we go: this is the combined image with the segmentation, this is the image from the ESP32-CAM, and this one is the depth image from the time-of-flight camera; I'll move that over a little so you can see it better. It's giving a bounding box, a segmentation mask, and a distance, which is quite accurate. This is really what I wanted. I'm only measuring the closest object for now; this is just a proof of concept, and you can add more objects if you want. Now I'll move the object back and forth, and we'll see it seems to be working in real time quite well; move it back, move it over here. That's working really nicely, and I'll try tracking my hand as well: it's tracking my hand, measuring the distance, and segmenting it very well, so this is exactly what I wanted.

I did want to point out one thing, though: if I tilt the camera up, the ground, or rather the desk, gets segmented as part of the object, and sometimes if I tilt it up further it gets segmented on its own. This is something I'm going to have to fix before I put this on a robot, and the way to fix it is to look at the gradient of the gray level. This is the gray-level image from the time-of-flight camera; what I'm actually showing is the clustered image, but if we looked at the raw image, we'd see the desk has a gray-level gradient, because its distance changes smoothly. So that's another feature I haven't used yet that can tell me about the properties of the surfaces in the image view, and I'm going to have to use it before I connect this to a robot. But I'm going to end the video here, because this is just a proof-of-concept video, and I'll put that in when I do connect it to a robot. Thank you very much for watching, and I'll see you in the next video. Bye.
Info
Channel: Jonathan R
Views: 3,218
Keywords: arducam time of flight, time of flight, depth camera, 3d segmentation, esp32, esp32-cam, python, image processing, distance estimation, raspberry pi
Id: yHB2iq8TIqI
Length: 45min 42sec (2742 seconds)
Published: Sun Jun 18 2023