Visual navigation for wheeled autonomous robots – using Intel® RealSense™ Tracking Camera T265.

Video Statistics and Information

Captions
Okay, thank you, Roy, for the very nice introduction. Good evening, everybody. I'm very excited to talk tonight about our latest product. This camera is not an ordinary camera: it runs an algorithm on board to determine its position and orientation in space, and we believe this can be a game-changer for the next generation of robots and drones, as well as for augmented and virtual reality, or even autonomous driving. In the following I'm going to talk about the algorithm running on board and the use of this camera to create an autonomous robot.

Roy already introduced me, but I'd still like to briefly introduce myself. I'm Philipp Schmidt, a software engineer at Intel RealSense, working on the SLAM and sensor fusion algorithms, mostly for robotic applications. Before that I worked at the German Aerospace Center (DLR), also on space robots, mostly on robotic vision and control, and I obtained my master's degree in control engineering. This is the legal disclaimer; you can read it afterwards.

Okay, let's talk about the problem statement first. To be able to fulfill any high-level task, for example cleaning a room or delivering an object from A to B, one of the most basic primitives is to be able to travel from A to B. To do that, on the one hand you need information about the environment, and on the other hand you need to determine your own pose, that is, position and orientation. There are several applications for this: any kind of robot, for example in agriculture, consumer and personal robots (one big market is vacuum cleaners), drones, or, as mentioned, augmented and virtual reality.

I would like to start with this demo video. This is the robot base, the Kobuki; you might know it, for example, from the TurtleBot. What we did is mount two cameras on top, at the front of the robot; you might have also seen it in the other demo area. These are the two devices: below, the tracking module, the T265, and above, the depth camera D435. I want to point out that most of the perception is running on these two cameras. Here the robot goes off and explores the space, which could be a living room or home environment. It is building a map using the depth data, the point cloud, transformed into a common reference frame and accumulated over time to create the map you see here: for example the couch, the two armchairs, and the table, where you see the two legs as it goes around the couch. We have a path planner implemented, so you could just click somewhere on your mobile device, for example a phone or tablet, and send the robot there; this is sped up by up to four times. Another feature I want to point out is the avoidance of dynamic obstacles, which could be persons, pets, or other vehicles, for example. As soon as they enter the field of view they are, on the one hand, mapped, as you see on the left side, and the path planner replans the trajectory.

In the following I'm going to talk about how this robot was built and more about the two cameras. I'd like to start off with a general introduction to the SLAM problem, then give a system overview and talk more about some of the components, the tracking camera T265 and the depth camera. I want to talk about the robot itself and its kinematics; then, for the integration, the calibration between the cameras and the robot is one important aspect. I'd like to show two sample applications: a simple one, using the tracking camera mounted on the robot in a position controller to make the robot follow a predefined path in free space, and then, on top of that, using the depth camera D435, as you saw, to detect obstacles and create that dense occupancy map. I'd like to end the talk with a Q&A.
Okay, a lot of you might be familiar with the SLAM problem, simultaneous localization and mapping. It's kind of a chicken-and-egg problem: on the one hand the robot is trying to gather information about its environment, on the other hand it is estimating its own pose with respect to those landmarks. In the case of VIO, visual-inertial odometry, the focus is on the pose, or the motion, of the robot over time. There exists a wide range of different approaches, ranging from dense to sparse, depending on the number of features that are used, and from direct, using the pixel values, to indirect, using feature descriptors on some of the image regions. Since it is an estimation problem, iterative techniques are used, such as Kalman filtering or batch optimization, bundle adjustment, over a limited time horizon, a limited window. People also use different feature types and different feature descriptors, and implement additional capabilities such as relocalization, meaning the robot detects when it is coming back to the same spot or observing the same scene. That information can be used either to optimize the trajectory and the map, or to make use of a map that was obtained in a previous mapping phase.

In this slide I would like to talk about benchmarking, which is one important aspect due to the variety of different approaches. I'm highlighting here two different datasets; there are many more. One of them is EuRoC, for drones, with a stereo pair and an IMU, and external measurements to obtain ground truth and assess the accuracy of the algorithm. The other one is KITTI, aimed at autonomous driving applications; it was recorded in the downtown of Karlsruhe, Germany, and it also has odometry from the car and uses GPS as ground truth. While in the literature the focus is mostly on the accuracy of the algorithm, one very important aspect in practice is also the computational resources that are used, especially for embedded systems. A later publication, from ICRA 2018, clearly shows this trade-off between CPU usage and accuracy, as plotted below for different methods and systems. What we are trying to achieve is basically to get into the lower left corner: reduce CPU usage and memory footprint while maintaining the accuracy. For embedded systems, memory is another important aspect.

For the productization of a SLAM approach there are many different components that have to be taken care of. Here you can see some of them, ranging from hardware drivers on the left side to higher-level vision processing functions and a back-end, for example to deal with map optimization, and it is incredibly hard to optimize all of that for one system to obtain the best performance. That is what we achieved with the tracking camera T265. Here you can see a system overview: it uses two fisheye cameras and an IMU, and what is important to highlight is that the whole algorithm is running on board, on the Intel Movidius Myriad 2 vision processing unit. It has very low latency, low power, a small footprint, and the before-mentioned features: inside-out tracking, appearance-based relocalization, and mapping. One feature I want to highlight for robotic applications is that we can make use of the wheel odometry: read encoder values on the host and send them to the device to make use of them in the sensor fusion.
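As a concrete illustration of that wheel odometry input, here is a minimal sketch using the rs2::wheel_odometer interface from recent librealsense releases. The calibration file name, the wheel sensor index, and the example velocity are assumptions for illustration only; the exact JSON format of the odometry calibration is documented in the SDK, not here.

```cpp
#include <librealsense2/rs.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    // Assumes the T265 is the only RealSense device attached.
    rs2::context ctx;
    rs2::device dev = ctx.query_devices().front();
    rs2::wheel_odometer wheel_odom = dev.first<rs2::wheel_odometer>();

    // The camera-to-odometry extrinsics and noise terms come from a calibration
    // blob (JSON serialized to bytes); "calibration_odometry.json" is a placeholder name.
    std::ifstream file("calibration_odometry.json", std::ios::binary);
    std::vector<uint8_t> blob((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());
    wheel_odom.load_wheel_odometery_config(blob);

    // In the robot's control loop: convert encoder ticks to a translational
    // velocity (m/s, in the frame defined by the calibration) and send it for fusion.
    uint32_t frame_number = 0;
    rs2_vector velocity = { 0.0f, 0.0f, 0.2f };   // example value only
    wheel_odom.send_wheel_odometry(0 /* wheel sensor id */, frame_number++, velocity);

    return 0;
}
```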
Here you see the sample output, basically out of the box: you plug the camera into your computer. It uses the RealSense SDK, which is an open-source project on GitHub and works cross-platform, and this is one of the tools, the RealSense Viewer, the same one that is also used for the depth cameras. What you can see here are, on the one hand, the different sensor streams: the left and the right fisheye image. What I'd point out is the very wide field of view, almost 180 degrees; those are fisheye lenses, and you can see how the lines bend towards the periphery. You also get the IMU information: the accelerometer, which is measuring gravity in the static case here, and the gyro, which is coming at 200 Hz, and that is the same rate we are outputting the poses at, which you see here on the right, the output of the algorithm.

You can see this even better in the 3D view. What you see here is the camera being picked up and moved on an arc, as it could be on a robot or a drone as well, and coming back to the same spot to assess its accuracy in terms of loop closure. You can get an idea of the scale from the baseline between the two images, which is 6 centimeters, so it is traveling a distance of around, let's say, 1 meter and coming back to the same spot very accurately. What you also see here is the trajectory in green, which reflects the confidence level; it is at high confidence right now, and the device, or the algorithm, can also output medium and low confidence.

Talking about the algorithm inside: I mentioned it is VIO. Following the nomenclature from the second slide, it is a sparse, Kalman-filtering approach, and I mentioned the poses are output at 200 Hz with very low latency. That makes it usable for applications such as virtual and augmented reality, and also for the control of robots with fast dynamics, such as drones. In our testing, the accuracy, the loop closure error, was always below 1% of the path length. I also want to highlight the appearance-based relocalization feature. That means it solves the kidnapped-robot problem: if the camera is occluded and then uncovered again, the robot detects that it is in the same space, and it can also make use of a map of a space it mapped previously. What is shown here is again the trajectory, and you see some of the landmarks, some of the features we are tracking; it is sparse and can work with only very few features, only little texture.

This is the API; it is part of the RealSense SDK, as I mentioned. These are the functions that apply to the tracking module. As you see in the second one, there is the pose frame, which is the output of the algorithm; it is implemented as a callback, can be read at 200 Hz, and can be used for control or other applications. I mentioned this comes together with a confidence level and also higher-order estimates, such as the estimated velocity, the velocity twist, and the estimated acceleration, which can be very useful for control. We can also access the raw sensor streams as seen in the viewer, the video frames as well as the accelerometer and gyro frames, which come as callbacks too. The next two functions apply mostly to wheeled robots: I will talk a bit more about the calibration that has to be provided between the camera and the robot, and the wheel odometry data can be fed through and sent to the device. The last part is about relocalization, which can be enabled and disabled; maps can be loaded and saved, and we also get callbacks for each relocalization event that notify the user, and the user can decide what to do with the relocalization.
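Alongside the callback-based interface described above, the pose stream can also be read with only a few lines through the librealsense2 pipeline API. The sketch below, roughly along the lines of the SDK's rs-pose example, polls the 200 Hz pose stream and prints translation, velocity, and the tracker confidence.

```cpp
#include <librealsense2/rs.hpp>
#include <iostream>

int main() try
{
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_POSE, RS2_FORMAT_6DOF);   // T265 6-DoF pose stream
    pipe.start(cfg);

    while (true)
    {
        rs2::frameset frames = pipe.wait_for_frames();
        auto f = frames.first_or_default(RS2_STREAM_POSE);
        rs2_pose pose = f.as<rs2::pose_frame>().get_pose_data();

        std::cout << "position (m): " << pose.translation.x << ", "
                  << pose.translation.y << ", " << pose.translation.z
                  << "  velocity (m/s): " << pose.velocity.x << ", "
                  << pose.velocity.y << ", " << pose.velocity.z
                  << "  confidence (0-3): " << pose.tracker_confidence << "\n";
    }
}
catch (const rs2::error& e)
{
    std::cerr << "RealSense error: " << e.what() << std::endl;
    return 1;
}
```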
Please go to GitHub; you will see all of this there, the API and examples of its usage. Here is a minimal example; you can see it only takes a few lines of code to get pose information from the device. The idea behind it is that a class is implemented that inherits from the manager as well as the device listener and implements, or listens to, those callbacks, such as when a device is attached. In this case the device is started in the default configuration; it can be configured depending on which streams should be enabled, whether relocalization is enabled or disabled, and so on. For each pose frame, the information is simply printed to the screen here, but it could be used in a controller, for example, or for higher-level fusion or occupancy mapping. You can also directly access the estimated velocity and estimated acceleration, and the same holds for the sensor streams you saw before, with video, accelerometer, and gyro callbacks. For a more complete example, again, please go to GitHub; in the libtm util you will see more of the functionality, such as loading and saving of maps and usage of the wheel odometry. The prior slide points directly to the RealSense SDK repository, which is called librealsense on GitHub, and this one here points specifically to one sample; most of the functionality regarding the tracking device, at least the low-level API, is integrated under third-party, in libtm.

Okay, I would like to talk more about the robot itself. We already talked about the tracking camera, the T265; I'd like to talk about the other components: the Kobuki robot base, the depth camera D435 mounted on top, and the Intel NUC we are using to control the robot. This is a system diagram of how these components are interconnected. All of them are connected via USB: we receive depth images directly from the depth camera D435, we receive poses, odometry, from the tracking camera, and together those are fused to build the occupancy map, as you saw in the video. This is then used as input for the path planner and ultimately the motion controller, which sends velocity commands to the base. I also mentioned the use of the wheel encoder information, which is fed through here and can be used in the fusion. In the following I'm going to talk about the different components in more detail.

We already covered the T265, so now to the depth cameras, which we talked about briefly. These are the two latest products, the D415 and the D435. Both of them are stereo systems, active in the sense that they have an IR projector to obtain a good fill rate, and they have an RGB sensor on the side. What is the difference between them? The D435 has a wider field of view, and for stereo systems there is a trade-off between field of view and accuracy, as can be seen below. For this project the D435 was chosen for its wide field of view, to be able to detect obstacles when moving around corners, for example; for other projects, such as reconstruction or scanning, you would probably prefer the accuracy of the D415. There is a more detailed comparison, with a table, under this URL. Some other differences are, for example, that the D415 has a rolling shutter and the D435 a global shutter, and there is another model that has an IMU, the D435i. This is the sample output, again using the same tool from the RealSense SDK, the RealSense Viewer. What you see in the first image is, on the one hand, the RGB and, on the other hand, the depth map, color-coded from close points in blue to far-away points in red, and these two can be fused to obtain the colored point cloud that you see here in 3D.
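The depth-to-point-cloud step just shown can be reproduced with the SDK's rs2::pointcloud processing block. This sketch assumes a D435 attached with default stream profiles; it computes a textured point cloud per frame, which is the kind of data that later gets transformed into the world frame for mapping.

```cpp
#include <librealsense2/rs.hpp>

int main()
{
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH);   // D435 stereo depth
    cfg.enable_stream(RS2_STREAM_COLOR);   // RGB, used to texture the cloud
    pipe.start(cfg);

    rs2::pointcloud pc;
    for (int i = 0; i < 300; ++i)          // process a few seconds of frames
    {
        rs2::frameset frames = pipe.wait_for_frames();
        rs2::depth_frame depth = frames.get_depth_frame();
        rs2::video_frame color = frames.get_color_frame();

        pc.map_to(color);                          // attach texture coordinates
        rs2::points points = pc.calculate(depth);  // deproject depth to 3D points

        // Vertices are x, y, z in meters, expressed in the depth camera frame;
        // together with a T265 pose they can be accumulated into an occupancy map.
        const rs2::vertex* vertices = points.get_vertices();
        (void)vertices;
    }
    return 0;
}
```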
Now I'd like to talk about the robot itself. It is a differential drive with a caster wheel, and we assume point contacts to arrive at these simple relations. What I want to highlight here are the intrinsic parameters of the odometry calibration, which are the wheel radius as well as the distance between the two wheels, basically the baseline. With this relation one arrives at a unicycle model that can be used for the control. Despite the simple kinematic model, which only has three states, moving in the plane, and two inputs, it is a nonlinear model and nonholonomic, meaning that to steer the robot to a desired target configuration one basically has to solve the parking problem; simply speaking, the robot cannot move sideways. I will talk a bit more about the control on one of the next slides.

The other important aspect for making use of the wheel odometry is the calibration between the camera and the robot, that is, obtaining the relative transformation. We briefly saw another hand-eye calibration before, and this is also a classic hand-eye calibration problem in the sense that it can be obtained from relative poses. What is different here, and why common methods cannot be used, is that the robot only moves in the plane and only rotates around one axis, which makes some of the degrees of freedom unobservable. So I wanted to highlight this work from ICRA 2012, which proposed a method to solve this problem. In practice, what can be done is either to use the CAD model from the design, or to mount the camera in an aligned configuration, for example parallel to the ground, and measure the translation with centimeter accuracy; this can later be refined from the pose output.

Then I would like to talk about the first application: using the T265 mounted on the robot to follow a predefined path in free space. For that, a simple position controller can be implemented. The only difficulty is that the vehicle kinematics are nonlinear and nonholonomic, but I want to highlight one approach that I found appealing, which is to apply a coordinate transformation, basically virtually moving the point of interest away from the center between the wheels and thereby making its position controllable: not only can it move forward and backward, but also to the sides, using the rotation of the robot. By applying this transformation to the output, and choosing new inputs mu_1 and mu_2, the robot can basically be controlled as a point mass. You can see it here in the video; this was implemented on the robot. It is the same robot, with only the T265 mounted in front, moving in the same environment and following a predefined path. In this case we simply programmed it to follow a rectangle of 6 by 4 meters, and you can use the tiles of the floor as a visual reference; each of them is half a meter in size. You see how it is following that line, turning after six meters, passing behind the table, and then ultimately coming back. In total it does two loops and comes back to the same spot, so that we can assess the accuracy by the loop closure error: after travelling a distance of 48 meters it comes back to the same spot to within 15 centimeters, which is less than 0.5 percent of the path length.
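For reference, these are the differential-drive relations and the unicycle model being described, together with the off-axis point transformation used for the position controller. The offset d of the controlled point is a design parameter that is not specified in the talk.

```latex
% Differential drive with wheel radius r, wheel separation b,
% and right/left wheel speeds \omega_R, \omega_L:
v = \frac{r}{2}\,(\omega_R + \omega_L), \qquad
\omega = \frac{r}{b}\,(\omega_R - \omega_L).

% Unicycle model (three states, two inputs, nonholonomic):
\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta, \qquad \dot{\theta} = \omega.

% Off-axis point P at distance d > 0 in front of the wheel axis:
x_P = x + d\cos\theta, \qquad y_P = y + d\sin\theta,

\begin{pmatrix}\dot{x}_P\\ \dot{y}_P\end{pmatrix}
= \begin{pmatrix}\cos\theta & -d\sin\theta\\ \sin\theta & d\cos\theta\end{pmatrix}
  \begin{pmatrix}v\\ \omega\end{pmatrix}
= \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix},
\qquad
\begin{pmatrix}v\\ \omega\end{pmatrix}
= \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta/d & \cos\theta/d\end{pmatrix}
  \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}.
```

With the new inputs (mu_1, mu_2), the point P behaves like a fully actuated point mass, so a simple proportional position controller on (x_P, y_P) is enough to track the predefined path.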
Then, for the next application, in the case of an unstructured space that possibly contains obstacles, or even dynamic obstacles, we added the depth camera D435 on top to create an occupancy map. The idea, as I mentioned briefly before, is to transform the point cloud into the common reference frame using the T265 poses and to accumulate the points over time. Internally this is a 3D representation; only at the very end do we slice the 3D map at the height of interest of the robot to obtain this 2D occupancy map. The meaning of the different cells here is a probability between free and occupied, 0 to 100, and this can be used in the next step for the path planning.

I want to show some sample input; this is the data coming from both cameras. We see here, in the world reference frame, the pose of the tracking module and the point cloud. This experiment was actually done to assess the time synchronization between the two, making sure that none of the objects move or blur during fast motions, especially rotations. You can clearly see the different objects, the wall and the chair, and this is the data that was used as input for the occupancy mapping. The output is displayed here; it was generated from the same motion, the same trajectory. All of this was implemented in ROS: there are ROS wrappers around all our cameras, and what we added on top here was our occupancy mapping algorithm, which I explained briefly before; for that we also created a ROS package. Some of you might recognize RViz here.
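Before the summary, here is a highly simplified sketch of the occupancy-map update just described: transform camera-frame points into the world frame with the T265 pose, keep the points in the robot's height band, and mark the corresponding 2D cells. The grid size, resolution, and data layout are assumptions for illustration; the actual implementation keeps a probabilistic 3D representation and also updates free space, which this sketch omits.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Minimal 2D occupancy grid: cell values run from 0 (free) to 100 (occupied),
// with 50 meaning unknown, mirroring the convention mentioned in the talk.
struct OccupancyGrid
{
    float resolution = 0.05f;                 // 5 cm cells (assumed)
    int   width = 400, height = 400;          // 20 m x 20 m (assumed)
    std::vector<uint8_t> cells = std::vector<uint8_t>(400 * 400, 50);

    void mark_occupied(float wx, float wy)
    {
        int cx = static_cast<int>(wx / resolution) + width / 2;
        int cy = static_cast<int>(wy / resolution) + height / 2;
        if (cx >= 0 && cx < width && cy >= 0 && cy < height)
            cells[cy * width + cx] = 100;
    }
};

// Pose of the depth camera in the world frame (T265 pose plus extrinsics):
// row-major rotation matrix R and translation t.
struct Pose
{
    std::array<float, 9> R;
    std::array<float, 3> t;
};

void integrate(OccupancyGrid& grid, const Pose& pose,
               const std::vector<std::array<float, 3>>& camera_points,
               float min_z, float max_z)      // height band of interest for the robot
{
    for (const auto& p : camera_points)
    {
        // world = R * p + t
        float wx = pose.R[0] * p[0] + pose.R[1] * p[1] + pose.R[2] * p[2] + pose.t[0];
        float wy = pose.R[3] * p[0] + pose.R[4] * p[1] + pose.R[5] * p[2] + pose.t[1];
        float wz = pose.R[6] * p[0] + pose.R[7] * p[1] + pose.R[8] * p[2] + pose.t[2];
        if (wz >= min_z && wz <= max_z)       // slice at the robot's height
            grid.mark_occupied(wx, wy);
    }
}
```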
With this I would like to wrap up and give a short summary. I demonstrated some of the main components to build an autonomous robot, and I want to highlight that most of the perception is running on the RealSense cameras on board. We talked about different SLAM approaches, the importance of efficient SLAM, and the fact that it is hard to productize. You saw the different components and that the T265 actually solves this problem and can be used out of the box: it is very lightweight and has a small footprint and low power consumption. For robotic applications we can make use of the wheel odometry, which increases the robustness, and in combination the two cameras can be used for occupancy mapping and obstacle avoidance as well. For more information please go to our website. Thank you. [Applause]

Okay, are there any questions?

Q: Thank you for the talk, I have a few questions. The first one: for the widest camera you have, the baseline is between 50 and 60 millimeters; why not a larger baseline, like 180 millimeters? The second question: in the tracking API or library, do you give us the raw IMU data, and is it synchronized, so that we can use it in an external state estimator? And the last question: why not have one camera give you the depth information and also the tracking information; why do you have two cameras?

A: Okay, maybe Roy can answer the first one about wider baselines; I want to answer the third one right now. Part of that is what we already saw: there is a trade-off between field of view and depth accuracy, so there are competing design requirements for the two solutions. For the tracking camera you want a very wide field of view to obtain good robustness, while for the depth camera you may want a somewhat smaller field of view to obtain better depth accuracy. Maybe Roy can answer the first question about the baselines.

A (Roy): Based on feedback from our previous-generation cameras, we found the ranges most of our customers are interested in, so we settled on this baseline. Now we are seeing interest in longer distances, so we are evaluating whether we should offer that.

A: Okay, and the second question, I'm not sure if I understood it correctly, please correct me if I'm wrong: I think it was whether the IMU is hardware-triggered to get exact time synchronization?

Q: What I mean is, does the IMU give a trigger to the camera, so that the camera shutters with the IMU and you can get a good alignment?

A: Yes, it does, and the time synchronization is a very important aspect, as you mentioned: everything is running on the same clock and gets hardware timestamps.

Q: The next one, about the wheel odometry input you mentioned.

A: Yes, that's a very good question. The way the fusion works, we are fusing a velocity measurement, so that could also be used to fuse other sensor streams or relative poses; right now we are using a translational velocity. What has to be provided, apart from the calibration, the extrinsics, is a measurement covariance for that measurement.

Q: Just to complete my question: can you get a confidence level description from the sensor?

A: Yes, it goes from low, over medium, to high, as you saw in the viewer.

Q: For the tracking camera, is most of the data you are using coming from the IMU, or what do you do with the cameras?

A: We are using the images, on the one hand, for visual tracking frame to frame; on the other hand they are an important source for relocalization, which is based on appearance, so we don't make any assumption about geometry. And yes, we are using the stereo information, and that is to improve, or better said, to avoid, scale drift over time.

Q: Just a quick question about the T265: is it using just the parallax information to obtain the depth of objects in front of it, or does it use something else as well?

A: For the dense depth we use the depth camera D435. The tracking camera also has a stereo system, so we are also able to estimate the depth, or the distance, of points, but in that case, as I mentioned, we are using a sparse algorithm, so it is only very few points.
Also, by integrating the IMU, the accelerometer, over time, we get another estimate of the scale, which is somewhat redundant; we know it also works in the monocular case, but with the stereo system we obtain better performance, accuracy and robustness.

Q: [Off-microphone.]

A: Yes, that's a good question. It does build an internal map; we refer to it more as a relocalization database, because it is only used internally. The map is not exposed at the moment, and it is not dense, so it can't be used for things like obstacle avoidance right now.

Q: How big can the map get?

A: I would have to double-check, but it is in the order of a few, or several, megabytes. And sorry, what was the other part? That was about invariance to lighting? [Music]

Q: If the lighting conditions change, if I turn on a light somewhere, does it throw off the map, or does it still continue working?

A: Turning on one light doesn't throw off the map. Talking about lighting in general, it works; we tested it down to 15 lux. But the invariance itself is more difficult to answer.

Q: [Off-microphone, about the benefit of the wheel encoders.]

A: In our testing we mostly saw that it improves the robustness. You already get a very good scale estimate from the stereo; the encoders mainly add robustness. The calibration between the two is an important step, as is, of course, the calibration of the device itself, which is factory calibrated and validated.

Q: For the T265, what is the end-to-end power consumption? You said it was low power.

A: Yes, it is around 1.5 watts.

Q: Are these cameras open for us to put more algorithms inside them? I mean, is the device even open, and is there enough compute available to do it?

A: In the case of the T265, right now you cannot program the Myriad 2 yourself, and we are evaluating that.

Q: For the stereo camera, how accurately can it determine depth; down to centimeters or millimeters?

A: I had two numbers there: for the D435 it was up to 30 millimeters at 3 meters, and for the D415 it was actually up to 15 millimeters at 3 meters.

Q: I have two questions about calibration. [Partly off-microphone.]

A: Yes, they have been calibrated with respect to each other as well, using all five imagers: we used the two IR cameras and the RGB from the depth camera plus the two fisheye cameras and solved that calibration problem. On the second part, I don't know if I can answer that definitively; too much mechanical stress can certainly degrade the calibration, so I can't recommend opening the device.

Q: [Off-microphone, about using multiple depth cameras together.]

A: The D435 and the D415 are both stereo cameras with passive matching, but with a laser projector that actively assists the stereo matching for textureless or dark scenes. The nice thing about a stereo camera is that you don't get interference between two cameras: if you overlap the viewing frustums of two of these cameras, the projectors actually assist one another, so you can get better depth from having multiple cameras looking at the same scene, whereas with structured light or time-of-flight cameras you typically do get interference.

Okay folks, let's go for the last question; if you still need help afterwards, please stay around.

Q: These devices are also used in your drone products; do you do height determination as well?

A: We have been testing it on drones as well, and it did a pretty good job; the accuracy was comparable to GPS within a cube of 100 meters, and we went up to 100 meters. Definitely, as you get further away the accuracy decreases,
or, let's say, the uncertainty increases when all the features are far away, but it can be used for height estimation above ground. [Applause]
Info
Channel: Intel RealSense
Views: 34,978
Keywords: visual navigation, robotics, SLAM, Intel RealSense, Intel, autonomous robots, depth camera, tracking, tracking for robotics
Id: 62vm0_RZ1nU
Length: 43min 59sec (2639 seconds)
Published: Tue Mar 12 2019