Particle Filter and Monte Carlo Localization (Cyrill Stachniss)

Captions
Hello everyone. Today I want to talk about localization, and about a special form of it, so-called Monte Carlo localization, which is an approach that uses a particle filter in order to estimate the location and orientation of a mobile platform in the environment. Localization is the task of estimating where we are in the world: where the system is and in which direction it is looking. We are looking into a variant of this localization problem in which we do not restrict ourselves to parametric probability distributions. If you do localization with a Kalman filter or an extended Kalman filter, for example, you are restricted to Gaussian distributions, and that is an assumption we want to relax today. We will talk about an alternative way of representing probability distributions and of realizing a Bayes filter in order to estimate the position of the mobile platform. So we want to answer the question "Where am I?", or "Where is the robot?", and we typically assume that we have a map of the environment, for example an occupancy grid map, and we want to estimate where we are in it. If you used a Kalman filter or extended Kalman filter, the result would be a Gaussian distribution describing where the system is. What we want to do today is use particles, also called samples or pose hypotheses, here indicated by red dots, which describe where the system could be. So instead of one parametric form we use non-parametric samples as hypotheses of where the system might be located, and that is already the key idea of the particle filter: get rid of the parametric distribution, turn it into a sample-based representation, realize a Bayes filter with it (the particle filter is one form of Bayes filter), and then build the localization system on top of that. How we do this, what we need to take into account, and what the principles behind it are is the task of today. The Gaussian filters such as the
extended Kalman filter or the Kalman filter use a Gaussian distribution, a function of that form, to estimate the state under consideration. In our case that would be, in 2D, the x, y location and the orientation, and in 3D the x, y, z position together with yaw, pitch, and roll, so the Kalman filter or extended Kalman filter would use a Gaussian to describe this three- or six-dimensional vector. We want to leave this Gaussian world behind today and look for an alternative way of representing the probability distribution, and we want to be as flexible as possible: I want to be able to represent arbitrary distributions. So consider a function which has a high peak over here but also a kind of plateau of probability over there; that may well be a distribution that occurs in the real world. The question is: how can we represent a function of that form and, at the same time, do efficient state estimation with it? That is what we want to talk about today. The key idea is that an arbitrary function like this can be approximated with samples, i.e., with dots sitting at certain x coordinates. Every dot is a sample, and in some areas the samples are densely packed while in other areas they are sparse. That is already the idea: the number of particles per unit area tells me how likely it is that the system is there. Every particle accumulates a little bit of probability mass, and I basically count how many samples fall into a certain area, so the particle set can be seen as an approximation of a probability density function. As a reminder, with a probability density function you need to integrate over a region in order to obtain the probability mass that the system is within that region. It is exactly the same thing here with samples: we have samples distributed over the state space, every sample represents a little bit of probability mass, and we simply count how many samples fall into a region in order to assess the probability that the system is in that region. So in areas where the density function is high we have many samples, and in areas where its values are low we have few samples. That is the idea of representing a function, or a PDF, using samples. We can go a step further and perhaps reduce the number of samples we need by weighting them. Every sample then has a weight associated with it, and the larger the weight, the more probability mass is located in that region; of course we have to take care that the weights sum up to one. In the uniform case every sample would have the same weight, 1 divided by J, if J is the number of samples. If we instead make certain samples "larger", i.e., give them a higher weight, we may require fewer samples to represent the function appropriately. That could look like this: we have weighted samples, and the size of the circle here indicates the weight of the sample, so in areas of high probability we have samples with large weights and in areas of low probability samples with small weights. We can thus represent areas of high probability either with higher weights or with a larger number of samples; it is basically frequencies versus weights. One representation expresses areas of high probability through weights, the other through frequency, i.e., through how many samples sit there. It is a very similar thing, and we will come back to this later with the so-called resampling step of the particle filter, but here we mainly use weighted samples. So the idea is to use a set of weighted samples to represent the probability distribution. If you want to represent a PDF over, say, the x location of a one-dimensional robot, you could represent it with such a sample set. Of course this is only an approximation: if you see the samples as small Dirac functions sitting at the sample locations, where the integral over each Dirac function is given by the weight of the sample, then this is not a perfect representation of the function, because it will be zero in one region, have a high value here, then zero, a high value, zero, and so on. So it is clear that we need a sufficient number of samples in order to represent functions well, and it should also become directly clear that this is only an approximation of the function. Samples are a way of representing a function approximately, and that is what we are going to use: weighted samples to estimate the belief about the position and orientation of the robot in an environment. Okay, let's look at what such a sample set, or particle set, looks like. We use a set of weighted samples, and every sample has two components. The first is the state hypothesis: where does this sample think the system is located? This can be a vector, one-dimensional as in the example before, or, in the two-dimensional world, it may contain x, y, and the orientation, making it a three-dimensional vector. The second component, the weight, is just a real number, and all the weights need to sum up to one in order to have a proper probability distribution, as the integral over a probability distribution must be one. So we have J of those samples: J hypotheses of where the system is.
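Such a weighted sample set can be held in two plain arrays, one for the states and one for the weights. The following is a minimal sketch, not code from the lecture; the mixture (a narrow peak plus a broad plateau) and all numbers are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 1D target: half of the probability mass in a narrow peak
# around x = 2, the other half in a broad plateau between x = 4 and x = 8.
J = 1000
states = np.concatenate([rng.normal(2.0, 0.3, J // 2),    # samples under the peak
                         rng.uniform(4.0, 8.0, J // 2)])  # samples on the plateau
weights = np.full(J, 1.0 / J)    # uniform weights that sum up to one

def prob_in_region(states, weights, lo, hi):
    """Probability mass in [lo, hi): the sum of the weights of the samples inside."""
    return float(weights[(states >= lo) & (states < hi)].sum())

print(prob_in_region(states, weights, 1.0, 3.0))   # roughly 0.5: the peak
print(prob_in_region(states, weights, 4.0, 8.0))   # roughly 0.5: the plateau
```

Counting weight mass inside a region is exactly the "integrate the density over an area" operation described above, just carried out on the sample approximation.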
Every hypothesis carries a weight, and the larger the weight, the higher the probability that the system is actually located there. This sample set then represents the belief, or posterior, given by the sum over all samples of the sample's weight multiplied by a Dirac function centered at the sample's location. That means the belief is a weighted sum of Dirac impulses, with one impulse at every sample location. If I integrate this function over the variable x, given that the integral of a Dirac function is one and that the weights are normalized, this actually gives me a proper probability distribution with an integral of one, and that is the distribution the sample set represents. Again, it should be clear that this is an approximation: we have these Dirac impulses, and in order to represent a smooth probability function well we may need a large number of samples. That is already one of the drawbacks of the particle filter: we may need to maintain a large number of samples, depending on how complex the function is that we want to represent. But the goal is to use this non-parametric, sample-based representation and not to restrict the belief to a Gaussian distribution or a different parametric form, and that is the key idea of the particle filter that we will use for pose estimation in this lecture. Okay, let's first have a look at how particle approximations of functions may look. In this first case we have a Gaussian distribution, the red function, and a set of, I think, 200 or 300 samples drawn from that distribution: in areas where the density is lower we have a smaller number of samples, and around the mean there is a large number of samples. But we can also do this for functions which are non-Gaussian. This second example is clearly not a Gaussian distribution: we have higher density values in two separate regions, representing the fact that it is more likely to be in a state around those areas and less likely to be outside, and so this is a sample-based representation of that function, for the red as well as the blue curve. The question we obviously face, if we start with an initial belief, is how we actually generate such samples. How do we obtain samples, given a function? The problematic thing is that this can be quite tricky: we cannot easily sample from arbitrary functions in closed form. There is only a small number of functions from which we can sample in closed form, or at least sample efficiently. If we have a uniform distribution, where every state has the same probability, it is just the uniform random number generator that we have in our computers. If we have a Gaussian distribution, not every number occurs with the same probability: values closer to the mean should be generated more often. There are rather simple ways of sampling random numbers from a Gaussian distribution, at least approximately. What you can do is take 12 random numbers drawn uniformly between minus the standard deviation and plus the standard deviation that you want to achieve, so uniformly between minus sigma and plus sigma, add those 12 numbers up, and divide the resulting value by 2.
Then you will get samples which are drawn, approximately, from a Gaussian distribution. You can see from this construction what the extreme values are: if you draw sigma 12 times, you get 12 sigma, and divided by 2 that is 6 sigma. So all the random numbers we can generate lie between plus and minus six sigma, which already covers a very large part of the probability mass of the Gaussian distribution, but it is not exact. This is probably the simplest way to draw numbers approximately from a Gaussian distribution. But the question is: how can we do this for other distributions? There are ways of doing this for arbitrary distributions; so-called rejection sampling is one of those techniques. We do not want to dive into rejection sampling here, because it is typically too inefficient a sampling technique to be used in the particle filter setup. What we want to look into instead is the so-called importance sampling principle. What the importance sampling principle tells us is that we can use a different distribution to generate the samples than the function from which we actually want to sample; the only thing we need to do is compensate for the mistake we have made. Let's go to this example: assume we want to generate samples from the blue curve. This is what we call our target distribution; we want samples from our target f. But the only thing I can do is generate samples from the red curve, which is my proposal distribution, often called pi. So I assume I can sample from pi, but I cannot sample from my target f. What the importance sampling principle tells me is: you can draw samples from the red curve, your proposal, and obtain samples that represent the blue curve, if you compensate for the mistake you have made, where "mistake" here means the difference in function value between the red curve and the blue curve at a particular location. What I need to be able to do for this is, first, draw samples from my proposal distribution, the red curve, and second, evaluate both my target and my proposal pointwise, i.e., compute the function values of f and pi at a given x. What I do not need to be able to do is actually generate samples from f, from my blue curve. The importance sampling principle then tells me to compute a weight w for every sample generated from the red curve: the weight is the target divided by the proposal, i.e., the value of the blue curve divided by the value of the red curve, evaluated at the position x. The only thing I need to take care of is that at all locations where the target is larger than zero, the proposal, the red curve, must also be larger than zero. Why is this the case? If there were a place where the target is larger than zero but the proposal is zero, there would be zero probability of generating a sample there, so we could never generate a sample for that location and would not be able to approximate f well. You can also see this in the weight computation: if my proposal were zero at a location where f is larger than zero, I would divide a number larger than zero by zero, which gives an undefined weight, a weight approaching infinity, and that is not something we want. So with this approach we draw sample locations on the x-axis according to the red curve and then compensate for the fact that we have not drawn from the blue curve by assigning weights: the larger this bar over here, the higher the weight. In areas where I have drawn few samples but where the function value of the target is high, the samples will have high weights, and in areas where this is not the case the weights will be smaller. The important thing about the importance sampling principle is that we can use a function that we can sample from, for example a Gaussian, in order to generate samples from a distribution which is not a Gaussian and which we cannot sample from in closed form. This principle is used inside the particle filter in order to perform state estimation. The particle filter is a state estimation system for dynamic systems, i.e., systems that change their state over time, and it actually implements a recursive Bayes filter: the Bayes filter we derived in one of the previous lectures, with its principle of a prediction step and a correction step for proper state estimation. Compared to the Kalman filter, for example, it is a non-parametric approach, which means it can deal with non-Gaussian distributions, because we do not commit to specific distributions; we use samples, as illustrated before, in order to represent our probability functions. Another advantage, which is not really written down here, is that we have better chances of dealing with non-linear models. If you remember, the standard Kalman filter required linear models for the motion and for the observation, i.e., for the prediction and the correction step. We do not need that here: the particle filter can better deal with non-linear functions, and the only thing we need is to be able to generate samples, i.e., to come up with proposal distributions that provide proper samples. The particle filter also has a prediction step and a correction step: the prediction step takes the motion into account, the correction step typically takes the observation into account, and this allows us to realize a Bayes filter, under the assumption that we use the motion model for the prediction and the observation model for the correction step.
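The importance sampling principle described above can be sketched numerically. The two Gaussians below are made-up stand-ins for the blue target curve f and the red proposal curve pi; the point is only that we sample from pi, weight each sample with f(x)/pi(x), and still recover properties of f:

```python
import math, random

def gauss_pdf(x, mu, sigma):
    """Pointwise evaluation of a Gaussian density (all we need for f and pi)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

random.seed(1)
# target f: N(2, 1) - pretend we can only evaluate it, not sample from it
# proposal pi: N(0, 2) - we can sample from it, and it is > 0 wherever f is > 0
xs = [random.gauss(0.0, 2.0) for _ in range(200_000)]      # draw from the proposal
ws = [gauss_pdf(x, 2.0, 1.0) / gauss_pdf(x, 0.0, 2.0) for x in xs]  # w = f / pi

# self-normalized importance sampling estimate of the target mean
mean_est = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
print(mean_est)   # close to 2.0, the mean of the target
```

Note how the proposal's mean is 0, yet the weighted samples reproduce the target's mean of 2: the weights compensate for drawing from the "wrong" curve.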
The particle filter then actually realizes a Bayes filter as we have defined it before. One note: the number of samples you use determines how good your approximation quality is. The larger the number of samples, the better you can represent your probability distributions, and the better your estimation performance will be. This brings us to the particle filter algorithm, at least in an informal form, which basically has three steps; let's forget about the third step for a moment. The first step is the prediction step: it draws samples from my proposal distribution, so it generates a new sample set of hypotheses of where the system could be. I take my red curve and draw samples from it, although I actually want to generate samples from the blue curve. In the second step, the correction step, we compensate for this mistake through the application of the importance sampling principle: we take the target distribution, the blue curve, divided by the proposal from which we actually generated the samples, in order to compensate for the mismatch between the target and the proposal distribution. Speaking in terms of a Bayes filter: if the proposal encodes, for example, the motion of my platform, then this sampling step corresponds to the prediction step of the Bayes filter. And the weight, computed as target divided by proposal, if I write down the target, write down the proposal, and do some math, will turn out to be our observation model; this is the correction step, taking the observation into account in order to compensate for the difference. So the analogy between the particle filter and the Bayes filter comes from using the motion model for the sampling step, i.e., as the proposal, and the observation model for the weight computation; it is all based on the importance sampling principle and comes out automatically if we put in the right proposal. It should be noted that the user can define this proposal distribution: it is up to us, as long as we stay in line with the importance sampling principle. We are allowed to choose, for example, the motion model of a mobile robot performing localization, describing how the platform moves; that is a user-defined choice. The weight, however, is not a user-defined choice anymore; it results from how we select the proposal. The third step is a bit of a tricky step, which may not be intuitive in the beginning: the so-called resampling step. What this step does is draw samples from our weighted sample set, with replacement, in such a way that sample i is drawn with probability w_i, the weight of that sample, and we repeat this process J times. That means we have our sample set, we randomly draw a sample from it with probability proportional to its weight, we make a copy of that sample, put the copy into another bucket, put the original sample back into the original set, and repeat this J times. So we get J new samples in our resampled bucket, and the weights in this bucket are all set to 1 divided by J, i.e., they are all uniform. What we are doing here is basically replacing weights by frequencies: we take the weighted samples and turn them into uniformly weighted samples, where samples with a higher weight have a higher probability of being drawn and thus of being replicated and duplicated. This generates new samples that still represent the same distribution, but with the weights replaced by frequencies. And I do this because I typically have only a finite number of samples when implementing a particle filter. With finitely many samples, at some point in time there will be samples that wander into areas of the state space which have a very low probability, maybe even a probability of zero. These particles with very low weight do not contribute much to approximating the probability distribution well; they carry only a very small share of the overall probability mass. So it is better to take such a sample out, eliminate it, and replace it with a sample in an area of high probability. In this way we use our memory in a smarter way, spending it on areas of high probability, and that is basically what the resampling step does. If we had an infinite amount of memory we could completely ignore that step, but on all real computers we only have a finite amount of memory for representing our samples, and therefore the resampling step makes sense: it shifts the particles into areas of high probability without changing the distribution itself. So that was the informal particle filter algorithm with its three steps. We can now look at the more formal description of the algorithm, which takes the sample set representing the belief of my system at time t minus 1, my current motion command, and my current observation. I start with two empty sample sets, so to speak two empty buckets: one I will use for the prediction and one for the correction.
Then I sample from my proposal distribution, and again, this pi is a user-defined choice: it encodes hypotheses of where the state could be. It of course depends on my previous belief, which will sit somewhere inside this proposal distribution, and in most cases the motion command will sit in there as well, because it tells us how the system evolves from the previous point in time to the next. The next step for that sample is the weighting step: target distribution divided by proposal distribution. Here you typically have to sit down, do some math, and work out what this actually looks like; in most cases, at least if you use the motion model as the proposal, this leads to a weight that is proportional to the observation model. So I can compute my weight, and then I add the weighted sample to my temporary sample set, x bar, and keep repeating and adding. After going through the whole loop, I have basically done my prediction step and my correction step at the same time, and I have the sample set x bar, my temporary sample set (it is not really the predicted sample set I mentioned before, so let's better call it temporary), which contains all the samples drawn and weighted. The last part over here is my resampling step: I repeat a process J times in which I draw samples with probability proportional to the weights I have just computed, and add each drawn sample, without its weight, i.e., with uniform weight, to my new sample set. As a result I have a sample set X_t which represents my belief at time t, with uniform weights, having replaced the weights by frequencies, and then I repeat this whole process for the next observation and the next control command coming in.
This is the general particle filter algorithm, in which the user has to define the proposal distribution and then derive what target divided by proposal looks like, in order to compute the weight in an appropriate way. If we were able to draw from the target distribution itself, everything would be trivial: we would just draw those samples, there would be no weights, and we could also skip the resampling step, because all samples would have the same weight. If we could sample directly from our target, none of these steps would be needed; we could simply represent the target distribution with samples. In practice, however, it is often the case that we cannot sample from our target; we can only sample from a proposal, which typically covers parts of the target distribution, and we then need to compute the weights in order to account for the discrepancy. The key point is that the target only needs to be evaluated pointwise, at the sample locations; I do not need to be able to draw samples from it. And evaluating a function, computing f(x), is easier than sampling from a function, which is why this is a prominent and effective algorithm for performing state estimation. So far I have mainly looked into the general particle filter algorithm, introduced in an informal way using the importance sampling principle and related to the idea of the recursive Bayes filter. The next thing we want to look into is Monte Carlo localization. With Monte Carlo localization we refer to estimating the position and orientation of a platform using a particle filter; it is robot localization using a particle filter, and this is what we typically call Monte Carlo localization. In our 1D examples we have seen this plot a few times: we have a robot which sits over here, and it has a sensor which can only detect doors and tells the
robot "you are in front of a door" whenever it is in front of a door, and the robot drives around. You will then hopefully end up with a belief that concentrates a large number of samples at the place where the system actually is. There are some remaining samples here and there, which could be explained by a door observation over here if the second door observation was a false one; some of those samples survived, but the majority of the probability mass sits where the robot actually is. Also for the example I have shown before, the question I want to answer is where those samples are generated: Monte Carlo localization generates those sample sets, but then also maintains them and updates them based on the motion and the observations we obtain. Just as an example of how that can look: what you see here is a belief where the particles are spread all over the place; this is a uniform distribution, randomly sampled over the space the system can be in. Then, as the robot drives around, obtains observations, and executes motion commands, the belief starts to focus. The particle cloud evolves, some particles die out when they cannot be explained well by the observations, and the probability mass concentrates in several places. Because the environment is, at least to some degree, symmetric, multiple hypotheses may remain, which again is something you cannot represent well with a Gaussian distribution. At some point in time your system will converge, and you actually end up with a belief that is similar to a Gaussian distribution. Just to come back and make this point very clear: the particle filter takes each sample as a pose hypothesis, and every hypothesis says "this is where the platform is". Each has a weight, a kind of measure of fitness, of how much we can trust that particle, which tells us how likely that hypothesis is. By not having just one hypothesis but a large number of samples, one thousand, ten thousand, a hundred thousand, we actually obtain a belief: a set of possibilities of where we are, and this set of possibilities describes my probability distribution. Every particle by itself, however, says "I did the right job, I know where I am": it has its state x and a certain weight associated with it. In the view of each individual particle, it did the right job and everyone else was wrong, and only by using a lot of those hypotheses do we actually get our belief. That is an important thing to keep in mind when you think about the particle filter. So in Monte Carlo localization, every particle represents a pose hypothesis. If you live in a 2D world, this is an x coordinate, a y coordinate, and an orientation angle theta representing the yaw of the platform. In the 3D world it could be x, y, z plus yaw, pitch, and roll. Or maybe I have a car driving in an environment, and then perhaps just the x, y, z location and the yaw orientation, because the roll and pitch angles may always be very small and I may not want to estimate them in the particle filter. The dimensions I need to estimate define the state vector of every particle, and thus the dimensionality of each sample. In Monte Carlo localization we then use our motion model as our prediction step: we take where we have been at the previous point in time, which is information coming from the recursive belief, use our odometry or motion command, and estimate where we are now; this is what I use to generate new samples. So if I am a sample in the sample set of time t minus 1 and the motion command says "you have driven a meter forward", I take my position and move it a meter forward, or at least approximately a meter forward. Why approximately? Because I am sampling from this
distribution and if my motion has noise associated to that which is always the case in reality i will not maybe move the particle one meter forward but maybe 99 centimeters or meter two or a meter three so for every sample we draw from this distribution we don't take the maximum value we draw from this distribution so that some samples move forward one meter some sample moves forward 80 centimeters some temple moves forward one meter 20 and so on and so forth so we are generating a new set where we are increasing the uncertainty we are spreading out the particles in the world about the predicted belief because we add uncertainty to the world if we only move through the environment without observing it and the second step and this is a result from this choice so this is a user defined choice that we do monte carlo localization and as a result from this if you sit down with the mass right on the target distribution write down the proposal do some math you will get a term which is proportional to my observation model so what's the likelihood of obtaining an observation given we know where we are and given we have the map of the environment and we only need that proportionally because later on we anyway we'll normalize our weights to one and therefore this i only need to know it up to a normalization constant and this results from this design choice if you were to choose a different model over here then would end up with a different weight term over here but if we choose the motion model of the proposal we will end up with the observation model as our as our weight for every sample and again so what we're doing for every sample we generate this pose we generate a new hypothesis by moving it forward and then in the second step in the waiting step we are evaluating how likely it is to obtain the observation that we actually got assuming we have been in that place of that particle and here some beauty comes into the game a particle is a single post hypothesis without 
uncertainty, so I can really evaluate my model, saying: given that I am exactly here, how likely is it to obtain my observation? If you think back, this was exactly how we computed our observation model in one of the previous lectures. By having this given state, this given x_t, we can just use the particle position itself, because the particle believes it did the right job. So we can very easily compute this observation model: we assume we know where we are, we know what the world looks like, and then we get exactly the likelihood of obtaining the observation that we had, as we have explained it in the previous lectures.

So, turning the particle filter algorithm into a particle filter for localization, into MCL, the only things we need to change are those two lines: we define our proposal distribution as the motion model, and this then results in the observation model being used in our weighting step. What we also need to take into account, although it is not really written in here, is that the dimensionality of this x_t, the state of the sample itself, is the x, y, yaw or the x, y, z, yaw, pitch, roll pose of my platform in the environment. And that's it; the rest stays exactly the same, the standard out-of-the-box particle filter algorithm. I only plugged in my motion model and my observation model.

As an illustration of how this looks, consider our MCL example over here: a robot somewhere in the world, currently standing in front of a door, and this is our uniform distribution, the random sample set that we have initialized in order to represent our state in this one-dimensional world. If we then perform the correction step, this is our observation model: we get a peak in front of every door, because the sensor said, robot, you're in front of a door. Again, there's a certain probability that this may be wrong, therefore we have non-zero probabilities elsewhere, but it's a higher probability that we're in front of a door. The weighting step will then increase the weight of those samples which are at those locations, so these samples get very high weights and the other samples get low weights. This is my new weighted sample set representation.

I'm just taking this over here, copying it up, and doing the next step; this is just copy-paste from the previous slide. The next thing I do is my resampling step, which means I'm replacing weights by frequencies. Samples with high weights get duplicated, but note that if you just copy samples and increase the frequency at the same location, you won't actually see it in the plot. So what this plot shows is the resampling step and the motion prediction step combined: by moving the particles forward and spreading them due to the sampling process, you see the density increasing here in this illustration. These are two steps done in one, but that's something you often see in particle filter illustrations, because otherwise you wouldn't really see the effect of the resampling step, where the samples get condensed in certain locations.

Then again we repeat the process: copy this part over here and integrate the next observation. The robot is again in front of a door and sees a door, so we have this observation function. Now there's a smaller number of samples over here, just those few which survived, which get a weight over here; but the majority of samples actually sits in here, and so they get very high weights, and we have again a few sitting over here. In the next resampling step those will be replicated dramatically more often than all the other samples, a few of those as well, but most of the others are likely to die out. And then, integrating the prediction step of the next motion, the majority of samples will be distributed over here, representing the position of the platform.

I also have a small animation, done by Dieter Fox, of a mobile robot navigating in an environment; this is actually a technique that was developed a bit more than 20 years ago, with substantial contributions from the University of Bonn. What you see here is a floor plan of an old building in Bonn, which has probably been taken down or will be taken down soon: the old computer science building, before they moved here to campus. This was a corridor environment, and this was the actual trajectory that the robot took; you will see in the animation in a few seconds how the robot moved through the environment. The important thing here is that this is a symmetric environment: except for a few things located in the rooms, you can flip it around and the corridor would look exactly the same. So as the robot moves through the corridor, you will see multiple hypotheses, then two hypotheses survive, and only by moving into a room is the robot able to identify where it actually is.

So let's see how it looks. The dot here shows the current best estimate of where the system is; these are the observations, in this case sonar observations, and you will see this position jumping around a little bit in the beginning, because the system doesn't know where the robot actually is: it has a multimodal belief, and the estimate jumps around until it converges. What you see here is that as the system moves, a lot of probability mass concentrates in the corridor, and if I pause the video here, you can see that a lot of the
probability mass has accumulated within the corridor through this repeated sampling, weighting, and resampling. A few samples have survived over here, but there are nearly no samples in the other rooms. You can also see that there are two prominent clouds; these are two explanations of where the system could be, and they represent the symmetry of the environment. If I let the animation continue, you see some samples survive over here, which is the most likely pose, but also a significant number over here, and only by moving into the room, because those two rooms look different, is the system able to identify that it is actually here and not located over there. So this cloud will die out, and this is the cloud which survives. I can show that video again in full screen mode, where you can see how the system estimates the position of the platform recursively, taking into account motion, observation, and resampling: moving through the corridor, turning around, the two hypotheses surviving before the system enters one of the rooms, and then identifying where the system is located and which hypothesis is probably wrong, so that it converges to a unimodal belief. But in order to end up there, we need the ability to represent multimodal distributions.

Last but not least, I want to have a look into the resampling step. Resampling was this third step which says: replace unlikely samples with more likely ones, because we don't want to spend too much memory on areas that don't matter. Resampling is basically drawing samples proportionally to their weight, with replacement, doing this J times. You can see this as a mathematically motivated survival-of-the-fittest approach, where the bad samples die out and the good samples get replicated. And again, this is just a trick to get the samples to move into the more likely areas of the state space without changing the belief, or only doing minor changes to the belief; it is needed because we only have a finite number of samples and cannot represent an arbitrary number of samples.

So how does this resampling work? From the way it is described, draw samples with replacement, it sounds very easy, and it's something which is often called roulette wheel sampling. Why? Because you can envision a roulette wheel where the buckets of the wheel represent the weights of the particles: the larger a bucket is, the more likely that sample is to be chosen. If you want to draw a sample, you take your ball, put it into your roulette wheel, and see in which bucket the ball ends up; if a bucket is larger, the ball is more likely to end up there. How can you do this? You have your weights, you know they are normalized to one, you draw a random number between 0 and 1, and then determine the bucket where you end up. So you draw a number, let's say approximately 0.22, and you end up in the bucket of particle number three; you repeat the process with a number drawn close to one and end up drawing the j-th particle, for example. You repeat this over and over again, drawing those samples, and in this way you generate the new sample set, drawing with replacement; all the samples that you have drawn, you copy: the state of particle number three, the state of particle number j, the state of this particle, for example. This is what we call roulette wheel sampling, the simplest form of resampling. You can actually do this in an efficient manner: you have to do it J times, because you're sampling J times, and you need to find which bucket the ball ends up in, and that's something you can realize with a binary search, because you know that the sum of the weights is one, and you're basically looking for the bucket you end up in if you accumulate those buckets into cumulative weights. So you just need to do a binary search in an array, which you can do in log J steps because your array has J entries, so that we end up with a computational complexity of O(J log J). This is standard roulette wheel sampling.

There is also an alternative way to do this, and, as we'll see in a second, a better way: the so-called low variance resampling, also called stochastic universal resampling. The idea here is a modified roulette wheel. We still have our roulette wheel, but instead of a ball we see it as an arrow which we are turning, and wherever the arrow points in the end, that's the bucket that is chosen. Except that instead of the one arrow we have been using before, we now use J arrows, distributed with an equal spacing in the angular space: if I have eight arrows, between all arrows there is a difference of 45 degrees, for example. I'm just turning those arrows once, and wherever all the arrows end up, those are the samples that I'm choosing. So I basically draw one random number, and not J random numbers, and all the other samples are then deterministically determined; one such draw gives me a full sample set. This is what is called stochastic universal resampling or low variance resampling, and it actually has two advantages. The first advantage is that it is faster: you can do it in linear time, because you only draw one random number, which determines where the first arrow points to, and then, while walking through your array, you determine where the other arrows end up, because you know those arrows have an equal spacing, and you don't
need the binary search: it's just one pass over an array, which you can do in linear time, because you're doing all the draws at the same time while moving through the array. The pass through the array is a bit more expensive, but you only do it once and not J times, so this is a lower computational complexity. This is nice, but it is typically not the reason why stochastic universal resampling is superior to standard resampling; that's a different reason. Why standard roulette wheel resampling can be suboptimal is the following. Consider you're doing sampling with replacement: you sample, you put your sample back, and you do that again, and this can lead to suboptimal results. The idea behind resampling was to eliminate samples which sit in low-probability areas of the state space and replace them by samples which have a high weight, which are more likely. Now ask yourself the question: what actually happens if all samples have exactly the same weight? Let's say our sensor is really, really bad and doesn't provide us any information, so all samples will have the same weight. What happens? The observation doesn't tell me anything, so all samples get the same weight, and then the resampling step kicks in: it draws random samples and duplicates those it has drawn multiple times, while other samples are not drawn and die. So state hypotheses will die out and others will be duplicated or even exist multiple times; in the next time step, some samples have been replicated and others have died, although no sample was better than any other. And if we repeat this process over and over again, because our sensor has been blinded, for example, our camera is dirty, someone is standing in front of our laser scanner, whatever it is, we don't get good observations, then this resampling step will even amplify the effect: a sample that was duplicated is now drawn with increased probability, so we end up with four or eight copies of that sample while other samples die out. If all particles have the same weight, there's actually no reason to choose one particle over another; it doesn't make sense. In this situation, stochastic universal resampling, or low variance resampling, will actually reproduce the same sample set over and over again. So if our weights are not very informative about which samples should be replicated and which should be eliminated, standard roulette wheel resampling will hurt our belief while stochastic universal resampling will not, and therefore we will always choose stochastic universal resampling: it's faster and it is better for us.

How would we implement that? We can actually implement this quite efficiently. Consider our particle set laid out as the unrolled roulette wheel, turning the wheel into a straight line. I draw the first random number, and then I advance to the next sample in steps of one divided by J, picking the buckets where the arrows point to. So the first thing I do is pick a random number between zero and one divided by J, and then I pick J minus one more particles in 1/J steps, walking through my array. I can implement this efficiently by building an array which has J elements and always stores the cumulative sum of the weights: the first bucket has the weight of particle number one, the second bucket has the sum of the weights of particles one and two, the third bucket has the sum of particles one, two, three, and so on; the last bucket gives me one, because my sample set is normalized to one. I then take my random number, which tells me which bucket I start with, and compare: is this random number smaller than w1? If it is smaller, I know the arrow sits in bucket number one, so I take bucket number one. If it is larger than w1, that means I'm actually sitting in the next bucket: the first particle had a very small weight, so I should choose particle number two as the first sample. Then I repeat this process, advancing in steps of 1/J, and in order to determine where I end up, I have a for loop, because I'm doing the whole thing J times, and an indicator telling me where the current arrow points to, which is my initial random number u plus the index of the sample times 1/J, which is the spacing. I keep moving further, checking which bucket I'm in, while my u is larger than the cumulative sum in this array; once it is smaller, I pick the particle, and I repeat this step, so that exactly J particles will be picked. In this way I can do sampling with replacement very efficiently.

We can also write this down as a more formal algorithm. I start with a random number between 0 and 1/J; I have my cumulative array, which I don't even need to compute explicitly, I just need to maintain the variable of the current bucket I'm in; I iterate J steps, where u is the indicator of which sample I'm choosing; and while u is larger than the current position in my cumulative array, I jump to the next bucket, i.e. the next value of the cumulative array, and repeat this until it is not the case anymore; then I pick my sample and continue. In this way I'm drawing J samples in a stochastic universal resampling fashion.

So low variance resampling, or stochastic universal resampling, is an alternative which keeps the sample set the same in case of identical particle weights. Of course we typically don't have identical weights, but we may have weights which don't differ much, and even then the variation that this low variance resampling introduces is minimized. In addition, it is fast to compute, so there's really no reason for using roulette wheel sampling: always use stochastic universal resampling, that's the take-home message here.

Now we have actually explained all the parts of the particle filter: you know how the particle filter looks, you know how resampling looks, you know how to draw samples from simple distributions such as a Gaussian. The only things you need to put in as the user are your motion model and your observation model, and we have defined a few in the previous lectures, so you can use them, for example, for a wheeled mobile robot equipped with a laser scanner.

I want to show a few examples of MCL in action over here, and I want to start with a slide showing the first robots that actually used Monte Carlo localization. This was the robot Rhino, developed here in Bonn; Minerva, a very similar platform, later also used as a museum tour guide; and then later the robot Albert in Freiburg. These have been the three robots which made Monte Carlo localization popular; the initial developments happened here in Bonn and at Carnegie Mellon University and later in Freiburg, turning particle-filter-based localization into the gold standard of localization, at least for indoor localization of mobile robots today, especially if
you need to maintain multimodal beliefs; then this is really the standard approach. I want to show some old slides of how the robot Minerva, which was this one over here, drove for a period of, I think, six weeks in the late 90s through the Smithsonian Museum of American History, acting as a tour guide, giving tours to visitors and driving around. A similar event happened in parallel in the Deutsches Museum in Munich as well as in Bonn, where at that time Rhino was driving around, performing localization tasks and interacting with visitors. This is a floor plan of the section of the Smithsonian Museum of American History where the robot Minerva was installed, and you can see our sample set distributed over the space. The robot was equipped with two laser rangefinders, one looking to the front, one looking to the back, illustrated here by those blue dashed lines. So this was the initial sample set; a first observation was obtained, and the marker always shows you where the system thinks it is right now, which is not the right place: in reality the system is over here. We integrate the observation, perform a resampling step, integrate a motion command, and in the next step the particle set looks like this: we have a few positions over here where the system could be, and we can see denser particle clouds also down here and down here, two areas which share some similarities and could explain the observation better than others. Then we again put in our observation, and, which is probably a bit hard to see in the video but you will see it better in the slides, some of those samples are in dark red and the others in bright red: the darker the color, the higher the weight of that sample. We can already see that the probability mass has converged quite well to the right location. We integrate the observation, are reweighting our samples, and then perform the resampling step; you can see between this step and this step, weight update and resampling, that some of the samples are dying out in the unlikely regions of the state space. Then the next motion is incorporated; the motion of that platform was very noisy, and you can see the particle cloud spread out quite a lot. We get the new observation in, the measurement is compared against the sample set, and you can see here in dark red that the weight update is performed: the samples in the center get a higher weight than those outside. The resampling is performed, the unlikely parts of the state space, in the outer regions as well as in other areas, die out, and then the motion update is performed, we get a new observation, we perform the weight update and the resampling, and in this way the system can move through the environment and estimate and track its pose over time. This was one of the first animations of Monte Carlo localization, which has been successfully used for extended periods of time in populated scenes.

There are two other examples I want to show where the system has been used. One is in a production hall at KUKA, the German robot manufacturer: a robot driving around, performing logistics tasks, equipped with laser scanners, localizing itself in a map while doing its work. And you can do this not only indoors but also outdoors: this is an outdoor traversal of a robot navigating on a combined graph- and grid-based map, driving around here, and the red dots you see here are not the samples; this is the overlay of the current laser scan at the position of the best particle. The particle cloud is actually so small that it is hard to see, because when you optimize your models you get really precise motion estimates and can track your position really accurately over time and come up with good estimates, and use this to steer your robot even in outdoor, dynamic environments. So it's a very versatile technique for performing robot localization.

The particle filter, however, also has some disadvantages; not everything is great. One of the problematic things is high-dimensional state spaces. Why is this the case? Because we need to cover our state space sufficiently well with samples in order to represent the probability distribution. Especially if we have, say, 10-dimensional or 20-dimensional state spaces, the particle filter typically will not work well. The particle filter works well in low-dimensional spaces, or if the uncertainty is not too large, so that we can represent the space of possibilities in an efficient manner. In fact, the number of samples that you need grows exponentially with the dimensionality of your state space: in two dimensions, no problem at all; three-dimensional state spaces already get more tricky but are still fine; four may still be okay; but higher than four, you need substantial computational resources in order to estimate the position well, or you need to apply tricks to restrict the size of your particle cloud so that you are not required to use hundreds of thousands or millions of particles. It always depends on how many assumptions you can make about your environment and how many things you can simplify, but in areas where you have high-dimensional state spaces or regions of very high uncertainty, the particle filter may be problematic, and that's something you need to take into account. If you have a too small number of samples, you may suffer from the so-called particle depletion problem: particles die out and leave the likely areas of the state space, because your model is, for example, too focused in comparison to the number of samples that you're using, and so your filter will diverge at some point in time, so
these are some of the problems of the particle filter. It also has advantages, though. It's one of the few approaches which can handle non-Gaussian distributions very well. Other non-parametric approaches can of course also handle non-Gaussian distributions, but they suffer from other problems, such as memory requirements that grow exponentially with the dimensionality of your state space, which can be tricky. So the good thing is that it really works well in low-dimensional spaces. It can also be used very elegantly to handle data association ambiguities, because you can make individual data associations for every particle: since a particle has no uncertainty associated with it, you can very easily make particle-dependent decisions, see which particle survives and does the best estimation job, and in this way incorporate additional knowledge very easily into those particles. You can also easily integrate multiple sensing modalities, by just multiplying the weight with the weight computed from another sensor. It's also comparably robust: even if you make some mistakes and your models are not perfect, your particle filter will still compute quite good beliefs and perform quite robustly in suboptimal setups. And it's furthermore very easy to implement: you suffer from a very small number of problems, especially compared to the numerical issues that can make a Kalman filter more tricky; the particle filter has fewer issues that you typically need to deal with.

We should also say that there are several variants of the particle filter that eliminate some of the drawbacks. For example, there are variants for real-time estimation, like real-time particle filters, that can handle situations where sensor data comes in at different frame rates, which needs to be appropriately taken into account; delayed state filters, for when some of the streams come in very late and need to be re-synchronized; or Rao-Blackwellized particle filters, for when you're working in high-dimensional state spaces and can split up the state into a part that you estimate with samples and a part that you then compute analytically, conditioned on those sampled beliefs, so that you can do things in a more efficient way. Rao-Blackwellized particle filters have, for example, been used for addressing the simultaneous localization and mapping problem, where you need to localize a robot and build a map at the same time, and have led to very successful mapping algorithms there, although the dimensionality of the overall problem goes into the millions; there's only a small number of dimensions that you need to represent with samples, and you can compute the rest analytically.

In sum, particle filters are a recursive, non-parametric implementation of the Bayes filter that can deal with non-parametric distributions. As you have also seen, we haven't made any assumptions about the linearity of the models: motion models can be non-linear as long as I can sample from them, and that matters because I can then represent, for example, my banana-shaped distributions very well. What we are doing is representing our belief by a set of samples; every sample says, I did the right job, everyone else was wrong, and that's where I am, or that's what the state looks like, and through a collection of those samples we represent probability density functions and in this way can represent beliefs, for example, about the position of the robot. We typically draw samples from a proposal distribution that takes the belief at the current point in time and advances it to the belief at the next point in time, and in order to account for the fact that the proposal distribution is not the target that I want to estimate, I need to take the difference into account through a weight, which is given by the target divided by the proposal distribution.

So if you implement your own particle filter and use it for a real-world system, you need to make sure you have appropriate motion and sensor models. Of course, if you screw up your motion and sensor model, which are the user-defined choices in here, then your system will not do a good job in estimating your state; but if you do that well, the particle filter is a very robust technique. We have used it here to illustrate Monte Carlo localization, which is the application of the particle filter to localizing a robot: every particle represents a possible location and orientation of our platform, for example x, y, and yaw, and then we use our motion model as the proposal distribution for advancing the particles and, as a result of this, the observation model for computing the weights. This is what is called Monte Carlo localization, and it is today one of the standard choices for performing mobile robot localization, especially if you're indoors, if you have to deal with multimodal beliefs and cannot rely on external sources which restrict your estimate to a small number of hypotheses, or even a single hypothesis, so that you could use a Kalman filter, for example, as the estimation technique. So MCL is one of the standard algorithms, if not the standard choice, and it has stood the test of time: within the last 20 years, MCL has proven to be a very robust and efficient localization technique for mobile robots.

If you want to dive a little bit deeper or look into the literature, I recommend the Probabilistic Robotics book: chapter 4.3 tackles the particle filter, and chapter 8.3 addresses Monte Carlo localization as the realization of the Bayes filter using a particle filter. For that, what you need to know is your motion and your observation model, so you need to rehearse the chapters on those models if you forgot about them, because you will need a way to describe the motion in the probabilistic sense and sample from that model, and you will need to be able to evaluate your observation model pointwise: estimate, for a certain location, the likelihood of obtaining the observation that I got, given that I'm at that place, which is what the observation model does. If you put all that together and apply the standard Monte Carlo localization algorithm, then you can estimate and track the position of your platform over time.

That's it from my side. Thank you very much for your attention, and I hope you learned something about how to estimate where your system is in the world, even under multimodal beliefs. Thank you very much for your attention.
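The prediction and weighting steps described in the lecture can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's exact models: it assumes a simplified 1D state (position along a corridor), a Gaussian odometry noise of my own choosing, and a user-supplied observation likelihood function `obs_model`, all of which are illustrative assumptions.

```python
import random

def mcl_update(particles, odometry, observation, obs_model, motion_noise=0.05):
    """One MCL iteration on a 1D state: predict with the motion model,
    then weight each particle with the observation model."""
    # prediction step: sample each particle's new pose from the motion model,
    # i.e. move it by the odometry reading plus sampled noise (so some particles
    # move a bit more, some a bit less than commanded)
    predicted = [x + random.gauss(odometry, motion_noise) for x in particles]
    # correction step: the weight is proportional to the observation
    # likelihood p(z | x, m), evaluated at the particle's exact pose
    weights = [obs_model(observation, x) for x in predicted]
    total = sum(weights)
    # normalize the weights to one (the proportionality constant drops out)
    return predicted, [w / total for w in weights]
```

A full MCL loop would alternate this update with a resampling step, as discussed in the lecture.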
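The roulette wheel resampling with a binary search over cumulative weights, as described in the lecture, could be sketched like this; a sketch assuming normalized (or at least non-negative) weights, with the function name being my own:

```python
import bisect
import random
from itertools import accumulate

def roulette_wheel_resampling(particles, weights):
    """Draw len(particles) samples with replacement, each chosen with
    probability proportional to its weight: O(J log J) overall."""
    cumulative = list(accumulate(weights))             # bucket boundaries of the wheel
    new_particles = []
    for _ in range(len(particles)):
        ball = random.uniform(0.0, cumulative[-1])     # throw the ball onto the wheel
        bucket = bisect.bisect_left(cumulative, ball)  # binary search: O(log J)
        new_particles.append(particles[bucket])
    return new_particles
```

Each of the J draws costs one random number plus one O(log J) search, which is where the O(J log J) complexity mentioned in the lecture comes from.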
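The low variance (stochastic universal) resampling algorithm walked through in the lecture, one random number and J equally spaced arrows, could be sketched as follows; again a sketch assuming normalized weights:

```python
import random

def low_variance_resampling(particles, weights):
    """Stochastic universal resampling: one random number, J equally
    spaced arrows, a single linear pass over the weights: O(J)."""
    J = len(particles)
    r = random.uniform(0.0, 1.0 / J)  # position of the first arrow
    c = weights[0]                    # running cumulative sum (right edge of current bucket)
    i = 0
    new_particles = []
    for j in range(J):
        u = r + j / J                 # the j-th arrow, spaced 1/J apart
        while u > c:                  # advance to the bucket this arrow points into
            i += 1
            c += weights[i]
        new_particles.append(particles[i])
    return new_particles
```

Note the property emphasized in the lecture: if all weights are identical, each arrow lands in a different bucket and the sample set is reproduced exactly, so an uninformative observation does not distort the belief.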
Info
Channel: Cyrill Stachniss
Views: 9,494
Keywords: robotics, photogrammetry, Monte Carlo, localization
Id: MsYlueVDLI0
Length: 65min 33sec (3933 seconds)
Published: Tue Sep 22 2020