Kalman Filter & EKF (Cyrill Stachniss)

Good morning. Today we would like to dive into the Kalman filter and the extended Kalman filter as a concrete realization of the recursive Bayes filter, and we will do so in a hopefully intuitive way. We are not going through the whole derivation; instead, I want to give you an idea of what is behind the Kalman filter, which key steps lead to it, and why it has the form that it has, so that you understand its properties and how you can perform state estimation with it.

The Kalman filter is a recursive Bayes filter, and it is a special one because it requires the world to be Gaussian: every involved probability distribution is supposed to be a Gaussian distribution, and all models are assumed to be linear. For this linear Gaussian case it is actually the optimal filter, the best approach you can take. In this case it is also equivalent to a least squares approach: as long as everything is Gaussian and all models are linear, least squares and the Kalman filter provide the same result; only the way the result is computed differs.

The Kalman filter follows the idea of the recursive Bayes filter and performs recursive state estimation: we take our current belief together with the information from our control commands and observations and turn the previous belief into the new belief, turning the belief at time t-1 into the belief at time t. We always update the belief taking the most recent observation and control command into account, rather than doing everything as a batch operation. Taking the control commands into account is the so-called prediction step: we predict what is going to happen given the control information we have. The second step of the Bayes filter is the correction step, which takes the observation into account in order to correct for potential mistakes made in the prediction. So we predict, then correct, predict again, then correct again; that is the key idea behind the Kalman filter.

Let us start with a small Kalman filter example, just as an illustration. Consider you are on a boat driving near the coast, and you have a rough idea where you are, indicated here by a black circle. You try to keep the ship heading in a certain direction for a certain amount of time, so you are moving forward, and you form a new estimate: "I am probably here, where that cross is." This is what the prediction step of the Kalman filter corresponds to: you perform your action and predict where your boat ends up. Then you look to the coastline to see if you find a lighthouse, and let's say you are in the lucky situation that you see one and get a lighthouse observation. This allows you to determine where you are, at least to some degree: a single lighthouse observation may not be sufficient, but it constrains your position, and multiple lighthouse observations will tell you more about your pose. Given this observation, we can perform the correction based on the prediction.
So we predicted where we are, and the lighthouse observation, illustrated here by a dashed line, tells us that we are somewhere on that line. The new state estimate is then a combination of the observation-based estimate, shown in green, and the predicted one, shown in black, which gives the joint estimate shown in red. This belief combines the prediction and the correction. To sum up: we were somewhere over here, we made a prediction of where the ship went without any observation, then we got an observation of the lighthouse telling us we are at the green cross, and the Kalman filter basically computes a weighted sum of where the prediction says you went and what your observation tells you; that weighted sum is the belief shown in red. The weighting takes into account how certain you are about your prediction and how certain you are about your observation, trading off these uncertainties, so the filter can trust the observation more or trust the prediction more, depending on the current estimates. That is a very simple illustration of the Kalman filter.

Now let us have a quick look at the Bayes filter and see how the Kalman filter and the Bayes filter relate. The Bayes filter was the tool we derived for recursive state estimation, with a prediction step and a correction step. In the prediction step we start with our previous belief and integrate over all states (positions or poses) where the system possibly was at the previous point in time. We multiply this belief with the so-called motion model or control model: where am I at time t, given that I know where I was at time t-1 and which motion command has been executed, and we integrate over all those previous states. Mathematically, this is a convolution of two probability distributions. In the correction step I take my predicted belief and my observation model, together with a normalization constant, and basically multiply those two probability distributions, the one from the prediction and the observation model, so that I actually correct the prediction.
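Written out, the two steps of the recursive Bayes filter described here read:

$$\overline{\mathrm{bel}}(x_t) = \int p(x_t \mid u_t, x_{t-1})\,\mathrm{bel}(x_{t-1})\,\mathrm{d}x_{t-1} \qquad \text{(prediction)}$$

$$\mathrm{bel}(x_t) = \eta\, p(z_t \mid x_t)\,\overline{\mathrm{bel}}(x_t) \qquad \text{(correction)}$$

with $\eta$ being the normalization constant.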
That is the general framework of the recursive Bayes filter, and the Kalman filter follows exactly those steps. The Kalman filter, as I said, is a realization of the Bayes filter: it assumes that everything is Gaussian and that we only have linear models, and in this setting it is the optimal estimator. Optimal means statistically optimal: you cannot do better than the Kalman filter. As long as all your models are linear and your probability distributions are Gaussian, there is no need to investigate anything other than the Kalman filter (or a least squares approach, which is equivalent in this case), because you basically cannot improve on its estimation result. Again, the key assumptions are that everything is Gaussian and all involved models are linear.

What I want to do now is dive a bit more into the details of how the Kalman filter is set up and what we would need to derive the Kalman filter equations. We are not going to do the derivations here; that is something you can do on your own. I am trying to give you an idea of what happens, which tools you may need, and some properties of the Kalman filter, and then discuss what we can do with it and what is important to consider, giving you an intuitive, visual explanation of what happens in the Kalman filter.

So the question is: if everything is Gaussian and we start with a Gaussian belief, how do we update this Gaussian belief based on our motions and our observations? That is the step the Kalman filter executes. You can do this derivation from a probabilistic point of view, but you can also do it from a linear algebra point of view. The two key operations typically needed in most derivations are marginalization and conditioning of Gaussian distributions. As a short reminder: if you have a probability distribution over two variables or two subsets of variables, x_a and x_b, and this distribution is Gaussian, then the marginal distributions are also Gaussians. If you, for example, integrate away x_b so that only p(x_a) remains, this will be a Gaussian distribution, and the same holds the other way around for p(x_b) when marginalizing out x_a. The same is true for the conditional distributions: p(x_a given x_b), the distribution of x_a assuming x_b is known, is again a normal distribution, and likewise p(x_b given x_a). This is a key property that we exploit when deriving the Kalman filter: we mainly rely on computing marginals, conditionals and convolutions, and the important thing is that all those distributions stay Gaussian, so the assumption that everything is Gaussian is never violated; the next step of the Kalman filter again requires its input distributions to be Gaussian so that all operations hold. We are not diving into the details of how the marginal and the conditional look; the marginal is computed very easily, the conditional is slightly more complicated and typically requires the Schur complement. But these are the tools you need if you want to perform the derivation of the Kalman filter from scratch.
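For reference, this is the standard result for a partitioned Gaussian (the conditional covariance is the Schur complement just mentioned):

$$p(x_a, x_b) = \mathcal{N}\!\left(\begin{pmatrix}\mu_a\\ \mu_b\end{pmatrix},\; \begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{pmatrix}\right)$$

$$p(x_a) = \mathcal{N}(\mu_a,\, \Sigma_{aa})$$

$$p(x_a \mid x_b) = \mathcal{N}\!\left(\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right)$$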
The next thing we look into are our linear models, for the motion as well as for the observation. What does a linear model mean? It means that both models can be expressed through a linear function: we have a (possibly multi-dimensional) variable, a matrix A multiplying it, and some constant term. The key property is that if we take such a linear function and transform a Gaussian through it, it stays a Gaussian distribution. Again, this is important: if we want to take our current belief, which is Gaussian, and combine it with a Gaussian model such as an observation or a prediction, we need to make sure that the result we get out is also a Gaussian distribution. Such a function may add additional uncertainty to the process; think of motion models: I start with a Gaussian belief and perform my motion, which is likely to increase the uncertainty I have, but through the linear model and the Gaussian noise associated with it, I can ensure that everything stays Gaussian. These are the models the Kalman filter exploits.

So in the Kalman filter we assume zero-mean Gaussian noise and linear models. For the prediction step, i.e. for the control commands, we say the current state x_t equals a matrix A_t times the previous state, plus a matrix B_t multiplied with the control command, which accounts for the change of the state given the control. The first part says how the state changes without doing anything; the second is the additive term that is added to the state based on the control command; and then we have some Gaussian error. Similarly for the observation: we compute a predicted observation based on the state. We have a matrix C_t which maps from the space of states into the space of observations; multiplying C_t with the current state x_t tells us what we should observe, which is what we often call the predicted observation, again together with some Gaussian noise term. These will be the linear models involved in the Kalman filter.

Let us dive a little into the individual components. We start with the matrix A_t: it is an n-by-n matrix, where n is the dimensionality of the state vector, and it tells us how the state evolves from time t-1 to time t if we do not apply any controls and have no noise or errors. In the most trivial case A_t is an identity matrix, because if I do nothing my state should not change; the belief about the state at time t is then the previous state x_{t-1} plus the change that comes from the control command. But there may be reasons why A_t is not the identity, i.e. why the state evolves even if I do nothing. Think of a vehicle: even if you are not pushing the gas pedal, you may still have some remaining constant velocity, which changes the position of your vehicle over time; this information can then be encoded in A_t. Another example is external factors, such as a UAV in blowing wind: even though you do not control the wind itself, it is an external factor that could be integrated into A_t, so that the wind generates some drift for your UAV.

The second important part is the matrix B_t, an n-by-l matrix, where n is the dimensionality of the state space and l is the dimensionality of the control command u_t.
It basically tells you how u_t changes the state: if I take an odometry command, or some velocity command that I give to my system, how do those velocities change the state of my system? It is a mapping from the space of control commands into the state space, and it enters as an additive term: something we add to how the previous state evolves, telling us how much every individual dimension of the state vector increases or decreases based on the control command. This is what we need for the prediction step.

For the correction step we then need the matrix C_t, which is a k-by-n matrix, where k is the dimensionality of the observation and n the dimensionality of the state space. It tells me how to map the current state into an observation: what do I expect to observe, given that I am in state x_t? So it maps from the n-dimensional state space into the space of observations, which is here referred to as z_t.

For both models we additionally have random variables representing zero-mean Gaussian noise, independent of each other, typically expressed with two covariance matrices R_t and Q_t. It depends a little on the community you are coming from: sometimes R_t is the noise of the control commands and Q_t that of the observations, sometimes the other way around. I am using R_t here as the noise for the motion and Q_t as the noise for the observations, in line with the Probabilistic Robotics book; a large body of literature outside that community actually uses it the other way around, so be a bit careful if you compare this with other material and make sure you are using the right uncertainties.
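In formulas, the two linear models with their zero-mean Gaussian noise terms read:

$$x_t = A_t\, x_{t-1} + B_t\, u_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, R_t)$$

$$z_t = C_t\, x_t + \delta_t, \qquad \delta_t \sim \mathcal{N}(0, Q_t)$$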
Now let us look at what a linear model means for the probability distributions we need to estimate. What does it mean to have a linear motion model with Gaussian noise? If we write down p(x_t | u_t, x_{t-1}), this means: I know where I have been before and I know the control command I am executing; what is the probability distribution over where I end up? This must be a Gaussian distribution, because everything in the Kalman filter is Gaussian. How does it look? This is where our linear model comes into the game, through the matrices A_t and B_t: A_t x_{t-1} + B_t u_t gives me the new prediction of where I am going to be, and this specifies the mean of the Gaussian. I then look at how far x_t is away from that mean and weight this by the uncertainty of the motion, in this case the covariance matrix R_t. So we can write down a standard Gaussian density: a normalization factor in front, and in the exponent exactly the linear motion model, namely x_t minus the mean (the state predicted from the previous state and the control command), multiplied with the inverse of the covariance matrix R_t, which encodes how much noise the motion adds. This is my probability distribution, which is Gaussian and incorporates the linear model, telling me how the system advances from t-1 to t.

We can do the same exercise for the observations. Again we assume a linear model and Gaussian noise, so we compute our predicted observation and then see how far our real observation is away from it. The predicted observation is computed from the current state x_t by multiplying the matrix C_t with it: C_t x_t gives me the mean observation, what I am expecting to observe. I then take the difference between what I expect to observe and what I observe in reality, and I have a Gaussian distribution over that difference with the uncertainty of my observation. Very similar to before: a constant factor in front, which now uses the uncertainty of the observation, and in the exponent the difference between the actual observation and the expected observation C_t x_t, weighted by the inverse of the observation covariance. This is again my Gaussian distribution, the model for my observation, taking into account that everything is linear.
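Written out (as in Probabilistic Robotics), the two linear Gaussian models are:

$$p(x_t \mid u_t, x_{t-1}) = \det(2\pi R_t)^{-\frac{1}{2}} \exp\!\left(-\tfrac{1}{2}\,(x_t - A_t x_{t-1} - B_t u_t)^{\top} R_t^{-1}\, (x_t - A_t x_{t-1} - B_t u_t)\right)$$

$$p(z_t \mid x_t) = \det(2\pi Q_t)^{-\frac{1}{2}} \exp\!\left(-\tfrac{1}{2}\,(z_t - C_t x_t)^{\top} Q_t^{-1}\, (z_t - C_t x_t)\right)$$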
So we have shown that the motion and observation models are linear Gaussian models: the previous state (or the current state, respectively) enters in a linear fashion, and everything stays or is Gaussian in these models. The next thing we need to show is that everything also stays Gaussian when we compute our updates, and for that we go back to the recursive Bayes filter equations. One of the equations tells me that the current belief is the predicted belief times my observation model, together with a normalization constant making sure that the integral over all possible states ends up being one. The observation model is the part we just derived, so we know it is a Gaussian distribution, because we have just written it down. Then we have our predicted belief, and the question is: is the predicted belief a Gaussian or not? If we assume the predicted belief is Gaussian, then we know the result is Gaussian, because the product of two Gaussian distributions is again a Gaussian. So the only thing we need to show is that the predicted belief is a Gaussian distribution; if it is, the corrected belief will also be Gaussian, because the only thing that happens is a multiplication of this Gaussian with another Gaussian.

So the next thing we need to inspect is the predicted belief: is it Gaussian, yes or no? Let us look at the definition. The predicted belief is the integral over all previous states of the previous belief times the motion model. We just defined the motion model to be a Gaussian distribution; that is exactly what we have written down explicitly. And the previous belief we can also assume to be Gaussian, because we said we start with an initial belief which is a Gaussian distribution, and if that is Gaussian and the whole update step involves only operations under which everything stays Gaussian, the next belief will also be Gaussian, and it is recursively fed back in. So the only thing I need to show is: I start with a Gaussian initial belief, and, assuming the input belief is Gaussian, the result is also Gaussian, so that when the result is plugged in again it stays Gaussian, as this is a recursive expression. It is similar to a proof by induction: we have our initial state, which is Gaussian by assumption, and then we need to make sure in the induction step, going from time t to t+1, that everything stays Gaussian; that is what we are going to do here. The question remains whether it is sufficient that those two distributions are Gaussian, given that the integral over the previous state is still in there. That is a good question, but if you inspect the equation further, you see that it is actually a convolution of two functions.
We are convolving those two functions, and the convolution of two Gaussians is again a Gaussian distribution; as a result, the overall term stays Gaussian. Maybe you have not seen that yet, so I can go very quickly through the proof. We start with the predicted belief, writing down its definition, and fill in the two probability distributions based on the definitions from before: the motion model (ignoring the constant terms, which all go into the normalization constant) and the Gaussian of our previous, recursive belief. This gives a product of two exponential functions. We can combine those exponential functions: the product can be expressed as the exponential of some function L_t, which is the sum of the two individual exponents, because to multiply two exponential functions you simply add their exponents. Then we can split L_t into two terms: one term that depends on x_{t-1} and x_t, and another part that depends only on x_t. This lets us rewrite the integral: I can move the part that depends only on x_t out of the integral over x_{t-1}. The part outside the integral can again be expressed as a Gaussian distribution (it consists of the terms of the exponent that depend only on x_t), and the remaining integral over x_{t-1} is a marginalization of a Gaussian, which, as we said before, stays a Gaussian distribution. This turns the overall term into a product of two Gaussian distributions, which again is Gaussian. If you want to go a bit more into the details and step through this proof, I recommend the Probabilistic Robotics book; chapter 3.2 guides you exactly through that process.
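Schematically (a sketch of the decomposition just described; Probabilistic Robotics 3.2 has the full derivation):

$$\overline{\mathrm{bel}}(x_t) = \eta \int \exp\{-L_t\}\,\mathrm{d}x_{t-1}, \qquad L_t = L_t(x_{t-1}, x_t) + L_t(x_t)$$

$$\overline{\mathrm{bel}}(x_t) = \eta\, \exp\{-L_t(x_t)\} \int \exp\{-L_t(x_{t-1}, x_t)\}\,\mathrm{d}x_{t-1}$$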
As a result, the predicted belief is also a Gaussian distribution: this is a Gaussian, that is a Gaussian, the integral over their product is a Gaussian, and therefore the predicted belief is Gaussian. This Gaussian multiplied with the Gaussian observation model (and a normalization constant, which we do not care about) is again a Gaussian distribution. So as long as we live in this linear world and the individual models are Gaussian, everything stays Gaussian from the beginning to the end; we never leave the Gaussian world. That is an important property: it means we can update our belief at every point in time and execute the steps also in the future, when new observations and new controls come in. If we left the Gaussian world at some point, we would no longer have a Gaussian input for the next step, and the whole thing would break. So it is important that we live in this Gaussian world all the time, so that we can recursively apply our update rules and evolve the state from time t-1 to time t.

What we have learned so far is that we use linear models for the observations and the control commands, which we have written down: for the motion, the predicted state is x_t = A_t x_{t-1} + B_t u_t plus some Gaussian noise, i.e. how the state changes if we do nothing, plus how the control changes the state. For the observation, we can compute a predicted observation in the linear case as z_t = C_t x_t plus some Gaussian noise, saying which observation the system would generate if we are currently in x_t. If everything is Gaussian, we stay continuously in this Gaussian world, and we have basically shown that everything stays Gaussian as long as everything is Gaussian and linear, so all beliefs are Gaussian distributions at all times. The only thing we need to do is maintain Gaussian distributions through the overall steps of the Kalman filter.

So how do we represent our Gaussian distributions? A Gaussian can be easily represented by a mean and a covariance matrix. From an algorithmic point of view, the only thing we need to store in the Kalman filter is a mean vector and a covariance matrix. We start with the mean and covariance of the belief at the previous point in time, the parameters of the previous belief, and evolve them into the new mean and new covariance matrix at the current point in time, from t-1 to t, taking into account a Gaussian distribution for the motion noise and a Gaussian distribution for the observation noise. The main question is: how do the mean and the covariance matrix of the current belief relate to those of the previous belief? That is the thing we would actually need to sit down and derive by hand, and it is very similar to the derivation I just sketched with the convolution: splitting a product of Gaussians and a marginalization operation, and showing we stay in the Gaussian world.

The key toolbox needed to derive the Kalman filter algorithm consists of the facts that the product of two Gaussians is a Gaussian, that Gaussians stay Gaussian under linear transformations, and that the marginal and conditional distributions of a Gaussian are Gaussian; we need to be able to compute the means and covariances of those marginals and conditionals, we need the convolution of two Gaussians, and we will use the matrix inversion lemma and a few other mathematical tools that we know from probability theory or linear algebra.
This leads us to a short algorithm. I am not showing the derivation here; that would be an exercise taking a bit too long for this video, but there is a lot of literature out there, approaching it from the linear algebra point of view or from the probability theory point of view, that lets you derive the Kalman filter from scratch. Once you have done the exercise, you will come up with an algorithm that looks like this: it is basically a five-line algorithm.
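Reconstructed from the description that follows (in the Probabilistic Robotics notation used throughout; the line numbers match the references below):

1:  Kalman_filter($\mu_{t-1}, \Sigma_{t-1}, u_t, z_t$):
2:      $\bar\mu_t = A_t\,\mu_{t-1} + B_t\,u_t$
3:      $\bar\Sigma_t = A_t\,\Sigma_{t-1}\,A_t^{\top} + R_t$
4:      $K_t = \bar\Sigma_t\,C_t^{\top}\,(C_t\,\bar\Sigma_t\,C_t^{\top} + Q_t)^{-1}$
5:      $\mu_t = \bar\mu_t + K_t\,(z_t - C_t\,\bar\mu_t)$
6:      $\Sigma_t = (I - K_t\,C_t)\,\bar\Sigma_t$
7:      return $\mu_t, \Sigma_t$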
Lines 2 and 3 are the prediction step: they tell us how the belief evolves given the motion command. The input of the Kalman filter is our previous mean and the covariance of our previous belief (this is the recursive belief), plus the current control command u_t and the current observation z_t; we want to evolve the state from t-1 to t, which means computing a new mean and a new covariance matrix at time t. We do this in two steps, first the prediction step, then the correction step. In the prediction step we evolve the mean and the covariance matrix: the predicted mean (the one with the bar) at time t is the previous mean at time t-1 multiplied with the matrix A_t (how does the state evolve if I do nothing) plus B_t u_t (how does the control command change the state); both are added up, and that is exactly the linear model we have. Based on this, we specify how the uncertainty evolves: this is first-order error propagation, i.e. how the uncertainty of a Gaussian distribution changes when it is squeezed through a linear function. We get A_t multiplied with the covariance of the previous belief times A_t transposed, plus R_t, the additional noise that the motion adds to the current state. That completes the prediction step and gives the predicted belief.

The correction step is lines 4, 5 and 6. What basically happens here is that we compute a weighted sum of two Gaussian beliefs, essentially a product of two Gaussian distributions, and the weighting factor is K_t, the so-called Kalman gain. The Kalman gain tells me how much more I trust the observation or how much more I trust my prediction; it weights the prediction and the correction against each other. It is basically a ratio involving the uncertainty of my observations and the matrix C_t, the model that maps from the state space into the space of observations. In line 5, the new mean is the predicted mean plus a correction term, and this correction term tells me how much the observation actually changes my state: we compute the difference between the actual observation I obtained and the predicted observation, the same difference that was sitting in the exponent of my observation model, and multiply it with the Kalman gain K_t. The Kalman gain expresses how much the difference between the actual and the predicted observation should change my state: this difference is mapped through the gain matrix into the state space and tells me how to update the predicted mean so that the correction through the observation is taken into account. In line 6, I also need to update the covariance matrix, not only the mean: through the observation my uncertainty typically decreases, because I get new information into the system, and this must be appropriately reflected in the uncertainty of my current belief. In the last line we return the updated mean and covariance matrix, now representing the Gaussian distribution at time t, the updated belief. In the next step this information is fed into the Kalman filter again, taking u_{t+1} and z_{t+1} into account; that is the next step of this recursive state estimation algorithm.

Maybe a small example in 1D. Consider I have a belief and perform a prediction of what the state will be at the next point in time; say the red Gaussian distribution here is my prediction step. Then I get a new measurement, illustrated by the green probability distribution. I have a prediction and a measurement, and now I need to combine them, and the combination takes into account which one weighs more, depending on the uncertainties: a distribution with a large covariance will have a smaller impact on the final belief than one which is more peaked. You can see this nicely in the illustration: the correction, shown by the blue curve, is closer to the green curve, the measurement, than to the prediction, and the only reason is that I trust my measurement more than my prediction, which can be seen from the variances of the two distributions: the green curve has a smaller variance, so I trust it more, and it therefore has a stronger impact after applying the correction step, ending up with the blue curve. You can also see that the blue curve has a smaller uncertainty than the green curve: it combines two Gaussian distributions, and the uncertainty of the resulting Gaussian is smaller than the two input uncertainties of the red and green curves. We actually become more certain by bringing this additional information into the game.
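In 1D, this weighted combination has a simple closed form (the scalar version of lines 4 to 6, with C = 1; a standard result):

$$K = \frac{\bar\sigma^2}{\bar\sigma^2 + \sigma_z^2}, \qquad \mu = \bar\mu + K\,(z - \bar\mu), \qquad \sigma^2 = (1 - K)\,\bar\sigma^2 = \frac{\bar\sigma^2\,\sigma_z^2}{\bar\sigma^2 + \sigma_z^2}$$

so the fused variance is indeed smaller than both input variances.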
In general, the prediction step increases my uncertainty, because I add an additional source of uncertainty: given a belief, the motion adds more uncertainty on top, the uncertainty I had before plus the errors I make through my control command. The measurement, on the other hand, brings new information in, so I can reduce the uncertainty by combining those two Gaussian distributions. So the prediction step increases the uncertainty and the measurement reduces it; unless the motion is perfect, in which case we add no uncertainty, or the measurement does not tell me anything, in which case I do not reduce the uncertainty, but in most real-world situations that will not be the case. The important thing is that in the end it is basically a weighted mean of the two state estimates, and the weights are computed based on the uncertainties of the distributions: here the red one has a smaller weight than the green one, and this leads to the new belief.

We can take this a step further: let us now start with the blue curve, which is my belief at the next point in time, and perform a new prediction. The new prediction sits over here: the system has moved, evolving the state in the positive x-axis direction, but through this motion it became more uncertain. Then we get a new observation, which this time is not as good as the observation we had before; this can change from observation to observation. Combining those distributions gives the yellow curve, which sits quite nicely in the middle between the two input curves. Everything I do stays Gaussian; I am combining the prediction and the correction into a new Gaussian belief, and I do this in a recursive fashion.

Again, the two key assumptions the Kalman filter makes: everything is Gaussian, meaning my prior belief is Gaussian, my observation model has Gaussian noise, and my motion model has Gaussian noise; and my motion and observation models are linear, following the equations above. Under these two assumptions everything is good, everything stays Gaussian, we know how to update our belief, and it is actually the optimal estimator, the best thing we can do.
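Before moving on to the nonlinear case, here is a minimal numpy sketch of the algorithm above. The constant-velocity model and all numbers are my own illustrative assumptions, not from the lecture:

```python
import numpy as np

def kalman_filter(mu, Sigma, u, z, A, B, C, R, Q):
    # Prediction: push mean and covariance through the linear motion model.
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R
    # Correction: the Kalman gain weighs predicted belief against observation.
    K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + Q)
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new

# Illustrative 1D constant-velocity model: state = [position, velocity].
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])   # state evolves even without control
B = np.array([[0.0], [1.0]])            # control adds to the velocity
C = np.array([[1.0, 0.0]])              # we observe the position only
R = np.diag([0.1, 0.1])                 # motion noise
Q = np.array([[0.5]])                   # observation noise

mu, Sigma = np.zeros(2), np.eye(2)
for z in [np.array([1.1]), np.array([2.3]), np.array([2.9])]:
    mu, Sigma = kalman_filter(mu, Sigma, np.array([0.0]), z, A, B, C, R, Q)
    print(mu, np.diag(Sigma))  # uncertainty shrinks with each correction
```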
The key question, however, is: what happens if this is not the case? These two assumptions, that everything is Gaussian and that every model is linear, are actually very strong assumptions, especially the one about linear motion and observation models; that one is basically violated all the time. You can still argue that a lot of sensors have an approximately Gaussian noise characteristic, or that Gaussian noise is a good approximation of the noise we actually obtain, so most of the time you are okay with that assumption. But linear motion and observation models are not the case in most situations. If you want to describe the motion of a vehicle, for example, even in the 2D plane, and even worse in 3D, you have angular components involved, because the system is moving in some direction, so sine and cosine functions pop up, and these are not linear functions. As a result, they violate the assumption the Kalman filter makes, and the plain-vanilla Kalman filter cannot be applied. And since we often perform state estimation in a space that also involves orientations and angles, this is something we need to take into account: we need to find ways of dealing with those nonlinear functions.

So what happens if we have a nonlinear system? Most realistic systems, as I said, are nonlinear, and the question is how we can take this nonlinearity into account. The linear expressions turn into new functions, a function g for the motion and a function h for the observation, which are not necessarily linear anymore. Let us first inspect what changes if we put in such functions g and h, and why this is a problem. We can illustrate this very nicely graphically. What we see at the bottom is a Gaussian distribution, our input, which we want to transform through a function; let us say it is a linear function, so this Gaussian should be squeezed through the linear function and will give me a new Gaussian. How can I illustrate this? I can take every point on the input distribution, move up to the function, and deflect the ray onto the other axis; that is where this point is mapped to for the new Gaussian. The mean is mapped to the new mean, every value is mapped accordingly, and all the probability mass sitting at the bottom is transformed through the function, ending up in the output belief. It is a very nice way of illustrating what happens when I squeeze a Gaussian (or any other distribution) through, in this case, a linear function.

Now we want to drop the assumption that everything is linear, which means we replace this function by a nonlinear one; let us take a function which is clearly nonlinear. We again take every point on the input curve, with the probability mass associated with it, and squeeze it through this nonlinear function. The resulting distribution we get is the black curve, and we can clearly see that this black curve is not a Gaussian: the input Gaussian is transformed into something which is clearly not a Gaussian distribution. So by using nonlinear functions to evolve the state or to incorporate the observation, I am actually leaving the Gaussian world. We can still approximate the black curve by a Gaussian distribution, which gives the blue curve, but it is quite obvious that this approximation is far away from the original function. That is the problem we have: if we push a Gaussian through a nonlinear function, the output will not be a Gaussian anymore, and as a result the Kalman filter is not applicable anymore, because as soon as we leave the Gaussian world, the Kalman filter cannot handle it.
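A quick way to see this numerically: push samples of a Gaussian through a nonlinear function and check whether the result still looks Gaussian. The function and all numbers here are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.8, size=200_000)   # input Gaussian

g = lambda v: np.sin(v) + 0.5 * v                   # some nonlinear function
y = g(x)                                            # transformed samples

# A Gaussian has zero skewness; the transformed distribution does not.
mean, std = y.mean(), y.std()
skew = np.mean(((y - mean) / std) ** 3)
print(f"fitted Gaussian: mean={mean:.3f}, std={std:.3f}, skewness={skew:.3f}")
```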
So the key question is: what can we do to resolve this? Which tricks do we have in our toolbox to modify the algorithm so that this information can be taken into account, maybe not perfectly, maybe with some additional assumptions, but dealing with these nonlinearities at least to some reasonable degree? The easiest trick is to say: we know the function is nonlinear, but let us locally linearize it. We perform a local linearization: we take a point, compute the first derivative of the function at that point, and fit a line to the nonlinear function at this specific point, the linearization point, and then treat the function as linear, accepting an error depending on how far we are away from the linearization point. This is the tool known as Taylor expansion, which we use frequently here.

How does it work? For the prediction model we have our nonlinear function g with the parameters u_t and x_{t-1}, and we need to take into account how far our variable is away from the linearization point. We start by evaluating the function at the linearization point, which is our previous mean, and add to that the Jacobian, the first derivative of the function with respect to the state, multiplied by how far the variable x_{t-1} is away from the linearization point, i.e. from the mean. This is the linearization of my prediction step. The same holds for the correction step: for the function h, which depends on the variable x_t, the linearization point is the predicted mean, because that is the best estimate of the state we have before executing the correction; I compute the first derivative, in the multi-dimensional case the Jacobian, of h with respect to x_t, and multiply it with x_t minus the predicted mean, i.e. how far the variable is away from the linearization point. This turns the model into a linear function: a constant, the value at the linearization point, plus the slope of the linear function times the deviation from the linearization point; and this holds for the correction step as well as for the prediction step.

Just as a short reminder, what does this first derivative look like in the multi-dimensional case? It is not a single value; it is a matrix, a Jacobian, which is in general a non-square matrix. If you have an m-dimensional function, here for example our state evolution, and n variables with respect to which we differentiate, the entries are the partial derivatives: the first dimension of the function with respect to the first variable, the first dimension with respect to the second variable, and so on, down to the m-th dimension of the function derived with respect to the first through n-th dimension of the parameter; it is basically a matrix filled with these partial derivatives.
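In formulas, with G_t and H_t denoting the Jacobians of g and h evaluated at the respective linearization points:

$$g(u_t, x_{t-1}) \approx g(u_t, \mu_{t-1}) + G_t\,(x_{t-1} - \mu_{t-1}), \qquad G_t = \left.\frac{\partial g(u_t, x_{t-1})}{\partial x_{t-1}}\right|_{x_{t-1} = \mu_{t-1}}$$

$$h(x_t) \approx h(\bar\mu_t) + H_t\,(x_t - \bar\mu_t), \qquad H_t = \left.\frac{\partial h(x_t)}{\partial x_t}\right|_{x_t = \bar\mu_t}$$

and in general, for a function $g: \mathbb{R}^n \to \mathbb{R}^m$:

$$G = \begin{pmatrix} \frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_m}{\partial x_1} & \cdots & \frac{\partial g_m}{\partial x_n} \end{pmatrix}$$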
We can also illustrate how this is used for linearization: for a two-dimensional function, the linearization gives a plane, and I am locally moving on that plane; it is a local linearization of the original function at that point. You can also see that if I stay close enough to the linearization point, it can be a good approximation, but the further I move away from the linearization point, the worse the approximation gets. The Jacobian is basically the generalization of the usual gradient or first derivative to multiple dimensions, which we nearly always need when performing real-world state estimation with the Kalman filter.

As a result, these models turn into linear functions: a constant, specified by the linearization point, times my variable, i.e. how far I am away. Let us see how this changes the overall setup from before. We had our linear case, and we had the nonlinear function with its non-Gaussian result. What the local linearization does is this: we choose the mean as our linearization point and approximate the blue (nonlinear) function by a linear function, shown in red; that is my Taylor approximation. If I now take the Gaussian and transform it through this red linearized function, I get the red output distribution. Note that this red output is a Gaussian, because a linear function applied to a Gaussian yields a Gaussian, but of course it cannot capture the nonlinearities of the blue function in an appropriate way. If we used the blue function itself, we would end up with the black curve; approximating that black curve by a mean and a variance would give yet another Gaussian, but that is something we typically cannot compute well. The only thing we can easily do is perform the local linearization: compute the red curve, transform the Gaussian through it, and end up with the red distribution, and there is a mistake between the red and the blue curve. So we are far away from being perfect, but it is the easiest thing we can do in this situation; not the best, as there are better ways of performing those approximations, but it is the standard way of doing this, performing a local linearization.

The question is how large the mistake is that we are actually making, and what it depends on. It depends on two quantities. The first is how far the red curve is away from the blue curve: how large is the error that I introduce through the local linearization itself?
If the blue and the red curve are very similar, the approximation error can be assumed to be small; if they deviate strongly, the approximation error will be large. The second quantity the approximation error depends on is the uncertainty of the input of the function. If the input uncertainty is large, a lot of probability mass is distributed over the space, which means more probability mass is mapped through the linearized function far away from the linearization point; and under the assumption that the further I am from the linearization point, the larger my approximation error gets, this means a larger error is generated through the transformation itself. Conversely, the smaller the uncertainty of the input Gaussian, the smaller the error of the output. We can illustrate this: take an input with a larger variance, so more probability mass is spread out further away from the mean, which means further away from the linearization point; then the approximation error gets larger and larger, which you can see, for example, in the difference between the means of the red and the blue distribution. If, on the other hand, we have a Gaussian with a smaller uncertainty, more peaked, then the error will also be smaller: the true output even looks more like a Gaussian (it is not one, but close), and the red and the blue curve come closer together, so the approximation error gets smaller. So two things impact the approximation error: how far the linearization is away from the actual nonlinear function, and how much uncertainty is involved. This is one of the things to know about the extended Kalman filter: the smaller the uncertainty, the smaller the approximation error, and the more similar the motion and observation models are to linear models, the better its performance.

Now let us look at how the linearized models look, say for the motion model. We take the motion model as we specified it before, but replace the A_t x_{t-1} + B_t u_t term with the linearized nonlinear function: we perform the linearization and then plug it into the Gaussian distribution. So the only thing that has changed compared to before is that the mean is no longer the Ax + Bu term, but the linearized form using the Taylor approximation. The same holds for the observation model: instead of C_t x_t we now have the linearized observation model, the linearized mapping from the state space to the observation space. What we now need to do is take those linearizations and put them into the Kalman filter algorithm, turning it into an algorithm that takes the linearized models into account. Luckily, there are very few changes needed to turn it into what we call the extended Kalman filter algorithm,
"extended" because it extends the Kalman filter in the sense that we can use it with nonlinear models, and it stays very, very similar. The only things that change are two parts. First, I replace the linear model with the nonlinear function g for predicting the mean: for the mean I can use the nonlinear function directly, because I am predicting just one single point, which I can do without requiring the linearization. The same holds for the observation: I really use the function h to map the predicted mean into what I expect to observe, in order to be able to compute the difference between what I actually observe and what I am predicted to observe. Second, the matrices A_t and A_t transposed, and C_t and C_t transposed, are replaced by G_t and H_t, the Jacobians of the motion and observation models. The rest stays identical between the Kalman filter and the extended Kalman filter algorithm: only the matrices involved in the linearized model change, and everything else stays the same.
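A minimal numpy sketch of the EKF step as just described, with g, h and their Jacobians passed in as callables. The 2D model with heading (the sin/cos nonlinearity mentioned earlier) and all numbers are my own illustrative assumptions:

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, g, G, h, H, R, Q):
    # Prediction: the nonlinear function g moves the mean directly;
    # its Jacobian G_t propagates the covariance (first-order error propagation).
    mu_bar = g(u, mu)
    G_t = G(u, mu)
    Sigma_bar = G_t @ Sigma @ G_t.T + R
    # Correction: linearize h at the predicted mean (the best estimate so far).
    H_t = H(mu_bar)
    K = Sigma_bar @ H_t.T @ np.linalg.inv(H_t @ Sigma_bar @ H_t.T + Q)
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ H_t) @ Sigma_bar
    return mu_new, Sigma_new

# Illustrative model: state (x, y, heading), control (v, omega), dt = 1.
def g(u, mu):
    x, y, th = mu
    v, om = u
    return np.array([x + v * np.cos(th), y + v * np.sin(th), th + om])

def G(u, mu):
    v, _ = u
    th = mu[2]
    return np.array([[1.0, 0.0, -v * np.sin(th)],
                     [0.0, 1.0,  v * np.cos(th)],
                     [0.0, 0.0,  1.0]])

h = lambda mu: mu[:2]                          # we observe the position only
H = lambda mu: np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])

mu, Sigma = np.zeros(3), 0.1 * np.eye(3)
R, Q = 0.01 * np.eye(3), 0.05 * np.eye(2)
mu, Sigma = ekf_step(mu, Sigma, np.array([1.0, 0.1]),
                     np.array([0.9, 0.1]), g, G, h, H, R, Q)
print(mu)
```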
I also want to explain what happens in the correction step if, for example, I have a very good or a very bad observation, just to give you a small hint of what changes. Consider you have a perfect observation: a sensor with zero noise. How can we see what happens to the state estimate in this situation? A perfect sensor means its noise is zero, so the observation covariance matrix is a matrix full of zeros. That means this term in the Kalman gain drops away, and what remains is $\bar{\Sigma}_t H_t^\top$ times the inverse of $H_t \bar{\Sigma}_t H_t^\top$. If I invert this product, I obtain the individual matrices inverted but in swapped order, $H_t^{-\top} \bar{\Sigma}_t^{-1} H_t^{-1}$. So $H_t^\top$ is multiplied with $H_t^{-\top}$, which gives the identity matrix, and the predicted covariance is multiplied with its inverse, which again gives the identity matrix. The only matrix that remains is $H_t^{-1}$, the inverted Jacobian of the observation model: this is the Kalman gain in this setup. Now let us see what happens if we take this gain into account. The gain is just the (linearized) observation function inverted, and if I expand the bracket in the correction step, the first term is $H_t^{-1} z_t$: I am mapping from the observation space into the state space, taking what I actually observed with the perfect sensor and mapping it into the state space, which tells me where I am given that sensor reading. The second term is $H_t^{-1} H_t$ times the predicted mean, so only the mean remains, but then I have the mean minus the mean, and these terms cancel out. The only thing left is $H_t^{-1} z_t$, which maps the observation into the state space one to one, and the resulting state is exactly what the sensor told me. This makes sense: with a perfect, noise-free sensor, the prediction is completely canceled out and only the observation determines the state.

Now we can turn it around: what happens if my sensor is the worst sensor ever and provides no information at all? We can model this as an observation covariance matrix with infinity on the main diagonal: zero information, a completely useless sensor. The term inside the inverse then becomes a sum of something finite plus infinity, which is infinity. Inverting a matrix with infinity on the main diagonal and multiplying it with finite values yields zero, because we are effectively dividing by infinity. So whatever finite numbers are involved, the Kalman gain turns into a matrix filled with zeros. In the next step, the corrected mean is the predicted mean plus the Kalman gain times something; since the gain is a zero matrix, everything cancels out, and what remains is that the corrected mean equals the predicted mean. This makes absolute sense: if the sensor provides no information, I should trust my prediction, and that is exactly what happens. So if the sensor has infinite uncertainty, the Kalman gain turns to zero and only the predicted belief remains as the state estimate. These two extreme cases can thus be explained very nicely: setting the uncertainty of the sensor to zero or to infinity means either completely trusting the observation or completely ignoring it and trusting only the prediction.
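For reference, the algebra of the two extreme cases can be written compactly in the linearized notation used above (this assumes $H_t$ is square and invertible in the first case, and uses the linearized innovation $z_t - H_t \bar{\mu}_t$ as in the lecture's argument):

```latex
\begin{align*}
K_t &= \bar{\Sigma}_t H_t^\top \bigl( H_t \bar{\Sigma}_t H_t^\top + Q_t \bigr)^{-1} \\
Q_t = 0:\quad
  K_t &= \bar{\Sigma}_t H_t^\top \, H_t^{-\top} \bar{\Sigma}_t^{-1} H_t^{-1} = H_t^{-1},
  \qquad \mu_t = \bar{\mu}_t + H_t^{-1}\bigl( z_t - H_t \bar{\mu}_t \bigr) = H_t^{-1} z_t \\
Q_t \to \infty:\quad
  K_t &\to 0,
  \qquad \mu_t = \bar{\mu}_t
\end{align*}
```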
We can now use this general framework that we just introduced for different tasks, such as localization. This is just a very small illustrative example; in one of the next lectures we will dive into the details and provide the full approach to EKF-based localization, and look into other state estimation tasks using the Kalman filter. What you see here: the gray line is the actual trajectory the platform has taken, and there are six landmarks in the environment, things the sensor can actually observe. The system has an initial belief, say it starts over here, and then it believes it has moved over here, although in reality it executed this movement. Then it gets an observation, and based on that observation the system thinks it is here, although it was actually closer to the landmark, because of the noise in the observation. Then it moves on, probably driving straight, and gets a new observation, which drags the estimate closer to the true pose; as it turns around, it keeps getting observations and updating its belief. So this is the mean estimate of the trajectory the system takes, based on the motion commands and the landmark observations. If you run the EKF, you can always see the uncertainty of the prediction and of the observation being integrated: the uncertainty of the belief, with prediction and observation combined in a recursive state estimation fashion, so in the end it is a weighted sum of the predictions and the observations. And here is a small video example of how that looks in practice: these are landmarks, and when one flashes up you see an actual observation; the motion command gives you the prediction, and you can see the system has a drift. You always see the estimate of the predicted state and then the current observation being used to correct the belief. The actual trajectory is a circle the system is driving on, and with the blue line, the estimate of the extended Kalman filter, you can track the motion of the vehicle really well; if you instead used only the motion information, you would end up with this pink trajectory over here. So this is a small example, or teaser, of how you can localize a system using an extended Kalman filter.
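The lecture does not spell out the sensor model behind this example, but a common choice for landmark observations in EKF-based localization is a range-bearing model; the following is a hypothetical sketch of such a model and its Jacobian $H_t$, assuming a pose state $(x, y, \theta)$ and known landmark positions:

```python
import numpy as np

def h_range_bearing(mu_bar, landmark):
    # Predicted observation of a known landmark from the predicted pose
    # mu_bar = (x, y, theta): range and bearing to the landmark.
    dx, dy = landmark[0] - mu_bar[0], landmark[1] - mu_bar[1]
    q = dx**2 + dy**2
    return np.array([np.sqrt(q),                        # range
                     np.arctan2(dy, dx) - mu_bar[2]])   # bearing

def H_range_bearing(mu_bar, landmark):
    # Jacobian of the range-bearing model with respect to the pose,
    # evaluated at the predicted mean: the H_t of the correction step.
    dx, dy = landmark[0] - mu_bar[0], landmark[1] - mu_bar[1]
    q = dx**2 + dy**2
    return np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q),  0.0],
                     [ dy / q,          -dx / q,          -1.0]])
```

In practice, the bearing component of the innovation $z_t - h(\bar{\mu}_t)$ should additionally be normalized to $(-\pi, \pi]$ before applying the Kalman gain.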
This brings me to the end of today's lecture, with a short summary of what you should have learned. The extended Kalman filter is a variant of the Kalman filter, which itself is the Bayes filter for the linear Gaussian case, and the EKF is one way to handle nonlinearities. It does so through local linearization, by computing Taylor expansions. This is not the only way, just the most straightforward and most common one; there are variants that compute better approximations, such as the unscented Kalman filter, which goes beyond a single linearization point and uses a more advanced approach. The EKF is still one of the standard tools we can use today for state estimation, and it works well unless either you have extremely large uncertainties involved, for example if you need to start with a prior belief that is more or less a uniform distribution, or one that is multimodal, because then you introduce large errors in your linearization, as we have seen; or your nonlinear models cannot be approximated well by a linearized function, which is also a situation in which the Kalman filter is likely to fail. But as long as your models are not too far from linear functions and your uncertainties can be bounded, for example by using high-quality sensors, the Kalman filter is a standard tool for recursive state estimation. If you want to dive a little deeper into the topic, I recommend the Probabilistic Robotics book, especially chapters 3 and 7, and if you are interested in deriving the Kalman filter by manipulating Gaussian distributions, I recommend the article by Schön and Lindsten, which you can find freely on the net; it goes step by step through the derivation in five or six pages, ending up with the Kalman filter, so with the tools shown in this lecture you could actually derive it by hand, although we have not explicitly done that derivation here. With this, I am coming to the end for today. I hope you learned something about recursive state estimation and how a concrete realization of a recursive Bayes filter looks, namely the Kalman filter and the extended Kalman filter, and that you now have tools at hand that you can use for your own problems. In the remaining parts of this course we will look into examples of how to use the Kalman filter to solve concrete state estimation problems, such as localization or simultaneous localization and mapping. Thank you very much for your attention, and see you soon. Bye!
Info
Channel: Cyrill Stachniss
Views: 24,390
Keywords: robotics, photogrammetry
Id: E-6paM_Iwfc
Length: 73min 35sec (4415 seconds)
Published: Thu Sep 03 2020