Probability Primer for Probabilistic Robotics (Cyrill Stachniss)

Captions
So, welcome everyone to the first technical part of this course. We are going to dive into concepts that we use in robotics, and this is a short primer on probability theory on a very basic level, introducing a few rules that we will be using. Based on those rules, we will also do the first real thing today: derive the recursive Bayes filter, which is a framework for state estimation, so for estimating the current state of a system, for example the position of a mobile robot in its environment, based on sensor data and on control commands such as the steering commands of the platform. Starting from the very basics, we can derive the framework of recursive Bayes filtering within a few minutes, and this framework then has different realizations such as the Kalman filter or the particle filter.

These probabilistic techniques are used often in robotics. Whenever we have systems that need to navigate through the environment, perceive it with sensors, and derive actions from that, those systems typically need to work with probabilities, because they do not know exactly which state they are in; they can only estimate it. This is best done with probabilistic approaches, because they allow us to explicitly encode uncertainty, and we can take this uncertainty into account in decision-making processes in order to be more robust with respect to noise. Our world is always noisy; we never live in a perfect world.

Several techniques are relevant here in robotics. This starts with general state estimation, and then we have common applications of state estimation techniques: mapping, that is, estimating what the world looks like, as well as localization, that is, estimating where we are in the world. We can also solve both problems jointly, which is then called simultaneous localization and mapping, or SLAM. That is something we will dive into explicitly more towards the end, or in the second part of this course, but you should already get an idea in this course of what the problem is about. Then we ask ourselves how we can navigate through the environment, which typically involves motion planning, so planning which actions to execute in order to reach a goal or come closer to it, and also using controllers in order to execute those high-level motions.

We start with state estimation. State estimation in robotics often refers to things somewhere in space, so very often we need to estimate geometry: where is an obstacle, where is a landmark, where is the platform. Some things may be dynamic, such as the platform moving through the environment or a car driving through it, but certain things in the environment are static, or at least rather static, for example trees, lane markings on the road, or traffic signs, and we want to estimate the locations of those objects, for example in order to build a map of the environment. So a lot of those state estimation problems involve estimating where the system is, so "where am I", and "what does the world look like", and those two problems are coupled with each other. You can envision this with an example: if you see a street sign 20 meters in front of you, it depends on where you are where that street sign actually is.
And the other way around: if you don't know where you are, but you have a map of the environment and you see a distinct street sign, you can estimate where you are, given that you see the sign and know where it is located based on your map information. So there is a coupling between the questions "where am I" and "what does the environment look like", and this can be seen as localization and mapping, or, if you solve both jointly, the simultaneous localization and mapping problem. Here is an example of a 3D model built with exactly the vehicle you have seen before. When a vehicle is equipped with a 3D LiDAR scanner, it can estimate the distance to obstacles, basically to objects that reflect the LiDAR beam, and this gives you range information about how far away those objects are. By estimating the pose of the platform while driving around and integrating this range information with the positional information, you can build a 3D model of the environment, at least a 3D point cloud, and use this information to localize, navigate, or do other things. So this is one example where you have a combined localization and mapping task to solve.

Okay, so why are probabilities important here? The key reason is that the world is noisy. We will never get perfect observations, and we will never execute exactly what we want to execute: we have uncertainty in motion execution, and we have noise or uncertainty in our observations. There is no perfect sensor, and there is no perfect platform which does exactly what I tell it to do. So the right thing is to take that uncertainty into account in our model, explicitly, in state estimation and even in motion execution: I take into account how certain I am about the variables in my model, so that I can take that uncertainty into account when selecting the appropriate action.

We can see this in a very simple form with the localization problem. Traditionally, a system may say "the robot is exactly here", at precisely this x location. If the system is just, let's say, five millimeters off from that location, this is wrong and may lead to catastrophic failures: if you pass by an obstacle with a safety distance of three millimeters, this can be bad. A better approach is to use a probability distribution that says the robot is somewhere around here: I don't know exactly where it is, but I know roughly where it is, and I can estimate the probability that the system is in a certain region. If I take that uncertainty of the platform into account, I may increase my safety distance to obstacles, for example, or generate actions that avoid collisions or reduce the probability of a collision towards zero. But in order to have that information about how certain I am where the platform is, I need to dive into probabilistic approaches, so I need to take probabilities into account in my model in order to estimate what the state of the system is and what the world looks like. That is why we need this.

Okay, so let's start with a basic primer on probability theory. Again, this is not a theory course; this is just a roughly half-hour wrap-up of key concepts that we will be using here.
As I said before, probabilities are a key tool for realizing robust state estimation, not just in robotics but in any discipline where you work with real-world data. The advantage is that probability theory allows us to explicitly model uncertainties. Explicitly modeling uncertainties makes the mathematical model more complex, but also more robust, because we cannot expect the system to be in exactly one state. If I look at the techniques that we consider in this course, roughly 90% of them are based on probability theory, so it is important that you get comfortable with it and that you are able to do basic manipulations.

If you look into probability theory, it turns out that it is actually based on a small number of axioms. Axioms are definitions, and based on these definitions everything else can be derived. In the following, P(A) denotes the probability that a proposition A is true. We have four axioms:

1. The probability is always a real number between 0 and 1: 0 <= P(A) <= 1. It can't be negative, and it can't be larger than 1.
2. The probability of the true event, so of something which always holds, is one: P(true) = 1.
3. The probability of the false event, so of something which never holds, is zero: P(false) = 0. A value in between, say 0.5, means that in fifty percent of the cases the proposition holds and in fifty percent it doesn't.
4. For two propositions A and B, the probability that A or B holds is P(A or B) = P(A) + P(B) - P(A and B). Either A is true or B is true, which gives the first two terms, but the event that A and B hold together has then been counted twice, so it has to be subtracted once.

We can visualize this last axiom: if a box represents the whole space of events, the blue area refers to A and the yellow area to B, then "A or B" is the union of both areas. If you add up the area of A and the area of B, the overlap in the middle, where both A and B hold, is counted twice, and therefore it must be subtracted once in order to get the union. These are the four axioms on which everything else is based.
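As a small illustration (not from the lecture), here is a minimal Python sketch that checks the inclusion-exclusion axiom on a made-up finite sample space with equally likely outcomes:

```python
# Sketch: verify P(A or B) = P(A) + P(B) - P(A and B) on a finite
# sample space. The outcomes and events are illustrative choices.
omega = set(range(10))          # 10 equally likely outcomes
A = {0, 1, 2, 3, 4}             # proposition A
B = {3, 4, 5, 6}                # proposition B

def p(event):
    return len(event) / len(omega)   # uniform probability

lhs = p(A | B)                  # P(A or B), union of events
rhs = p(A) + p(B) - p(A & B)    # inclusion-exclusion
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)                 # 0.7 0.7
```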
Based on those axioms, we can derive everything else. As one small example, let's derive what the negation of A, so "not A", means in terms of probabilities. We start with the last axiom and substitute not-A, the proposition that A is false, for B:

P(A or not A) = P(A) + P(not A) - P(A and not A)

Now let's simplify. What does "A is true and A is false" mean? This is a statement which cannot happen; A can't be true and false at the same point in time, so this is the impossible event: P(A and not A) = P(false). And "A is true or A is false" covers the only two options, so it is always the case: P(A or not A) = P(true). Replacing both, the equation becomes

P(true) = P(A) + P(not A) - P(false)

We derived from the axioms that P(true) = 1 and P(false) = 0, so the equation says 1 = P(A) + P(not A). Moving P(A) to the other side gives the final result:

P(not A) = 1 - P(A)

So based on those axioms, we have derived what the negation looks like, given that we know P(A). Similarly, we can use the axioms to derive other statements. We won't dive further into the details; I just want to show you that basically everything in probability theory follows from those axioms, and we could build up everything step by step. Doing everything from scratch for this course would be a longer endeavor, so we now fast-forward and very informally introduce the different rules that we will be using, in order to derive, for example, the Bayes filter later on.

Okay, so let X denote a random variable, a variable which can take different values. Say the random variable X can take the values x_1, x_2, ..., x_n, a countable number of possible outcomes. Then P(X = x_i) is the probability that the random variable X takes the value x_i. As a somewhat sloppy shorthand, we just write P(x_i); this means nothing other than P(X = x_i). In 99.9% of the cases I will use the short notation; sometimes I will use the long one just to make things clearer. The function P is called the probability mass function. This holds for discrete random variables, that is, random variables which take a countable number of possible values. The probability mass function takes values between 0 and 1: if you have n possible outcomes, each outcome has a probability smaller than or equal to 1, but in sum the probabilities must add up to 1, because one of those events must take place.
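A minimal sketch of such a discrete random variable in Python, with illustrative numbers:

```python
# Sketch: a discrete random variable X with outcomes x1..x4 and its
# probability mass function. The probabilities are made up.
pmf = {"x1": 0.1, "x2": 0.4, "x3": 0.3, "x4": 0.2}

# Every value lies in [0, 1], and the values sum to 1.
assert all(0.0 <= p <= 1.0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Probability that X takes a value in a subset of the outcomes:
p_subset = pmf["x1"] + pmf["x2"]   # P(X in {x1, x2}) = 0.5
print(p_subset)
```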
This is different if we have a continuous variable, a variable which can take continuous values. Then we still write p(X = x), or in short form p(x), but now p is a so-called probability density function. A probability density function is not really a mass function, and it can take values which are larger than one. The important thing here is that we need to integrate over this function in order to get the probability that the variable lies within an interval, and the integral over the whole density function is again one, but not the individual function values. So if we want to know the probability that x lies within the interval from a to b, we need to integrate our density function from a to b, basically collecting all the probability mass that lies within this interval:

P(x in [a, b]) = integral from a to b of p(x) dx

This p(x) can take different forms; it can be a normal distribution, the most prominent one, but it can also take any other form. The important thing is that the integral of this function must be one. And this is one of the first important properties which holds for discrete as well as continuous distributions: the sum over all events must be one, because one event must be true, so if I sum up all the probabilities I must end up with one. In the continuous case, the equivalent is the integral over my density function, which must be 1. If I am not integrating over the whole space of values which x can take, but just a subset, then I get the probability that x lies in this subset, just as in the discrete case, where summing over only a subset of the values x_i gives the probability that the random variable X takes a value in that subset.
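As a small sketch (not from the lecture), here is how integrating a one-dimensional Gaussian density over an interval gives a probability; the closed-form Gaussian CDF via the error function avoids numerical integration:

```python
import math

# Sketch: for a 1D Gaussian density with mean mu and std sigma, the
# probability that x lies in [a, b] is the integral of the density
# over that interval, i.e. CDF(b) - CDF(a).
def gaussian_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

a, b = -1.0, 1.0
p_interval = gaussian_cdf(b) - gaussian_cdf(a)   # P(a <= x <= b)
print(p_interval)   # ~0.6827 for +/- one standard deviation

# Integrating over (effectively) the whole real line gives 1:
print(gaussian_cdf(1e6) - gaussian_cdf(-1e6))    # ~1.0
```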
Okay, so we can also combine multiple random variables with each other, and this leads us to joint and conditional probabilities. If we have two random variables X and Y and want to express the probability that X takes the value x and Y takes the value y, we write this in short form as p(x, y), the joint probability. Given this joint, we say that the random variables X and Y are independent of each other if the following holds:

p(x, y) = p(x) p(y)

If this is the case, the probability of a certain outcome of X is independent of the random variable Y: no matter what Y looks like, it doesn't change the probability distribution over X. This is the definition of independence.

Then we have the conditional probability, typically written p(x | y) and read "p of x given y", where the vertical bar reads as "given". This means nothing else than the probability distribution over x, and only over x, given that we know which value Y takes. It is defined via

p(x, y) = p(x | y) p(y)

which basically means: the probability that x and y hold jointly is the probability that y holds, without anything else, times the probability that x holds given that y holds. We can rearrange this, so p(x | y) is the joint divided by p(y), and of course we can do the same thing the other way around, p(x, y) = p(y | x) p(x). We can also express independence through the conditional: X and Y are independent if p(x | y) = p(x), meaning that no matter what y looks like, the distribution over x is the same; knowing y doesn't help me say anything about x. Joint and conditional probabilities will pop up all the time, so make sure you understand what they mean.

The next thing I want to look into is the law of total probability. What the law of total probability tells me is that if I sum over all possible outcomes of y,

p(x) = sum over y of p(x | y) p(y)

You can see it in the following way: say y can take the values 1, 2, and 3. We sum over all those values: we ask what the probability is that y takes the value 1, and what the probability of x is given that y equals 1, so the conditional probability times the probability that y actually takes this value.
Then we do the same thing for y = 2 and y = 3 and sum up all those terms. That basically means we sum over all values of y, and only x remains. The same holds for the continuous case, except that the sum is replaced by an integral over all outcomes of y:

p(x) = integral of p(x | y) p(y) dy

Closely related to this, combining it with the previous rules, we get the marginalization rule. What marginalization means is: if I have the joint probability of variables x and y and I sum over all outcomes of y, then p(x) remains:

p(x) = sum over y of p(x, y)

That means if I have two variables X and Y and I only know the joint, so for all combinations of x and y I know what the probability looks like, then I can compute the probability over x alone, without y, by simply iterating over all values of y and summing up the probabilities of the joint. If I have the joint, I can marginalize out the variable y so that it is not there anymore in the remaining distribution; I can get rid of y by summing, or integrating, over all its possible outcomes, depending on whether I am in the discrete or the continuous case.
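A quick sketch of both rules on a small made-up joint table (rows are values of y, columns are values of x):

```python
import numpy as np

# Sketch of marginalization: given a joint P(X, Y) as a table,
# summing over y recovers P(X), summing over x recovers P(Y).
# The numbers are illustrative and sum to 1.
joint = np.array([[0.10, 0.05, 0.15],    # P(X=x, Y=1)
                  [0.20, 0.30, 0.20]])   # P(X=x, Y=2)

p_x = joint.sum(axis=0)   # marginal P(X): sum over y
p_y = joint.sum(axis=1)   # marginal P(Y): sum over x
print(p_x, p_y)           # [0.3 0.35 0.35] [0.3 0.7]

# Law of total probability: P(X=x) = sum_y P(X=x | Y=y) P(Y=y)
p_x_given_y = joint / p_y[:, None]   # conditional P(X | Y), row-wise
print(p_x_given_y.T @ p_y)           # equals p_x again
```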
Let's have a look at a small visual example. We have a small grid where one axis is the x dimension and the other the y dimension: y can take the values 1 and 2, and x can take the values 1 through 9 in this example, so we have 18 possible outcomes. Say we count how often each outcome actually happens, shown here as blue dots: the event x = 1 and y = 1, for instance, has happened three times. So we basically have counts for all of these outcomes, and this gives a joint distribution: the number of dots in a cell, here three, divided by the overall number of dots, is the joint probability of that cell. What you can now do is, for example, marginalize out x: you want to get rid of x and only know the probability for y = 1 and for y = 2. You simply sum the joint over all possible outcomes of x, which means you count all the dots which fall into the first row, basically merging all its cells into one bin, and do the same for the second row; the two resulting counts, normalized, give you the distribution p(y) with x marginalized out. Instead of marginalizing out x, we can of course also marginalize out y: then we sum up all dots irrespective of the value of y, merging each pair of cells with the same x value, which gives a probability distribution over x with y marginalized out. We can also express the conditional probability, for example p(x | y = 1): since we are given y = 1, the row for y = 2 can be completely ignored, and we just build the histogram over the nine bins from the blue dots sitting in the y = 1 row. So this is an example of the conditional and the marginalized probability distributions.

Okay, let's take another concrete example, one of those famous examples you find in a lot of books. We have two binary variables M and S, for example male/not male and smoker/non-smoker, and we want to model the probability distribution of whether people smoke, put in relation to whether they are male or female. What do we have here? We have 20 male smokers, 20 female smokers, 35 non-smoking males, and 45 non-smoking females; this is my population, 120 people in total. What I can do now is sum the counts over the columns: that gives 40 smokers versus 80 non-smokers, so out of these 120 people, 40 smoke and 80 do not. I can do the same over the rows at the bottom: 20 plus 35 males and 20 plus 45 females, so 55 males and 65 females, which also add up to 120 people. So this is the table of occurrences from which I can now compute my joint probability distribution, so the probability of all combinations. For example, p(m and s), the probability that a person is male and a smoker, is 20 people out of my population of 120, so this one cell, 20, divided by 120.
Okay, we can do the next one: the probability of being male and a non-smoker. We have 35 male non-smokers out of 120, so 35 divided by 120. That basically means: if I take this population and randomly draw a person with my eyes closed, how likely is it that they are male and a non-smoker? The probability for that is 35/120. We can do this for all the individual cells, but based on this table we can also go a step further and derive other quantities, the marginalized distributions. For example, what is the probability of randomly drawing a male from this population? That is 20/120 plus 35/120, which equals 55/120: 55 out of these 120 people are male, so the probability of randomly selecting a male is 55/120. Equivalently, we can do this for the females; here we have a larger number, 10 more, so it is 65/120 instead of 55/120. We can do the same for the smokers: the probability that a random person is a smoker is 40/120. So we can write down those values, just compute the ratios, and get the marginalized distributions.

We can also compute the conditional probability, for example the probability of being male given that the person is a smoker. Here you need to be a bit more careful: "given smoker" means we know the person smokes; we don't know whether they are male or female, but we know they smoke, so the non-smokers can be completely ignored, and I am only interested in the smoker row of my table. Given that I know the person is a smoker, the probability of being male is 20 divided by 40, not by 120, only by 40, because I know the person is a smoker. Formally, this is p(m | s) = p(m and s) / p(s): we had p(m and s) = 20/120, divided by the probability that the person is a smoker, 40/120, and 20/120 divided by 40/120 is exactly the 20/40 I was indicating. So it's a fifty percent chance: if I know the person smokes, it is fifty-fifty whether they are male or female in this example.
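The lecture's counts table and the quantities derived from it fit in a few lines of Python; a small sketch:

```python
import numpy as np

# The counts table from the lecture: rows = smoker / non-smoker,
# columns = male / female.
counts = np.array([[20, 20],    # smokers:     male, female
                   [35, 45]])   # non-smokers: male, female
n = counts.sum()                # 120 people

joint = counts / n              # joint distribution, e.g. P(m and s)
p_smoker = joint.sum(axis=1)    # marginals: [40/120, 80/120]
p_male   = joint.sum(axis=0)    # marginals: [55/120, 65/120]

# Conditional P(male | smoker) = P(male and smoker) / P(smoker)
p_male_given_smoker = joint[0, 0] / p_smoker[0]
print(p_male_given_smoker)      # 0.5, i.e. 20 out of the 40 smokers
```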
So these were examples of the joint probability distribution, the probability distributions over the individual variables, and the conditional distribution. It is very important that you truly understand what it means if you write, for example, p(x | y), and that you also know what independence means, because these are the concepts we need now to derive Bayes' rule.

Bayes' rule can be derived very easily. I just use the definition of the conditional probability that we have seen before: p(x, y) = p(x | y) p(y), so again, the joint distribution is the conditional distribution times the distribution of the variable that is given. We can do the same thing with x and y swapped, because p(x, y) is the same as p(y, x), so we can also write p(x, y) = p(y | x) p(x). That means those two right-hand sides are equal, and if I divide by one of the distributions, for example p(y), I get

p(x | y) = p(y | x) p(x) / p(y)

and that's all Bayes' rule is about. What we can do with Bayes' rule is swap those two quantities. Sometimes I want to estimate x given y and I know that distribution directly, then everything is fine, but sometimes it is the other way around: I know p(y | x) but I want p(x | y), and then I can use Bayes' rule, assuming I know the other distributions, in order to compute the distribution I actually want. This is a very powerful tool that we will use in a lot of derivations, and the Bayes filter is called the Bayes filter because it uses Bayes' rule as a key ingredient in its derivation. The term p(y | x) is often called the likelihood, p(x) the prior, and p(y) the evidence, so we read: likelihood times prior divided by evidence.

One important thing: we can also use Bayes' rule if we have so-called background knowledge, that is, additional information. If I add an additional variable z as given information to all the probability distributions, the same rule still holds:

p(x | y, z) = p(y | x, z) p(x | z) / p(y | z)

This is Bayes' rule with background knowledge z. Knowing Bayes' rule is very, very important because we will use it very often; you simply have to learn it and know it by heart, there is no excuse for not knowing Bayes' rule.
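A minimal sketch of Bayes' rule in action, with made-up numbers (the "door open" scenario and all probabilities below are illustrative, not from this lecture): we know the sensor model p(y | x) and a prior p(x), and want the posterior p(x | y).

```python
# Sketch: P(door_open | measurement) from the likelihood
# P(measurement | door_open), a prior, and the evidence.
p_open = 0.5                     # prior P(x); assumed value
p_meas_given_open = 0.6          # likelihood P(y | x); assumed
p_meas_given_closed = 0.3        # likelihood P(y | not x); assumed

# Evidence P(y) via the law of total probability:
p_meas = (p_meas_given_open * p_open
          + p_meas_given_closed * (1 - p_open))

# Bayes' rule: P(x | y) = P(y | x) P(x) / P(y)
p_open_given_meas = p_meas_given_open * p_open / p_meas
print(p_open_given_meas)         # ~0.667
```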
The next thing I will look into is so-called conditional independence. It combines the conditional distributions and the independence that we talked about before. We say that two variables are conditionally independent if they are independent given that additional information, so-called background knowledge, is available. For example, if we have a joint distribution over x and y given some background knowledge z, and

p(x, y | z) = p(x | z) p(y | z)

holds, then the variables x and y are conditionally independent given z. But this only holds when z is available; if z is not available, I can't say anything about them, maybe they are independent, maybe they are not, it is not implied. This conditional independence means nothing else than p(x | z) = p(x | y, z): if I have the background knowledge z and conditional independence holds, then knowing y doesn't tell me anything further about x. Given z, I have enough information to estimate my probability distribution over x, and the knowledge of y doesn't help me say anything more; the same holds the other way around, p(y | z) = p(y | x, z). This is a different way of phrasing conditional independence. The important thing to take into account is that this does not imply independence, also called marginal independence if you want to distinguish it explicitly from conditional independence. So it does not mean that p(x, y) = p(x) p(y); that only holds if we have the background information. It may hold, but it is not implied, and that is important to note.

When we work with probability distributions, we often have discrete distributions, which we can easily represent using histograms, just based on counts, as in the example I showed a few slides back. That is how a discrete probability distribution is often stored in an easy way. If we live in a continuous world, we often have to provide a parametric function to describe a probability distribution. How do we do this? A very popular choice is the so-called normal distribution, the Gaussian distribution. It can be one-dimensional, or it can be multi-dimensional. The one-dimensional Gaussian has a mean mu and a standard deviation sigma: the mean tells you where the peak of the distribution is, and the standard deviation tells you the width. The smaller the standard deviation, the narrower and more peaked the distribution, and the smaller the uncertainty of that distribution. We can describe the one-dimensional Gaussian, which is always unimodal, with this equation:

p(x) = 1 / (sqrt(2 pi) sigma) * exp(-(x - mu)^2 / (2 sigma^2))

So we ask how far the variable x is away from the mean, square this distance, divide by the standard deviation squared, multiply by -0.5, put it into an exponential function, and have a normalization constant in front which makes sure the function integrates to one. We can do this also in multiple dimensions, two, three, four, five, arbitrarily many; then we typically write it in a very similar form, except that we now use vectors and matrices to specify it. A two-dimensional Gaussian can still be illustrated directly; beyond that it gets harder to visualize, so we typically visualize isosurfaces of the uncertainty, surfaces on which every point takes the same probability density value: the further you go to the inside of such an ellipse or ellipsoid, the higher the density, and the further you go to the outside, the smaller it gets.

What we can also do is combine multiple Gaussian distributions, for example if we need to represent multiple modes. This is what we call a Gaussian mixture, or a sum of Gaussians: we have a sum over K components, so basically K Gaussian distributions, and we sum them up with weights that are normalized so that the integral is one and it remains a probability distribution. By using three Gaussians, one here, a second one here, and a third one here, we can represent this red function; we can do this in 1D as well as in multiple dimensions, for example here with three two-dimensional Gaussians, their isosurfaces, and a 3D visualization of the resulting distribution. That is something we frequently use to represent distributions with more than one mode. It is not always easy, because the sum can be tricky, for example in certain state estimation or least-squares problems, where the log-likelihood becomes much more complex than in the unimodal case; so it is not always a great solution, but it is the easiest one we typically have for multi-modal distributions.
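A small sketch of both the one-dimensional Gaussian density and a Gaussian mixture built as a weighted sum of components; the means, standard deviations, and weights below are illustrative, and the weights must sum to one so the mixture is still a valid density:

```python
import math

# Sketch: 1D Gaussian density with mean mu and std sigma.
def gaussian_pdf(x, mu, sigma):
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Gaussian mixture: weighted sum of K Gaussian components.
def gmm_pdf(x, weights, mus, sigmas):
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

weights = [0.5, 0.3, 0.2]        # normalized, sum to 1
mus     = [-2.0, 0.0, 3.0]       # illustrative component means
sigmas  = [0.5, 1.0, 0.8]        # illustrative component widths
print(gmm_pdf(0.0, weights, mus, sigmas))   # mixture density at x = 0
```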
So, to sum up this first part: probability theory is an essential tool that we use to solve a lot of state estimation problems. What we have done in these first 40 minutes was to introduce what a probability means, what uncertainty is, what Bayes' rule is, what conditional independence and independence in general are, how to compute basic probabilities at least for the discrete case, what marginalization and the law of total probability are, and that probability distributions must sum up to one. This is basic knowledge that you will need on a regular basis in order to deal with probabilities, to manipulate those distributions, and to understand the underlying assumptions. Bayes' rule, for example, is one of the key concepts that we will use in the next part of the course in order to derive the recursive Bayes filter, a recursive state estimation technique. Thank you.
Info
Channel: Cyrill Stachniss
Views: 6,210
Keywords: robotics, photogrammetry
Id: JS5ndD8ans4
Length: 42min 33sec (2553 seconds)
Published: Tue Aug 25 2020