COS 302: Norms and Inner Products

Captions
In this video we're going to talk about the ideas of norms and inner products. An important theme of the last video was connecting concrete ideas, things we have some intuition for, with the more abstract and precise language of mathematics. Let's continue that and draw some explicit connections between concrete things that we think about operationally, and might code up in Python, and things that are more abstract and represent possibly more complicated objects. Remember, a continuing theme of this course is the connection from vectors as you may first think of them, as directed segments, to the abstract idea of vector spaces.

The first connection staring us in the face: we sometimes talk about matrices, which we have some intuition for, and sometimes about linear maps as the abstraction of that idea. You can think of matrices as a particular way to represent certain kinds of linear maps, or of linear maps as the generalization of the idea of a matrix. Similarly, we sometimes talk about rank. The equivalent of matrix rank in the more general abstract language is the dimension of the image of the map: if the map is Φ, the generalized notion of rank is dim(im(Φ)). And what does it mean for a matrix to be full rank? In the abstract, it says that the map is injective.

Today we generalize a little further. We'll connect the idea of Euclidean distance, and the idea of length generally, to the concept of a norm, and we'll connect the dot product, which you've certainly encountered when working with directed line segments, to the abstract idea of an inner product.

Let's start with the length of a vector. You've probably thought about lengths of vectors in Euclidean spaces for a long time. Say we have a vector x in some n-dimensional space. The Euclidean length of such a vector maps very well onto things we've known since grade school, like the Pythagorean theorem. Writing the length with two vertical bars on each side,

‖x‖ = sqrt(x_1² + x_2² + ⋯ + x_n²),

that is, we square each component of the vector, take the sum, and then take the square root of the whole thing. This simply generalizes the ideas we've seen for triangles for a long, long time. One nice consequence is that, in addition to length, it lets us talk about the distance between two vectors: if x and y are both in Rⁿ, their distance, which I'll also write with the double bars, is the length of their difference,

d(x, y) = ‖x − y‖.
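To make the definition concrete, here is a minimal NumPy sketch of Euclidean length and distance; it assumes NumPy is available, and the function names are invented for illustration:

```python
import numpy as np

def euclidean_length(x):
    """Euclidean length: square each component, sum, take the square root."""
    return np.sqrt(np.sum(x ** 2))

def euclidean_distance(x, y):
    """Distance between two vectors: the length of their difference."""
    return euclidean_length(x - y)

x = np.array([3.0, 4.0])
y = np.array([0.0, 1.0])
print(euclidean_length(x))       # 5.0 -- the classic 3-4-5 right triangle
print(euclidean_distance(x, y))  # length of x - y = (3, 3): about 4.2426
print(np.linalg.norm(x - y))     # NumPy's built-in norm agrees
```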
When we abstract away from Euclidean spaces, it turns out we can think about things more generally than this particular length and distance, and we can come up with other functions that we call norms. The idea of a norm is that it shares the core properties of something like Euclidean length but perhaps gives us different ways to measure distance. There are a few properties we require as the basic ingredients of a norm.

First, it must be a function of vectors that produces a real number; representing the argument abstractly between the two vertical bars, ‖·‖ : V → R. Additionally, this function must be non-negative no matter what we hand it, which makes sense: distances need to be non-negative numbers. Moreover, a length can only be zero if it is measuring the zero vector: ‖x‖ = 0 if and only if (abbreviated "iff") x is the zero vector.

We also require that scaling interact with length in the expected way, which we call absolute homogeneity: if a scalar λ scales a vector x, then the length of the scaled vector satisfies ‖λx‖ = |λ| ‖x‖. This corresponds to the same intuition we have in the Euclidean case.

A norm must also satisfy the triangle inequality. This is just the idea that the sum of two sides of a triangle cannot be less than the third side. In terms of vector norms, for any vectors x and y, the norm of their sum can never exceed the sum of the two norms taken separately:

‖x + y‖ ≤ ‖x‖ + ‖y‖.

This makes sense if you think about it geometrically: picture a triangle with sides x, y, and x + y. If you imagine the x + y side stretching longer and longer, at some point x and y become collinear, and the side can't get any longer without pulling the triangle apart. You can think of this as the set of distances that keep a triangle consistent, generalized to higher dimensions.

Finally, just as length in Euclidean space gave us Euclidean distance, a norm can be used to construct a metric, a distance function

d(x, y) = ‖x − y‖,

exactly as before. The name of the game here is taking our intuitive ideas about length, extracting the important elements, and reasoning about what other kinds of objects still satisfy those criteria. That is what we call a norm.

One family of very important general-purpose norms is the lp family. The idea of an lp norm is to generalize the Euclidean norm with an exponent p:

‖x‖_p = ( Σ_i |x_i|^p )^(1/p),

a sum over the dimensions of the absolute values raised to the p, with the whole sum then raised to the 1/p. You can see right away that p = 2 recovers the Euclidean norm we're used to, but there are other cases we like to study.
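Here is a small sketch of the lp norm that also spot-checks the axioms above numerically; again this assumes NumPy, and `lp_norm` is an invented name for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def lp_norm(x, p):
    """The lp norm: (sum_i |x_i|^p)^(1/p); a valid norm for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Spot-check the norm axioms on random vectors (here for p = 2).
x, y = rng.normal(size=3), rng.normal(size=3)
lam = -2.5
assert lp_norm(x, 2) >= 0                                         # non-negativity
assert np.isclose(lp_norm(lam * x, 2), abs(lam) * lp_norm(x, 2))  # homogeneity
assert lp_norm(x + y, 2) <= lp_norm(x, 2) + lp_norm(y, 2)         # triangle ineq.

# Agrees with NumPy's built-in, which takes the exponent via `ord`.
for p in [1, 2, 3]:
    assert np.isclose(lp_norm(x, p), np.linalg.norm(x, ord=p))
```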
In particular, we like to think about the p = 1 case, the Manhattan or taxicab norm. The idea of the p = 1 or l1 norm is that the distance between two points is the sum of the absolute values of each coordinate: if one coordinate difference is x_1 and the other is x_2, the l1 distance is |x_1| + |x_2|. The reason it's called the Manhattan norm is that Manhattan has a grid of streets, so broadly speaking, to travel between any two points there you're forced to go down an avenue and along a street. You can't take a direct route; you always have to follow some rectangular pattern to get where you're going.

Another important case is p = ∞, the infinity norm, which we also call the max norm. The reason is that when p gets very large, the norm essentially plucks out the absolute value of the largest component. One way to see this: as p grows, it amplifies the differences between the absolute values of the components of x, so the biggest one dominates the sum; taking the pth root then leaves, in effect, just that one component, and in the limit

‖x‖_∞ = max_i |x_i|.

In general we only get a valid norm when p is one or larger; values of p between zero and one do not give a valid norm. Nevertheless, the "p = 0 norm", really a pseudonorm, is sometimes a valuable thing to use. If we take 0⁰ to be 0, then p = 0 in effect counts the number of non-zero entries of x: a zero entry contributes 0, any other value raised to the zeroth power becomes 1, and we take the sum of all those entries. This is a little sketchy, of course, because we would then be raising the sum to the power 1/0, which isn't well defined; nevertheless, people abuse the idea of norms in this way when they want to talk about sparsity.
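The three special cases are each a one-liner in NumPy; a brief sketch, assuming NumPy is available:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.sum(np.abs(x)))    # l1 / Manhattan norm: 7.0
print(np.max(np.abs(x)))    # l-infinity / max norm: 4.0
print(np.count_nonzero(x))  # "l0" pseudonorm: 2 non-zero entries

# np.linalg.norm supports all three via its `ord` argument.
print(np.linalg.norm(x, ord=1), np.linalg.norm(x, ord=np.inf), np.linalg.norm(x, ord=0))

# As p grows, the lp norm approaches the max absolute component.
for p in [1, 2, 8, 64]:
    print(p, np.linalg.norm(x, ord=p))
```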
Just as it was possible to generalize our Euclidean length into the idea of a norm, we can generalize our notion of a dot product into what we call an inner product. This is a function, which I'll write with angle brackets, that takes a pair of vectors in our space V and produces a real number: ⟨·, ·⟩ : V × V → R. Just as before, there are requirements it needs to satisfy.

It needs to be symmetric: I can flip the arguments and get the same value, ⟨x, y⟩ = ⟨y, x⟩ for all x and y in V.

It also needs to be bilinear. Bilinearity is the idea that each argument, taken separately with the other fixed, must be a linear function. If a and b are scalars and x, y, and z are vectors, the inner product must be linear in its first argument:

⟨a x + b y, z⟩ = a ⟨x, z⟩ + b ⟨y, z⟩,

and because it is symmetric, it must also be linear in the second argument. That is what we mean by bilinearity.

We also require that an inner product be positive definite. This is the idea that the inner product of a vector with itself, ⟨x, x⟩, must always be greater than zero unless x is the zero vector. If x is the zero vector, it's okay for this to be zero, but it can never be negative.

Those are the ingredients we need for an inner product. It's worth pointing out that an inner product with these properties can be used to define a norm: given an inner product, ‖x‖ = sqrt(⟨x, x⟩) is a norm. This again generalizes our notion of Euclidean length; the dot product and the Euclidean length you're used to already have exactly this relationship.

I won't usually write inner products with these angle brackets. Instead I'll stick with notation we've used before, which aligns the inner product with matrix multiplication: treating x and y as column vectors, I will very often write xᵀy instead of ⟨x, y⟩.

Although an inner product gives us a notion of length, and we talked about norms just a second ago, part of what inner products add to our toolkit is that they are about angle. We need to talk about angles, and in particular about things being perpendicular or orthogonal, quite a lot, and the inner product gives us a way to do that. In particular,

⟨x, y⟩ = ‖x‖ ‖y‖ cos θ,

where θ is the angle between the two vectors. That's really helpful, for example because it tells us that if two vectors are at right angles to each other, their inner product is zero.

Now let's talk for a minute about the interaction between inner products and linear transformations, focusing for now on injective maps. Imagine we have some set of vectors and we transform them via an injective linear map represented by a matrix M, changing their orientations and lengths. If we ask what happens to the inner product between x and y under the map, we are really asking about the inner product between Mx and My. As we discussed before, we can write this in transpose form:

(Mx)ᵀ(My) = xᵀ MᵀM y.

Since we've assumed the linear map is injective, M is full rank, and we can write the quantity MᵀM as a single matrix A, which will be a
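A short sketch tying these pieces together, the inner product as matrix multiplication, the angle formula, and the MᵀM construction; the particular vectors and matrix are invented for illustration:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])

# The standard inner product as matrix multiplication: x^T y.
print(x @ y)  # 0.0 -- x and y are orthogonal

# <x, y> = ||x|| ||y|| cos(theta), so we can recover the angle.
v = np.array([1.0, 1.0])
cos_theta = (x @ v) / (np.linalg.norm(x) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_theta)))  # 45.0

# Pushing vectors through a full-rank map M changes their inner product:
# (Mx)^T (My) = x^T (M^T M) y, so A = M^T M captures the change.
M = np.array([[2.0, 1.0], [0.0, 1.0]])  # an invertible (hence injective) map
A = M.T @ M
print(np.isclose((M @ x) @ (M @ v), x @ A @ v))  # True
```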
square, symmetric, positive definite matrix. When we write something like xᵀAy, we call it a bilinear form. This gives us a general way to talk about the interaction between linear maps and inner products, and about how we might scale the different dimensions of x and y as we apply that linear map. One way to summarize it is to write ⟨x, y⟩_A, with the subscript A indicating the matrix appearing inside the inner product.

Now, for this generalized inner product to be valid, we know it must satisfy positive definiteness: taking x with itself,

⟨x, x⟩_A = xᵀAx > 0 for all x ≠ 0.

This is in fact our definition of a positive definite matrix. The expression xᵀAx, which is like squaring x and multiplying by A, is called a quadratic form. So if A is a square, symmetric matrix and its quadratic form satisfies xᵀAx > 0 for any x I choose that is not the zero vector, then A is a positive definite matrix.
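A brief numerical sketch of this last point, assuming NumPy; it builds A = MᵀM from a random full-rank M and checks the quadratic-form condition, along with the standard eigenvalue test for positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))  # a random matrix, almost surely full rank
A = M.T @ M                  # square, symmetric, positive definite

# Spot-check the defining property: x^T A x > 0 for every nonzero x.
for _ in range(1000):
    x = rng.normal(size=3)
    assert x @ A @ x > 0

# Equivalently, a symmetric matrix is positive definite iff all of
# its eigenvalues are strictly positive.
print(np.all(np.linalg.eigvalsh(A) > 0))  # True
```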
Info
Channel: Intelligent Systems Lab
Views: 280
Rating: 5 out of 5
Id: SQszhk7RCWE
Length: 18min 57sec (1137 seconds)
Published: Mon Feb 15 2021